AthCon 2013 RE Challenge

From time to time I like to spend my free time solving crackmes, but lately it has been hard to find something interesting. Fortunately, Kyriakos Economou & Nikolaos Tsapakis released an RE challenge for the AthCon conference. It’s not really hard and the idea behind it is not so complex, but the implementation used to hide the core of the algorithm deserves particular attention.
(You can download this tutorial plus two idc scripts from here)

At first glance the code is fully visible, the crackme is debuggable and it’s not protected. The protection routine starts at address 407730. After some lines of code you can identify a sort of cycle that is repeated again and again (the code between 40779A and 40A0E7). This is the core of the entire challenge: if you understand what’s going on here and how to deal with it, you’ll be well on your way to solving the challenge!

I’ll present the code inside the 40779A/40A0E7 range by dividing it into 3 parts: initialization, core and finalization.

Part #1: initialization
40779A   sub     esp, 800h
4077A0   call    $+5
4077A5   pop     edi
4077A6   sub     edi, 79h
4077A9   jmp     short loc_4077AC
4077AC   mov     edi, [edi]         <-- edi is initially 40CADE and it points to a sequence of bytes
4077AE   call    $+5
4077B3   pop     esi
4077B4   add     esi, 28C1h         <-- esi = 40A074
4077BA   xor     ebx, ebx           <-- ebx is the counter of the next loop
4077BC   mov     ebp, esp
4077BE   mov     ecx, edi
4077C0   add     ecx, ebx
4077C2   mov     esp, ecx
4077C4   pop     edx                <-- take 4 bytes from the 40CADE+ebx sequence of bytes
4077C5   mov     esp, ebp
4077C7   mov     ebp, esp
4077C9   mov     ecx, esi           <-- use the buffer at 40A074
4077CB   add     ecx, ebx
4077CD   mov     esp, ecx
4077CF   pop     ecx
4077D0   mov     cl, dl
4077D2   push    ecx                <-- it takes the current byte from 40CADE+ebx putting it at 40A074+ebx
4077D3   mov     esp, ebp
4077D5   inc     ebx                <-- increase the counter
4077D6   push    ebx
4077D7   xor     ebx, 0Fh
4077DA   pop     ebx
4077DB   jz      short loc_4077DF   <-- it repeats the loop 0x0F times
4077DD   jmp     short loc_4077BC
4077DF   mov     ebp, esp
4077E1   mov     edi, esi           <-- edi = 40A074
4077E3   mov     esp, edi
4077E5   pop     ebx                <-- first 4 bytes from 40A074
4077E6   mov     esp, ebp
4077E8   mov     cl, bl             <-- 1° of the 16 bytes moved above
4077EA   mov     ebp, esp
4077EC   mov     edi, esi
4077EE   inc     edi
4077EF   mov     esp, edi
4077F1   pop     ebx
4077F2   mov     esp, ebp
4077F4   mov     bh, bl             <-- 2° of the 16 bytes moved above
407804   xor     bl, 8Ah            <-- 2° of the 16 bytes moved above xored with 0x8A

These are the instructions of the first part. Basically, it’s a simple block of code that moves 16 bytes from one buffer into another. The destination buffer is always the same, while the source is not: the value of edi @4077AC changes at every iteration, increased each time by a value between 1 and 16.
Why 16? What’s the meaning of this value? Keep that number in mind.

Part #2: the core
This is the central part of the code I’m trying to understand, and it’s huge because it contains a lot of instructions. Looking at the code you’ll see many similar snippets repeated over and over: that is the peculiarity of the protection used by the authors. For a trained eye it’s easy to understand what’s going on, but for a novice it could be really hard. Instead of telling you right away what’s behind that obscure part, I prefer to show you some of the blocks:

1° block:
40788D   xor     cl, al
40788F   pushf
407890   pop     edx
407891   and     dl, 40h
407894   push    ebx
407895   pop     ecx
407896   push    ecx
407897   xor     cl, 0A5h
40789A   pushf
40789B   add     al, 4
40789D   xor     al, 18h
40789F   pop     edx
4078A0   pop     ebx
4078A1   and     edx, 40h
4078A4   jnz     40A834

2° block:
4078AA   xor     cl, al
4078AC   pushf
4078AD   pop     edx
4078AE   and     dl, 40h
4078B1   push    ebx
4078B2   pop     ecx
4078B3   push    ecx
4078B4   xor     cl, 0B5h
4078B7   pushf
4078B8   add     al, 4
4078BA   xor     al, 18h
4078BC   pop     edx
4078BD   pop     ebx
4078BE   and     edx, 40h
4078C1   jnz     40A87A

3° block:
4078C7   xor     cl, al
4078C9   pushf
4078CA   pop     edx
4078CB   and     dl, 40h
4078CE   push    ebx
4078CF   pop     ecx
4078D0   push    ecx
4078D1   xor     cl, 95h
4078D4   pushf
4078D5   add     al, 4
4078D7   xor     al, 18h
4078D9   pop     edx
4078DA   pop     ebx
4078DB   and     edx, 40h
4078DE   jnz     40A89D

What’s the difference between these 3 snippets? As you can see there are only two minor differences: the “xor cl, xx” and the target of the conditional jump at the end. The value of cl has been set inside the initialization part and it represents the first of the 16 bytes copied to 40A074_buffer xored with 0x8A. So, to understand this part of the code you can think of a big sequence of IF statements where ‘cl’ is compared with a lot of values and every single value has its own code flow:

if (cl == 0xA5)
    execute_snippet_for_A5;
else if (cl == 0xB5)
    execute_snippet_for_B5;
else if (cl == 0x95)
    execute_snippet_for_95

The sequence of bytes from 40CADE makes sense now: the first of the 16 bytes selects a specific flow for the algo. Now that I know that, I’ll try to understand what’s behind every single code snippet by showing you 2 more pieces of code:

if (cl == 0xE5):
40A8C0   mov     eax, [edi]         <--
40A8C2   inc     eax                <--
40A8C3   mov     eax, [eax]         <--
40A8C5   mov     edx, [edi-24h]     <-- preparation
40A8C8   mov     ebx, [edi-4]       <--
40A8CB   push    ebx                <--
40A8CC   popf                       <--
40A8CD   mov     edx, eax           <------------ execution
40A8CF   pushf                      <--
40A8D0   pop     eax                <--
40A8D1   mov     [edi-4], eax       <--
40A8D4   mov     [edi-24h], edx     <-- conclusion
40A8D7   mov     eax, [edi]         <--
40A8D9   add     eax, 5             <--
40A8DC   mov     [edi], eax         <--
40A8DE   jmp     loc_40A0CD         <--

if (cl == 0xB7):
40B401   mov     edx, [edi-24h]     <--
40B404   mov     ebx, [edi-14h]     <--
40B407   mov     eax, [edi-4]       <-- preparation
40B40A   push    eax                <--
40B40B   popf                       <--
40B40C   mov     edx, ebx           <----------- execution
40B40E   pushf                      <--
40B40F   pop     eax                <--
40B410   mov     [edi-4], eax       <--
40B413   mov     [edi-24h], edx     <--
40B416   mov     eax, [edi]         <-- conclusion
40B418   add     eax, 2             <--
40B41B   mov     [edi], eax         <--
40B41D   jmp     loc_40A0CD         <--

This is more or less the scheme of every defined snippet: “popf” marks the end of the preparation part of the snippet and “pushf” the beginning of the conclusion part. As you can see, almost all the operations are done using values taken from “edi-xx”; a closer look at some more snippets reveals that the program uses [edi], [edi-4], [edi-8], [edi-0C], [edi-10], [edi-14], [edi-18], [edi-1C], [edi-20] and [edi-24].
At first glance I can say that:
[edi]: it’s always increased at the end of each block. It’s sometimes read at the beginning of a block and used to retrieve one or more bytes from the buffer at 40CADE
[edi-4]: this value is updated with the EFLAGS register obtained from the *execution* instruction
[edi-8]..[edi-24]: values used inside the snippet

Before the next part I have to add something more about the bytes inside the buffer at 40CADE. I said that a single byte from the buffer is compared with a fixed value in a sort of IF sequence, but that was not entirely correct, because some values are not covered by the various IF statements. The second snippet above (the one with cl == 0xB7) is a perfect example; you reach 40B401 from here:

40A4A2   push    ebx
40A4A3   xor     bh, 16h
40A4A6   pop     ebx
40A4A7   jz      40B401

As you can see, this piece of code is really different from the 3 blocks I presented before.
Every respectable IF has an ELSE. So, think of a big sequence of IF statements followed by an ELSE; the ELSE starts here:

408041   xor     cl, al             ; 1° byte has not been found in the previous checks
408043   pushf
408044   pop     edx
408045   and     dl, 40h
408048   push    ebx
408049   mov     ebx, [edi]         ; ebx = [edi]
40804B   add     ebx, 1200h
408051   mov     bl, [ebx]          ; ebx points to a buffer of 00 and 01 values (it starts from 40DCDE)
408053   cmp     bl, 1
408056   jnz     loc_408145

Inside the ELSE branch the flow of the program also depends on the value read from this 00_01 buffer. If the value inside bl is 01, a new sequence of IF statements is used. This time the checks are not like the previous ones, because there’s a double check over the first and sometimes the second of the 16 bytes moved above. A valid check identifies a block of code to execute.
The 00 value is another story: the code starting from 408145 is essentially used to patch at most 16 bytes between 40A074 and 40A083 at runtime.
The 1° byte is taken from 407608+index where index is the 1° of the 16 bytes; all the following bytes are simply copied. The result of the preparation/execution/conclusion scheme is the next piece of code, where “xchg eax, ebx” is the instruction created:

40A06A   MOV ESP,DWORD PTR DS:[EDI-8]
40A06D   PUSH DWORD PTR DS:[EDI-4]
40A070   POPFD
40A071   MOV EDI,DWORD PTR DS:[EDI-10]   <-- preparation ends here
40A074   XCHG EAX,EBX                    <-- instruction created at runtime (xchg is only 1 byte opcode)
40A075   JMP SHORT 0040A083              <-- to avoid nonsense instructions a jmp to the end is necessary
40A077   PUSH ECX
40A078   POP EDX
40A079   INC EDX
40A07A   PUSH ESI
40A07B   POP EDI
40A07C   INC EDI
40A07D   POP ESI
40A07E   POP EBX
40A07F   POP EDX
40A080   POP EAX
40A081   LEAVE
40A082   RETN
40A083   PUSH EDI                        <-- conclusion starts here
40A084   PUSHFD
40A085   SUB ESP,800

OK, what have I discovered so far? There are different situations: according to the first two bytes, the program executes either a static block of code or a snippet of code with a particular instruction created at runtime.
It’s not so easy to explain the system with words and pieces of code; I suggest you step through everything a few times and I’m sure you’ll get the scheme:

if ((first_byte ^ 0x8A) == 0xA5)
    execute_snippet_for_A5;
else if ((first_byte ^ 0x8A) == 0xB5)
    execute_snippet_for_B5;
...
else if ((first_byte ^ 0x8A) == 0x95)
    execute_snippet_for_95
else if (byte_from_buffer_40DCDE == 0)
    runtime_patch
else
    // another sequence of IF with a check over the first and sometimes the second byte of the 40A074 buffer
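
The scheme above can also be sketched in C as a small classifier that only says which of the three paths is taken. This is an illustration under assumptions, not code from the crackme: `vm_path`, `select_path` and `known_keys` are hypothetical names, and `known_keys` holds only the few compare values quoted in this tutorial, while the real IF chain is much longer.

```c
#include <stddef.h>
#include <stdint.h>

/* The three outcomes described in the text (names are hypothetical). */
typedef enum {
    PATH_STATIC_HANDLER,   /* a "xor cl, xx" check in the first IF chain matched */
    PATH_RUNTIME_PATCH,    /* byte from the 00_01 buffer is not 1 */
    PATH_SECOND_IF_CHAIN   /* byte from the 00_01 buffer is 1 */
} vm_path;

/* Only the values quoted in the tutorial; the real list is longer. */
static const uint8_t known_keys[] = { 0xA5, 0xB5, 0x95, 0xE5 };

static vm_path select_path(uint8_t first_byte, uint8_t flag_byte)
{
    /* The dispatcher compares (first_byte ^ 0x8A) with each fixed value. */
    uint8_t key = (uint8_t)(first_byte ^ 0x8A);

    for (size_t i = 0; i < sizeof known_keys; i++) {
        if (key == known_keys[i])
            return PATH_STATIC_HANDLER;
    }
    /* ELSE branch: "cmp bl, 1 / jnz 408145" sends everything but 1 to the patcher. */
    return (flag_byte == 1) ? PATH_SECOND_IF_CHAIN : PATH_RUNTIME_PATCH;
}
```

For instance, `select_path(0x1E, 0)` falls through the first chain and returns `PATH_RUNTIME_PATCH`, which is the situation of the “1E A1 30 00 00 00” sequence studied later.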

Part #3: finalization
40A0CD   mov     ebx, eax           <-- ebx points to a byte from 40CADE buffer
40A0CF   sub     ecx, 10h
40A0D2   sub     edx, 14h
40A0D5   add     esp, 400h
40A0DB   sub     esi, 18h
40A0DE   add     esp, 400h
40A0E4   sub     edi, 1Ch
40A0E7   jmp     loc_40779A

This is the last part of the cycle, it’s only a series of instructions used to update some specific values. EBX register contains the same value updated before inside [edi].

#1,#2, #3: reveal the secret
So, we have collected some important information and I think we are ready to understand what’s behind this part of the challenge:

– 16 bytes are moved from 40CADE+current_offset to a new buffer
– the first byte of the new buffer is used to select a block of code and then the block will be executed
– in some cases the first two bytes from the new buffer are used to select a block of code to execute
– in some cases a value from 40DCDE buffer forces the execution of an instruction created at runtime
– one of the variables used inside the block of code is always increased at the end of the block
– one of the variables used inside the block of code is updated with the EFLAGS value returned by the most important instructions of the block
– the block of code modifies one or more variables (I’m referring to [edi-8]..[edi-24])

What’s the object that matches the seven points listed above? The entire code is an implementation of a virtual machine which is very similar to a real machine:
– 16 bytes is the maximum length in bytes of a single instruction
– instructions are identified by a one-byte or a two-byte opcode
– it has the same registers as a real machine, here is the correspondence:
Virtual machine (VM)    Real machine (RM)
[edi]                   EIP
[edi-4]                 EFLAGS
[edi-8]                 ESP
[edi-0C]                EBP
[edi-10]                EDI
[edi-14]                ESI
[edi-18]                EDX
[edi-1C]                ECX
[edi-20]                EBX
[edi-24]                EAX
EIP of the VM starts from 40CADE.
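
As a compact way to remember the layout, the table can be written down as a C struct. This is only a sketch: the name `vm_context` is hypothetical, and the fields are declared in ascending memory order, so the real edi would point at the last field (`eip`) and every other register sits at a negative offset from it.

```c
#include <stddef.h>
#include <stdint.h>

/* Virtual register file as described in the table (hypothetical name).
   Fields are in ascending memory order: [edi] is the last one. */
typedef struct {
    uint32_t eax;     /* [edi-24h] */
    uint32_t ebx;     /* [edi-20h] */
    uint32_t ecx;     /* [edi-1Ch] */
    uint32_t edx;     /* [edi-18h] */
    uint32_t esi;     /* [edi-14h] */
    uint32_t edi;     /* [edi-10h] */
    uint32_t ebp;     /* [edi-0Ch] */
    uint32_t esp;     /* [edi-8]   */
    uint32_t eflags;  /* [edi-4]   */
    uint32_t eip;     /* [edi]     */
} vm_context;

/* Distance of a register slot below the EIP slot, e.g. 0x24 for eax. */
#define VM_SLOT_BELOW_EIP(field) \
    (offsetof(vm_context, eip) - offsetof(vm_context, field))
```

With this definition `VM_SLOT_BELOW_EIP(ecx)` evaluates to 0x1C, matching the `[edi-1Ch]` accesses seen in the handlers.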

If you are still confused, here is the explanation applied to the first snippet I used above (the one with cl == 0xE5):

40A8C0   mov     eax, [edi]         <-- eip
40A8C2   inc     eax                <-- eip+1
40A8C3   mov     eax, [eax]         <-- take the dword pointed by eip+1
40A8C5   mov     edx, [edi-24h]     <-- eax
40A8C8   mov     ebx, [edi-4]       <-- EFLAGS
40A8CB   push    ebx                <-- push EFLAGS
40A8CC   popf                       <-- real machine has EFLAGS of the virtual machine
40A8CD   mov     edx, eax           <-- move the dword obtained at 40A8C3
40A8CF   pushf                      <-- push the current EFLAGS value (obtained by the mov at 40A8CD)
40A8D0   pop     eax                <-- get the value of EFLAGS
40A8D1   mov     [edi-4], eax       <-- store EFLAGS inside EFLAGS register of the virtual machine
40A8D4   mov     [edi-24h], edx     <-- move the dword taken at 40A8C3 inside EAX register of the virtual machine
40A8D7   mov     eax, [edi]         <-- eip
40A8D9   add     eax, 5             <-- calculate the new EIP value adding the length of the current instruction: 5
40A8DC   mov     [edi], eax         <-- update the EIP value of the virtual machine

A 32 bit value is taken from eip+1 and it’s moved inside EAX register of the VM, it’s easy to understand that this snippet is an implementation of a real “mov eax, val32” instruction. Take a look at how EFLAGS are updated, there is a switch between real and virtual EFLAGS values before and after the execution of the main instruction of the snippet. In the end EIP is updated adding the size of this instruction which is 5 (1 byte opcode plus 4 bytes used to define val32).
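
The same handler can be modelled in a few lines of C. This is a simplified sketch with assumed names (`vm_state`, `vm_mov_eax_imm32`): EIP is treated as an offset into a bytecode array rather than a real address, and the EFLAGS round trip is left out because a mov does not change any flag.

```c
#include <stdint.h>

/* Minimal slice of the VM state needed by this handler (hypothetical names). */
typedef struct {
    uint32_t eax;
    uint32_t eip;   /* here: an offset into the bytecode, not a real address */
} vm_state;

/* Models the 40A8C0 handler: EAX = dword at EIP+1, then EIP += 5. */
static void vm_mov_eax_imm32(vm_state *vm, const uint8_t *code)
{
    const uint8_t *imm = code + vm->eip + 1;   /* skip the 1-byte opcode */

    /* Little-endian dword, as "mov eax, [eax]" would read it on x86. */
    vm->eax = (uint32_t)imm[0]
            | ((uint32_t)imm[1] << 8)
            | ((uint32_t)imm[2] << 16)
            | ((uint32_t)imm[3] << 24);

    vm->eip += 5;                              /* opcode + 4 bytes of val32 */
}
```

Feeding it the bytes `6F 78 56 34 12` (0x6F is 0xE5 ^ 0x8A, the byte that would select this handler) leaves 0x12345678 in the virtual EAX and advances the virtual EIP by 5.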

Now that you know the hardware of the VM, you should have no problem understanding all the possible instructions used inside the real protection algorithm. To help you in the identification process I’m going to explain some more cases.
If you use IDA and you want to locate a specific snippet, you can use this simple idc script:

#include <idc.idc>

static main()
{
   auto vmOpcode;
   auto xorBytes = "80 F1";            // encoding of 'xor cl, imm8'
   auto startAddress = 0x0040780A;
   auto xorAddress;
   auto xorVal;

   vmOpcode = AskAddr("", "Insert the VM opcode");
   vmOpcode = vmOpcode ^ 0x8A;

   // Look for the right 'xor cl, xx' instruction
   while (startAddress < 0x408191)
   {
      xorAddress = FindBinary(startAddress, SEARCH_DOWN, xorBytes);
      if (xorAddress == BADADDR)       // pattern not found: stop searching
         break;
      xorVal = GetOperandValue(xorAddress, 1);
      if (xorVal == vmOpcode)
      {
         Jump(xorAddress);
         return -1;
      }
      startAddress = xorAddress + 3;   // skip the 3 bytes of 'xor cl, xx'
   }
   Message("\nOpcode not defined...");
}

1 byte opcode defined instruction
Opcode sequence to study: “96 xx”

40CA9A   mov     ecx, [edi-1Ch]     ; ECX from VM
40CA9D   dec     cx                 ; dec CX
40CA9F   mov     [edi-1Ch], ecx     ; update ECX value of VM
40CAA2   cmp     cx, 0              ; is it 0?
40CAA6   jz      short loc_40CAD2
40CAA8   mov     esi, [edi]         ; EIP
40CAAA   xor     ebx, ebx
40CAAC   mov     bl, [esi+1]        ; byte after the opcode of the current instruction
40CAAF   cmp     bl, 80h            ; byte(EIP+1) is positive or negative?
40CAB2   jb      short loc_40CAC4
40CAB4   mov     eax, [edi]
40CAB6   add     eax, ebx           ; calculate EIP + byte(EIP+1)
40CAB8   sub     eax, 0FEh          ; 0FEh = 100h - 2: sign-extend the negative offset and add 2, the size in bytes of the current VM instruction
40CABD   mov     [edi], eax         ; update EIP
40CABF   jmp     loc_40A0CD
40CAC4   mov     eax, [edi]
40CAC6   add     eax, ebx           ; calculate EIP + byte(EIP+1)
40CAC8   add     eax, 2             ; +2 is the size in bytes of the current VM instruction
40CACB   mov     [edi], eax         ; update EIP
40CACD   jmp     loc_40A0CD
40CAD2   mov     eax, [edi]         ; EIP
40CAD4   add     eax, 2             ; calculate the new EIP
40CAD7   mov     [edi], eax         ; update EIP
40CAD9   jmp     loc_40A0CD

It’s not as direct as “mov eax, val32”, but it’s easy to understand. The snippet defines a “loop rel8” instruction: CX is the counter and it is decremented first; if CX becomes zero, EIP is updated with the address of the instruction after this one, otherwise EIP is replaced with the address of the first instruction of the loop (offsets of -128..+127 are allowed).
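
The two branches of the handler (the `sub eax, 0FEh` one for negative offsets and the `add eax, 2` one for positive offsets) compute the same thing: EIP + 2 + sign-extended rel8. A minimal C sketch, where `vm_loop_rel8` is a hypothetical name and the virtual EIP and ECX are passed in explicitly:

```c
#include <stdint.h>

/* Returns the new virtual EIP after a "loop rel8" located at eip.
   Only CX (the low 16 bits of the virtual ECX) is decremented. */
static uint32_t vm_loop_rel8(uint32_t eip, uint32_t *ecx, uint8_t rel8)
{
    uint16_t cx = (uint16_t)((*ecx & 0xFFFFu) - 1u);   /* dec cx */
    *ecx = (*ecx & 0xFFFF0000u) | cx;                  /* upper half untouched */

    if (cx == 0)
        return eip + 2;                                /* fall through */

    /* rel8 >= 0x80 is negative: eip + rel8 - 0xFE == eip + 2 + (rel8 - 0x100) */
    int32_t disp = (rel8 < 0x80) ? (int32_t)rel8 : (int32_t)rel8 - 0x100;
    return eip + 2 + (uint32_t)disp;
}
```

With rel8 equal to 0xFE the displacement is -2, so the instruction jumps back onto itself until CX reaches zero.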

2 bytes opcode defined instruction
Opcode sequence to study: “BD 74”

40BACA   mov     edx, [edi-1Ch]     ; ECX from VM
40BACD   mov     eax, [edi-4]       ; EFLAGS
40BAD0   push    eax
40BAD1   popf                       ; load the VM EFLAGS inside real machine
40BAD2   and     edx, edx           ; and ECX, ECX
40BAD4   pushf
40BAD5   pop     eax
40BAD6   mov     [edi-4], eax       ; store new EFLAGS
40BAD9   mov     eax, [edi]         ; EIP
40BADB   add     eax, 2             ; EIP + 2
40BADE   mov     [edi], eax         ; update EIP

“and ECX, ECX”, 2 bytes long instruction.

Runtime patched instruction
These particular instructions are generated at runtime.
Opcode sequence to study: “1E A1 30 00 00 00”. 0x1E is not defined as a 1 byte opcode and the value from the 00_01 buffer is 0x00:

408049   MOV EBX,DWORD PTR DS:[EDI]   ; [edi] = 40CB23
40804B   ADD EBX,1200                 ; ebx = 40DD23
408051   MOV BL,BYTE PTR DS:[EBX]     ; bl = 0x00
408053   CMP BL,1
408056   JNE 00408145                 ; jump!

OK, now I’m sure that the current instruction of the VM is generated at runtime. As I announced before, only the first byte of the opcode sequence is changed; all the others remain untouched. The first byte is the index of the value 0x1E inside buffer_407608:
buffer_407608[0x64] = 0x1E
64 is the first byte and the generated instruction will be:

64 A1 30 00 00 00 MOV EAX,DWORD PTR FS:[30]
And now?
Now there’s something more to say, because this VM is not the only one used inside the program: the challenge uses some more VMs inside the protection algorithm. Don’t be scared, this time it’s really easy to understand them because at the hardware level they are like the one I presented to you. The difference lies in the VM’s byte sequence and in the base address of the machine’s code. Every new VM is called from the one I presented to you using a “call” instruction. To recognize a new VM you simply have to catch the 0x36 opcode byte; you’ll see that the VM_call is used many times inside the algo.

As a sample look at this decoded snippet to see how the VM_401ADE is called:

40CB16   pop     eax
40CB17   call    401ADE
40CB1C   push    ebp
---
401ADE   push    41F730h            ; Address for VM_401ADE definition
401AE3   retn

If you go to 41F730 and you follow the way of thinking used for the previous VM you’ll be able to understand this new one in few minutes.
You can apply the same method for all the other defined virtual machines.

How to approach the real protection algorithm
Now that I have a good background on the protection mechanism, I need a strategy. You can of course step through the entire code line by line; you’ll surely understand everything about the protection routine, but it takes a lot of time. You can’t predict how long the real protection routine is, and the “fs:[30]” should discourage you. Anyway, if you want to go that way you will have to:
1. use some smart bpx (conditional ones above all)
2. patch the VM byte sequence in some places if you want to avoid the anti-debug checks (yes, the code has some anti-debug tricks inside too!).

I did almost all the work directly from a dead listing of the real protection routine. I used an idc script to decode the VM, and I have to admit it’s not so hard to understand what’s going on by simply reading the sequence of instructions. Despite that, with an extra effort I think you can make the protection routine fully debuggable. The idea is to decode every single VM instruction and put them all inside a piece of unused code. Once you have all the decoded instructions you can step through them! I tried this strategy for a few instructions only, when I didn’t understand some obscure parts from the dead listing approach. Above all you have to pay attention to two things:
1. look at the length in bytes of each VM instruction, sometimes it’s not the same as that of the identical real instruction (i.e. VM bytes: 40 01)
2. absolute and conditional jumps need to take into account the extra bytes of the previous/next instructions

You can find the idc script I used to decode the virtual machine inside the attachment. It doesn’t cover all the VM opcodes because I added only the opcodes I needed; you can complete it if you want :)

Antidebug
Try to run the idc script: it produces a huge list of instructions (keep in mind that there are even more, because the instructions inside the calls are not listed). To slow down static analysis the code was filled with junk; you’ll surely find useless jump instructions, loops and even entire blocks of instructions placed there to hinder your analysis. Here are some examples:

40CAE2:   mov     ecx, 0x00000100   <-- ecx = 0x100
40CAE7:   mov     edi, ebx
40CAE9:   sub     ecx, 1
40CAEA:   jp      40CAEC            <-- useless jmp instruction
40CAEC:   je      40CAEE            <-- useless jmp instruction
40CAEE:   jo      40CAF0            <-- useless jmp instruction
40CAF0:   loop    40CAE7            <-- useless loop

What’s the sense of this piece of code? It has no real purpose, and I think it was added to discourage single-step debugging sessions: would you step a 0x100 iteration loop of a VM line by line? I don’t think so.
Now, take a look at this block:

40CAF4:   push    eax
40CAF5:   push    ebx
40CAF6:   push    ecx
40CAF7:   push    edx
40CAF8:   push    esi
40CAF9:   push    edi
40CAFA:   mov     eax, 0x00000074
40CAFF:   mov     ebx, ecx
40CB01:   push    0x78657464
40CB06:   pop     esi
40CB07:   jmp     40CB0B
40CB09:   mov     eax, 0
40CB0B:   mov     al, 0
40CB0D:   push    edi
40CB0E:   xchg    eax, ebx
40CB0F:   pop     ecx
40CB10:   sub     ecx, 1
40CB11:   pop     edi
40CB12:   pop     esi
40CB13:   pop     edx
40CB14:   pop     ecx
40CB15:   pop     ebx
40CB16:   pop     eax

Well, as you can see all the modifications applied inside 40CAFA/40CB10 are nullified by the push/pop instructions.
In addition to the junk code pay attention to the “call 401ADE”, it’s the antidebug VM and it’s called a lot of times. Inside this VM you’ll find:

– GetTickCount: a classical check over the time passed between the execution of two distinct instructions, if the gap is above a specific fixed value it means the user is currently debugging the challenge.

– bpx check over some functions: first of all the challenge takes the name of all the exported functions from ntdll, and then it applies a checksum algo trying to identify some specific Zw* functions, the checksum algo is:

424C3C   MOVZX EBX,BYTE PTR DS:[EAX]   <-- current char of the export name
424C3F   ROL EBX,7
424C42   ADD ESI,EBX                   <-- esi is the current checksum (it starts from value 0)
424C44   ROL ESI,7
424C47   XOR ESI,EBX
424C49   ADD EAX, 1                    <-- move to the next char of the export name
424C4A   CMP BYTE PTR DS:[EAX],0       <-- check to see if it's the end of the export name
424C4D   JZ 424C51                     <-- jump if scan is over
424C4F   JMP 424C3C                    <-- jump up and check the next char
424C51   MOV EAX, ESI
424C53   MUL ESI
424C55   ADD EAX,EDX
424C57   CMP EAX,318A50B7              <-- check the calculated checksum
424C5C   JZ 00424D8E

The checksum value is compared with some fixed values, here is the list:
0x318A50B7: ZwSetInformationThread
0xE27847F7: ZwFreeVirtualMemory
0x95AAF2E1: ZwDelayExecution
0x8AFA4D6D: ZwQueryInformationProcess
0xD2950638: ZwGetContextThread
0x25F2995D: ZwQueryVirtualMemory
0x217A4264: ZwAllocateVirtualMemory
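
The checksum listing shown before the table translates almost line by line into C. The sketch below follows the instructions as listed; `rol32` and `export_checksum` are hypothetical names, and the name is assumed to be non-empty because the loop processes one character before testing for the terminator. I have not re-computed the table values with it.

```c
#include <stdint.h>

/* 32-bit rotate left, n in 1..31. */
static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32u - n));
}

/* Checksum of an export name, transcribed from the 424C3C..424C55 listing. */
static uint32_t export_checksum(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t esi = 0;

    do {
        uint32_t ebx = rol32(*p, 7);   /* movzx ebx, byte [eax] / rol ebx, 7 */
        esi += ebx;                    /* add esi, ebx */
        esi = rol32(esi, 7);           /* rol esi, 7   */
        esi ^= ebx;                    /* xor esi, ebx */
        p++;                           /* add eax, 1   */
    } while (*p != 0);                 /* cmp byte [eax], 0 */

    /* mul esi leaves the 64-bit product in edx:eax; add eax, edx folds it. */
    uint64_t product = (uint64_t)esi * esi;
    return (uint32_t)product + (uint32_t)(product >> 32);
}
```

A lookup would then hash every export name of ntdll with `export_checksum` and compare the result against the constants listed in the table.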

When a function is found the challenge performs a check over the first byte of the function. This is the snippet used to check the correctness of ZwSetInformationThread:

424C62   MOV EAX,DWORD PTR SS:[EBP-44]   <-- address of the 1° byte in memory of the function
424C65   MOVZX EBX,BYTE PTR DS:[EAX]     <-- take the first byte, 0xB8
424C68   ADD BL,10                       <-- BL = 0xB8 + 0x10 = 0xC8
424C6B   CMP BL,0C8                      <-- 1° byte check!
424C6E   jz 424CA2                       <-- jump if the byte is ok

The check appears to be a bpx check over the function. This kind of check is performed over the other functions too (the code is slightly modified but the sense is the same).

– ThreadHideFromDebugger antidebug: it uses ZwSetInformationThread with THREAD_INFORMATION_CLASS equal to ThreadHideFromDebugger. After the execution (via SYSENTER) of this function it’s impossible to debug the challenge.

– Check over the parameters passed to some Zw functions: this check is performed on ZwSetInformationThread and ZwQueryInformationProcess and the scheme is the same, a series of cmp-jnz for every single parameter passed to the function. Here is what happens with ZwSetInformationThread:

424D32   CMP DWORD PTR SS:[ESP],0        <-- 0 has been pushed at 424CF0
424D36   JNZ 424CFE                      <-- jump if the parameter is not the expected value
424D38   CMP DWORD PTR SS:[ESP+4],-2     <-- 0xFFFFFFFE pushed at 424CE6
424D3D   JNZ 424CFE
424D3F   CMP DWORD PTR SS:[ESP+8],11     <-- 0x11 (ThreadHideFromDebugger) pushed at 424CE4
424D44   JNZ 424CFE
424D46   CMP DWORD PTR SS:[ESP+0C],0     <-- 0 pushed at 424CE2
424D4B   JNZ 424CFE
424D4D   CMP DWORD PTR SS:[ESP+10],0     <-- 0 pushed at 424CE0
424D52   JNZ 424CFE

As you can see if there’s an unexpected value the conditional jump will bring you to 424CFE which is the part of the code called when the debugger has been caught!

– ProcessDebugPort check: another check used to reveal an active debugger over the challenge.

424F2E   PUSH 0x0000000        <-- ReturnLength
424F30   PUSH 0x0000004        <-- ProcessInformationLength
424F32   LEA EAX, [EBP-74h]
424F35   PUSH EAX              <-- ProcessInformation
424F36   PUSH 0x0000007        <-- ProcessDebugPort
424F38   PUSH 0xFFFFFFFF       <-- ProcessHandle
424F3A   PUSH 0x0000000
...
424F4A   SYSENTER              <-- ZwQueryInformationProcess
424F4C   AND EAX, EAX          <-- NTSTATUS success or error code?
424F4E   JE 424F82
...
424FDA   AND EAX, EAX          <-- eax = port number of the debugger for the process
424FDC   JE 425010             <-- 0 if you are not debugging the challenge

– NtDelayExecution trick: this function seems to be used against automated scanning systems, because they have a time-out for the scanning task. I can’t prove whether this trick works or not…

– Debug registers check: it uses NtGetContextThread to retrieve the CONTEXT information, then it checks the content of some specific dr registers.

425469   LEA EAX,[EBP-200]             <-- eax = CONTEXT structure
42546F   ADD EAX,4
425472   MOV EAX,DWORD PTR DS:[EAX]    <-- eax = _CONTEXT.Dr0
425474   AND EAX, EAX                  <-- Dr0 = 0 means no bpx
425476   JZ 4254A8
...
42552A   LEA EAX, [EBP-200h]
425530   ADD EAX,8
425533   MOV EAX,DWORD PTR DS:[EAX]    <-- eax = _CONTEXT.Dr1
425535   AND EAX, EAX
425537   JZ 425569
...
425879   LEA EAX,[EBP-200]
42587F   ADD EAX,0C
425882   MOV EAX,DWORD PTR DS:[EAX]    <-- eax = _CONTEXT.Dr2
425884   AND EAX, EAX
425886   JZ 4258BA
...
425C1E   LEA EAX,[EBP-200]
425C24   ADD EAX,10
425C27   MOV EAX,DWORD PTR DS:[EAX]    <-- eax = _CONTEXT.Dr3
425C29   AND EAX, EAX
425C2B   JZ 425C5F

– Sections protect value: the challenge uses ZwQueryVirtualMemory to check the _MEMORY_BASIC_INFORMATION.Protect values. It checks 3 sections (.text, .rdata and .data) against the associated values (PAGE_EXECUTE_READWRITE (0x40), PAGE_READONLY (0x02) and PAGE_EXECUTE_WRITECOPY (0x80)).
ZwAllocateVirtualMemory and ZwFreeVirtualMemory are used to allocate/free memory.

I hope I haven’t forgotten anything… Anyway, among all these antidebug tricks there’s something you should have noticed, because it occurs often among the VM instructions: I’m talking about the operations involving the dword values stored at 4065F3, 4065F7, 4065FB and 4065FF. For the moment I won’t say anything else, but look back at the code and try to understand something more about these values; you’ll find out that the antidebug tricks are somehow linked to them, i.e.:

425C27   MOV EAX,DWORD PTR DS:[EAX]    <-- eax = _CONTEXT.Dr3
425C29   AND EAX, EAX                  <-- check over Dr3
425C2B   JZ 425C5F
...
425C5F   ADD DWORD PTR DS:[4065F3],EAX <-- eax is the Dr3 value
425C65   INC DWORD PTR DS:[4065F3]
425C6B   ADD DWORD PTR DS:[4065F7],EAX
425C71   INC DWORD PTR DS:[4065F7]
425C77   ADD DWORD PTR DS:[4065FB],EAX
425C7D   INC DWORD PTR DS:[4065FB]
425C83   ADD DWORD PTR DS:[4065FF],EAX
425C89   INC DWORD PTR DS:[4065FF]

More about these obscure dwords later!

Final algorithm
There’s only one more thing to do, solve the challenge.
To be registered the crackme needs a valid keyfile, to check the keyfile it uses some Zw* specific functions. The names of these functions are not visible inside the exe file, but they are obtained at runtime by the same checksum algorithm used before inside the antidebug procedure. This time the functions with the respective checksums are:

0x946CE828: ZwUnmapViewOfSection
0x5F43B254: ZwMapViewOfSection
0xA7AFD948: ZwCreateSection
0x848955AC: ZwCreateFile
0x67F17733: ZwClose

Now the interesting parts (I removed useless instructions):

40CC73   PUSH 406320                <-- pointer to the name of the keyfile: "g"
40CC78   PUSH 00020002
40CC92   PUSH DWORD PTR DS:[EAX+2C] <-- 0000000C, CurrentDirectoryHandle
40CC95   PUSH 00000018
40CCA0   PUSH 01                    <-- CreateDisposition: FILE_OPEN
40CCA2   PUSH 0
40CCA4   PUSH 00000080
40CCA9   PUSH 0
40CCAB   LEA EAX,[ESP+6C]
40CCAF   PUSH EAX
40CCB0   LEA EAX,[ESP+2C]
40CCB4   PUSH EAX                   <-- ObjectAttributes.ObjectName.Buffer = 406320
40CCB5   PUSH 80100080
40CCBA   LEA EAX,[ESP+90]
40CCC1   PUSH EAX
...
40CCCF   LEA EAX,[EBP-28]           <-- [ebp-28] identifies ZwCreateFile
40CCD2   MOV EAX,DWORD PTR DS:[EAX] <-- eax = 00000025, ZwCreateFile
40CCD4   MOV EDX, ESP
40CCD6   SYSENTER                   <-- ZwCreateFile

The keyfile’s name is stored inside the crackme, but you won’t be able to spot it in the string list because it’s only 1 byte long. The file needs to be in the same directory as the crackme.
To get the keyfile’s content the challenge calls both ZwCreateSection and ZwMapViewOfSection, quite an unusual approach for a keyfile protection! Once it has the content of the file, it performs the real and final algorithm:

40CD7C   MOV EAX,DWORD PTR DS:[EBX]   <-- eax points to the content of the keyfile
40CD7E   ADD ESP,4C
40CD81   CMP EAX,0                    <-- check to see if it's an empty keyfile or not
40CD84   JE 40D3BD                    <-- Jump if empty

The conditional jump leads to a new virtual machine used to display the error message. The text is encrypted, so you won’t find the message in the string list. After filling the keyfile with some bytes you’ll face the next check:

...                      ; Here I have in eax, ebx, ecx and edx the 1°, 2°, 3° and 4° dword of the keyfile
40CF8F   CMP AL,DL       <-- is the 1° byte of the serial 0x00?
40CF93   JZ 0040D3BD
40CF9B   SHR EAX, 0x08   <-- take the 2° byte
40CFA2   CMP AL,DL       <-- is the 2° byte of the serial 0x00?
40CFA4   JE 40D3BD
40CFAC   SHR EAX, 0x08   <-- take the 3° byte
40CFB3   CMP AL,DL       <-- is the 3° byte of the serial 0x00?
40CFB9   JE 40D3BD
40CFBF   SHR EAX, 0x08   <-- take the 4° byte
40CFC4   CMP AL,DL       <-- is the 4° byte of the serial 0x00?
40CFCC   JE 40D3BD

This piece of code is repeated in a similar way for the values stored inside ebx, ecx and edx (the rest of the keyfile’s content), and it’s just a check over each single byte. Four dwords are checked this way, so the keyfile must be 16 bytes long and none of its bytes can be 0x00.
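
In C the whole sequence of byte checks boils down to a single loop. This is a sketch with a hypothetical function name; it assumes, as described above, that every compare is against 0x00.

```c
#include <stdint.h>

/* Returns 1 if none of the 16 keyfile bytes is 0x00, otherwise 0.
   A zero byte corresponds to the jump to 40D3BD (error message). */
static int keyfile_bytes_nonzero(const uint8_t key[16])
{
    for (int i = 0; i < 16; i++) {
        if (key[i] == 0x00)
            return 0;
    }
    return 1;
}
```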

40D103   POP EDX            <-- k4: 4° dword of the keyfile
40D108   POP ECX            <-- k3: 3° dword of the keyfile
40D10D   POP EBX            <-- k2: 2° dword of the keyfile
40D110   POP EAX            <-- k1: 1° dword of the keyfile
40D127   XOR EAX, EBX       <-- k1 ^ k2
40D147   XOR EAX, ECX       <-- (k1 ^ k2) ^ k3
40D175   XOR EAX, EDX       <-- ((k1 ^ k2) ^ k3) ^ k4
40D198   PUSH 004065f3
40D19F   POP EDI            <-- edi = 4065F3
40D1A4   MOV EBP, ESP       <-- change stack
40D1AA   MOV ESP, EDI
40D1AE   POP EBX            <-- ebx = [4065F3]
40D1B1   MOV ESP, EBP       <-- restore the stack
40D1D6   XOR EAX, EBX       <-- (((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]
40D1F6   ADD EDI, 1         <-- edi = 4065F4
40D1F9   ADD EDI, 1         <-- edi = 4065F5
40D1FC   ADD EDI, 1         <-- edi = 4065F6
40D1FF   ADD EDI, 1         <-- edi = 4065F7
40D204   MOV EBP, ESP
40D208   MOV ESP, EDI       <-- edi = 4065F7
40D20C   POP ECX            <-- ecx = [4065F7]
40D211   MOV ESP, EBP       <-- restore the stack
40D21B   MOV EBX, EDI       <-- ebx = 4065F7
40D234   XOR EAX, ECX       <-- ((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]
40D25D   ADD EDI, 1         <-- edi = 4065F8
40D260   ADD EDI, 1         <-- edi = 4065F9
40D265   ADD EDI, 1         <-- edi = 4065FA
40D268   ADD EDI, 1         <-- edi = 4065FB
40D26D   MOV EBP, ESP
40D273   MOV ESP, EDI       <-- edi = 4065FB
40D277   POP EDX            <-- edx = [4065FB]
40D27A   MOV ESP, EBP       <-- restore the stack
40D29C   XOR EAX, EDX       <-- (((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]) ^ [4065FB]
40D2BF   INC EDI            <-- edi = 4065FC
40D2C2   INC EDI            <-- edi = 4065FD
40D2C5   INC EDI            <-- edi = 4065FE
40D2CA   INC EDI            <-- edi = 4065FF
40D2CB   MOV EBP, ESP
40D2CF   MOV ESP, EDI
40D2D3   POP EBX            <-- ebx = [4065FF]
40D2D6   MOV ESP, EBP
40D300   XOR EAX, EDX       <-- ((((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]) ^ [4065FB]) ^ [4065FF]
40D31D   SUB EAX,4E1A9001   <-- (((((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]) ^ [4065FB]) ^ [4065FF]) - 0x4E1A9001
40D347   PUSH EAX           <-- push the obtained value
40D35F   PUSH EAX           <-- parameter: 0xA0
40D360   PUSH 406663        <-- parameter: 0x406663
40D365   CALL 401000        <-- A new VM: it calculates a value using the 3 parameters passed to the virtual machine
40D36A   PUSH EAX
40D38A   PUSH A640740E
40D3A7   POP EAX
40D3B0   POP EBX            <-- value obtained from VM at 40D365
40D3B7   SUB EAX, EBX
40D3BB   JE 40D3C4
40D3BF   CALL 401A5E        <-- Show error message box

40D3C4:  ...
40D3EC   POP EAX
40D411   JMP EAX            <-- jmp and show congratulation box
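
The arithmetic performed between 40D103 and 40D31D is small enough to be written as one C expression. The sketch below follows the comments of the listing (all four obscure dwords are xored in); `combine_keyfile` and its parameter names are hypothetical, and the four dwords read from 4065F3..4065FF are passed in because their values depend on the antidebug code.

```c
#include <stdint.h>

/* k[0..3]: the four dwords of the keyfile (k1..k4).
   d[0..3]: the dwords read from 4065F3, 4065F7, 4065FB and 4065FF.
   Returns the value pushed at 40D347. */
static uint32_t combine_keyfile(const uint32_t k[4], const uint32_t d[4])
{
    uint32_t v = k[0] ^ k[1] ^ k[2] ^ k[3];   /* 40D127..40D175 */
    v ^= d[0] ^ d[1] ^ d[2] ^ d[3];           /* 40D1D6..40D300 */
    return v - 0x4E1A9001u;                   /* 40D31D, wraps modulo 2^32 */
}
```

Since everything is a xor followed by a subtraction, the step is trivially invertible: for a wanted result r, the xor of the four keyfile dwords must be (r + 0x4E1A9001) ^ d[0] ^ d[1] ^ d[2] ^ d[3].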

That’s it, the final algorithm is all here. I’m not saying it’s easy to solve, but as you can see it only involves some math operations. How can I reverse everything and obtain the right 16-byte keyfile?
As always I need to start from the end: what is the address of the congratulation box?
Look @40D3BB: at the conditional jump I have to jump down, and this is possible only if the value obtained from the VM is 0xA640740E. Compared to the other VMs, this one has only a few instructions:

414ADE: push ebp
414ADF: mov ebp, esp
414AE1: push esi
414AE2: push edi
414AE3: push ebx
414AE4: push ecx
414AE5: push edx
414AE6: mov ecx, 0
414AE8: mov esi, 0x00406767
414AED: mov eax, ecx
414AEF: mov edx, 0x00000008
414AF4: test eax, 1
414AF9: ja 414B04
414AFB: shr eax, 1
414AFD: xor eax, 0EDB88320h
414B02: jmp 414B06
414B04: shr eax, 1
414B06: sub edx, 1
414B07: jne 414AF4
414B09: mov [esi], eax
414B0B: add esi, 4
414B0E: add ecx, 1
414B0F: cmp ecx, 100h
414B15: jne 414AED
414B17: push dword ptr [ebp+0Ch]
414B1A: push dword ptr [ebp+8]
414B1D: mov esi, 0x00406767
414B22: mov edi, 0xFFFFFFFF
414B27: mov ecx, 0
414B29: mov eax, [esp]
414B2C: movzx eax, byte ptr [ecx+eax]
414B30: mov edx, edi
414B32: and edx, 0FFh
414B38: xor eax, edx
414B3A: mov ebx, [esi+eax*4]
414B3D: shr edi, 8
414B40: xor edi, ebx
414B42: add ecx, 1
414B43: cmp ecx, [esp+4]
414B47: jne 414B29
414B49: add esp, 8
414B4C: not edi
414B4E: mov eax, edi
414B50: pop ebx
414B51: pop ecx
414B52: pop ebx
414B53: pop edi
414B54: pop esi
414B55: leave
414B56: retn 8

I don’t know whether it’s reversible or not; I have to admit I didn’t try to fully understand it, because it can be brute-forced in a few seconds:

unsigned char buffer_406767[1024] = { 0x00, 0x00, 0x00, 0x00, 0x96, 0x30, 0x07, 0x77, 0x2C, 0x61, ... };
unsigned char buffer_406663[0xA0] = { 0x3C, 0xFE, 0xFF, ... };

unsigned int val;
unsigned int v = 0x00401000;    // The brute starts from this value
bool success = false;
while (!success)
{
    val = 0xFFFFFFFF;
    __asm
    {
        mov eax, dword ptr [v]
        mov dword ptr [buffer_406663+0x50], eax   // put the candidate dword inside the buffer
        mov edi, 0xFFFFFFFF                       // initial value of the accumulator
        xor ecx, ecx                              // ecx = index inside the buffer
        lea esi, buffer_406767                    // esi = lookup table
    _iterate:
        lea eax, buffer_406663
        movzx eax, byte ptr [ecx+eax]             // current byte of the buffer
        mov edx, edi
        and edx, 0FFh
        xor eax, edx                              // index inside the table
        mov ebx, [esi+eax*4]
        shr edi, 8
        xor edi, ebx
        add ecx, 1
        cmp ecx, 0xA0                             // the buffer is 0xA0 bytes long
        jne _iterate
        not edi
        mov dword ptr [val], edi                  // result of the VM for this candidate
    }
    if (val == 0xA640740E)
    {
        printf("Val: %X", v);
        success = true;
    }
    else
        v++;
}

The code returns 0x40D44E which is the right value. To check its correctness you can patch the challenge at runtime and you’ll see the right message box.
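A side note on the VM itself: the constant 0xEDB88320, the 0xFFFFFFFF starting value, the final NOT and the first entries of the table at 406767 (0x00000000, 0x77073096) all match the standard CRC-32. Assuming the VM really is a plain CRC-32 over the 0xA0 bytes at 406663, the same brute force can be sketched in portable C without inline assembly. The names crc32_bytes and brute_dword are only illustrative, and the buffer contents still have to be dumped from the challenge.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF,
   final NOT. It gives the same result as the table-driven loop in the VM,
   under the assumption that the VM implements the standard CRC-32. */
uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Store v as a little-endian dword at buf[offset .. offset+3]. */
static void put_le32(uint8_t *buf, size_t offset, uint32_t v)
{
    buf[offset + 0] = (uint8_t)(v & 0xFFu);
    buf[offset + 1] = (uint8_t)((v >> 8) & 0xFFu);
    buf[offset + 2] = (uint8_t)((v >> 16) & 0xFFu);
    buf[offset + 3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Hypothetical helper: try every dword in [first, last] at the given offset.
   Returns 1 and stores the candidate in *found when the CRC equals target,
   returns 0 when no candidate in the range matches. */
int brute_dword(uint8_t *buf, size_t len, size_t offset, uint32_t target,
                uint32_t first, uint32_t last, uint32_t *found)
{
    for (uint32_t v = first; ; v++) {
        put_le32(buf, offset, v);
        if (crc32_bytes(buf, len) == target) {
            *found = v;
            return 1;
        }
        if (v == last)
            break;
    }
    return 0;
}
```

With the real 0xA0 bytes copied into a buffer, a call like brute_dword(buffer, 0xA0, 0x50, 0xA640740E, 0x00401000, 0x00FFFFFF, &v) would mirror the loop above; the upper bound is an arbitrary choice.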
So, the final equation to solve is:

(((((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]) ^ [4065FB]) ^ [4065FF]) - 0x4E1A9001 = 0x40D44E

Knowing all the fixed values, it’s really easy to obtain k1, k2, k3 and k4 (fix three of them and calculate the fourth); the problem is that I can’t predict the values inside the four dwords [4065F3], [4065F7], [4065FB] and [4065FF]. If you remember, these values are updated many times inside the antidebug VM.
How can I get the correct values in a simple way? To solve this puzzle I used the good old “EB FE” byte sequence. These two bytes send a program into an infinite loop: they encode a short jump to itself (“jmp $”). If you patch the challenge in the right places, you can read the right values directly from memory. For the VM bytecode the bytes to use are “D6 FE”, because D6 is the VM opcode for the “JMP val8” instruction and 0xFE is the offset.
I patched the exe 4 times, each time where one of the “xor [4065Fx]” operations occurs. Doing so I got the following values: 0xC3EC8A62, 0x4292F007, 0xE9E6474E and 0x55CA2C39.
In the end there are tons of valid keyfiles; among them I chose the one with these 16 bytes:
11 11 11 11 22 22 22 22 33 33 33 33 5D 75 09 73
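
To double check the arithmetic behind this keyfile, here is a minimal C sketch of the final equation. It assumes the four keyfile dwords are read as little-endian values and that the four sniffed dwords listed above are correct. The function names (check_value, solve_k4, keyfile_bytes) are only illustrative and do not come from the challenge.

```c
#include <stdint.h>

/* Forward computation: XOR the four keyfile dwords with the four dwords
   read from 4065F3..4065FF, then subtract the constant (modulo 2^32). */
uint32_t check_value(const uint32_t k[4], const uint32_t d[4], uint32_t sub)
{
    uint32_t x = k[0] ^ k[1] ^ k[2] ^ k[3];
    x ^= d[0] ^ d[1] ^ d[2] ^ d[3];
    return x - sub;
}

/* Inverse computation: fix k1, k2, k3 and solve the equation for k4.
   Adding sub undoes the SUB, and XOR is its own inverse. */
uint32_t solve_k4(uint32_t k1, uint32_t k2, uint32_t k3,
                  const uint32_t d[4], uint32_t sub, uint32_t target)
{
    uint32_t x = target + sub;
    return x ^ k1 ^ k2 ^ k3 ^ d[0] ^ d[1] ^ d[2] ^ d[3];
}

/* Serialize the four dwords as the 16 little-endian bytes of the keyfile. */
void keyfile_bytes(const uint32_t k[4], uint8_t out[16])
{
    for (int i = 0; i < 4; i++) {
        out[4 * i + 0] = (uint8_t)(k[i] & 0xFFu);
        out[4 * i + 1] = (uint8_t)((k[i] >> 8) & 0xFFu);
        out[4 * i + 2] = (uint8_t)((k[i] >> 16) & 0xFFu);
        out[4 * i + 3] = (uint8_t)((k[i] >> 24) & 0xFFu);
    }
}
```

Calling solve_k4 with k1 = 0x11111111, k2 = 0x22222222, k3 = 0x33333333, the four sniffed dwords, the constant 0x4E1A9001 and the target 0x40D44E gives k4 = 0x7309755D, which is stored in the file as 5D 75 09 73. Any other choice of k1, k2 and k3 works the same way.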

Final words
I hope to see some more challenges like this one in the near future.
Ciao!
