From time to time I like to spend my free time solving crackmes, but lately it’s been hard to find something interesting. Fortunately, Kyriakos Economou & Nikolaos Tsapakis released an RE challenge for the AthCon conference. It’s not really hard and the idea behind it is not so complex, but the implementation used to hide the core of the algorithm deserves particular attention.
(You can download this tutorial plus two idc scripts from here)
At first glance the code is fully visible: the crackme is debuggable and it’s not protected. The protection routine starts at address 407730. After some lines of code you can identify a sort of cycle that is repeated again and again (the code between 40779A and 40A0E7). This is the core of the entire challenge: if you understand what’s going on here and how to deal with it, you’ll be well on your way to solving the challenge!
I’ll present the code in the 40779A/40A0E7 range by dividing it into 3 parts: initialization, core and finalization.
Part #1: initialization
40779A sub esp, 800h
4077A0 call $+5
4077A5 pop edi
4077A6 sub edi, 79h
4077A9 jmp short loc_4077AC
4077AC mov edi, [edi] <-- edi is initially 40CADE and it points to a sequence of bytes
4077AE call $+5
4077B3 pop esi
4077B4 add esi, 28C1h <-- esi = 40A074
4077BA xor ebx, ebx <-- ebx is the counter of the next loop
4077BC mov ebp, esp
4077BE mov ecx, edi
4077C0 add ecx, ebx
4077C2 mov esp, ecx
4077C4 pop edx <-- take 4 bytes from the 40CADE+ebx sequence of bytes
4077C5 mov esp, ebp
4077C7 mov ebp, esp
4077C9 mov ecx, esi <-- use the buffer at 40A074
4077CB add ecx, ebx
4077CD mov esp, ecx
4077CF pop ecx
4077D0 mov cl, dl
4077D2 push ecx <-- it takes the current byte from 40CADE+ebx putting it at 40A074+ebx
4077D3 mov esp, ebp
4077D5 inc ebx <-- increase the counter
4077D6 push ebx
4077D7 xor ebx, 0Fh
4077DA pop ebx
4077DB jz short loc_4077DF <-- it repeats the loop 0x0F times
4077DD jmp short loc_4077BC
4077DF mov ebp, esp
4077E1 mov edi, esi <-- edi = 40A074
4077E3 mov esp, edi
4077E5 pop ebx <-- first 4 bytes from 40A074
4077E6 mov esp, ebp
4077E8 mov cl, bl <-- 1° of the 16 bytes moved above
4077EA mov ebp, esp
4077EC mov edi, esi
4077EE inc edi
4077EF mov esp, edi
4077F1 pop ebx
4077F2 mov esp, ebp
4077F4 mov bh, bl <-- 2° of the 16 bytes moved above
407804 xor bl, 8Ah <-- 2° of the 16 bytes moved above xored with 0x8A
These are the instructions of the first part. Basically it’s a simple block of code used to move 16 bytes from one buffer into another. The destination buffer is always the same, while the source is not: the value of edi @4077AC changes every time, and it’s increased by a value between 1 and 16 at each iteration.
Why 16, what’s the meaning of this value? Keep that number in mind.
Part #2: the core
This is the central part of the code I’m trying to understand. It’s huge, and looking at it you’ll see a lot of similar code snippets repeated over and over. That’s the peculiarity of the protection used by the authors. For a trained eye it’s easy to understand what’s going on, but for a novice it could be really hard. Instead of telling you what’s behind that obscure part I prefer to show you some of the blocks:
1° block:
40788D xor cl, al
40788F pushf
407890 pop edx
407891 and dl, 40h
407894 push ebx
407895 pop ecx
407896 push ecx
407897 xor cl, 0A5h
40789A pushf
40789B add al, 4
40789D xor al, 18h
40789F pop edx
4078A0 pop ebx
4078A1 and edx, 40h
4078A4 jnz 40A834
2° block:
4078AA xor cl, al
4078AC pushf
4078AD pop edx
4078AE and dl, 40h
4078B1 push ebx
4078B2 pop ecx
4078B3 push ecx
4078B4 xor cl, 0B5h
4078B7 pushf
4078B8 add al, 4
4078BA xor al, 18h
4078BC pop edx
4078BD pop ebx
4078BE and edx, 40h
4078C1 jnz 40A87A
3° block:
4078C7 xor cl, al
4078C9 pushf
4078CA pop edx
4078CB and dl, 40h
4078CE push ebx
4078CF pop ecx
4078D0 push ecx
4078D1 xor cl, 95h
4078D4 pushf
4078D5 add al, 4
4078D7 xor al, 18h
4078D9 pop edx
4078DA pop ebx
4078DB and edx, 40h
4078DE jnz 40A89D
What’s the difference between these 3 snippets? As you can see there are two minor differences: the “xor cl, xx” and the conditional jump at the end. The value of cl has been set inside the initialization part, and it represents the first of the 16 bytes copied to 40A074_buffer xored with 0x8A. The “and edx, 40h” isolates the zero flag (bit 6 of EFLAGS) saved by the pushf right after the xor, so each jnz is taken only when cl equals the xored constant. So you can think of this part of the code as a big sequence of IF statements where ‘cl’ is compared with a lot of values and every single value has its own code flow:
if (cl == 0xA5)
execute_snippet_for_A5;
else if (cl == 0xB5)
execute_snippet_for_B5;
else if (cl == 0x95)
execute_snippet_for_95;
The sequence of bytes from 40CADE makes sense now: the first of the 16 bytes defines a specific flow for the algo. Now that I know that, I’ll try to understand what’s behind every single code snippet by showing you 2 more pieces of code:
if (cl == 0xE5):
40A8C0 mov eax, [edi] <--
40A8C2 inc eax <--
40A8C3 mov eax, [eax] <--
40A8C5 mov edx, [edi-24h] <-- preparation
40A8C8 mov ebx, [edi-4] <--
40A8CB push ebx <--
40A8CC popf <--
40A8CD mov edx, eax <------------ execution
40A8CF pushf <--
40A8D0 pop eax <--
40A8D1 mov [edi-4], eax <--
40A8D4 mov [edi-24h], edx <-- conclusion
40A8D7 mov eax, [edi] <--
40A8D9 add eax, 5 <--
40A8DC mov [edi], eax <--
40A8DE jmp loc_40A0CD <--
if (cl == 0xB7):
40B401 mov edx, [edi-24h] <--
40B404 mov ebx, [edi-14h] <--
40B407 mov eax, [edi-4] <-- preparation
40B40A push eax <--
40B40B popf <--
40B40C mov edx, ebx <----------- execution
40B40E pushf <--
40B40F pop eax <--
40B410 mov [edi-4], eax <--
40B413 mov [edi-24h], edx <--
40B416 mov eax, [edi] <-- conclusion
40B418 add eax, 2 <--
40B41B mov [edi], eax <--
40B41D jmp loc_40A0CD <--
This is more or less the scheme of every defined snippet: “popf” marks the end of the preparation part and “pushf” is the beginning of the conclusion part. As you can see almost all the operations are done using values taken from “edi-xx”; a closer look at some more snippets reveals that the program uses [edi], [edi-4], [edi-8], [edi-0C], [edi-10], [edi-14], [edi-18], [edi-1C], [edi-20] and [edi-24].
At first glance I can say that:
[edi]: it’s always increased at the end of each block. It’s sometimes read at the beginning of a block and used to retrieve one or more bytes from the buffer pointed to by 40CADE
[edi-4]: this value is updated with the EFLAGS register obtained by the *execution* instruction
[edi-8]..[edi-24]: values used inside the snippet
Before the next part I have to add something more about the bytes inside the buffer at 40CADE. I said that a single byte from the buffer is compared with a fixed value in a sort of IF sequence but I was not really correct because some values are not covered by the various IF statements. The second snippet above (the one with cl == 0xB7) represents a perfect example, you reach 40B401 from here:
40A4A2 push ebx
40A4A3 xor bh, 16h
40A4A6 pop ebx
40A4A7 jz 40B401
As you can see this piece of code is really different from the 3 blocks I presented before.
Every respectable IF has an ELSE. So, think of a big sequence of IF statements followed by ELSE; ELSE starts here:
408041 xor cl, al ; 1° byte has not been found in the previous checks
408043 pushf
408044 pop edx
408045 and dl, 40h
408048 push ebx
408049 mov ebx, [edi] ; ebx = [edi]
40804B add ebx, 1200h
408051 mov bl, [ebx] ; ebx points to a buffer of 00 and 01 values (it starts from 40DCDE)
408053 cmp bl, 1
408056 jnz loc_408145
Inside the ELSE branch the flow of the program depends on the value of this 00_01 buffer too, if the value inside bl is 01 a new IF statement is implemented. This time the checks are not like the previous ones because there’s a double check over the first and sometimes the second of the 16 bytes moved above. A valid check identifies a block of code to execute.
The 00 value is another story, the code starting from 408145 is essentially used to patch at most 16 bytes between 40A074 and 40A083 at runtime.
The 1° byte is taken from 407608+index where index is the 1° of the 16 bytes, all the next bytes are simply copied. The result of the preparation/execution/conclusion is the next piece of code, “xchg eax, ebx” is the instruction created:
40A06A MOV ESP,DWORD PTR DS:[EDI-8]
40A06D PUSH DWORD PTR DS:[EDI-4]
40A070 POPFD
40A071 MOV EDI,DWORD PTR DS:[EDI-10] <-- preparation ends here
40A074 XCHG EAX,EBX <-- instruction created at runtime (xchg is only 1 byte opcode)
40A075 JMP SHORT 0040A083 <-- a jmp to the end is necessary to skip the leftover nonsense instructions
40A077 PUSH ECX
40A078 POP EDX
40A079 INC EDX
40A07A PUSH ESI
40A07B POP EDI
40A07C INC EDI
40A07D POP ESI
40A07E POP EBX
40A07F POP EDX
40A080 POP EAX
40A081 LEAVE
40A082 RETN
40A083 PUSH EDI <-- conclusion starts here
40A084 PUSHFD
40A085 SUB ESP,800
Ok, what have I discovered so far? There are different situations: according to the first two bytes, the program executes either a static block of code or a snippet of code with a particular instruction created at runtime.
It’s not so easy to explain the system with words and parts of code. I suggest you step through everything a few times and I’m sure you’ll get the scheme:
if ((first_byte ^ 0x8A) == 0xA5)
execute_snippet_for_A5;
else if ((first_byte ^ 0x8A) == 0xB5)
execute_snippet_for_B5;
...
else if ((first_byte ^ 0x8A) == 0x95)
execute_snippet_for_95;
else
if (byte_from_buffer_40DCDE == 0)
runtime_patch
else
// another sequence of IF with a check over the first and sometimes the second byte of the 40A074 buffer
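The scheme above can be modelled in a few lines of Python. This is only a sketch of the control flow as I described it: the handler names and the `handlers` dictionary are hypothetical placeholders, not the crackme’s real table, and `flag_byte` stands for the value read from the 00_01 buffer at [edi]+0x1200.

```python
XOR_KEY = 0x8A  # constant xored into the first byte before the comparisons


def dispatch(first_byte, flag_byte, handlers):
    """Return a label for the path the VM loop would take.

    handlers maps (first_byte ^ 0x8A) to a handler name. Its contents
    are illustrative placeholders, not values taken from the binary.
    """
    key = (first_byte ^ XOR_KEY) & 0xFF
    if key in handlers:
        return handlers[key]            # static snippet for this opcode
    if flag_byte == 0:
        return "runtime_patch"          # instruction rebuilt at 40A074
    return "two_byte_opcode_checks"     # second IF chain on bytes 1 and 2


# Hypothetical table with the three constants shown in the blocks above.
handlers = {0xA5: "snippet_A5", 0xB5: "snippet_B5", 0x95: "snippet_95"}
print(dispatch(0xA5 ^ XOR_KEY, 1, handlers))
```

A dictionary replaces the long xor/pushf/jnz chain; the binary walks the comparisons one by one, but the selected path should be the same.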
Part #3: finalization
40A0CD mov ebx, eax <-- ebx points to a byte from 40CADE buffer
40A0CF sub ecx, 10h
40A0D2 sub edx, 14h
40A0D5 add esp, 400h
40A0DB sub esi, 18h
40A0DE add esp, 400h
40A0E4 sub edi, 1Ch
40A0E7 jmp loc_40779A
This is the last part of the cycle, it’s only a series of instructions used to update some specific values. EBX register contains the same value updated before inside [edi].
#1,#2, #3: reveal the secret
So, we have collected some important information and I think we are ready to understand what’s behind this part of the challenge:
– 16 bytes are moved from 40CADE+current_offset to a new buffer
– the first byte of the new buffer is used to select a block of code and then the block will be executed
– in some cases the first two bytes from the new buffer are used to select a block of code to execute
– in some cases a value from 40DCDE buffer forces the execution of an instruction created at runtime
– one of the variables used inside the block of code is always increased at the end of the block
– one of the variables used inside the block of code is updated with the EFLAGS value returned by the most important instructions of the block
– the block of code modifies one or more variables (I’m referring to [edi-8]..[edi-24])
What’s the object that matches the seven points listed above? The entire code is an implementation of a virtual machine that closely resembles a real machine:
– 16 bytes is the maximum length in bytes of a single instruction
– you can have one or two byte opcode defined instructions
– it has the same registers of a real machine, here is the correspondence:
Virtual machine (VM)  =  Real machine (RM)
[edi]                 =  EIP
[edi-4]               =  EFLAGS
[edi-8]               =  ESP
[edi-0C]              =  EBP
[edi-10]              =  EDI
[edi-14]              =  ESI
[edi-18]              =  EDX
[edi-1C]              =  ECX
[edi-20]              =  EBX
[edi-24]              =  EAX
EIP of the VM starts from 40CADE.
If you are still confused, here is the explanation applied to the first snippet (cl == 0xE5) I used above:
40A8C0 mov eax, [edi] <-- eip
40A8C2 inc eax <-- eip+1
40A8C3 mov eax, [eax] <-- take the dword at eip+1
40A8C5 mov edx, [edi-24h] <-- eax
40A8C8 mov ebx, [edi-4] <-- EFLAGS
40A8CB push ebx <-- push EFLAGS
40A8CC popf <-- real machine has the EFLAGS of the virtual machine
40A8CD mov edx, eax <-- move the dword obtained at 40A8C3
40A8CF pushf <-- push the current EFLAGS value (obtained by the mov at 40A8CD)
40A8D0 pop eax <-- get the value of EFLAGS
40A8D1 mov [edi-4], eax <-- store EFLAGS inside EFLAGS register of the virtual machine
40A8D4 mov [edi-24h], edx <-- move the dword taken at 40A8C3 inside EAX register of the virtual machine
40A8D7 mov eax, [edi] <-- eip
40A8D9 add eax, 5 <-- calculate the new EIP value adding the length of the current instruction: 5
40A8DC mov [edi], eax <-- update the EIP value of the virtual machine
A 32 bit value is taken from eip+1 and it’s moved inside EAX register of the VM, it’s easy to understand that this snippet is an implementation of a real “mov eax, val32” instruction. Take a look at how EFLAGS are updated, there is a switch between real and virtual EFLAGS values before and after the execution of the main instruction of the snippet. In the end EIP is updated adding the size of this instruction which is 5 (1 byte opcode plus 4 bytes used to define val32).
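To make the handler concrete, here is a small Python sketch of the same steps. It assumes the VM registers are kept in a dictionary and that `memory` is a bytes object indexed directly by EIP (both are my own modelling choices, not something taken from the crackme). A real “mov” does not change any flag, so EFLAGS comes out of the round trip unchanged.

```python
import struct


def vm_mov_eax_imm32(vm, memory):
    """Model of the handler at 40A8C0: EAX <- dword at EIP+1, EIP += 5."""
    eip = vm["EIP"]
    # Read the little-endian 32-bit immediate that follows the opcode byte.
    (imm32,) = struct.unpack_from("<I", memory, eip + 1)
    vm["EAX"] = imm32
    # 'mov' leaves the flags alone, so vm["EFLAGS"] is stored back as is.
    vm["EIP"] = eip + 5  # 1 opcode byte + 4 immediate bytes
    return vm


# Placeholder opcode byte (0x00) followed by the immediate 0x12345678.
memory = bytes([0x00, 0x78, 0x56, 0x34, 0x12])
vm = {"EIP": 0, "EAX": 0, "EFLAGS": 0x202}
vm_mov_eax_imm32(vm, memory)
print(hex(vm["EAX"]), vm["EIP"])
```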
Now that you know the hardware of the VM you should have no problem understanding all the possible instructions used inside the real protection algorithm. To help you in the identification process I’m going to explain some more cases.
If you use IDA and you want to locate a specific snippet you can use this simple idc script:
#include <idc.idc>
static main()
{
    auto vmOpcode;
    auto xorBytes = "80 F1";  // encoding of 'xor cl, imm8'
    auto startAddress = 0x0040780A;
    auto xorAddress;
    auto xorVal;
    vmOpcode = AskAddr("", "Insert the VM opcode");
    vmOpcode = vmOpcode ^ 0x8A;
    // Look for the right 'xor cl, xx' instruction
    while (startAddress < 0x408191) {
        xorAddress = FindBinary(startAddress, SEARCH_DOWN, xorBytes);
        if (xorAddress == BADADDR)
            break;  // no more 'xor cl, xx' candidates
        xorVal = GetOperandValue(xorAddress, 1);
        if (xorVal == vmOpcode) {
            Jump(xorAddress);
            return -1;
        }
        startAddress = xorAddress + 3;  // skip the 3 bytes of this instruction
    }
    Message("\nOpcode not defined...");
}
1 byte opcode defined instruction
Opcode sequence to study: “96 xx”
40CA9A mov ecx, [edi-1Ch] ; ECX from VM
40CA9D dec cx ; dec CX (only the low 16 bits of ECX)
40CA9F mov [edi-1Ch], ecx ; update ECX value of VM
40CAA2 cmp cx, 0 ; is it 0?
40CAA6 jz short loc_40CAD2
40CAA8 mov esi, [edi] ; EIP
40CAAA xor ebx, ebx
40CAAC mov bl, [esi+1] ; byte after the opcode of the current instruction
40CAAF cmp bl, 80h ; byte(EIP+1) is positive or negative?
40CAB2 jb short loc_40CAC4
40CAB4 mov eax, [edi]
40CAB6 add eax, ebx ; calculate EIP + byte(EIP+1)
40CAB8 sub eax, 0FEh ; -0x100 makes the offset negative, +2 is the size in bytes of the current VM instruction
40CABD mov [edi], eax ; update EIP
40CABF jmp loc_40A0CD
40CAC4 mov eax, [edi]
40CAC6 add eax, ebx ; calculate EIP + byte(EIP+1)
40CAC8 add eax, 2 ; +2 is the size in byte of the current VM instruction
40CACB mov [edi], eax ; update EIP
40CACD jmp loc_40A0CD
40CAD2 mov eax, [edi] ; EIP
40CAD4 add eax, 2 ; calculate the new EIP
40CAD7 mov [edi], eax ; update EIP
40CAD9 jmp loc_40A0CD
Not as direct as “mov eax, val32”, but still easy to understand. The snippet defines a “loop rel8” instruction: CX is the counter. If CX reaches zero, EIP is updated with the address of the instruction after this one; otherwise EIP is replaced with the address of the first instruction of the loop (offsets of -128..+127 are allowed).
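The only tricky detail is the handling of the signed offset, so here is a Python sketch of the handler. As in my other sketches, the register dictionary and the `memory` bytes object indexed by EIP are modelling assumptions. Subtracting 0xFE from EIP + byte is the same as adding 2 and subtracting 0x100, which is exactly what a sign-extended rel8 needs.

```python
def vm_loop_rel8(vm, memory):
    """Model of the handler at 40CA9A (VM opcode 0x96): 'loop rel8' on CX."""
    eip = vm["EIP"]
    cx = (vm["ECX"] - 1) & 0xFFFF                  # dec cx: low 16 bits only
    vm["ECX"] = (vm["ECX"] & 0xFFFF0000) | cx
    if cx == 0:
        vm["EIP"] = eip + 2                        # fall through to the next instruction
        return vm
    rel = memory[eip + 1]
    if rel >= 0x80:
        rel -= 0x100                               # same effect as 'sub eax, 0FEh' once +2 is added
    vm["EIP"] = (eip + 2 + rel) & 0xFFFFFFFF
    return vm


# "96 FE" jumps back onto itself while CX is not exhausted.
vm = {"EIP": 0, "ECX": 3}
vm_loop_rel8(vm, bytes([0x96, 0xFE]))
print(vm)
```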
2 bytes opcode defined instruction
Opcode sequence to study: “BD 74”
40BACA mov edx, [edi-1Ch] ; ECX from VM
40BACD mov eax, [edi-4] ; EFLAGS
40BAD0 push eax
40BAD1 popf ; load the VM EFLAGS inside real machine
40BAD2 and edx, edx ; and ECX, ECX
40BAD4 pushf
40BAD5 pop eax
40BAD6 mov [edi-4], eax ; store new EFLAGS
40BAD9 mov eax, [edi] ; EIP
40BADB add eax, 2 ; EIP = EIP + 2
40BADE mov [edi], eax ; update EIP
“and ECX, ECX”, 2 bytes long instruction.
Runtime patched instruction
These particular instructions are generated at runtime.
Opcode sequence to study: “1E A1 30 00 00 00”. 0x1E is not defined as a 1 byte opcode and the value from 00_01 buffer is 0x00:
408049 MOV EBX,DWORD PTR DS:[EDI] ; [edi] = 40CB23
40804B ADD EBX,1200 ; ebx = 40DD23
408051 MOV BL,BYTE PTR DS:[EBX] ; bl = 0x00
408053 CMP BL,1
408056 JNE 00408145 ; jump!
Ok, now I’m sure that the current instruction of the VM is generated at runtime. As I announced before, only the first byte of the opcode sequence is changed; all the others remain untouched. The new first byte is the index of the value 0x1E inside buffer_407608:
buffer_407608[0x64] = 0x1E
64 is the first byte and the generated instruction will be:
64 A1 30 00 00 00 MOV EAX,DWORD PTR FS:[30]
And now?
Now there’s something more to say, because this VM is not the only one used inside the program: the challenge uses some more VMs inside the protection algorithm. Don’t be scared, this time it’s really easy to understand them because at the hardware level they are like the one I presented to you. The difference lies in the VM’s byte sequence and in the code base of the machine. Every new VM is called from the one I presented to you using a “call” instruction. To recognize a new VM you simply have to catch the 0x36 opcode byte; you’ll see that the VM_call is used a lot of times inside the algo.
As a sample look at this decoded snippet to see how the VM_401ADE is called:
40CB16 pop eax
40CB17 call 401ADE
40CB1C push ebp
---
401ADE push 41F730h ; Address for VM_401ADE definition
401AE3 retn
If you go to 41F730 and you follow the way of thinking used for the previous VM you’ll be able to understand this new one in few minutes.
You can apply the same method for all the other defined virtual machines.
How to approach the real protection algorithm
Now that I have a good background on the protection mechanism I need a strategy. You can of course step through the entire code line by line; you’ll understand everything about the protection routine for sure, but it takes a looot of time. You can’t predict how long the real protection routine is, and the “fs:[30]” should discourage you. Anyway, if you want to go that way you will have to:
1. use some smart bpx (conditional ones above all)
2. patch the VM byte sequence in some places if you want to avoid the anti-debug checks (yes, the code has some anti-debug tricks inside too!).
I did almost all the work directly from a dead list of the real protection routine, I used an idc script to decode the VM and I have to admit it’s not so hard to understand what’s going on by simply reading the instructions sequence. Despite that, with an extra effort I think you can make the protection routine fully debug-able. The idea is to decode every single VM instruction putting them all inside a piece of unused code. When you have all the decoded instruction you can step them! I did try this strategy for few instructions only when I didn’t understand some obscure parts from the dead list approach. Above all you have to pay attention to two things:
1. look at the length in bytes of each VM instruction, sometimes it’s not the same as that of the equivalent real instruction (e.g. VM bytes: 40 01)
2. the offsets of unconditional and conditional jumps need to take into account the extra bytes of the previous/next instructions
You can find the idc script I used to decode the virtual machine inside the attachment. It doesn’t cover all the VM opcodes because I added the necessary opcodes only; you can complete it if you want :)
Antidebug
Try to run the idc script: it produces a huge list of instructions (keep in mind that there are some more, because the instructions inside the calls are not listed). To slow down static analysis the code was filled with junk: you’ll surely find useless jump instructions, loops and even entire blocks of instructions whose only purpose is to annoy you. Here are some examples:
40CAE2: mov ecx, 0x00000100 <-- ecx = 0x100
40CAE7: mov edi, ebx
40CAE9: sub ecx, 1
40CAEA: jp 40CAEC <-- useless jmp instruction
40CAEC: je 40CAEE <-- useless jmp instruction
40CAEE: jo 40CAF0 <-- useless jmp instruction
40CAF0: loop 40CAE7 <-- useless loop
What’s the point of this piece of code? It has none, and I think it was used to discourage a single-step debugging session: would you step through a 0x100-iteration loop inside a VM line by line? I don’t think so.
Now, take a look at this block:
40CAF4: push eax
40CAF5: push ebx
40CAF6: push ecx
40CAF7: push edx
40CAF8: push esi
40CAF9: push edi
40CAFA: mov eax, 0x00000074
40CAFF: mov ebx, ecx
40CB01: push 0x78657464
40CB06: pop esi
40CB07: jmp 40CB0B
40CB09: mov eax, 0
40CB0B: mov al, 0
40CB0D: push edi
40CB0E: xchg eax, ebx
40CB0F: pop ecx
40CB10: sub ecx, 1
40CB11: pop edi
40CB12: pop esi
40CB13: pop edx
40CB14: pop ecx
40CB15: pop ebx
40CB16: pop eax
Well, as you can see all the modifications applied inside 40CAFA/40CB10 are nullified by the push/pop instructions.
In addition to the junk code pay attention to the “call 401ADE”, it’s the antidebug VM and it’s called a lot of times. Inside this VM you’ll find:
– GetTickCount: a classical check over the time passed between the execution of two distinct instructions, if the gap is above a specific fixed value it means the user is currently debugging the challenge.
– bpx check over some functions: first of all the challenge takes the name of all the exported functions from ntdll, and then it applies a checksum algo trying to identify some specific Zw* functions, the checksum algo is:
424C3C MOVZX EBX,BYTE PTR DS:[EAX] <-- current char of the export name
424C3F ROL EBX,7
424C42 ADD ESI,EBX <-- esi is the current checksum (it starts from value 0)
424C44 ROL ESI,7
424C47 XOR ESI,EBX
424C49 ADD EAX, 1 <-- move to the next char of the export name
424C4A CMP BYTE PTR DS:[EAX],0 <-- check to see if it's the end of the export name
424C4D JZ 424C51 <-- jump if scan is over
424C4F JMP 424C3C <-- jump up and check the next char
424C51 MOV EAX, ESI
424C53 MUL ESI
424C55 ADD EAX,EDX
424C57 CMP EAX,318A50B7 <-- check the calculated checksum
424C5C JZ 00424D8E
The checksum value is compared with some fixed values, here is the list:
0x318A50B7: ZwSetInformationThread
0xE27847F7: ZwFreeVirtualMemory
0x95AAF2E1: ZwDelayExecution
0x8AFA4D6D: ZwQueryInformationProcess
0xD2950638: ZwGetContextThread
0x25F2995D: ZwQueryVirtualMemory
0x217A4264: ZwAllocateVirtualMemory
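The checksum loop at 424C3C translates almost line by line into Python. This is a sketch under the assumption that the export name is plain ASCII and that ESI really starts from 0; I did not re-run it against every value in the list, so compare its output with the table above before relying on it.

```python
def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    value &= 0xFFFFFFFF
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def export_checksum(name):
    """Model of the loop at 424C3C followed by the MUL/ADD at 424C51."""
    esi = 0
    for ch in name.encode("ascii"):
        ebx = rol32(ch, 7)               # movzx ebx, byte / rol ebx, 7
        esi = (esi + ebx) & 0xFFFFFFFF   # add esi, ebx
        esi = rol32(esi, 7)              # rol esi, 7
        esi ^= ebx                       # xor esi, ebx
    product = esi * esi                  # mul esi -> EDX:EAX
    low, high = product & 0xFFFFFFFF, product >> 32
    return (low + high) & 0xFFFFFFFF     # add eax, edx


print(hex(export_checksum("ZwSetInformationThread")))
```

With a helper like this you can hash every export of ntdll offline and look for the constants compared in the VM, instead of stepping through the loop for each name.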
When a function is found the challenge performs a check over the first byte of the function. This is the snippet used to check the correctness of ZwSetInformationThread:
424C62 MOV EAX,DWORD PTR SS:[EBP-44] <-- address of the 1° byte in memory of the function
424C65 MOVZX EBX,BYTE PTR DS:[EAX] <-- take the first byte, 0xB8
424C68 ADD BL,10 <-- BL = 0xB8 + 0x10 = 0xC8
424C6B CMP BL,0C8 <-- 1° byte check!
424C6E jz 424CA2 <-- jump if the byte is ok
The check appears to be a bpx check over the function. This kind of check is performed over the other functions too (the code is slightly modified but the sense is the same).
– ThreadHideFromDebugger antidebug: it uses ZwSetInformationThread with THREAD_INFORMATION_CLASS equal to ThreadHideFromDebugger. After the execution (via SYSENTER) of this function it’s impossible to debug the challenge.
– Check over the parameters passed to some Zw functions: this check is performed on ZwSetInformationThread and ZwQueryInformationProcess and the scheme is the same, a series of cmp-jnz for every single parameter passed to the function. Here is what happens with ZwSetInformationThread:
424D32 CMP DWORD PTR SS:[ESP],0 <-- 0 has been pushed at 424CF0
424D36 JNZ 424CFE <-- jump if the parameter is not the expected value
424D38 CMP DWORD PTR SS:[ESP+4],-2 <-- 0xFFFFFFFE (-2) pushed at 424CE6
424D3D JNZ 424CFE
424D3F CMP DWORD PTR SS:[ESP+8],11 <-- 0x11 (ThreadHideFromDebugger) pushed at 424CE4
424D44 JNZ 424CFE
424D46 CMP DWORD PTR SS:[ESP+0C],0 <-- 0 pushed at 424CE2
424D4B JNZ 424CFE
424D4D CMP DWORD PTR SS:[ESP+10],0 <-- 0 pushed at 424CE0
424D52 JNZ 424CFE
As you can see if there’s an unexpected value the conditional jump will bring you to 424CFE which is the part of the code called when the debugger has been caught!
– ProcessDebugPort check: another check used to reveal an active debugger over the challenge.
424F2E PUSH 0x0000000 <-- ReturnLength
424F30 PUSH 0x0000004 <-- ProcessInformationLength
424F32 LEA EAX, [EBP-74h]
424F35 PUSH EAX <-- ProcessInformation
424F36 PUSH 0x0000007 <-- ProcessDebugPort
424F38 PUSH 0xFFFFFFFF <-- ProcessHandle
424F3A PUSH 0x0000000
...
424F4A SYSENTER <-- ZwQueryInformationProcess
424F4C AND EAX, EAX <-- NTSTATUS success or error code?
424F4E JE 424F82
...
424FDA AND EAX, EAX <-- eax = port number of the debugger for the process
424FDC JE 425010 <-- 0 if you are not debugging the challenge
– NtDelayExecution trick: this function seems to be useful against automated scanning systems, because they have a time-out for the scanning task. I can’t prove whether this trick works or not…
– Debug registers check: it uses NtGetContextThread to retrieve the CONTEXT information, then it checks the content of some specific dr registers.
425469 LEA EAX,[EBP-200] <-- eax = CONTEXT structure
42546F ADD EAX,4
425472 MOV EAX,DWORD PTR DS:[EAX] <-- eax = _CONTEXT.Dr0
425474 AND EAX, EAX <-- Dr0 = 0 means no bpx
425476 JZ 4254A8
...
42552A LEA EAX, [EBP-200h]
425530 ADD EAX,8
425533 MOV EAX,DWORD PTR DS:[EAX] <-- eax = _CONTEXT.Dr1
425535 AND EAX, EAX
425537 JZ 425569
...
425879 LEA EAX,[EBP-200]
42587F ADD EAX,0C
425882 MOV EAX,DWORD PTR DS:[EAX] <-- eax = _CONTEXT.Dr2
425884 AND EAX, EAX
425886 JZ 4258BA
...
425C1E LEA EAX,[EBP-200]
425C24 ADD EAX,10
425C27 MOV EAX,DWORD PTR DS:[EAX] <-- eax = _CONTEXT.Dr3
425C29 AND EAX, EAX
425C2B JZ 425C5F
– Sections protect value: the challenge uses ZwQueryVirtualMemory when it has to check _MEMORY_BASIC_INFORMATION.Protect values. It controls 3 sections: .text, .rdata and .data and the associated values (PAGE_EXECUTE_READWRITE (0x40), PAGE_READONLY (0x02) and PAGE_EXECUTE_WRITECOPY (0x80)).
ZwAllocateVirtualMemory and ZwFreeVirtualMemory are used to allocate/free memory.
I hope I haven’t forgotten anything… Anyway, among all these antidebug tricks there’s something you should have noticed because it occurs often among the VM instructions: I’m talking about the operations involving the dword values pointed to by 4065F3, 4065F7, 4065FB and 4065FF. For the moment I won’t say anything else, but look back at the code and try to understand something more about these values; you’ll find out that the antidebug tricks are somehow linked to them, e.g.:
425C27 MOV EAX,DWORD PTR DS:[EAX] <-- eax = _CONTEXT.Dr3
425C29 AND EAX, EAX <-- check over Dr3
425C2B JZ 425C5F
...
425C5F ADD DWORD PTR DS:[4065F3],EAX <-- eax is the Dr3 value
425C65 INC DWORD PTR DS:[4065F3]
425C6B ADD DWORD PTR DS:[4065F7],EAX
425C71 INC DWORD PTR DS:[4065F7]
425C77 ADD DWORD PTR DS:[4065FB],EAX
425C7D INC DWORD PTR DS:[4065FB]
425C83 ADD DWORD PTR DS:[4065FF],EAX
425C89 INC DWORD PTR DS:[4065FF]
More about these obscure dwords later!
Final algorithm
There’s only one more thing to do, solve the challenge.
To be registered the crackme needs a valid keyfile, to check the keyfile it uses some Zw* specific functions. The names of these functions are not visible inside the exe file, but they are obtained at runtime by the same checksum algorithm used before inside the antidebug procedure. This time the functions with the respective checksums are:
0x946CE828: ZwUnmapViewOfSection
0x5F43B254: ZwMapViewOfSection
0xA7AFD948: ZwCreateSection
0x848955AC: ZwCreateFile
0x67F17733: ZwClose
Now the interesting parts (I removed useless instructions):
40CC73 PUSH 406320 <-- pointer to the name of the keyfile: "g"
40CC78 PUSH 00020002
40CC92 PUSH DWORD PTR DS:[EAX+2C] <-- 0000000C, CurrentDirectoryHandle
40CC95 PUSH 00000018
40CCA0 PUSH 01 <-- CreateDisposition: FILE_OPEN
40CCA2 PUSH 0
40CCA4 PUSH 00000080
40CCA9 PUSH 0
40CCAB LEA EAX,[ESP+6C]
40CCAF PUSH EAX
40CCB0 LEA EAX,[ESP+2C]
40CCB4 PUSH EAX <-- ObjectAttributes.ObjectName.Buffer = 406320
40CCB5 PUSH 80100080
40CCBA LEA EAX,[ESP+90]
40CCC1 PUSH EAX
...
40CCCF LEA EAX,[EBP-28] <-- [ebp-28] identifies ZwCreateFile
40CCD2 MOV EAX,DWORD PTR DS:[EAX] <-- eax = 00000025, ZwCreateFile
40CCD4 MOV EDX, ESP
40CCD6 SYSENTER <-- ZwCreateFile
The keyfile’s name is visible inside the crackme, but you won’t be able to get it by looking at the string list because it’s only 1 byte long. The file needs to be in the same directory as the crackme file.
To get the keyfile’s content the challenge calls both ZwCreateSection and ZwMapViewOfSection, quite unusual approach for a keyfile protection! Once it has the content of the file it performs the real and final algorithm:
40CD7C MOV EAX,DWORD PTR DS:[EBX] <-- eax points to the content of the keyfile
40CD7E ADD ESP,4C
40CD81 CMP EAX,0 <-- check to see if it's an empty keyfile or not
40CD84 JE 40D3BD <-- Jump if empty
The conditional jump leads to a new virtual machine used to display the error message. The text is encrypted, so you won’t find the message in the string list. Filling the keyfile with some bytes you’ll face the next check:
... ; Here I have in eax, ebx, ecx and edx the 1°, 2°, 3° and 4° dword of the keyfile
40CF8F CMP AL,DL <-- is the 1° byte of the serial 0x00?
40CF93 JZ 0040D3BD
40CF9B SHR EAX, 0x08 <-- take the 2° byte
40CFA2 CMP AL,DL <-- is the 2° byte of the serial 0x00?
40CFA4 JE 40D3BD
40CFAC SHR EAX, 0x08 <-- take the 3° byte
40CFB3 CMP AL,DL <-- is the 3° byte of the serial 0x00?
40CFB9 JE 40D3BD
40CFBF SHR EAX, 0x08 <-- take the 4° byte
40CFC4 CMP AL,DL <-- is the 4° byte of the serial 0x00?
40CFCC JE 40D3BD
This piece of code is repeated in a similar way for the values stored in ebx, ecx and edx (the rest of the keyfile’s content), and it’s just a check over every single byte. It follows that the keyfile must be 16 bytes long, with no zero bytes.
40D103 POP EDX <-- k4: 4° dword of the keyfile
40D108 POP ECX <-- k3: 3° dword of the keyfile
40D10D POP EBX <-- k2: 2° dword of the keyfile
40D110 POP EAX <-- k1: 1° dword of the keyfile
40D127 XOR EAX, EBX <-- k1 ^ k2
40D147 XOR EAX, ECX <-- (k1 ^ k2) ^ k3
40D175 XOR EAX, EDX <-- ((k1 ^ k2) ^ k3) ^ k4
40D198 PUSH 004065f3
40D19F POP EDI <-- edi = 4065F3
40D1A4 MOV EBP, ESP <-- change stack
40D1AA MOV ESP, EDI
40D1AE POP EBX <-- ebx = [4065F3]
40D1B1 MOV ESP, EBP <-- restore the stack
40D1D6 XOR EAX, EBX <-- (((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]
40D1F6 ADD EDI, 1 <-- edi = 4065F4
40D1F9 ADD EDI, 1 <-- edi = 4065F5
40D1FC ADD EDI, 1 <-- edi = 4065F6
40D1FF ADD EDI, 1 <-- edi = 4065F7
40D204 MOV EBP, ESP
40D208 MOV ESP, EDI <-- edi = 4065F7
40D20C POP ECX <-- ecx = [4065F7]
40D211 MOV ESP, EBP <-- restore the stack
40D21B MOV EBX, EDI <-- ebx = 4065F7
40D234 XOR EAX, ECX <-- ((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]
40D25D ADD EDI, 1 <-- edi = 4065F8
40D260 ADD EDI, 1 <-- edi = 4065F9
40D265 ADD EDI, 1 <-- edi = 4065FA
40D268 ADD EDI, 1 <-- edi = 4065FB
40D26D MOV EBP, ESP
40D273 MOV ESP, EDI <-- edi = 4065FB
40D277 POP EDX <-- edx = [4065FB]
40D27A MOV ESP, EBP <-- restore the stack
40D29C XOR EAX, EDX <-- (((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]) ^ [4065FB]
40D2BF INC EDI <-- edi = 4065FC
40D2C2 INC EDI <-- edi = 4065FD
40D2C5 INC EDI <-- edi = 4065FE
40D2CA INC EDI <-- edi = 4065FF
40D2CB MOV EBP, ESP
40D2CF MOV ESP, EDI
40D2D3 POP EBX <-- ebx = [4065FF]
40D2D6 MOV ESP, EBP
40D300 XOR EAX, EBX <-- ((((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]) ^ [4065FB]) ^ [4065FF]
40D31D SUB EAX,4E1A9001 <-- (((((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]) ^ [4065FB]) ^ [4065FF]) - 0x4E1A9001
40D347 PUSH EAX <-- push the obtained value
40D35F PUSH EAX <-- parameter: 0xA0
40D360 PUSH 406663 <-- parameter: 0x406663
40D365 CALL 401000 <-- A new VM: it calculates a value using the 3 parameters passed to the virtual machine
40D36A PUSH EAX
40D38A PUSH A640740E
40D3A7 POP EAX
40D3B0 POP EBX <-- value obtained from VM at 40D365
40D3B7 SUB EAX, EBX
40D3BB JE 40D3C4
40D3BF CALL 401A5E <-- Show error message box
40D3C4:
...
40D3EC POP EAX
40D411 JMP EAX <-- jmp and show congratulation box
That’s it, the final algorithm is all here. I’m not saying it’s easy to solve, but as you can see there are only some math operations. How can I reverse everything to obtain the right 16-byte keyfile?
As always I need to start from the end: which is the address of the congratulation box?
Look @40D3BB, at the conditional jump I have to jump down and this is possible if the value obtained from the VM is 0xA640740E. Compared to the other VM, this one has only few instructions:
414ADE: push ebp
414ADF: mov ebp, esp
414AE1: push esi
414AE2: push edi
414AE3: push ebx
414AE4: push ecx
414AE5: push edx
414AE6: mov ecx, 0
414AE8: mov esi, 0x00406767
414AED: mov eax, ecx
414AEF: mov edx, 0x00000008
414AF4: test eax, 1
414AF9: ja 414B04
414AFB: shr eax, 1
414AFD: xor eax, 0EDB88320h
414B02: jmp 414B06
414B04: shr eax, 1
414B06: sub edx, 1
414B07: jne 414AF4
414B09: mov [esi], eax
414B0B: add esi, 4
414B0E: add ecx, 1
414B0F: cmp ecx, 100h
414B15: jne 414AED
414B17: push dword ptr [ebp+0Ch]
414B1A: push dword ptr [ebp+8]
414B1D: mov esi, 0x00406767
414B22: mov edi, 0xFFFFFFFF
414B27: mov ecx, 0
414B29: mov eax, [esp]
414B2C: movzx eax, byte ptr [ecx+eax]
414B30: mov edx, edi
414B32: and edx, 0FFh
414B38: xor eax, edx
414B3A: mov ebx, [esi+eax*4]
414B3D: shr edi, 8
414B40: xor edi, ebx
414B42: add ecx, 1
414B43: cmp ecx, [esp+4]
414B47: jne 414B29
414B49: add esp, 8
414B4C: not edi
414B4E: mov eax, edi
414B50: pop edx
414B51: pop ecx
414B52: pop ebx
414B53: pop edi
414B54: pop esi
414B55: leave
414B56: retn 8
I don’t know if it’s reversible or not; I have to admit I didn’t try to fully understand it because it’s brute-forceable in a few seconds:
#include <stdio.h>
#include <stdbool.h>

// Lookup table dumped from 406767 and the 0xA0-byte buffer from 406663 (contents elided here)
unsigned char buffer_406767[1024] = { 0x00, 0x00, 0x00, 0x00, 0x96, 0x30, 0x07, 0x77, 0x2C, 0x61, ... };
unsigned char buffer_406663[0xA0] = { 0x3C, 0xFE, 0xFF, ... };
unsigned int val;
unsigned int v = 0x00401000; // The brute starts from this value
bool success = false;
while (!success) {
    val = 0xFFFFFFFF;
    __asm {
        mov eax, dword ptr [v];
        mov dword ptr [buffer_406663+0x50], eax;   // candidate value goes at offset 0x50
        mov edi, 0xFFFFFFFF;
        xor ecx, ecx;
        lea esi, buffer_406767;
    _iterate:
        lea eax, buffer_406663;
        movzx eax, byte ptr [ecx+eax];
        mov edx, edi;
        and edx, 0FFh;
        xor eax, edx;
        mov ebx, [esi+eax*4];
        shr edi, 8;
        xor edi, ebx;
        add ecx, 1;
        cmp ecx, 0xA0;
        jne _iterate;
        not edi;
        mov dword ptr [val], edi;
    }
    if (val == 0xA640740E) {
        printf("Val: %X\n", v);
        success = true;
    }
    else
        v++;
}
The code returns 0x40D44E which is the right value. To check its correctness you can patch the challenge at runtime and you’ll see the right message box.
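Looking back at the listing, the constant 0xEDB88320, the 0xFFFFFFFF seed, the final “not” and the first table bytes (00 00 00 00, 96 30 07 77, ...) all match the standard CRC-32. The Python sketch below rebuilds the table and the update loop and cross-checks them against zlib; it is a model of the routine, not a dump of it. Note that, as listed, the branch at 414AF9 reads the other way round, so the sketch uses the sense (xor when the low bit is set) that reproduces the table bytes quoted in the brute-forcer.

```python
import zlib

POLY = 0xEDB88320  # constant used at 414AFD


def build_table():
    """Table generation loop (414AE6..414B15), standard branch sense."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


def vm_checksum(data, table):
    """Update loop (414B22..414B4C) over the bytes of data."""
    edi = 0xFFFFFFFF
    for byte in data:
        edi = (edi >> 8) ^ table[(byte ^ edi) & 0xFF]
    return edi ^ 0xFFFFFFFF  # not edi


table = build_table()
sample = b"123456789"
print(hex(vm_checksum(sample, table)), hex(zlib.crc32(sample)))
```

If the routine really is a plain CRC-32, the four bytes at offset 0x50 could in principle be computed directly, because CRC is linear; the brute force above is simply the quickest thing to write.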
So, the final equation to solve is:
(((((((k1 ^ k2) ^ k3) ^ k4) ^ [4065F3]) ^ [4065F7]) ^ [4065FB]) ^ [4065FF]) - 0x4E1A9001 = 0x40D44E
Knowing all the fixed values it’s really easy to obtain k1, k2, k3 and k4 (fix three of them and calculate the other one…); the problem is that I can’t predict the values inside the four dwords: [4065F3], [4065F7], [4065FB] and [4065FF]. If you remember these values are updated a lot of times inside the antidebug VM.
How to get the correct values in a simple way? To solve this puzzle I used the good old “EB FE” byte sequence. These bytes send a program into an infinite loop: it’s a “jmp eip” instruction. If you patch the challenge in the right places you can sniff the right values directly from memory. The right bytes to patch into the VM byte sequence are “D6 FE”, because D6 is the VM opcode for the “JMP val8” instruction and 0xFE is the offset.
I patched the exe 4 times, once at each place where a “xor [4065Fx]” occurs. Doing so I got the following values: 0xC3EC8A62, 0x4292F007, 0xE9E6474E and 0x55CA2C39.
In the end there are tons of possible valid keyfiles; among them all I created the one with these 16 bytes:
11 11 11 11 22 22 22 22 33 33 33 33 5D 75 09 73
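The arithmetic behind this keyfile is short enough to script. The sketch below fixes k1, k2 and k3, solves the final equation for k4 and rejects results containing a zero byte (the check at 40CF8F). The function names are my own; the constants are the ones given above, and the dwords are stored little-endian as in the file.

```python
import struct

ANTIDEBUG_DWORDS = [0xC3EC8A62, 0x4292F007, 0xE9E6474E, 0x55CA2C39]
SUB_CONST = 0x4E1A9001   # subtracted at 40D31D
TARGET = 0x0040D44E      # value found by the brute force


def solve_k4(k1, k2, k3):
    """Undo the SUB, then xor away every other term of the equation."""
    acc = (TARGET + SUB_CONST) & 0xFFFFFFFF
    for value in ANTIDEBUG_DWORDS + [k1, k2, k3]:
        acc ^= value
    return acc


def build_keyfile(k1, k2, k3):
    """Return the 16 keyfile bytes, or raise if a zero byte slips in."""
    data = struct.pack("<4I", k1, k2, k3, solve_k4(k1, k2, k3))
    if 0 in data:
        raise ValueError("keyfile bytes must all be non-zero")
    return data


key = build_keyfile(0x11111111, 0x22222222, 0x33333333)
print(key.hex(" "))  # 11 11 11 11 22 22 22 22 33 33 33 33 5d 75 09 73
```

Any other choice of k1, k2 and k3 works the same way, as long as none of the 16 resulting bytes is zero.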
Final words
I hope to see some more challenges like this one in the near future.
Ciao!