Kraken is the word of the month for sure, but it has nothing to do with the beast from Twenty Thousand Leagues Under the Sea, the nice old book written by Jules Verne.
The word refers to a family of malware, something like the Storm trojan but much stronger. Kraken seems to have been around since August 2006, but until now I had never heard of it. A few days ago I read an article about it; the interesting part is here:
“One somewhat interesting feature of the code is that the binary is not packed, as many malware binaries tend to be. However, Royal said that the code does have some other forms of obfuscation that make it difficult to analyze completely.” I decided to look at it.
I’m not going to give a detailed explanation of the sample I’m working on (MD5 = 592523a88df3d043d61a14b11a79bd55), but I’ll say a few words about the “forms of obfuscation” used by the malware.
Packer detectors do not recognize any specific packer/protector. The file is not packed, but from the first lines of code it’s pretty easy to see that some sort of obfuscation/encryption is built into the file. I did not find any interesting imports or strings, so I tried running the malware. To be sure of retrieving some useful information, I logged all the API calls made by the malware.
The malware calls some nice functions. Almost all the code of the binary is decrypted at runtime. The malware spawns one file and then deletes itself. You can inspect the decrypted code, but I didn’t get anything useful from it. The best thing to do is to look at the code and try to identify a general obfuscation scheme or a decryption routine. Don’t think to trace the entire exe, it’s madness!
In a case like this one, if you see a light go on over your head you are lucky; otherwise you can step through and look at each instruction for eternity. I was lucky… the real code is hidden behind a virtual machine. I’m no virtual machine expert for sure; I have only read some articles about this kind of protection.
I won’t rebuild the entire machine; I’ll only share my findings. If you think they are wrong and/or you want to add some more information about the virtual machine, I’ll be happy to see a comment from you.
Like every virtual machine out there, after a little initialization it enters a semi-infinite loop that starts at 4012DA. The loop simply selects a virtual machine instruction and jumps to the code that implements it. There are a lot of instructions inside the loop; skipping the junk code, here is the snippet used to select (and then jump to) the next instruction to execute:
004012E4 MOV AL,BYTE PTR DS:[ESI-1] // Byte pointed by esi-1 decides everything
004012F3 ADD AL,BL
0040F807 DEC AL
004103D9 DEC ESI // Shift to the next byte
004103E7 ROL AL,2
004103F7 DEC AL
0040F590 XOR AL,0CF
0040F594 SUB AL,6B
004104A6 ADD BL,AL
004104AF MOVZX EAX,AL
004104B7 MOV ECX,DWORD PTR DS:[EAX*4+40FABB] // EAX = index of the selected instruction
004104C6 NOT ECX
0040129C ROR ECX,1C
00410213 SUB ECX,4DCBE90C
0041021F ROL ECX,7
00410229 INC ECX
0041070D BSWAP ECX
00401195 ADD ECX,5E1E81EF
0040119C XOR ECX,77B911BC
004011AE NOT ECX
0041071B ADD ECX,60334BE6 // ECX = address of the selected instruction
0040FFF3 MOV DWORD PTR SS:[ESP+48],ECX
0040FFFB PUSH DWORD PTR SS:[ESP+48]
0040FFFF RETN 4C // Go to the selected instruction
Everything starts from the value stored in the buffer pointed to by (esi-1). The buffer contains a series of bytes that are used to select the virtual machine instruction to execute (they are also used to retrieve the operands of a vm_instruction). The new value stored in EAX, obtained after some minor operations, is an index into the table of dwords that starts at 0x40FABB. As you can see from the code above, the dword read from the table is then transformed to obtain the address of the vm_instruction to execute.
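The index computation can be replayed outside the debugger. Here is a minimal Python sketch of the AL/BL part of the snippet, assuming no junk instruction I skipped touches AL or BL in between. The function names are mine, and the initial value of BL (set during the initialization) is not shown in the snippet, so the caller has to supply it.

```python
def rol8(value, count):
    """Rotate an 8-bit value left by count bits."""
    count %= 8
    value &= 0xFF
    return ((value << count) | (value >> (8 - count))) & 0xFF


def decode_index(opcode_byte, bl):
    """Turn the byte read from [ESI-1] into an Instruction Table index.

    Returns (index, new_bl): BL acts as a rolling key, because the
    decoded index is added back into it (ADD BL,AL).
    """
    al = (opcode_byte + bl) & 0xFF   # ADD AL,BL
    al = (al - 1) & 0xFF             # DEC AL
    al = rol8(al, 2)                 # ROL AL,2
    al = (al - 1) & 0xFF             # DEC AL
    al ^= 0xCF                       # XOR AL,0CF
    al = (al - 0x6B) & 0xFF          # SUB AL,6B
    bl = (bl + al) & 0xFF            # ADD BL,AL
    return al, bl


if __name__ == "__main__":
    # Made-up input values, only to show how the function is called.
    index, bl = decode_index(0x10, 0x05)
    print(hex(index), hex(bl))
```

Since BL changes after every fetch, the same bytecode byte can select a different vm_instruction depending on what was executed before it.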
Unlike a classical virtual machine, this one doesn’t have a clear Instruction Table: looking at the dead listing in your favorite disassembler, you won’t see the address of every single vm_instruction. The Instruction Table is encrypted, and the first entry is located at 0x40FABB (there are 256 entries).
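To recover the handler addresses without stepping through the loop, the ECX part of the snippet can be applied to each of the 256 dwords. The following Python sketch mirrors those instructions one by one, again assuming the junk code I skipped does not alter ECX. decrypt_entry and encrypt_entry are names I made up; encrypt_entry is just the inverse, written to check that the transformation round-trips, and it is not taken from the malware.

```python
import struct

MASK = 0xFFFFFFFF


def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    value &= MASK
    return ((value << count) | (value >> (32 - count))) & MASK


def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    return rol32(value, 32 - (count % 32))


def bswap32(value):
    """Reverse the byte order of a 32-bit value (BSWAP)."""
    return struct.unpack("<I", struct.pack(">I", value & MASK))[0]


def decrypt_entry(enc):
    """Apply the dispatcher's transformation to one table dword."""
    ecx = ~enc & MASK                    # NOT ECX
    ecx = ror32(ecx, 0x1C)               # ROR ECX,1C
    ecx = (ecx - 0x4DCBE90C) & MASK      # SUB ECX,4DCBE90C
    ecx = rol32(ecx, 7)                  # ROL ECX,7
    ecx = (ecx + 1) & MASK               # INC ECX
    ecx = bswap32(ecx)                   # BSWAP ECX
    ecx = (ecx + 0x5E1E81EF) & MASK      # ADD ECX,5E1E81EF
    ecx ^= 0x77B911BC                    # XOR ECX,77B911BC
    ecx = ~ecx & MASK                    # NOT ECX
    ecx = (ecx + 0x60334BE6) & MASK      # ADD ECX,60334BE6
    return ecx


def encrypt_entry(address):
    """Inverse of decrypt_entry (hypothetical helper, for checking only)."""
    ecx = (address - 0x60334BE6) & MASK
    ecx = ~ecx & MASK
    ecx ^= 0x77B911BC
    ecx = (ecx - 0x5E1E81EF) & MASK
    ecx = bswap32(ecx)
    ecx = (ecx - 1) & MASK
    ecx = ror32(ecx, 7)
    ecx = (ecx + 0x4DCBE90C) & MASK
    ecx = rol32(ecx, 0x1C)
    return ~ecx & MASK


if __name__ == "__main__":
    # Round-trip check on a handler address that appears in this post.
    assert decrypt_entry(encrypt_entry(0x401028)) == 0x401028
```

Running decrypt_entry over the 256 dwords read from 0x40FABB should give the list of handler addresses, which is the table a disassembler cannot show you directly.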
The virtual machine has 16 registers (from r_0 to r_15), which can be used to store byte, word or dword data. The EDI register points to the first one; the registers are stored consecutively in memory, from r_0 to r_15.
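If you want to follow a trace on paper, a tiny model of the register file helps. This is only a sketch under my own assumptions: each register slot is 4 bytes wide, values are little-endian, and a byte or word write leaves the upper bytes of the slot untouched. The class and method names are made up.

```python
import struct

_FORMATS = {1: "<B", 2: "<H", 4: "<I"}   # byte, word, dword


class VMRegisters:
    COUNT = 16   # r_0 .. r_15
    SIZE = 4     # assumption: one dword slot per register

    def __init__(self):
        # Contiguous block, like the memory EDI points to.
        self.mem = bytearray(self.COUNT * self.SIZE)

    def _offset(self, index):
        if not 0 <= index < self.COUNT:
            raise IndexError("no such vm register: r_%d" % index)
        return index * self.SIZE

    def write(self, index, value, width):
        """Store value into r_index as a byte (1), word (2) or dword (4)."""
        mask = (1 << (8 * width)) - 1
        struct.pack_into(_FORMATS[width], self.mem,
                         self._offset(index), value & mask)

    def read(self, index, width):
        """Load a byte, word or dword from r_index."""
        return struct.unpack_from(_FORMATS[width], self.mem,
                                  self._offset(index))[0]


if __name__ == "__main__":
    regs = VMRegisters()
    regs.write(15, 0x00000202, 4)
    print(hex(regs.read(15, 4)))
```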
The virtual machine has a fixed-size stack, and the EBP register holds the vm_esp value. After almost every push vm_instruction there’s a stack overflow check. The alignment is two bytes: “push byte_value” is not allowed, so to push a single byte the virtual machine extends it to a word value.
Is there a cmp/test instruction inside the snippet? Is there a reference to a vm_eip register? It seems this virtual machine doesn’t need them. vm_eip is replaced by (esi-1); it’s not an eip per se, but it *guides* the virtual machine. I don’t have all the vm_instructions in my notes, but I think there are no direct cmp/test instructions. They don’t seem to be included in the virtual machine, which is strange.
From what I have seen, the virtual machine includes more than 45 vm_instructions, and to identify each one you have to remove a lot of junk code. Even once you have all the vm_instructions, it’s not immediately obvious what the malware is trying to do.
Example: here are the vm_instructions used to patch a dword at 0x41CE06 (the first column is the start address of the vm_instruction, the second column is the name I gave to it):
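The stack rules above can be modelled in a few lines. This is a minimal sketch and not the malware's code: I assume the stack grows downward like the native one, that a byte is zero-extended to a word (the extension could just as well be signed), and the size and the names are arbitrary.

```python
class VMStackOverflow(Exception):
    """Raised when a push would go past the fixed-size stack."""


class VMStack:
    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.sp = size              # vm_esp; assumption: grows downward

    def _push_bytes(self, data):
        # The overflow check that follows almost every push.
        if self.sp - len(data) < 0:
            raise VMStackOverflow("vm stack overflow")
        self.sp -= len(data)
        self.mem[self.sp:self.sp + len(data)] = data

    def _pop_bytes(self, count):
        if self.sp + count > len(self.mem):
            raise IndexError("vm stack underflow")
        data = bytes(self.mem[self.sp:self.sp + count])
        self.sp += count
        return data

    def push_word(self, value):
        self._push_bytes((value & 0xFFFF).to_bytes(2, "little"))

    def push_dword(self, value):
        self._push_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"))

    def push_byte(self, value):
        # No real byte push: the byte is widened to keep 2-byte alignment.
        self.push_word(value & 0xFF)

    def pop_word(self):
        return int.from_bytes(self._pop_bytes(2), "little")

    def pop_dword(self):
        return int.from_bytes(self._pop_bytes(4), "little")


if __name__ == "__main__":
    stack = VMStack()
    stack.push_byte(0xAB)           # takes two bytes on the stack
    print(stack.sp, hex(stack.pop_word()))
```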
401028: push_dword val // push F440C1CB
401028: push_dword val // push 8040414A
40F5BE: nor_stack // The value at vm_esp+4 is updated with a nor(vm_esp+4, vm_esp) operation
4105FA: pop_dword r_i // r_15 = 0x00000202
40F36F: push_dword r_i // r_0 = 0x0041CE05
401028: push_dword val // push 98754A9F
401028: push_dword val // push 43179031
40F198: push_dword vm_esp // push vm_esp
401396: mov_stack_pstack // mov dword ptr [vm_esp], dword ptr [dword ptr [vm_esp]]
40F25C: pop_word r_i // r_14 = 0x00009031
401028: push_dword val // push 678AB562
40F198: push_dword vm_esp // push vm_esp
40FEF3: push_bdword val // push 0x00000006, push a dword but the last 24 bits are 0, so it's like a push byte extended to dword
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp]
4105FA: pop_dword r_i // r_15 = 0x216
40F0A0: pp_mov_dword // mov dword ptr [pop t1], (pop t2)
40F25C: pop_word r_i // r_11 = 0x015E4317
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 98754A9F + 678AB562 = 1
4105FA: pop_dword r_i // r_14
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 41CE05 + 1 = 41CE06
4105FA: pop_dword r_i // r_15
410171: mov_stack_pstack // mov dword ptr [dword ptr [vm_esp]], dword ptr [vm_esp+4] <-- patch
Quite a simple patch operation, but the author certainly didn’t take the direct route. Believe it or not, this is the nature of the malware. Now you can understand the phrase: “Don’t think to trace the entire exe, it’s madness!”
I tried inspecting some more samples from the same Kraken family. Some things are similar, others differ:
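The arithmetic hidden in the trace is easy to check by hand. This small Python sketch recomputes the three values; the helper names are mine. Reading the trace, the NOR result appears to be the dword that ends up written at 0x41CE06, but take that as my interpretation and not as a certainty.

```python
MASK = 0xFFFFFFFF


def nor32(a, b):
    """32-bit NOR, as done by nor_stack."""
    return ~(a | b) & MASK


def add32(a, b):
    """32-bit addition with wrap-around, as done by add_stack."""
    return (a + b) & MASK


nor_result = nor32(0xF440C1CB, 0x8040414A)
increment = add32(0x98754A9F, 0x678AB562)   # the two constants cancel out
target = add32(0x0041CE05, increment)       # r_0 + 1

print(hex(nor_result))   # 0xbbf3e34
print(hex(increment))    # 0x1
print(hex(target))       # 0x41ce06
```

So more than twenty vm_instructions boil down to one constant and one address: two pushed values that combine into a dword, and two more whose only purpose is to add 1 to 0x41CE05.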
– they are protected by a virtual machine too
– the routine used to select the next vm_instruction is not the same
– (I think) the vm_instructions are equivalent, but they are not implemented in the same way. I mean, the code used to implement a push is not the same, but the result is: in fact, in both cases you have a push vm_instruction
– the (encrypted) Instruction Table is not the same. At index i you won’t have the same vm_instruction for malware_x and malware_y
– the vm protection exists for the spawned file too
Now I fully understand the words quoted in the article: it’s complex to understand what’s going on…