April 2008


There are a lot of online storage services around the net, private or public. With this kind of service it’s pretty easy to save and share personal data. These services are widely used, especially the ones that let you share files. They usually offer a free service (often with a size limit, in MB) and a paid service (no limit). I’ve never uploaded a file, but I sometimes download files from Rapidshare, which I think is the most popular one.

Like every paid service, it’s a target for phishing and fraud. Just today I stumbled on a phishing site while trying to download an archive. Usually you click on a link and the initial Rapidshare page appears. Not this time.

The Rapidshare link was obscured using ProtectLinks. The address of the archive looks like “http://protect-links.com/_a_number”. The service simply assigns a number to a specific web page and displays that page’s content like this:

It’s an empty page with an iframe definition at the end. The iframe tag creates an inline frame that contains another document. You can set one or more attributes (frameborder, height, name, width and src), but I’m interested only in src, which defines the URL of the document shown inside the iframe. From what I have seen, that’s how ProtectLinks protects a web page.
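To see where a protected link really leads, you only need the src value of that iframe. The C function below is a minimal sketch of that step; the name extract_iframe_src is hypothetical, and it assumes a lowercase <iframe> tag with a double-quoted src attribute, so it is not a real HTML parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the value of the first <iframe> src attribute
 * found in `html` into `out` (at most out_len - 1 chars plus the NUL).
 * Returns 1 on success, 0 otherwise. Deliberately naive: it assumes a
 * lowercase tag, a lowercase attribute name and a double-quoted value. */
int extract_iframe_src(const char *html, char *out, size_t out_len)
{
    const char *tag = strstr(html, "<iframe");
    if (tag == NULL || out_len == 0)
        return 0;

    const char *tag_end = strchr(tag, '>');
    const char *src = strstr(tag, "src=\"");
    if (src == NULL || (tag_end != NULL && src > tag_end))
        return 0;                     /* no src inside this tag */

    src += strlen("src=\"");
    const char *quote = strchr(src, '"');
    if (quote == NULL)
        return 0;

    size_t n = (size_t)(quote - src);
    if (n >= out_len)
        n = out_len - 1;              /* truncate instead of overflowing */
    memcpy(out, src, n);
    out[n] = '\0';
    return 1;
}
```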

This is only one of many such services on the net. Frankly, I don’t know why people need to protect a page with this kind of service.

Anyway, how is a Rapidshare link protected? A classic Rapidshare link looks like:
http://rapidshare.com/files/_a_number_/_filename_
A protected link declared inside the src attribute looks like:
src="http://_server_name_path/?link=_original_url_"
_original_url_ is the parameter passed to the PHP page, and it holds the original Rapidshare link.
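Given such a src value, the two interesting pieces are the server that actually serves the page and the value of the link parameter. A minimal C sketch of the split follows; split_protected_src is a hypothetical name, and it assumes the literal “?link=” marker shown above and does no URL-decoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for a src of the form
 *   http://_server_name_path/?link=_original_url_
 * Copies the part before "?link=" into `server` and the parameter value
 * into `orig_url`. Returns 1 on success, 0 if the marker is missing or a
 * buffer is too small. The value is copied as-is (no URL-decoding). */
int split_protected_src(const char *src,
                        char *server, size_t server_len,
                        char *orig_url, size_t orig_len)
{
    const char *marker = strstr(src, "?link=");
    if (marker == NULL)
        return 0;

    size_t n_server = (size_t)(marker - src);
    const char *value = marker + strlen("?link=");
    size_t n_value = strcspn(value, "&");   /* stop at the next parameter */

    if (n_server >= server_len || n_value >= orig_len)
        return 0;

    memcpy(server, src, n_server);
    server[n_server] = '\0';
    memcpy(orig_url, value, n_value);
    orig_url[n_value] = '\0';
    return 1;
}
```

The server part is the one that matters for phishing: it is where the fake page, and any login form on it, actually lives.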

Trying to download the file, I got this page:

The image above shows an error message, which is generally displayed when you don’t have a premium cookie saved on your system. This is not the page I usually see when I want to download a file: normally the page contains two boxes that let you choose between the free and the premium service. If you hit the premium button without a premium cookie, you get this kind of error message.

The page is well crafted and its design mimics the original one, but it’s a fake. If you inspect some menu items, you’ll see that their URLs don’t share the same initial part: they point to two different servers.
Anyway, if you are a registered premium user and you see the error message, you’ll simply log in with your account… and that’s the problem: when you hit the login button you see nothing but a blank white page. The result is obvious: your data now belongs to someone else.

Can you see now why some people want to protect a link? When a link is protected you can’t see the original URL… so you don’t know where you are sending your login details. This is definitely an abuse of the link-protection service.

What can we do to protect ourselves from this kind of fraud?
There’s a security notice at rapidshare.com; part of the text reads: “Generally you should never enter your login information on any websites other than rapidshare.com. Your account information would most likely be stolen.” That’s a good hint to follow!
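Following that advice mechanically means checking the host of the page you are about to log in to, not just whether “rapidshare.com” appears somewhere in the URL (in a protected link it appears in the link parameter of a foreign server). The helper below is a hypothetical C sketch of such a check: it only extracts the host after “://” and accepts rapidshare.com or one of its subdomains. It is not a complete URL parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n bytes of a and b. */
static int same_ci(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Hypothetical check: returns 1 only if the host part of the URL is
 * rapidshare.com or a subdomain of it. The host is taken as everything
 * after "://" up to '/', ':', '?' or '#'; userinfo is not stripped, so
 * URLs like "http://rapidshare.com@evil.invalid/" are rejected. */
int is_rapidshare_url(const char *url)
{
    const char *host = strstr(url, "://");
    if (host == NULL)
        return 0;
    host += 3;

    size_t len = strcspn(host, "/:?#");
    const char *domain = "rapidshare.com";
    size_t dlen = strlen(domain);

    if (len == dlen)
        return same_ci(host, domain, dlen);
    if (len > dlen && host[len - dlen - 1] == '.')   /* subdomain */
        return same_ci(host + len - dlen, domain, dlen);
    return 0;
}
```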

Kraken is the word of the month for sure, but it has nothing to do with the beast from that fine old book by Jules Verne, Twenty Thousand Leagues Under the Sea.
The word refers to a family of malware, similar to the Storm trojan but much stronger. Kraken seems to have been around since August 2006, but until today I had never heard of it. A few days ago I read an article about it; the interesting part is this:
“One somewhat interesting feature of the code is that the binary is not packed, as many malware binaries tend to be. However, Royal said that the code does have some other forms of obfuscation that make it difficult to analyze completely.” I decided to take a look at it.

I’m not going to give a detailed explanation of the sample I’m working on (MD5 = 592523a88df3d043d61a14b11a79bd55), but I’ll say a few words about the “forms of obfuscation” used by the malware.

Detectors don’t recognize any specific packer/protector. The file is not packed, but from the first lines of code it’s pretty easy to see that some sort of obfuscation/encryption was included in the file. I didn’t find any interesting imports or strings, so I tried running the malware, logging every API it called to be sure I’d retrieve some useful information.
The malware calls some interesting functions. Almost all of the binary’s code is decrypted at runtime. The malware spawns one file and then deletes itself; you can look at the decrypted code, but I didn’t get anything useful from it. The best thing to do is to study the code, trying to identify a general obfuscation scheme or a decryption routine. Don’t even think about tracing the entire exe, it’s madness!

In cases like this one, if a light bulb goes on over your head you are lucky; otherwise you can step through the code looking at each instruction for eternity. I was lucky… the real code is hidden behind a virtual machine. I’m certainly not a virtual machine expert; I’ve only read some articles about this kind of protection.
I won’t rebuild the entire machine, I’ll only give out my findings. If you think they are wrong, or you want to add more information about the virtual machine, I’ll be happy to see a comment from you.

Like every virtual machine out there, after a short initialization it enters a semi-infinite loop that starts at 4012DA. The loop simply selects a virtual machine instruction and jumps to the code that runs it. There are a lot of instructions inside the loop; skipping the junk code, this is the snippet used to select (and then jump to) the next instruction to execute:

004012E4 MOV AL,BYTE PTR DS:[ESI-1] // Byte pointed by esi-1 decides everything
004012F3 ADD AL,BL
0040F807 DEC AL
004103D9 DEC ESI // Shift to the next byte
004103E7 ROL AL,2
004103F7 DEC AL
0040F590 XOR AL,0CF
0040F594 SUB AL,6B
004104A6 ADD BL,AL
004104AF MOVZX EAX,AL
004104B7 MOV ECX,DWORD PTR DS:[EAX*4+40FABB] // EAX = index of the selected instruction
004104C6 NOT ECX
0040129C ROR ECX,1C
00410213 SUB ECX,4DCBE90C
0041021F ROL ECX,7
00410229 INC ECX
0041070D BSWAP ECX
00401195 ADD ECX,5E1E81EF
0040119C XOR ECX,77B911BC
004011AE NOT ECX
0041071B ADD ECX,60334BE6 // ECX = address of the selected instruction
0040FFF3 MOV DWORD PTR SS:[ESP+48],ECX
0040FFFB PUSH DWORD PTR SS:[ESP+48]
0040FFFF RETN 4C // Go to the selected instruction

Everything starts from the value stored in the buffer pointed to by (esi-1). The buffer contains a series of bytes used to select the virtual machine instruction to execute (they are also used to retrieve one or more of the vm_instruction’s operands). The new value in EAX (obtained after a few minor operations) is used to retrieve a dword: EAX is the index into the vector that starts at 0x40FABB. As you can see from the code above, that dword is then transformed into the address of the vm_instruction to execute.
Unlike a classic virtual machine, this one doesn’t have a plain Instruction Table: looking at the dead listing in your favorite disassembler, you won’t see the address of every single vm_instruction. The Instruction Table is encrypted, and its first entry is located at 0x40FABB (there are 256 entries).
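To make the dispatcher easier to follow, here is a C transcription of the snippet above, assuming the junk code between those instructions has no effect on AL, BL, ESI and ECX. vm_decode_index turns the byte at (esi-1) into the table index, with BL acting as a rolling key because it is both read and updated; vm_decrypt_handler turns the dword read at 0x40FABB + 4*index into the handler address. Both names are hypothetical, and reading the table and moving ESI backwards are left to the caller, so without the bytes from the sample this only reproduces the arithmetic.

```c
#include <stdint.h>

/* Fixed-width rotations; n must be between 1 and width - 1. */
static uint8_t rol8(uint8_t v, unsigned n)
{
    return (uint8_t)((v << n) | (v >> (8 - n)));
}
static uint32_t rol32(uint32_t v, unsigned n) { return (v << n) | (v >> (32 - n)); }
static uint32_t ror32(uint32_t v, unsigned n) { return (v >> n) | (v << (32 - n)); }
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* 004012E4..004104AF: bytecode byte -> index into the table at 0x40FABB.
 * *bl is BL: it is mixed into AL and then updated with the result. */
uint8_t vm_decode_index(uint8_t code_byte, uint8_t *bl)
{
    uint8_t al = code_byte;      /* MOV AL,BYTE PTR DS:[ESI-1] */
    al = (uint8_t)(al + *bl);    /* ADD AL,BL   */
    al = (uint8_t)(al - 1);      /* DEC AL      */
    al = rol8(al, 2);            /* ROL AL,2    */
    al = (uint8_t)(al - 1);      /* DEC AL      */
    al ^= 0xCF;                  /* XOR AL,0CF  */
    al = (uint8_t)(al - 0x6B);   /* SUB AL,6B   */
    *bl = (uint8_t)(*bl + al);   /* ADD BL,AL   */
    return al;                   /* MOVZX EAX,AL */
}

/* 004104C6..0041071B: encrypted table entry -> vm_instruction address. */
uint32_t vm_decrypt_handler(uint32_t ecx)
{
    ecx = ~ecx;                  /* NOT ECX          */
    ecx = ror32(ecx, 0x1C);      /* ROR ECX,1C       */
    ecx -= 0x4DCBE90Cu;          /* SUB ECX,4DCBE90C */
    ecx = rol32(ecx, 7);         /* ROL ECX,7        */
    ecx += 1;                    /* INC ECX          */
    ecx = bswap32(ecx);          /* BSWAP ECX        */
    ecx += 0x5E1E81EFu;          /* ADD ECX,5E1E81EF */
    ecx ^= 0x77B911BCu;          /* XOR ECX,77B911BC */
    ecx = ~ecx;                  /* NOT ECX          */
    ecx += 0x60334BE6u;          /* ADD ECX,60334BE6 */
    return ecx;
}
```

Because BL carries state from one dispatch to the next, the same bytecode byte can select different vm_instructions depending on what was executed before it, which is one more reason why a static reading of the buffer at (esi-1) doesn’t get you very far.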
The virtual machine has 16 registers (r_0 to r_15), which can store byte, word or dword data. The EDI register points to the first one, and the registers are stored consecutively in memory from r_0 to r_15.
The virtual machine has a fixed-size stack, and the EBP register holds the vm_esp value. Almost every push vm_instruction is followed by a stack overflow check. The alignment is two bytes: “push byte_value” is not allowed, so to push a single byte the virtual machine extends it to a word.
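A hypothetical C model of that state may help keep the pieces straight: 16 dword slots for r_0..r_15 and a downward-growing stack whose top offset plays the role of vm_esp. The stack size is an assumption (the real one is not given here), as is zero-extension when a byte is widened to a word; the sketch also checks for overflow before writing rather than after.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VM_STACK_SIZE 256          /* assumption: the real size is not stated */

/* Hypothetical model: r[] mirrors the register block EDI points to,
 * sp mirrors vm_esp (EBP); the stack grows towards lower offsets. */
struct vm_state {
    uint32_t r[16];
    uint8_t  stack[VM_STACK_SIZE];
    size_t   sp;                   /* offset of the top of the stack */
};

void vm_init(struct vm_state *vm)
{
    memset(vm, 0, sizeof *vm);
    vm->sp = VM_STACK_SIZE;        /* empty stack */
}

/* Overflow check on every push; returns 0 (and writes nothing) on overflow. */
static int vm_push(struct vm_state *vm, const void *src, size_t n)
{
    if (vm->sp < n)
        return 0;
    vm->sp -= n;
    memcpy(&vm->stack[vm->sp], src, n);
    return 1;
}

int vm_push_word(struct vm_state *vm, uint16_t v)  { return vm_push(vm, &v, sizeof v); }
int vm_push_dword(struct vm_state *vm, uint32_t v) { return vm_push(vm, &v, sizeof v); }

/* No byte-sized push: the byte is widened to a word (zero-extension assumed),
 * so the stack stays 2-byte aligned. */
int vm_push_byte(struct vm_state *vm, uint8_t v)   { return vm_push_word(vm, (uint16_t)v); }

/* The caller must make sure at least one word is on the stack. */
uint16_t vm_pop_word(struct vm_state *vm)
{
    uint16_t v;
    memcpy(&v, &vm->stack[vm->sp], sizeof v);
    vm->sp += sizeof v;
    return v;
}
```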

Is there a cmp/test instruction in the snippet? Is there a reference to a vm_eip register? It seems this virtual machine doesn’t need them. vm_eip is replaced by (esi-1): it’s not an eip per se, but it *guides* the virtual machine. I don’t have all the vm_instructions in my notes, but I think there are no direct cmp/test instructions. It seems they are not included in the virtual machine, which is strange.

From what I have seen there are more than 45 vm_instructions in the virtual machine, and to identify each one you have to remove a lot of junk code. Even once you have all the vm_instructions, it’s not immediately clear what the malware is trying to do.
Example: here are the vm_instructions used to patch a dword at 0x41CE06 (the first column is the initial address of the vm_instruction, the second column is the name I gave to it):

401028: push_dword val // push F440C1CB
401028: push_dword val // push 8040414A
40F5BE: nor_stack // The value at vm_esp+4 is updated with a nor(vm_esp+4, vm_esp) operation
4105FA: pop_dword r_i // r_15 = 0x00000202
40F36F: push_dword r_i // r_0 = 0x0041CE05
401028: push_dword val // push 98754A9F
401028: push_dword val // push 43179031
40F198: push_dword vm_esp // push vm_esp
401396: mov_stack_pstack // mov dword ptr [vm_esp], dword ptr [dword ptr [vm_esp]]
40F25C: pop_word r_i // r_14 = 0x00009031
401028: push_dword val // push 678AB562
40F198: push_dword vm_esp // push vm_esp
40FEF3: push_bdword val // push 0x00000006: pushes a dword whose upper 24 bits are 0, i.e. a byte zero-extended to a dword
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp]
4105FA: pop_dword r_i // r_15 = 0x216
40F0A0: pp_mov_dword // mov dword ptr [pop t1], (pop t2)
40F25C: pop_word r_i // r_11 = 0x015E4317
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 98754A9F + 678AB562 = 1
4105FA: pop_dword r_i // r_14
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 41CE05 + 1 = 41CE06
4105FA: pop_dword r_i // r_15
410171: mov_stack_pstack // mov dword ptr [dword ptr [vm_esp]], dword ptr [vm_esp+4] <-- patch
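Stripped of the register shuffling, the address part of the trace boils down to two additions, one of which adds two random-looking constants that sum to 1 modulo 2^32. The C sketch below replays just that part on a tiny downward-growing dword stack, in a simplified order. It drops the extra dword the real add_stack leaves on the stack (the values popped into r_15 right after add_stack and nor_stack, such as 0x202 and 0x216, look like EFLAGS). As an aside it also shows how NOT, OR and AND can be built from NOR alone; that is a common design in VM-based protectors and a possible reason for the nor_stack vm_instruction, but it is an assumption, not something verified in this sample. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Tiny dword stack growing downwards; slot[top] is [vm_esp]. */
struct dstack {
    uint32_t slot[16];
    size_t   top;                 /* 16 means empty */
};

static void     ds_push(struct dstack *s, uint32_t v) { s->slot[--s->top] = v; }
static uint32_t ds_pop(struct dstack *s)              { return s->slot[s->top++]; }

/* add_stack without the flags: [vm_esp+4] += [vm_esp], then drop [vm_esp]. */
static void add_stack(struct dstack *s)
{
    uint32_t a = ds_pop(s);
    s->slot[s->top] += a;
}

/* Replays only the address arithmetic of the trace above. */
uint32_t replay_patch_address(void)
{
    struct dstack s = { .top = 16 };
    ds_push(&s, 0x0041CE05u);     /* push_dword r_0 */
    ds_push(&s, 0x98754A9Fu);     /* push_dword val */
    ds_push(&s, 0x678AB562u);     /* push_dword val */
    add_stack(&s);                /* 98754A9F + 678AB562 = 1 (mod 2^32) */
    add_stack(&s);                /* 41CE05 + 1 = 41CE06 */
    return ds_pop(&s);
}

/* NOR is functionally complete: the other bitwise operations follow. */
static uint32_t nor32(uint32_t a, uint32_t b) { return ~(a | b); }
uint32_t not_via_nor(uint32_t a)             { return nor32(a, a); }
uint32_t or_via_nor(uint32_t a, uint32_t b)  { return nor32(nor32(a, b), nor32(a, b)); }
uint32_t and_via_nor(uint32_t a, uint32_t b) { return nor32(nor32(a, a), nor32(b, b)); }
```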

Quite a simple patch operation, but the author certainly didn’t take the straight route. Believe it or not, this is the nature of the malware. Now you can see why I said that tracing the entire exe is madness.

I inspected a few more samples of the same Kraken family. Here are some similarities and differences:
- they are protected by a virtual machine too
- the routine used to select the next vm_instruction is not the same
- (I think) the vm_instructions are the same, but they are not implemented in the same way. I mean, the code implementing a push differs, but the result is the same: in both cases you have a push vm_instruction
- the (encrypted) Instruction Table is not the same: at index i you won’t find the same vm_instruction in malware_x and malware_y
- the vm protection exists for the spawned file too
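Because the same vm_instruction sits at different indexes in different samples, notes taken on one sample can only be reused by matching handlers by meaning, not by index. A minimal C sketch of that bookkeeping, assuming each table has already been decrypted and every handler identified by hand; the enum values are a hypothetical subset and build_index_map is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical subset of handler meanings (the real VM has 45+). */
enum vm_op { OP_UNKNOWN = 0, OP_PUSH_DWORD_VAL, OP_PUSH_DWORD_REG,
             OP_POP_DWORD_REG, OP_ADD_STACK, OP_NOR_STACK, OP_COUNT };

/* table_x[i] / table_y[i]: the meaning of index i in sample x / sample y.
 * Fills map[i] with the first index j such that table_y[j] == table_x[i],
 * or -1 if index i is unidentified or sample y has no such handler. */
void build_index_map(const enum vm_op table_x[256],
                     const enum vm_op table_y[256], int map[256])
{
    int first_y[OP_COUNT];
    for (int k = 0; k < OP_COUNT; k++)
        first_y[k] = -1;
    for (int j = 0; j < 256; j++)
        if (table_y[j] != OP_UNKNOWN && first_y[table_y[j]] < 0)
            first_y[table_y[j]] = j;

    for (int i = 0; i < 256; i++)
        map[i] = (table_x[i] == OP_UNKNOWN) ? -1 : first_y[table_x[i]];
}
```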

Now I fully understand Royal’s words in the article: it’s really hard to understand what’s going on…

Just yesterday a new version of Ollydbg was released, but I’m still using the old 1.10. It’s a really good debugger, and until a few days ago I had hit only a few errors in its disasm engine, nothing compared with IDA’s bug, by the way. Look here:

0047C720 6E OUTS DX,BYTE PTR ES:[EDI]
0047C721 6F OUTS DX,DWORD PTR ES:[EDI]

According to the Intel manual’s opcode map, 0x6E is defined as “OUTS/OUTSB DX, Xb”.
The first operand is the DX register, and the second one is an “Xb” operand:
X: memory addressed by DS:(E)SI…
b: byte, regardless of operand-size attribute
The error is obvious: Ollydbg shows ES:[EDI] instead of DS:[ESI]. The same happens with 6F (“OUTS/OUTSW/OUTSD DX, Xz”).

There’s something similar with the A6 opcode (“CMPS/B Xb, Yb”). Ollydbg v1.10 shows:
004012FA A6 CMPS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
but the right line is:
004012FA A6 CMPS BYTE PTR DS:[ESI],BYTE PTR ES:[EDI]

It’s an oversight in the X and Y addressing methods.
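The rule is easy to encode: X always means DS:(E)SI, Y always means ES:(E)DI, and the opcode map fixes the order of the operands. The C sketch below builds the expected text for a few one-byte string opcodes and compares it with what a disassembler shows; the function names are hypothetical, the output follows the Ollydbg-style syntax used above, and it assumes 32-bit operand and address size with no prefixes.

```c
#include <stddef.h>
#include <string.h>

/* Addressing methods from the Intel opcode map (32-bit forms only):
 *   X = memory addressed by DS:(E)SI,  Y = memory addressed by ES:(E)DI */
#define X_B "BYTE PTR DS:[ESI]"
#define X_D "DWORD PTR DS:[ESI]"
#define Y_B "BYTE PTR ES:[EDI]"
#define Y_D "DWORD PTR ES:[EDI]"

/* Expected text for a few string opcodes, or NULL if not covered. */
const char *string_insn_text(unsigned char opcode)
{
    switch (opcode) {
    case 0x6C: return "INS " Y_B ",DX";        /* INS Yb, DX  */
    case 0x6D: return "INS " Y_D ",DX";        /* INS Yz, DX  */
    case 0x6E: return "OUTS DX," X_B;          /* OUTS DX, Xb */
    case 0x6F: return "OUTS DX," X_D;          /* OUTS DX, Xz */
    case 0xA4: return "MOVS " Y_B "," X_B;     /* MOVS Yb, Xb */
    case 0xA5: return "MOVS " Y_D "," X_D;     /* MOVS Yv, Xv */
    case 0xA6: return "CMPS " X_B "," Y_B;     /* CMPS Xb, Yb */
    case 0xA7: return "CMPS " X_D "," Y_D;     /* CMPS Xv, Yv */
    default:   return NULL;
    }
}

/* 1 if a disassembler's text for `opcode` matches the expected form. */
int matches_intel_form(unsigned char opcode, const char *shown)
{
    const char *expected = string_insn_text(opcode);
    return expected != NULL && strcmp(expected, shown) == 0;
}
```

For example, the Ollydbg 1.10 line for 6E shown above, “OUTS DX,BYTE PTR ES:[EDI]”, does not match, while “OUTS DX,BYTE PTR DS:[ESI]” does.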
The errors occur only in v1.10; v2 shows the right instructions. I asked Olly (Oleh Yuschuk) about them and he kindly replied: “Unfortunately, I will not correct it in 1.10…This project is closed, and I don’t want to make any modifications.” OK, I’ll switch to v2.

A few days ago I was inspecting a malware sample with my disassembler, and I stumbled on this piece of code:

[Screenshot: the C6 instruction in diZaZZembler]

I use the “!?!?!” string for undefined/reserved opcodes. I’ve had some problems testing reserved opcodes in the past, so I decided to check this case carefully. The first check was a comparison with another tool: I loaded the malware into IDA. Look here:

[Screenshot: the same C6 instruction in IDA]

The first thing I thought was: “Damn, there’s a bug in my disasm engine”.

I took a look at my printed copy of the “Intel® IA-32 Architectures Software Developer’s Manual – Volume 2B: Instruction Set Reference, N-Z”. According to the one-byte opcode map, the C6 opcode is defined as “Grp 11 (1A) – MOV”.

What does it mean?
The opcode alone can’t tell me the exact meaning of the instruction. I need some extra information, which is given by the opcode extension in the ModR/M byte (0x22 in the example). To retrieve it I have to check another table: “Opcode Extensions for One- and Two-byte Opcodes by Group Number”. I’m interested in the row labeled Group 11:

[Table excerpt: the Group 11 row of the opcode extension table]

This is only a part of the table: it shows the header and the row of the group I’m focused on.

The ModR/M byte is divided into 3 fields: mod, nnn and r/m.
0x22 = 00100010b
mod = 00 (bits 7, 6)
nnn = 100 (bits 5, 4, 3)
r/m = 010 (bits 2, 1, 0)
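Splitting the byte is just shifting and masking. A minimal C sketch, with a hypothetical struct and function name:

```c
#include <stdint.h>

/* The three ModR/M fields. */
struct modrm {
    uint8_t mod;   /* bits 7-6 */
    uint8_t nnn;   /* bits 5-3: register or opcode extension */
    uint8_t rm;    /* bits 2-0 */
};

struct modrm split_modrm(uint8_t b)
{
    struct modrm m;
    m.mod = (uint8_t)(b >> 6);
    m.nnn = (uint8_t)((b >> 3) & 7);
    m.rm  = (uint8_t)(b & 7);
    return m;
}
```

For 0x22 this gives mod = 0, nnn = 4 (100b) and rm = 2 (010b), matching the breakdown above.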

These fields help you locate the right instruction definition in the opcode extension table. To make a long story short, the nnn value identifies the cell to pick. In this case 100b points to a blank cell: what does that mean?
According to the Intel manual: “All blanks in all opcode maps are reserved and must not be used. Do not depend on the operation of undefined or reserved opcodes.”
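The lookup described above is small enough to write down. A C sketch follows; group11_c6 is a hypothetical name, and the table contents follow the Group 11 row as discussed in this post (for C6 only nnn = 000, “MOV Eb, Ib”, is defined). Intel manuals published after this post also define C6 F8 ib as XABORT; the sketch ignores that.

```c
#include <stddef.h>
#include <stdint.h>

/* Group 11 lookup for opcode C6: the nnn field (bits 5-3) of the ModR/M
 * byte picks the cell. Returns the operand form, or NULL for a blank
 * (reserved) cell. */
const char *group11_c6(uint8_t modrm)
{
    static const char *const cells[8] = {
        "MOV Eb, Ib", NULL, NULL, NULL, NULL, NULL, NULL, NULL
    };
    return cells[(modrm >> 3) & 7];
}
```

For the bytes in the malware, 0x22 has nnn = 100, so the lookup returns NULL. C6 02 FB, which differs only in the nnn bits, would be the valid MOV BYTE PTR [EDX], 0FBh.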

Is it really an invalid instruction? All my initial investigations were done using the printed version of the Intel manual, and since I had found some errors in it, I decided to check the most recent online version.
This new check didn’t change anything: it seems IDA is able to disassemble an invalid instruction. Weird.

Now the question is: is this a bug, or do they (IDA’s developers) know how to handle undocumented opcodes? To answer this question I had two options:
1. try loading the malware into other disassemblers
2. try stepping through the instruction with a debugger

Option number 1
Windbg’s output:

[Screenshot: the C6 instruction in Windbg]

Ollydbg’s output:

[Screenshot: the C6 instruction in Ollydbg]

The result is the same: it’s an invalid instruction.

Option number 2
This is the last check I did. I wrote a new exe file containing an instruction with the C6 opcode. The program is really simple; here is its listing:

.text:00401000 BA B2 10 40 00 mov edx, offset word_4010B2
.text:00401005 C6 22 FB mov byte ptr [edx], 0FBh
.text:00401008 6A 00 push 0
.text:0040100A 68 1D 30 40 00 push offset Caption
.text:0040100F 68 55 30 40 00 push offset Text
.text:00401014 6A 00 push 0
.text:00401016 E8 91 00 00 00 call MessageBoxA
.text:0040101B C3 retn

According to IDA it should move a byte to address 0x4010B2 (which is writable) and then show a simple message box, nothing more. Unfortunately, that’s not what happens.

If you run the file without a debugger, it crashes and the classic error box appears. Looking inside the error box, I see that the error occurs at offset 0x1005: the C6 opcode!
If you run the file under Ollydbg you get almost the same result: the debugger stops, signalling the error “Illegal instruction” at 0x401005. Again, the C6 opcode!
If you run the file with IDA’s debugger you get a simple warning: “An attempt was made to execute an illegal instruction (0x401005)”. After that you get a sequence of error boxes; it seems IDA’s debugger can’t fully handle the execution of an illegal instruction… but that’s another story.

I did some more tests, and the problem seems to occur with all the *blank cells*: I tried all the possible C6 combinations and some other opcodes too. The result is always the same: IDA shows a disassembled instruction that is totally wrong!!!
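To put “all the possible C6 combinations” in numbers: the ModR/M byte has 256 values, and under the Group 11 row only the 32 with nnn = 000 are defined (4 mod values × 8 r/m values), so 224 of them fall in blank cells. The C sketch below just enumerates them, as a possible starting point for generating test bytes; reserved_c6_modrms is a hypothetical helper, and a real test file would also need the displacement and immediate bytes each mod/r/m combination implies.

```c
#include <stddef.h>
#include <stdint.h>

/* Collects every ModR/M byte that, after opcode C6, selects a blank
 * Group 11 cell (nnn != 000). Returns how many were written to out. */
size_t reserved_c6_modrms(uint8_t out[256])
{
    size_t n = 0;
    for (unsigned b = 0; b < 256; b++)
        if (((b >> 3) & 7u) != 0)          /* nnn field is not 000 */
            out[n++] = (uint8_t)b;
    return n;
}
```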

I read IDA’s help file, but there’s no mention of the problem, and I don’t think there’s a hidden option to set. I tried googling, without luck. Because of this I’m not 100% sure, but… I think it’s a bug!