Last summer I wrote two blog posts about the Kins malware: the first with a detailed explanation of Kins’s virtual machine (available here), and the second with an introduction to the similarities between Kins and Zeus in their initialization process (available here). I had in mind to write a final chapter on the subject, but I gave up because there were already plenty of discussions about it around the web. I stopped everything, but I started to think about how one could check whether a snippet from one executable is used by another program.
So, today I’m going to introduce SnippetDetector, a program able to recognize known snippets inside an executable. I know there are some diffing tools available on the net. I haven’t tested all of them, but I think mine is slightly different from the others (if I’m wrong, don’t hesitate to tell me).
The idea comes from a meditation on the term “Copy&Paste”: how many times have you copied and pasted a piece of code? I bet you have done it at least once in your life. Think about it: the net is full of source code, and if you don’t want to waste time studying something new you will copy the source and paste it into your program. Believe it or not, that’s the spirit of most developers; it’s old news. Moreover, the availability of leaked source code will increase the use of Copy&Paste in newborn malware. What will the result be? You will probably reverse the same function several times. That’s why I think my tool differs from the others: it uses a database storing the snippets you have already analyzed!
SnippetDetector is basically a program that recognizes an already analyzed snippet by simply querying a database. The result of the query is one of the following three possibilities:
– syntactic match: there’s a perfect copy of the snippet inside the database
– semantic match: the snippet in the database has the same semantics as the checked snippet
– no match: there’s no trace of the snippet inside the database
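The three outcomes can be illustrated with a small Python sketch. This is not SnippetDetector’s actual code: the representation of a snippet as (mnemonic, operands) tuples, the idea of masking immediate operands to build a “semantic” key, and the names SnippetDatabase and demo_alloc are all assumptions made only for the example.

```python
import hashlib
import re

# Assumption: an operand counts as an immediate if it looks like 0x8, 8 or 8h.
IMMEDIATE = re.compile(r"(0x[0-9a-fA-F]+|[0-9][0-9a-fA-F]*h|[0-9]+)")


def _digest(instructions):
    """Hash a list of (mnemonic, operands) tuples."""
    text = "\n".join(f"{m} {','.join(ops)}" for m, ops in instructions)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def syntactic_key(instructions):
    # Exact copy: every operand takes part in the hash.
    return _digest(instructions)


def semantic_key(instructions):
    # Looser key: immediates are replaced by a placeholder before hashing.
    masked = [
        (m, ["IMM" if IMMEDIATE.fullmatch(op) else op for op in ops])
        for m, ops in instructions
    ]
    return _digest(masked)


class SnippetDatabase:
    """Hypothetical in-memory stand-in for the snippet database."""

    def __init__(self):
        self._exact = {}
        self._loose = {}

    def save(self, name, instructions):
        self._exact[syntactic_key(instructions)] = name
        # Keep the first name saved for a given semantic key.
        self._loose.setdefault(semantic_key(instructions), name)

    def query(self, instructions):
        name = self._exact.get(syntactic_key(instructions))
        if name is not None:
            return ("Syntactic match", name)
        name = self._loose.get(semantic_key(instructions))
        if name is not None:
            return ("Semantic match", name)
        return ("No match", None)


db = SnippetDatabase()
saved = [("push", ["8"]), ("push", ["eax"]), ("call", ["HeapAlloc"])]
variant = [("push", ["0"]), ("push", ["eax"]), ("call", ["HeapAlloc"])]
db.save("demo_alloc", saved)

print(db.query(saved))            # exact copy of the saved snippet
print(db.query(variant))          # differs only in an immediate value
print(db.query([("ret", [])]))    # never saved
```

The exact lookup is tried first, so a perfect copy is never reported as a mere semantic match.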
The project consists of an executable and some IDC scripts, because interaction with IDA is necessary.
Here are some practical examples to make everything clear. To show you how it works I’ll use two pieces of malware, Zeus and Kins.
Is Kins using Zeus’s code?
Suppose you spent a lot of time reversing a piece of malware, saving all the reversed snippets in the SnippetDetector database. What happens when you need to reverse a derived malware?
This is the perfect scenario for Zeus and Kins. My detection database contains a lot of saved functions from Zeus, and I want to check whether SnippetDetector is able to reveal something inside the Kins executable.
It’s possible to send a sequence of snippets to SnippetDetector; the program will check whether each of them is in the database. In this case the snippet list contains all the Kins functions. The list describes every single snippet using a specific structure; you can build the list by hand or with an IDA IDC script.
Here’s the screenshot of the result:
The operation produces a six-column view:
1. Result: the answer from the database: “Syntactic match”, “Semantic match” or “No match”
2. The name of the external function to check (as you can see, it is the name IDA’s disassembler gives to an unknown function, loaded with an IDC script)
3. Snippet start offset inside the file to check
4. Snippet end offset inside the file to check
5. Name of the snippet inside the database (if and only if there’s a match)
6. Brief description of what the snippet does
I’m lucky because SnippetDetector recognizes a lot of functions! This is only a small part of the output; there are many more positive matches.
A double click on one of the entries reveals the code stored in the database:
SnippetDetector contains its own disassembly engine. It’s really simple, but it does its job, and it’s necessary because the detection of a snippet relies on its byte sequence.
When I save a new snippet in the database I can customize the comment of every single instruction of the snippet; moreover, I can load the comments directly from the output produced by IDA. “Zeus_Mem::free” was added by hand, while the others (“lpMem” and “dwBytes”) come from IDA.
And now? Well, I can export the details of one or more snippets to IDA. After that, going back to IDA, you’ll see “Zeus_Mem_reallocEx” instead of “sub_41911B”, and the code is also full of my comments.
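How the export is actually implemented is not described here. One plausible way to push names and comments back into IDA is to generate a small IDC script and run it from IDA; the Python sketch below does that under some assumptions: the function build_idc_script and the dictionary layout are hypothetical, the comment text and its address are invented for the example, and MakeName / MakeComm are the classic IDC function names (newer IDA releases renamed them to set_name and set_cmt).

```python
def _idc_string(text):
    """Quote a Python string as an IDC string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_idc_script(snippets):
    """Build IDC source that renames functions and sets their comments.

    Each snippet is a dict with a 'start' address, a 'name' and an optional
    'comments' mapping of address -> comment (hypothetical layout).
    """
    lines = ["#include <idc.idc>", "", "static main() {"]
    for snippet in snippets:
        name = _idc_string(snippet["name"])
        lines.append(f"    MakeName({snippet['start']:#x}, {name});")
        for ea, comment in sorted(snippet.get("comments", {}).items()):
            lines.append(f"    MakeComm({ea:#x}, {_idc_string(comment)});")
    lines.append("}")
    return "\n".join(lines) + "\n"


script = build_idc_script([
    {
        "start": 0x41911B,                # sub_41911B in the IDA listing
        "name": "Zeus_Mem_reallocEx",
        # Illustrative comment only, not taken from the real database.
        "comments": {0x41911B: "reallocates a heap block"},
    }
])
print(script)
```

The generated text can be saved to a file and run from IDA as a script; it only touches the addresses listed in the exported snippets.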
False positive or not?
The engine behind SnippetDetector is not perfect, and it surely has some shortcomings in the way it checks for a semantic match. The problem arises from the fact that a single snippet can be used for various things, and it’s nearly impossible to mark it with a unique name/description. Take a look at this sample:
SnippetDetector detects the same semantic snippet in two different places, 419160 and 419178. This is not an error, but it requires a deeper check:
The disassembly on the left comes from the database, while the other one is one of the two snippets to check. The area outlined in red reveals the mystery: “alloc” and “quickAlloc” are defined in the same way except for the dwFlags parameter. Semantically speaking, the two functions do the same thing: they allocate memory.
That’s why SnippetDetector can mark more than one function with the same name.
So, is it a false positive? In this case the tool can’t tell exactly what the snippet does, but it can help you in the analysis by reducing the time you spend on the code.
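The ambiguity can be reproduced with a toy Python index. Again, this is only a sketch under assumptions: snippets are given as (mnemonic, operands) tuples, “semantic” equality is approximated by ignoring immediate values, and the instruction sequences and flag values below are illustrative rather than copied from Zeus.

```python
from collections import defaultdict


def is_immediate(operand):
    """True for operands written as plain numbers such as '8' or '0x8'."""
    try:
        int(operand, 0)
    except ValueError:
        return False
    return True


def semantic_key(instructions):
    """Hashable form of a snippet with every immediate replaced by 'IMM'."""
    return tuple(
        (mnemonic, tuple("IMM" if is_immediate(op) else op for op in operands))
        for mnemonic, operands in instructions
    )


class SemanticIndex:
    """Hypothetical index that keeps every name saved under a semantic key."""

    def __init__(self):
        self._names = defaultdict(list)

    def save(self, name, instructions):
        names = self._names[semantic_key(instructions)]
        if name not in names:
            names.append(name)

    def candidates(self, instructions):
        return list(self._names.get(semantic_key(instructions), []))


index = SemanticIndex()
# Same shape, different dwFlags value pushed before the call.
index.save("alloc", [("push", ["8"]), ("push", ["eax"]), ("call", ["HeapAlloc"])])
index.save("quickAlloc", [("push", ["0"]), ("push", ["eax"]), ("call", ["HeapAlloc"])])

unknown = [("push", ["0"]), ("push", ["eax"]), ("call", ["HeapAlloc"])]
print(index.candidates(unknown))
```

Returning every candidate name, instead of a single one, leaves the final decision to the analyst, which is the “deeper check” mentioned above.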
Does it work for Zeus-Kins only or…?
It works with every single snippet you save in the database. Snippets could come from Zeus, Carberp, a piece of buggy code (in case you want to see whether it’s replicated in other files) or something else; basically, whatever you want.
Here is what I got last night while I was reversing a DDoS malware:
The malware uses a function that is a perfect Copy&Paste from the MSDN help page of a specific function! I had added the snippet to the database because I found it inside another malware, but I didn’t expect to see it again.
I know that a program can be compiled with different products, and each one has a large variety of options, but I don’t care about that: I simply save every single function I reverse. The power of SnippetDetector lies in its database: the more snippets you save in it, the more accurate the detection will be.
I have some more features in mind right now; I’ll let you know about them!