Can I blog an incomplete solution or an incomplete analysis? Why not! That’s the spirit of this blog entry!
More than a year ago I started a project with Kayaker: we decided to write a tool able to show hidden callbacks. If I remember correctly, the idea was born while we were putting our hands on a rootkit. I bet many reversers were thinking the same thing in those days, because the same kind of tool was developed by others. As you can imagine our tool never saw the light, not because similar tools are available online but mostly because we are two old lazy reversers!
I bet you are thinking: why the hell are you writing this stupid intro? Well, the tools I mentioned before were buggy, and some months ago I discovered that they still are (I don’t know whether they have been fixed by now…). Strange that no one else has noticed it yet.
Anyway, we won’t complete the tool, but with this blog post I would like to share some notes from our investigations. At the beginning I wanted to write a detailed and complete article about the subject, but I don’t know when I’ll be able to finish this project, so I decided to publish some of my notes.
It’s a two-minds work, so credit goes to Kayaker too!
The idea is to retrieve hidden callbacks that have been installed via CmRegisterCallback, PsSetCreateProcessNotifyRoutine, PsSetCreateThreadNotifyRoutine and PsSetLoadImageNotifyRoutine. After that it would be good to deregister one or more of them.
Where to start?
First of all you have to understand what’s behind functions like CmRegisterCallback. Then you’ll have something to work on. I’ll start with CmRegisterCallback (from XP SP2). The function is used to register a RegistryCallback routine, and I think the XP version is the simplest one for understanding the principles behind the function. There are some differences between the XP and 7 versions, but I think you’ll be able to understand the 7 structure too! Here is the disassembled function (without the useless parts, of course):
487E6B push 'bcMC' ; Pool Tag: "CMcb"
487E70 xor ebx, ebx
487E72 push 38h ; NumberOfBytes: 0x38
487E74 inc ebx
487E75 push ebx ; PoolType: PAGEDPOOL
487E76 call ExAllocatePoolWithTag ; ExAllocatePoolWithTag(x,x,x): allocates pool memory
487E7B mov esi, eax ; eax is the pointer to the allocated pool memory, PCM_CALLBACK_CONTEXT_BLOCK
487E7D xor edi, edi
487E7F cmp esi, edi ; Is PCM_CALLBACK_CONTEXT_BLOCK a NULL pointer?
487E81 jz cmRegisterCallback_fails ; yes: function fails...
487E87 push esi
487E88 push [ebp+Function] ; PEX_CALLBACK_FUNCTION, pointer to callback function
487E8B call _ExAllocateCallBack ; allocates and fill EX_CALLBACK_ROUTINE_BLOCK structure (more on this later...)
487E90 cmp eax, edi ; ExAllocateCallback success or not?
487E92 mov [ebp+PEX_CALLBACK_ROUTINE_BLOCK], eax ; store the pointer to the allocated pool memory
487E95 jnz short _ExAllocateCallBack_success
... ; fill CM_CALLBACK_CONTEXT_BLOCK fields
487EDC mov ebx, offset CmpCallBackVector
487EE1 mov [ebp+i], edi ; i = 0
487EE4 push edi ; OldBlock: NULL
487EE5 push [ebp+PEX_CALLBACK_ROUTINE_BLOCK] ; NewBlock with information to add
487EE8 push ebx ; CmpCallbackVector[i]
487EE9 call _ExCompareExchangeCallBack ; try to *insert* the new callback inside CmpCallBack vector
487EEE test al, al ;check the result...
487EF0 jnz short free_slot_has_been_found ; jump if the vector has an empty space for the new entry
487EF2 add [ebp+i], 4 ; i++, increase the counter
487EF6 add ebx, 4 ; shift to the next item of the vector to check
487EF9 cmp [ebp+i], 190h ; is the end of the vector?
487F00 jb short try_next_slot ; no: try another one. yes: no free slot!
487F11 mov eax, STATUS_INSUFFICIENT_RESOURCES
487F1A retn 0Ch
487F1D mov eax, 1
487F22 mov ecx, offset _CmpCallBackCount ; CmpCallBackCount: number of not NULL item inside the vector
487F27 xadd [ecx], eax ; there's a new callback, it increases the number of item inside the vector
487F2A xor eax, eax
487F2C jmp short end_CmRegisterCallback
As you can see the idea behind the function is really simple!
Basically, it tries to add a new entry inside a vector named CmpCallBackVector, and when the entry is correctly inserted the registration process will end with a success.
How do I know it is using a vector? The add instruction at 0x487EF6 is a clear clue, and the cmp at 0x487EF9 reveals the fixed length of the vector (it has 100 items: 0x190 bytes / 4 bytes per item). Now that I have this information I’m going to try to explain the entire procedure in detail. The algorithm can be divided into 5 big blocks:
1: try to allocate 0x38 bytes for a structure named CM_CALLBACK_CONTEXT_BLOCK
2: try to allocate 0x0C bytes for a structure named EX_CALLBACK_ROUTINE_BLOCK
3: fill CM_CALLBACK_CONTEXT_BLOCK fields
4: look for an empty slot, insert a sort of PEX_CALLBACK_ROUTINE_BLOCK in it and update CmpCallBackCount
5: notify success or error and exit
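Leaving the allocations aside, steps 4 and 5 can be modelled in plain user-mode C. This is only a sketch: callback_vector, callback_count and register_callback are hypothetical stand-ins for CmpCallBackVector, CmpCallBackCount and the loop inside CmRegisterCallback, and the real code performs the insertion atomically through ExCompareExchangeCallBack instead of a plain store.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 100   /* 0x190 / 4 slots, from the cmp at 0x487EF9 */

/* Hypothetical stand-ins for CmpCallBackVector and CmpCallBackCount. */
void *callback_vector[SLOT_COUNT];
long  callback_count;

/* Returns the index of the slot used, or -1 when every slot is taken
   (the kernel returns STATUS_INSUFFICIENT_RESOURCES in that case). */
int register_callback(void *routine_block)
{
    assert(routine_block != NULL);
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (callback_vector[i] == NULL) {    /* free slot found */
            callback_vector[i] = routine_block;
            callback_count++;                /* mirrors the xadd at 0x487F27 */
            return i;
        }
    }
    return -1;                               /* no free slot */
}
```

Registering two blocks in an empty vector fills slots 0 and 1 and leaves the counter at 2.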
Point #1 is pretty simple to understand, it’s only a call to ExAllocatePoolWithTag.
To understand point #2 you have to see what’s going on inside the ExAllocateCallBack procedure. Let’s take a look at it:
52AB35 push 'brbC' ; Pool Tag: Cbrb
52AB3A push 0Ch ; NumberOfBytes: 0x0C
52AB3C push 1 ; PoolType: PAGED_POOL
52AB3E call ExAllocatePoolWithTag ; alloc a EX_CALLBACK_ROUTINE_BLOCK structure
52AB43 test eax, eax ; ExAllocatePoolWithTag success or not?
52AB45 jz short _ExAllocateCallBack_fails
52AB47 mov ecx, [ebp+_pex_callback_function] ; pointer to callback function (PEX_CALLBACK_FUNCTION)
52AB4A and dword ptr [eax], 0 ; 1° field: 0
52AB4D mov [eax+4], ecx ; 2° field: _pex_callback_function
52AB50 mov ecx, [ebp+_pool_allocated_memory] ; PCM_CALLBACK_CONTEXT_BLOCK
52AB53 mov [eax+8], ecx ; 3° field: _pcm_callback_context_block
The procedure is used to allocate and fill a special structure:
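typedef struct _EX_CALLBACK_ROUTINE_BLOCK
{
    EX_RUNDOWN_REF RundownProtect;   // 1st field: set to 0
    PEX_CALLBACK_FUNCTION Function;  // 2nd field: the callback to register
    PVOID Context;                   // 3rd field: PCM_CALLBACK_CONTEXT_BLOCK
} EX_CALLBACK_ROUTINE_BLOCK, *PEX_CALLBACK_ROUTINE_BLOCK;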
As you can see from the lines above, the first field is set to 0 while the other fields are filled with two pointers: the function to register and the context containing info about the callback.
While point #3 is just a series of mov instructions used to fill the CM_CALLBACK_CONTEXT_BLOCK structure, point #4 gives us some useful information: CmpCallBackVector has 100 elements, and this part of the code scans the entire vector until an empty element is found. If no empty element is found, the callback is not registered. What happens when there’s an empty slot inside the vector? The new entry is added to the vector. Most of the job is done by the function named ExCompareExchangeCallBack; here is the core of the function:
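What ExAllocateCallBack does can be mirrored in a few lines of user-mode C. Treat this as a sketch under assumptions: malloc stands in for ExAllocatePoolWithTag, and the names routine_block, first_field, allocate_callback and sample_callback are made up for the example; only the layout (three pointer-sized fields, the first one zeroed) comes from the disassembly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int (*callback_fn)(void *context, void *arg1, void *arg2);

/* Illustrative layout: three pointer-sized fields (0x0C bytes on 32-bit XP). */
typedef struct routine_block {
    uintptr_t   first_field;  /* zeroed by "and dword ptr [eax], 0" */
    callback_fn function;     /* written at offset 4 in the XP code */
    void       *context;      /* written at offset 8 in the XP code */
} routine_block;

/* Hypothetical callback, only here so that there is something to register. */
int sample_callback(void *context, void *arg1, void *arg2)
{
    (void)context; (void)arg1; (void)arg2;
    return 0;
}

/* Allocates and fills a block; returns NULL if the allocation fails. */
routine_block *allocate_callback(callback_fn function, void *context)
{
    assert(function != NULL);   /* sanity check of this sketch only */
    routine_block *block = malloc(sizeof *block);
    if (block == NULL)
        return NULL;
    block->first_field = 0;
    block->function = function;
    block->context = context;
    return block;
}
```

The caller owns the returned block and has to free it, just like the kernel frees the pool memory when a callback is removed.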
52AB81 mov eax, [ebp+CmpCallbackVector] ; vector at the current position
52AB84 mov ebx, [eax] ; ebx is a PEX_CALLBACK_ROUTINE_BLOCK, the item could be NULL or not
52AB86 mov eax, ebx
52AB88 xor eax, [ebp+OldBlock] ; OldBlock is NULL for a registration process
52AB8B mov [ebp+current_pex_callback_routine_block], ebx
52AB8E cmp eax, 7 ; check used to see if the current item is NULL or not
52AB91 ja short loc_52ABB5 ; jump if not NULL
52AB93 test esi, esi ; is NewBlock NULL?
52AB95 jz short loc_52ABA1 ; jump if it's NULL
52AB97 mov eax, esi ; esi, NewBlock pointer (changed...)
52AB99 or eax, 7 ; PAY ATTENTION HERE: or 7 !?!
52AB9C mov [ebp+NewBlock], eax ; change NewBlock pointer: NewBlock = NewBlock OR 7
52AB9F jmp short loc_52ABA5
52ABA5 mov eax, [ebp+var_4] ; here if CmpCallbackVector's item is null
52ABA8 mov ecx, [ebp+CmpCallbackVector] ; current empty slot
52ABAB mov edx, [ebp+NewBlock] ; new pointer to insert
52ABAE cmpxchg [ecx], edx ; insert the new pointer inside the empty slot!
52ABB1 cmp eax, ebx
52ABB3 jnz short loc_52AB81
52ABB5 and ebx, not 7 ; PAY ATTENTION HERE!
52ABB8 cmp ebx, [ebp+OldBlock] ; here if CmpCallbackVector's item is not null
52ABBB jnz short loc_52AC19
52ABBD test ebx, ebx
52ABBF jz short loc_52AC15
The routine contains some more things inside, but we can stop here with the analysis because we have everything we need. If the pointer to the NewBlock to insert is not NULL and there’s an available empty slot the pointer is inserted inside the vector; after that CmpCallBackCount value will be updated (remember the snippet at the beginning of this blog entry?).
The last part of the algorithm (point #5) is a simple return with a success or failure value:
52AC15 mov al, 1 ; 1 means success, new item has been added to CmpCallbackVector
52AC17 jmp short loc_52AC29
52AC19 test esi, esi ; esi -> NewBlock
52AC1B jz short loc_52AC27
52AC1D push 8
52AC1F pop edx
52AC20 mov ecx, esi
52AC22 call ExReleaseRundownProtectionEx ; if esi is not null something went wrong...
52AC27 xor al, al ; 0 means failure, new item has not been added to CmpCallbackVector
Ok, I think we have a general idea about the vector; each entry contains a *sort* of pointer to an EX_CALLBACK_ROUTINE_BLOCK, and to reveal all of them you only have to scan the entire vector!
To sum up, there are 3 possible scenarios:
1. CmpCallbackVector’s item is empty:
the new block will be inserted inside the vector. The added value is not the one passed to ExCompareExchangeCallBack, but the value modified by an “OR 7” logical operation.
2. CmpCallbackVector’s item is full:
ExCompareExchangeCallBack simply returns 0 (failure) and CmRegisterCallback will try with the next item of the vector
3. Someone is working on the CmpCallbackVector’s item:
the registration process reveals an interesting behaviour: to be sure to be the only one accessing the resource, the system uses a lock mechanism. The OR and AND operations are the core of that mechanism (0x52AB99 and 0x52ABB5, the lines commented with “PAY ATTENTION HERE”). If the current item of the vector is not NULL, the check at 0x52AB8E fails and the code flow continues from 0x52ABB5. At this point the real address of the item is extracted (stored_value AND NOT 7) and compared with OldBlock, which is NULL during a registration; the item is obviously not NULL and, as you can see around 0x52AC22, the resource is released because someone else is working on it. Now you should understand why the system ORs the value by 7 before adding it to the vector.
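The OR 7 / AND NOT 7 pair and the cmpxchg can be tried out in user mode. The following is a minimal sketch, not the kernel code: tag_block, untag_block and try_insert are hypothetical names, C11 atomics stand in for the cmpxchg instruction, and the blocks are assumed to be 8-byte aligned so that the three low bits of the pointer are free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)7)

/* Value actually written to the slot: pointer OR 7 (see 0x52AB99). */
uintptr_t tag_block(const void *block)
{
    uintptr_t value = (uintptr_t)block;
    assert((value & TAG_MASK) == 0);   /* assumption: 8-byte aligned block */
    return value | TAG_MASK;
}

/* Real address of the block: stored value AND NOT 7 (see 0x52ABB5). */
void *untag_block(uintptr_t stored)
{
    return (void *)(stored & ~TAG_MASK);
}

/* Writes the tagged pointer only if the slot is still empty,
   in the spirit of the cmpxchg at 0x52ABAE. */
bool try_insert(_Atomic uintptr_t *slot, const void *block)
{
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong(slot, &expected, tag_block(block));
}
```

A second try_insert on the same slot fails because the slot is no longer zero, and untag_block gives back the original pointer whatever the low bits contain.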
With all this kind of information I can finally write a routine able to read all the stored callbacks:
cells = 0x64; // cells inside CmpCallbackVector
nMod = *(DWORD*)_sysmodBuffer; // _sysmodBuffer filled by "ZwQuerySystemInformation(SystemModuleInformation..."
for (i = 0; i < cells; i++)
{
    // take current item from CmpCallbackVector (look at the "& ~7" operation)
    pCBRB = (PEX_CALLBACK_ROUTINE_BLOCK)((*(DWORD*)(_CmpCallbackVectorAddress + 4*i)) & ~7);
    if (pCBRB != 0)
    {
        sysmodTmp = (PSYSTEM_MODULE_INFORMATION)((DWORD)_sysmodBuffer + 4);
        j = 0;
        while (j < nMod)
        {
            // is the callback address inside the current module?
            if ((((DWORD)pCBRB->Function) < ((DWORD)sysmodTmp->Base + (DWORD)sysmodTmp->Size)) &&
                (((DWORD)pCBRB->Function) > ((DWORD)sysmodTmp->Base)))
            {
                // Callback has been found
                DbgPrint("Result: %LX: %s\r\n", pCBRB->Function, sysmodTmp->ImageName);
            }
            // get the next module
            sysmodTmp = (PSYSTEM_MODULE_INFORMATION)((DWORD)sysmodTmp + sizeof(SYSTEM_MODULE_INFORMATION));
            j = j + 1;
        }
    }
}
It’s important to scan all the cells inside the vector! One of the tools available on the web fails to retrieve callbacks stored after an empty element of the vector.
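The difference between the two behaviours is easy to show on a fake vector. This is a sketch under assumptions: count_callbacks and count_until_first_hole are made-up names, the vector is a plain array of tagged values, and the second function is only my guess at how a tool could end up missing entries (I have not seen its source).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the registered callbacks in a vector of tagged pointers.
   Empty slots are skipped, they do not mark the end of the vector. */
size_t count_callbacks(const uintptr_t *vector, size_t cells)
{
    size_t found = 0;
    assert(vector != NULL);
    for (size_t i = 0; i < cells; i++) {
        uintptr_t block = vector[i] & ~(uintptr_t)7;   /* strip the low 3 bits */
        if (block == 0)
            continue;                                  /* a hole: keep scanning */
        found++;
    }
    return found;
}

/* Presumed buggy variant: gives up at the first empty slot. */
size_t count_until_first_hole(const uintptr_t *vector, size_t cells)
{
    size_t found = 0;
    assert(vector != NULL);
    for (size_t i = 0; i < cells && (vector[i] & ~(uintptr_t)7) != 0; i++)
        found++;
    return found;
}
```

With entries in slots 0, 2 and 5 of an 8-slot vector, the full scan sees three callbacks while the variant that stops at the first hole sees only one.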
Well, the only thing to reveal about the code above is CmpCallbackVectorAddress, the address of CmpCallBackVector. How can I locate the exact address of CmpCallBackVector? Imho, that’s the hardest part of the entire process!
How to find CmpCallbackVector address
To develop a tool for a specific OS is pretty easy because the vector’s address is hardcoded; it would be nice to discover an OS independent technique.
I think the most used approach is a byte-search based on a specific sequence of bytes; it’s a nice idea but I don’t want to list every OS version known to man inside my source code. We (Kayaker and I) spent a lot of time on this point; we both wanted to develop something that is not totally tied to a specific OS version, something that doesn’t require a series of “if OS == xxx” statements inside the code. It’s almost impossible to write fully OS-independent code, but I believe it’s possible to remove some OS checks from the code.
We finally came up with two ideas, a practical and a theoretical idea. I hate theory and mine is the practical solution of course. I think both ideas are valid and just to be sure to find the right vector’s address we decided to combine them inside a hypothetical tool, four eyes are always better than two!
The practical approach
My idea is really simple: since the vector’s address is hardcoded, you’ll surely have it in two different parts of the code:
PAGE:005392D0 BB 20 05 48 00 mov ebx, offset _CmpCallBackVector
.data:00480520 _CmpCallBackVector db 0
The address is inside two sections, PAGE and data. An *xref-search* is the core of the idea! It’s pretty stupid indeed, but from what I’ve seen so far it works!
Here is the pseudo code of my xref search; basically it scans the entire PAGE section trying to locate the right address:
callbackAddress = CmUnregisterCallback address in memory
pagePointer = pointer_to_PAGE_section
while (pagePointer < pointer_to_PAGE_section + size_of_PAGE_section)
    value = get dword pointed by pagePointer
    if (value is inside DATA section)
        if ((pagePointer > callbackAddress) && (pagePointer < callbackAddress + range))
            CmpCallbackVector = value
    pagePointer = pagePointer + 1
As you can imagine a simple xref-search is unable to find out the right value, you need one more check. That’s why I added the line:
if ((pagePointer > callbackAddress) && (pagePointer < callbackAddress + range))
where callbackAddress is the address of CmUnregisterCallback. What does it mean? Well, ‘pagePointer’ should be inside the first “range” bytes of CmUnregisterCallback function. If both “if” statements are satisfied I’m pretty sure about the vector’s address value.
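The same search can be written in C against a copy of the PAGE section held in a buffer. It is a sketch under assumptions: find_vector_xref and all of its parameters are hypothetical, the section addresses and sizes are supposed to come from the PE headers, and the function simply returns the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scans `page` (a copy of the PAGE section, mapped at `page_va`) for a dword
   that points into [data_va, data_va + data_size) and that lies within
   `range` bytes after `func_va`. Returns that dword, or 0 if none is found. */
uint32_t find_vector_xref(const uint8_t *page, size_t page_size, uint32_t page_va,
                          uint32_t data_va, uint32_t data_size,
                          uint32_t func_va, uint32_t range)
{
    assert(page != NULL);
    for (size_t off = 0; off + 4 <= page_size; off++) {
        uint32_t where = page_va + (uint32_t)off;
        /* x86 images are little-endian: build the dword byte by byte */
        uint32_t value = (uint32_t)page[off]
                       | ((uint32_t)page[off + 1] << 8)
                       | ((uint32_t)page[off + 2] << 16)
                       | ((uint32_t)page[off + 3] << 24);
        if (value < data_va || value >= data_va + data_size)
            continue;                    /* not an address inside .data */
        if (where > func_va && where < func_va + range)
            return value;                /* xref close to the function start */
    }
    return 0;
}
```

Moving one byte at a time matters, because the operand of a mov is not aligned to a dword boundary inside the code.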
There are still 2 points to clarify:
– what's range variable?
– why CmUnregisterCallback?
range is just a numerical value and you'll only have to decide a value to assign to it. Under XP the first bytes of the CmUnregisterCallback function are:
PAGE:005392C3 8B FF mov edi, edi
PAGE:005392C5 55 push ebp
PAGE:005392C6 8B EC mov ebp, esp
PAGE:005392C8 51 push ecx
PAGE:005392C9 83 65 FC 00 and [ebp+var_4], 0
PAGE:005392CD 53 push ebx
PAGE:005392CE 56 push esi
PAGE:005392CF 57 push edi
PAGE:005392D0 BB 20 05 48 00 mov ebx, offset _CmpCallBackVector
In this specific case 16 could be a possible value, because the operand of the mov starts at 0x5392D1, 0xE bytes after the beginning of the function… What about the other OSs? Well, as I said before I think it's hard to write a universal piece of code, but as far as I have seen it's possible to adjust the "range" to cover some more OSs. I don't have Vista and 7 running on my system and I'm working from dead listings only, but I think 148 could be a good value and it should cover all the OSs. If you are still reading and you have Vista or 7, can you confirm that?
One more thing about the search pattern: I use CmUnregisterCallback because (inspecting all the OSs) CmRegisterCallback doesn't always reference CmpCallbackVector inside the main routine; sometimes it hides the reference behind some calls. For example, look at CmRegisterCallback from 7:
PAGE:0065712A mov edi, edi
PAGE:0065712C push ebp
PAGE:0065712D mov ebp, esp
PAGE:0065712F push [ebp+Cookie]
PAGE:00657132 mov eax, offset stru_4FFDF0
PAGE:00657137 push 1
PAGE:00657139 push [ebp+Context]
PAGE:0065713C push [ebp+Function]
PAGE:0065713F call sub_657153 ; It's everything inside this call!!!
PAGE:00657144 pop ebp
PAGE:00657145 retn 0Ch
It’s much more complex to attack a procedure with sub-routines, don't you think? That's why I did opt for CmUnregisterCallback.
What about the PsSet* functions?
At the beginning of this blog post I mentioned some more functions; it's time to spend a few words on them too.
The functions are:
– PsSetCreateProcessNotifyRoutine
– PsSetCreateThreadNotifyRoutine
– PsSetLoadImageNotifyRoutine
There are some similarities between CmRegisterCallback and these three functions: they all register something, they all use a vector to store the information, and they all use the same function! YES, to register a function they use the same scheme:
1. get the address of a specific vector
2. try to insert the new item inside the vector calling ExCompareExchangeCallBack
Just to clarify everything look at this snippet, taken from PsSetCreateThreadNotifyRoutine:
4ED7C4 mov esi, offset _threadVector ; the vector
4ED7C9 push 0
4ED7CB push ebx
4ED7CC push esi
4ED7CD call _ExCompareExchangeCallBack ; the function
4ED7D2 test al, al
4ED7D4 jnz short loc_4ED7F3
4ED7D6 add edi, 4
4ED7D9 add esi, 4
4ED7DC cmp edi, 20h ; the check over the number of items inside the vector
4ED7DF jb short loc_4ED7C9
The only difference is the length of the vector:
_callbackVector: 0x64 slots
_processVector: 0x8 slots
_threadVector: 0x8 slots
_imageVector: 0x8 slots
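The slot counts follow from the immediate used in the loop check: the counter grows by 4 bytes per slot, so the limit divided by 4 gives the number of slots. A small sketch, where vector_info, vectors and slot_count are hypothetical names; the 0x20 limits for the process and image vectors are derived from the 8 slots listed above, not read from a disassembly shown in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 4u   /* each entry is a 4-byte pointer on 32-bit XP */

struct vector_info {
    const char *name;        /* label used in this post, not an exported symbol */
    unsigned    byte_limit;  /* immediate compared against the byte counter */
};

static const struct vector_info vectors[] = {
    { "_callbackVector", 0x190 },  /* cmp [ebp+i], 190h */
    { "_processVector",  0x20  },  /* assumed: 8 slots * 4 bytes */
    { "_threadVector",   0x20  },  /* cmp edi, 20h */
    { "_imageVector",    0x20  },  /* assumed: 8 slots * 4 bytes */
};

/* Returns the number of slots of the named vector, or 0 if it is unknown. */
unsigned slot_count(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof vectors / sizeof vectors[0]; i++)
        if (strcmp(vectors[i].name, name) == 0)
            return vectors[i].byte_limit / SLOT_SIZE;
    return 0;
}
```

A table like this is all a generic scanner needs in order to reuse the same loop for the four vectors.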
Well, you can use all the info I gave you about CmRegisterCallback for these three functions too! I think you'll be able to retrieve all the hidden callbacks and, just in case, unregister a callback. There are many ways to do it, from the dirty one (put NULL inside the vector's slot) to the right one (calling the right unregister function)… you only have to decide!
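Just to picture the dirty way, here is a user-mode model of it. It is only a sketch: clear_slot is a hypothetical name, C11 atomics replace the interlocked instructions, and nothing here frees the block or updates a counter, which is exactly why the approach is dirty. On a live kernel the proper unregister routine remains the safer choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Looks for the slot holding `block` (low 3 bits ignored) and overwrites it
   with 0. Returns true if a slot has been cleared. */
bool clear_slot(_Atomic uintptr_t *vector, size_t cells, const void *block)
{
    assert(vector != NULL && block != NULL);
    for (size_t i = 0; i < cells; i++) {
        uintptr_t stored = atomic_load(&vector[i]);
        if ((stored & ~(uintptr_t)7) != (uintptr_t)block)
            continue;                 /* some other entry, or an empty slot */
        /* clear the slot only if nobody changed it in the meantime */
        if (atomic_compare_exchange_strong(&vector[i], &stored, (uintptr_t)0))
            return true;
    }
    return false;
}
```

Calling it twice with the same block succeeds the first time and fails the second, because the slot is already empty.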