The Curious Case of QueueUserAPC
Summary
Because of the way the .NET Common Language Runtime (CLR) shuts down, user-mode asynchronous procedure calls (APCs) queued to a .NET assembly’s main thread are processed when the assembly exits, even though managed code never places the thread in an alertable state.
Further, if a user spawns a new process and queues an APC to that process’s main thread before the thread begins running, the APC wins the race against the process’s main routine and is processed first.
What are User APCs?
An asynchronous procedure call (APC) is a function that executes asynchronously in the context of a particular thread. Each thread has its own queue of user-mode APCs, which are executed in first-in, first-out (FIFO) order once that thread enters an “alertable state.” A thread enters an alertable state when it calls SleepEx, SignalObjectAndWait, WaitForSingleObjectEx, WaitForMultipleObjectsEx, or MsgWaitForMultipleObjectsEx with the alertable option set, or when it calls NtTestAlert. Below is an image that demonstrates the control flow of how APCs are processed.
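To make those rules concrete, here is a toy Python model of a single thread’s APC queue. It is not the Win32 API: ToyThread, queue_user_apc, and sleep_ex are hypothetical names. Queuing never runs the function, a non-alertable wait leaves the queue untouched, and an alertable wait drains every pending APC in FIFO order and returns WAIT_IO_COMPLETION (0xC0), the value the real alertable wait functions return in that case.

```python
from collections import deque
from typing import Callable, Deque

WAIT_IO_COMPLETION = 0xC0  # value Win32 alertable waits return after running APCs


class ToyThread:
    """Toy model of one thread's user-mode APC queue (not the real Win32 API)."""

    def __init__(self) -> None:
        self.apc_queue: Deque[Callable[[], None]] = deque()

    def queue_user_apc(self, fn: Callable[[], None]) -> None:
        # Like QueueUserAPC: the function is only queued, never run immediately.
        self.apc_queue.append(fn)

    def sleep_ex(self, alertable: bool) -> int:
        # Like SleepEx with bAlertable: pending APCs run only in an alertable wait.
        if not alertable or not self.apc_queue:
            return 0
        while self.apc_queue:
            self.apc_queue.popleft()()  # first in, first out
        return WAIT_IO_COMPLETION


if __name__ == "__main__":
    calls = []
    thread = ToyThread()
    thread.queue_user_apc(lambda: calls.append("first"))
    thread.queue_user_apc(lambda: calls.append("second"))
    thread.sleep_ex(alertable=False)  # non-alertable: nothing runs
    print(calls)  # []
    thread.sleep_ex(alertable=True)  # alertable: the queue drains in FIFO order
    print(calls)  # ['first', 'second']
```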
The Problem
While Matt Nelson was experimenting with this technique in .NET, he realized that the APC queue was being processed even though he did not force the thread to enter an alertable state. His code was roughly as follows:
Nothing in the gist obviously causes the thread to enter an alertable state. Compounding the confusion, he also noticed that when he created a new process and queued a function to the APC queue of that process’s main thread, the queue was processed and the APC fired.
His last observation was that when he queued an APC into an already running process’s main thread, the APC did not fire as in the examples above. This leaves us with three questions:
- What is causing the main thread of .NET binaries to enter an alertable state?
- Why is an APC queued into a freshly spawned process processed immediately?
- Why does an APC queued into an already running process not fire as it does above?
Deriving Answers via Logic and Dynamic Testing
To begin narrowing down where this activity could be occurring, we’ll assume that the following statement is true: “APC queues are not processed until the thread enters an alertable state.” This is not an unreasonable assumption, given that it comes directly from Microsoft’s documentation.
If this assumption is true, we can answer question three by saying that the main thread simply has not entered an alertable state yet.
For question two, the QueueUserAPC documentation goes on to state that “If an application queues an APC before the thread begins running, the thread begins by calling the APC function.” In this case, the thread we’re targeting is the main thread of the newly spawned process. We can validate this dynamically by introducing a sleep between starting the process and queuing the APC into its main thread. With that artificial sleep in place, queuing the APC does not cause the function to fire, which is in line with the documentation.
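Both outcomes can be reproduced in a toy Python model, an illustration under stated assumptions rather than Win32 code (ToyMainThread and run_scenario are hypothetical). It assumes that a thread runs any APCs queued before it begins running and then executes a main routine that never waits alertably, so an APC queued afterwards, as with the artificial sleep, stays pending.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class ToyMainThread:
    """Hypothetical model of a new process's main thread (not the Win32 API)."""

    def __init__(self, main_routine: Callable[[], None]) -> None:
        self.main_routine = main_routine
        self.apc_queue: Deque[Callable[[], None]] = deque()

    def queue_user_apc(self, fn: Callable[[], None]) -> None:
        self.apc_queue.append(fn)

    def begin_running(self) -> None:
        # Per the QueueUserAPC docs, APCs queued before the thread begins
        # running are called first.
        while self.apc_queue:
            self.apc_queue.popleft()()
        # Assumption: the main routine never waits alertably, so anything
        # queued from now on stays pending.
        self.main_routine()


def run_scenario(queue_before_start: bool) -> Tuple[List[str], int]:
    """Return (execution order, APCs still pending) for one injection attempt."""
    events: List[str] = []
    thread = ToyMainThread(lambda: events.append("main"))

    def apc() -> None:
        events.append("apc")

    if queue_before_start:
        thread.queue_user_apc(apc)  # the injector wins the race
        thread.begin_running()
    else:
        thread.begin_running()  # e.g. the injector slept after process start
        thread.queue_user_apc(apc)  # too late: the thread is already running
    return events, len(thread.apc_queue)
```

Calling run_scenario(True) corresponds to question two (the APC runs before main), while run_scenario(False) corresponds to question three (the APC is queued but never drained).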
Given all that we know about user APCs thus far, the question remains: why does the main thread of a .NET assembly become alertable when managed code makes none of the requisite calls?
Derived Conclusions via RTFM and Dynamic Testing
To answer “why is this thread becoming alertable,” we should first determine when it becomes alertable. To do so, I pasted the first gist from above into Visual Studio and used the debugger to see when the APC was triggered. Stepping through the program line by line and pausing after each statement made it clear that the thread became alertable during the epilogue, after the managed entry point had returned, somewhere in unmanaged code. The giveaway was that the program hung executing the shellcode only once it attempted to exit.
To gain further insight, I needed to go beyond the tooling Visual Studio provides and use WinDbg to uncover the root cause. With the SOS debugging extension, WinDbg can debug managed programs gracefully; without it, once the application loads the CLR, the debugger simply runs the program instead of stepping through each instruction, which makes the exercise fruitless.
Once the Windows Driver Kit has been installed, launch the .NET assembly whose QueueUserAPC call targets its own main thread in WinDbg via File > Launch an Executable, as shown below. SOS debugging is enabled in the next step.
When the debugger pauses the .NET assembly at its initial breakpoint, you’ll want to break when the CLR is loaded into the process. Once the CLR is loaded, you can enable SOS debugging to walk through the assembly with the following commands:
$$ Break when the CLR (clr.dll) is loaded
sxe ld:clr
$$ Continue until the CLR load breakpoint is hit
g
$$ Load the CLR data-access DLL verbosely (unloading any stale copy first); SOS loads with it
.cordll -ve -u -l
If everything has loaded successfully, you should see output similar to the following:
CLRDLL: Loaded DLL C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll
Automatically loaded SOS Extension
CLR DLL status: Loaded DLL C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll
Next, set breakpoints on the functions that can put our thread into an alertable state. WaitForSingleObject and WaitForMultipleObjects are never alertable themselves, but breaking on them adds context:
bp kernel32!SleepEx
bp kernel32!SignalObjectAndWait
bp kernel32!WaitForSingleObject
bp kernel32!WaitForSingleObjectEx
bp kernel32!WaitForMultipleObjects
bp kernel32!WaitForMultipleObjectsEx
bp user32!MsgWaitForMultipleObjectsEx
bp ntdll!NtTestAlert
After the breakpoints have been set, continue executing the assembly. There are certainly better methods of narrowing down what functions are causing the thread to become alertable, but in general, my methodology was:
- Run the assembly and count the number of breakpoints you hit before you land in your shellcode. Call this number N.
- At the Nth breakpoint, “Step Out” of the function call and get your bearings. Note the function that called the alertable function. If that doesn’t give you enough context, try again by stopping at breakpoint N-1 and stepping out.
- Continue stepping out of and through functions until you have a loose call tree.
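The bookkeeping in these steps is simple enough to script. Below is a minimal Python sketch, assuming you keep a hand-written log of breakpoint hits in the order they occur and record a marker (the hypothetical SHELLCODE_MARKER) when execution reaches the shellcode. The hypothetical helper locate_trigger returns N along with the Nth and (N-1)th hits, the two breakpoints worth stepping out of. The log format is an assumption, not WinDbg output.

```python
from typing import List, Optional, Tuple

SHELLCODE_MARKER = "<shellcode>"  # hypothetical log entry: execution reached the payload


def locate_trigger(events: List[str]) -> Tuple[int, str, Optional[str]]:
    """Return N, the Nth breakpoint hit, and the (N-1)th hit, if any.

    `events` is a hand-kept log of breakpoint hits in the order they
    occurred, with SHELLCODE_MARKER recorded when the payload ran.
    """
    hits: List[str] = []
    for event in events:
        if event == SHELLCODE_MARKER:
            break
        hits.append(event)
    else:
        raise ValueError("the shellcode never ran in this log")
    if not hits:
        raise ValueError("no breakpoint was hit before the shellcode ran")
    previous = hits[-2] if len(hits) >= 2 else None
    return len(hits), hits[-1], previous
```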
Once you’ve formulated a loose call tree, you can break at each function in it and step through them instruction by instruction. Depending on where you decide to break and step out, you can end up with a variety of call trees at differing levels of detail. Since I knew this code had to be called from the CLR, I narrowed my search to CLR functions only, excluding any KERNEL32 or NTDLL calls. Using this filter, I came up with the following loose call tree:
clr!EEShutDown
clr!EEShutDownHelper
clr!FinalizerThread::FinalizerThreadWatchDog
clr!FinalizerThread::FinalizerThreadWatchDogHelper
clr!CLREventBase::Wait
clr!Thread::DoAppropriateWait
clr!Thread::DoAppropriateWaitWorker
clr!WaitForMultipleObjectsEx_SO_TOLERANT
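The module filter described above can be expressed in a few lines of Python. This is a sketch: SAMPLE_STACK is a hypothetical, abbreviated stack listed innermost frame first (as WinDbg’s k command prints it), built from the call tree above plus a few illustrative non-CLR frames, and frames are assumed to be plain module!function strings without offsets. The hypothetical helper module_call_tree keeps only one module’s frames and reverses them so the outermost caller comes first, matching the listing above.

```python
from typing import List

# Hypothetical, abbreviated stack, innermost frame first, assembled from the
# call tree above plus a few illustrative non-CLR frames.
SAMPLE_STACK = [
    "ntdll!NtWaitForMultipleObjects",
    "KERNELBASE!WaitForMultipleObjectsEx",
    "clr!WaitForMultipleObjectsEx_SO_TOLERANT",
    "clr!Thread::DoAppropriateWaitWorker",
    "clr!Thread::DoAppropriateWait",
    "clr!CLREventBase::Wait",
    "clr!FinalizerThread::FinalizerThreadWatchDogHelper",
    "clr!FinalizerThread::FinalizerThreadWatchDog",
    "clr!EEShutDownHelper",
    "clr!EEShutDown",
    "ntdll!RtlUserThreadStart",
]


def module_call_tree(stack: List[str], module: str = "clr") -> List[str]:
    """Keep frames from `module` only (case-insensitive), outermost caller first."""
    kept = [frame for frame in stack
            if frame.split("!", 1)[0].lower() == module.lower()]
    return kept[::-1]
```

Passing a different module name, such as "ntdll", shows the frames the filter deliberately discards.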
Based solely on this call tree, a function named EEShutDown is responsible for initiating several events related to tearing down the .NET CLR. It appears to perform this work in parallel and to wait, via WaitForMultipleObjectsEx, for the other threads to exit gracefully. Further, using the same methodology, I was able to create a rough control flow of how .NET executables are loaded and run by the CLR.
Scouring Source: Diving Into Core CLR
Although my project used the .NET Framework rather than .NET Core, I took a leap of faith that Core would be close enough to the .NET Framework for our purposes. This leap of faith isn’t strictly necessary to gain insight, but accepting a loose translation between the two makes the job significantly easier, since the coreclr source is publicly available.
Diving into the coreclr project, we start at the EEShutDown function. Searching through the source, we find an interesting comment stating that EEStartup is responsible for creating the default and shared AppDomains, as well as loading fundamental types such as System.Object. EEShutDown simply performs the inverse operation.
Following the code flow, we see that EEShutDown calls EEShutDownHelper, which is responsible for the bulk of the shutdown process. Further along, in FinalizerThread::RaiseForShutdownEvents, a helpful comment states that “this wait must be alertable to handle cases where the current thread’s context is needed” before the code calls a series of wait functions that all eventually call WaitForSingleObjectEx.
Reading through Core’s source and comments makes it clear how our APC is finally called. EEStartup instantiates the default AppDomain, and our managed code lives within that AppDomain. The managed code gets the current thread and queues the APC to it. When the program terminates, EEShutDown is called, initiating the teardown of the AppDomains and the other executing threads. To synchronize this process, the shutting-down thread performs alertable waits while it coordinates with the other threads. Those waits trigger our APC, so it executes without our code calling any of the alertable functions itself.
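The sequence above can be condensed into a toy Python model. It is a simplification under stated assumptions, not CLR code: ToyRuntimeThread, run_program, and managed_main are hypothetical names, EEStartup and EEShutDown appear only as trace entries, and a single alertable wait stands in for the chain of shutdown waits described in the coreclr comment. The managed entry point queues an APC to its own thread and returns without waiting, yet the APC still fires during shutdown.

```python
from collections import deque
from typing import Callable, Deque, List

WAIT_OBJECT_0 = 0x0  # Win32: the awaited object was signaled
WAIT_IO_COMPLETION = 0xC0  # Win32: an alertable wait returned to run APCs


class ToyRuntimeThread:
    """Toy stand-in for the main thread of a .NET process (not real CLR code)."""

    def __init__(self) -> None:
        self.apc_queue: Deque[Callable[[], None]] = deque()

    def queue_user_apc(self, fn: Callable[[], None]) -> None:
        self.apc_queue.append(fn)

    def wait_for_single_object_ex(self, alertable: bool) -> int:
        # Assumption: the awaited event is already signaled, so the only
        # question is whether pending APCs run first.
        if alertable and self.apc_queue:
            while self.apc_queue:
                self.apc_queue.popleft()()
            return WAIT_IO_COMPLETION
        return WAIT_OBJECT_0


def run_program(managed_main: Callable[[ToyRuntimeThread, List[str]], None]) -> List[str]:
    trace: List[str] = ["EEStartup: default AppDomain created"]
    thread = ToyRuntimeThread()
    managed_main(thread, trace)  # managed code never waits alertably
    trace.append("Main returned")
    trace.append("EEShutDown")
    # Models the alertable shutdown wait; retry after WAIT_IO_COMPLETION,
    # as alertable-wait callers commonly do.
    while thread.wait_for_single_object_ex(alertable=True) == WAIT_IO_COMPLETION:
        pass
    trace.append("process exit")
    return trace


def managed_main(thread: ToyRuntimeThread, trace: List[str]) -> None:
    # Mirrors the managed sample: queue an APC to the current thread, then return.
    thread.queue_user_apc(lambda: trace.append("APC fired"))
    trace.append("Main queued APC")
```

In the resulting trace, “APC fired” lands between “EEShutDown” and “process exit.” Changing the shutdown wait to alertable=False would leave the APC pending, mirroring a process whose main thread never becomes alertable.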
Conclusion
What started out as an apparent contradiction of Microsoft’s documentation led to several personal discoveries about the intricacies of QueueUserAPC and .NET. As a process injection technique, if you win the race against the remote process’s main thread by queuing the APC before that thread begins running, your APC executes without the thread ever having to enter an alertable state. Further, if you’re running managed executables built on the .NET Framework or .NET Core, the default executing thread will always become alertable due to the way AppDomain unloading is handled on process exit.
The value of this technique to an offensive operator is that it sidesteps the usual requirement of QueueUserAPC code execution: the runtime makes the alertable calls for you. For example, one could imagine a scheduled task that runs a company-signed .NET binary. Using WMI event triggers, you could establish persistence on that task’s creation, inject an APC into the resulting .NET process, and have the persistence triggered when the scheduled task exits. As a defender, one should be aware of how malware can influence code execution of .NET assemblies without adhering to regular code paths.