Adventures in Extremely Strict Device Guard Policy Configuration Part 1 — Device Drivers

Note: The Device Guard policy I created as a result of this post can be found here.

Introduction

Recently, I decided to attempt to craft a Device Guard code integrity policy for my Surface Laptop consisting solely of WHQLFilePublisher and FilePublisher rules; that is, only allowing code to execute based on files I explicitly trust by filename, file version, and signer. In theory, such a configuration strategy would also allow the operating system to be updated, unlike a policy where everything is approved by hash.

While application whitelisting is a huge step in the right direction, the majority of configuration strategies will, at a minimum, involve implicitly trusting all Microsoft-signed code. The problem with this strategy is that implicitly trusting anything from Microsoft is, in my opinion, too trusting, primarily because there is an ever-growing corpus of signed, abusable applications and scripts that allow an attacker to easily subvert your policy (e.g. msbuild.exe, windbg.exe, etc.). Microsoft maintains a list of these abusable apps in the form of a Device Guard code integrity blacklist policy that can be easily merged into your base policy. My goal, by contrast, is whitelisting in its purest form: a policy in which blacklist rules blocking abusable apps would never be required, because those binaries wouldn't have been trusted in the first place unless there was a legitimate business need to approve them.

There are essentially three high-level application whitelisting (AWL) configuration strategies when building a base policy. They are as follows (in increasing level of security but decreasing level of manageability):

  1. Scan and automatically trust all files by their signer in a “golden” (i.e. assumed clean) OS image. This strategy is what’s recommended by most AWL vendors as it is the easiest way to get up and running with application whitelisting and does a great job of blocking all malware that isn’t specifically catered to bypass AWL. The downside to this strategy is that you implicitly allow all abusable applications to execute. Additionally, you’re not actually applying any trust decisions to the signers present on your image. What if your golden image was infected and you inadvertently whitelisted all code signed by the attacker? Would you ever notice?
  2. Automatically trust anything signed by Microsoft, deny everything else initially, and gradually build a policy over a defined auditing period. This is the strategy I personally recommend most enterprises attempt to deploy, and the Device Guard team supplies a really good policy that only permits Windows-signed and Store-signed code to execute in %windir%\schemas\CodeIntegrity\ExamplePolicies\DefaultWindows_Enforced.xml. Again, the issue here is that you're implicitly permitting known abusable applications to execute, so you must be mindful to always update your policy with blacklist rules.
  3. Trust nothing except what is absolutely required to boot the operating system, run the operating system, run your required applications, and, optionally, update the operating system.

This post will cover my initial journey in attempting configuration strategy #3, a strategy that, to my knowledge, has not been attempted before on a Windows operating system. Readers may wonder why I would spend so much time on such a thing. My answer is easy: I'm motivated to try things no one else has tried, and I just want to know if it's possible. If it is possible and relatively manageable upon initial creation of a base policy, then I will have the most secure and relatively realistic application whitelisting policy in existence, and I will have direct insight into all code that attempts to execute outside of my strict definition of trust. It's also exceedingly difficult to keep track of and manage known whitelisting bypasses, and resourceful attackers will likely collect their own non-public bypasses, which defenders obviously have no way of preventing under a less strict whitelisting policy.

In this post, in addition to describing the steps I used to arrive at my initial Device Guard policy consisting of just device driver rules, I will describe my thought process and failures along the way. I can only assume that such a configuration strategy was never envisioned by the Device Guard team, so I can guarantee you that there were plenty of failures.

Part 2 of this series will document my process in building user-mode rules using this same configuration strategy, if such a thing is even possible without breaking core functionality of the operating system. We shall see…

Device Guard Policy Configuration Steps

I started by updating my OS to the latest version, but I didn't install any other software or change any configuration. For example, I use Hyper-V, but I want the rules for everything required to run VMs to be accounted for separately, so I didn't configure Hyper-V before starting to build out my base policy. It's ideal, in my opinion, not to clutter an initial policy with files required to perform updates. An initial code integrity policy should be bare-bones, minimalistic, and never change. Once it's complete, additional rules should only be merged into this base policy.

I then created a deny-all policy in audit mode, excluding the "Enabled:UMCI" option because I wanted this policy to apply only to device drivers initially (user-mode rule creation will be described in part 2 of this series). I also applied the "Required:WHQL" option because I wanted to apply as many rules as possible where WHQL co-signing validation was also performed. I describe how to create a blank, deny-all policy in my first Device Guard post, or you may use the policy the Device Guard team includes in %windir%\schemas\CodeIntegrity\ExamplePolicies\DenyAllAudit.xml. One of the things I love about Device Guard is that driver and user-mode rules are separate. Application whitelisting configuration strategies should be considered part of a maturity model, and one of the easiest entry points into application whitelisting is to apply it only to device drivers, since the set of drivers required to run the operating system is not subject to frequent change.
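The option tweaks described above can be sketched with the ConfigCI cmdlets. The option numbers come from the Set-RuleOption documentation; the working file name is my own choice, not from the original post:

```powershell
# Start from Microsoft's example deny-all audit policy.
Copy-Item -Path "$env:windir\schemas\CodeIntegrity\ExamplePolicies\DenyAllAudit.xml" `
    -Destination .\Deny_All_Drivers_Audit_Mode.xml

# Option numbers per the ConfigCI Set-RuleOption documentation:
#   3 = Enabled:Audit Mode, 2 = Required:WHQL, 0 = Enabled:UMCI
Set-RuleOption -FilePath .\Deny_All_Drivers_Audit_Mode.xml -Option 3           # audit, don't enforce yet
Set-RuleOption -FilePath .\Deny_All_Drivers_Audit_Mode.xml -Option 2           # require WHQL co-signing
Set-RuleOption -FilePath .\Deny_All_Drivers_Audit_Mode.xml -Option 0 -Delete   # drivers only: drop UMCI
```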

In theory, all of the drivers needed to boot and run the OS would surface in the Microsoft-Windows-CodeIntegrity/Operational event log after running in audit mode from which a code integrity policy could be built. In practice, this wasn’t exactly the case…

Here is the initial policy I used. I then deployed the policy by running the following command and rebooting:

ConvertFrom-CIPolicy -XmlFilePath Deny_All_Drivers_Audit_Mode.xml -BinaryFilePath C:\Windows\System32\CodeIntegrity\SIPolicy.p7b

Building a policy via hacked-together event log entries

Upon rebooting, the CodeIntegrity event log should be populated with a lot of device drivers that would have been prevented from loading if the code integrity policy we built wasn’t in audit mode.

Example of a CodeIntegrity event log driver audit event (ID 3076)

First, I recommend inspecting all the drivers that were prevented from loading and if you find that they are worthy of your trust, you can conveniently build a policy based on the event log entries. You would do so with the following command:

Generating a new code integrity policy based on audit events
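In principle, a single New-CIPolicy invocation builds rules from the audit log. A sketch of what that straightforward attempt would look like (the output file name is mine):

```powershell
# Build driver rules from the CodeIntegrity audit events already in the event log.
# -Audit tells New-CIPolicy to read the event log rather than scan a path.
New-CIPolicy -FilePath .\DriverRulesFromAudit.xml `
    -Level WHQLFilePublisher `
    -Fallback FilePublisher `
    -Audit
```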

You didn't think it would be that easy, did you? As I reported over a year ago, the New-CIPolicy cmdlet still has a parsing bug where it doesn't handle event log entries containing "GLOBALROOT" paths. We're good with PowerShell though, right? Let's not let this bug stop us; instead, let's pull the paths out ourselves and build the policy. I hacked together the following code to achieve this:

Script to extract and normalize device driver paths from the event log
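The original script isn't reproduced here, but the following is a minimal sketch of the same idea. The event property index holding the file path and the staging-directory approach are my assumptions, not the author's exact code:

```powershell
# Collect file paths from CodeIntegrity audit events (ID 3076) and normalize
# NT device / GLOBALROOT paths into ordinary drive-letter paths.
$events = Get-WinEvent -LogName 'Microsoft-Windows-CodeIntegrity/Operational' |
    Where-Object { $_.Id -eq 3076 }

$paths = foreach ($event in $events) {
    # Assumption: the blocked file path is the second event property.
    $raw = $event.Properties[1].Value
    $raw -replace '\\\\\?\\GLOBALROOT', '' `
         -replace '\\Device\\HarddiskVolume\d+', $env:SystemDrive `
         -replace '^System32', "$env:windir\System32"
}

# Stage the unique, still-present files and scan them into a policy.
$stage = New-Item -ItemType Directory -Path .\DriverStaging -Force
$paths | Sort-Object -Unique | Where-Object { Test-Path $_ } |
    ForEach-Object { Copy-Item -Path $_ -Destination $stage }

New-CIPolicy -FilePath .\DriverRules.xml -Level WHQLFilePublisher `
    -Fallback FilePublisher -ScanPath $stage
```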

It's worth mentioning that I specified the "FilePublisher" fallback option. Without it, rule generation will fail for many drivers, primarily because many built-in, non-third-party drivers are not WHQL signed. When building my policies, I will always opt for the strictest options that Device Guard will allow.

Next, I cleared my CodeIntegrity event log, restarted my computer, ran things for a while, let some more audit entries populate, and repeated the whole process. Once the CodeIntegrity event log was no longer generating any 3076 events, it was time to put my policy into enforcement mode and reboot. Logic would dictate that upon reboot everything would work great and I could move on to building out user-mode rules. Not so fast. I wasn't going to be let off that easy.
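Each iteration amounts to merging the newly generated rules into the working policy, clearing the log, and, once the log goes quiet, removing audit mode. A sketch with the ConfigCI cmdlets (file names are my own):

```powershell
# Merge the freshly generated audit rules into the working policy.
Merge-CIPolicy -PolicyPaths .\Deny_All_Drivers_Audit_Mode.xml, .\DriverRules.xml `
    -OutputFilePath .\Deny_All_Drivers_Audit_Mode.xml

# Clear the log so the next boot only surfaces drivers not yet covered.
wevtutil.exe cl 'Microsoft-Windows-CodeIntegrity/Operational'

# When no new 3076 events appear, delete "Enabled:Audit Mode" (option 3)
# to switch to enforcement, then recompile and redeploy the binary policy.
Set-RuleOption -FilePath .\Deny_All_Drivers_Audit_Mode.xml -Option 3 -Delete
ConvertFrom-CIPolicy -XmlFilePath .\Deny_All_Drivers_Audit_Mode.xml `
    -BinaryFilePath C:\Windows\System32\CodeIntegrity\SIPolicy.p7b
```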

Building code integrity rules blind

This is where the fun starts... As soon as I rebooted, my computer went into "Automatic Repair" mode and then failed to boot. I was prompted with the following friendly message:

Automatic Repair couldn’t repair your PC. Press “Advanced Options” to try other options to repair your PC or “Shut Down” to turn off your PC. Log file: C:\Windows\System32\Logfiles\Srt\SrtTrail.txt

The “Automatic Repair” dialog at boot

So, it would seem I died of dysentery on the SrtTrail (that's an Oregon Trail reference for you kids). In order to get my system to boot, I needed to select "Advanced Options," then "Start-up Settings," and then "Disable driver signature enforcement." To be clear, this disables Device Guard. So yes, if you have physical access to a system, BitLocker is not enabled, and the "Enabled:Advanced Boot Options Menu" policy option is present in your CI policy, then you can easily circumvent Device Guard. This bypass is best mitigated by enabling BitLocker, which forces you to enter your BitLocker recovery key before you can reach "Start-up Settings" in the advanced boot menu.

Upon disabling Device Guard, I placed my policy back into audit mode and had to figure out which rules missing from my policy were preventing my laptop from booting. At this point, I suspected the reason I wasn't booting was that some drivers load in the window between when the code integrity service starts early in the boot process and when the OS can begin writing to the event log. So, I needed a new strategy to collect the drivers necessary to boot. Before I thought to actually inspect SrtTrail.txt for any contextual information, my first thought was to supplement my existing policy with the loaded drivers listed under the System process in Process Explorer.

Viewing loaded drivers in Process Explorer

I saved all the loaded kernel modules to a text file, wrote some PowerShell to pull out the paths, and supplemented my existing policy with the entries that were unique to the Process Explorer listing. I rebooted, and with the policy placed back into enforcement mode, my laptop still wouldn't boot.

At this point, I was thinking that maybe Process Explorer wasn't listing every loaded driver, so I used Get-NtSystemInformation from my PowerShellArsenal module to dump the loaded drivers.

Viewing loaded drivers with Get-NtSystemInformation
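A sketch of dumping loaded modules this way follows; the exact parameter and property names are assumptions and may vary between PowerShellArsenal versions:

```powershell
# Assumes the PowerShellArsenal module is installed and importable.
Import-Module PowerShellArsenal

Get-NtSystemInformation -ModuleInformation |
    ForEach-Object { $_.ImageName } |                          # assumed property name
    ForEach-Object { $_ -replace '^\\SystemRoot', $env:windir } |
    Sort-Object -Unique |
    Out-File -FilePath .\LoadedDrivers.txt
```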

This got me a few extra drivers, which I added to my policy; I rebooted, and still no luck. Every time I rebooted, I would immediately enter "Automatic Repair" mode and fail to boot. I didn't spend time investigating why there was a discrepancy between the Process Explorer and Get-NtSystemInformation output. In the end, it didn't matter, since I ultimately wasn't reliant upon the output of either tool.

As my frustration grew, I began thinking that perhaps some drivers are loaded and then unloaded before the operating system is able to log anything to the event log. In that case, I would need a way to track kernel module loads starting as early in the boot process as possible.

My first thought was to use Procmon's "Enable Boot Logging" setting to accomplish this. It didn't capture any early boot drivers, presumably because it starts too late in the boot process. My next thought was to go straight to ETW to capture the information. I tried creating NT Kernel Logger trace sessions using AutoLoggers and then a Global Logger per the documentation here and here. Unfortunately, no .ETL trace files were generated. Global Loggers seem to be deprecated, and I need to spend more time playing with AutoLoggers, considering they can be used to capture all kinds of interesting ETW data during the early boot stages. At that point, I decided to stop wasting time and move on to another strategy.

I recalled seeing an "Enable Boot Logging" option within the advanced boot menu's "Start-up Settings." I tried it and, conveniently, got a really nice listing of all drivers loaded during the boot process in C:\Windows\ntbtlog.txt. There were a few more driver entries present in this list, so I added them to my policy, rebooted, and....... still no luck booting.
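Pulling the loaded-driver paths out of ntbtlog.txt is a quick one-liner. This sketch assumes the familiar "BOOTLOG_LOADED \SystemRoot\..." line format of that file:

```powershell
# ntbtlog.txt lines look like: "BOOTLOG_LOADED \SystemRoot\System32\drivers\foo.sys"
Get-Content -Path "$env:windir\ntbtlog.txt" | ForEach-Object {
    if ($_ -match '^BOOTLOG_LOADED\s+(?<Path>.+)$') {
        # Normalize \SystemRoot to the real Windows directory.
        $Matches['Path'] -replace '^\\SystemRoot', $env:windir
    }
} | Sort-Object -Unique
```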

At this point, I was starting to get desperate. Eventually, it occurred to me to actually inspect the contents of SrtTrail.txt. Sure enough, there were error messages in the file indicating that files such as apisetschema.dll and hvloader.dll had failed to load. So what did I do next? I kept whitelisting the files that failed to load in SrtTrail.txt until there were no more errors. I rebooted and, to my surprise, finally got past the Automatic Repair prompt! To my dismay, however, the boot process then froze on the Microsoft logo in a permanent state of purgatory.

So while I appeared to have made progress, I had no specific ideas left. There must have been just a few more drivers that needed to be whitelisted, but which ones?! Where do I even look now? In my desperation, I went through all the event logs. This job was made easier by the fact that I had cleared all the event logs beforehand, so after a reboot I only had to look at the logs that were populated. It took a few minutes to get there, but I was relieved to see the following entry in the Microsoft-Windows-Kernel-Boot event log:

A failed kernel image load event in the Kernel-Boot event log (ID 49)

So, I scraped up as many Microsoft-Windows-Kernel-Boot event ID 49 events as I could find, added them to my policy, and was finally able to boot!
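Scraping those events looks much like the 3076 case; the property index holding the blocked image path is an assumption:

```powershell
# Pull the blocked-image paths out of Kernel-Boot "load failure" events (ID 49).
Get-WinEvent -LogName 'Microsoft-Windows-Kernel-Boot/Operational' |
    Where-Object { $_.Id -eq 49 } |
    ForEach-Object { $_.Properties[0].Value } |   # assumption: path is the first property
    ForEach-Object { $_ -replace '^\\SystemRoot', $env:windir } |
    Sort-Object -Unique
```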

For reference, it took me about six hours of total work to arrive at a working device driver code integrity policy. In summary, I ended up extracting all the drivers I needed to whitelist from the following sources:

  1. Microsoft-Windows-CodeIntegrity/Operational event ID 3076
  2. “Enable boot logging” and %windir%\ntbtlog.txt
  3. %windir%\System32\Logfiles\Srt\SrtTrail.txt
  4. Microsoft-Windows-Kernel-Boot/Operational event ID 49

Additional Configuration Caveats

When all was said and done, I played around with some stricter options in my code integrity policy. One option I experimented with was "Required:EV Signers." According to the documentation, starting with version 1607, all Windows 10 driver signers are required to sign their drivers with an extended validation certificate, which has a much higher bar set for issuance. Upon enforcing that rule, the following drivers failed to load:

  • iaLPSS2_UART2.sys — an Intel driver
  • SurfaceDockIntegration.sys — a SpiralOrbit and Microsoft driver

So apparently, this rule isn't actually enforced? The most noticeable effect of these drivers not loading was my inability to use the built-in keyboard on the Surface Laptop. So if I can't even enforce EV signers on the latest version of Windows 10 (RS3) on Microsoft hardware, I think it's safe to say that no one will be able to enforce EV signers successfully for quite some time.

Conclusion

While this process was extremely painful, I hope that if anyone reading this would like to build such a strict policy, I will have saved you a ton of time and frustration. Now that my extremely strict policy is in place, I have insight into any new device driver that fails to load that I haven't explicitly expected and trusted. Attackers will also have a hell of a time installing their rootkits. There are some other hardening steps I would need to take in order to help prevent additional attacks on Device Guard, e.g. policy signing, enabling virtualization-based security protection of code integrity, and enabling hypervisor code integrity. Also, I will want to periodically update the minimum file versions in all of my FilePublisher rules. This has the effect of blocking previous versions of whitelisted drivers that may have contained vulnerabilities.
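Bumping the minimum versions can be done directly against the policy XML. This is a sketch under the assumption that each FilePublisher rule carries a FileAttrib element with a MinimumFileVersion attribute, as in the CI policy schema; the version value used here is only an example:

```powershell
# Raise MinimumFileVersion on every FileAttrib rule in the policy XML.
[xml]$policy = Get-Content -Path .\Deny_All_Drivers_Audit_Mode.xml

foreach ($fileAttrib in $policy.SiPolicy.FileRules.FileAttrib) {
    if ($fileAttrib.MinimumFileVersion) {
        # Example value: require at least this file version going forward.
        $fileAttrib.MinimumFileVersion = '10.0.16299.0'
    }
}

$policy.Save("$PWD\Deny_All_Drivers_Audit_Mode.xml")
```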

This configuration strategy, while extremely secure, obviously isn’t for everyone. It requires patience and, if you’re performing this process at scale, you’re going to have to build unique policies for each unique hardware buildout in your enterprise. An ideal use case for this configuration strategy would be extremely hardened PCI environments, ATMs, medical devices, and manufacturing devices. As I mentioned in the introduction though, I will be using this policy on my work and personal Surface Laptops.

Thank you to Dane Stuckey at Palantir for his thorough review of this post!