Kernel Virtual Address Shadow (KVAS) is the Windows implementation of Kernel Page Table Isolation (KPTI). It was introduced to mitigate the Meltdown vulnerability, which allowed an attacker that could execute code in user mode to leak out data from the kernel by abusing a side channel. While there are plenty of papers and blog posts on Meltdown and KVAS, there isn't much info on an interesting feature that KVAS enables: software SMEP. Unfortunately or fortunately, depending on your interest level in this post and Windows internals, understanding how software SMEP works requires knowledge of x86_64 paging, regular SMEP, and KVAS, so I'll be getting into those topics enough to give you an understanding of the underlying technology. Near the end I'll be running some experiments to show the internals of what I covered in the technical sections prior.

x64 Paging on Windows

First, I'm going to dive into a short introduction to x86_64 (4-level) paging, the structures involved, and the WinDbg commands used to interact with the page hierarchy, so that the experiments later on are easier to follow. A lot of this information is almost never presented together, so I think collecting it in a "here's what you need to know" format is useful. If you want more info, consult the Intel manuals or check out Connor McGarr's blog. Connor does a great job of explaining the basics, so you may want to read his post before continuing here if you don't already have at least a vague understanding of multi-level paging.

_MMPTE_HARDWARE

The structure that represents a page table entry on x86_64 is nt!_MMPTE_HARDWARE. It is an 8-byte structure with a lot of information:

0: kd> dt -v nt!_MMPTE_HARDWARE
struct _MMPTE_HARDWARE, 18 elements, 0x8 bytes
   +0x000 Valid               : Bitfield Pos 0, 1 Bit
   +0x000 Dirty1              : Bitfield Pos 1, 1 Bit
   +0x000 Owner               : Bitfield Pos 2, 1 Bit
   +0x000 WriteThrough        : Bitfield Pos 3, 1 Bit
   +0x000 CacheDisable        : Bitfield Pos 4, 1 Bit
   +0x000 Accessed            : Bitfield Pos 5, 1 Bit
   +0x000 Dirty               : Bitfield Pos 6, 1 Bit
   +0x000 LargePage           : Bitfield Pos 7, 1 Bit
   +0x000 Global              : Bitfield Pos 8, 1 Bit
   +0x000 CopyOnWrite         : Bitfield Pos 9, 1 Bit
   +0x000 Unused              : Bitfield Pos 10, 1 Bit
   +0x000 Write               : Bitfield Pos 11, 1 Bit
   +0x000 PageFrameNumber     : Bitfield Pos 12, 36 Bits
   +0x000 ReservedForHardware : Bitfield Pos 48, 4 Bits
   +0x000 ReservedForSoftware : Bitfield Pos 52, 4 Bits
   +0x000 WsleAge             : Bitfield Pos 56, 4 Bits
   +0x000 WsleProtection      : Bitfield Pos 60, 3 Bits
   +0x000 NoExecute           : Bitfield Pos 63, 1 Bit

Some fields of particular importance:

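  • Valid - this entry is valid (the hardware Present bit). It must be 1 for the data in the rest of the structure to be considered valid.
  • Dirty1 - despite the name, this is bit 1, the hardware read/write bit: 0 if the page is read only, 1 if it is writable.
  • Owner - the hardware user/supervisor (U/S) bit: 0 for kernel mode pages, 1 for user mode pages. Corresponds to the KPROCESSOR_MODE enum in the DDK.
  • LargePage - noted here, discussed below!
  • Write - bit 11. The MMU ignores this bit and leaves it for software, so it is not the hardware write permission. For example, the PML4 entry walked by hand below has Dirty1 = 1 and Write = 0, and !pte still reports it as writable (W).
  • PageFrameNumber - the page frame number (abbreviated PFN) of the next level of paging, or of the mapped page itself at the last level. It is the physical address shifted right by 12 (0xc). Mask these bits out, or pull them out and shift left by 12, to get the physical address, as shown in detail below.
  • NoExecute - the NX bit. Code cannot be executed in these pages.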
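To make the layout concrete, here is a minimal Python sketch that decodes a raw entry into the fields listed above. The input can be any of the QWORDs that !pte or !dq print. The field table is copied from the dt output for the build used in this post, and other builds may differ. decode_mmpte_hardware is a hypothetical helper, not part of any debugger API.

```python
# Bit layout copied from the `dt -v nt!_MMPTE_HARDWARE` output above:
# field name -> (bit position, width). Assumed to match this build only.
MMPTE_HARDWARE_FIELDS = {
    "Valid": (0, 1),
    "Dirty1": (1, 1),
    "Owner": (2, 1),
    "WriteThrough": (3, 1),
    "CacheDisable": (4, 1),
    "Accessed": (5, 1),
    "Dirty": (6, 1),
    "LargePage": (7, 1),
    "Global": (8, 1),
    "CopyOnWrite": (9, 1),
    "Unused": (10, 1),
    "Write": (11, 1),
    "PageFrameNumber": (12, 36),
    "ReservedForHardware": (48, 4),
    "ReservedForSoftware": (52, 4),
    "WsleAge": (56, 4),
    "WsleProtection": (60, 3),
    "NoExecute": (63, 1),
}


def decode_mmpte_hardware(entry: int) -> dict:
    """Split a raw 64-bit page table entry into its named bitfields."""
    return {
        name: (entry >> pos) & ((1 << width) - 1)
        for name, (pos, width) in MMPTE_HARDWARE_FIELDS.items()
    }


# The PML4 entry that is walked by hand later in this section.
fields = decode_mmpte_hardware(0x0000000004B09063)
print(fields["Valid"], fields["Dirty1"], fields["Owner"], fields["Write"])
# A PFN is a physical address shifted right by 12, so shift it back.
print(hex(fields["PageFrameNumber"]), hex(fields["PageFrameNumber"] << 12))
```

The first line prints the Valid, Dirty1, Owner and Write bits. The second prints the PFN and the physical base of the next table.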

Each level of the page table hierarchy has an _MMPTE_HARDWARE entry. The effective permissions for a page are the most restrictive combination of all the levels in the walk. A page is only writable (or user accessible) if every level allows it, and it is non-executable if NX is set at any level. Setting a permission at one level therefore has no effect unless every other level of the walk grants it too.

Let's look at an example in user mode on a system with KVAS disabled:

0: kd> !process 0 0 explorer.exe
PROCESS ffffc8064497b340
    SessionId: 1  Cid: 1038    Peb: 0090c000  ParentCid: 100c
    DirBase: bc33c000  ObjectTable: ffffa2827c3a1800  HandleCount: 1884.
    Image: explorer.exe
0: kd> .process /p /i ffffc8064497b340
You need to continue execution (press 'g' ) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff805`2e1fd0b0 cc              int     3
1: kd> .reload
Connected to Windows 10 19041 x64 target at (Sun Nov 15 19:51:29.691 2020 (UTC - 5:00)), ptr64 TRUE
Loading Kernel Symbols
...
1: kd> bp /p @$proc ntdll!NtCreateFile
1: kd> g
Breakpoint 0 hit
ntdll!NtCreateFile:
0033:00007ffc`3608c830 4c8bd1          mov     r10,rcx
1: kd> !pte kernel32
                                           VA 00007ffc35ee0000
PXE at FFFFE5F2F97CB7F8    PPE at FFFFE5F2F96FFF80    PDE at FFFFE5F2DFFF0D78    PTE at FFFFE5BFFE1AF700
contains 0A000000BBF48867  contains 0A000000BC34E867  contains 0A000000BC34F867  contains 8100000003806025
pfn bbf48     ---DA--UWEV  pfn bc34e     ---DA--UWEV  pfn bc34f     ---DA--UWEV  pfn 3806      ----A--UR-V

There are executable pages in kernel32, but the page containing the header should not be executable. This is reflected in the page hierarchy above. The PXE, PPE, and PDE are all user mode, writable, and executable (UWE). The PTE, however, marks the page as read only and non-executable: it shows UR- and has NX set in the top bit of 0x8100000003806025. The !pte command is covered in more detail in a few sections, so don't worry if the output is confusing at the moment.
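The "most restrictive level wins" rule can be checked against the four entries above with a short Python sketch. This is a simplification. It assumes a full 4-level walk, with no large page ending it early. It looks only at the Present, R/W (Dirty1), U/S (Owner) and NX bits, and ignores CR0.WP, SMEP and SMAP. effective_permissions is a hypothetical name.

```python
def effective_permissions(walk):
    """Combine PXE -> PPE -> PDE -> PTE flags into effective access rights."""
    return {
        "present": all(e & (1 << 0) for e in walk),
        "writable": all(e & (1 << 1) for e in walk),   # hardware R/W (Dirty1)
        "user": all(e & (1 << 2) for e in walk),       # U/S (Owner)
        "executable": not any(e & (1 << 63) for e in walk),  # NX on any level
    }


# PXE, PPE, PDE and PTE contents from `!pte kernel32` above.
walk = [0x0A000000BBF48867, 0x0A000000BC34E867,
        0x0A000000BC34F867, 0x8100000003806025]
print(effective_permissions(walk))
```

This prints present, user, not writable and not executable. That matches the "UR-V" that !pte reports for the PTE.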

Manually Walking the Page Tables

To appreciate tools like !pte, let's look at an example of manually walking the page tables to resolve the physical address of data from its virtual address. To reduce complexity, I'm going to walk the page tables on a system that has KVAS disabled, but note that there is a slight twist in this example.
Let's look for nt!NtCreateFile. First, we can use the .formats command to get the binary representation of the address of nt!NtCreateFile. We also need the CR3 register, since it holds the physical address of the base of the page tables (the PML4).

0: kd> .formats nt!NtCreateFile
Evaluate expression:
  Hex:     fffff805`2e3ff090
  Decimal: -8773842243440
  Octal:   1777777600245617770220
  Binary:  11111111 11111111 11111000 00000101 00101110 00111111 11110000 10010000
  Chars:   .....?..
  Time:    ***** Invalid FILETIME
  Float:   low 4.3642e-011 high -1.#QNAN
  Double:  -1.#QNAN
0: kd> r cr3
cr3=00000000001ad000

Since addresses must be canonical, bits 63-48 will all be copies of bit 47. Below that, there is a 9-bit index into each level of the page tables, followed by the page offset:

  • Bits 47-39 = Page-Map Level 4 (PML4) entry (sometimes PXE)
  • Bits 38-30 = Page Directory Pointer Table (PDPT) entry (sometimes PPE)
  • Bits 29-21 = Page Directory Entry (PDE)
  • Bits 20-12 = Page Table Entry (PTE)
  • Bits 11-0 = Offset into physical page where the start of the data resides
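The bit ranges above translate directly into shifts and masks. This Python sketch, with the hypothetical name split_virtual_address, first checks that the address is canonical and then pulls out each 9-bit index and the 12-bit offset:

```python
def split_virtual_address(va: int) -> dict:
    """Break a 4-level x86_64 virtual address into its paging indices."""
    # Canonical: bits 63-48 must be copies of bit 47, so bits 63-47 are all 0 or all 1.
    if va >> 47 not in (0, 0x1FFFF):
        raise ValueError(f"{va:#x} is not canonical")
    return {
        "pml4": (va >> 39) & 0x1FF,  # bits 47-39
        "pdpt": (va >> 30) & 0x1FF,  # bits 38-30
        "pd": (va >> 21) & 0x1FF,    # bits 29-21
        "pt": (va >> 12) & 0x1FF,    # bits 20-12
        "offset": va & 0xFFF,        # bits 11-0
    }


parts = split_virtual_address(0xFFFFF8052E3FF090)  # nt!NtCreateFile
print({level: hex(index) for level, index in parts.items()})
```

For nt!NtCreateFile this gives a PML4 index of 0x1f0, a PDPT index of 0x14, a PD index of 0x171, a PT index of 0x1ff and an offset of 0x90.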

Let's break down the .formats output into each index:

                            PML4 idx.   PDPT idx.   PDT idx.    PTE idx.    page idx.
Binary:  11111111 11111111 [11111000 0][0000101 00][101110 001][11111 1111][0000 10010000]

Each level of the page hierarchy is just an array of 512 (0x200) _MMPTE_HARDWARE structures. To get the PML4 entry, index into the array starting at CR3 by the PML4 index found from the .formats command above. Remember the -p flag to dt or this will fail. Also, instead of prefixing binary with 0b, which would make too much sense, WinDbg prefixes binary with 0y.

0: kd> dt -p _MMPTE_HARDWARE @@C++(@cr3+@@(0y111110000)*sizeof(_MMPTE_HARDWARE))
nt!_MMPTE_HARDWARE
   +0x000 Valid            : 0y1
   +0x000 Dirty1           : 0y1
   +0x000 Owner            : 0y0
   +0x000 WriteThrough     : 0y0
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y1
   +0x000 Dirty            : 0y1
   +0x000 LargePage        : 0y0
   +0x000 Global           : 0y0
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y0
   +0x000 PageFrameNumber  : 0y000000000000000000000100101100001001 (0x4b09)
   +0x000 ReservedForHardware : 0y0000
   +0x000 ReservedForSoftware : 0y0000
   +0x000 WsleAge          : 0y0000
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y0

Let's also look at the entry with !dq:

0: kd> !dq @@C++(@cr3+@@(0y111110000)*sizeof(_MMPTE_HARDWARE)) L1
#  1adf80 00000000`04b09063

To get to the Page Directory Pointer Table (PDPT) entry from here, take the PageFrameNumber and shift it left by 12 (0xc) bits, back into its original position in _MMPTE_HARDWARE. That gives the physical base of the PDPT. Then add the PDPT index multiplied by sizeof(_MMPTE_HARDWARE), which is 8 bytes. You can also just mask the QWORD that represents the entry (ex. 0x0000000004b09063 & 0xfffffffff000).

0: kd> dt -p _MMPTE_HARDWARE @@C++((0x4b09<<0xc)+@@(0y000010100)*sizeof(_MMPTE_HARDWARE))
nt!_MMPTE_HARDWARE
   +0x000 Valid            : 0y1
   +0x000 Dirty1           : 0y1
   +0x000 Owner            : 0y0
   +0x000 WriteThrough     : 0y0
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y1
   +0x000 Dirty            : 0y1
   +0x000 LargePage        : 0y0
   +0x000 Global           : 0y0
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y0
   +0x000 PageFrameNumber  : 0y000000000000000000000100101100001010 (0x4b0a)
   +0x000 ReservedForHardware : 0y0000
   +0x000 ReservedForSoftware : 0y0000
   +0x000 WsleAge          : 0y0000
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y0
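The address arithmetic inside these dt commands is just "mask the entry (or CR3), then add index * 8". Here is a minimal Python sketch of it. The helper names table_base and entry_address are hypothetical.

```python
PFN_MASK = 0x0000FFFFFFFFF000  # bits 47-12: the PageFrameNumber field, left in place
ENTRY_SIZE = 8                 # sizeof(_MMPTE_HARDWARE)


def table_base(entry_or_cr3: int) -> int:
    """Physical base of the table that an entry (or CR3) points at."""
    return entry_or_cr3 & PFN_MASK


def entry_address(table: int, index: int) -> int:
    """Physical address of entry `index` in a 512-entry table."""
    return table + index * ENTRY_SIZE


pml4e = entry_address(table_base(0x1AD000), 0x1F0)           # CR3, PML4 index
pdpte = entry_address(table_base(0x0000000004B09063), 0x14)  # PML4E, PDPT index
print(hex(pml4e), hex(pdpte))
```

The first result, 0x1adf80, is the address that !dq printed for the PML4 entry. Masking gives the same base as shifting the PFN, since 0x4b09063 & PFN_MASK equals 0x4b09 << 12.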

Now for the PDE and PTE levels. They are calculated the same way, each using the PFN from the entry one level up.

0: kd> dt -p _MMPTE_HARDWARE @@C++((0x4b0a<<0xc)+@@(0y101110001)*sizeof(_MMPTE_HARDWARE))
nt!_MMPTE_HARDWARE
   +0x000 Valid            : 0y1
   +0x000 Dirty1           : 0y0
   +0x000 Owner            : 0y0
   +0x000 WriteThrough     : 0y0
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y1
   +0x000 Dirty            : 0y0
   +0x000 LargePage        : 0y1
   +0x000 Global           : 0y1
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y0
   +0x000 PageFrameNumber  : 0y000000000000000000000010110000000000 (0x2c00)
   +0x000 ReservedForHardware : 0y0000
   +0x000 ReservedForSoftware : 0y0000
   +0x000 WsleAge          : 0y1010
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y0
0: kd> dt -p _MMPTE_HARDWARE @@C++((0x2c00<<0xc)+@@(0y1111111110)*sizeof(_MMPTE_HARDWARE))
nt!_MMPTE_HARDWARE
   +0x000 Valid            : 0y0
   +0x000 Dirty1           : 0y0
   +0x000 Owner            : 0y1
   +0x000 WriteThrough     : 0y1
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y0
   +0x000 Dirty            : 0y1
   +0x000 LargePage        : 0y0
   +0x000 Global           : 0y1
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y1
   +0x000 PageFrameNumber  : 0y100001011111011011100000010111011000 (0x85f6e05d8)
   +0x000 ReservedForHardware : 0y0000
   +0x000 ReservedForSoftware : 0y1111
   +0x000 WsleAge          : 0y0000
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y0

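What happened here? The PTE does not seem valid. Note that the index used in that last command, 0y1111111110, also has one bit too many: bits 20-12 are 0y111111111, or 0x1ff. Either way the result is meaningless, because, as we are about to see, PFN 0x2c00 is not a page table at all. Pay close attention to the flags in the PDE.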

   +0x000 LargePage        : 0y1

This means that the PDE maps a large page directly, and the attributes from the PDE apply to every page in the range it covers. A large page at the PDE level takes the place of an entire page table (512 entries of 4KB each). It therefore maps 2MB (0x200000) of memory, which is why 21 bits are left over for the offset below.

There are also huge pages that work the same way, except at the PDPT level instead. A huge page takes the place of an entire page directory, so the math works out to 1GB:

0: kd> ? 0n512 * 0n512 * 0x1000 // << number of bytes calculation
Evaluate expression: 1073741824 = 00000000`40000000
0: kd> ? 0n1024 * 0n1024 * 0n1024 // << 1GB calculation
Evaluate expression: 1073741824 = 00000000`40000000

Back to our large page. To resolve the starting physical address in this situation, you just need to use the remaining bits (20-0) as an offset into the large page PFN. The diagram above (from the .formats command) becomes the following:

                            PML4 idx.   PDPT idx.   PDT idx.    page idx.
Binary:  11111111 11111111 [11111000 0][0000101 00][101110 001][11111 11110000 10010000]

Now we just need to do the math:

0: kd> ? (0x2c00<<c)+0y111111111000010010000
Evaluate expression: 48230544 = 00000000`02dff090
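The same large page calculation in Python looks like this. large_page_physical is a hypothetical helper. It masks bits 47-21 of the PDE rather than shifting the whole PFN field, because in a 2MB PDE bit 12 is the PAT bit, not part of the address. For this PDE the two approaches agree, since 0x2c00 << 12 is already 2MB aligned.

```python
LARGE_PAGE_SIZE = 1 << 21                   # 2MB, mapped by a single PDE
LARGE_PAGE_BASE_MASK = 0x0000FFFFFFE00000   # bits 47-21 of a large page PDE
LARGE_PAGE_BIT = 1 << 7                     # _MMPTE_HARDWARE.LargePage


def large_page_physical(pde: int, va: int) -> int:
    """Physical address of `va` when its PDE maps a 2MB large page."""
    if not pde & LARGE_PAGE_BIT:
        raise ValueError("PDE does not map a large page")
    # Base of the 2MB page plus the low 21 bits (20-0) of the virtual address.
    return (pde & LARGE_PAGE_BASE_MASK) + (va & (LARGE_PAGE_SIZE - 1))


print(hex(large_page_physical(0x0A00000002C001A1, 0xFFFFF8052E3FF090)))
```

This prints 0x2dff090, the same value that WinDbg computed above.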

Validate by dumping out what is at the virtual address for nt!NtCreateFile and what is at the physical address we calculated above:

0: kd> !dq (0x2c00<<c)+0y111111111000010010000
# 2dff090 33000000`88ec8148 44c77824`448948c0
# 2dff0a0 44890000`00207024 89602444`89486824
# 2dff0b0 00e02484`8b582444 8b485024`44890000
# 2dff0c0 89480000`00d82484 00d02484`8b482444
# 2dff0d0 848b4024`44890000 24448900`0000c824
# 2dff0e0 000000c0`24848b38 b824848b`30244489
# 2dff0f0 48282444`89000000 48000000`b024848b
# 2dff100 000017e8`20244489 00000088`c4814800
0: kd> dq nt!NtCreateFile
fffff805`2e3ff090  33000000`88ec8148 44c77824`448948c0
fffff805`2e3ff0a0  44890000`00207024 89602444`89486824
fffff805`2e3ff0b0  00e02484`8b582444 8b485024`44890000
fffff805`2e3ff0c0  89480000`00d82484 00d02484`8b482444
fffff805`2e3ff0d0  848b4024`44890000 24448900`0000c824
fffff805`2e3ff0e0  000000c0`24848b38 b824848b`30244489
fffff805`2e3ff0f0  48282444`89000000 48000000`b024848b
fffff805`2e3ff100  000017e8`20244489 00000088`c4814800

There you have it: validation that the process we followed was correct. If the PDE had not mapped a large page, the PTE would have been valid. Bits 11-0 would then have been the offset into the 4KB page whose PFN is stored in the PTE.
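Putting the previous steps together, here is a sketch of a software page walk in Python, in the spirit of what !vtop does. The `read_qword` callback stands in for reading physical memory, i.e. what !dq shows. In the example it is a dictionary seeded with the three entries read by hand above. The sketch handles 4KB pages, 2MB large pages and 1GB huge pages. It deliberately ignores permissions, reserved bits and 5-level paging. translate is a hypothetical name.

```python
from typing import Callable

PFN_MASK = 0x0000FFFFFFFFF000  # bits 47-12 of an entry
LARGE_PAGE = 1 << 7            # LargePage (PS) bit

# (level name, shift of its 9-bit index, page size if it maps a page directly)
LEVELS = [
    ("PML4E", 39, None),     # cannot map a page directly
    ("PDPTE", 30, 1 << 30),  # 1GB huge page
    ("PDE", 21, 1 << 21),    # 2MB large page
    ("PTE", 12, 1 << 12),    # regular 4KB page
]


def translate(cr3: int, va: int, read_qword: Callable[[int], int]) -> int:
    """Resolve `va` to a physical address with a 4-level page walk."""
    table = cr3 & PFN_MASK  # low CR3 bits may hold flags, so drop them
    for name, shift, page_size in LEVELS:
        entry = read_qword(table + ((va >> shift) & 0x1FF) * 8)
        if not entry & 1:
            raise LookupError(f"{name} for {va:#x} is not valid")
        if name == "PTE" or (page_size and entry & LARGE_PAGE):
            offset_mask = page_size - 1
            return (entry & PFN_MASK & ~offset_mask) | (va & offset_mask)
        table = entry & PFN_MASK  # descend to the next level's table
    raise AssertionError("unreachable: the PTE level always returns")


# Physical memory seeded with the three entries read by hand above.
memory = {
    0x1ADF80: 0x0000000004B09063,   # PML4E
    0x4B090A0: 0x0000000004B0A063,  # PDPTE
    0x4B0AB88: 0x0A00000002C001A1,  # PDE with LargePage set
}
print(hex(translate(0x1AD000, 0xFFFFF8052E3FF090, memory.__getitem__)))
```

Like the manual walk, this stops at the large page PDE and prints 0x2dff090.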

WinDbg Commands

Of course it is very annoying to do that whole process manually, so WinDbg provides two ways to accomplish what we just looked at. The !pte command will take what is in CR3 and walk the page tables with the virtual address you give it. To match up with the same example as above:

0: kd> !pte nt!NtCreateFile
                                           VA fffff8052e3ff090
PXE at FFFFE5F2F97CBF80    PPE at FFFFE5F2F97F00A0    PDE at FFFFE5F2FE014B88    PTE at FFFFE5FC02971FF8
contains 0000000004B09063  contains 0000000004B0A063  contains 0A00000002C001A1  contains 0000000000000000
pfn 4b09      ---DA--KWEV  pfn 4b0a      ---DA--KWEV  pfn 2c00      -GL-A--KREV  LARGE PAGE pfn 2dff 

This shows the virtual addresses of each level in the hierarchy as well as a breakdown of what each _MMPTE_HARDWARE structure contains. There is also the !vtop command, which will let you specify what page table base (hardware address) to use as the base of the page tables (PML4). This will become useful to us in investigating KVAS, because we want to be able to look at each page table without having to change CR3. Again mirroring the example above to show what data it provides:

0: kd> r cr3
cr3=00000000001ad000
0: kd> ? nt!NtCreateFile
Evaluate expression: -8773842243440 = fffff805`2e3ff090
0: kd> !vtop 1ad000 fffff8052e3ff090
Amd64VtoP: Virt fffff8052e3ff090, pagedir 00000000001ad000
Amd64VtoP: PML4E 00000000001adf80
Amd64VtoP: PDPE 0000000004b090a0
Amd64VtoP: PDE 0000000004b0ab88
Amd64VtoP: Large page mapped phys 0000000002dff090
Virtual address fffff8052e3ff090 translates to physical address 2dff090.

You can examine the addresses via dump commands prefixed with ! (ex. !dq, !dd, !dc) and by using dump type (dt) with the -p flag for physical addresses.
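Back to the virtual addresses that !pte printed for each level. Windows maps the paging structures themselves into kernel virtual memory through a self-referencing PML4 entry, and that is how !pte can report a virtual address for every level. Under that mapping, the address of the entry that maps any virtual address is a base plus an index. Applying the same calculation to its own result moves up one level. The PTE base below (0xFFFFE58000000000) is an assumption inferred from the !pte output above. It is randomized at boot on modern Windows 10 builds, so it will differ on your machine. The helper names are hypothetical.

```python
VA_MASK = (1 << 48) - 1


def canonical(addr48: int) -> int:
    """Sign-extend a 48-bit address by copying bit 47 into bits 63-48."""
    if addr48 & (1 << 47):
        return addr48 | 0xFFFF000000000000
    return addr48


def pte_address(va: int, pte_base: int) -> int:
    """Virtual address of the paging entry that maps `va`, via the self-map."""
    offset = ((va & VA_MASK) >> 12) * 8
    return canonical(((pte_base & VA_MASK) + offset) & VA_MASK)


# Assumption: inferred from the !pte output above; changes on every boot.
PTE_BASE = 0xFFFFE58000000000

va = 0xFFFFF8052E3FF090        # nt!NtCreateFile
pte = pte_address(va, PTE_BASE)
pde = pte_address(pte, PTE_BASE)
ppe = pte_address(pde, PTE_BASE)
pxe = pte_address(ppe, PTE_BASE)
for name, addr in (("PXE", pxe), ("PPE", ppe), ("PDE", pde), ("PTE", pte)):
    print(f"{name} at {addr:016X}")
```

The four printed addresses match the "PXE at ... PTE at" line from `!pte nt!NtCreateFile`.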

Note that !vtop doesn't play nicely with symbols, registers, or WinDbg-formatted numbers (with the ` separator), so make sure arguments are plain hexadecimal before passing them in. For example, the following commands are invalid to !vtop:

0: kd> !vtop 1ad000 nt!NtCreateFile
Amd64VtoP: Virt 0000000000000000, pagedir 00000000001ad000
Amd64VtoP: PML4E 00000000001ad000
Amd64VtoP: PDPE 0000000100ee1000
Amd64VtoP: zero PDPE
Virtual address 0 translation fails, error 0xD0000147.
0: kd> !vtop @cr3 fffff8052e3ff090
usage: vtop PFNOfPDE VA
0: kd> !vtop 1ad000 fffff805`2e3ff090
Amd64VtoP: Virt 00000000fffff805, pagedir 00000000001ad000
Amd64VtoP: PML4E 00000000001ad000
Amd64VtoP: PDPE 0000000100ee1018
Amd64VtoP: zero PDPE
Virtual address fffff805 translation fails, error 0xD0000147.

We will be using these commands to walk the page tables for the rest of the post, but it is good to know how to manually walk them.

SMEP

SMEP stands for Supervisor Mode Execution Prevention (or sometimes Protection). The idea here is that code in lower privileged memory pages should never be trusted (i.e. executed) by a higher privileged mode. For standard SMEP, this means executable pages allocated in user mode should not be executed while in kernel mode. It is enforced by the CPU itself and requires hardware support. Processors started rolling out support for this feature around 2012 for Intel (Ivy Bridge) and 2014 for AMD (Family 17h, Family 15h model >60h). SMEP is enabled on a supported processor when bit 20 of the CR4 register is set, and this is consistent between AMD and Intel processors.

Do you remember the Owner bit (U/K) from the _MMPTE_HARDWARE structure? This bit says whether a page belongs to user mode or kernel mode, and it is how SMEP is enforced. When in kernel mode (supervisor mode), if the Owner bit is 1, then the page is owned by user mode and code inside it must not be executed. This raises the question: what if we can flip that bit? Can we execute those pages? The answer is yes, absolutely, at least until KVAS was introduced. My favorite presentation on this topic is from EKOParty 2015 by Enrique Nissim and Nicolas Economou, called Windows SMEP Bypass U=S. We will examine why KVAS mitigates this attack soon.
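Here is a minimal Python model of the check. It assumes the rule as the Intel manuals describe it: a kernel-mode instruction fetch faults when CR4.SMEP is set and the Owner (U/S) bit is 1 at every level of the walk. The model reuses the kernel32 entries from the !pte output earlier. Their PTE is also NX, which this model ignores. smep_blocks_fetch is a hypothetical name.

```python
CR4_SMEP = 1 << 20   # CR4 bit 20
OWNER_BIT = 1 << 2   # _MMPTE_HARDWARE.Owner (U/S)


def smep_blocks_fetch(cr4: int, walk: list) -> bool:
    """Would a kernel-mode instruction fetch through this walk fault under SMEP?"""
    user_page = all(entry & OWNER_BIT for entry in walk)
    return bool(cr4 & CR4_SMEP) and user_page


# PXE, PPE, PDE and PTE for kernel32 from the !pte output earlier.
walk = [0x0A000000BBF48867, 0x0A000000BC34E867,
        0x0A000000BC34F867, 0x8100000003806025]
print(smep_blocks_fetch(CR4_SMEP, walk))  # a user page: blocked
walk[-1] &= ~OWNER_BIT                    # the U=S trick: clear the PTE's Owner bit
print(smep_blocks_fetch(CR4_SMEP, walk))  # now treated as a kernel page
```

Clearing the Owner bit in a single entry of the walk is enough to make the page look like a supervisor page. That single bit is what the U=S bypass targets.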

Another technology that implements the same sort of trust boundary that SMEP enforces is called Mode-Based Execution Control (MBEC, or just MBE Control), which a hypervisor enforces on its guest(s). I'm not going to deep dive into that here. At a high level, MBEC gives the hypervisor separate execute permissions in its Extended Page Tables (EPT) for guest user-mode and guest kernel-mode code. The hypervisor can then keep guest kernel code from executing pages it does not trust, much like SMEP does for user pages. Another interesting note about hypervisors: it's also possible to implement software SMEP via EPT permissions. Here's a post from 2014 detailing how this might be done.

There is also Supervisor Mode Access Prevention (SMAP), which is a newer control that prevents accesses to user mode while in kernel mode, unless certain conditions are met. It can be turned on via bit 21 of CR4 on supported processors. This is not entirely relevant to this post, so I'll skip the details on this one for now as well.

KVAS Implementation in Brief

To avoid information disclosure from a successful exploit of the Meltdown vulnerability, separate page tables are kept for user mode and kernel mode for each process. The general term for this technology is Kernel Page Table Isolation (KPTI), and Kernel Virtual Address Shadow (KVAS) is the Windows-specific implementation of KPTI. The user mode version of the page tables does not even contain mappings for (almost all) kernel addresses. The kernel mode version contains mappings for both the user and kernel address spaces. Some pages exist in both sets, like KUSER_SHARED_DATA and the system call handler. The system call handler itself replaces CR3 on entry to and exit from the kernel, and so do the other kernel entry/exit points. We will be looking specifically at the system call handler for this example.

Check out the Microsoft blog post describing the implementation. Fortinet also has a great post on the internals of how KVAS is initialized in the kernel.

Your first thought with this implementation may be: "that sounds very memory expensive!". The overhead of having two sets of paging structures (which occupy some memory) per process is definitely nonzero. However, one optimization relies on the fact that Microsoft does not consider the boundary between an administrator account and the kernel to be a security boundary. Processes that execute in an elevated context do not use KVAS at all! From Microsoft:

Because these applications are fully trusted by the operating system, and already have (or could obtain) the capability to load drivers that could naturally access kernel memory, KVA shadowing is not required for fully-privileged applications.

This includes applications that are run by users in the BUILTIN\Administrators group and "processes that execute as a fully-elevated administrator account". Remember: this is an information disclosure concern, so if that information can already be accessed, disclosing it is not a concern. Low privileged users should not be able to leak kernel memory, so this mitigation will be in full effect for those users.


To begin to understand the implementation of KVAS in the Windows kernel, we can look at important fields in the nt!_KPRCB and nt!_KPROCESS structures:

0: kd> dt _KPROCESS DirectoryTableBase UserDirectoryTableBase AddressPolicy
ntdll!_KPROCESS
   +0x028 DirectoryTableBase     : Uint8B
   +0x388 UserDirectoryTableBase : Uint8B
   +0x390 AddressPolicy          : UChar
0: kd> dt nt!_KPRCB KernelDirectoryTableBase RspBaseShadow UserRspShadow ShadowFlags
   +0x8e80 KernelDirectoryTableBase : Uint8B
   +0x8e88 RspBaseShadow            : Uint8B
   +0x8e90 UserRspShadow            : Uint8B
   +0x8e98 ShadowFlags              : Uint4B

Before KVAS, _KPROCESS.DirectoryTableBase held the base of the page tables for a particular process. Remember, on a system without KVAS or in a process where KVAS is disabled, the user and kernel page tables are not separated, so _KPROCESS.DirectoryTableBase is moved into CR3 on process context switch. When KVAS is enabled, _KPROCESS.DirectoryTableBase holds the complete (user and kernel) page table base. The value of _KPROCESS.DirectoryTableBase is moved into _KPRCB.KernelDirectoryTableBase when a process context switch occurs. The user-only page table base is held in _KPROCESS.UserDirectoryTableBase. The _KPROCESS.AddressPolicy field tells the kernel if a process participates in KVAS. If _KPROCESS.AddressPolicy is 1, then KVAS is disabled for the process; if it is 0, then KVAS is enabled. _KPRCB.ShadowFlags holds flags that tell the kernel if KVAS is enabled for the process (according to _KPROCESS.AddressPolicy) and which page table is active. On entry points to the kernel, the value from _KPRCB.KernelDirectoryTableBase is loaded into CR3. On exit from the kernel _KPROCESS.UserDirectoryTableBase is moved into CR3. _KPRCB.RspBaseShadow and _KPRCB.UserRspShadow hold the stack pointer for each mode and are loaded into RSP at entry/exit from the kernel, respectively.

In a KVAS participating process, the hardware address in CR3 has some flags in the bottom bits: bit 0 is set for a user mode page table and bit 1 is set for a kernel mode page table. This can be seen by examining _KPROCESS.DirectoryTableBase and _KPROCESS.UserDirectoryTableBase for a KVAS participating process (explorer.exe):

0: kd> !process 0 0 explorer.exe
PROCESS ffffb68d61dd9080
    SessionId: 1  Cid: 1098    Peb: 00fa4000  ParentCid: 1078
    DirBase: bd6de002  ObjectTable: ffffde87c9020e00  HandleCount: 2120.
    Image: explorer.exe

0: kd> .process /i /p ffffb68d61dd9080
0: kd> dt _KPROCESS @$proc DirectoryTableBase UserDirectoryTableBase
ntdll!_KPROCESS
   +0x028 DirectoryTableBase     : 0xbd6de002
   +0x388 UserDirectoryTableBase : 0xbd6dd001

To use the !vtop command with these values, just mask off the bottom bits.
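Both !vtop pitfalls, the flag bits in the page table base and the backtick in WinDbg-formatted addresses, can be handled mechanically. vtop_command is a hypothetical helper that builds the command line as plain hex:

```python
def vtop_command(directory_table_base: int, address: str) -> str:
    """Format a !vtop command from a DirectoryTableBase and a WinDbg address."""
    page_table = directory_table_base & ~0xFFF  # drop the low flag bits (0x1/0x2)
    va = int(address.replace("`", ""), 16)      # remove WinDbg's ` separator
    return f"!vtop {page_table:x} {va:x}"


print(vtop_command(0xBD6DE002, "fffff804`74c13180"))  # kernel page tables
print(vtop_command(0xBD6DD001, "fffff804`74c13180"))  # user page tables
```

Python's `int(..., 16)` also accepts a leading `0x`, so values copied from dt output work too.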


The system call handler is different on systems with KVAS enabled. On x86_64 systems, the address of the system call handler is stored in Model Specific Register (MSR) 0xC0000082 (LSTAR). On an x86_64 machine with KVAS explicitly disabled, the system call handler is KiSystemCall64, as shown below:

0: kd> db nt!KiKvaShadow L1
fffff805`2ec01840  00                                               .
0: kd> rdmsr c0000082
msr[c0000082] = fffff805`2e2066c0
0: kd> ln fffff805`2e2066c0
Browse module
Set bu breakpoint

(fffff805`2e2066c0)   nt!KiSystemCall64   |  (fffff805`2e206900)   nt!KiSystemServiceUser

At the top of the system call handler, you can see that RSP is saved to _KPCR.UserRsp and _KPRCB.RspBase is moved into RSP. _KPCR.UserRsp is then pushed onto the kernel stack so that it can be recovered later (at the end of the system call handler).

[Figure: KiSystemCall64, the system call handler when KVAS is disabled for the system]

Next, let's look at the system call handler that is used when KVAS is enabled on the system:

0: kd> db nt!KiKvaShadow L1
fffff804`75001840  01                                               .
0: kd> rdmsr c0000082
msr[c0000082] = fffff804`74c13180
0: kd> ln fffff804`74c13180
Browse module
Set bu breakpoint

(fffff804`74c13180)   nt!KiSystemCall64Shadow   |  (fffff804`74c14060)   nt!_guard_retpoline_icall_handler

KiSystemCall64Shadow is used. The beginning of this function is similar to KiSystemCall64, with a few extra steps:

  • It backs up RSP to _KPRCB.UserRspShadow.
  • It loads _KPRCB.KernelDirectoryTableBase into CR3, unless the second bit of _KPRCB.ShadowFlags is set. That bit being set means KVAS is not enforced for the current process, so no switch is needed.
  • It restores the kernel stack pointer to RSP from _KPRCB.RspBaseShadow.
  • It pushes _KPRCB.UserRspShadow to the stack (as opposed to _KPCR.UserRsp).

See the disassembly below:

[Figure: KiSystemCall64Shadow, the system call handler when KVAS is enabled for the system]

At the end of KiSystemCall64Shadow there is a jump to KiSystemServiceUser which is partway through KiSystemCall64.

[Figure: the end of KiSystemCall64Shadow, which jumps to the label KiSystemServiceUser in the middle of KiSystemCall64]

At the end of KiSystemCall64 there is a test to see if KiKvaShadow is 1 (KVAS enabled). If it is, a jump to KiKernelSysretExit is made.

[Figure: the end of KiSystemCall64, which calls KiKernelSysretExit if KVAS is enabled]

KiKernelSysretExit checks the 2nd bit of _KPRCB.ShadowFlags to see if KVAS is enforced for the process (0 = enforced, 1 = not enforced). If it is enforced, then _KPROCESS.UserDirectoryTableBase is loaded into CR3. If the low bit of _KPROCESS.UserDirectoryTableBase is set and the low bit of _KPRCB.ShadowFlags is set, then the low bit of _KPRCB.ShadowFlags is cleared, indicating that the user page table is now in use.

[Figure: KiKernelSysretExit checks whether CR3 needs to be updated on exit from the kernel]
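The CR3 decision made on entry to and exit from the kernel can be summarized in a small Python model. It is deliberately simplified. It follows the meaning given above for bit 1 of _KPRCB.ShadowFlags (0 = enforced, 1 = not enforced), and it assumes the entry path applies the same test. It ignores the stack swaps, the low-bit bookkeeping in ShadowFlags, and any high bit on the kernel table address. The class and function names are hypothetical.

```python
from dataclasses import dataclass

SHADOW_NOT_ENFORCED = 0x2  # bit 1 of _KPRCB.ShadowFlags: 1 = KVAS not enforced


@dataclass
class ShadowState:
    """Hypothetical model of the per-processor state described above."""
    cr3: int
    kernel_directory_table_base: int  # _KPRCB.KernelDirectoryTableBase
    user_directory_table_base: int    # _KPROCESS.UserDirectoryTableBase
    shadow_flags: int                 # _KPRCB.ShadowFlags


def on_kernel_entry(state: ShadowState) -> None:
    """Entry path (e.g. KiSystemCall64Shadow): switch to the kernel tables."""
    if not state.shadow_flags & SHADOW_NOT_ENFORCED:
        state.cr3 = state.kernel_directory_table_base


def on_kernel_exit(state: ShadowState) -> None:
    """Exit path (KiKernelSysretExit): switch back to the user tables."""
    if not state.shadow_flags & SHADOW_NOT_ENFORCED:
        state.cr3 = state.user_directory_table_base


# explorer.exe's tables from earlier; KVAS is enforced, so bit 1 is clear.
s = ShadowState(cr3=0xBD6DD001, kernel_directory_table_base=0xBD6DE002,
                user_directory_table_base=0xBD6DD001, shadow_flags=0x0)
on_kernel_entry(s)
print(hex(s.cr3))  # kernel page table while in the kernel
on_kernel_exit(s)
print(hex(s.cr3))  # user page table again after returning
```

For a process where KVAS is not enforced, neither function touches CR3. That matches the elevated-process optimization described earlier.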

KiKernelSysretExit is called in a few different places. Unsurprisingly, these places are exit points from the kernel.

[Figure: cross references to KiKernelSysretExit from several kernel exit-point functions]


Next, let's look at cross references of KiKvaShadow just to get an idea of what functions are affected by KVAS.

[Figure: cross references to KiKvaShadow; the shadow flag is checked in many places]

There are quite a few functions where this flag is checked. Investigating interesting functions is an exercise left up to the reader.


Now that we have seen a few places where the kernel switches up CR3, let's look at thread context switching to see how it is handled. Thread context switching is performed by the nt!KiSwapContext function, which saves the context and then calls nt!SwapContext:

[Figure: KiSwapContext, a small function that calls SwapContext]

The RCX and RDX registers hold the destination and source _KTHREAD structures, respectively. These values are moved into RSI and RDI in preparation for a call to nt!SwapContext. An overview of SwapContext can be seen below:

[Figure: overview of SwapContext, a fairly large function]

In SwapContext, RDI is a pointer to the thread being switched out and RSI is a pointer to the thread being switched in. Among other things, and especially important to us, SwapContext is responsible for:

  • switching the correct page table into CR3
  • checking the destination process's address policy
  • setting up _KPRCB.ShadowFlags as well as _KPRCB.KernelDirectoryTableBase

If the destination process is the same as the source process, all of this is unnecessary and is skipped. If they are different, then they may have different address policies.

The destination process (ApcState.Process of the thread in RSI) is loaded into R14. Then, if KVAS is enabled on the system, the 2nd bit of _KPROCESS.DirectoryTableBase is checked to see if it is a kernel page table. If it is, the high bit of the page table address will be set and the low bit of _KPRCB.ShadowFlags will be set. Next:

  • The (potentially) modified kernel page table address is moved into _KPRCB.KernelDirectoryTableBase.
  • The high bit of the page table address is cleared.
  • The 2nd bit of _KPRCB.ShadowFlags is masked off (cleared).
  • _KPROCESS.AddressPolicy is checked. If the address policy is 1 (KVAS not enforced), then _KPRCB.ShadowFlags is xor-ed with 3 (0b11). This sets the 2nd bit and clears the first, resulting in a _KPRCB.ShadowFlags value of 2.

Then, the page table address is put into CR3. Interrupts are disabled (cli) and then re-enabled (sti) around this work so that it cannot be interrupted partway through. If running under Hyper-V, then instead of writing CR3 directly, a hypercall is made to switch address spaces.

[Figure: SwapContext sets the correct ShadowFlags based on a number of checks, then updates CR3 with the new page table base]
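To make the order of operations easier to follow, here is the same SwapContext sequence written out as a Python sketch. It is a transcription of the prose description above, not decompiled code. The function name is hypothetical, and "high bit" is assumed to mean bit 63 of the page table address.

```python
HIGH_BIT = 1 << 63  # assumption: the "high bit" of the page table address


def swap_context_shadow(directory_table_base: int, address_policy: int,
                        shadow_flags: int) -> tuple:
    """Transcription of the described SwapContext steps.

    Returns (value written to CR3, KernelDirectoryTableBase, ShadowFlags).
    """
    table = directory_table_base
    if table & 0x2:                 # 2nd bit set: a kernel page table
        table |= HIGH_BIT
        shadow_flags |= 0x1
    kernel_directory_table_base = table
    table &= ~HIGH_BIT              # clear the high bit again
    shadow_flags &= ~0x2            # mask off the 2nd bit of ShadowFlags
    if address_policy == 1:         # KVAS not enforced for this process
        shadow_flags ^= 0x3         # set the 2nd bit, clear the 1st
    return table, kernel_directory_table_base, shadow_flags


cr3, kdtb, flags = swap_context_shadow(0xBD6DE002, address_policy=0, shadow_flags=0)
print(hex(cr3), hex(kdtb), flags)
```

For explorer.exe (address policy 0) this ends with a ShadowFlags value of 1. With an address policy of 1 the same input ends with 2, as described above.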

A few blocks down, the thread's initial stack (_KTHREAD.InitialStack) is saved in _KPRCB.RspBase and either _KPCR.TssBase->Rsp0 or _KPRCB.RspBaseShadow; the latter is used on a KVAS enabled system.

[Figure: SwapContext keeps the current thread's kernel stack base in different places for KVAS and non-KVAS processes]

On examination of these fields, we can see that on a KVAS enabled system _KPRCB.RspBase, _KPRCB.RspBaseShadow, and _KTHREAD.InitialStack are all the same value.

0: kd> dt _KPCR @$pcr Prcb.UserRspShadow Prcb.RspBase Prcb.RspBaseShadow TssBase->Rsp0
ntdll!_KPCR
   +0x008 TssBase            : 
      +0x004 Rsp0               : 0xfffff804`78c64200
   +0x180 Prcb               : 
      +0x028 RspBase            : 0xffff828b`d7d02c90
      +0x8e88 RspBaseShadow      : 0xffff828b`d7d02c90
      +0x8e90 UserRspShadow      : 0x555ee68
0: kd> dt _KTHREAD @$thread InitialStack
ntdll!_KTHREAD
   +0x028 InitialStack : 0xffff828b`d7d02c90 Void

On a KVAS disabled system, _KPCR.TssBase->Rsp0, _KPRCB.RspBase, and _KTHREAD.InitialStack are all the same value.

0: kd> dt _KPCR @$pcr Prcb.UserRspShadow Prcb.RspBase Prcb.RspBaseShadow TssBase->Rsp0
nt!_KPCR
   +0x008 TssBase            : 
      +0x004 Rsp0               : 0xfffff805`31d3cc90
   +0x180 Prcb               : 
      +0x028 RspBase            : 0xfffff805`31d3cc90
      +0x8e88 RspBaseShadow      : 0
      +0x8e90 UserRspShadow      : 0
0: kd> dt _KTHREAD @$thread InitialStack
nt!_KTHREAD
   +0x028 InitialStack : 0xfffff805`31d3cc90 Void

A final question: What do all of these functions have in common?
They are all in the KVASCODE section of the kernel binary.

[Figure: the KVASCODE section, which is mapped in both sets of page tables]

This section of the kernel binary is mapped in both sets of page tables! To validate this claim, let's use !vtop to resolve nt!KiSystemCall64Shadow (0xfffff80474c13180) in both sets of page tables.

0: kd> dt _KPROCESS @$proc DirectoryTableBase UserDirectoryTableBase
ntdll!_KPROCESS
   +0x028 DirectoryTableBase     : 0xbd6de002
   +0x388 UserDirectoryTableBase : 0xbd6dd001
0: kd> !vtop 0xbd6de000 0xfffff80474c13180
Amd64VtoP: Virt fffff80474c13180, pagedir 00000000bd6de000
Amd64VtoP: PML4E 00000000bd6def80
Amd64VtoP: PDPE 0000000004809088
Amd64VtoP: PDE 000000000480ad30
Amd64VtoP: Large page mapped phys 0000000003213180
Virtual address fffff80474c13180 translates to physical address 3213180.
0: kd> !vtop 0xbd6dd000 0xfffff80474c13180
Amd64VtoP: Virt fffff80474c13180, pagedir 00000000bd6dd000
Amd64VtoP: PML4E 00000000bd6ddf80
Amd64VtoP: PDPE 000000013cd21088
Amd64VtoP: PDE 000000013cd20d30
Amd64VtoP: PTE 000000013cd27098
Amd64VtoP: Mapped phys 0000000003213180
Virtual address fffff80474c13180 translates to physical address 3213180.

The address maps successfully to physical address 3213180 in both sets of page tables for this particular process. This makes sense. If these functions didn't exist in both sets of page tables, the switch could not be done properly: at some point during the function (either before or after the CR3 write), the code being executed would not be mapped by the active page table.

Experiments

Now onto my experiments. For each experiment I will run the same commands on a system with KVAS enabled and also on a system with KVAS disabled and note the differences. Hopefully this will help you understand the implementation a bit better! I know it has helped me.

KVAS Implementation

For the first experiment, I will show the effect of KVAS. On the KVAS enabled system, I will find a kernel function that is mapped in one set of page tables but not in the other. I will also show that the system call handler differs between the two systems.

First, I will switch to the context of explorer.exe and read MSR 0xC0000082 (LSTAR), which holds the address of the 64-bit system call handler. Next, I will look up the page tables used by the process and try to resolve the physical address of nt!NtCreateFile in each of them using !vtop.

KVAS Enabled

1: kd> !cpuinfo
CP  F/M/S Manufacturer  MHz PRCB Signature    MSR 8B Signature Features
 0  6,158,10 GenuineIntel 2592 000000d600000000                   311b3dff
 1  6,158,10 GenuineIntel 2592 000000d600000000 >000000d600000000<311b3dff
                      Cached Update Signature 000000d600000000
                     Initial Update Signature 000000d600000000
1: kd> !process 0 0 explorer.exe
PROCESS ffffb68d61dd9080
    SessionId: 1  Cid: 1098    Peb: 00fa4000  ParentCid: 1078
    DirBase: bd6de002  ObjectTable: ffffde87c9020e00  HandleCount: 2360.
    Image: explorer.exe

1: kd> .process /i /p ffffb68d61dd9080
You need to continue execution (press 'g' ) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
1: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff804`745fd0b0 cc              int     3
1: kd> .reload
Connected to Windows 10 19041 x64 target at (Fri Nov 27 20:52:29.550 2020 (UTC - 5:00)), ptr64 TRUE
Loading Kernel Symbols
...............................................................
................................................................
Loading User Symbols
.......................................................
Loading unloaded module list
1: kd> dt nt!_KPROCESS @$proc DirectoryTableBase
   +0x028 DirectoryTableBase : 0xbd6de002
1: kd> dt nt!_KPROCESS @$proc UserDirectoryTableBase
   +0x388 UserDirectoryTableBase : 0xbd6dd001
1: kd> dt nt!_KPCR @$pcr Prcb.KernelDirectoryTableBase
   +0x180 Prcb                          : 
      +0x8e80 KernelDirectoryTableBase      : 0x80000000`bd6de002
1: kd> rdmsr c0000082
msr[c0000082] = fffff804`74c13180
1: kd> ln fffff804`74c13180
Browse module
Set bu breakpoint

(fffff804`74c13180)   nt!KiSystemCall64Shadow   |  (fffff804`74c14060)   nt!_guard_retpoline_icall_handler
Exact matches:
1: kd> ? nt!NtCreateFile
Evaluate expression: -8776958611312 = fffff804`747ff090
1: kd> !vtop
usage: vtop PFNOfPDE VA
1: kd> !vtop 0xbd6de000 0xfffff804747ff090
Amd64VtoP: Virt fffff804747ff090, pagedir 00000000bd6de000
Amd64VtoP: PML4E 00000000bd6def80
Amd64VtoP: PDPE 0000000004809088
Amd64VtoP: PDE 000000000480ad18
Amd64VtoP: Large page mapped phys 0000000002dff090
Virtual address fffff804747ff090 translates to physical address 2dff090.
1: kd> !vtop 0xbd6dd000 0xfffff804747ff090
Amd64VtoP: Virt fffff804747ff090, pagedir 00000000bd6dd000
Amd64VtoP: PML4E 00000000bd6ddf80
Amd64VtoP: PDPE 000000013cd21088
Amd64VtoP: PDE 000000013cd20d18
Amd64VtoP: zero PDE
Virtual address fffff804747ff090 translation fails, error 0xD0000147.
1: kd> r cr3
cr3=00000000bd6de002
1: kd> !pte nt!NtCreateFile
                                           VA fffff804747ff090
PXE at FFFF87C3E1F0FF80    PPE at FFFF87C3E1FF0088    PDE at FFFF87C3FE011D18    PTE at FFFF87FC023A3FF8
contains 0000000004809063  contains 000000000480A063  contains 0A00000002C000A1  contains 0000000000000000
pfn 4809      ---DA--KWEV  pfn 480a      ---DA--KWEV  pfn 2c00      --L-A--KREV  LARGE PAGE pfn 2dff        

1: kd> !pte ntdll!NtCreateFile
                                           VA 00007ffe181ec830
PXE at FFFF87C3E1F0F7F8    PPE at FFFF87C3E1EFFFC0    PDE at FFFF87C3DFFF8600    PTE at FFFF87BFFF0C0F60
contains 8A0000003F8EA867  contains 0A0000003DFF0867  contains 0A0000003DFF1867  contains 01000001006B4025
pfn 3f8ea     ---DA--UW-V  pfn 3dff0     ---DA--UWEV  pfn 3dff1     ---DA--UWEV  pfn 1006b4    ----A--UREV

KVAS Disabled

0: kd> !cpuinfo
CP  F/M/S Manufacturer  MHz PRCB Signature    MSR 8B Signature Features
 0  6,158,10 GenuineIntel 2592 000000d600000000 >000000d600000000<311b3dff
 1  6,158,10 GenuineIntel 2592 000000d600000000                   311b3dff
0: kd> !process 0 0 explorer.exe
PROCESS ffffc8064497b340
    SessionId: 1  Cid: 1038    Peb: 0090c000  ParentCid: 100c
    DirBase: beb3c000  ObjectTable: ffffa2827c3a1800  HandleCount: 2254.
    Image: explorer.exe

0: kd> .process /i /p ffffc8064497b340
You need to continue execution (press 'g' ) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff805`2e1fd0b0 cc              int     3
0: kd> .reload
Connected to Windows 10 19041 x64 target at (Fri Nov 27 20:52:32.030 2020 (UTC - 5:00)), ptr64 TRUE
Loading Kernel Symbols
...............................................................
................................................................
...............................................................
Loading User Symbols
.............
Loading unloaded module list
.............
0: kd> dt nt!_KPROCESS @$proc DirectoryTableBase
   +0x028 DirectoryTableBase : 0xbeb3c000
0: kd> dt nt!_KPROCESS @$proc UserDirectoryTableBase
   +0x388 UserDirectoryTableBase : 0
0: kd> dt nt!_KPCR @$pcr Prcb.KernelDirectoryTableBase
   +0x180 Prcb                          : 
      +0x8e80 KernelDirectoryTableBase      : 0
0: kd> rdmsr c0000082
msr[c0000082] = fffff805`2e2066c0
0: kd> ln fffff805`2e2066c0
Browse module
Set bu breakpoint

(fffff805`2e2066c0)   nt!KiSystemCall64   |  (fffff805`2e206900)   nt!KiSystemServiceUser
Exact matches:
0: kd> ? nt!NtCreateFile
Evaluate expression: -8773842243440 = fffff805`2e3ff090
0: kd> !vtop 0xbeb3c000 0xfffff8052e3ff090
Amd64VtoP: Virt fffff8052e3ff090, pagedir 00000000beb3c000
Amd64VtoP: PML4E 00000000beb3cf80
Amd64VtoP: PDPE 0000000004b090a0
Amd64VtoP: PDE 0000000004b0ab88
Amd64VtoP: Large page mapped phys 0000000002dff090
Virtual address fffff8052e3ff090 translates to physical address 2dff090.
0: kd> r cr3
cr3=00000000beb3c000
0: kd> !pte nt!NtCreateFile
                                           VA fffff8052e3ff090
PXE at FFFFE5F2F97CBF80    PPE at FFFFE5F2F97F00A0    PDE at FFFFE5F2FE014B88    PTE at FFFFE5FC02971FF8
contains 0000000004B09063  contains 0000000004B0A063  contains 0A00000002C001A1  contains 0000000000000000
pfn 4b09      ---DA--KWEV  pfn 4b0a      ---DA--KWEV  pfn 2c00      -GL-A--KREV  LARGE PAGE pfn 2dff        

0: kd> !pte ntdll!NtCreateFile
                                           VA 00007ffc3608c830
PXE at FFFFE5F2F97CB7F8    PPE at FFFFE5F2F96FFF80    PDE at FFFFE5F2DFFF0D80    PTE at FFFFE5BFFE1B0460
contains 0A000000BC048867  contains 0A0000000604E867  contains 0A00000005350867  contains 010000006A1EC025
pfn bc048     ---DA--UWEV  pfn 604e      ---DA--UWEV  pfn 5350      ---DA--UWEV  pfn 6a1ec     ----A--UREV

Results

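On the KVAS enabled system, nt!NtCreateFile resolves to physical address 2dff090 through the kernel page tables. The lookup fails (zero PDE) in the user page tables. That is KVAS doing its job: most of the kernel image is simply not mapped while user mode code runs. The system call handlers also differ. LSTAR points to nt!KiSystemCall64Shadow with KVAS enabled and to nt!KiSystemCall64 with KVAS disabled. On the KVAS disabled system there is only one set of page tables, so _KPROCESS.UserDirectoryTableBase and _KPRCB.KernelDirectoryTableBase are both 0.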
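The !pte output above lists the raw PDE that maps nt!NtCreateFile. Below is a minimal Python sketch that turns that PDE into the physical address !vtop reports for a 2 MB large page, and treats a non-present PDE as a failed translation. The helper is my own and follows the Intel SDM bit layout. The sketch also makes one more difference between the two systems easy to spot in the !pte flag strings: bit 8 (Global, the "G" in "-GL-A--KREV") is set in this PDE only on the KVAS disabled system.

```python
# Decode a page directory entry the way !vtop does for a 2 MB large page (sketch).
VALID = 1 << 0        # P bit ("Valid" in _MMPTE_HARDWARE)
LARGE_PAGE = 1 << 7   # PS bit: the PDE maps a 2 MB page directly
GLOBAL = 1 << 8       # G bit
LARGE_BASE_MASK = 0x0000_FFFF_FFE0_0000  # bits 21-47: 2 MB-aligned physical base


def translate_via_pde(pde, va):
    """Physical address of `va` through a large-page PDE, or None if the PDE is not present."""
    if not pde & VALID:
        return None  # e.g. the "zero PDE" in the user page tables
    if not pde & LARGE_PAGE:
        raise ValueError("PDE references a page table; one more level must be walked")
    return (pde & LARGE_BASE_MASK) | (va & 0x1F_FFFF)  # base plus offset within the 2 MB page


kvas_on_pde = 0x0A00000002C000A1   # nt!NtCreateFile PDE, kernel page tables, KVAS enabled
kvas_off_pde = 0x0A00000002C001A1  # nt!NtCreateFile PDE, KVAS disabled
print(hex(translate_via_pde(kvas_on_pde, 0xFFFFF804747FF090)))  # 0x2dff090
print(translate_via_pde(0, 0xFFFFF804747FF090))                 # None
print(bool(kvas_on_pde & GLOBAL), bool(kvas_off_pde & GLOBAL))  # False True
```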

Software SMEP

For the next test, I will show that Software SMEP is enforced at the top level of the page tables on a KVAS enabled system.

For each set of page tables the process uses, I will use !vtop to find the PML4 entry that maps ntdll!NtCreateFile. Then I will dump that entry's permission bits with dt -p, which reads the structure from a physical address.

KVAS Enabled

1: kd> ? ntdll!NtCreateFile
Evaluate expression: 140729303091248 = 00007ffe`181ec830
1: kd> !vtop 0xbd6dd000 0x00007ffe181ec830
Amd64VtoP: Virt 00007ffe181ec830, pagedir 00000000bd6dd000
Amd64VtoP: PML4E 00000000bd6dd7f8
Amd64VtoP: PDPE 000000003f8eafc0
Amd64VtoP: PDE 000000003dff0600
Amd64VtoP: PTE 000000003dff1f60
Amd64VtoP: Mapped phys 00000001006b4830
Virtual address 7ffe181ec830 translates to physical address 1006b4830.
1: kd> !vtop 0xbd6de000 0x00007ffe181ec830
Amd64VtoP: Virt 00007ffe181ec830, pagedir 00000000bd6de000
Amd64VtoP: PML4E 00000000bd6de7f8
Amd64VtoP: PDPE 000000003f8eafc0
Amd64VtoP: PDE 000000003dff0600
Amd64VtoP: PTE 000000003dff1f60
Amd64VtoP: Mapped phys 00000001006b4830
Virtual address 7ffe181ec830 translates to physical address 1006b4830.
1: kd> dt -p nt!_MMPTE_HARDWARE @@(0x0000000bd6dd7f8)
   +0x000 Valid            : 0y1
   +0x000 Dirty1           : 0y1
   +0x000 Owner            : 0y1
   +0x000 WriteThrough     : 0y0
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y1
   +0x000 Dirty            : 0y1
   +0x000 LargePage        : 0y0
   +0x000 Global           : 0y0
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y1
   +0x000 PageFrameNumber  : 0y000000000000000000111111100011101010 (0x3f8ea)
   +0x000 ReservedForHardware : 0y0000
   +0x000 ReservedForSoftware : 0y0000
   +0x000 WsleAge          : 0y1010
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y0
1: kd> dt -p nt!_MMPTE_HARDWARE @@(0x00000000bd6de7f8)
   +0x000 Valid            : 0y1
   +0x000 Dirty1           : 0y1
   +0x000 Owner            : 0y1
   +0x000 WriteThrough     : 0y0
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y1
   +0x000 Dirty            : 0y1
   +0x000 LargePage        : 0y0
   +0x000 Global           : 0y0
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y1
   +0x000 PageFrameNumber  : 0y000000000000000000111111100011101010 (0x3f8ea)
   +0x000 ReservedForHardware : 0y0000
   +0x000 ReservedForSoftware : 0y0000
   +0x000 WsleAge          : 0y1010
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y1

KVAS Disabled

0: kd> ? ntdll!NtCreateFile
Evaluate expression: 140721215031344 = 00007ffc`3608c830
0: kd> !vtop 0xbeb3c000 0x00007ffc3608c830
Amd64VtoP: Virt 00007ffc3608c830, pagedir 00000000beb3c000
Amd64VtoP: PML4E 00000000beb3c7f8
Amd64VtoP: PDPE 00000000bc048f80
Amd64VtoP: PDE 000000000604ed80
Amd64VtoP: PTE 0000000005350460
Amd64VtoP: Mapped phys 000000006a1ec830
Virtual address 7ffc3608c830 translates to physical address 6a1ec830.
0: kd> dt -p nt!_MMPTE_HARDWARE @@(0x0000000beb3c7f8)
   +0x000 Valid            : 0y1
   +0x000 Dirty1           : 0y1
   +0x000 Owner            : 0y1
   +0x000 WriteThrough     : 0y0
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y1
   +0x000 Dirty            : 0y1
   +0x000 LargePage        : 0y0
   +0x000 Global           : 0y0
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y1
   +0x000 PageFrameNumber  : 0y000000000000000010111100000001001000 (0xbc048)
   +0x000 ReservedForHardware : 0y0000
   +0x000 ReservedForSoftware : 0y0000
   +0x000 WsleAge          : 0y1010
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y0

Results

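In the kernel page tables, the PML4 entry covering this user mode address has the NoExecute bit set. In the user page tables, the same entry has it clear. Every other field, including the PFN, is identical, so both entries point to the same lower-level tables. A NoExecute bit at any level of the walk makes the final translation non-executable. As a result, any attempt to execute user mode code while the kernel page tables are loaded in CR3 faults, even if the processor does not support SMEP. The KVAS disabled system has only one set of page tables, shared by user and kernel mode, so its PML4 entry has to leave user code executable.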
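The rule above can be written as a small computation over the four entries of one walk. Below is a minimal Python sketch using the Intel SDM rules, assuming NX is enabled (IA32_EFER.NXE = 1). A page is user accessible only if U/S ("Owner") is set at every level, and non-executable if XD ("NoExecute") is set at any level. The helper is hypothetical. The entries come from the !pte output for ntdll!NtCreateFile. The user table's PML4E value is reconstructed from its dt -p fields, which match the kernel one except for NoExecute.

```python
# Combine the per-level permission bits of one page walk (sketch, Intel SDM semantics).
OWNER = 1 << 2        # U/S ("Owner" in _MMPTE_HARDWARE): 1 = user accessible
NO_EXECUTE = 1 << 63  # XD ("NoExecute" in _MMPTE_HARDWARE)


def effective_rights(walk):
    """walk = [PML4E, PDPTE, PDE, PTE]; user only if every level allows it, NX if any level sets it."""
    return {
        "user": all(entry & OWNER for entry in walk),
        "no_execute": any(entry & NO_EXECUTE for entry in walk),
    }


# Lower levels are shared by both sets of page tables (identical !vtop entry addresses).
lower = [0x0A0000003DFF0867, 0x0A0000003DFF1867, 0x01000001006B4025]
kernel_pml4e = 0x8A0000003F8EA867        # PXE from !pte while the kernel tables were in CR3
user_pml4e = kernel_pml4e & ~NO_EXECUTE  # same fields per dt -p, NoExecute clear

print(effective_rights([kernel_pml4e] + lower))  # {'user': True, 'no_execute': True}
print(effective_rights([user_pml4e] + lower))    # {'user': True, 'no_execute': False}
```

Only the top-level entry differs between the two walks, and that single bit is enough to make the whole user half of the address space non-executable under the kernel CR3.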

KVAS Disabled in Privileged Processes

Next up is showing that KVAS is disabled for privileged/elevated processes.

I will switch to a non-elevated instance of cmd.exe and look at _KPROCESS.DirectoryTableBase, _KPROCESS.UserDirectoryTableBase, _KPROCESS.AddressPolicy, _KPRCB.KernelDirectoryTableBase, and _KPRCB.ShadowFlags and then I will show the same fields when in the context of an elevated cmd.exe instance.

Non-Elevated Process

0: kd> !process 0 0 cmd.exe
PROCESS ffffb68d5b96f080
    SessionId: 1  Cid: 0dd4    Peb: 100343000  ParentCid: 1098
    DirBase: 0785a002  ObjectTable: ffffde87d25062c0  HandleCount:  68.
    Image: cmd.exe

0: kd> .process /i /p ffffb68d5b96f080
You need to continue execution (press 'g' ) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff804`745fd0b0 cc              int     3
1: kd> dt nt!_KPROCESS @$proc DirectoryTableBase UserDirectoryTableBase AddressPolicy
   +0x028 DirectoryTableBase     : 0x785a002
   +0x388 UserDirectoryTableBase : 0xbb659001
   +0x390 AddressPolicy          : 0 ''
1: kd> dt nt!_KPRCB @$prcb KernelDirectoryTableBase ShadowFlags
   +0x8e80 KernelDirectoryTableBase : 0x80000000`0785a002
   +0x8e98 ShadowFlags              : 1

Elevated Process

0: kd> !process 0 0 cmd.exe
PROCESS ffffb68d63bb7080
    SessionId: 1  Cid: 0a58    Peb: 52134af000  ParentCid: 1098
    DirBase: 8b073002  ObjectTable: ffffde87d250e100  HandleCount:  65.
    Image: cmd.exe

0: kd> .process /i /p ffffb68d63bb7080
You need to continue execution (press 'g' ) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff804`745fd0b0 cc              int     3
1: kd> dt nt!_KPROCESS @$proc DirectoryTableBase UserDirectoryTableBase AddressPolicy
   +0x028 DirectoryTableBase     : 0x8b073002
   +0x388 UserDirectoryTableBase : 1
   +0x390 AddressPolicy          : 0x1 ''
1: kd> dt nt!_KPRCB @$prcb KernelDirectoryTableBase ShadowFlags
   +0x8e80 KernelDirectoryTableBase : 0x80000000`8b073002
   +0x8e98 ShadowFlags              : 2

Results

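The non-elevated process has a valid _KPROCESS.UserDirectoryTableBase (0xbb659001), a _KPROCESS.AddressPolicy of 0, and _KPRCB.ShadowFlags set to 1 (bit 0). The elevated process has no valid user page table (_KPROCESS.UserDirectoryTableBase is just 1), a _KPROCESS.AddressPolicy of 1, and _KPRCB.ShadowFlags set to 2 (bit 1). In both cases, _KPRCB.KernelDirectoryTableBase is the process's DirectoryTableBase with bit 63 set.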

Faults

In this section, I will test for software SMEP by trying every combination of KVAS enabled/disabled and SMEP enabled/disabled. For each case I've written down an expected result, just for fun. Let's see whether my assumptions match reality!

To test, I'll switch to the context of a KVAS enabled process (or any process on the KVAS disabled system). While the processor is still in kernel mode, I'll point the instruction pointer at executable user mode code, then single step to see what happens to the system. For the SMEP disabled cases, I clear bit 20 of CR4 (the SMEP bit) with r cr4 just before stepping.
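The expected results below follow from two checks the processor applies to an instruction fetch in kernel mode. Below is a minimal Python model of those checks. It is my own simplification: it assumes IA32_EFER.NXE = 1 and ignores other fault causes such as not-present pages. It also assumes, as the Software SMEP experiment showed, that with KVAS enabled the kernel page tables in CR3 mark user addresses NoExecute at the PML4 level.

```python
# Simplified model of a CPL 0 instruction fetch check (assumes IA32_EFER.NXE = 1).
CR4_SMEP = 1 << 20  # the bit cleared with: r cr4=@@C++(@cr4 & ~(1<<20))


def kernel_fetch_faults(cr4, walk_no_execute, walk_user):
    """True if a supervisor-mode fetch through this translation faults."""
    if walk_no_execute:
        return True  # NX anywhere in the walk blocks execution, SMEP or not
    if cr4 & CR4_SMEP and walk_user:
        return True  # SMEP blocks supervisor fetches from user pages
    return False


def predicted_fault(kvas, smep):
    """Kernel code runs on the kernel page tables; with KVAS their user PML4Es carry NoExecute."""
    cr4 = CR4_SMEP if smep else 0
    return kernel_fetch_faults(cr4, walk_no_execute=kvas, walk_user=True)


# Only the KVAS=False, SMEP=False combination prints fault=False.
for kvas in (True, False):
    for smep in (True, False):
        print(f"KVAS={kvas!s:5} SMEP={smep!s:5} -> fault={predicted_fault(kvas, smep)}")
```

In this model the KVAS enabled, SMEP disabled case still faults purely through the NoExecute bit. That is the software SMEP effect the next four tests probe.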

KVAS Enabled, SMEP Enabled

Expected result: fault on user mode page execution in kernel mode

0: kd> !process 0 0 explorer.exe
PROCESS ffff848c6c231340
    SessionId: 1  Cid: 10b8    Peb: 00d61000  ParentCid: 1064
    DirBase: b3bd4002  ObjectTable: ffffc40b7e99ec00  HandleCount: 1684.
    Image: explorer.exe

0: kd> .process /i /p ffff848c6c231340
You need to continue execution (press 'g' ) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff804`445fd0b0 cc              int     3
1: kd> .reload /user
Loading User Symbols
................................................................
................................................................
................................................................
................................................................
..........
1: kd> u kernel32+216e L1
KERNEL32!SortGetSortKey+0xede:
00007fff`a74c216e cc              int     3
1: kd> r rip=kernel32+216e
1: kd> p
KERNEL32!SortGetSortKey+0xedf:
00007fff`a74c216f fc              cld
1: kd> p
KDTARGET: Refreshing KD connection

*** Fatal System Error: 0x000000fc
                       (0x00007FFFA74C216F,0x0200000008782025,0xFFFFFB874D733940,0x0000000080000005)


A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

nt!DbgBreakPointWithStatus:
fffff804`445fd0b0 cc              int     3
1: kd> g
Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

nt!DbgBreakPointWithStatus:
fffff804`445fd0b0 cc              int     3
1: kd> !analyze -v
Connected to Windows 10 19041 x64 target at (Fri Nov 27 22:45:37.050 2020 (UTC - 5:00)), ptr64 TRUE
Loading Kernel Symbols
...............................................................
................................................................
.............................................................
Loading User Symbols
................................................................
................................................................
................................................................
................................................................
..........
Loading unloaded module list
................
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY (fc)
An attempt was made to execute non-executable memory.  The guilty driver
is on the stack trace (and is typically the current instruction pointer).
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: 00007fffa74c216f, Virtual address for the attempted execute.
Arg2: 0200000008782025, PTE contents.
Arg3: fffffb874d733940, (reserved)
Arg4: 0000000080000005, (reserved)

KVAS Disabled, SMEP Enabled

Expected result: fault on user mode page execution in kernel mode

0: kd> !process 0 0 explorer.exe
PROCESS ffff9787d1477080
    SessionId: 1  Cid: 10ac    Peb: 01182000  ParentCid: 1094
    DirBase: b2f75000  ObjectTable: ffff8601a3fc3200  HandleCount: 1911.
    Image: explorer.exe

0: kd> .process /i /p ffff9787d1477080
You need to continue execution (press 'g' ) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff801`2a9fd0b0 cc              int     3
1: kd> .reload /user
Loading User Symbols
................................................................
................................................................
................................................................
..............................................................
1: kd> u kernel32+216e L1
KERNEL32!SortGetSortKey+0xede:
00007ffb`a752216e cc              int     3
1: kd> r rip=kernel32+216e
1: kd> p
KERNEL32!SortGetSortKey+0xedf:
00007ffb`a752216f fc              cld
1: kd> p
KDTARGET: Refreshing KD connection

*** Fatal System Error: 0x000000fc
                       (0x00007FFBA752216F,0x010000000A8B1025,0xFFFFED060F7F0940,0x0000000080000005)


A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

nt!DbgBreakPointWithStatus:
fffff801`2a9fd0b0 cc              int     3
1: kd> g
Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

nt!DbgBreakPointWithStatus:
fffff801`2a9fd0b0 cc              int     3
1: kd> !analyze -v
Connected to Windows 10 19041 x64 target at (Fri Nov 27 22:48:37.554 2020 (UTC - 5:00)), ptr64 TRUE
Loading Kernel Symbols
...............................................................
................................................................
.............................................................
Loading User Symbols
................................................................
................................................................
................................................................
..............................................................
Loading unloaded module list
..............
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY (fc)
An attempt was made to execute non-executable memory.  The guilty driver
is on the stack trace (and is typically the current instruction pointer).
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: 00007ffba752216f, Virtual address for the attempted execute.
Arg2: 010000000a8b1025, PTE contents.
Arg3: ffffed060f7f0940, (reserved)
Arg4: 0000000080000005, (reserved)

KVAS Enabled, SMEP Disabled

Expected result: fault on user mode page execution in kernel mode via Software SMEP

0: kd> !process 0 0 explorer.exe
PROCESS ffffd18ad3a31340
    SessionId: 1  Cid: 0acc    Peb: 00c3c000  ParentCid: 0d20
    DirBase: 3f159002  ObjectTable: ffffac865a3e3780  HandleCount: 1667.
    Image: explorer.exe

0: kd> .process /i /p ffffd18ad3a31340
You need to continue execution (press 'g' ) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
1: kd> .reload /user
Loading User Symbols
................................................................
................................................................
................................................................
................................................................


************* Symbol Loading Error Summary **************
Module name            Error
SharedUserData         No error - symbol load deferred

You can troubleshoot most symbol related issues by turning on symbol loading diagnostics (!sym noisy) and repeating the command that caused symbols to be loaded.
You should also verify that your symbol search path (.sympath) is correct.
1: kd> u kernel32+216e L1
KERNEL32!SortGetSortKey+0xede:
00007ff9`abfd216e cc              int     3
1: kd> r rip=kernel32+216e
1: kd> r cr4=@@C++(@cr4 & ~(1<<20))
1: kd> p
KERNEL32!SortGetSortKey+0xedf:
00007ff9`abfd216f fc              cld
1: kd> p
KDTARGET: Refreshing KD connection

*** Fatal System Error: 0x000000fc
                       (0x00007FF9ABFD216F,0x030000000F670025,0xFFFF9884AB7F0940,0x0000000080000005)


A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

nt!DbgBreakPointWithStatus:
fffff803`753fd0b0 cc              int     3
1: kd> !analyze -v
The debuggee is ready to run
1: kd> g
Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

nt!DbgBreakPointWithStatus:
fffff803`753fd0b0 cc              int     3
1: kd> !analyze -v
Connected to Windows 10 19041 x64 target at (Fri Nov 27 22:40:28.176 2020 (UTC - 5:00)), ptr64 TRUE
Loading Kernel Symbols
...............................................................
................................................................
.............................................................
Loading User Symbols
................................................................
................................................................
................................................................
................................................................

Loading unloaded module list
.............................

************* Symbol Loading Error Summary **************
Module name            Error
SharedUserData         No error - symbol load deferred

You can troubleshoot most symbol related issues by turning on symbol loading diagnostics (!sym noisy) and repeating the command that caused symbols to be loaded.
You should also verify that your symbol search path (.sympath) is correct.
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY (fc)
An attempt was made to execute non-executable memory.  The guilty driver
is on the stack trace (and is typically the current instruction pointer).
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: 00007ff9abfd216f, Virtual address for the attempted execute.
Arg2: 030000000f670025, PTE contents.
Arg3: ffff9884ab7f0940, (reserved)
Arg4: 0000000080000005, (reserved)

KVAS Disabled, SMEP Disabled

Expected result: successful execution in a user mode page

0: kd> !process 0 0 explorer.exe
PROCESS ffff840ec792c340
    SessionId: 1  Cid: 1050    Peb: 00380000  ParentCid: 1024
    DirBase: b2f4f000  ObjectTable: ffff948cdda96d40  HandleCount: 1952.
    Image: explorer.exe

0: kd> .process /i /p ffff840ec792c340
You need to continue execution (press 'g' ) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff806`743fd0b0 cc              int     3
1: kd> .reload /user
Loading User Symbols
................................................................
................................................................
................................................................
..............................................................
1: kd> r rip=kernel32+216e
1: kd> u kernel32+216e
KERNEL32!SortGetSortKey+0xede:
00007ff8`b5a0216e cc              int     3
00007ff8`b5a0216f fc              cld
00007ff8`b5a02170 ff              ???
00007ff8`b5a02171 ff418b          inc     dword ptr [rcx-75h]
00007ff8`b5a02174 c24d8d          ret     8D4Dh
00007ff8`b5a02177 3c44            cmp     al,44h
00007ff8`b5a02179 0f1f8000000000  nop     dword ptr [rax]
00007ff8`b5a02180 418d0413        lea     eax,[r11+rdx]
1: kd> u kernel32+216e L1
KERNEL32!SortGetSortKey+0xede:
00007ff8`b5a0216e cc              int     3
1: kd> r cr4=@@C++(@cr4 & ~(1<<20))
1: kd> p
KERNEL32!SortGetSortKey+0xedf:
00007ff8`b5a0216f fc              cld
1: kd> p
00007ff8`b5a02170 ff              ???

No crash!!

Results

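As expected, every test except the last one crashed the system. Interestingly, in each test that crashed, the step over the int 3 appeared to succeed. The fault (Arg1 of the bugcheck) was reported on the following cld instruction. Instruction caching? Or just how the CPU is designed? More likely, the CPU never fetched the int 3 at all. The NX and SMEP checks happen at instruction fetch, so executing it would have faulted right there. The debugger probably recognized the hard-coded breakpoint at RIP and advanced past it itself, which made the cld the first real fetch from the user page. Very interesting!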

Wrap up

I hope you've learned a thing or two from this. I've been wanting to do this investigation for a while, just to nail down the implementation details here. If you have questions, feel free to reach out on Twitter @jgeigerm. For now and as always ~~h a v e f u n i n s i d e~~.

Bonus: WinDbg Bug

There's a bug in the dt command: it sign extends bit 31 of the 64-bit address it is given, which makes it impossible to use dt -p on some physical addresses:

1: kd> dt -p nt!_MMPTE_HARDWARE 0x0000000bd6de7f8
   +0x000 Valid            : ??
   +0x000 Dirty1           : ??
   +0x000 Owner            : ??
   +0x000 WriteThrough     : ??
   +0x000 CacheDisable     : ??
   +0x000 Accessed         : ??
   +0x000 Dirty            : ??
   +0x000 LargePage        : ??
   +0x000 Global           : ??
   +0x000 CopyOnWrite      : ??
   +0x000 Unused           : ??
   +0x000 Write            : ??
   +0x000 PageFrameNumber  : ??
   +0x000 ReservedForHardware : ??
   +0x000 ReservedForSoftware : ??
   +0x000 WsleAge          : ??
   +0x000 WsleProtection   : ??
   +0x000 NoExecute        : ??
Memory read error ffffffffbd6de7f8

Totally bogus! The workaround I found was to wrap the value in an explicit expression evaluator, either @@C++(...) or @@(...):

1: kd> dt -p nt!_MMPTE_HARDWARE @@C++(0x00000000bd6de7f8)
   +0x000 Valid            : 0y1
   +0x000 Dirty1           : 0y1
   +0x000 Owner            : 0y1
   +0x000 WriteThrough     : 0y0
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y1
   +0x000 Dirty            : 0y1
   +0x000 LargePage        : 0y0
   +0x000 Global           : 0y0
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y1
   +0x000 PageFrameNumber  : 0y000000000000000000111111100011101010 (0x3f8ea)
   +0x000 ReservedForHardware : 0y0000
   +0x000 ReservedForSoftware : 0y0000
   +0x000 WsleAge          : 0y1010
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y1
1: kd> dt -p nt!_MMPTE_HARDWARE @@(0x00000000bd6de7f8)
   +0x000 Valid            : 0y1
   +0x000 Dirty1           : 0y1
   +0x000 Owner            : 0y1
   +0x000 WriteThrough     : 0y0
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y1
   +0x000 Dirty            : 0y1
   +0x000 LargePage        : 0y0
   +0x000 Global           : 0y0
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y1
   +0x000 PageFrameNumber  : 0y000000000000000000111111100011101010 (0x3f8ea)
   +0x000 ReservedForHardware : 0y0000
   +0x000 ReservedForSoftware : 0y0000
   +0x000 WsleAge          : 0y1010
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y1
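The bogus address in the read error is exactly what sign extension from bit 31 produces. Below is a minimal Python model of the behavior observed above. The helper is hypothetical. It only covers values that fit in 32 bits, because that is all the output shows. If the model holds, physical addresses below 2 GB (bit 31 clear) are unaffected.

```python
def dt_observed_address(addr):
    """Hypothetical model of the dt bug above: bit 31 of a 32-bit value is sign extended."""
    if addr > 0xFFFF_FFFF:
        raise ValueError("only modeled for values that fit in 32 bits")
    if addr & 0x8000_0000:
        return addr | 0xFFFF_FFFF_0000_0000  # copy bit 31 into bits 32-63
    return addr


print(hex(dt_observed_address(0xBD6DE7F8)))  # 0xffffffffbd6de7f8, the address in the read error
print(hex(dt_observed_address(0x3F8EAFC0)))  # 0x3f8eafc0, bit 31 clear so unchanged
```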

Other resources I didn't find a place for but still wanted to include