Manually analyzing the stack is often the most powerful technique when dealing with crashes such as these. Typically, this involves dumping the current stack pointer register (for example, esp and rsp on x86 and x64 processors, respectively). However, because the code responsible for crashing the system itself might modify the stack in ways that make analysis difficult, the processor responsible for crashing the system provides a backing store for the current data in the stack, called KiPreBugcheckStackSaveArea, which contains a copy of the stack before any code in KeBugCheckEx executes. By using the dps (dump pointer with symbols) command in the debugger, you can dump this area (instead of the CPU’s stack pointer register) and resolve symbols in an attempt to discover any potential stack traces. In this crash, here’s what dumping the stack area eventually revealed on a 32-bit system:0: kd> dps KiPreBugcheckStackSaveArea KiPreBugcheckStackSaveArea+3000
81d7dd20 881fcc44
81d7dd24 98fcf406 myfault+0x406
81d7dd28 badb0d00
Although this data was located among many other different functions, it is of special interest because it mentions a function in the Myfault driver, which as we’ve seen was currently executing an IRP, that doesn’t show on the stack. For more information on manual stack analysis, see the Debugging Tools for Windows help file and the additional resources referenced later in this chapter.
Hung or Unresponsive Systems
If a system becomes unresponsive (that is, you are receiving no response to keyboard or mouse input), the mouse freezes, or you can move the mouse but the system doesn’t respond to clicks, the system is said to have hung . A number of things can cause the system to hang:
A device driver does not return from its interrupt service (ISR) routine or deferred procedure call (DPC) routine
A high priority real-time thread preempts the windowing system driver’s input threads
A deadlock (when two threads or processors hold resources each other wants and neither will yield what they have) occurs in kernel mode
You can check for deadlocks by using the Driver Verifier option called deadlock detection . Deadlock detection monitors the use of spinlocks, mutexes, and fast mutexes, looking for patterns that could result in a deadlock. (For more information on these and other synchronization primitives, see Chapter 3 in Part 1.) If one is found, Driver Verifier crashes the system with an indication of which driver causes the deadlock. The simplest form of deadlock occurs when two threads hold resources each other thread wants and neither will yield what they have or give up waiting for the one they want. The first step to troubleshooting hung systems is therefore to enable deadlock detection on suspect drivers, then unsigned drivers, and then all drivers, until you get a crash that pinpoints the driver causing the deadlock.
There are two ways to approach a hanging system so that you can apply the manual crash troubleshooting techniques described in this chapter to determine what driver or component is causing the hang: the first is to crash the hung system and hope that you get a dump that you can analyze, and the second is to break into the system with a kernel debugger and analyze the system’s activity. Both approaches require prior setup and a reboot. You use the same exploration of system state with both approaches to try to determine the cause of the hang.
To manually crash a hung system, you must first add the DWORD registry value HKLM\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters\CrashOnCtrlScroll and set it to 1. After rebooting, the i8042 port driver, which is the port driver for PS/2 keyboard input, monitors keystrokes in its ISR (discussed further in Chapter 3 in Part 1) looking for two presses of the Scroll Lock key while the right Control key is depressed. When the driver sees that sequence, it calls KeBugCheckEx with the MANUALLY_INITIATED_CRASH (0xE2) stop code that indicates a manually initiated crash. When the system reboots, open the crash dump file and apply the techniques mentioned earlier to try to determine why the system was hung (for example, determining what thread was running when the system hung, what the kernel stack indicates was happening, and so on). Note that this works for most hung system scenarios, but it won’t work if the i8042 port driver’s ISR doesn’t execute. (The i8042 port driver’s ISR won’t execute if all processors are hung as a result of their IRQL being higher than the ISR’s IRQL, or if corruption of system data structures extends to interrupt-related code or data.)
Note