Troubleshooting general kernel-mode memory corruption with Driver Verifier

  • 7010703
  • 28-Aug-2012
  • 08-Jul-2016

Environment

Novell Client 2 SP2 for Windows 7
Novell Client 2 SP2 for Windows Vista
Novell Client 2 SP2 for Windows 2008
Novell Client 2 SP2 for Windows 2008 R2

Situation

General kernel-mode memory corruption

System crash (blue screen, BSOD) with errors such as:
DRIVER_CORRUPTED_EXPOOL (0xC5)
DRIVER_IRQL_NOT_LESS_OR_EQUAL (0xD1)

Some kernel-mode crashes occur at seemingly random locations. This is consistent with Windows kernel-mode memory corruption, where whichever "victim" happens to have its memory corrupted is what crashes, rather than the crash occurring in the driver that is actually malfunctioning. Troubleshooting such issues requires following Microsoft's recommendations for tracking down kernel-mode memory corruption.

Resolution

The Microsoft Driver Verifier tool can be used to trap the activities of a specific suspected driver.  Here it is used to determine whether Novell Client code is corrupting memory.

To enable Driver Verifier for the Novell Client for trapping potential memory corruption on Windows 7 platforms, use the following steps:

1. On a Windows machine where this issue has been occurring, right-click the Windows "Command Prompt" shortcut and select "Run as administrator" so that the session has full administrator permissions.

2. In the elevated Command Prompt session, execute "VERIFIER.EXE".  This will launch an application named "Driver Verifier Manager".

3. On the "Select a task" list, select "Create custom settings (for code developers)" and press Next.

4. On the "Create custom settings" list, select "Select individual settings from a full list" and press Next.

5. On the "Select individual settings from this full list" page, leave EVERYTHING de-selected except "Pool tracking", "Special pool", and "Force IRQL checking".  All three of these should be checked.  Press Next.
 
Note: Enabling "Deadlock Detection" will create false positives.  Driver Verifier will trap on conditions which "may" represent problems but are not actual failure scenarios, which does not aid in the effort to "trap when a driver is actually misbehaving."  Driver Verifier is simply tracking "Lock B was acquired after Lock A", without knowledge of whether any other mechanism prevented that from being an actual problem.  For the case of a deadlock, Novell Client engineering will only look at dumps of an actually deadlocked machine, not dumps of a machine where Driver Verifier was running with "Deadlock Detection" enabled.
 
Note: Enabling "I/O Verification" will unfortunately enable a check that blue screens the machine unless a kernel-mode debugger is attached.  The check raises an assertion because the Novell-specific error values (e.g. 0x8801, 0x899A) do not conform to what Windows expects as an error value (e.g. 0xC0000005, 0x80070001).  This is not actually a fatal or problem condition; the software sending and receiving this status knows exactly what it means.  When a kernel-mode debugger is attached, Microsoft appropriately allows you to simply "ignore" this Driver Verifier assertion.  But when Driver Verifier is enabled on a machine without a kernel-mode debugger attached, this check will force the machine to blue screen under otherwise normal circumstances, and therefore does not aid in the effort to "trap when a driver is actually misbehaving."

6. On the "Select what drivers to verify" list, select "Select driver names from a list" and press Next.

7. Once Driver Verifier Manager presents the list of drivers, select all of the Novell Client drivers.  Not all of these drivers are actually involved in the file system access and file or directory deletion code paths, but enable Driver Verifier for all of them so that all Novell Client code can be monitored and ruled out:

ncrecognizer.sys, ncfilter.sys, ncuncfilter.sys, nicm.sys, ncioctl.sys, ncpl.sys, nsns.sys, nipctl.sys, nscm.sys, ncfsd.sys, ncp.sys, niam.sys, nsvccost.sys, nciom.sys, xtxplat.sys, ndm.sys, ndmndap.sys, ncpfsp.sys, nccache.sys

Drivers are selected by putting a checkmark in the "Verify?" column beside each of the drivers in the list Driver Verifier presented.  Once all the indicated drivers have been selected, press Finish.

8. Now reboot the machine, then log in and use the system normally.  The system will likely be noticeably slower, particularly during Novell Client operations, due to the overhead introduced by the Driver Verifier checks that were enabled and the extra memory consumed by Special Pool usage.

At this point, you are attempting to reproduce the issue, or waiting for it to recur.  The intention is that, if one of the Novell Client drivers is responsible for corrupting or misusing kernel-mode memory, the Driver Verifier checks will trap that misuse when it occurs, rather than allowing the corruption to happen and some "random victim" to crash later when it tries to use the corrupted memory.
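For reference, the selections made in Driver Verifier Manager in steps 3 through 7 can also be expressed as a single VERIFIER.EXE command line.  The sketch below only builds and prints that command; it does not run it.  It assumes the option bit values Microsoft documents for "verifier /flags" (0x1 Special pool, 0x2 Force IRQL checking, 0x8 Pool tracking); confirm them against "verifier /?" on the target machine before relying on them.  The function name build_verifier_command is purely illustrative.

```python
# Option bits for "verifier /flags", as documented by Microsoft.
# ASSUMPTION: verify these values with "verifier /?" on the target system.
SPECIAL_POOL = 0x1
FORCE_IRQL_CHECKING = 0x2
POOL_TRACKING = 0x8

# Novell Client drivers listed in step 7 of this document.
NOVELL_CLIENT_DRIVERS = [
    "ncrecognizer.sys", "ncfilter.sys", "ncuncfilter.sys", "nicm.sys",
    "ncioctl.sys", "ncpl.sys", "nsns.sys", "nipctl.sys", "nscm.sys",
    "ncfsd.sys", "ncp.sys", "niam.sys", "nsvccost.sys", "nciom.sys",
    "xtxplat.sys", "ndm.sys", "ndmndap.sys", "ncpfsp.sys", "nccache.sys",
]


def build_verifier_command(drivers=NOVELL_CLIENT_DRIVERS):
    """Return the verifier.exe argument list matching steps 3 through 7.

    Deadlock detection and I/O verification are deliberately left out,
    for the reasons given in the notes under step 5.
    """
    flags = SPECIAL_POOL | FORCE_IRQL_CHECKING | POOL_TRACKING
    return ["verifier.exe", "/flags", f"0x{flags:X}", "/driver", *drivers]


if __name__ == "__main__":
    # Print only; paste the result into an elevated Command Prompt to apply it.
    print(" ".join(build_verifier_command()))
```

As with the GUI procedure, the settings take effect after a reboot.  Running "verifier /querysettings" from an elevated Command Prompt should list the options and drivers that are configured, and "verifier /reset" clears them once testing is complete.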

If the machine blue screens while deleting a file or directory after Driver Verifier is enabled, the evidence we want to collect is still the memory dump generated by Windows.  If the issue occurred and wasn't due to the Novell Client, the crash will again be in just "some random location" because we didn't catch the code that was actually corrupting memory.  If Driver Verifier does catch misuse of memory, the bugcheck reported will be a Driver Verifier-specific bugcheck code, different from the bugchecks that were being reported before.

In either case, we want to have the customer compress (.ZIP) and upload the resulting dump, so that we can verify exactly where the issue was caught, which drivers were enabled for Driver Verifier, and whether any Driver Verifier-specific condition was involved.
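As a rough first-pass triage before the dump is analyzed, the bugcheck code shown on the blue screen can be compared against the codes from the Situation section and against bugcheck codes commonly associated with Driver Verifier and Special Pool.  The sketch below is illustrative only: the list of verifier-related codes is partial, the function name classify_bugcheck is hypothetical, and the code alone is not conclusive (for example, "Force IRQL checking" can surface a real driver bug as 0xD1).  The memory dump is still required.

```python
# Bugcheck codes named in the Situation section of this document.
ORIGINAL_SYMPTOMS = {
    0xC5: "DRIVER_CORRUPTED_EXPOOL",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

# Partial list of bugchecks commonly raised when Driver Verifier or
# Special Pool catches misuse. Not exhaustive; see Microsoft's bug check
# code reference for the full set.
VERIFIER_RELATED = {
    0xC1: "SPECIAL_POOL_DETECTED_MEMORY_CORRUPTION",
    0xC4: "DRIVER_VERIFIER_DETECTED_VIOLATION",
    0xCC: "PAGE_FAULT_IN_FREED_SPECIAL_POOL",
    0xCD: "PAGE_FAULT_BEYOND_END_OF_ALLOCATION",
    0xD5: "DRIVER_PAGE_FAULT_IN_FREED_SPECIAL_POOL",
    0xD6: "DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION",
}


def classify_bugcheck(code):
    """Return (category, name) for a bugcheck code.

    category is "verifier-related", "original-symptom", or "other".
    This is a hint for triage only, not a diagnosis.
    """
    if code in VERIFIER_RELATED:
        return ("verifier-related", VERIFIER_RELATED[code])
    if code in ORIGINAL_SYMPTOMS:
        return ("original-symptom", ORIGINAL_SYMPTOMS[code])
    return ("other", None)


if __name__ == "__main__":
    for code in (0xC4, 0xC5, 0x50):
        category, name = classify_bugcheck(code)
        print(f"0x{code:X}: {category} {name or ''}".rstrip())
```

A "verifier-related" result suggests Driver Verifier trapped something in one of the verified drivers; an "original-symptom" or "other" result suggests the crash may again be in a "random victim".  Either way, the dump should still be uploaded.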

Additional Information

If the Driver Verifier tool does not indicate any Novell Client-specific cause of the memory corruption, the next recommendation is to have the customer contact Microsoft for additional expertise in trapping kernel-mode memory corruption and/or additional knowledge of known issues with other non-Novell Client software present in the customer's environment.