ubuysa
The BSOD Doctor
Driver management is an aspect of Windows support that seems to cause a fair few problems. That may be because the function and operation of drivers are not well understood, or because driver support seems to be some sort of 'dark art'.
Driver problems are the most common cause of BSODs, yet a driver-caused BSOD can be tricky to diagnose. It's often not possible, for example, to point with any confidence at the driver that actually caused the BSOD. In addition, in the vast majority of driver-caused BSODs, you need to be fairly adept at using the Windows debugging tools, and have a good understanding of the internals of I/O operation in Windows and its related control-block structures, to accurately identify the failing driver - and that of course assumes you have a kernel dump file to analyse in the first place.
That's not to say that all driver problems result in a BSOD; driver management would be much simpler if they did! Driver problems can also cause system crashes, hangs, black screens, and of course a myriad of niggly issues with the device(s) they manage.
Because of the above it's probably worthwhile spending some time talking about what drivers are, about what they do, and about why driver failures often result in a BSOD. We should also spend some time talking about what you can do to reduce the risk of a driver-caused BSOD (or indeed any driver-caused failure) and what simple and logical steps you can take to identify the failing driver when you do get problems.
Drivers are an integral part of the Windows I/O Subsystem, so perhaps the first thing we should do is define what we mean by I/O. I/O stands for input/output and everything that goes on in your PC outside of the CPU and RAM is I/O. When we talk about 'input' we mean input into the CPU/RAM, and when we talk about 'output' we mean output from the CPU/RAM. Without I/O capability your PC would be a useless box. The keyboard and mouse are I/O devices, the monitor is an I/O device, even your disk drives are I/O devices.
Every I/O device needs a driver to manage it. Sometimes these 'device drivers' are part of the Windows I/O subsystem (the basic mouse and keyboard drivers for example), sometimes they are drivers written and provided by Microsoft (the CD/DVD drivers for example), and quite often they are provided by the vendor that created the device (motherboard drivers for example).
How I/O Works
Drivers handle most of the processing involved in doing all I/O operations, so it's well worth looking at a simple overview of how an I/O operation is handled by Windows to see where drivers fit in. As an example we'll take a simple read operation from a file on a disk, initiated by an ordinary user application (this explanation has been greatly simplified)...
Our user application's view of the file it's using is as a sequential list of records that exist 'somewhere', and the application now wants record number 237 (for example). It thus allocates a buffer in virtual storage to hold the record and issues a read request for record 237 (in the specified file) and this is passed to the I/O Manager in the Windows kernel.
The I/O Manager does some basic error checking to make sure the I/O request is valid and complete, and it then passes the I/O request to the appropriate driver for the device on which the specified file resides (a disk drive in this case). At this point the originating application's thread is placed in a wait state, waiting on a specific event - the completion of this I/O operation.
The device driver for the disk (running in kernel mode) does some more checking of the request using its intimate knowledge of the device (like ensuring that the allocated buffer is big enough to hold the record, for example) and then translates the application's record 237 into an actual data block in a specific track and sector on a specific disk. If the required disk is free (i.e. not in the middle of another I/O operation) the driver communicates with the disk using the appropriate disk hardware ports, registers, and commands to instruct it to read the required data block from the specified track and sector. At this point the device driver exits and a new thread is dispatched on this CPU.
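To make that translation step a bit more concrete, here's a minimal sketch in Python. It's purely illustrative: the record size, sector size, disk geometry, and starting sector are all made-up values, the file is assumed to be contiguous on the disk, and the function name is hypothetical. A real driver stack is far more involved than this.

```python
# All of these values are assumptions chosen for illustration only.
RECORD_SIZE = 256            # bytes per application record
SECTOR_SIZE = 512            # bytes per disk sector
SECTORS_PER_TRACK = 63       # pretend disk geometry
FILE_START_SECTOR = 10_000   # pretend the file is contiguous and starts here


def locate_record(record_number):
    """Translate a 1-based record number into (track, sector, offset in sector)."""
    # Where the record starts, in bytes, from the beginning of the file.
    byte_offset = (record_number - 1) * RECORD_SIZE
    # Which sector on the disk holds that byte.
    absolute_sector = FILE_START_SECTOR + byte_offset // SECTOR_SIZE
    # Split the absolute sector number into a track and a sector on that track.
    track = absolute_sector // SECTORS_PER_TRACK
    sector = absolute_sector % SECTORS_PER_TRACK
    # Where the record starts inside that sector.
    offset_in_sector = byte_offset % SECTOR_SIZE
    return track, sector, offset_in_sector


track, sector, offset = locate_record(237)
print(f"record 237 -> track {track}, sector {sector}, offset {offset}")
```

With these made-up numbers two records fit in each sector, so records 237 and 238 land in the same sector at different offsets. The application never sees any of this; it just asks for a record number.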
A hard disk would now move the read/write heads to the required track and wait for the wanted sector to rotate under them. These are the seek and latency times of hard disks, and they are the reason hard disks are so slow. An SSD would just electronically switch to the appropriate block, a process that's very fast...
As the wanted data block passes under the read/write heads (or is electronically selected on an SSD) the disk controller copies the data off the disk surface (or from the SSD cells) and into the application's buffer (using Direct Memory Access - DMA). When that's done the disk controller raises an interrupt. Interrupts are hardware signals that cause a CPU to stop executing the current thread (its status is saved) and switch the CPU to begin executing the interrupt routine in the device driver for the device that raised the interrupt. In our example this will be the disk device driver (the same driver as earlier).
The interrupt routine in the disk device driver checks with the disk controller that the data has been located and copied and then 'posts' the wait event, in other words it signals that the event the application thread is waiting on has completed. The application's thread is now marked as ready and will go on a CPU ready queue to be dispatched. The device driver now exits and the I/O is complete.
When the application's thread is next dispatched on a CPU the contents of record 237 are now magically present in the buffer and the application can begin to process it.
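The whole sequence above can be imitated in a few lines of Python. This is only a toy model under stated assumptions: `FakeDisk`, `start_read`, and `read_record` are hypothetical names, an ordinary thread stands in for the disk hardware, and a `threading.Event` stands in for the event the application thread waits on. It isn't how Windows implements any of this; it just shows the shape of the hand-offs.

```python
import threading


class FakeDisk:
    """Stand-in for the disk plus its driver (hypothetical, not a Windows API)."""

    def __init__(self, records):
        self.records = records

    def start_read(self, record_number, buffer, done_event):
        # The "driver" starts the operation and returns straight away.
        def device_work():
            # The "device" copies the data into the caller's buffer (like DMA)...
            buffer.extend(self.records[record_number])
            # ...and the "interrupt routine" then posts the wait event.
            done_event.set()

        threading.Thread(target=device_work).start()


def read_record(disk, record_number):
    buffer = bytearray()          # the application's buffer
    done = threading.Event()      # the event the application thread waits on
    disk.start_read(record_number, buffer, done)
    done.wait()                   # the application thread is now in a wait state
    return bytes(buffer)          # the record has appeared in the buffer


disk = FakeDisk({237: b"contents of record 237"})
data = read_record(disk, 237)
print(data)
```

Note that `start_read` returns before the data has arrived, just as the driver exits after starting the disk operation. It's the later `done_event.set()` that lets the waiting thread carry on.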
There are a couple of key things to note from the above...
1. Drivers run in kernel mode. (Some simple drivers, like printer and scanner drivers, can run in user mode.)
2. The driver (and the device) do all the heavy lifting in an I/O operation.
The first of these observations, drivers running in kernel mode, is a huge deal because in kernel mode you can execute privileged CPU instructions, access data and code in any address space, and potentially modify the kernel itself. A misbehaving, or a malicious, driver could cause untold damage to your system or hide hard-to-find malware (keyloggers and the like). In addition, the ability of Windows to recover from misbehaving kernel code is limited. Kernel code is supposed to behave itself and obey all the rules, so often the only option Windows has with misbehaving kernel code is to BSOD the system.
The second of these observations, drivers (and the devices) doing all the heavy lifting, means that it is absolutely vital that the driver code installed is designed specifically for the exact device it is managing. Using the wrong driver that doesn't fully understand how to manage the device is going to end in tears (or more likely a BSOD). In addition, many drivers are not written by Microsoft; they are written by the hardware vendors themselves (usually in C and C++), so the quality of the coding can't always be guaranteed. It follows from point 1 above that it's also essential that the driver contains only the code necessary to manage the device and no other 'suspect' code, and that's hard to ensure. And remember, drivers are kernel code.
It's also worth noting from the above that this I/O was synchronous, because the originating application was placed in a wait whilst the I/O was done. Most I/O requests issued by applications are synchronous, but Windows does support asynchronous I/O. This is where the application is not placed in a wait state and can continue executing, starting additional I/O operations without waiting for the first to complete. This means the originating application has to check to see whether its I/Os have completed and handle any synchronisation necessary between them. Applications using asynchronous I/O are much more difficult to write of course, and they are way more difficult to debug too!
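The difference between the two styles is easier to see side by side. The sketch below is an imitation only: it uses a Python thread pool to mimic requests being in flight at the same time, not the real Windows asynchronous I/O mechanism, and `slow_read` is a made-up stand-in for a device read.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_read(record_number):
    """Pretend device read; the sleep stands in for seek and latency time."""
    time.sleep(0.05)
    return f"record {record_number}"


# Synchronous style: each read must finish before the next one starts.
sync_results = [slow_read(n) for n in (1, 2, 3)]

# Asynchronous style: start all three reads, carry on, collect the results later.
with ThreadPoolExecutor(max_workers=3) as pool:
    pending = [pool.submit(slow_read, n) for n in (1, 2, 3)]
    other_work_done = True    # the application keeps running at this point
    # The application now has to synchronise with its outstanding reads itself.
    async_results = [future.result() for future in pending]

print(sync_results)
print(async_results)
```

Both versions end up with the same data, but in the second one the three waits overlap. The price is the extra bookkeeping: the application has to keep track of `pending` and decide when it is safe to use each result.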
Other Driver Functions
Many drivers can also be used to manage the device; change buffer sizes, turn features on and off, etc. In these cases the management (rather than the I/O) function of a driver is called directly, either from a user application or by a specific management application (and sometimes by Windows applications).
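As a rough picture of that split between the I/O function and the management function, here's a toy driver with two separate entry points. Everything in it is hypothetical (the class, the method names, and the settings); it only illustrates that a management call changes how the device is driven without moving any data.

```python
class ToyDriver:
    """Hypothetical driver with separate I/O and management entry points."""

    def __init__(self):
        # Made-up settings that a management tool might change.
        self.settings = {"buffer_size": 4096, "write_cache": True}

    def read(self, length):
        # I/O path: hand back at most buffer_size bytes of dummy data.
        return bytes(min(length, self.settings["buffer_size"]))

    def control(self, setting, value):
        # Management path: change a setting, move no data.
        if setting not in self.settings:
            raise KeyError(f"unknown setting: {setting}")
        self.settings[setting] = value


driver = ToyDriver()
driver.control("buffer_size", 512)   # a management request
chunk = driver.read(2048)            # an I/O request, now limited to 512 bytes
print(len(chunk))
```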
Sometimes the driver itself modifies its behaviour at I/O time, based either on the application that's called it or on special parameters passed by the calling application. This is done by invoking 'filters' that are part of the driver code itself, either before or after the main driver code is called. Graphics drivers commonly make use of filters to enhance the performance (or user experience) in specific games.
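The before-and-after idea can be sketched like this. All of the names here (`main_driver`, `game_profile_filter`, `logging_filter`, `example_game.exe`) are invented for the example, and a real driver is organised very differently; the point is only the order in which the pieces run.

```python
def main_driver(request):
    """Hypothetical main driver routine: does the real work."""
    return {**request, "handled": True}


def game_profile_filter(request):
    # Runs before the main driver: adjust the request for a known application.
    if request.get("app") == "example_game.exe":   # made-up application name
        request = {**request, "quality": "high"}
    return request


def logging_filter(result, log):
    # Runs after the main driver: look at the result on its way back.
    log.append(result["op"])
    return result


def call_driver(request, log):
    request = game_profile_filter(request)   # before the main driver code
    result = main_driver(request)            # the main driver code
    return logging_filter(result, log)       # after the main driver code


log = []
result = call_driver({"op": "draw", "app": "example_game.exe"}, log)
print(result)
```

A request from any other application passes through `game_profile_filter` unchanged, so the main driver code never needs to know which applications get special treatment.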