ubuysa
The BSOD Doctor
Don't Panic!
Many people panic when they get a BSOD, partly because BSODs happen unexpectedly and without warning, and partly because many believe a BSOD means you have a hardware problem. The good news is that in the majority of cases a BSOD is a software problem, and almost always it's a device driver problem. The bad news is that a BSOD is not normal: it should never be ignored, but investigated and corrected.
What causes a BSOD
BSODs are caused when the Windows kernel detects an unrecoverable error. Because the kernel runs at a privileged level (called kernel-mode) it typically can't use the normal error recovery processes to recover. All it can do is halt the system to prevent a potentially catastrophic cascade of errors that may damage critical user data. Kernel-mode code must stick to a set of very rigid rules, one of which is that it must behave predictably. If it doesn't, the kernel will halt the system. To give you a fighting chance of establishing what error caused the halt, the kernel does a couple of things before it halts...
1. It writes a memory dump. This is a copy of the contents of RAM at the time of the error that we can use to identify the cause of the problem. At the time of the BSOD the memory dump is written directly to the paging file, which may take some time to complete. When the system restarts after a BSOD, the Session Manager checks the paging file to see whether it contains a dump. If it does, the dump is copied out to a regular disk file (by default C:\Windows\Memory.dmp), and this too may take some time to complete.
2. It displays the Blue Screen Of Death with a stop code (IRQL_NOT_LESS_OR_EQUAL, for example) and then halts the system. The stop code is useful because it gives a general indication of what went wrong. Often we can make a pretty accurate guess at the cause just from the stop code - the example above is almost always a faulty driver.
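Stop codes also have a numeric form (the bug check code), which is what you'll usually find in event logs and debugger output. As a rough illustration, here is a small Python lookup for a handful of common ones. The table is only a sample and the function name is made up for this example, it's not part of any Windows tool.

```python
# A small sample of bug check codes and their stop-code names.
# This is an illustrative subset, not a complete list.
STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x0000003B: "SYSTEM_SERVICE_EXCEPTION",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x00000124: "WHEA_UNCORRECTABLE_ERROR",
}


def stop_code_name(code):
    """Return the stop-code name for a numeric bug check code.

    Codes that are not in the sample table come back as a hex placeholder.
    """
    return STOP_CODES.get(code, "UNKNOWN_STOP_CODE_0x%08X" % code)


if __name__ == "__main__":
    print(stop_code_name(0x0A))
```

The name only tells you the general class of failure. You still need the dump to find out which driver was responsible.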
Ensuring that your system writes a memory dump
The memory dump is the critical source of debugging information that allows the cause of the BSOD to be determined, so it's essential that your system is able to write a dump in the event of a BSOD.
A memory dump will only be written if a paging file exists and is large enough to hold it. A popular myth in these days of 16GB or more of installed RAM is that a paging file is no longer required because the system will never page out. Wrong! If there is no paging file the system can never write a memory dump. Another paging file myth is that you need to set the size yourself (1.5*RAM size is a commonly suggested size). Wrong again! The 'best' setting for the paging file size is to let Windows manage it (a system managed paging file). To set this, enter sysdm.cpl in the Run command box, click the Advanced tab, then the Performance Settings button, and then the Advanced tab in there. Click the Change button and ensure that the 'Automatically manage paging file size for all drives' checkbox is checked.
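If you'd rather check the paging file setting from a script than click through the dialogs, something like the Python sketch below may help. It assumes the setting lives in the PagingFiles value under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management, and that an entry with no sizes (or sizes of 0 0) means system managed. That is how it appears on the systems I've looked at, but treat it as an assumption. The function names are made up for this example.

```python
import sys


def classify_entry(entry):
    """Classify one PagingFiles entry such as 'C:\\pagefile.sys 1024 4096'.

    Assumption: a missing size pair, or '0 0', means system managed.
    """
    parts = entry.split()
    if len(parts) >= 3 and parts[-2].isdigit() and parts[-1].isdigit():
        initial, maximum = int(parts[-2]), int(parts[-1])
        if initial == 0 and maximum == 0:
            return "system managed"
        return "custom size %d-%d MB" % (initial, maximum)
    return "system managed"


def describe_paging_files(entries):
    """Summarise a list of PagingFiles entries, one line per entry."""
    entries = [e.strip() for e in entries if e.strip()]
    if not entries:
        return ["no paging file configured: no memory dump can be written"]
    return ["%s -> %s" % (e, classify_entry(e)) for e in entries]


def read_paging_files():
    """Read the PagingFiles value from the registry (Windows only)."""
    import winreg  # only available on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "PagingFiles")
    return list(value)


if __name__ == "__main__":
    if sys.platform == "win32":
        for line in describe_paging_files(read_paging_files()):
            print(line)
    else:
        print("This check only works on Windows.")
```

This only reads the setting, it doesn't change anything. Use the sysdm.cpl dialog described above to change it.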
Your system must also be configured to write a memory dump if a BSOD occurs (by default it will be). To check, enter sysdm.cpl in the Run command box, click the Advanced tab, then the Start-up and Recovery Settings button. Ensure that the 'Write debugging information' pull-down menu is set to 'Automatic memory dump' and that the 'Overwrite any existing file' checkbox is checked. The dump file location should be %SystemRoot%\Memory.dmp. It could actually be any file name in any folder on any drive, but the folder must exist and it must be on a drive that is online at boot time. It's wise to leave it as is, because that's where everyone will look for a memory dump. Leave 'Disable automatic deletion of memory dumps when disk space is low' unchecked; your disk should never be so full that Windows needs to consider deleting the dump.
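The same settings can be read from the registry if you want to check several machines quickly. The Python sketch below assumes they are stored under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl in the CrashDumpEnabled, FilterPages and Overwrite values, with 7 meaning Automatic memory dump. The helper names are made up for this example, and it only reports, it doesn't change anything.

```python
import sys

# Assumed meanings of the CrashDumpEnabled registry value.
DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump",
    7: "Automatic memory dump",
}


def dump_type_name(crash_dump_enabled, filter_pages=0):
    """Translate CrashDumpEnabled (plus FilterPages) into a dump type name."""
    if crash_dump_enabled == 1 and filter_pages == 1:
        return "Active memory dump"
    return DUMP_TYPES.get(crash_dump_enabled, "Unknown (%d)" % crash_dump_enabled)


def check_settings(crash_dump_enabled, overwrite, filter_pages=0):
    """Return warnings for settings that differ from the advice above."""
    warnings = []
    name = dump_type_name(crash_dump_enabled, filter_pages)
    if name != "Automatic memory dump":
        warnings.append("Dump type is '%s', not 'Automatic memory dump'" % name)
    if overwrite != 1:
        warnings.append("'Overwrite any existing file' is not enabled")
    return warnings


def read_crash_control():
    """Read the relevant CrashControl values (Windows only)."""
    import winreg  # only available on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        for name in ("CrashDumpEnabled", "FilterPages", "Overwrite"):
            try:
                values[name] = winreg.QueryValueEx(key, name)[0]
            except OSError:
                values[name] = 0  # value not present; treated as 0 here
    return values


if __name__ == "__main__":
    if sys.platform == "win32":
        v = read_crash_control()
        problems = check_settings(v["CrashDumpEnabled"], v["Overwrite"], v["FilterPages"])
        print("\n".join(problems) if problems else "Dump settings look fine.")
    else:
        print("This check only works on Windows.")
```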
A note on the different types of memory dump:
There are five different types of dump you can select here (six if you include 'None')...
Small Memory Dump (256k) - These are known as minidumps and they're written to the folder C:\Windows\Minidump. They contain only the essentials: the stop code and its parameters, the list of loaded drivers, and the processor, process, thread and kernel stack details for the thread that failed. They are better than a poke in the eye with a sharp stick, but they don't contain all the kernel data areas necessary for a full analysis. Don't select this option.
Kernel memory dump - These contain the entire kernel space but none of the user address space. These are the dumps we want to see when debugging a BSOD (because it's only the kernel that causes BSODs). We don't suggest using this option because the (recommended) Automatic memory dump option also takes a kernel dump but has more leeway in managing the paging file size to hold it. Don't select this option.
Complete memory dump - These contain the entire contents of memory at the time of the error. This means they are guaranteed to hold all the information necessary to debug the problem, but they are huge - roughly the size of installed RAM. Since we're only interested in the kernel space when debugging BSODs there is no point in taking the time or space to write these dumps (they'd take an age to upload in any case). Don't select this option.
Active memory dump - This was introduced in Windows 10 and is similar to the complete memory dump. However, the active memory dump filters out pages that are not likely to be useful in debugging the problem, so these dumps are smaller. For example, on a system running multiple virtual machines (VMs) an active memory dump of the host would not include the memory being used by the VMs, whereas a complete memory dump would. Don't select this option.
It should also be mentioned that even after you have configured the system to take an automatic memory dump (i.e. kernel dumps) you may still see minidumps appearing in the C:\Windows\Minidump folder - you may even see a minidump and a kernel dump for the same BSOD. Minidumps are often taken 'on the fly' by Windows system services, by some device drivers, and often by user-mode code, even when the error can be recovered and processing can continue. A kernel dump is always preferred to a minidump for BSOD debugging, but if all you have are minidumps then they're better than nothing.
Note also that there can only be one kernel dump in C:\Windows\Memory.dmp at a time. If you take another BSOD the original dump file will be overwritten (and if you uncheck the 'Overwrite any existing file' checkbox and the C:\Windows\Memory.dmp file already exists then no new dump will be written!). Once you have taken a BSOD, check whether a kernel dump was written to C:\Windows\Memory.dmp (check the file date and time). If the dump is relevant, copy it to another folder so that you have it (and others) available for later analysis if necessary.
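If you're chasing a recurring BSOD it's easy to forget the copy step and lose a dump, so a small script can do it for you. The Python sketch below copies the dump to an archive folder and adds the dump's date and time to the file name so that earlier copies aren't overwritten. The archive folder C:\Dumps and the function name are just examples, use whatever suits you. You will most likely need to run it as administrator to read C:\Windows\Memory.dmp.

```python
import shutil
import time
from pathlib import Path


def archive_dump(dump_path, archive_dir):
    """Copy a dump file into archive_dir under a timestamped name.

    Returns the path of the archived copy, or None if there is no dump.
    The timestamp comes from the dump file's last-modified time.
    """
    dump = Path(dump_path)
    if not dump.is_file():
        return None

    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(dump.stat().st_mtime))
    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / ("%s-%s%s" % (dump.stem, stamp, dump.suffix))

    # Skip the copy if this exact dump has already been archived.
    if not target.exists():
        shutil.copy2(dump, target)
    return target


if __name__ == "__main__":
    # Example paths only; C:\Dumps is a hypothetical archive folder.
    result = archive_dump(r"C:\Windows\Memory.dmp", r"C:\Dumps")
    print("Archived to %s" % result if result else "No dump file found.")
```

Kernel dumps can be several GB each, so clear out the archive folder once a problem has been resolved.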
Last updated by @ubuysa on 16-02-2021