DPC_WATCHDOG_VIOLATION ERROR

garvic

New member
Hi all, I'm using Cinema4D for some 3D work, and while rendering (a particularly GPU heavy task), my machine is freezing up, giving me the dreaded blue screen and the DPC_WATCHDOG_VIOLATION error. I've done as much as I know to do (which isn't a lot), but have found out a few things. One of the fixes is apparently updating the IDE ATA/ATAPI controller drivers. That didn't work, but one of the four Standard SATA AHCI Controllers shows an interesting log in the events window of its properties. About 6 days ago (just about when this erroring began), there's a 'Device configured (mshdc.inf)' entry, followed a second later by a 'Device not started (storahci)' entry. If I open this up, it tells me: 'Device PCI\VEN_1022&DEV_7901&SUBSYS_87C01043&REV_51\4&9b70dcb&0&0043 had a problem starting.' If I open that up further, it gives me an Event ID of 411.

I found out elsewhere that this Event ID is linked to the WATCHDOG VIOLATION error.

I also think 'nvlddmkm.sys' was mentioned on the blue screen alongside this error, but I need it to crash again to double-check.

None of this means anything to me, so if there's anyone able to lend a hand, I'd be so grateful.

I'm on an up-to-date Windows 10, with 2 x Nvidia GeForce RTX 2080 Ti graphics cards (also up to date) and an AMD Ryzen 9.
 

SpyderTracks

We love you Ukraine
Hiya, is this a PCSpecialist system?

Could you post your full specs from the order page?
 

garvic

New member
Yes no problem, here they are:

1667252378688.png
 

ubuysa

The BSOD Doctor
Please upload the memory dumps and event logs as described in https://www.pcspecialist.co.uk/forums/threads/when-youre-seeking-help-with-a-bsod.71885/#post-568901 and I'll take a look for you.

The VEN_1022&DEV_7901 device is a SATA (AHCI) controller, but that is not going to be the problem because that's a Microsoft driver and (since you bought a pre-installed copy of Windows) it's not going to be a BIOS setting issue either. I have seen several niggly issues fixed by re-seating the M.2 drive, so I'd suggest you remove the drive and then re-seat it fully. See whether that helps.

Do please upload the minidumps and event logs as described in the link above. If you can, it will be worth uploading the kernel dump too; that is in the file C:\Windows\Memory.dmp. You'll find that it's too large to upload here, so please upload it to the cloud and post a link to it here.

For info: a DPC is a Deferred Procedure Call; DPCs are the 'back-end' of a driver. When a device interrupt occurs, the ISR (Interrupt Service Routine) in the appropriate driver is executed immediately. All the ISR does is save device status, save pointers to buffers, and schedule a DPC. DPCs are placed at the end of the ready queue, so they are dispatched after all higher priority work. The DPC reads the device status, accesses the buffers and reads the data.

Since running a DPC is effectively a hardware operation, and since it's holding a processor when higher priority work may have arrived on the ready queue, there is a limit on the length of time a DPC can run (from memory it's 100ms). The DPC Watchdog checks that no DPC runs longer than this and BSODs if one does. The reason for the BSOD is that a DPC has stepped outside its limits and we have no idea whether damage to user data may occur as a result - so to protect your data the watchdog BSODs.
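
To make the ISR/DPC/watchdog relationship described above a little more concrete, here is a toy model in Python. It is only a sketch of the idea, not how the Windows kernel implements it: the names `Dpc`, `isr`, `run_dpc_queue` and `DpcWatchdogViolation` are made up for illustration, the run times are simulated numbers, and the 100 ms limit is simply the from-memory figure quoted above.

```python
from collections import deque
from dataclasses import dataclass

# Assumption: the limit quoted "from memory" in the post above, not a verified value.
DPC_LIMIT_MS = 100


class DpcWatchdogViolation(Exception):
    """Stands in for the bugcheck (BSOD) in this toy model."""


@dataclass
class Dpc:
    driver: str       # driver that queued the DPC
    run_time_ms: int  # simulated time the DPC would hold the processor


def isr(queue, driver, run_time_ms):
    """Toy ISR: do the minimum and schedule a DPC for the real work."""
    queue.append(Dpc(driver, run_time_ms))


def run_dpc_queue(queue):
    """Dispatch queued DPCs in order; 'bugcheck' if one exceeds the limit."""
    completed = []
    while queue:
        dpc = queue.popleft()
        if dpc.run_time_ms > DPC_LIMIT_MS:
            raise DpcWatchdogViolation(
                f"{dpc.driver} DPC ran {dpc.run_time_ms} ms "
                f"(limit {DPC_LIMIT_MS} ms)"
            )
        completed.append(dpc.driver)
    return completed
```

In this model a short DPC queued for storahci.sys completes normally, whereas queueing a 250 ms DPC for nvlddmkm.sys raises `DpcWatchdogViolation`, which is the toy equivalent of the BSOD.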
 

garvic

New member
Thanks so much for looking into this for me. Here's a link to the minidumps (only 2 of which seem to have any data in):
Also, here are the application and system logs:

I've also attached the system info, however the command for the driver list didn't work for me.


In the meantime I'll reseat the M.2 as suggested,

thanks again!
 

ubuysa

The BSOD Doctor
All but two of the minidumps are corrupt - that makes me suspect the SSD even more....

Of the two dumps that are complete, the only third-party driver on the call stack is nvlddmkm.sys, the Nvidia graphics driver. The version you have is probably the latest but it would be worth checking...
Rich (BB code):
14: kd> lmDvmnvlddmkm
Browse full module list
start             end                 module name
fffff803`72530000 fffff803`75c88000   nvlddmkm T (no symbols)        
    Loaded symbol image file: nvlddmkm.sys
    Image path: \SystemRoot\System32\DriverStore\FileRepository\nv_dispsig.inf_amd64_766052fd974747a3\nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Thu Oct 13 01:11:46 2022 (63473BA2)
    CheckSum:         03656BE0
    ImageSize:        03758000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
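
As an aside on reading the output above: the hex value in brackets after the Timestamp is the driver's build time in seconds since the Unix epoch, so the driver's age can be checked by hand. A minimal sketch, assuming the value is a genuine build time (some binaries store a hash in that field instead); the function name `link_timestamp_to_utc` is made up for illustration.

```python
from datetime import datetime, timezone


def link_timestamp_to_utc(hex_stamp: str) -> datetime:
    """Convert the hex timestamp shown by lmDvm into a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


stamp = link_timestamp_to_utc("63473BA2")
print(stamp.isoformat())  # 2022-10-12T22:11:46+00:00
```

The debugger prints the time in the local zone of the machine doing the analysis, which would explain the three-hour difference from the 01:11:46 shown above if that machine was on UTC+3.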

I don't know what impact having two graphics cards might have; hardware is not my thing and others will know better. These two BSODs can, however, be laid at the door of the graphics driver (or graphics card): it was this driver's DPC that ran for too long in the DPC_WATCHDOG_VIOLATION BSOD (which could be because of a graphics card hardware issue).

Looking at your System log, there are some worrying WHEA warning messages. WHEA is the Windows Hardware Error Architecture and these messages (which repeat again and again) are both related to (one of) the Nvidia graphics cards...
Rich (BB code):
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          31/10/2022 18:27:57
Event ID:      17
Task Category: None
Level:         Warning
Keywords:   
User:          LOCAL SERVICE
Computer:      DESKTOP-S41R5BE
Description:
A corrected hardware error has occurred.

Component: PCI Express Endpoint
Error Source: Advanced Error Reporting (PCI Express)

Primary Bus:Device:Function: 0xB:0x0:0x2
Secondary Bus:Device:Function: 0x0:0x0:0x0
Primary Device Name:PCI\VEN_10DE&DEV_1AD6&SUBSYS_250319DA&REV_A1
Secondary Device Name:

Note that they report a hardware error that was corrected, but that's worrying given that they repeat many times. VEN_10DE is Nvidia and DEV_1AD6 is an Nvidia USB 3.1 Host Controller. I would assume that's part of (one of) the graphics cards?
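
For anyone wanting to decode these hardware IDs themselves, the VEN_/DEV_ fields can be pulled apart mechanically. The sketch below is illustrative only: `parse_hardware_id` is a made-up helper, and the lookup tables hold just the IDs mentioned in this thread, plus 1022 as AMD's PCI vendor ID (which is not stated above). A real lookup would use a full PCI ID database.

```python
import re

# Matches the PCI\VEN_xxxx&DEV_xxxx[&SUBSYS_xxxxxxxx][&REV_xx] prefix of a hardware ID.
HWID_RE = re.compile(
    r"PCI\\VEN_(?P<vendor>[0-9A-F]{4})&DEV_(?P<device>[0-9A-F]{4})"
    r"(?:&SUBSYS_(?P<subsys>[0-9A-F]{8}))?(?:&REV_(?P<rev>[0-9A-F]{2}))?",
    re.IGNORECASE,
)

# Tiny illustrative tables: only the IDs discussed in this thread.
VENDORS = {"10DE": "Nvidia", "1022": "AMD"}
DEVICES = {
    ("10DE", "1E04"): "RTX 2080 Ti",
    ("10DE", "1AD6"): "USB 3.1 Host Controller",
    ("1022", "7901"): "SATA (AHCI) controller",
}


def parse_hardware_id(hwid: str) -> dict:
    """Split a PCI hardware ID into its fields and add any known names."""
    match = HWID_RE.match(hwid)
    if match is None:
        raise ValueError(f"not a PCI hardware ID: {hwid!r}")
    fields = {k: (v.upper() if v else None) for k, v in match.groupdict().items()}
    fields["vendor_name"] = VENDORS.get(fields["vendor"], "unknown")
    fields["device_name"] = DEVICES.get(
        (fields["vendor"], fields["device"]), "unknown"
    )
    return fields


info = parse_hardware_id(r"PCI\VEN_10DE&DEV_1AD6&SUBSYS_250319DA&REV_A1")
print(info["vendor_name"], "-", info["device_name"])
```

The same helper accepts the longer device instance path from the first post, since only the leading part of the string is matched.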

We also see a huge number of these WHEA warnings...
Rich (BB code):
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          31/10/2022 18:26:21
Event ID:      17
Task Category: None
Level:         Warning
Keywords:   
User:          LOCAL SERVICE
Computer:      DESKTOP-S41R5BE
Description:
A corrected hardware error has occurred.

Component: PCI Express Legacy Endpoint
Error Source: Advanced Error Reporting (PCI Express)

Primary Bus:Device:Function: 0xA:0x0:0x0
Secondary Bus:Device:Function: 0x0:0x0:0x0
Primary Device Name:PCI\VEN_10DE&DEV_1E04&SUBSYS_250319DA&REV_A1
Secondary Device Name:

Another corrected hardware error warning, but these messages keep repeating too. VEN_10DE is Nvidia as we've seen, and DEV_1E04 is the RTX 2080Ti card itself.

The sheer number of these WHEA warning messages hints that not all is well with one (or both) of the graphics cards. What that might be, I don't know.
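
To put a number on "sheer number", the warnings can be tallied per device from a text export of the System log. This is a rough sketch that assumes the export uses the same layout as the entries quoted above (a "Primary Device Name:" line per event); `count_whea_devices` is a made-up helper name.

```python
import re
from collections import Counter

# One "Primary Device Name:" line is expected per WHEA-Logger event in the export.
DEVICE_RE = re.compile(r"^Primary Device Name:[ \t]*(\S+)", re.MULTILINE)


def count_whea_devices(log_text: str) -> Counter:
    """Count corrected-error events per reporting device."""
    return Counter(DEVICE_RE.findall(log_text))


# Sample text in the same layout as the events quoted above.
sample = r"""
Primary Bus:Device:Function: 0xA:0x0:0x0
Primary Device Name:PCI\VEN_10DE&DEV_1E04&SUBSYS_250319DA&REV_A1
Secondary Device Name:

Primary Bus:Device:Function: 0xB:0x0:0x2
Primary Device Name:PCI\VEN_10DE&DEV_1AD6&SUBSYS_250319DA&REV_A1
Secondary Device Name:

Primary Bus:Device:Function: 0xA:0x0:0x0
Primary Device Name:PCI\VEN_10DE&DEV_1E04&SUBSYS_250319DA&REV_A1
Secondary Device Name:
"""

counts = count_whea_devices(sample)
for device, n in counts.most_common():
    print(n, device)
```

If one device accounts for nearly all of the events, that is a reasonable hint about which card to test first.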

We also see a couple of recent messages relating to a disk drive - possibly the M.2 drive, I can't tell from the dump...
Rich (BB code):
Log Name:      System
Source:        disk
Date:          31/10/2022 23:52:00
Event ID:      153
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      DESKTOP-S41R5BE
Description:
The IO operation at logical block address 0x5cae48 for Disk 1 (PDO name: \Device\00000045) was retried.

Again, this is a warning that an I/O operation (a read/write) failed and was retried successfully. If this is your M.2 SSD then re-seating it is definitely worthwhile.

The other is this one...
Rich (BB code):
Log Name:      System
Source:        storahci
Date:          31/10/2022 23:52:00
Event ID:      129
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      DESKTOP-S41R5BE
Description:
Reset to device, \Device\RaidPort3, was issued.

Again, this is only a warning message indicating that a disk device was reset. I have no way of knowing whether this is your M.2 SSD - but I feel that it probably is.

Your Application log contains lots of errors for DbxSvc - a Dropbox service - and errors for DtsApo4Service.exe - the DTS sound device, but they're not related to the BSODs.

I think you may have an issue with the M.2 SSD, so a re-seat is a good idea, but I think your BSODs are related to some sort of hardware issue with one or both graphics cards. The System log warning messages point to a developing graphics card issue too.

As I said, hardware is not my thing, but perhaps removing one graphics card at a time and running a stress test on each of them on its own might highlight the issue?
 

ubuysa

The BSOD Doctor
In the VIDEO_TDR_FAILURE dump we can clearly see the (a) graphics card being reset in the call stack...
Rich (BB code):
2: kd> knL
 # Child-SP          RetAddr               Call Site
00 ffff9886`4d7b8a18 fffff801`84a9555e     nt!KeBugCheckEx
01 ffff9886`4d7b8a20 fffff801`84a45b04     dxgkrnl!TdrBugcheckOnTimeout+0xfe
02 ffff9886`4d7b8a60 fffff801`84a3e63c     dxgkrnl!ADAPTER_RENDER::Reset+0x174
03 ffff9886`4d7b8a90 fffff801`84a94c85     dxgkrnl!DXGADAPTER::Reset+0x4dc
04 ffff9886`4d7b8b10 fffff801`84a94df7     dxgkrnl!TdrResetFromTimeout+0x15
05 ffff9886`4d7b8b40 fffff801`77c52b65     dxgkrnl!TdrResetFromTimeoutWorkItem+0x27
06 ffff9886`4d7b8b70 fffff801`77c71d25     nt!ExpWorkerThread+0x105
07 ffff9886`4d7b8c10 fffff801`77e005e8     nt!PspSystemThreadStartup+0x55
08 ffff9886`4d7b8c60 00000000`00000000     nt!KiStartSystemThread+0x28

Notice the dxgkrnl!TdrResetFromTimeout function call; dxgkrnl is the DirectX kernel and TDR is Timeout Detection and Recovery, which detects if a graphics operation has timed out. Notice also the dxgkrnl!DXGADAPTER::Reset function call; this is the software reset of the graphics card resulting from the timed out operation. Finally, notice the dxgkrnl!TdrBugcheckOnTimeout function call: having reset the graphics card, the operation timed out again and so the system bugchecks and BSODs.
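
If you want to see at a glance which modules appear on a stack like the one above, the module!function pairs can be extracted from the `knL` output with a few lines of Python. This is only a sketch that assumes the line layout shown above; `parse_stack` and `modules_on_stack` are made-up helper names.

```python
import re

# Frame lines look like: "01 <child-sp> <ret-addr>   module!Function+0xoffset"
FRAME_RE = re.compile(
    r"^(?P<index>[0-9a-f]{2})\s+(?P<child_sp>[0-9a-f`]+)\s+(?P<ret_addr>[0-9a-f`]+)"
    r"\s+(?P<module>[^!\s]+)!(?P<function>.+?)(?:\+0x(?P<offset>[0-9a-f]+))?$"
)


def parse_stack(text: str) -> list:
    """Return (module, function) for each frame; non-frame lines are skipped."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match["module"], match["function"]))
    return frames


def modules_on_stack(text: str) -> list:
    """Unique module names in order of first appearance (top of stack first)."""
    seen = []
    for module, _ in parse_stack(text):
        if module not in seen:
            seen.append(module)
    return seen


# First three frames of the stack quoted above, plus the header line.
sample = """ # Child-SP          RetAddr               Call Site
00 ffff9886`4d7b8a18 fffff801`84a9555e     nt!KeBugCheckEx
01 ffff9886`4d7b8a20 fffff801`84a45b04     dxgkrnl!TdrBugcheckOnTimeout+0xfe
02 ffff9886`4d7b8a60 fffff801`84a3e63c     dxgkrnl!ADAPTER_RENDER::Reset+0x174
"""

print(modules_on_stack(sample))  # ['nt', 'dxgkrnl']
```

Any module in that list that is not part of Windows itself is the first place to look for a third-party driver.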

Although probably not related to the BSODs, I would get rid of Bullguard. It has been known to cause BSODs in the past (any third party antivirus tool increases the BSOD risk) and you really don't need it.
 

garvic

New member
This is all super helpful, and there are other signs that may suggest a graphics card issue. A couple of weeks ago, I had to switch from the DVI port on the back of one of my monitors to USB-C, since it had stopped receiving that signal. I thought it was a monitor issue, but in hindsight it could be the graphics card beginning to fail? Also, when it does crash, one monitor (the one still connected by DVI) goes black for 20 seconds before it fully locks up and gives me a blue screen.

I have a guy more capable than myself coming over this evening to reseat the M.2, and bring some spare graphics cards to try to isolate the issue. Thank you so much for your insight on this, I'll give you an update once we've taken a look.
 

ubuysa

The BSOD Doctor
Be aware that if a third party messes with it then PCS may void your warranty. Better to have the knowledgeable friend guide you through re-seating it yourself.

Terms & Conditions (my highlighting)...
7.9 We reserve the right to suspend the warranty or refuse service if your Case, Motherboard, CPU or BIOS have been replaced without authorisation.
Any tampering, repair or modification by unauthorised personnel voids the warranty.
 