Switching Off during mid-games

StuAug

Member
Bear with me and be gentle on this one, as I am far from an expert!

I am looking for assistance/advice, as my son's PC tends to switch off when he is playing games, especially MS Flight Sim. I have posted the specs below. He says that he has seen the GPU temp go up to 110, but it normally sits around 60/70. He has said he has overclocked the GPU using the AMD software, as he prefers the FPS speed and impact. I have suggested that he reverses that but got the sulks.

There is no warning or beeping, the PC will just switch off.

I have also posted some pics to see if there is anything else in terms of the physical set up.
I've checked all the fans are pointing in the right direction! (probably the best extent I could get to!) I also got him to check and dust what he can get to, and we've lifted it off the floor and put in a normal fan to get air moving.

As it is getting close to Christmas I could look at investing in upgrades if needed...


Spec:
Case
FRACTAL FOCUS G BLACK GAMING CASE (Window)
Processor (CPU)
AMD Ryzen 7 3800X Eight Core CPU (3.9GHz-4.5GHz/36MB CACHE/AM4)
Motherboard
ASUS® TUF X570-PLUS GAMING (USB 3.2 Gen 2, PCIe 4.0, CrossFireX) - ARGB Ready!
Memory (RAM)
64GB Corsair VENGEANCE DDR4 2666MHz (4 x 16GB)
Graphics Card
8GB AMD RADEON™ RX 5700 XT - HDMI
1st M.2 SSD Drive
128GB ADATA SX6000 LITE M.2 2280 (1800 MB/R, 1200 MB/W)
1st Storage Drive
2TB SEAGATE BARRACUDA SATA-III 3.5" HDD, 6GB/s, 7200RPM, 256MB CACHE
Power Supply
CORSAIR 650W TXm SERIES™ SEMI-MODULAR 80 PLUS® GOLD, ULTRA QUIET
Power Cable
1 x 1.5 Metre UK Power Cable (Kettle Lead)
Processor Cooling
Corsair H100i ELITE CAPELLIX RGB Hydro Series High Performance CPU Cooler
Thermal Paste
STANDARD THERMAL PASTE FOR SUFFICIENT COOLING
LED Lighting
50cm ARGB LED Strip
Sound Card
ONBOARD 6 CHANNEL (5.1) HIGH DEF AUDIO (AS STANDARD)
Wireless Network Card
WIRELESS INTEL® Wi-Fi 6 AX200 2,400Mbps/5GHz, 300Mbps/2.4GHz PCI-E CARD + BT 5.0
USB/Thunderbolt Options
MIN. 2 x USB 3.0 & 2 x USB 2.0 PORTS @ BACK PANEL + MIN. 2 FRONT PORTS
 

Attachments

  • Screenshot_20221112-095626_Gallery.jpg
    337.4 KB · Views: 91
  • Screenshot_20221112-095615_Gallery.jpg
    489.7 KB · Views: 94
  • Screenshot_20221112-095653_Gallery (1).jpg
    432.8 KB · Views: 91
  • Screenshot_20221112-095635_Gallery (1).jpg
    363.8 KB · Views: 93
  • Screenshot_20221112-095647_Gallery (1).jpg
    427.7 KB · Views: 91
  • Screenshot_20221112-095641_Gallery.jpg
    449.7 KB · Views: 92

StuAug

Member
read the bsod
 

Attachments

  • 111122-7515-01.zip
    641.3 KB · Views: 141
  • application.zip
    164.3 KB · Views: 123
  • drivers-pcspec.txt
    62.7 KB · Views: 140
  • PCSPECS.zip
    121.6 KB · Views: 145

ubuysa

The BSOD Doctor
Sadly, I'm afraid your son is going to have to get the sulks. When dealing with BSODs, the very first essential step is to disable all overclocks (and undervolts) so that everything is running at stock. If it doesn't BSOD in that state then it was overclocked (or undervolted) too aggressively. That the GPU temp has gone to 110°C is, IMO, of concern, even though AMD state that, although 110°C is tMax for the GPU, it is capable of operating at that temperature. It does suggest to me, though, that it may be being driven too hard by the overclock.

The two dumps are inconclusive I'm afraid, both show the problem as originating in a user mode process (possibly the FlightSim game).

One is a CRITICAL_PROCESS_DIED bugcheck; the csrss.exe process (the Client/Server Runtime Subsystem) is the one that died, but it's not possible to debug this BSOD with a minidump (or a kernel dump, for that matter) because the problem occurred in user mode. It would have been nice to see the system log for that period but you didn't upload it. The application log only covers the 11th Nov and the dump is from 5th Nov. Please ask your son NOT to clean dump files or event logs when he uses that IOBIT System Care tool to do garbage cleaning.

The other is an APC_INDEX_MISMATCH, which is a kernel error but in this case originated from user mode, specifically the overwolf.exe process. That's a game modding service I think? Again, it would have been nice to see the system log and, again, the application log doesn't go back far enough (it starts at 18:37 on 11th Nov, but the dump is timestamped 18:23 on 11th Nov).

I can see FlightSim crashing in the application log...
Code:
Log Name:      Application
Source:        Application Error
Date:          11/11/2022 18:40:46
Event ID:      1000
Task Category: Application Crashing Events
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      The-Beast
Description:
Faulting application name: FlightSimulator.exe, version: 1.29.28.0, time stamp: 0x00000000
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x00007ff7df16e5f0
Faulting process ID: 0xd78
Faulting application start time: 0x01d8f5ec4d95a5a9
Faulting application path: C:\Program Files\WindowsApps\Microsoft.FlightSimulator_1.29.28.0_x64__8wekyb3d8bbwe\FlightSimulator.exe
Faulting module path: unknown
Report ID: deb9ddb1-71f9-4fbd-9ac5-3c7f327d38e8
Faulting package full name: Microsoft.FlightSimulator_1.29.28.0_x64__8wekyb3d8bbwe
Faulting package-relative application ID: App

But there's no indication in the log as to why.
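For anyone reading along, the one concrete clue in that record is the exception code. The sketch below is only an illustration: it pulls the application name, module name and exception code out of an Event ID 1000 description like the one above and names the code. The helper name `summarise_app_error` and the three-entry lookup table are made up for this example; 0xc0000005 is the standard access-violation status, which says the process touched memory it shouldn't have, not why it did.

```python
# Hand-picked subset of NTSTATUS codes for illustration; not an exhaustive table.
KNOWN_EXCEPTION_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}

# Three lines copied from the Application Error record quoted in the post.
SAMPLE_DESCRIPTION = """\
Faulting application name: FlightSimulator.exe, version: 1.29.28.0, time stamp: 0x00000000
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
"""


def summarise_app_error(description: str) -> dict:
    """Hypothetical helper: extract the key fields from an Event ID 1000 description."""
    fields = {}
    for line in description.splitlines():
        # Split on the first colon only, so values containing colons stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    code = int(fields["Exception code"], 16)
    return {
        # The name is everything before the first comma on the line.
        "application": fields["Faulting application name"].split(",")[0],
        "module": fields["Faulting module name"].split(",")[0],
        "exception": KNOWN_EXCEPTION_CODES.get(code, hex(code)),
    }


if __name__ == "__main__":
    print(summarise_app_error(SAMPLE_DESCRIPTION))
```

With the module reported as "unknown", the record still can't tell us which component made the bad access, which is why the log alone doesn't explain the crash.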

My advice would be that your son removes all overclocks and undervolts and then sees whether it BSODs at stock frequencies and voltages. This is the most important troubleshooting step, so give it plenty of time to BSOD, despite your son's protests. If it doesn't BSOD then the overclock/undervolt was too aggressive.

If it does BSOD at stock frequencies/voltages then ask your son to remove all mods he may have installed in games. Also upload the dumps and event logs (both of them please) and we'll do some more detailed analysis.
 

StuAug

Member
Hello there,

Son speaking. The BSODs have still happened even after removing all the overclocks and the Overwolf software; they happen mostly when playing games, but once when browsing the web. There are no new dumps in the C:\Windows\Minidump folder. I can't attach the System logs as the file is too large (even when zipped). The Application Event Viewer logs are attached.
 

Attachments

  • EventViewer.zip
    285 KB · Views: 139
  • PCspecs13.11.zip
    132.9 KB · Views: 143
  • drivers-driverss.zip
    9.1 KB · Views: 161

StuAug

Member
i have managed to force a bsod to create a new minidump using Powershell and the "Winit" command
 

Attachments

  • minidump13.11.zip
    377.5 KB · Views: 141

ubuysa

The BSOD Doctor
StuAug said:
i have managed to force a bsod to create a new minidump using Powershell and the "Winit" command

Do you mean the "wininit" command? Who told you to try that?

If you mean you entered the wininit command in PowerShell then that will certainly lead to a BSOD, on everybody's system. The wininit.exe process is the Windows Initialisation process; it launches other critical processes at boot time and also allows background processes to start. If you execute wininit.exe manually it will usually BSOD, because it's a critical system process that isn't meant to be started by a user, and Windows bugchecks when such a process dies. The dump shows that wininit.exe was the process that died, so this isn't a problem, it's user misoperation. ;)

The earlier csrss.exe CRITICAL_PROCESS_DIED dump isn't for the same reason, is it? You didn't execute the "csrss" command in PowerShell, did you?

I really would like to see the System log. If you upload it to the cloud (DropBox, OneDrive, Google Drive, etc.) and post a link to it here I'll gladly take a look. Be sure to make the file public in the cloud.
 

ubuysa

The BSOD Doctor
StuAug said:
the csrss was not me! that was a random one. and i meant the wininit command. thought id try that if any other info regarding the random shut downs!

https://drive.google.com/file/d/1EcusvGYXjMbvVZdxabK2wZecfd9HAFSN/view?usp=sharing There are the system logs. Theres alot of them! (around 32k so good luck!)

Thanks! I love a challenge. I'll let you know what I find...

LATER EDIT: There are some interesting logs in there. Could you do the following for me please?

In the Run command box, type msinfo32 and press Enter. When msinfo32 starts, click on File and then Save, call the file StuAugmsinfo32 (a .nfo extension will be added) and press Enter. Then upload the file StuAugmsinfo32.nfo to here.

Also, what external storage devices (USB drives, USB flash drives, etc.) is your son using?
 

StuAug

Member
msinfo32 file added! Zipped so the forums can accept it. I plug 4GB and 8GB USB drives into it, mostly to transfer music from my PC to my Mac, plus a 1TB external hard drive and a DVD drive.
 

Attachments

  • StuAugmsinfo32.zip
    172.9 KB · Views: 75

ubuysa

The BSOD Doctor
I asked about the external drives only because there is a host of "disk n has been surprise removed" warnings, indicating that a USB drive was removed before it was ready. You risk data loss doing this. You should either disable write caching on your USB device(s) or use the "safely remove hardware" system tray icon.

I think your son has a hardware problem. The System log runs from 16th July to 13th Nov (but the application log only runs from 11th Nov to 12th Nov - I wonder why?). In all that time (in the System log) there have been only two BSODs, the two you uploaded dumps for. I can see a whole host of critical error 41 messages indicating that Windows restarted without properly shutting down. These will be the "switching off" that your son complains about.

There don't appear to be any common problems that could account for all these error 41 messages, which strongly suggests a hardware failure. This is supported by several WHEA logger messages (WHEA is the Windows Hardware Error Architecture) reporting a hardware failure. Here's the most recent...

Rich (BB code):
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          30/10/2022 17:15:30
Event ID:      18
Task Category: None
Level:         Error
Keywords:
User:          LOCAL SERVICE
Computer:      The-Beast
Description:
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 6

The details view of this entry contains further information.

The reason, I suspect, that we don't see a WHEA message for every switch-off is that the hardware error crashed the PC before WHEA could log it.

The highlighted error type of "Cache Hierarchy Error" is a CPU-detected error, as shown by "Processor Core" in the "Reported by component" field. I don't think this is a CPU fault, however; I think this is a RAM error. Flaky RAM might explain the sudden switch-off, but the problem might also be a PSU issue.
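If you want to see the same picture yourself (lots of event 41 entries, only a few WHEA-Logger ones), a rough way is to tally records by source and event ID. This is only a sketch: it assumes the log has been saved as plain text in the "Log Name: / Source: / Event ID:" layout shown above, `tally_events` is a name invented for this example, and the second sample record is an illustrative stand-in rather than an entry from the uploaded log.

```python
from collections import Counter

# First record follows the WHEA entry quoted in the post; the second is an
# illustrative stand-in for one of the "error 41" entries (not copied from the log).
SAMPLE_EXPORT = """\
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          30/10/2022 17:15:30
Event ID:      18
Level:         Error
Description:
A fatal hardware error has occurred.
Error Source: Machine Check Exception

Log Name:      System
Source:        Microsoft-Windows-Kernel-Power
Event ID:      41
Level:         Critical
"""


def tally_events(export_text: str) -> Counter:
    """Count (source, event_id) pairs in a text export of event records."""
    counts = Counter()
    source = None
    for line in export_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Log Name":
            source = None          # a new record starts here
        elif key == "Source":      # exact match, so "Error Source" is ignored
            source = value
        elif key == "Event ID" and source is not None and value.isdigit():
            counts[(source, int(value))] += 1
    return counts


if __name__ == "__main__":
    for (source, event_id), n in tally_events(SAMPLE_EXPORT).most_common():
        print(f"{source} / {event_id}: {n}")
```

A large count for event 41 alongside a much smaller count of WHEA entries would match what I described: most switch-offs leave no hardware error record behind.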

I'm no hardware expert so I'm going to flag @SpyderTracks (who is a hardware expert) and ask him whether that 650W PSU is big enough to drive the CPU/GPU hard in a demanding game like FlightSim. If he thinks the PSU is man enough for the job then I think we need to look at RAM.

We could run Memtest86 on your RAM, but I'm going to suggest you don't do that, for a couple of reasons. The first is that running two sets of 4 iterations of the tests would take many hours on your 64GB of RAM, and I doubt your son would want to be without it for that long! The second reason is that even Memtest86 cannot definitively prove that your RAM is good; it's just the best testing tool available.

A better, and 100% reliable, way of testing for bad RAM is to remove one of your RAM sticks and run on the remaining 3 for a few days. 48GB should be more than enough RAM, although you might see the odd paging slowdown if the system does page RAM, but I think that unlikely.

If you still see this sudden switching off then put that RAM stick back and remove one of the others.

Keep swapping RAM sticks until either you have tried with each of the four out at some point and you still get switching off (in which case it's not RAM), or it's stable with only 3 sticks, in which case the one that's out is the flaky one.
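The swapping procedure above is just a one-at-a-time elimination, which can be sketched as below. The function name, the stick labels and the `stable_without` callback are all hypothetical; in real life the "callback" is you running the PC for a few days with that stick removed. The sketch also assumes at most one stick is flaky.

```python
from typing import Callable, Optional, Sequence


def find_flaky_stick(
    sticks: Sequence[str],
    stable_without: Callable[[str], bool],
) -> Optional[str]:
    """Remove one stick at a time; return the one whose removal gave a stable PC.

    stable_without(label) stands in for the manual step: run for a few days
    with that stick out and report True if there were no switch-offs.
    Returns None if the PC still switched off in every configuration,
    in which case RAM is probably not the cause.
    """
    for removed in sticks:
        if stable_without(removed):
            return removed   # stable with this one out, so it is the suspect
        # Still switching off: put this stick back and try the next one.
    return None


if __name__ == "__main__":
    labels = ["stick 1", "stick 2", "stick 3", "stick 4"]  # hypothetical labels
    # Simulated outcome: the PC is only stable when "stick 3" is out.
    print(find_flaky_stick(labels, lambda s: s == "stick 3"))
```

Stopping at the first stable configuration keeps the number of multi-day trials to a minimum, at the cost of not noticing a second bad stick.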
 

StuAug

Member
If it helps, in the BIOS 2 RAM sticks appear as 2166MHz even though it says 2666MHz on the actual sticks. I will try moving the sticks to see if it is the sticks or something with the BIOS/mobo.
 

StuAug

Member
I suspect something was wrong with the two RAM sticks; switching them to the other slots showed them still as 2166MHz. I have taken them out and will try playing some games now to see if it has helped the sudden turn-offs. Will report back the results.
 

Attachments

  • Screenshot_2022-11-15-16-02-41-39_99c04817c0de5652397fc8b56c3b3817.jpg
    583.4 KB · Views: 88

StuAug

Member
UPDATE: The GPU is still hitting 110 in MSFS even with added cooling ;). The junction temp is 110, whereas the current temp is around 60.
 