Using the Performance Monitor

ubuysa

The BSOD Doctor
The Windows Performance Monitor is an incredibly useful tool and yet few users even know that it exists. As the name suggests the Performance Monitor (or perfmon for short) reports on your system's performance, but the detail that can be produced varies from tracking a single variable (like % CPU usage for example) to a very detailed report on the performance aspects of the most minute components of your system. (Note that you must be a member of the Administrators group to use perfmon).

Perfmon is flexible and configurable, it uses data collectors that run in the background collecting data as you use your computer. You can specify exactly what values (perfmon calls them 'counters') you want to sample and over what period of time. For example, you could sample the time an individual logical processor spends handling interrupts, or the utilisation percentage of your GPU engine, or even the resident bytes in the system paged pool. There are a huge number of counters that you can record, and at an individual resource level. The results can be studied at your leisure and you can drill down as deep as your knowledge can take you to analyse what is causing any performance issues you might be having.

Perfmon is such a useful tool for all users that I thought it would be worthwhile to write a few tutorials on how to use it and what it can do for you. I'm not claiming to be a perfmon expert, there are features of it that I haven't used yet, so additions and updates to this thread by others are welcome.

We'll start with the simplest (but still very useful) feature of perfmon: the System Diagnostics report, also known as a System Health report.

Unfortunately, the April 2018 version of Windows 10 (1803) breaks the System Diagnostic report a tad (and the Performance Report too). The Memory and Network status is not shown in the Resource Overview sub-section and the Memory section is missing altogether. This has been reported to Microsoft, hopefully a fix won't be too long in coming.

The October 2018 Update (1809) does not fix this issue, the Memory and Network status is still not shown in the Resource Overview sub-section and the Memory section is also still missing in Windows 10 1809.

The May 2019 Update (1903) does not fix this issue either, the Memory and Network status is still not shown in the Resource Overview sub-section and the Memory section is also still missing in Windows 10 1903.


The November 2019 Update (1909) does not fix this issue either, the Memory and Network status is still not shown in the Resource Overview sub-section and the Memory section is also still missing in Windows 10 1909.

The examples shown here were produced on a Windows 10 Anniversary Edition (1607) system running in a virtual machine.


System Diagnostics Report

A very useful feature of perfmon (that all users might be interested in) is the System Diagnostics report (it's also known as a System Health report). This runs a pre-defined set of data collectors for 60 seconds and produces a very detailed report on almost every aspect of your system. It's extremely useful for checking whether you have any issues that need investigating and/or whether there are any performance bottlenecks in your system (identified during the 60-second data collection period only of course).

The beauty of the System Diagnostics report is that you don't need to configure anything, you just start it and wait for it to produce the report. The report it produces is multi-level and quite easy to read at the high level you first see (it even uses a simple traffic light system) but you can drill down to look at some fairly detailed information should you want to (or need to).


Running the System Diagnostics Report

To start the System Health Report enter 'perfmon /report' (without the quotes) in the Run command box and click Ok. The System Health Report will now run for 60 seconds and then show you the report. Whilst the data collector is running you will see this window...

Health Report Running.jpg

When the data collectors finish you'll see a message on the above window telling you that it's producing the report. After a time (which can be many seconds, or even minutes on a slower system) the display changes to show the report, this looks something like the following...

Health Report Top-Level.jpg

You can see that the top two sections (System Diagnostic Report and Diagnostic Results) are already expanded and that the other sections are not expanded. These top two sections give you a good overview of your system health and if all is well here you need look no further.

The System Diagnostics Report section simply tells you when the report was produced and on which computer. Note that I ran this one on a newly installed Windows 10 Home system in a virtual machine with only 4GB of RAM and a single processor (so performance is pretty poor).

The Diagnostic Results section has two sub-sections; Warnings and Performance. In the Warnings section there is an Error and Informational sub-section (and there could be a Warnings or even a Critical one too) - these are the same message groups you would see in the Event Viewer. There is also a Basic System Checks sub-section - this has a traffic light results column to help you see whether you have any problems in these areas.

In my example above there is an error related to missing device drivers, if I click on the Symptom hyperlink next to the red X I'm taken to a more detailed area of the report where the devices with missing drivers are detailed. You can see this report below...

Health report pnp error.jpg

This clearly shows that the problem is with a Plug & Play device but apart from that it's not very helpful. The device itself does show the Ven and Dev numbers (80EE and CAFE respectively) and it's possible to look these up online. VEN_80EE and DEV_CAFE is actually a component of VirtualBox, the virtual machine system I was using!

The Critical, Error, Informational, and Warnings sub-sections allow you to quickly investigate problems that have been logged to the Event Viewer. Critical and Error messages here need to be investigated further, Informational and Warning messages are much less of a problem and in the main can be ignored.

The Basic System Checks sub-section shows you five areas with a traffic light in the result column to let you decide whether any further analysis is needed. In my example above there is a red traffic light next to the Hardware and Driver Checks, so I could click on the + next to it to expand this section to get more detailed information. In this case we'd see the various test groups performed and how many failed (in my example it was a plug & play device test that failed - which we already know from the Error above).

You can of course click on the + next to any of these checks to expand the entry, but with the green traffic light there will be no failures in there.

In the Performance section (which you can see below) there is a Resource Overview sub-section with a traffic light status to show you whether any of the displayed resources were under stress. In my example below you can see a red light for the CPU showing that it's 81% busy (which might or might not be a problem, it depends whether you were expecting it). This sub-section lets you quickly see whether you need to expand any of the CPU, Network, Disk, or Memory sections you see below here. Clearly I'm going to need to expand the CPU section later.

Health Report Top-Level page 2.jpg

Each of the sections below here is collapsed at the moment, you can of course expand any (or all) of them. Inside each you'll find sub-sections with entries that can also be expanded (look for the + sign) to drill down for more detailed information.

The first of these is the Software Configuration section. In here are four sub-sections called OS Checks, Security Centre Information, System Services, and Startup Programs, each of these can be expanded. When you run perfmon /report yourself expand these sections in your report to see the wealth of information contained in there. Admittedly not all of it is easily understood or even intelligible(!) but there are some useful nuggets of information buried in there such as, whether you did a normal boot or a specialised boot, how many processors you have and how much memory you have for example.

The Hardware Configuration section has five sub-sections called Disk Checks, System, Desktop Rating, BIOS, and Devices. Each of these can be expanded of course and the entries within further expanded (look for the + sign).

The CPU, Network, Disk and Memory sections expand into sub-sections and expandable entries that allow you to see how these resources were being used during the 60-second data collection interval. The traffic lights of the Resource Overview sub-section above indicate whether you need to bother expanding any of these. In my example it would be good to see where that 81% CPU usage was being expended so I'll expand the CPU section. (In the next post)...
 
The CPU section contains four sub-sections called Process, Service, Services and System. To find out what is using my CPU I'll first expand the Process sub-section, you can see this below...

Health report process.jpg

The Image Statistics section at the top shows me the biggest CPU users, and apart from the idle thread these are TiWorker.exe and svchost.exe. TiWorker.exe is part of Windows Update and it alone is using 55.3% of my CPU - that's because I had Windows Update running during the 60-second data collection interval.

There is also an instance of svchost.exe using 3.8% of my CPU and it would be nice to know what services were being hosted in there. I could simply expand it by clicking on the + sign to see what's being hosted, but to show you some more of what this report can do I've expanded the Service and Services sub-sections, you can see this below...

Health report services.jpg

In the Service sub-section you can see the Windows components that are consuming most of my CPU. Wuauserv is the Windows Update service, TrustedInstaller is the Trusted Installer (used by Windows Update) and sysmain is the SuperFetch service.

In the Services sub-section I've expanded the first instance of svchost.exe (pid 836, the one that's using 3.8% of my CPU note) and you can see that hosted in there is the Windows Update service and we already know that's what's contributing to most (3.1%) of that svchost.exe's CPU usage.

Just to complete the picture, I've expanded both the Memory and Disk sections below for you...

Helth report memory.jpg

In the Hot Files sub-section of the Disk section you can see that the $LogFile and $MFT (both part of the directory structure of a disk) are busiest followed by the software distribution data store. Since we know that Windows Update was running these disk uses don't come as a surprise.

In the Process sub-section of the Memory section you can see that the biggest users of RAM were TiWorker.exe (which we already know is part of Windows Update) and svchost##3 - note the pid of 836 - this is the instance of svchost.exe that's hosting the Windows Update service. Between them they're actually using 645.5MB of RAM (the sum of their working set sizes, note that there is only 4096MB of RAM installed) and they're both using a shade over 4GB of virtual storage (the sum of their Commit values).

All in all we've shown quite convincingly that Windows Update was the biggest consumer of CPU and RAM on this system during the 60-second data collection interval.

The Report Statistics section at the bottom shows you, in the main, when the data collectors were started and stopped and the date of the report.


Saving The System Diagnostic Report

You may well want to save the report somewhere so that you can come back to it later, and the good news is that it's saved automatically - as soon as it's created in fact. It's saved within perfmon so we need to start the full Performance Monitor to be able to go back to this report.

To start the Performance Monitor enter perfmon in the Run command box, you'll see the home screen shown below...

Perfmon Home.jpg

All reports are saved within the Reports folder. When you expand that folder you'll see two sub-folders called User Defined and System, expand the System folder. In the System folder are two sub-folders called System Diagnostics and System Performance, expand the System Diagnostics folder. In there will be the reports for every perfmon /report command you have ever run on this machine, they are date-stamped and numbered (from 000001 onwards). Simply click on the report you want to see.

Just before we finish this example of perfmon use, click on the View menu item on the toolbar at the top of the perfmon window. You'll see four entries; Report, Performance Monitor, Folder, and Customise.

The Customise option allows you to customise the objects you see on the perfmon display (toolbars and menus for example). The Report option shows the reported data as a report - that's what you're looking at now. The Performance Monitor option shows the data from all the counters recorded on a graph, we'll be using this graph in future experiments with perfmon which is why I want you to see it, notice that there is a legend at the bottom (and a scale factor). This cluttered graph is way too complicated to be of any use, this report is much better viewed and used as a Report. The Folder option simply shows you the files created by the various data collectors that ran in the 60-second interval (they're in C:\PerfLogs\System\Diagnostics\report-id).

The System Diagnostics report is thus a useful tool that anyone can run to get a good overview of how 'healthy' their system is; whether there are any errors that need investigation and/or whether there were any performance issues encountered.

I hope to write some more on this thread in later posts, showing how to use perfmon to investigate performance problems in the key areas of CPU, memory, disk, and network. In these you'll see that you can select which performance counters you want to record and over what time period. :)
 
The major disadvantages of the System Diagnostics report we saw above are that you don't get to select which performance counters you want to track and you don't get to specify how long you want the data collectors to run for. It may be, for example, that you need to record a performance trace for specific counters (like % CPU use, or committed bytes) over a much longer period so that you can attempt to reproduce whatever issue you're having whilst the data collectors are running.

This obviously means a bit more work setting up perfmon before you run the data collectors, and a longer recording interval will of course result in bigger data collector filesizes, you need to bear that in mind. One other factor to bear in mind is that running the performance monitor will make your performance problem worse. This is because perfmon uses system resources (CPU, RAM, disk, etc.) in order to run the data collectors, it's not a lot but it will inevitably mean fewer resources available for the rest of your system, thus making a performance issue related to resource congestion even worse.

Information is everything when troubleshooting a performance issue, so the small hit you take running perfmon is definitely worth the information you'll gather.


Running The Performance Monitor

To start perfmon enter 'perfmon' (without quotes) into the Run command box, you'll see the home window we saw above. In the tree menu on the left click on Performance Monitor and the frame on the right will change into a graph like the one shown here...

Default graph.jpg

You can see that (by default) there is a data collector running, it's sampling the CPU busy percentage. You can see the vertical red bar moving across the graph in real time leaving behind a red trace of CPU busy.

You can tell it's real-time by the timestamps on the X-axis, notice how they are updated as the vertical line passes.

You can tell it's a CPU busy trace by the legend at the bottom, notice the value in the Counter column says '% Processor Time'. Notice also that the Instance column says '_Total' which means this is CPU busy across all processors - so it's overall CPU busy.
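The object, instance and counter names you see in the legend fit together into a single counter path, which is how a counter is usually written down outside the perfmon window. Here's a minimal sketch of that naming scheme in Python; the helper name counter_path is my own invention for illustration and isn't part of perfmon.

```python
# Minimal sketch: build a counter path from the names shown in the legend.
# counter_path is a hypothetical helper, not a perfmon or Windows API.

def counter_path(obj, counter, instance=None):
    # Counters without instances have no parentheses in the path.
    if instance is None:
        return f"\\{obj}\\{counter}"
    return f"\\{obj}({instance})\\{counter}"


# The default counter on the graph: overall CPU busy.
print(counter_path("Processor", "% Processor Time", "_Total"))

# A system-wide counter that has no instances.
print(counter_path("Memory", "Available MBytes"))
```

Writing counters down this way makes it easy to tell someone else exactly which counter and instance you were looking at.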

The Y-axis is labelled from 0 to 100, the Scale column in the legend shows '1.0' which means the red line is the actual CPU busy on the 0 - 100 scale on the left, so you can read the actual CPU busy by the height of the red line on that scale.

Between the legend and the graph are instantaneous values; you see the 'Last' CPU Busy value recorded, then the 'Average', the 'Minimum' and the 'Maximum'. The 'Duration' at the far right is the width of the X-axis (in minutes and seconds here) - the default width is 100 seconds.
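If it helps to see what those four values mean, here's a rough Python sketch that works them out from a list of samples. The sample numbers are made up for illustration and the function name summarize is my own, it's only meant to show the arithmetic, not how perfmon does it internally.

```python
from statistics import mean

# Rough sketch of the Last / Average / Minimum / Maximum values shown
# above the legend. The samples below are invented for illustration.

def summarize(samples):
    if not samples:
        raise ValueError("need at least one sample")
    return {
        "Last": samples[-1],        # most recent sample taken
        "Average": mean(samples),   # mean of everything on the graph
        "Minimum": min(samples),
        "Maximum": max(samples),
    }


cpu_busy = [3.0, 7.5, 2.5, 11.0]    # hypothetical % Processor Time samples
print(summarize(cpu_busy))
```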


Changing The Graph Layout

You can change the wrapping of the graph to a smooth leftwards scroll if you prefer. Right-click anywhere on the graph area and select Properties from the pop-up menu. In the dialog that opens select the Graph tab. Notice the Scroll Style radio button is set to 'Wrap', click the 'Scroll' radio button instead and then click Ok. The vertical red line has vanished and the CPU Busy red line trace is scrolling smoothly to the left with the times updated automatically. I don't know whether you'll agree but I prefer the graphs to be scrolling.

If your CPU busy is quite low the red line is probably less than around 10 on the Y-axis scale so you probably can't see much detail in the graph. If we changed the Y-axis to go from 0 - 10 we'd have a much clearer view of the changes in CPU busy. To do that right-click anywhere in the graph area, select Properties, and select the Graph tab again. At the bottom of the dialog that opens you can see the Vertical Scale, change the maximum to be 10 instead of 100 and click Ok. Notice how the Y-axis has changed and notice too how the graph is easier to read now. If your CPU Busy is more than 10% at any time you'll see it clipped at 10, if it's always more than 10% busy you'll just see a horizontal line at 10.

You can of course change the minimum value on the graph so that the graph represents a range in values. To illustrate that (and assuming your CPU Busy is less than 10% at the moment) change the vertical scale to be 10 to 60. If your CPU Busy is less than 10% you'll just see a horizontal bar at the bottom (the 10 level) with perhaps an occasional spike above. The graph clips to the top and bottom for values that are out of range of the vertical scale.

Change the vertical scale back to 0 to 100 so that we're not missing anything.
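The clipping behaviour described above is just a clamp to the vertical scale range. As a small sketch in Python (plotted_value is a name I've made up, it's only here to show the idea):

```python
# Sketch of how out-of-range values are clipped to the vertical scale.
# plotted_value is a hypothetical helper for illustration only.

def plotted_value(value, scale_min, scale_max):
    # Values above the maximum sit on the top edge,
    # values below the minimum sit on the bottom edge.
    return max(scale_min, min(value, scale_max))


print(plotted_value(35, 0, 10))    # CPU at 35% on a 0 - 10 scale
print(plotted_value(4, 10, 60))    # CPU at 4% on a 10 - 60 scale
print(plotted_value(25, 10, 60))   # in range, drawn where it belongs
```

This is why a clipped trace tells you nothing about how far out of range the real value is, only that it is out of range.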

You can also change the scale of the X-axis if necessary, as mentioned the default width is 100 seconds which is generally long enough to see whatever you're looking for. To change the X-axis, right click anywhere on the graph area, select Properties, and then select the General tab. At bottom right are the Graph Elements, change the Duration to 200 seconds and click Ok. Notice how the graph moves much more slowly now and how the Duration value is now 3:20 (in minutes and seconds). Go back to the Graph Elements again and change the Sample Every value to 10. Notice first how the graph is cleared, whenever you make changes the graph is cleared. Notice also how the labels on the X-axis are now much wider apart and how the graph shows much less detail. Sampling more often produces more detail but uses more resources of course. A 1 second sample (the smallest you can select) is usually fast enough to show whatever it is you're looking for.

Change the graph elements back to a sample rate of 1 and a duration of 100.
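The relationship between the Duration and Sample Every settings is simple arithmetic, sketched here in Python. Both function names are my own and the assumption is simply that one point is plotted per sample.

```python
# Sketch of the Duration / Sample Every arithmetic.
# Assumes one plotted point per sample; helper names are hypothetical.

def format_duration(seconds):
    # perfmon shows the Duration as minutes:seconds
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def samples_across_graph(duration_s, sample_every_s):
    # How many samples fit across the full width of the X-axis
    return duration_s // sample_every_s


print(format_duration(200))            # the 200 second width used above
print(samples_across_graph(200, 10))   # sampling every 10 seconds
print(samples_across_graph(100, 1))    # the default settings
```

With a 200 second width and a 10 second sample there are only 20 points to draw, which is why the graph showed so much less detail.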

If you go back to the General tab again, near the top you can see checkboxes which allow you to hide aspects of the graph. If you uncheck the Legend and click Apply then the (important!) legend at the bottom vanishes. In my opinion you need all three of these elements, so check the Legend box again.

At the moment the sample data is being displayed as a line, to change that right-click anywhere in the graph area, select Properties and select the Graph tab again. Click on the drop-down list that says Line and select 'Histogram Bar', notice how you now see a histogram box moving up and down. This is most useful when you are tracing two or more counters and you want to be able to compare them side by side.

Now go back to the Graph tab in the Properties dialog, click on the drop-down list again and select Report. What you see is a single number that's changing. It's actually the average CPU Busy (average is the default) but you can change this. Go back to the Properties dialog and select the General tab again, in the Report and Histogram Data area experiment with clicking each radio button. You'll see the displayed value change to reporting whichever one is selected. As you might imagine, these radio buttons also affect what a histogram graph displays.

Now you know what the graphs look like and how to change the way they're displayed we'll move on to look at adding more counters (see next post)...
 

Adding And Removing Performance Counters

Right-click anywhere in the graph area and select 'Remove All Counters', click Ok in the confirmation dialog. Notice how the CPU Busy graph has disappeared and the legend is empty.

To add counters right-click anywhere in the graph area and select 'Add Counters', it may take a few seconds to populate the dialog box that opens, this is shown below...

Add counters.jpg

The large populated frame on the left is where you select the counters you want to include. They are shown in expandable groups and the Processor group is selected by default but you can scroll up and down this list. Scroll up to the Memory section and click the down arrow next to it to expand the memory section. Here are all the counters you can select related to both virtual and real memory. In this list click 'Available MBytes', hold down the Ctrl key (to select multiple values) and select 'Committed Bytes', whilst still holding the Ctrl key down select 'Page Faults/sec' and then click the Add button at the bottom to add these counters. You'll see them appear in the frame on the right. Now click Ok.

You'll notice that the graph is quite hard to read, you're most likely getting Available MBytes and Committed Bytes clipped at the top as horizontal lines, the Page Faults/sec is probably clipping at the top too. This is because of the vertical scale, it runs from 0 to 100 and yet you have way more than 100 MBytes of available RAM (I hope!). We need to change the vertical scale max to be the number of MBytes of RAM we have installed, I have 16 GB of RAM and that's 16384 MBytes so I must change my vertical scale max to 16384. Change yours to the appropriate value (8 GB is 8192 MBytes). Your graph will look similar to mine shown here...

memory report.jpg

Your Available MBytes and Committed Bytes should be roughly horizontal lines (though they may vary) and the Page Faults/sec is a jagged line that can vary right across the graph.
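If you're not sure what vertical scale maximum to use for your own machine, the conversion from installed GB to MBytes is just a multiplication by 1024. A tiny Python sketch (the function name is mine):

```python
# Sketch: installed RAM in GB converted to MBytes for the vertical scale max.
# ram_mbytes is a hypothetical helper name.

def ram_mbytes(installed_gb):
    return installed_gb * 1024


for gb in (4, 8, 16, 32):
    print(gb, "GB =", ram_mbytes(gb), "MBytes")
```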

Notice that in the legend the Available MBytes entry is selected, the instantaneous values just above relate to the selected value. Select Page Faults/sec and see these values changing. The unit of these values is whatever counter is being shown, so for Available MBytes the units are MBytes, for Committed Bytes the units are Bytes, and for Page Faults/sec the units are page faults/sec.

Notice also the Scale column in the legend, this is a multiplier that you need to use when reading values off the graph. On my system the Available MBytes has a scale of 1.0 so can be read directly off the Y-axis. Page Faults/sec has a scale of 0.1 so the value read off the Y-axis must be multiplied by 10. Committed Bytes has a scale of 0.000001 so the value read off the Y-axis must be multiplied by a million! (Note that this is because the Y-axis is scaled in MBytes but the Committed Bytes value is in Bytes). The scale factor shows you how much the actual values have been reduced in order to fit the line on this Y-axis scale.
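Put another way, the real value is the graph reading divided by the scale factor. Here's that arithmetic as a short Python sketch, the readings are invented examples and the function names are my own.

```python
# Sketch of the legend scale factor arithmetic.
# plotted = actual * scale, so actual = plotted / scale.
# The readings used below are invented examples.

def actual_value(axis_reading, scale):
    return axis_reading / scale


def plotted_height(actual, scale):
    return actual * scale


# Page Faults/sec with a scale of 0.1: a reading of 45 on the Y-axis
print(actual_value(45, 0.1), "page faults/sec")

# Committed Bytes with a scale of 0.000001: a reading of 5200 on the Y-axis
print(actual_value(5200, 0.000001), "bytes")
```

The instantaneous values above the legend are not scaled, so you can always select the counter in the legend and check your arithmetic against them.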

So far all the counters we've added have been system-wide but for many counters you can limit them to a specific device. Open the Add Counters dialog again. Expand the Processor section and select % Processor Time (that's the CPU Busy we saw earlier), notice that in the frame below you have several instances which you can select. _Total (selected by default) is a single combined value across all CPUs, whereas All Instances adds a separate counter for every instance in the list, and you can also limit this counter to individual (or a group of) CPUs. Assuming you have at least five logical CPUs (they're numbered from 0) click 0 and whilst holding the Ctrl key down select 4 as well. Now click the Add button. Notice that the frame on the right has two counters selected, one for % Processor Time on CPU 0 and one for % Processor Time on CPU 4. Now click Ok.

You may not be able to see these new CPU Busy values because they're (on my system) scaled with a value of 1.0 which makes them almost a horizontal line at the bottom. We can make it easier to see them by removing some of the clutter and temporarily hiding the memory counter graphs, to do that uncheck the boxes next to the three memory graphs in the legend. We can make it even easier to see these CPU graphs by re-scaling the Y-axis, make it 0 to 10. Now you can (probably) see the two CPUs % Busy graphs. Change from a line to a histogram, now you can see the two CPU's % Busy side by side.

Change the graph back to a line and show all the other counters by checking all the boxes in the legend. Most of the other counters are now clipped at the top on this 0 - 10 scale, notice their scale factor in the legend has changed too.

Select the Available MBytes counter in the legend, right-click anywhere in the graph and select 'Scale Selected Counters'. You'll notice the scale value in the legend (probably) changes and the trace appears on the screen. What this does is force the trace onto the current scale so you can see it and update the legend scale factor as appropriate.

So far we've only seen perfmon used here for real-time recording and reporting, we'll cover running background data collectors in the next post. For now experiment with adding different counters, it doesn't matter what they are just get used to selecting counters for all devices and for specific devices (where appropriate). Experiment with scaling (particularly Y-axis scaling), with histograms and report displays, and be sure you're confident at using and interpreting the graphs and the numbers you see here. You need to be clear about what you're seeing on here and familiar with manipulating these graphs before we move on to actual performance analysis using background data collectors.
 
Watching performance counters in real time is all well and good, but it's only useful if you have a pretty constant performance issue that you can diagnose in real time. Quite often performance issues are transitory, hangs and freezes for example, and real time analysis can't help much with those. You can also get performance issues with specific applications or games and it's often impossible to run the application/game and monitor the perfmon graphs at the same time. For these performance issues you need to run the data collectors in the background as you use the application/game, or run them for long enough in the background to catch the transitory performance issues in the trace.

As mentioned earlier, the April 2018 version of Windows 10 (1803) breaks the System Diagnostic and System Performance reports in a small but important way. The Memory and Network status is not shown in the Resource Overview sub-section and the Memory section is missing altogether. This has been reported to Microsoft, hopefully a fix won't be too long in coming. The examples shown here were produced on a Windows 10 Anniversary Edition (1607) system running in a virtual machine.

Creating Data Collector Sets

To use the background data collectors we need to start perfmon of course, the home screen is shown below...

Perfmon Home.jpg

In the folder tree on the left expand the Data Collector Sets folder and then click on the User Defined folder (this one should be empty if you've not used background data collectors before). In the empty frame on the right, right-click anywhere and select New and then Data Collector Set from the pop-up menu. The Create New Data Collector Set dialog will open, as shown below...

add data collector.jpg

Initially you probably don't have any idea what is causing your performance issue so you want to run a general background trace to see what all the resource utilisations look like. You could use the System Diagnostics report that we saw earlier for this (perfmon /report), but we'll use the (similar) System Performance report as a template so that you get a chance to see it here too.

On the first page of the Create New Data Collector Set dialog give your set a name (I've used 'My First Set') and leave the 'Create from a template' radio button selected (we'll see manual use later).

On the next page you're asked to select the template to use, select System Performance and click the Next button (note that you could select a System Diagnostics report here too). Note that WDAC is the Windows Defender Application Control, the WDAC Diagnostics trace produces a report limited to WDAC. The Basic option produces a very simple and limited system report.

Unless you want to change the Root directory where the output of the data collectors is written click Next to accept the specified folder path. It would be wise to make a note of where the data collector output is going though, you could always copy it and send it to someone else to analyse.

The 'Run as' box lets you specify the user under whose authority the data collectors will run, this user must be a member of the Administrators group and you'll have to specify the userid and password for it. The <Default> option means the data collectors run under your userid and with your authority (you have to be a member of the Administrators group to use perfmon at all). We want to be able to specify how long the data collectors run so click the 'Open properties for this data collector set' radio button and then click the Finish button.

You'll see the Properties dialog for your data collector set open. Generally you don't want to add or remove any keywords from this report (we'll use manual data collectors later for specifying our own performance counters to monitor), nor do you want to change any other options apart from the 'Stop Condition', so click this tab. If the 'Overall duration' checkbox is not checked then select it. Here you can specify how many seconds, minutes, hours, days or weeks you want the data collectors to run when you start them. Be very careful with the days and weeks options, they can result in huge filesizes. I suggest 1 or 2 minutes for a first run but you can make it longer if you're looking for a transient issue, finally click Ok and you're done setting up the data collector set.
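To get a feel for why the days and weeks options need care, it helps to work out how many samples a run would collect. This is a rough Python sketch, the 1 second sample interval is purely my assumption for illustration (the template's real interval may differ) and the function names are my own.

```python
# Rough sketch: how many samples an Overall duration setting implies.
# The default sample interval of 1 second is an assumption for illustration.

UNIT_SECONDS = {
    "seconds": 1,
    "minutes": 60,
    "hours": 60 * 60,
    "days": 24 * 60 * 60,
    "weeks": 7 * 24 * 60 * 60,
}


def overall_duration_seconds(amount, unit):
    return amount * UNIT_SECONDS[unit]


def sample_count(amount, unit, sample_every_s=1):
    return overall_duration_seconds(amount, unit) // sample_every_s


print(sample_count(2, "minutes"))   # the suggested first run
print(sample_count(1, "weeks"))     # a week-long run
```

Every one of those samples is written to disk for every counter in the set, so the output grows in step with the duration you choose.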


Running The Data Collector Set

You can see your new data collector set if you expand the User Defined folder in the left hand folder tree, you can also see it in the large right-hand frame. Note that these data collector sets are saved to disk so they are preserved across system restarts or power-off. You can also run them as many times as you like.

Click once on your data collector set (in either the folder tree or the right-hand frame) to select it and on the menu bar at the top a green right-facing arrowhead will appear. When you click that green arrowhead the data collectors will start and run for the duration you specified, so that is also the time to activate the application or game that's causing you performance issues. Note the grey square to the right of the green arrowhead, it will go black when the data collectors are running (and then the green arrowhead will be greyed out), this is the stop button in case you want to stop the data collectors early.


The System Performance Report

The results of the data collector set run are in the User Defined folder under the Results folder. Click on the report to open it. It looks very similar to the System Diagnostics report we saw earlier, but without the event viewer entries. You can see mine below...

system performance home.jpg

The Summary section provides some useful information about the maximum utilisations seen. In my example it lists the maximum CPU busy percent seen in the interval and the name of the process using most of that (MsMpEng.exe in this case - that's Microsoft Security Essentials, I was running an anti-virus scan at the time). It also shows the busiest disk (0 in this case - I only have one in this virtual machine), the maximum disk activity (189 I/O operations per sec) and the depth of the queue (2 in this case, meaning a maximum of two I/O operations were queued up waiting for this disk - queue depth gives you a good measure of how 'busy' a disk is). The maximum RAM busy percent is shown (46%) along with how much RAM is installed (2GB in this virtual machine). Also shown is the process using the most RAM (svchost##2 in this case) and that process's working set size (127,260 KB - that's the RAM being used by this process).

The traffic light system in the Resource Overview is the same as with the System Diagnostic, you can see this system was busy, but not overly so - all the lights are green.

The remaining sections are exactly the same as the System Diagnostic report we've already seen. Had we seen any red traffic lights you could have expanded these sections to identify what's using this resource the most.
 

ubuysa

The BSOD Doctor
Using Manual Performance Counters

If the System Performance report we just looked at shows you that one (or more) resources are under stress in the interval and the detail in the performance report isn't sufficient, or if you want to track other counters, then you'll need to create a data collector set with your own counters specified.

Click on the User Defined folder in the folder tree view on the left, you're back to the user defined data collectors page again. Right-click anywhere in the right-hand frame and select New and then Data Collector Set again. Give this set a different name (I'm using 'Manual Counters Set'), click the 'Create manually' radio button and then click the Next button. You'll see a new dialog asking what type of data collector set to create, as shown below...

Select data type.jpg

Select the 'Performance counter' checkbox. The other options aren't needed here: Event trace data is for diagnosing internal Windows problems and System configuration information is for tracking changes to registry keys, neither is directly performance related so we won't look at them. The Performance Counter Alert can be useful in performance analysis and we'll look at this in a later post. Now click the Next button and you'll see the dialog where you can select performance counters, it's shown below...

performance counters.jpg

At the bottom of this dialog you need to select the sample interval, the default is 15 seconds which is too long for detailed performance analysis. Of course the shorter you make the sample interval the more data you collect and the bigger the data collector files. That said, I generally select 1 second here (the shortest interval you can select) to give me the best chance of capturing the problem.
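To get a feel for how the sample interval affects the amount of data collected, here is a rough sketch in Python. The function name is my own invention for illustration, and it only counts recorded values, it doesn't try to predict the actual file size (which depends on the log format).

```python
def sample_count(duration_seconds: int, interval_seconds: int, counters: int) -> int:
    """Approximate number of values recorded: one per counter per sample interval."""
    samples_per_counter = duration_seconds // interval_seconds
    return samples_per_counter * counters


# A 1 minute run recording 4 counters
print(sample_count(60, 15, 4))  # default 15 second interval -> 16 values
print(sample_count(60, 1, 4))   # 1 second interval -> 240 values
```

So going from the default interval to 1 second collects fifteen times as much data for the same run, which is fine for a minute or two but adds up quickly over hours or days.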

To add performance counters click the 'Add...' box, the 'Add Counters' dialog that we've seen before pops up. Here you select the actual counters you want to record. I'll use examples from the Windows 10 system I'm running in a virtual machine here, but you can follow along on your own system to see what I'm doing.

We've already seen from the System Performance report that there aren't any major issues on this system so I'm just going to pick some sample values to illustrate the sort of thing you can do. I will have file explorer copy the Windows folder to my Documents folder whilst the data collectors are running - just to create a workload.

First, I want to track how much virtual memory is being used across the system (this is total committed bytes) so I'll expand the Memory section and select Committed Bytes, then click Add to add that counter.

Since I'll be doing a copy operation I also want to track some disk counters. To do that I'll first expand the PhysicalDisk section and click on 'Avg. Disk Queue Length', if you check the 'Show Description' checkbox at the bottom it will show you what that counter does - it's the average disk queue depth. In the instances frame will be the disk number and the Windows volume letters of all the partitions on each physical disk (so there could be a list of volume letters against some disks). I would select disk '0 C:' here (I only have one disk in this virtual machine) and click Add. Note that if you want to look at data for individual partitions you should use the LogicalDisk section.

I'm also going to add a second counter from the PhysicalDisk section (remember with the Ctrl key you can add both at the same time). This second counter is the Disk Bytes/sec which shows the rate at which bytes of data are being written to and read from the disk.

Finally I want to track the % processor usage of the File Explorer process. To do that I'll need to expand the Process section and select the % Processor Time counter, but I only want it for the Explorer process. To save scrolling through all the instances enter 'Explorer' in the search box at the bottom and click the Search button. The Explorer process is now the only one displayed, so select it and click Add. Note that you can use this search function on any counter that has multiple instances.

We're now done adding performance counters so click Ok to exit the Add Counters dialog. You'll now be back at the first add performance counters dialog with the counters we selected shown in the box, if all is as expected click Next.

You'll now see the root directory where the results of the counters will be saved, it's unlikely you'll need to change this, so click the Next button. On the final page of the dialog we're asked for the user under whose authority the data collectors will be run, accept '<Default>' as before. Also as before click the 'Open properties for this data collector set' button so we can set/change the interval duration and then click the Finish button.

On the data collector properties dialog (which we've seen before), click the 'Stop Condition' tab and in there check the 'Overall Duration' checkbox and specify the length of the data collector interval (I used 1 minute), then click Ok to finish.

The data collector set with manual counters is now ready to run.

Run the new data collector set, either by clicking the green arrowhead or by right-clicking on the data collector set and selecting 'Start' from the menu.
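If you'd rather script this than click through the dialogs, Windows also ships a command line tool called logman that can create counter data collector sets. The sketch below is only my assumption of what an equivalent command might look like for the four counters chosen above, it isn't something I've taken from the dialogs themselves. The function name is hypothetical, and the code only builds the command as a list, it doesn't run it.

```python
from typing import List

# Counter paths for the four counters selected in this example.
# The instance names ('0 C:' and 'explorer') are the ones from my virtual machine.
COUNTERS = [
    r"\Memory\Committed Bytes",
    r"\PhysicalDisk(0 C:)\Avg. Disk Queue Length",
    r"\PhysicalDisk(0 C:)\Disk Bytes/sec",
    r"\Process(explorer)\% Processor Time",
]


def logman_create_counter(name: str, counters: List[str],
                          interval_seconds: int, run_for: str) -> List[str]:
    """Build (but do not run) a 'logman create counter' command line."""
    cmd = ["logman", "create", "counter", name,
           "-si", str(interval_seconds),  # sample interval
           "-rf", run_for,                # run for this long (hh:mm:ss), then stop
           "-c"]                          # the counter paths follow
    cmd.extend(counters)
    return cmd


cmd = logman_create_counter("Manual Counters Set", COUNTERS, 1, "00:01:00")
print(cmd)
```

On a Windows system you could pass the list to subprocess.run(cmd, check=True) from an elevated prompt and then start the set with 'logman start' followed by the set name. Check 'logman create counter /?' on your own system first, as I'm describing the options from memory.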


The Manual Data Collector Report

As we've seen before, the performance counters are displayed initially as line graphs with the legend at the bottom showing the colours, counter name, and scale factor. You can see my initial report below...

manual report.jpg

Notice that the default Y-axis scale of 0 - 100 is used, this might not be appropriate for these values (and certainly not for committed bytes!). The scale in the legend shows the factor that each actual value has been multiplied by in order to fit the line on this Y-axis, so to get the actual value you divide what you read off the graph by the scale factor. The % Processor Time counter has a scale of 1.0 so this can be read straight off the graph. The Avg. Disk Queue Length counter has a scale factor of 10.0 so values read off the graph must be divided by 10. The Disk Bytes/sec counter has a scale factor of 0.0000001 so values read off the graph must be multiplied by 10 million. The Committed Bytes counter has a scale factor of 0.00000001 so values read off the graph must be multiplied by 100 million.
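The arithmetic is easier to see in a few lines of Python. This is just a sketch, the function name is mine and the graph readings are made-up values chosen to be in line with the figures in my report, not values taken from the screenshot.

```python
def actual_value(plotted: float, scale: float) -> float:
    """Undo the graph scaling, assuming plotted value = actual value * scale."""
    return plotted / scale


# Hypothetical readings taken off the 0 - 100 Y-axis
print(actual_value(22.0, 1.0))          # % Processor Time, read directly
print(actual_value(18.9, 10.0))         # Avg. Disk Queue Length, about 1.89
print(actual_value(42.5, 0.0000001))    # Disk Bytes/sec, about 425 million
print(actual_value(10.8, 0.00000001))   # Committed Bytes, about 1.08 billion
```

Dividing by a scale factor smaller than 1 is the same as multiplying by its reciprocal, which is why the tiny scale factors turn small graph readings into very large actual values.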

We've also seen that by clicking on each counter (to highlight it) the instantaneous values reflect the values of that counter. What you haven't yet seen is that by hovering the cursor over any point on any of the line graphs a flyout appears giving you the min, avg, max values for the point on that counter, you can see this in my example below...

perfmon highlight.jpg

A graph only makes sense if we're trying to compare these values in some way, and on my example graph you can see a slight correlation between the Explorer % Processor Time, the Avg. Disk Queue Length, and the Disk Bytes/sec. This isn't a surprise as a copy between two folders on the same disk was running at the time, but it does illustrate the sorts of correlations you need to look for. This might be easier to see if you change the graph type from Line to Area.

Sometimes the data makes more sense in a numerical report form, so let's look at my data as a report where the maximum values are being displayed...

manual report as report.jpg

In my example above you can see that the maximum processor usage of Explorer is 22.23% which is quite a large slice for one application (and on a uniprocessor), though it's not excessive.

The maximum physical disk queue length was never more than 2 (1.89 is reported), which is about as high as you normally want to see it. In this case, however, the contention was coming from a single application, Explorer, which was reading from and writing to the same physical disk. We were moving almost 425 MBytes/sec of data to and from the disk, and for an HDD this would be very impressive! A good hard disk has a data transfer rate of about 150 MBytes/sec, but the virtual disk here is on a very fast M.2 NVMe SSD - hence the impressive data rate.

The committed bytes are pretty constant at just over 1GB which is fine in a virtual machine with 2GB of RAM installed (and a pagefile of course which allows for a higher commit limit).
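If you want to crunch the numbers yourself rather than read them off the report view, the data collector's log format can be set to comma separated in its properties. The sketch below shows one way to pull the maximum of each counter out of such a file with Python. The sample data is entirely made up, the machine name 'VM' is hypothetical, and the column layout (a timestamp column followed by one column per counter) is my assumption of what the CSV looks like, so check it against a real file.

```python
import csv
import io

# Made-up sample: a timestamp column followed by one column per counter.
SAMPLE = r'''"(PDH-CSV 4.0) (GMT Standard Time)(0)","\\VM\Memory\Committed Bytes","\\VM\Process(explorer)\% Processor Time"
"05/01/2018 10:00:01.000","1080000000"," "
"05/01/2018 10:00:02.000","1090000000","12.5"
"05/01/2018 10:00:03.000","1085000000","22.23"
'''


def column_maxima(csv_text: str) -> dict:
    """Return the maximum value seen in each counter column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    maxima = {}
    for row in reader:
        # Skip the first column, it holds the timestamp
        for name, cell in zip(header[1:], row[1:]):
            cell = cell.strip()
            if not cell:
                continue  # cells may be blank when there is no value for that sample
            value = float(cell)
            if name not in maxima or value > maxima[name]:
                maxima[name] = value
    return maxima


for counter, maximum in column_maxima(SAMPLE).items():
    print(counter, maximum)
```

For a real file you would replace SAMPLE with the contents of the CSV file from the Results folder.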
 

ubuysa

The BSOD Doctor
Some Useful Tips

In the vast majority of cases all you're likely to need is the System Performance report with a suitable interval, pretty much all the counters you will need to analyse a performance issue will be in that report. If you do manually create data collector sets be careful not to include too many counters, clutter makes the graphs unreadable. You also need to choose the right counters so here are a few that you might want to include. The limits that I'm suggesting here are mostly very approximate and depend very much on what hardware you have installed.

Hard Disks tend to cause the most performance problems simply because they are inherently slow, so some useful counters to monitor hard disks are:

PhysicalDisk\% Idle Time - this is a measure of how busy the disk is, aim for at least 60% idle (reduce active files on this disk if it falls below that)
PhysicalDisk\Avg. Disk sec/Read - this is the average time for a read operation, aim for no longer than 25ms (archive old data and defrag if exceeded)
PhysicalDisk\Avg. Disk sec/Write - this is the average time for a write operation, aim for no longer than 25ms (archive old data and defrag if exceeded)
PhysicalDisk\Current Disk Queue Length - this is the depth of the wait queue, aim for no more than 2 (move some active files to other physical disks, spread the load)
LogicalDisk(HarddiskVolumeX)\% Free Space - this is the percentage of space on the 'X' hard disk that is free, aim for no less than 25% (archive old data, add another hard disk, or upgrade disk size if it falls below that)

RAM is a critical resource but Windows manages it extremely well and high RAM use is not always a problem, some useful counters here are:

Memory\Available MBytes - this is RAM that can be used immediately, it includes the Standby, Free, and Zero page lists, aim for at least 10% of installed RAM available (just a guide, 100% RAM used is not a problem in itself, as long as you're not hard paging)
Memory\Page Reads/sec - these are the read operations needed to resolve hard page faults from disk (the pagefile or mapped files), aim for no more than 100/sec (install more RAM if exceeded - unless they all come from one process, in which case investigate the process for memory leaks)

The Network can often be a performance limiter if one is streaming or doing other network intensive work, some useful counters here are:

Network Interface\Bytes Total/sec - this is the data rate (in and out) over each network adapter, aim for less than 60% of the interface bandwidth, more than 80% might cause problems (obtain more bandwidth or reduce the concurrent network usage if exceeded)
Network Interface\Output Queue Length - this is the depth of the output packet queue, aim for 1, more than 2 will be a problem (reduce the concurrent network usage or obtain more bandwidth)
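One thing to watch when comparing Bytes Total/sec with the interface bandwidth is that the counter is in bytes whilst bandwidth is normally quoted in bits per second. A small sketch of the conversion, with a function name and example figures that are purely illustrative:

```python
def network_utilisation_percent(bytes_total_per_sec: float, link_bits_per_sec: float) -> float:
    """Convert a Bytes Total/sec reading into a percentage of the link bandwidth."""
    return bytes_total_per_sec * 8 * 100 / link_bits_per_sec


# Hypothetical example: 7,500,000 bytes/sec on a 100 Mbit/s adapter
print(network_utilisation_percent(7_500_000, 100_000_000))  # 60.0
```

In that example the adapter is right on the 60% guideline even though 7.5 million looks small next to 100 million.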

CPU is the most critical resource of course, some useful counters here are:

Processor\% Processor Time (_Total) - this is the average CPU busy percentage across all processors, aim for less than 60%, more than 80% might be a problem if it's consistent, 100% for short periods is not a problem (CPU upgrade or reduce the volume of concurrent work if exceeded)
System\Processor Queue Length - this is the number of threads waiting in the system wide ready queue, divide this by the number of (logical) processors and aim for less than 5 per processor (CPU upgrade or reduce the volume of concurrent work if exceeded)
Process\% Processor Time (by process) - this is the average CPU busy percentage per process - there are lots of processes so check that your key ones are getting enough CPU time or locate those that are getting more than you need them to (stop unnecessary CPU hog processes at busy periods or consider lowering their Base Priority one step)
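To show how these guidelines might be applied to a set of readings, here is a minimal sketch in Python. The limits are the very approximate ones suggested above, the dictionary keys are simply labels I've chosen (without instance names), and the function name is hypothetical. Note that the two disk time counters report seconds, so 25ms is 0.025.

```python
import operator

# (counter, comparison that signals a possible problem, limit)
GUIDELINES = [
    (r"PhysicalDisk\% Idle Time", operator.lt, 60),
    (r"PhysicalDisk\Avg. Disk sec/Read", operator.gt, 0.025),
    (r"PhysicalDisk\Avg. Disk sec/Write", operator.gt, 0.025),
    (r"PhysicalDisk\Current Disk Queue Length", operator.gt, 2),
    (r"LogicalDisk\% Free Space", operator.lt, 25),
    (r"Memory\Page Reads/sec", operator.gt, 100),
    (r"Network Interface\Output Queue Length", operator.gt, 2),
    (r"Processor\% Processor Time", operator.gt, 80),
]


def flag_problems(samples: dict, logical_processors: int = 1) -> list:
    """Return the counters in 'samples' that are outside the rough guidelines."""
    flagged = [name for name, is_bad, limit in GUIDELINES
               if name in samples and is_bad(samples[name], limit)]
    # The processor queue guideline is per logical processor
    queue = samples.get(r"System\Processor Queue Length")
    if queue is not None and queue / logical_processors >= 5:
        flagged.append(r"System\Processor Queue Length")
    return flagged


readings = {r"PhysicalDisk\% Idle Time": 40, r"Memory\Page Reads/sec": 10}
print(flag_problems(readings))
```

A single reading outside a limit doesn't mean much on its own, it's the sustained values over the interval that matter.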
 

polycrac

Super Star
A brilliant post, thank you. I'm trying to work my way through following what you did, but I have some different file options to you in the report itself:
perfmon report trial.png

Any idea why?
 

ubuysa

The BSOD Doctor
Using Performance Alerts

Performance alerts are a handy tool if you want to be alerted when a particular counter goes over (or falls below) a specific value. You might want to be alerted for example, if the disk queue depth for a specific physical disk exceeded 2 (indicating excessive contention for the disk) or even when the disk free space fell below 40%. You might also want to automatically start a data collector set when the alert was triggered to gather more information. You might even want to start some other program or software tool when the event was triggered to do some external processing.

You can do all these things with performance alerts, the only trouble is that setting these alerts up is not exactly intuitive and the alerts themselves are buried deep inside the Event Viewer. In addition, starting an external program requires you to create a new scheduled task. Despite this it is worth knowing how to use performance alerts, and especially how to automatically start a data collector set when an alert is triggered.


Creating The Alert

Start perfmon, expand the Data Collector Sets\User Defined folder, right-click anywhere in the frame on the right, and select New > Data Collector Set.

Give the set a name (I've used 'Performance Alert') and click the 'Create manually' radio button. In the 'type of data' dialog click the 'Performance Counter Alert' radio button and click the Next button.

You'll see the performance alert counters dialog as shown below...

perf alert blank.jpg

Click the Add... button and the familiar add counters dialog box appears. Select as many counters from here as you want to monitor, an alert will be triggered when ANY of the counters exceeds the threshold you set for it (it's a logical OR relationship). In this example I'm going to select Processor\% Processor Time (_Total) and Memory\Committed Bytes, once that's done I click the Ok button and we go back to the performance alert counters dialog with the counters I selected shown in the box. You can see this below...

perf counters filled.jpg

Now you need to set the threshold for each of the counters listed. Click each counter to highlight it and then select Above or Below and the limit value. The units of this value are of course related to the counter you're monitoring, so for % Processor Time it must be a value between 0 and 100 (because it's a percentage) but for the Committed Bytes it must be a value between 0 and the commit limit in bytes (RAM plus pagefile, a huge number).

In my example I've used 'Above 10' for the % Processor Time and 'Above 7516192768' for Committed Bytes (that's 7GB in bytes, I know that my current commit value is 6.6GB so I need a value a little above that). This means an alert will be triggered for either of these two events.
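Working out byte values by hand is error prone, so here is a small Python sketch that does the conversion and also illustrates the logical OR relationship between the counters in an alert set. The function names are mine, and whether perfmon treats 'Above' as strictly greater than the limit is an assumption on my part.

```python
def gb_to_bytes(gb: float) -> int:
    """Convert gigabytes (1024 x 1024 x 1024 bytes) to bytes."""
    return int(gb * 1024 ** 3)


# The thresholds from my example: counter -> (direction, limit)
THRESHOLDS = {
    r"Processor(_Total)\% Processor Time": ("Above", 10),
    r"Memory\Committed Bytes": ("Above", gb_to_bytes(7)),
}


def alert_triggered(samples: dict, thresholds: dict) -> bool:
    """True if ANY counter is past its threshold (a logical OR)."""
    for counter, (direction, limit) in thresholds.items():
        value = samples[counter]
        if direction == "Above" and value > limit:
            return True
        if direction == "Below" and value < limit:
            return True
    return False


print(gb_to_bytes(7))  # 7516192768

quiet = {r"Processor(_Total)\% Processor Time": 5,
         r"Memory\Committed Bytes": gb_to_bytes(6.6)}
busy = {r"Processor(_Total)\% Processor Time": 35,
        r"Memory\Committed Bytes": gb_to_bytes(6.6)}
print(alert_triggered(quiet, THRESHOLDS))  # False
print(alert_triggered(busy, THRESHOLDS))   # True
```

In the 'busy' case only the processor counter is over its limit, but that alone is enough to trigger the alert.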

Now we can click the Next button, and on the next dialog accept the 'Save and close' radio button option and click the Finish button.

We're not quite done setting up the alert yet however, for one thing we haven't specified what to do when the alert is triggered. To do that you need to click on your alert data collector set in the User Defined folder in the tree folder view on the left to select it. You can also double-click the alert data collector set in the right-hand frame to select it. The right-hand frame now changes to show the contents of your alert data collector set, it will look something like my example below...

perfmon alert contents.jpg

We now need to modify the contents of the Alert data collector in the right-hand frame (probably called DataCollector01). To do that right-click on it and select Properties from the menu. The dialog you see is very similar to the performance alert counters dialog we saw earlier but this dialog is more complex, it's shown below...

perfmon alert actions.jpg

At the bottom is the Sample Interval we've seen before. Here you probably don't want a 1 second interval, it depends on how often you want to test for the alert value having been triggered. The 15 seconds default seems reasonable for most uses but I'm impatient so I set it to 10 seconds. Now click the Alert Action tab and you'll see the alert action dialog.

In the alert action dialog you need to check the checkbox that says 'Log an entry in the application event log' if all you want is a record of when the alert was triggered. These events are written to the Diagnosis-PLA operational event log and can be viewed with the Event Viewer (more on this later). Note that this setting applies to all counters, so any counter in this alert set triggering an alert will write a log entry.

If you want to start a data collector set when the alert is triggered click the arrow at the right of the drop-down list and select which data collector set you wish to start. Note that you must create this data collector set before you run any alert sets that start it. Note also that this data collector set will be run if any counter triggers an alert, if you want different data collector sets to run for different alerts you have to set up multiple alert sets.

I'll look at using the Alert Task tab in a short while, for now our alert is all set up and we can click Ok to end this dialog.

To start the alert set running click on the data collector set in the User Defined folder in the folder tree on the left to highlight it, then click the green arrowhead that you see on the menu bar at the top. Note that this data collector will run until you manually stop it by clicking the black square on the menu bar.

Now we have to wait for the counter thresholds to be exceeded. In my example I achieved that by starting Malwarebytes scanner, that uses a lot of CPU and is sure to trigger the % Processor Time threshold I set, I expect it to trigger the Committed Bytes threshold too, unless some other process already triggered it.


Viewing The Alerts

If you check the 'Log an entry in the application event log' checkbox, and it's always wise to do that so you have a record somewhere of all the alerts, the alert event can be viewed with the Event Viewer. To start the Event Viewer enter 'eventvwr.msc' in the Run command box. You may need to wait a while whilst it populates.

The performance alert events are buried deep within the folder tree structure on the left, expand the folders to navigate to the Applications and Services Logs\Microsoft\Windows\Diagnosis-PLA\Operational entry (I told you they were well buried!). You should be looking at some information messages with an event ID of 2031, if you select one of them the large pane at the bottom shows the details of which counter was triggered. You can see an example from my system below...

perfmon alert event viewer.jpg

Notice that the log entry is date and time stamped and that it shows you which counter was triggered, what the actual value of the counter was at the time, and what the threshold value was.
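If you get tired of clicking down through the Event Viewer tree, the same log can probably be queried from the command line with the wevtutil tool that ships with Windows. The Python sketch below only builds the command, it doesn't run it. The function name is hypothetical and the channel name is my assumption of how the Diagnosis-PLA operational log is named for wevtutil, so check it with 'wevtutil el' on your own system.

```python
def alert_events_query(max_events: int = 10) -> list:
    """Build (but do not run) a wevtutil query for performance alert events."""
    return [
        "wevtutil", "qe",
        "Microsoft-Windows-Diagnosis-PLA/Operational",  # assumed channel name
        "/q:*[System[(EventID=2031)]]",  # only the alert events
        f"/c:{max_events}",              # limit the number returned
        "/rd:true",                      # newest first
        "/f:text",                       # plain text output
    ]


print(" ".join(alert_events_query(5)))
```

On a Windows system you could pass the list to subprocess.run() to see the most recent alert entries as text.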

If you had specified that a data collector set was to run when an alert was triggered then the first alert would have started the data collector set. Since you can only have one instance of a data collector set running, further alerts won't start another instance of that data collector set. In that case however you will see an error entry with an event ID of 1055 (and a 0x800710E0 error code) in the Event Viewer for the failed attempt to start each additional instance.

If the automatically started data collector set ends when its interval is reached and then another alert occurs, the data collector set will be re-started. Note that an automatically started data collector set will keep running once started by an alert even if you stop the alert set (though of course you can stop the automatically started data collector set manually).

The results of the automatically started data collector sets are in exactly the same place as if you had started it manually (they're in the Results\User Defined\data collector set name folder).
 

ubuysa

The BSOD Doctor
Using the Alert Task Option

When you right-click on the alert data collector to set up the alert action you can optionally set up an alert task, a task that will be started when the alert is triggered. What that task might do or how to code it is beyond the scope of this guide but we will see how to invoke such a task.

The Alert Task tab in the alert data collector properties dialog is shown below...

View attachment 12035

In the 'Run this task when an alert is triggered:' box you type the name of a task to be started when the alert is triggered. Note that this is not a .exe file nor a .bat file, it must be the name of a task defined in the Windows Task Scheduler (I'll call it a started task here). This means you also have to create the started task itself (more on this in a minute).

The 'Task arguments:' entry is where you can select some variables to be passed to the started task as arguments. Click the arrow to the right to see the variables you can select, generally you need to separate these with commas (which you have to type). As you select these variables the 'Example task arguments:' box will show the values that will be passed to the started task if the alert was triggered now.

The 'Task argument user text:' entry is where you can enter a text string to be passed as an argument to the started task. When you're done click the Ok button.


Creating The Started Task

Started tasks are a subject all of their own, so we're only going to look here at the steps to create a started task that you can invoke from a performance alert.

To open the started task dialog enter 'control schedtasks' into the Run command box, the started task dialog will open as shown below...

View attachment 12036

In the menu on the right hand side click the 'Create task...' option, the create a new task dialog opens as shown below...

View attachment 12037

On the General tab enter the name of the started task in the 'Name:' field, this must match the name you entered in the alert task dialog earlier.

If the program this task will invoke requires elevated access then click the 'Run with highest privileges' checkbox. It is also wise to change the 'Configure for:' list to Windows 10. If you want the invoked program to run hidden then check the 'Hidden' checkbox.

Now click the 'Actions' tab and then click the 'New...' button at the bottom of this dialog, you'll see the 'New Action' dialog as shown below...

View attachment 12038

The 'Action:' entry must be set to 'Start a program', the other options of 'Send an e-mail' and 'Display a message' whilst useful have been deprecated in Windows 10. Enter the name of the program or script that is to be run when this task is started, you can click the browse button to locate it. You don't need to specify any arguments here, the arguments that will be passed are the ones you set up on the alert task dialog.

There are no other values you need enter, so you should click Ok on the new action dialog to close it and then click Ok on the create task dialog to close that and create the started task. You'll be able to see it added to the list of started tasks on your system.

You're now ready to start your alert set and the specified started task will be run when the first alert trigger occurs.

To test this out (and you could do the same) I used the calculator as my started task program (C:\Windows\System32\calc.exe). That way, when an alert was triggered the calculator started up - not very useful but it does prove the feature works!
 