2015-12-14

Windows PerfCounters and Powershell - Network and Contention perf data

In the previous blog I covered Disk/IO counters. This one briefly touches on Network, Threading and Contention.

Other counters:


Network I/O

COUNTER: Network Interface\Bytes Total/sec
TYPE: Instantaneous
USAGE:
#Get Instances
PS > (New-Object Diagnostics.PerformanceCounterCategory("Network Interface")).GetInstanceNames()

Intel[R] Centrino[R] Advanced-N 6205
Microsoft Virtual WiFi Miniport Adapter _2
Microsoft Virtual WiFi Miniport Adapter
Intel[R] 82579LM Gigabit Network Connection

PS > New-Object Diagnostics.PerformanceCounter("Network Interface",
"Bytes Total/sec", "Intel[R] 82579LM Gigabit Network Connection")

CategoryName     : Network Interface
CounterHelp      : Bytes Total/sec is the rate at which bytes are sent and received over each network adapter,
including framing characters. Network Interface\Bytes Total/sec is a sum of Network Interface\Bytes Received/sec
and Network Interface\Bytes Sent/sec.
CounterName      : Bytes Total/sec
CounterType      : RateOfCountsPerSecond64
InstanceLifetime : Global
InstanceName     : Intel[R] 82579LM Gigabit Network Connection
ReadOnly         : True
MachineName      : .
RawValue         : 0
Site             : 
Container        : 

PS > (New-Object Diagnostics.PerformanceCounter("Network Interface",
"Bytes Total/sec", "Intel[R] 82579LM Gigabit Network Connection")).NextValue()
0

MEANING: This counter indicates the rate at which bytes are sent and received over each network adapter. It tells you whether the adapter is saturated and whether you need to add another one. How quickly you can identify a problem depends on the type of network you have and on whether you share bandwidth with other applications.
THRESHOLD: Sustained values of more than 80 percent of network bandwidth.
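
The 80 percent threshold can be checked with a little arithmetic. The following is a minimal sketch, not a definitive implementation: the function name and parameters are my own, and it assumes the link capacity is taken from the Network Interface\Current Bandwidth counter, which reports bits per second (hence the multiplication by 8).

```powershell
# Sketch: convert a Bytes Total/sec reading into percent of link capacity.
# Get-NetworkUtilizationPercent is a hypothetical helper, not part of any module.
function Get-NetworkUtilizationPercent {
    param(
        [Parameter(Mandatory)] [double] $BytesPerSec,          # Network Interface\Bytes Total/sec
        [Parameter(Mandatory)] [double] $BandwidthBitsPerSec   # assumed: Network Interface\Current Bandwidth
    )
    # Adapter down or bandwidth unknown: nothing meaningful to report.
    if ($BandwidthBitsPerSec -le 0) { return 0 }
    [math]::Round(($BytesPerSec * 8) / $BandwidthBitsPerSec * 100, 2)
}

# Made-up readings: 10 MB/s on a 1 Gbit/s link.
Get-NetworkUtilizationPercent -BytesPerSec 10000000 -BandwidthBitsPerSec 1000000000
```

On a live system, feed it the second or later NextValue() of a counter object kept in a variable, since the first reading of a rate counter is always 0.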

COUNTER: Network Interface\Bytes Received/sec
TYPE: Instantaneous
USAGE: See above.
MEANING: This counter indicates the rate at which bytes are received over each network adapter. You can calculate the rate of incoming data as a part of total bandwidth. This tells you whether you need to optimize the incoming data from the client or add another network adapter to handle the incoming traffic.
THRESHOLD: No specific value.

COUNTER: Network Interface\Bytes Sent/sec
TYPE: Instantaneous
USAGE: See above.
MEANING: This counter indicates the rate at which bytes are sent over each network adapter. You can calculate the rate of outgoing data as a part of total bandwidth. This tells you whether you need to optimize the data being sent to the client or add another network adapter to handle the outbound traffic.
THRESHOLD: No specific value.


Threading and Contention

COUNTER: .NET CLR LocksAndThreads\Contention Rate / sec
TYPE: Instantaneous
USAGE:
#Get Instances
PS > (New-Object Diagnostics.PerformanceCounterCategory(".NET CLR LocksAndThreads")).GetInstanceNames()

_Global_
powershell_ise
PresentationFontCache
dataserv
pcee4

PS > (New-Object Diagnostics.PerformanceCounter(".NET CLR LocksAndThreads",
"Contention Rate / sec", "_Global_")).NextSample()

RawValue         : 310
BaseValue        : 0
SystemFrequency  : 2533369
CounterFrequency : 0
CounterTimeStamp : 0
TimeStamp        : 47774751876
TimeStamp100nSec : 130927572465737721
CounterType      : RateOfCountsPerSecond32

PS > (New-Object Diagnostics.PerformanceCounter(".NET CLR LocksAndThreads",
"Contention Rate / sec", "_Global_")).NextValue()
0

PS > (New-Object Diagnostics.PerformanceCounter(".NET CLR LocksAndThreads",
"Contention Rate / sec", "powershell_ise")).NextValue()
0
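
Both NextValue() calls return 0 because each one runs against a freshly created counter object, and a rate counter needs two samples before it can report anything. The sketch below shows the arithmetic behind a RateOfCountsPerSecond32 counter using the documented formula (N1 - N0) / ((D1 - D0) / F). The function name is hypothetical, the second sample is invented for illustration, and I am assuming that TimeStamp and SystemFrequency from NextSample() are the D and F terms.

```powershell
# Sketch: derive a per-second rate from two raw samples.
# Get-RatePerSecond is a hypothetical helper, not a .NET or PowerShell API.
function Get-RatePerSecond {
    param(
        [Parameter(Mandatory)] [long] $RawValue0,    # RawValue of the first sample
        [Parameter(Mandatory)] [long] $RawValue1,    # RawValue of the second sample
        [Parameter(Mandatory)] [long] $TimeStamp0,   # TimeStamp of the first sample (ticks)
        [Parameter(Mandatory)] [long] $TimeStamp1,   # TimeStamp of the second sample (ticks)
        [Parameter(Mandatory)] [long] $Frequency     # SystemFrequency (ticks per second)
    )
    if ($Frequency -le 0) { return 0 }
    $elapsedSeconds = ($TimeStamp1 - $TimeStamp0) / $Frequency
    # Identical or out-of-order samples carry no rate information.
    if ($elapsedSeconds -le 0) { return 0 }
    ($RawValue1 - $RawValue0) / $elapsedSeconds
}

# First sample from the NextSample() output above; the second sample is made up
# (20 more contentions, exactly two seconds of ticks later).
Get-RatePerSecond -RawValue0 310 -RawValue1 330 `
    -TimeStamp0 47774751876 -TimeStamp1 47779818614 -Frequency 2533369
```

In practice you would keep one counter object in a variable and call NextValue() on it repeatedly, or pass two NextSample() results to [Diagnostics.CounterSample]::Calculate().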

MEANING: This counter displays the rate at which threads in the runtime attempt to acquire a managed lock unsuccessfully. Sustained non-zero values may be a cause for concern. You may want to run dedicated tests for a particular piece of code to identify the contention rate for that code path.
THRESHOLD: No specific value.

COUNTER: .NET CLR LocksAndThreads\Current Queue Length
TYPE: Instantaneous
USAGE: See above.
MEANING: This counter displays the last recorded number of threads currently waiting to acquire a *managed* lock in an application. You may want to run dedicated tests for a particular piece of code to identify the average queue length for the particular code path. This helps you identify inefficient synchronization mechanisms.
THRESHOLD: No specific value.

COUNTER: Thread\% Processor Time
TYPE: Instantaneous
USAGE:
PS > Get-CimInstance win32_perfformatteddata_perfproc_thread | Select IDProcess, PercentProcessorTime |
Sort PercentProcessorTime -Descending | Group -Property PercentProcessorTime |
Select -ExpandProperty Group | Select -First 5

IDProcess                           PercentProcessorTime
---------                           --------------------
        0                                            100
     3820                                             96
        0                                             84
        0                                             84
        0                                             46
MEANING: This counter shows which thread is taking the most processor time. If you see idle CPU and low throughput, threads could be waiting or deadlocked. You can take a stack dump of the process and compare the thread IDs from the test data with the dump to identify threads that are waiting or blocked, or examine the Thread State and Thread Wait Reason counters.
THRESHOLD: No specific value.


PS > Get-CimInstance win32_perfformatteddata_perfproc_thread | Select -First 1 | FL *
Caption               : 
Description           : 
Name                  : Idle/0
Frequency_Object      : 
Frequency_PerfTime    : 
Frequency_Sys100NS    : 
Timestamp_Object      : 
Timestamp_PerfTime    : 
Timestamp_Sys100NS    : 
ContextSwitchesPersec : 0
ElapsedTime           : 13092759167
IDProcess             : 0
IDThread              : 0
PercentPrivilegedTime : 0
PercentProcessorTime  : 0
PercentUserTime       : 0
PriorityBase          : 0
PriorityCurrent       : 0
StartAddress          : 59492592
ThreadState           : 2
ThreadWaitReason      : 0
PSComputerName        : 
CimClass              : root/cimv2:Win32_PerfFormattedData_PerfProc_Thread
CimInstanceProperties : {Caption, Description, Name, Frequency_Object...}
CimSystemProperties   : Microsoft.Management.Infrastructure.CimSystemProperties


PS > (New-Object Diagnostics.PerformanceCounterCategory("Thread")).GetCounters("") | 
  Select CounterName | Sort CounterName

CounterName
-----------
% Privileged Time
% Processor Time
% User Time
Context Switches/sec
Elapsed Time
ID Process
ID Thread
Priority Base
Priority Current
Start Address
ThreadState
Thread Wait Reason

PS > (New-Object Diagnostics.PerformanceCounterCategory(".NET CLR LocksAndThreads")).GetCounters("") |
  Select CounterName | Sort CounterName

CounterName
-----------
# of current logical Threads
# of current physical Threads
# of current recognized threads
# of total recognized threads
Contention Rate / sec
Current Queue Length
Queue Length / sec
Queue Length Peak
rate of recognized threads / sec
Total # of Contentions


The next blog will be the last in the Windows PerfCounters series, where I will put all of this to work by writing a Top script for Windows.

In this series:
BLOG 1: PerfCounters infrastructure
BLOG 2: PerfCounters Raw vs. Formatted values
BLOG 3: PerfCounters, fetching the values
BLOG 4: PerfCounters, CPU perf data
BLOG 5: PerfCounters, Memory perf data
BLOG 6: PerfCounters, Disk/IO perf data
BLOG 7: PerfCounters, Network and Contention perf data

2015-12-07

Windows PerfCounters and Powershell - Disk & IO perf data

This post is the hardest for me to write as I generally pay little attention to disks. When they prove too slow, I replace them with faster ones, so I am now writing this on a laptop with two SSDs. That said, the disk subsystem can be a major system performance bottleneck, and thus there are numerous counters covering this area (Get-CimClass *disk* | Select CimClassName). I would also like to draw your attention to the old yet excellent article Top Six FAQs on Windows 2000 Disk Performance if you're interested in the subject.

Disk counters:

Note: Microsoft recommends that "when attempting to analyse disk performance bottlenecks, you should always use physical disk counters. However, if you use software RAID, you should use logical disk counters. As for Logical Disk and Physical Disk Counters, the same values are available in each of these counter objects. Logical disk data is tracked by the volume manager(s), and physical disk data is tracked by the partition manager."

The one I look into the most is Disk Queue Length which comes in two flavours; Average and Current.
COUNTER: Win32_PerfFormattedData_PerfDisk_PhysicalDisk\AvgDiskQueueLength (AvgDiskReadQueueLength)
TYPE: Sample, Instance
USAGE:
PS > Get-CimInstance Win32_PerfFormattedData_PerfDisk_PhysicalDisk | Where {$_.Name -eq '_Total'} |
 Select AvgDiskQueueLength, CurrentDiskQueueLength | FL

AvgDiskQueueLength     : 0
CurrentDiskQueueLength : 0
MEANING: Average number of both read and write requests that were queued and waiting for the selected disk during the sample interval, including requests in service. Since I used the "_Total" instance, I need to divide the value by the number of physical disks on the system. PerfMon shows this value per logical disk.
PS > Get-CimInstance Win32_PerfFormattedData_PerfDisk_LogicalDisk |
 Select Name, AvgDiskQueueLength, CurrentDiskQueueLength | FL

Name                   : HarddiskVolume1 #Boot image on Physical disk 1
AvgDiskQueueLength     : 0
CurrentDiskQueueLength : 0

Name                   : C: #Boot partition on Physical disk 1
AvgDiskQueueLength     : 0
CurrentDiskQueueLength : 0

Name                   : D: #Partition on Physical disk 1
AvgDiskQueueLength     : 0
CurrentDiskQueueLength : 0

Name                   : E: #Partition on Physical disk 2
AvgDiskQueueLength     : 0
CurrentDiskQueueLength : 0

Name                   : G: #Partition on Physical disk 2
AvgDiskQueueLength     : 0
CurrentDiskQueueLength : 0

Name                   : _Total
AvgDiskQueueLength     : 0
CurrentDiskQueueLength : 0 
GOTCHA: Since both "pending" and "in service" requests are counted, this counter might overstate the activity.
THRESHOLD: If more than 2 requests are continuously waiting on a single disk, the disk might be a bottleneck. To analyse queue length data further, use its components: AvgDiskReadQueueLength and AvgDiskWriteQueueLength.

COUNTER: Win32_PerfFormattedData_PerfDisk_PhysicalDisk\CurrentDiskQueueLength
TYPE: Instantaneous, Instance
USAGE:
PS > Get-CimInstance Win32_PerfFormattedData_PerfDisk_PhysicalDisk | Where {$_.Name -eq '_Total'} |
 Select AvgDiskQueueLength, CurrentDiskQueueLength | FL

AvgDiskQueueLength     : 0
CurrentDiskQueueLength : 0
MEANING: Number of requests outstanding on the disk at the time the performance data is collected. It includes requests being serviced at the time of data collection. The value represents an instantaneous length, not an average over a time interval. Multispindle disk devices can have multiple requests active at one time, but other concurrent requests await service. This property may reflect a transitory high or low queue length. If the disk drive has a sustained load, the value will be consistently high. Requests experience delays proportional to the length of the queue minus the number of spindles on the disks. This difference should average less than two for good performance.
GOTCHA:
THRESHOLD: 2 requests in queue for prolonged period of time for single disk (spindle).
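
The "queue length minus spindles should average less than two" rule is easy to apply to a series of samples. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, and the samples are taken to be CurrentDiskQueueLength readings collected for a single disk device.

```powershell
# Sketch: flag a disk whose queue length, net of spindles, averages two or more.
# Test-DiskQueueBottleneck is a hypothetical helper name.
function Test-DiskQueueBottleneck {
    param(
        [Parameter(Mandatory)] [double[]] $QueueLengthSamples,   # CurrentDiskQueueLength readings
        [ValidateRange(1, 1024)] [int] $SpindleCount = 1         # spindles behind the device
    )
    $average = ($QueueLengthSamples | Measure-Object -Average).Average
    # Requests beyond the number of spindles are the ones that actually wait.
    ($average - $SpindleCount) -ge 2
}

# Made-up samples: a quiet single disk, then a busy two-spindle device.
Test-DiskQueueBottleneck -QueueLengthSamples 1, 2, 0
Test-DiskQueueBottleneck -QueueLengthSamples 5, 6, 7 -SpindleCount 2
```

Averaging over several samples matters here because a single reading may reflect a transitory high or low queue length.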

Inner workings of measurement collection:

Values are mostly derived by the diskperf filter driver that provides disk performance statistics. Diskperf is a layer of software sitting in the disk driver stack. As I/O Request packets (IRPs) pass through this layer, diskperf keeps track of the time I/O's start and the time they finish. On the way to the device, diskperf records a timestamp for the IRP. On the way back from the device, the completion time is recorded. The difference is the duration of the I/O request. Averaged over the collection interval, this becomes the Avg. Disk sec/Transfer, a direct measure of disk response time from the point of view of the device driver. Diskperf also maintains byte counts and separate counters for reads and writes, at both the Logical and Physical disk level allowing Avg. Disk sec/Transfer to be broken out into reads and writes. This layer does add to latency but not significantly (up to 5%). Now that we know the mechanics, back to PhysicalDisk\Avg. Disk Queue Length and why we gather both queued and in-service requests in a bunch.
So, AvgDiskQueueLength counter is useful for gathering concurrency data, including data bursts and peak loads. These values represent the number of requests in flight below the driver taking the statistics. This means the requests are not necessarily queued but could actually be in service or completed and on the way back up the path. Possible in-flight locations include the following:
  • SCSIport or Storport queue
  • OEM driver queue
  • Disk controller queue
  • Hard disk queue
  • Actively receiving from a hard disk

Brief account of some other counters:

COUNTER: PhysicalDisk\Disk Writes/sec
MEANING: This counter indicates the rate of write operations on the disk.
THRESHOLD: Depends on manufacturer’s specifications.

COUNTER: PhysicalDisk\Split IO/sec
MEANING: Reports the rate at which the operating system divides I/O requests to the disk into multiple requests. A split I/O request might occur if the program requests data in a size that is too large to fit into a single request or if the disk is fragmented. Factors that influence the size of an I/O request can include application design, the file system, or drivers. A high rate of split I/O might not, in itself, represent a problem. However, on single-disk systems, a high rate for this counter tends to indicate disk fragmentation.
More info in MSDN.

Disk and Memory:

Because memory is cached to disk as physical memory becomes limited, make sure that you have a sufficient amount of memory available. When memory is scarce, more pages are written to disk, resulting in increased disk activity. Also, make sure to set the paging file to an appropriate size. Additional disk memory cache will help offset peaks in disk I/O requests. However, it should be noted that a large disk memory cache seldom solves the problem of not having enough spindles, and having enough spindles can negate the need for a large disk memory cache.

COUNTER: PhysicalDisk\Avg. Disk sec/Transfer
MEANING: This counter indicates the time, in seconds, of the average disk transfer. This may indicate a large amount of disk fragmentation, slow disks, or disk failures.
GOTCHA: Multiply the values of the Physical Disk\Avg. Disk sec/Transfer and Memory\Pages/sec counters. If the product of these counters exceeds 0.1, paging is taking more than 10% of disk access time, so you need more physical memory available.
THRESHOLD: Should not be more than 18 milliseconds.

COUNTER: Memory\Pages/sec
MEANING: This counter indicates the rate at which pages are read from or written to disk to resolve hard page faults. Multiply the values of the Physical Disk\Avg. Disk sec/Transfer and Memory\Pages/sec performance counters. If the product of these values exceeds 0.1, paging is utilizing more than 10 percent of disk access time, which indicates that insufficient physical memory is available.
GOTCHA: A high value for this performance counter could indicate excessive paging, which will increase disk I/O. If this occurs, consider adding physical memory to reduce disk I/O and increase performance.
THRESHOLD: A sustained value of more than 5 indicates a bottleneck.
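
The product rule above (Avg. Disk sec/Transfer multiplied by Pages/sec, compared against 0.1) can be sketched as follows. The function name and output property names are my own invention, and the inputs are assumed to be readings taken over the same interval.

```powershell
# Sketch: estimate the share of disk access time spent on paging.
# Get-PagingDiskShare is a hypothetical helper name.
function Get-PagingDiskShare {
    param(
        [Parameter(Mandatory)] [double] $AvgDiskSecPerTransfer,  # PhysicalDisk\Avg. Disk sec/Transfer
        [Parameter(Mandatory)] [double] $PagesPerSec             # Memory\Pages/sec
    )
    # seconds per transfer * pages per second = fraction of each second spent paging
    $share = $AvgDiskSecPerTransfer * $PagesPerSec
    [pscustomobject]@{
        Share       = [math]::Round($share, 4)
        MemoryShort = $share -gt 0.1     # more than 10% of disk access time
    }
}

# Made-up readings: 20 ms per transfer at 10 pages/sec.
Get-PagingDiskShare -AvgDiskSecPerTransfer 0.02 -PagesPerSec 10
```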


Next I will talk briefly of other counter categories such as Network and Processes.


2015-11-30

Windows PerfCounters and Powershell - Memory perf data

In the last blog I spoke of CPU counters. Now, I'll talk of Memory counters.

MEMORY Counters (CIM_PhysicalMemory class, Win32_PerfFormattedData_PerfOS_Memory class, Memory Performance Information ...):

Note: I introduced the notion of samples and how to fetch them using NextValue() so I will occasionally omit $var.NextValue() going forward.

Let me note here that if you thought previously described performance classes were complicated, you are now entering the realm of black magic ;-) There is a good series of blogs on subject of Memory by Mark Russinovich worth reading although quite old.

Memory is a key resource for any machine, so I will look at most of the values available on Windows. In Resource Monitor, on the Memory tab, you find a bar with Hardware reserved, In use, Modified, Standby and Free values. There are also Available, Cached, Total and Installed values. Let's start with the biggest number, Installed RAM.

In-depth description of Memory Counters important for my use-case:

COUNTER: cim_physicalmemory\Capacity
TYPE: Instantaneous
USAGE: (Get-Ciminstance -class "cim_physicalmemory" | Measure-Object Capacity -Sum).Sum / 1024 / 1024 #MB
MEANING: Total capacity of the physical memory, in bytes. Refers to "Installed".
GOTCHA: You will find tips to use TotalPhysicalMemory but, according to MSDN, it's been deprecated. Also, that page recommends using TotalVisualMemorySize property in the CIM_OperatingSystem class instead but this is wrong as there is no TotalVisualMemorySize property and, even if there was, we need installed memory size.
THRESHOLD:

Intermediate step; how much of the installed memory is available to OS:
COUNTER: win32_operatingsystem\TotalVisibleMemorySize
TYPE: Instantaneous
USAGE: [math]::Round((Get-CimInstance win32_operatingsystem).TotalVisibleMemorySize / 1024,2)
MEANING: Total amount of RAM available to OS. Refers to "Total".
GOTCHA:
THRESHOLD:

Subtracting TotalVisibleMemorySize from Capacity gives us HW reserved RAM, i.e. RAM taken by various HW such as video card. Check this post for details.
COUNTER: HW reserved
TYPE: Calculated
USAGE: cim_physicalmemory\Capacity (Installed, in bytes) - win32_operatingsystem\TotalVisibleMemorySize (Total, in KB), after converting both to the same unit
MEANING: Size of RAM not available to OS although installed on the system. Refers to "Hardware reserved".
GOTCHA: Depends on HW and BIOS settings, not something "fixable" in Windows.
THRESHOLD:

COUNTER: win32_operatingsystem\FreePhysicalMemory (KBytes), Memory\Available MBytes
TYPE: Instantaneous
USAGE:
(Get-WmiObject win32_operatingsystem).FreePhysicalMemory #KB
$Memory_AvailMB = New-Object Diagnostics.PerformanceCounter("Memory", "Available MBytes")
$Memory_AvailMB.RawValue #MB

MEANING: Total amount of RAM available to processes. Equal to the sum of memory assigned to the standby (cached), free and zero page lists. Refers to "Available".
GOTCHA:
THRESHOLD: A consistent value of less than 20% of installed RAM. In such situations, consult additional counters, such as Win32_PerfFormattedData_PerfOS_Memory\PagesPerSec to determine if System memory is adequate for the workload.

COUNTER: In use memory
TYPE: Calculated
USAGE: win32_operatingsystem\TotalVisibleMemorySize (Total, in KB) - Memory\Available MBytes (Available, in MB), after converting both to the same unit
MEANING: Amount of RAM in use by processes running on the box.
GOTCHA:
THRESHOLD:
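
The calculated values (Hardware reserved and In use) mix three units: Capacity is in bytes, TotalVisibleMemorySize in kilobytes and Available MBytes in megabytes. Here is a minimal sketch that normalizes everything to MB before subtracting. The function name and output property names are hypothetical, and the sample numbers are invented.

```powershell
# Sketch: derive "Hardware reserved" and "In use" from the three source values.
# Get-MemoryBreakdownMB is a hypothetical helper name.
function Get-MemoryBreakdownMB {
    param(
        [Parameter(Mandatory)] [double] $InstalledBytes,  # sum of cim_physicalmemory\Capacity
        [Parameter(Mandatory)] [double] $VisibleKB,       # win32_operatingsystem\TotalVisibleMemorySize
        [Parameter(Mandatory)] [double] $AvailableMB      # Memory\Available MBytes
    )
    $installedMB = $InstalledBytes / 1MB   # bytes -> MB
    $totalMB     = $VisibleKB / 1KB        # KB -> MB
    [pscustomobject]@{
        InstalledMB        = [math]::Round($installedMB, 2)
        TotalMB            = [math]::Round($totalMB, 2)
        HardwareReservedMB = [math]::Round($installedMB - $totalMB, 2)
        InUseMB            = [math]::Round($totalMB - $AvailableMB, 2)
    }
}

# Made-up box: 8 GB installed, 8089 MB visible to the OS, 3000 MB available.
Get-MemoryBreakdownMB -InstalledBytes 8GB -VisibleKB 8283136 -AvailableMB 3000
```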

COUNTER: Memory\Modified Page List Bytes (Win32_PerfFormattedData_PerfOS_Memory)
TYPE: Instantaneous
USAGE: $Memory_ModPLBy = New-Object System.Diagnostics.PerformanceCounter("Memory", "Modified Page List Bytes")
MEANING: The amount of RAM taken by the pages previously belonging to a working set but removed. However, the pages were modified while in use and their current contents haven’t yet been written to storage. The Page Table Entry still refers to the physical page(s) but is marked invalid and in transition. It must be written to the backing store before the physical page can be reused.
GOTCHA: No description in MSDN!?
THRESHOLD: Keep as low as possible.

COUNTER: Win32_PerfFormattedData_PerfOS_Memory\FreeAndZeroPageListBytes
TYPE: Instantaneous
USAGE: (get-wmiobject -computername localhost -Namespace root\CIMV2 -Query "Select * from Win32_PerfFormattedData_PerfOS_Memory").FreeAndZeroPageListBytes / 1024 / 1024 #MB
MEANING: The amount of physical memory, in bytes, that is assigned to the free and zero page lists thus immediately available for allocation to a process or for system use since it does not contain any data. Refers to "Free".
GOTCHA: There is a big difference between Free and Available memory. This is due to most of the pages considered available being in some sort of transition state (i.e. waiting to be written to disk) or have not yet met all of the OS requirements (i.e. page is not considered secure until it's zeroed out).
THRESHOLD: Keep as high as possible.

COUNTER: Standby
TYPE: Calculated
USAGE:
$Memory_SBCCBy = New-Object Diagnostics.PerformanceCounter("Memory", "Standby Cache Core Bytes")
$Memory_SBCNPBy = New-Object Diagnostics.PerformanceCounter("Memory", "Standby Cache Normal Priority Bytes")
$Memory_SBCRBy = New-Object Diagnostics.PerformanceCounter("Memory", "Standby Cache Reserve Bytes")
[math]::Round($Memory_SBCCBy.NextValue()/1024/1024 + $Memory_SBCNPBy.NextValue()/1024/1024+$Memory_SBCRBy.NextValue()/1024/1024,2)

MEANING: The amount of RAM in pages previously belonging to a working set but removed (or marshaled directly into the standby list). The pages weren’t modified since last written to disk. The Page Table Entry still refers to the physical pages but is marked invalid and in transition. Or, in simpler terms, this is memory that has been removed from a process's working set (its physical memory) en route to disk but is still available to be recalled.
GOTCHA: Please see the explanation of the factors in Win32_PerfFormattedData_PerfOS_Memory or Memory Object MSDN pages.
THRESHOLD:

COUNTER: Cached
TYPE: Calculated
USAGE:
MEANING: This number represents the sum of the system working set, standby list and modified page list. So, Memory\Cache Bytes, Memory\Modified Page List Bytes, Memory\Standby Cache Core Bytes, Memory\Standby Cache Normal Priority Bytes and Memory\Standby Cache Reserve Bytes. In this case, Memory\Cache Bytes + Memory\Modified Page List Bytes + Standby.
GOTCHA: Presented here for the sake of completeness.
THRESHOLD:
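
Since USAGE is empty for this one, here is a minimal sketch of the sum described in MEANING. The function name is hypothetical and the byte values are invented. On a live system the inputs would be the readings of Memory\Cache Bytes, Memory\Modified Page List Bytes and the three Standby Cache counters used above.

```powershell
# Sketch: add up the byte counters that make up "Cached" and report MB.
# Get-CachedMB is a hypothetical helper name.
function Get-CachedMB {
    param(
        # Cache Bytes, Modified Page List Bytes, Standby Cache Core Bytes,
        # Standby Cache Normal Priority Bytes, Standby Cache Reserve Bytes
        [Parameter(Mandatory)] [double[]] $ComponentBytes
    )
    $sum = ($ComponentBytes | Measure-Object -Sum).Sum
    [math]::Round($sum / 1MB, 2)
}

# Made-up readings in bytes (100 + 50 + 120 + 60 + 20 MB).
Get-CachedMB -ComponentBytes 104857600, 52428800, 125829120, 62914560, 20971520
```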

More counters of significance:

Win32_PerfFormattedData_PerfOS_Memory\CacheBytes - Number of bytes currently being used by the file system cache. The file system cache is an area of physical memory that stores recently used pages of data for applications. The operating system continually adjusts the size of the cache, making it as large as it can while still preserving the minimum required number of available bytes for processes. This property displays the last observed value only; it is not an average. See also SystemCacheResidentBytes and relatives.
Simpler explanation would be that the memory pages that the System uses are counted in two main counters, Cache Bytes and Pool Nonpaged Bytes. The Cache Bytes counter value is the amount of resident pages allocated in RAM that the Kernel threads can address without causing a Page Fault. This counter includes the Pool Paged Resident Bytes, the System Cache Resident Bytes, the System Code Resident Bytes and the System Driver Resident Bytes.

Note: If Memory\Pool Nonpaged Bytes value is 10% or more higher than its value at system startup, there is probably a leak.

Win32_PerfFormattedData_PerfOS_Memory\CacheFaultsPerSec - Number of faults which occur when a page is not found in the file system cache and must be retrieved from elsewhere in memory (a soft fault) or from disk (a hard fault). The file system cache is an area of physical memory that stores recently used pages of data for applications. Cache activity is a reliable indicator of most application I/O operations. This property counts the number of faults without regard for the number of pages faulted in each operation.

There is a whole set of Paging counters and they do require our attention since we can deduce Memory shortages on Windows by using them. Some of the key counters I will describe below. Dealing with Windows Paging you have to keep in mind that paging occurs for various operations within OS and excessive paging doesn’t automatically indicate a memory shortage. For instance, many network card drivers utilize the Pagefile (sometimes excessively) and this can be misread as a memory shortage.

Win32_PerfFormattedData_PerfOS_Memory\PagesPerSec (and relatives) - A sustained value of over 20 should be closely monitored and a System with a sustained value of over 50 is probably lacking in System Memory. Again, it is normal for this value to spike occasionally, especially if the other Memory counters do not show a lack of System Memory.

COUNTER: Pages Input per second / Page Reads per second
TYPE: Calculated
USAGE:
$Memory_PIps = New-Object Diagnostics.PerformanceCounter("Memory", "Pages Input/sec")
$Memory_PRps = New-Object Diagnostics.PerformanceCounter("Memory", "Page Reads/sec")
[math]::Round($Memory_PIps.NextValue() / $Memory_PRps.NextValue(), 2)

MEANING: The average of Memory\Pages Input/sec divided by the average of Memory\Page Reads/sec gives the number of pages per disk read. This value should not generally exceed five pages per read. A value greater than five indicates that the system is spending too much time paging and requires more memory (assuming that the application has been optimized).
GOTCHA:
THRESHOLD: Sustained value of 5 or more.
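
One practical wrinkle with this ratio: Page Reads/sec is frequently 0 (and the first NextValue() of any rate counter is always 0), so the division needs a guard. A minimal sketch follows, with a hypothetical function name and invented readings.

```powershell
# Sketch: pages per disk read, tolerating intervals with no page reads.
# Get-PagesPerRead is a hypothetical helper name.
function Get-PagesPerRead {
    param(
        [Parameter(Mandatory)] [double] $PagesInputPerSec,  # Memory\Pages Input/sec
        [Parameter(Mandatory)] [double] $PageReadsPerSec    # Memory\Page Reads/sec
    )
    # No reads in the interval means there is no ratio to compute.
    if ($PageReadsPerSec -le 0) { return 0 }
    [math]::Round($PagesInputPerSec / $PageReadsPerSec, 2)
}

# Made-up readings: 40 pages brought in by 10 read operations.
Get-PagesPerRead -PagesInputPerSec 40 -PageReadsPerSec 10
```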

Some other interesting counters I will not be covering in detail:

Memory\Page Reads/sec
Memory\Page Writes/sec
Paging File(_total)\% Usage
and so on.

In the next blog I will cover Disk counters.


2015-11-23

Windows PerfCounters and Powershell - CPU perf data

So far, I talked of WMI, CIM, WQL, System.Diagnostics.PerformanceCounterCategory, perf-counter data organization and flavour. Now it's time to look at some performance counters I deem important for my use-case more closely.
Note: List of available Counters for Get-Counter command
Get-Counter -ListSet * | Sort-Object CounterSetName | Format-Table CounterSetName

Basic concepts:

I will introduce basic concepts of Processor, Core and CPU now to help you follow the text. Let us use this convention:
  • "Processor" is a piece of hardware you connect to a slot on the motherboard.
  • "Physical Core" is a physical computing unit built into the "Processor".
  • "Virtual Core" is a virtual computing unit built on top of "Physical Core" (i.e. HT is ON).
  • "CPU" is a computing unit inside the "Processor", either physical or virtual.


Putting concepts to work

Now let's calculate the number of CPUs for my laptop:
PS > ((Get-CimInstance -Namespace root/CIMV2 -ClassName CIM_Processor).NumberOfLogicalProcessors | Measure-Object -Sum).Sum

4

Note: Many other counters fail for some HW configuration and/or OS! Be sure to check.
Note: HT is ON on my dual-core laptop and no cores are parked so to get number of Physical cores:
PS > ((Get-CimInstance -Namespace root/CIMV2 -ClassName CIM_Processor).NumberOfCores | Measure-Object -Sum).Sum

2

Note: There are many ways to collect this info:
PS > (Get-CimInstance Win32_ComputerSystem).NumberOfLogicalProcessors
PS > ((New-Object Diagnostics.PerformanceCounterCategory("Processor Information")).GetInstanceNames() | ?{$_ -match "^(\d{1}),(\d{1})"} | Measure-Object -Sum).Count
Note: RegEx expression is matching "Number,Number" Instances only (See previous blog about instances).

It is not obvious when working with 1 NUMA node/Slot, but the -Sum might refer to Sum of CPUs per Slot, depending on RegEx.

Before starting on Counters, let me stress that the measurements at the system, process and thread level in Windows are based on a sampling methodology thus the data gathered is subject to typical sampling errors like:
  • accumulating a "sufficient" number of sample observations to be able to make a reliable statistical inference, i.e. the sampling size
and
  • ensuring that there aren’t systemic sources of sampling error that causes results to be under or over-sampled as I will demonstrate shortly.

As of W2K8, the trends are changing towards event driven measurement for CPU utilization which, although more sane and accurate, poses its own set of challenges (say, a clock drift across multiprocessor cores when they are not resynchronized periodically and so on). To compensate for drift, new PerfMon/ResMon work by measuring CPU load in real time using event oriented measurement data gathered by the OS Scheduler each time a context switch occurs.
A context switch occurs in Windows whenever the processor switches its execution context to run a different thread (see more below). Context switches also occur as a result of high priority Interrupt Service Routines (ISRs) as well as the Deferred Procedure Calls (DPCs) that ISRs schedule to complete the interrupt processing. Starting in Windows NT 6.0 (Vista/Server 2008), the OS Scheduler began issuing RDTSC instructions to get the internal processor clock each time a context switch occurs. I will talk of context switching and DPC counters in a short while. For more details please see this excellent blog post.

System CPU counters:

The first counter I want to talk about is Processor Queue Length. A Linux user immediately notices that there is no "System load" counter on Windows. This is because Windows is thread based as opposed to Linux, which is process based. This simply means that, in Windows, a thread is the basic unit of execution (and thus the basis for collecting usage statistics too) while a process acts as a container for threads. As simple as it may seem, this actually poses a lot of challenges, since one has to start aggregating data about running processes from Thread counters and work one's way up. I will talk about this in detail in the final blog. So, the WMI counter that best mimics the Linux "System load" is, IMO, Processor Queue Length:
PS > Get-Counter '\System\Processor Queue Length'

Timestamp                 CounterSamples                                                      
---------                 --------------                                                      
23.10.15. 10:34:10        \\server_name\system\processor queue length : 1                                          
However, this is slooooow (although subsequent calls return much faster):
PS > Measure-Command { Get-Counter '\System\Processor Queue Length' }

TotalSeconds      : 4.2961321

PS > Measure-Command { Get-Counter '\System\Processor Queue Length' }

TotalSeconds      : 1.007445
So, as described in previous blog, I use System.Diagnostics class to fetch this value:
PS > Measure-Command { New-Object Diagnostics.PerformanceCounter("System", "Processor Queue Length")}

TotalSeconds      : 2.0006457

PS > Measure-Command { New-Object Diagnostics.PerformanceCounter("System", "Processor Queue Length")}

TotalSeconds      : 0.000643
Now, put this into a variable and simply call NextValue():
PS > $System_ProcQL = New-Object Diagnostics.PerformanceCounter("System", "Processor Queue Length")
PS > $System_ProcQL.NextValue()
0
PS > $System_ProcQL.NextValue()
10
The value obtained is for all of the CPUs, so divide it by the number of CPUs ($totCPU here, i.e. the logical processor count calculated earlier) to obtain the per-CPU value:
$SystemLoad = $System_ProcQL.NextValue() / $totCPU
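
A minimal sketch of that division, assuming $totCPU means the number of logical processors. Get-QueueLengthPerCpu is a hypothetical helper name, not something from the original script, and [Environment]::ProcessorCount is used only as a quick stand-in for the CIM processor query shown in the USAGE line further down:

```powershell
# Hypothetical helper: normalizes a Processor Queue Length reading
# by the number of logical CPUs.
function Get-QueueLengthPerCpu {
    param(
        [Parameter(Mandatory)] [double] $QueueLength,
        # Assumption: ProcessorCount matches the logical CPU count of the box
        [int] $CpuCount = [Environment]::ProcessorCount
    )
    if ($CpuCount -lt 1) { throw "CpuCount must be at least 1" }
    return $QueueLength / $CpuCount
}

# On Windows, feed it the live counter:
#   $System_ProcQL = New-Object Diagnostics.PerformanceCounter("System", "Processor Queue Length")
#   Get-QueueLengthPerCpu -QueueLength $System_ProcQL.NextValue()

# Fixed numbers for illustration: 10 ready threads spread over 4 CPUs
Get-QueueLengthPerCpu -QueueLength 10 -CpuCount 4
```

Passing the CPU count explicitly keeps the helper usable when the count was already fetched (and cached) through CIM.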


In-depth description of System Counters important for my use-case:

COUNTER: System\Processor Queue Length
TYPE: Instantaneous
USAGE: (New-Object Diagnostics.PerformanceCounter("System", "Processor Queue Length")).NextValue() / ((Get-CimInstance -Namespace root/CIMV2 -ClassName CIM_Processor).NumberOfLogicalProcessors | Measure-Object -Sum).Sum
MEANING: Number of threads per CPU that are ready for execution but, for whatever reason, cannot get CPU cycles and are thus waiting in the OS Scheduler queue. Since Windows has one Scheduler queue, I divide this value by the total number of computation units (i.e. CPUs). The actual mechanics are that, when the Counter value is requested, a measurement function traverses the Scheduler Ready Queue and counts the number of threads waiting for an available CPU.
GOTCHA: Even on an idle system there can be a significant number of threads running on a schedule that can bump this number very high. Say you have a 4 CPU box and processes fetching values for 100 counters, 10 samples every 1 second. All of these sample requests will lie sleeping for 1 second (so the Processor Queue Length value will be low) and then all wake up on the same timer event (clock interrupt), causing Processor Queue Length to spike although there is no real load on the system. It's even worse if your thread(s) run at high priority, as they will get executed sooner than the user threads, pushing the Processor Queue Length number very, very high. This leads to a disproportionate number of Ready Threads waiting for cycles, even (or especially) when the processor itself is not very busy overall. So tip 1 would be to check whether the CPUs are really busy or not.
THRESHOLD: Given the above, it is hard to tell what the threshold value is, but most people seem to agree on a "sustained value of 2 or more" combined with CPU utilization of 85%+. This combination tells us we can benefit from adding more CPUs.
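
The combined check can be sketched roughly as below. Test-SustainedCpuPressure is a hypothetical name, and two things are assumptions on my part: "sustained" is read as "true for every sample in the window", and the queue length samples are taken to be already divided by the CPU count:

```powershell
# Hypothetical helper: $true only if EVERY sample pair breaches both limits.
function Test-SustainedCpuPressure {
    param(
        [double[]] $QueuePerCpu,      # Processor Queue Length / CPU count, one per sample
        [double[]] $CpuPercent,       # % Processor Time (_Total), one per sample
        [double]   $QueueLimit = 2,   # "sustained value of 2 or more"
        [double]   $CpuLimit   = 85   # "CPU utilization of 85%+"
    )
    if ($QueuePerCpu.Count -eq 0 -or $QueuePerCpu.Count -ne $CpuPercent.Count) {
        throw "Need two non-empty sample arrays of equal length"
    }
    for ($i = 0; $i -lt $QueuePerCpu.Count; $i++) {
        # A single quiet sample means the condition is not sustained
        if ($QueuePerCpu[$i] -lt $QueueLimit -or $CpuPercent[$i] -lt $CpuLimit) {
            return $false
        }
    }
    return $true
}

# One spike in an otherwise calm window does not count as pressure
Test-SustainedCpuPressure -QueuePerCpu 0.2, 6.0, 0.1 -CpuPercent 12, 40, 9
```

Requiring both conditions in every sample filters out the timer-driven spikes described in the GOTCHA above.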

COUNTER: System\Context Switches/sec
TYPE: Instantaneous
USAGE:
$System_CSpS = New-Object Diagnostics.PerformanceCounter("System", "Context Switches/sec")
$System_CSpS.NextValue()

MEANING: Context switching happens when a higher priority thread pre-empts a lower priority thread that is currently running or when a high priority thread blocks. High levels of context switching can occur when many threads share the same priority level. This often indicates that there are too many threads competing for the processors on the system. If you do not see much processor utilization and you see very low levels of context switching, it could indicate that threads are blocked (link).
GOTCHA: The number obtained is system-wide! To report the total number of context switches generated per second by all threads, use the Thread(_Total)\Context Switches/sec counter (the path format is Category(Instance)\Counter):
New-Object Diagnostics.PerformanceCounter("Thread", "Context Switches/sec", "_Total")
THRESHOLD: Context switching rates in excess of 15,000 per second per CPU. The remedy would be to reduce the number of threads and queue more at the application level. This will cause less context switching, and less context switching is good for reducing CPU load.


In-depth description of CPU Counters important for my use-case:

Note: "Processor Information" category, besides overall _Total, has instances for Slot/NUMA node (0,_Total, n,_Total) while "Processor" category gives just _Total for all CPUs as defined above.
Gotcha: On single slot machines, "Processor" category will give info for all the CPUs while on machines with multiple slots, it will give info on just the Physical cores :-/
Thus, if InstanceName is _Total, both yield the same value.

COUNTER: Processor Information(_Total)\% Processor Time, Processor(_Total)\% Processor Time
TYPE: Sample, Instance
USAGE:
$InstanceName = "_Total"
$PI_PT = New-Object Diagnostics.PerformanceCounter("Processor Information", "% Processor Time")
$PI_PT.InstanceName = $InstanceName
$null = $PI_PT.NextValue()
--or--
Get-Counter -Counter "\Processor Information(_Total)\% Processor Time"
Get-Counter -Counter "\Processor(_Total)\% Processor Time"

MEANING: Primary indicator of CPU activity. High values may not necessarily be bad. However, if other processor-related counters such as Processor\% Privileged Time or System\Processor Queue Length are increasing linearly, high CPU utilization may be worth investigating.
GOTCHA: If this counter is around threshold value, starting new processes will only lead to increased value of Processor Queue Length but the work done will remain the same. Look for some more counters that I'm about to describe in relation to this one.
THRESHOLD: Folks seem to agree on ~85%. Low CPU utilization with sustained Processor Queue Length value of 2 or higher is indicator that requests for CPU time arrive randomly and threads demand irregular amounts of time from the CPU. This means that the processor power is not a bottleneck but that the application threading logic should be improved.

COUNTER: Processor Information(_Total)\% Privileged Time
TYPE: Sample, Instance
USAGE:
$InstanceName = "_Total"
$PI_PPT = New-Object Diagnostics.PerformanceCounter("Processor Information", "% Privileged Time")
$PI_PPT.InstanceName = $InstanceName
$null = $PI_PPT.NextValue()

MEANING: Counter indicates the percentage of non-idle CPU time spent in privileged mode, i.e. calls to OS functions (file or network I/O, memory allocation...). Basically, this is unrestricted mode allowing direct access to hardware and all memory.
GOTCHA:
THRESHOLD: Folks seem to agree on values consistently over 75%.

COUNTER: Processor Information(_Total)\% User Time
TYPE: Sample, Instance
USAGE:
$InstanceName = "_Total"
$PI_PUT = New-Object Diagnostics.PerformanceCounter("Processor Information", "% User Time")
$PI_PUT.InstanceName = $InstanceName
$null = $PI_PUT.NextValue()

MEANING: Percentage of non-idle CPU time spent in user mode. User mode is a restricted processing mode designed for applications, environment subsystems, and integral subsystems.
GOTCHA: Processor Information(_Total)\% Privileged Time +
Processor Information(_Total)\% User Time = Processor Information(_Total)\% Processor Time.
THRESHOLD: Depends on previous two counters.

COUNTER: Processor Information(_Total)\% Idle Time
TYPE: Sample, Instance
USAGE:
$InstanceName = "_Total"
$PI_PIT = New-Object Diagnostics.PerformanceCounter("Processor Information", "% Idle Time")
$PI_PIT.InstanceName = $InstanceName
$null = $PI_PIT.NextValue()

MEANING: Counter indicates the percentage of time OS idle thread was consuming cycles. On Windows, there is a special Kernel thread that consumes cycles when CPU is idling. Counting cycles consumed by this thread gives Idle CPU time.
GOTCHA: Processor Information(_Total)\% Processor Time + Processor Information(_Total)\% Idle Time = 100%
THRESHOLD:

COUNTER: Processor Information(_Total)\% Priority Time
TYPE: Sample, Instance
USAGE:
$InstanceName = "_Total"
$PI_PPRIOT = New-Object Diagnostics.PerformanceCounter("Processor Information", "% Priority Time")
$PI_PPRIOT.InstanceName = $InstanceName
$null = $PI_PPRIOT.NextValue()

MEANING: CPU utilization by high priority threads.
GOTCHA: The Kernel scheduler can, on occasion, wake up low priority threads that have been sleeping for a "long" time, assigning them many more slices on the CPU than one would expect given their (low) priority. This, in turn, blocks high-priority threads from execution, which is never expected behaviour. I would look at this value in relation to Context Switches/sec to determine what's going on.
THRESHOLD:

COUNTER: Processor Information\Interrupts/sec
TYPE: Sample, Instance
USAGE:
$InstanceName = "_Total"
$PI_INTPS = New-Object Diagnostics.PerformanceCounter("Processor Information", "Interrupts/sec")
$PI_INTPS.InstanceName = $InstanceName
$null = $PI_INTPS.NextValue()

MEANING: Number of hardware interrupts per second. This value is the indicator of the activity of devices that generate interrupts, such as network adapters.
GOTCHA: See next counter.
THRESHOLD:

COUNTER: Processor Information\% Interrupt Time
TYPE: Sample, Instance
USAGE:
$InstanceName = "_Total"
$PI_PINTT = New-Object Diagnostics.PerformanceCounter("Processor Information", "% Interrupt Time")
$PI_PINTT.InstanceName = $InstanceName
$null = $PI_PINTT.NextValue()

MEANING: The value indicates the percentage of time CPUs spend receiving and servicing hardware interrupts. This value is an indirect indicator of the activity of devices that generate interrupts, such as network adapters.
GOTCHA: Mass increase in Processor Information\Interrupts/sec and Processor Information\% Interrupt Time indicates potential hardware problems.
THRESHOLD:

COUNTER: Processor Information\DPCs Queued/sec
TYPE: Sample, Instance
USAGE:
$InstanceName = "_Total"
$PI_DPCQPS = New-Object Diagnostics.PerformanceCounter("Processor Information", "DPCs Queued/sec")
$PI_DPCQPS.InstanceName = $InstanceName
$null = $PI_DPCQPS.NextValue()

MEANING: Overall rate at which deferred procedure calls ("SW interrupts") are added to the processor's DPC queue. This property measures the rate at which DPCs are added to the queue, not the number of DPCs in the queue.
GOTCHA: This is NOT the number of SW interrupts in the queue!
THRESHOLD:

COUNTER: Processor Information\% DPC Time
TYPE: Sample, Instance
USAGE:
$InstanceName = "_Total"
$PI_PDPCT = New-Object Diagnostics.PerformanceCounter("Processor Information", "% DPC Time")
$PI_PDPCT.InstanceName = $InstanceName
$null = $PI_PDPCT.NextValue()

MEANING: Percentage of time that the processor spent receiving and servicing deferred procedure calls (SW interrupts) during the sample interval. They are counted separately and are not a component of the interrupt counters.
GOTCHA: This property is a component of PercentPrivilegedTime because DPCs are executed in privileged mode.
THRESHOLD:

Other useful counters I would look into in case of trouble are C1/C2/C3TransitionsPerSec. There is a huge penalty for waking a CPU up from the C3 low power state to the C2 low power state, and a considerable penalty for transitioning from C2 to C1. So if the box is choking while the CPUs are idling, look here. And make sure ParkingStatus for each CPU is 0 ;-)
Example: (Physical) CPU 9 in Slot 7 was asleep:
PS > GCim Win32_PerfFormattedData_Counters_ProcessorInformation
...
Name                        : 7,9
AverageIdleTime             : 100
C3TransitionsPersec         : 64
ClockInterruptsPersec       : 64
IdleBreakEventsPersec       : 64
InterruptsPersec            : 64
PercentC3Time               : 99
...
Basically, only processing timer events.

There are also combinations of counters that can point out problems like Processor\% DPC Time, % Interrupt Time and % Privileged Time. If Interrupt Time and DPC time are a large portion of Privileged Time, the kernel is spending significant amount of time processing (most likely) I/O requests. In some cases performance can be improved by configuring interrupts and DPC affinity to a small number of CPUs on a multiprocessor system, which improves cache locality. In other cases, it works best to distribute the interrupts and DPCs among many CPUs, so as to keep the interrupt and DPC activity from becoming a bottleneck.
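
The ratio described above can be sketched as a small helper. Get-InterruptShare is a hypothetical name, and what counts as a "large portion" is left open here because no number is given for it:

```powershell
# Hypothetical helper: share of Privileged Time spent on interrupts + DPCs, in percent.
function Get-InterruptShare {
    param(
        [double] $PercentInterruptTime,
        [double] $PercentDpcTime,
        [double] $PercentPrivilegedTime
    )
    # Nothing ran in privileged mode during the sample, so there is nothing to split
    if ($PercentPrivilegedTime -le 0) { return 0 }
    $share = ($PercentInterruptTime + $PercentDpcTime) / $PercentPrivilegedTime * 100
    return [math]::Round($share, 1)
}

# On Windows the inputs could come from the formatted CIM class:
#   $p = GCim Win32_PerfFormattedData_PerfOS_Processor -Filter 'Name="_Total"'
#   Get-InterruptShare $p.PercentInterruptTime $p.PercentDPCTime $p.PercentPrivilegedTime

# Fixed numbers for illustration: 6% interrupt + 9% DPC out of 30% privileged time
Get-InterruptShare -PercentInterruptTime 6 -PercentDpcTime 9 -PercentPrivilegedTime 30
```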

In the next blog I will cover Memory performance counters.

In this series:
BLOG 1: PerfCounters infrastructure
BLOG 2: PerfCounters Raw vs. Formatted values
BLOG 3: PerfCounters, fetching the values
BLOG 4: PerfCounters, CPU perf data
BLOG 5: PerfCounters, Memory perf data
BLOG 6: PerfCounters, Disk/IO perf data
BLOG 7: PerfCounters, Network and Contention perf data

2015-11-16

Windows PerfCounters and Powershell - Fetching the values

Summary from last blog:

  • Tip: An alias for Get-CimInstance is GCim, for Select-Object it's Select and for Get-WmiObject is GWmi.
  • There are Raw and Formatted counters. Watch out for formula converting Raw samples to Formatted values.


NAMESPACE organization

The general organization of namespaces is as follows:
  Category (Class if you prefer)
    Counter(s)
      Instance(s)
Every Category has Counters but not all of the Counters have Instances. The full path to the desired value is called a __PATH:
PS > GWmi Win32_PerfFormattedData_PerfOS_Processor | Select __Path

__PATH
------
      namespace                  Category/Class         InstanceName
\\localhost\root\cimv2:Win32_PerfFormattedData_PerfOS_Processor.Name="0"
...
\\.\root\cimv2:Win32_PerfFormattedData_PerfOS_Processor.Name="_Total"
Note: "." and "localhost" are synonyms.
Note: Knowing paths will allow us to move around faster as well as write WMI queries, as I will demonstrate later.


Putting .NET to work

Using the System.Diagnostics.PerformanceCounterCategory class (which I prefer for fetching single values that update often), let's take two Categories as an example.
Note: I saw a lot of questions regarding WMIRefresher, which exists in VB and has no apparent counterpart in Powershell. This is it, IMO, since the variable pointing to the counter object holds a (lightweight) connection to the path, reducing the overhead when fetching the next value. Another good example of refreshing WMI data is described in Tip 10/c. This "trick" fetches the entire Processor Information category in approximately 0.3s on my laptop, which is about as fast as it gets. Remember that the first execution usually takes a while (a couple of seconds).
PS > [System.Diagnostics.PerformanceCounterCategory]::GetCategories() |  Select  CategoryName | Sort CategoryName

CategoryName
------------
.NET CLR Data
.NET CLR Security
.NET Data Provider for Oracle
Cache
Memory
Network Interface
Objects
PhysicalDisk
Process
Thread
.... there are 91 in total.
PS > $Category = "Memory"
PS > (New-Object Diagnostics.PerformanceCounterCategory($Category)).GetCounters("") |
  Select  CounterName | Sort CounterName

CounterName
-----------
Available Bytes
Cache Bytes
Cache Faults/sec
Modified Page List Bytes
Page Faults/sec
Pages/sec
Pool Nonpaged Bytes
Standby Cache Reserve Bytes
System Driver Resident Bytes
Write Copies/sec
.... there are 35 in total.
PS > (New-Object Diagnostics.PerformanceCounterCategory($Category)).GetInstanceNames()

PS >
So, Memory category has 35 Counters and no Instances which means you can fetch values directly:
PS > $tmp = New-Object Diagnostics.PerformanceCounter($Category, "Available MBytes")
PS > $tmp.NextValue()

5009
Note: Available MBytes is an ever-updating value that does not depend on the number of samples. Thus calling NextValue() was unnecessary, but it is good practice.

Fetching data from Processor Information category is a bit different:
PS > $Category = "Processor Information"
PS > (New-Object Diagnostics.PerformanceCounterCategory($Category)).GetInstanceNames() | Sort

_Total
0,_Total
0,0
0,1
0,2
0,3
Processor Information has 6 instances, which means I am writing this on a dual-core laptop with HT enabled (2 physical cores exposing 4 logical CPUs, plus the 2 Totals). It is rather interesting to play with these counters on proper servers. For now, the important thing to notice is that "_Total" stands for the entire box, "N,_Total" represents the "Socket N" instance, while "N,M" stands for a "Socket,CPU" instance.
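
A minimal sketch of splitting such instance names into their parts, following the naming scheme just described. ConvertFrom-ProcessorInstanceName and the Scope labels (Box, Socket, Cpu) are hypothetical names picked for illustration:

```powershell
# Hypothetical helper: turns "_Total", "N,_Total" or "N,M" into an object.
function ConvertFrom-ProcessorInstanceName {
    param([Parameter(Mandatory)] [string] $Name)

    if ($Name -eq '_Total') {
        return [pscustomobject]@{ Name = $Name; Scope = 'Box'; Socket = $null; Cpu = $null }
    }
    $parts = $Name -split ','
    if ($parts.Count -ne 2) { throw "Unexpected instance name: $Name" }

    $socket = [int]$parts[0].Trim()
    if ($parts[1].Trim() -eq '_Total') {
        return [pscustomobject]@{ Name = $Name; Scope = 'Socket'; Socket = $socket; Cpu = $null }
    }
    return [pscustomobject]@{ Name = $Name; Scope = 'Cpu'; Socket = $socket; Cpu = [int]$parts[1].Trim() }
}

# The names are hardcoded here; on Windows they would come from GetInstanceNames()
'_Total', '0,_Total', '0,0', '0,3' | ForEach-Object { ConvertFrom-ProcessorInstanceName $_ }
```

Filtering on Scope then makes it easy to pick, say, only the per-socket totals on a multi-socket server.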

To fetch particular perf-counter value, I have to provide InstanceName:
PS > $Category = "Processor Information"
PS > $InstanceName = "_Total"
PS > $tmp = New-Object Diagnostics.PerformanceCounter($Category, "% Processor Time")
PS > $tmp.InstanceName = $InstanceName
PS > $tmp.NextValue()

21.97821
Tip: There is an overload allowing you to write New-Object Diagnostics.PerformanceCounter("Processor Information", "% Processor Time", "_Total") instead of setting the InstanceName member separately.

Let's check on New-Object members:
PS > $tmp

CategoryName     : Processor Information
CounterHelp      : % Processor Time is the percentage of elapsed
time that the processor spends to execute a non-Idle thread. It is
calculated by measuring the percentage of time that the processor spends
executing the idle thread and ...
CounterName      : % Processor Time
CounterType      : Timer100NsInverse
InstanceLifetime : Global
InstanceName     : _Total
ReadOnly         : True
MachineName      : 
RawValue         : 131585400020
Site             : 
Container        : 
Note: It is always a good practice to read CounterHelp and make note of CounterType. This will tell you a lot about values obtained.
PS > $tmp.RawValue

131585502870
PS > $tmp.NextSample()

RawValue         : 131585517490
BaseValue        : 0
SystemFrequency  : 2533388
CounterFrequency : 2533388
CounterTimeStamp : 117273907751
TimeStamp        : 44097511588
TimeStamp100nSec : 130899951977398973
CounterType      : Timer100NsInverse
Note: Please check on previous blog, paragraph about Raw/Formatted counters and sampling.
Note: There is a plenitude of counters and you are free to explore them in search of the one that suits your needs best. E.g., if you do not want to bother with Instances here, you can use the Win32_PerfFormattedData_PerfOS_Processor Class, which uses an absolute Index for each CPU (see below):
PS > GCim Win32_PerfFormattedData_PerfOS_Processor | Select Name

Name
----
0
1
...
_Total

So, how do we tell if Counter has Instances?

PS > $Category = "Processor Information"
PS > New-Object Diagnostics.PerformanceCounterCategory($Category)

CategoryName          CategoryHelp                  CategoryType   MachineName
------------          ------------                  ------------   -----------
Processor Information The Processor Information ... MultiInstance  .

PS > $Category = "Memory"
PS > New-Object Diagnostics.PerformanceCounterCategory($Category)

CategoryName          CategoryHelp                  CategoryType   MachineName
------------          ------------                  ------------   -----------
Memory                The Memory performance obj... SingleInstance .
The answer is obviously in CategoryType member of PerformanceCounterCategory class which you should check while iterating.
Note: The dot in MachineName stands for localhost.

You can check instance values directly with WMI too using their individual instance paths. Remember WMI classes: GWmi -List | Select Name | Where {$_.Name -match "Win32_PerfForm"}

Let's take Win32_PerfFormattedData_PerfOS_Processor for the example:
PS > GWmi Win32_PerfFormattedData_PerfOS_Processor | Select __Path

__PATH
------
\\localhost\root\cimv2:Win32_PerfFormattedData_PerfOS_Processor.Name="0"
...
\\.\root\cimv2:Win32_PerfFormattedData_PerfOS_Processor.Name="_Total"
Note: Remember that Path property begins with two underscores.

Now that you know the path to a WMI instance, you can access it directly by converting the WMI path to a WMI object:
PS > [WMI]'Win32_PerfFormattedData_PerfOS_Processor.Name="0"'

__GENUS               : 2
__CLASS               : Win32_PerfFormattedData_PerfOS_Processor
__SUPERCLASS          : Win32_PerfFormattedData
__DYNASTY             : CIM_StatisticalInformation
__RELPATH             : Win32_PerfFormattedData_PerfOS_Processor.Name="0"
__PROPERTY_COUNT      : 24
__DERIVATION          : {Win32_PerfFormattedData, Win32_Perf, CIM_StatisticalInformation}
__SERVER              : ...
__NAMESPACE           : root\cimv2
__PATH                : \\...\root\cimv2:Win32_PerfFormattedData_PerfOS_Processor.Name="0"
C1TransitionsPersec   : 263
DPCRate               : 0
DPCsQueuedPersec      : 27
InterruptsPersec      : 1054
Name                  : 0
PercentDPCTime        : 0
PercentIdleTime       : 99
PercentInterruptTime  : 0
PercentPrivilegedTime : 0
PercentProcessorTime  : 0
PercentUserTime       : 0
Note: __PROPERTY_COUNT : 24 means I trimmed some lines.
Pick one counter:
PS > ([WMI]'Win32_PerfFormattedData_PerfOS_Processor.Name="0"').PercentProcessorTime

68
PS > [WMI]'Win32_Service.Name="RemoteAccess"'

ExitCode  : 1077
Name      : RemoteAccess
ProcessId : 0
StartMode : Disabled
State     : Stopped
Status    : OK

PS > [WMI]'\\.\root\cimv2:Win32_LogicalDisk.DeviceID="C:"'

DeviceID     : C:
DriveType    : 3
ProviderName : 
FreeSpace    : 68152655872
Size         : 168037445632
VolumeName   : System
Note that '.' stands for 'localhost'. You can provide Server name here.
Note: You can also specify the full WMI path, including a machine name to access WMI objects on remote systems (provided you have sufficient access rights).

Tip: There is a hidden object property called "PSTypeNames" which will tell you the object type as well as the inheritance chain:
PS > (GWmi Win32_PhysicalMemory).PSTypeNames

System.Management.ManagementObject#root\cimv2\Win32_PhysicalMemory
System.Management.ManagementObject#root\cimv2\CIM_PhysicalMemory
System.Management.ManagementObject#root\cimv2\CIM_Chip
System.Management.ManagementObject#root\cimv2\CIM_PhysicalComponent
System.Management.ManagementObject#root\cimv2\CIM_PhysicalElement
System.Management.ManagementObject#root\cimv2\CIM_ManagedSystemElement
System.Management.ManagementObject#Win32_PhysicalMemory
System.Management.ManagementObject#CIM_PhysicalMemory
System.Management.ManagementObject#CIM_Chip
System.Management.ManagementObject#CIM_PhysicalComponent
System.Management.ManagementObject#CIM_PhysicalElement
System.Management.ManagementObject#CIM_ManagedSystemElement
System.Management.ManagementObject
System.Management.ManagementBaseObject
System.ComponentModel.Component
System.MarshalByRefObject
System.Object
Of course, the type listed at the top is telling you the most:
PS > (GWmi Win32_PhysicalMemory).PSTypeNames[0]

System.Management.ManagementObject#root\cimv2\Win32_PhysicalMemory
PSTypeNames will work for all objects and might come handy navigating namespaces.

Note: You can do things with WMI objects, not just read counters. Check, for example, Win32_LogicalDisk device Chkdsk method:
PS > ([WMI]'\\.\root\cimv2:Win32_LogicalDisk.DeviceID="C:"').Chkdsk
OverloadDefinitions
-------------------
System.Management.ManagementBaseObject Chkdsk(System.Boolean FixErrors,
System.Boolean VigorousIndexCheck, System.Boolean SkipFolderCycle,
System.Boolean ForceDismount, System.Boolean RecoverBadSectors,
System.Boolean OkToRunAtBootUp)
This functionality as well as accessing remote machines is beyond scope of the document and mentioned here just for the sake of completeness.

It is also worth noting you can call WMI methods with CIM cmdlets. Please see this POWERTIP for details if you're interested.

Useful WMI links:

Powertip 1, auto-discovering online help for wmi
Powertip 2, getting help on wmi methods


WQL

I mentioned earlier you can write your own WQL queries to fetch data from WMI objects. WQL is the WMI Query Language, a subset of the ANSI SQL with minor semantic changes.
GWmi Win32_Process -Filter "Name like ""power%.exe""" translates to WQL query 'select * from Win32_Process where Name like "power%.exe"'. So, to get process owner for example:
$processes = GWmi -Query 'select * from Win32_Process where Name like "power%.exe"'
$extraproc =
  ForEach ($process in $processes)
  {
    # Attach the owning user to each process object and pass the object on
    Add-Member -InputObject $process -MemberType NoteProperty -Name Owner -Value (($process.GetOwner()).User) -PassThru
  }
$extraproc | Select-Object -Property Name, Owner
Note: Make sure the Add-Member ... -PassThru line is not broken (or is continued with backticks) if you want this code to work.
Personally, I find WQL inadequate since it's missing aggregation functions thus I use it very rarely.
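
One way around the missing aggregation is to fetch plain rows and aggregate on the Powershell side with Group-Object and Measure-Object. This is only a sketch: the rows are made-up stand-ins, and the choice of WorkingSetSize as the summed column is just an example:

```powershell
# Stand-in rows so the sketch runs anywhere. On Windows they could come from:
#   $rows = GCim -Query 'select Name, WorkingSetSize from Win32_Process'
$rows = @(
    [pscustomobject]@{ Name = 'powershell.exe'; WorkingSetSize = 100MB }
    [pscustomobject]@{ Name = 'powershell.exe'; WorkingSetSize = 50MB }
    [pscustomobject]@{ Name = 'svchost.exe';    WorkingSetSize = 20MB }
)

# Equivalent of GROUP BY Name with COUNT(*) and SUM(WorkingSetSize)
$summary = $rows | Group-Object -Property Name | ForEach-Object {
    $stats = $_.Group | Measure-Object -Property WorkingSetSize -Sum
    [pscustomobject]@{
        Name         = $_.Name
        Count        = $_.Count
        WorkingSetMB = [math]::Round($stats.Sum / 1MB, 1)
    }
}
$summary | Sort-Object -Property WorkingSetMB -Descending
```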


Summing it up

get classes:

GWmi -List
or
Get-CimClass | Select CIMClassName
or
[System.Diagnostics.PerformanceCounterCategory]::GetCategories()

shorten the list:

GWmi -List Win32_*memory* | Select Name
or
Get-CimClass | Select CIMClassName | Where {$_.CimClassName -match "memory"}
or
[System.Diagnostics.PerformanceCounterCategory]::GetCategories() | Where {$_.CategoryName -match "memory"} | Select CategoryName

list counters:

GWmi Win32_PhysicalMemory
or
GCim CIM_PhysicalMemory
or
(New-Object Diagnostics.PerformanceCounterCategory("Memory")).GetCounters("") | Select CounterName | Sort CounterName

and for WQL, write a query:

GWmi -Query 'select Manufacturer from Win32_PhysicalMemory where BankLabel = "BANK 0"'
or
GCim -Query 'Select Manufacturer from CIM_PhysicalMemory Where BankLabel = "BANK 0"'

Note: Opening communication and fetching objects from the WMI server might take a considerable amount of time. Counting CPUs takes at least a few seconds on my boxes:
PS > Measure-Command {((GCim -Namespace root/CIMV2 -ClassName CIM_Processor).NumberOfLogicalProcessors | Measure-Object -Sum).Sum}
...
TotalSeconds : 2.2558991
...

Tip: The fastest way to learn how many CPUs there are on the box is (GCim Win32_ComputerSystem).NumberOfLogicalProcessors
Note: In my experience, best time to fetch some value (if not cached) is about 0.3 seconds so that's what I'm aiming for always.


Conclusion:

To speed up fetching counter data, where available, I like to use System.Diagnostics .NET class:
$System_CSpS = New-Object Diagnostics.PerformanceCounter("System", "Context Switches/sec")
$System_CSpS.NextValue()

Since some of the performance counters are empty upon first access, calling NextValue() is a good habit. Subsequent calls to $var.NextValue() are lightning fast, so keep the counter in a variable. This technique is also good for values that are always there, such as the amount of (free) RAM, Context Switches per second and so on. Although possible, I do not use this mechanism for values that might disappear, such as the number of threads belonging to some process, as the process might die.
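
The "prime first, then sample" habit can be wrapped in a small loop. Read-CounterValues is a hypothetical helper, not part of the original script, and it deliberately accepts any object exposing a NextValue() method so it can be tried without a real counter:

```powershell
# Hypothetical helper: discards the first reading, then returns $Count samples.
function Read-CounterValues {
    param(
        [Parameter(Mandatory)] $Counter,   # anything exposing NextValue()
        [int] $Count = 3,
        [int] $IntervalMs = 1000
    )
    # Prime the counter: the first read of a rate counter may be empty (0)
    $null = $Counter.NextValue()
    for ($i = 0; $i -lt $Count; $i++) {
        Start-Sleep -Milliseconds $IntervalMs
        $Counter.NextValue()               # emitted to the pipeline
    }
}

# On Windows:
#   $System_CSpS = New-Object Diagnostics.PerformanceCounter("System", "Context Switches/sec")
#   Read-CounterValues -Counter $System_CSpS -Count 5
```

The sleep between reads matters for rate counters, since each value is computed from the difference between two consecutive samples.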
If you are not able to use System.Diagnostics or you prefer CIM approach you can always fall back to Tip 10/c:
PS > # Get instance of a class
PS > $p = Get-CimInstance -ClassName Win32_PerfFormattedData_PerfOS_Processor
PS > # Perform get again by passing the instance received earlier, and get the updated properties. The value of $p remains unchanged.
PS > $p | Get-CimInstance | select PercentProcessorTime

Beware that for counters with instances, you need to supply InstanceName:
$InstanceName = "_Total"
$PI_PT = New-Object Diagnostics.PerformanceCounter("Processor Information", "% Processor Time")
$PI_PT.InstanceName = $InstanceName
#or $PI_PT = New-Object Diagnostics.PerformanceCounter("Processor Information", "% Processor Time", "_Total")
$PI_PT.NextValue()

If you are interested in measuring just one value, Get-Counter is your friend but I prefer TAB completion of CIM classes approach over iterating through paths (say, (Get-Counter -listset memory).paths).

You can almost always accomplish the same thing using WMI and CIM cmdlets. Prefer CIM over WMI.

WQL is cumbersome and lacking many commands. Avoid.

Remember the hierarchy:
  Category (Class if you prefer)
    Counter(s)
      Instance(s)
Every Category has Counters but not all of the Counters have Instances. Check before using.


Next blog will deal with several specific counters and the meaning of the values obtained in terms of performance.


2015-11-09

Windows PerfCounters and Powershell - Raw vs. Formatted values


How to interpret Raw data from Windows performance counters.


Tip: An alias for Get-CimInstance is GCim and alias for Get-WmiObject is GWmi.

In the first blog post, I covered what WMI/CIM is and how to get info from there. Last I talked about was RawData counters:
Get-CimInstance -Class Win32_PerfRawData_PerfOS_Processor

Name : _Total
...
PercentIdleTime : 78061457390



Understanding RawData:

By itself, a RawData value is just a sample; the important thing is to determine what the concrete sample value actually is and how to convert it to a form we understand. In this example, MSDN tells us PercentIdleTime is a counter of type 542180608:
  PercentIdleTime
        Data type: uint64
        Access type: Read-only
        Qualifiers: DisplayName ("% Idle Time") , CounterType (542180608) , DefaultScale (0) , PerfDetail (400)
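
For quick lookups inside a script, the numeric CounterType can be mapped to a name with a small table. This is only a sketch covering a handful of common types (PERF_* constant names with the matching .NET PerformanceCounterType names in parentheses); the table and the helper name Get-CounterTypeName are hypothetical, not something the WMI classes provide:

```powershell
# Partial lookup table: CounterType qualifier value -> PERF_* name (.NET enum name)
$CounterTypeNames = @{
    65536     = 'PERF_COUNTER_RAWCOUNT (NumberOfItems32)'
    65792     = 'PERF_COUNTER_LARGE_RAWCOUNT (NumberOfItems64)'
    272696320 = 'PERF_COUNTER_COUNTER (RateOfCountsPerSecond32)'
    272696576 = 'PERF_COUNTER_BULK_COUNT (RateOfCountsPerSecond64)'
    542180608 = 'PERF_100NSEC_TIMER (Timer100Ns)'
    558957824 = 'PERF_100NSEC_TIMER_INV (Timer100NsInverse)'
}

# Hypothetical helper: returns the name, or the hex value for types not in the table
function Get-CounterTypeName {
    param([Parameter(Mandatory)] [int] $CounterType)
    if ($CounterTypeNames.ContainsKey($CounterType)) {
        return $CounterTypeNames[$CounterType]
    }
    return 'unknown (0x{0:X8})' -f $CounterType
}

# The PercentIdleTime qualifier from above
Get-CounterTypeName 542180608
```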

Bear in mind that most RawData counters need 2 samples to produce a human-readable result, so now we need a formula to convert Raw counter values into something meaningful.
Numeric-to-Name conversion (542180608 -> PERF_100NSEC_TIMER) of counter type values is listed in this MSDN page. The actual formula is then located under entries listed here as described in this page:

(N1 - N0) / (D1 - D0) x 100, where the denominator (D) represents the total elapsed time of the sample interval, and the numerator (N) represents the portions of the sample interval during which the monitored components were active.

This translates to:
      $Val =
      (
      (PercentIdleTime_Sample[n] - PercentIdleTime_Sample[n-1])
      /
      (Timestamp_Sys100NSSample[n] - Timestamp_Sys100NSSample[n-1])
      ) *100
Note: Although the formula is correct according to the documentation, it usually sums the result over all CPUs (say, when fetching CPU utilization per process/thread), so the result will most likely be well over 100% on modern boxes. In such a case, we need to divide by the total number of CPUs:
      $Val =
      (
      (PercentIdleTime_Sample[n] - PercentIdleTime_Sample[n-1])
      /(
      (Timestamp_Sys100NSSample[n] - Timestamp_Sys100NSSample[n-1])
      *$TotProc)
      ) *100
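
Both variants of the formula can be sketched as one runnable function. Get-Percent100Ns is a hypothetical name; the inputs are two raw samples plus their Timestamp_Sys100NS values, and decimal arithmetic is my choice here so that the very large timestamps are subtracted without rounding:

```powershell
# Hypothetical helper implementing (N1 - N0) / ((D1 - D0) * CpuCount) * 100
function Get-Percent100Ns {
    param(
        [Parameter(Mandatory)] [decimal] $Raw0,    # raw counter, older sample
        [Parameter(Mandatory)] [decimal] $Raw1,    # raw counter, newer sample
        [Parameter(Mandatory)] [decimal] $Time0,   # Timestamp_Sys100NS, older sample
        [Parameter(Mandatory)] [decimal] $Time1,   # Timestamp_Sys100NS, newer sample
        [int] $CpuCount = 1                        # leave at 1 for _Total instances
    )
    if ($Time1 -le $Time0) { throw "Second sample must be newer than the first" }
    if ($CpuCount -lt 1)   { throw "CpuCount must be at least 1" }
    $value = ($Raw1 - $Raw0) / (($Time1 - $Time0) * $CpuCount) * 100
    return [double]$value
}

# On Windows, two samples one second apart:
#   $s0 = GCim Win32_PerfRawData_PerfOS_Processor -Filter 'Name="_Total"'
#   Start-Sleep -Seconds 1
#   $s1 = GCim Win32_PerfRawData_PerfOS_Processor -Filter 'Name="_Total"'
#   Get-Percent100Ns $s0.PercentIdleTime $s1.PercentIdleTime $s0.Timestamp_Sys100NS $s1.Timestamp_Sys100NS

# Fixed numbers for illustration: 7,500,000 ticks counted over a 10,000,000 tick (1 s) interval
Get-Percent100Ns -Raw0 1000 -Raw1 7501000 -Time0 130899951977398973 -Time1 130899951987398973
```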


With the Class Win32_PerfRawData_PerfOS_Processor it's easy. It has Instance named _Total and when you apply formula to its values you will get proper result. I will talk of this more in next blogs.

Conclusion:

So, why Formatted and Raw counters? After all, Formatted data comes from Raw counters. First, we have to remember that Raw counters are used for collecting N samples, as the naked number obtained is meaningless. So, let's say that in the above example I asked for CPU usage by a certain process that I only just started. The Formatted counter will have either 0 or NaN in it, while the Raw counter will produce some number, given that sampling time is usually 100ns. Well, you might say, it's the same as "get formatted sample | check if it is a number | no? -> take another formatted sample" and you'd be right. But you should take into account the rounding that happens when Formatted values are calculated internally as well as in your script (note that the data type of Formatted counters is usually UINT!), and also the latency involved in the WMI provider populating formatted data counters.

If you are set on using Raw counters, bear in mind you need formulas for transforming Samples into Values.

All said, it is actually a question of choice whether to use one type or the other as you will see in the script I'll be describing in the final blog.

Examples of calculating Value from Counter type and samples @Sysinternals and @MSDN.

Next blog will deal with various ways to obtain performance data.


2015-11-03

Windows PerfCounters and Powershell - Infrastructure

In this series of blogs, I will cover Windows performance counters infrastructure, collecting and interpreting the data and, in final blog, provide the Powershell script built on this. I should note that I am very new to Powershell so take my empirical findings with grain of salt. Also, coming from Linux bash, I found Powershell confusing at first but, after getting comfortable with concept of passing Objects down the pipe, I have to say I like it a lot.

It all starts in the WDM framework (Windows Driver Model), where metrics are collected for WMI-for-WDM enabled device drivers. The classes created by the WDM provider to represent device driver data reside in the "Root\WMI" namespace. I will talk of namespaces shortly.
So, the WDM provider records information about WDM operations in the WMI Log Files. However, WMI is not just mirroring physical data provided by WDM but it also adds (logical) counters.
Windows Management Instrumentation (WMI) is the Microsoft implementation of Web-Based Enterprise Management (WBEM), which is an industry initiative to develop a standard technology for accessing management information in an enterprise environment. WMI uses the Common Information Model (CIM) industry standard to represent systems, applications, networks, devices, and other managed components. CIM is developed and maintained by the Distributed Management Task Force (DMTF). I will talk of CIM cmdlets in Powershell shortly.

There are many ways to collect performance data in Windows, such as through the Registry functions, but not all of them make sense. Since WMI is a database-like service that always returns live data, it is used to find out details about physical and logical computer configurations, and it works locally as well as remotely (only for Windows OS). It is organized in PerfCounter classes which return PerfCounter objects. So the usual way to start exploring is to list the classes with Get-WmiObject -List. The first lines of the output look similar to this:
NameSpace: ROOT\CIMV2

Name Methods Properties
---- ------- ----------
...


So what is NAMESPACE? The WMI infrastructure is a part of the Microsoft Windows operating system and has two components: the WMI service (winmgmt), including the WMI Core, and the WMI repository.
The WMI repository is organized in WMI namespaces. The WMI service creates some namespaces such as root\default, root\cimv2, and root\subscription at system startup and preinstalls a default set of class definitions, including the Win32 Classes, the WMI System Classes, and others. The remaining namespaces found on your system are created by providers for other parts of the operating system or products.
The WMI service acts as an intermediary between the providers, management applications, and the WMI repository. Only static data about objects is stored in the repository, such as the classes defined by the providers, meaning WMI obtains most data dynamically from the provider when a client requests it.
A WMI consumer is a management application or script that interacts with the WMI infrastructure. A management application can query, enumerate data, run provider methods, or subscribe to events by calling either the COM API for WMI or the Scripting API for WMI (like WMI-enabled Powershell cmdlets). The only data or actions available for a managed object, such as a disk drive or a service, are those that a provider supplies.
So, basically, NAMESPACE is a subfolder :-) The default namespace is “root\CIMV2” and you do not need to pass the namespace to a function call as long as the class you're addressing is located inside the default one. Most of us will use the default namespace (ROOT\CIMV2) for most tasks, but let's list the others here for the sake of completeness:
PS > Get-WmiObject -Query "Select * from __Namespace" -Namespace Root | Select-Object -ExpandProperty Name
subscription
DEFAULT
CIMV2
Cli
nap
SECURITY
SecurityCenter2
RSOP
WMI
IntelNCS2
directory
Policy
Interop
ServiceModel
SecurityCenter
MSAPPS12
Microsoft
aspnet
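
The query above returns only the direct children of Root, while namespaces can nest further. Below is a minimal sketch of walking the whole tree. Get-NamespaceTree and its GetChildren parameter are names made up for this illustration (not built-in cmdlets); the default lookup assumes the CIM cmdlets (Powershell v3 or later) on a Windows box, and namespaces that cannot be read are skipped silently.

```powershell
# Walk WMI namespaces depth-first and emit each full path.
# Get-NamespaceTree is an illustrative helper, not a built-in cmdlet.
function Get-NamespaceTree {
    param(
        [string]$Root = 'root',
        # Returns the child namespace names of the namespace it is given.
        # The default asks WMI; errors (e.g. access denied) are skipped quietly.
        [scriptblock]$GetChildren = {
            param($Namespace)
            Get-CimInstance -Namespace $Namespace -ClassName __Namespace -ErrorAction SilentlyContinue |
                Select-Object -ExpandProperty Name
        }
    )

    # Emit the current namespace first.
    $Root

    foreach ($child in (& $GetChildren $Root)) {
        # Child names are relative, so prefix them with the parent path.
        Get-NamespaceTree -Root "$Root\$child" -GetChildren $GetChildren
    }
}

# Usage on a Windows box:
#   Get-NamespaceTree | Sort-Object
```

The lookup is passed in as a script block only so that the walk itself can be tried against a fake tree on a machine without WMI; in normal use the default is enough.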


Now, back to the output of Get-WmiObject -List | Select Name. We also see "duplicate" entries like this:
CIM_SoftwareElementActions
Win32_SoftwareElementAction

So, what's CIM? The Common Information Model is an open standard that defines how managed elements in an IT environment are represented as a common set of objects and relationships between them. The Distributed Management Task Force maintains the CIM to allow consistent management of these managed elements, independent of their manufacturer or provider. So the CIM cmdlets will allow you to gather management data in a heterogeneous environment, as opposed to the WMI cmdlets, which work only on Windows. See the wiki for the list of operating systems and provider-specific CIM implementations. This basically means that some of the WMI classes and their objects are copied to the CIMv2 namespace to produce more standard objects.
So, the CIM provider for WMI consumes data from WMI, and there are many benefits to using CIM in place of WMI:
  • Use of WSMAN for remote access – no more DCOM errors.
  • Note: You can drop back to DCOM for accessing systems with WSMAN 2 installed.
  • Use of CIM sessions for accessing multiple machines.
  • Get-CIMClass for investigating WMI classes.
  • Improved way of dealing with WMI associations.
  • Objects returned by Get-CimInstance do not contain any methods.
  • Note: This may appear to be a drawback, but GWMI (Get-WmiObject) will also lose this info when objects are returned from a background job or remote session, due to serialization.
  • Works on other operating systems.

Note: Should you still need to work with boxes running older Windows (such as XP), you can still use the CIM cmdlets; just specify DCOM as the protocol:
$oldProt = New-CimSessionOption -Protocol Dcom
$oldBoxSession = New-CimSession -ComputerName someOldXPBox -SessionOption $oldProt
Get-CimInstance -ClassName Win32_SystemDevices -CimSession $oldBoxSession

However, DCOM is not firewall friendly and may be unavailable on the destination box (which has to run Windows OS).
Note: CIM cmdlets are available as of Powershell v3.

So, we learned how to list classes:
WMI: Get-WmiObject -List
CIM: Get-CimClass | Select CIMClassName | Sort CIMClassName

Note: From now on, I will mostly use the CIM commands, as they provide TAB completion.

Next, we need the actual performance objects that the chosen class provides:
PS > Get-CimInstance -Class Win32_Process
ProcessId Name             HandleCount WorkingSetSize VirtualSize
--------- ----             ----------- -------------- -----------
0         System Idle P... 0           24576          0
4         System           976         2834432        6344704

This is a simple class, but you will encounter more complex ones that have Instances:
PS > Get-CimInstance -Class Win32_PerfFormattedData_Counters_ProcessorInformation

#All CPU's:
Name : _Total
...
PercentIdleTime : 73

#Core 0, total:
Name : 0,_Total
...
PercentIdleTime : 73

#Core 0, CPU 0:
Name : 0,0
...
PercentIdleTime : 100

and so on.
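
Since the aggregate rows and the per-CPU rows come back mixed together, it can help to split the instance name before filtering. The sketch below assumes the three name shapes shown above ("_Total", "N,_Total" and "N,M"). ConvertFrom-CpuInstanceName is a hypothetical helper (not a built-in cmdlet), and Group and Member are just neutral labels I picked for the two comma-separated parts.

```powershell
# Split a ProcessorInformation instance name into its parts.
# ConvertFrom-CpuInstanceName is a hypothetical helper, not a built-in cmdlet.
function ConvertFrom-CpuInstanceName {
    param([Parameter(Mandatory)][string]$Name)

    $parts = $Name -split ','

    if ($parts.Count -eq 1) {
        # No comma: a bare name such as "_Total" covering everything.
        return [pscustomobject]@{
            Name    = $Name
            Group   = $null
            Member  = $null
            IsTotal = ($Name -eq '_Total')
        }
    }

    [pscustomobject]@{
        Name    = $Name
        Group   = [int]$parts[0]
        # "N,_Total" is the aggregate for group N, so there is no member number.
        Member  = if ($parts[1] -eq '_Total') { $null } else { [int]$parts[1] }
        IsTotal = ($parts[1] -eq '_Total')
    }
}

# Usage on a Windows box, keeping only the per-CPU rows:
#   Get-CimInstance -ClassName Win32_PerfFormattedData_Counters_ProcessorInformation |
#       Where-Object { -not (ConvertFrom-CpuInstanceName $_.Name).IsTotal }
```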

Notice also that I used a PerfFormattedData class in this example. There are RawData classes too, but that's for the next blog.
PS > Get-CimInstance -Class Win32_PerfRawData_Counters_ProcessorInformation
Name : _Total
...
PercentIdleTime : 66682471448
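
To see which formatted classes have a raw counterpart, the class names can be paired up by their suffix. This is only a sketch: it assumes the naming pattern seen in the two examples above (Win32_PerfFormattedData_X next to Win32_PerfRawData_X) holds for the classes on your box, and Get-PerfClassPair is a hypothetical helper, not a built-in cmdlet.

```powershell
# Pair each formatted perf class name with its raw counterpart, if present.
# Get-PerfClassPair is a hypothetical helper, not a built-in cmdlet.
function Get-PerfClassPair {
    param([string[]]$ClassName)

    # Lookup table of every name we were given.
    $known = @{}
    foreach ($n in $ClassName) { $known[$n] = $true }

    foreach ($n in $ClassName) {
        if ($n -like 'Win32_PerfFormattedData_*') {
            # Assumed convention: same suffix, different prefix.
            $raw = $n -replace '^Win32_PerfFormattedData_', 'Win32_PerfRawData_'
            [pscustomobject]@{
                Formatted = $n
                # $null when no class with the expected raw name was supplied.
                Raw       = if ($known.ContainsKey($raw)) { $raw } else { $null }
            }
        }
    }
}

# Usage on a Windows box:
#   $names = (Get-CimClass -ClassName 'Win32_Perf*').CimClassName
#   Get-PerfClassPair -ClassName $names
```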


This concludes the first blog in the series. In the next blog I will deal with the various flavours of counter values.

In this series:
BLOG 1: PerfCounters infrastructure
BLOG 2: PerfCounters Raw vs. Formatted values
BLOG 3: PerfCounters, fetching the values
BLOG 4: PerfCounters, CPU perf data
BLOG 5: PerfCounters, Memory perf data
BLOG 6: PerfCounters, Disk/IO perf data
BLOG 7: PerfCounters, Network and Contention perf data