2016-02-29

Linux top command on Windows, further investigations

In my previous post, I spoke of "Normalized" and "Non-Normalized" CPU utilization values:

Foreword: Windows is not a "process"-based OS (like Linux) but rather a "thread"-based one, so all of the numbers relating to CPU usage are approximations. I did implement a "proper" CPU-per-process calculation, looping over the Thread counters (https://msdn.microsoft.com/en-us/library/aa394279%28v=vs.85%29.aspx) and summing them up per PID, but that proved too slow given I have ~1 s to deal with everything. Computing CPU utilization from RAW counters with a 1 s delay between samples produced somewhat more reliable results than just reading the Formatted counters but, again, it was too slow for my 1 s ticks (collect sample, wait 1 s, collect sample, do the math: all of that takes longer than 1 s). Thus I use the PerfFormatted counters in version 0.9RC.
    Win32_PerfRawData_PerfProc_Process; Win32_PerfFormattedData_PerfProc_Process.
  
    _PID_     Unique identifier of the process.
    PPID      Unique identifier of the process that started this one.
    PrioB     Base priority.
    Name      Name of the process.
    CPUpt_(N) % of CPU used by the process. On machines with multiple CPUs,
        this number can be over 100% unless you see the _CPUpt_N caption, which
        means "Normalized" (i.e. CPU utilization / # of CPUs).
        Toggle Normal/Normalized display by pressing the "p" key.
    Thds      # of threads spawned by the process.
    Hndl      # of handles opened by the process.
    WS(MB)    Total RAM used by the process. Working Set is, basically,
        the set of memory pages touched recently by the threads belonging to
        the process. 
    VM(MB)    Size of the virtual address space in use by the process.
    PM(MB)    The current amount of VM that this process has reserved
        for use in the paging files.
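The raw-counter approach mentioned in the Foreword boils down to one formula. Below is a minimal sketch, assuming the usual Timer100Ns calculation (CPU-time delta divided by elapsed-time delta, both in 100 ns units). The function name and parameters are hypothetical; with Win32_PerfRawData_PerfProc_Process you would feed it PercentProcessorTime and, I assume, the sample timestamp (Timestamp_Sys100NS) from two queries taken about 1 s apart.

```powershell
# Hypothetical helper: turn two raw samples of a Timer100Ns counter (such as a
# process's "% Processor Time") into a percentage. All inputs are 100 ns ticks.
function Get-CpuPercentFromRaw {
    param(
        [UInt64]$CpuTime0,     # raw CPU time at the first sample
        [UInt64]$CpuTime1,     # raw CPU time at the second sample
        [UInt64]$Stamp0,       # timestamp of the first sample
        [UInt64]$Stamp1,       # timestamp of the second sample
        [int]$LogicalCpus = 1  # pass the CPU count to get the "Normalized" value
    )
    # Subtract as [decimal] so the large 64-bit tick values stay exact
    $elapsed = [decimal]$Stamp1 - [decimal]$Stamp0
    if ($elapsed -le 0) { return 0.0 }   # no time passed: nothing to report
    $busy = [decimal]$CpuTime1 - [decimal]$CpuTime0
    # Non-normalized: 100% per fully used CPU, so it can exceed 100%
    return [double](100 * $busy / $elapsed) / $LogicalCpus
}
```

For example, 12 s of CPU time consumed over a 1 s window gives 1200% non-normalized, or 12.5% when normalized over 96 CPUs.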
However, my approach for displaying "Non-Normalized" CPU utilization didn't work :-/

Proper functionality of this feature is rather important for my job. Looking at "Normalized" CPU utilization values for a process does not tell you much. Say a process has a CPU utilization of 100%. This just tells you that at least 1 CPU is fully utilized by the process, but it does not tell you the overall utilization. The "Non-Normalized" value sums CPU utilization over all CPUs that the process uses. In my case, the test box has 8 Xeon processors with 6 physical and 6 virtual cores each, for a total of 96 CPUs. The system is configured so that each NUMA node corresponds to 1 Xeon processor (socket). Thus, when my process utilizes an entire NUMA node (socket) to the fullest, the CPU utilization for that process should be the number of CPUs per NUMA node/socket (12) x 100%, which is 1200%:


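To make the numbers concrete, here is a small worked example for the test box described above. It only restates the arithmetic from the text (8 sockets x 12 CPUs); the variable names are illustrative. It shows why the "Normalized" figure is hard to read on a big box: a fully busy socket shows up as a modest 12.5%.

```powershell
# Worked numbers for the test box: 8 sockets (NUMA nodes) x 12 logical CPUs.
$sockets       = 8
$cpusPerSocket = 12                          # 6 physical + 6 virtual cores
$totalCpus     = $sockets * $cpusPerSocket   # 96 logical CPUs

# One socket fully busy, expressed both ways:
$nonNormalized = $cpusPerSocket * 100        # 1200 (%)
$normalized    = $nonNormalized / $totalCpus # 12.5 (%)

"{0} CPUs; one full socket = {1}% non-normalized, {2}% normalized" -f $totalCpus, $nonNormalized, $normalized
```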
If the process scales correctly, I will see more NUMA nodes/Sockets light up while increasing the load:

However, this does not tell me that it is my process of interest that is using the CPUs. To confirm it, I need the TOP script to show CPU utilization above 1200%:

This confirms that the mysqld process is running on more than 2 sockets (sysbench is taking up ~7 CPUs, and I bet mydesktopservice is the one lighting up the 3rd CPU in the 2nd row).
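The reasoning behind the 1200% threshold can be written as a lower bound: a non-normalized figure above one socket's worth of CPU cannot be served by a single socket. A minimal sketch, assuming 12 logical CPUs per socket as on this box; the function name is hypothetical and the result is only a minimum, since the scheduler may spread threads over more sockets than strictly needed.

```powershell
# Hypothetical helper: the smallest number of sockets that could account for a
# given non-normalized CPU figure, assuming $CpusPerSocket logical CPUs each.
function Get-MinSockets {
    param([double]$CpuPercent, [int]$CpusPerSocket = 12)
    $perSocket = $CpusPerSocket * 100   # 1200% = one socket fully busy
    return [int][math]::Ceiling($CpuPercent / $perSocket)
}

Get-MinSockets -CpuPercent 1200   # fits in one socket
Get-MinSockets -CpuPercent 2500   # hypothetical value: needs at least 3 sockets
```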

How to make it work:

A heavy rework of the #region Tasks job, which starts the "Processes" monitoring job, was in order. First, I had to remove all of the code below:
<#
        if ($CPUDSw) {
            Get-CimInstance Win32_PerfFormattedData_PerfProc_Process | 
                select @{Name='_PID_'; Expression={$_.IDProcess}},
                @{Name='PPID'; Expression={$_.CreatingProcessID}},
                @{Name='PrioB'; Expression={$_.PriorityBase}},
                @{Name='Name                  '; Expression={(($_.Name).PadRight(22)).substring(0, [System.Math]::Min(22, ($_.Name).Length))}}, 
                @{Name='_CPUpt__'; Expression={($_.PercentProcessorTime).ToString("0.00").PadLeft(8)}},
                @{Name='Thds'; Expression={$_.ThreadCount}},
                @{Name='Hndl'; Expression={$_.HandleCount}},
                @{Name='WS(MB)'; Expression={[math]::Truncate($_.WorkingSet/1MB)}},
                @{Name='VM(MB)'; Expression = {[math]::Truncate($_.VirtualBytes/1MB)}},
                @{Name='PM(MB)'; Expression={[math]::Truncate($_.PageFileBytes/1MB)}} |
                where { $_._PID_ -gt 0} | &$sb | 
                Select-Object -First $procToDisp | FT * -Auto 1> $pth
        } else {
            Get-CimInstance Win32_PerfFormattedData_PerfProc_Process | 
                select @{Name='_PID_'; Expression={$_.IDProcess}},
                @{Name='PPID'; Expression={$_.CreatingProcessID}},
                @{Name='PrioB'; Expression={$_.PriorityBase}},
                @{Name='Name                  '; Expression={(($_.Name).PadRight(22)).substring(0, [System.Math]::Min(22, ($_.Name).Length))}}, 
                @{Name='_CPUpt_N'; Expression={"{0,8:N2}" -f ($_.PercentProcessorTime / $TotProc)}},
                @{Name='Thds'; Expression={$_.ThreadCount}},
                @{Name='Hndl'; Expression={$_.HandleCount}},
                @{Name='WS(MB)'; Expression={[math]::Truncate($_.WorkingSet/1MB)}},
                @{Name='VM(MB)'; Expression = {[math]::Truncate($_.VirtualBytes/1MB)}},
                @{Name='PM(MB)'; Expression={[math]::Truncate($_.PageFileBytes/1MB)}} |
                where { $_._PID_ -gt 0} | &$sb | 
                Select-Object -First $procToDisp | FT * -Auto 1> $pth
        }
#>
and replace it with a Get-Counter version:
        $processes = Get-CimInstance Win32_PerfFormattedData_PerfProc_Process | 
            Select @{Name='_PID_'; Expression={$_.IDProcess}},
            @{Name='PPID'; Expression={$_.CreatingProcessID}},
            ElapsedTime, 
            @{Name='PrioB'; Expression={$_.PriorityBase}},
            @{Name='Name'; Expression={($_.Name).ToLower()}},
            @{Name='Thds'; Expression={$_.ThreadCount}},
            @{Name='Hndl'; Expression={$_.HandleCount}}, 
            @{Name='WS(MB)'; Expression={[math]::Truncate($_.WorkingSet/1MB)}},
            @{Name='VM(MB)'; Expression = {[math]::Truncate($_.VirtualBytes/1MB)}},
            @{Name='PM(MB)'; Expression={[math]::Truncate($_.PageFileBytes/1MB)}},
            PoolNonpagedBytes, PoolPagedBytes, PercentProcessorTime |
            Where { $_._PID_ -gt 0}

        $Samples = (Get-Counter "\Process(*)\% Processor Time").CounterSamples

For reference, this is what a single Get-Counter sample looks like:
PS:511 [HEL01]> (Get-Counter "\Process(*)\% Processor Time").CounterSamples | FL *
...
Path             : \\hel01\process(system)\% processor time
InstanceName     : system
CookedValue      : 0
RawValue         : 3434062500
SecondValue      : 131012088253272040
MultipleCount    : 1
CounterType      : Timer100Ns
Timestamp        : 29.02.16 09:40:25
Timestamp100NSec : 131012124253270000
Status           : 0
DefaultScale     : 0
TimeBase         : 10000000
Then I had to change the way everything is put together:
        if ($CPUDSw) { 
            $pcts = $Samples | Select @{Name="IName"; Expression={($_.InstanceName).ToLower()}}, 
              @{Name="CPUU";Expression={[Decimal]::Round(($_.CookedValue), 2)}}
            $processes | select '_PID_', 'PPID', 'PrioB',
                            @{Name='Name                  '; Expression=
                                {
                                    # pad short names and cut long ones to exactly 22 characters
                                    (($_.Name).PadRight(22)).Substring(0, 22)
                                }
                            }, 
                            @{Name='_CPUpt__'; Expression=
                                {
                                    if ($pcts.IName.IndexOf($_.Name) -ge 0) {
                                        ($pcts.CPUU[[array]::IndexOf($pcts.IName, $_.Name)]).ToString("0.00").PadLeft(8)
                                    }
                                }
                            },
            'Thds', 'Hndl', 'WS(MB)', 'VM(MB)', 'PM(MB)' | &$sb | Select-Object -First $procToDisp | FT * -Auto 1> $pth
        } else {
            $pcts = $Samples | Select @{Name="IName"; Expression={($_.InstanceName).ToLower()}}, 
              @{Name="CPUU";Expression={[Decimal]::Round(($_.CookedValue / $TotProc), 2)}}
            $processes | select '_PID_', 'PPID', 'PrioB',
                            @{Name='Name                  '; Expression=
                                {
                                    # pad short names and cut long ones to exactly 22 characters
                                    (($_.Name).PadRight(22)).Substring(0, 22)
                                }
                            },  
                            @{Name='_CPUpt_N'; Expression=
                                {
                                    if ($pcts.IName.IndexOf($_.Name) -ge 0) {
                                        ($pcts.CPUU[[array]::IndexOf($pcts.IName, $_.Name)]).ToString("0.00").PadLeft(8)
                                    } else {
                                        #Not found (yet). Take what you have :-/ (normalized, like the rest of this column)
                                        ($_.PercentProcessorTime / $TotProc).ToString("0.00").PadLeft(8)
                                    }
                                }
                            },
            'Thds', 'Hndl', 'WS(MB)', 'VM(MB)', 'PM(MB)' | &$sb | Select-Object -First $procToDisp | FT * -Auto 1> $pth
        }
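The lookup above searches the $pcts arrays with IndexOf once per process row, which gets slow with many processes. A possible alternative, sketched here and not what the script currently does, is to index the counter samples by instance name once per tick. The function name New-CpuLookup is hypothetical; it only assumes objects with InstanceName and CookedValue properties, as in the CounterSamples output shown earlier.

```powershell
# Hypothetical helper: index counter samples by lower-cased instance name once,
# instead of calling IndexOf on the $pcts arrays for every process row.
function New-CpuLookup {
    param(
        [object[]]$Samples,   # objects with InstanceName and CookedValue
        [int]$Divisor = 1     # pass $TotProc for the "Normalized" view
    )
    $lookup = @{}
    foreach ($s in $Samples) {
        $key = ($s.InstanceName).ToLower()
        # Keep the first sample for a name, like IndexOf's first-match behaviour
        if (-not $lookup.ContainsKey($key)) {
            $lookup[$key] = [Decimal]::Round([decimal]$s.CookedValue / $Divisor, 2)
        }
    }
    return $lookup
}

# Example with made-up samples shaped like CounterSamples:
$fake = @(
    [pscustomobject]@{ InstanceName = 'mysqld';   CookedValue = 2400.456 },
    [pscustomobject]@{ InstanceName = 'sysbench'; CookedValue = 700 }
)
$cpu = New-CpuLookup -Samples $fake -Divisor 96
$cpu['mysqld']   # 2400.456 / 96, rounded to 2 decimals
```

Inside the calculated property, the lookup would then become something like `if ($cpu.ContainsKey($_.Name)) { $cpu[$_.Name].ToString("0.00").PadLeft(8) }`. Note that PowerShell hashtables are case-insensitive by default, so the ToLower() call is kept only for consistency with the rest of the script.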
Get-Counter, by default, takes samples 1 second apart, so a single call already takes just over a second:
PS:507 [HEL01]> Measure-Command{(Get-Counter "\Process(*)\% Processor Time").CounterSamples}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 18
Ticks             : 10189709
TotalDays         : 1.17936446759259E-05
TotalHours        : 0.000283047472222222
TotalMinutes      : 0.0169828483333333
TotalSeconds      : 1.0189709
TotalMilliseconds : 1018.9709
Because of that, I also abandoned all of the code relating to the Stopwatch timer:
    #$sw = New-Object Diagnostics.Stopwatch
    do {
        #$sw.Start()
...
        }
        #$sw.Stop()
        #if ($sw.ElapsedMilliseconds -lt 1000) {
        #    Start-Sleep -Milliseconds (1000-$sw.ElapsedMilliseconds)
        #}
        #$sw.Reset()

    } while ($true)
    #$sw = $null
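The commented-out Stopwatch code slept for whatever was left of a 1000 ms tick. The sketch below restates that rule as a pure function (the name is hypothetical) to show why it became dead code: with Get-Counter in the loop, the work alone already takes longer than the tick.

```powershell
# Hypothetical helper: the pacing rule the removed Stopwatch code implemented,
# i.e. sleep only for whatever is left of the tick after the work is done.
function Get-RemainingSleepMs {
    param([double]$ElapsedMs, [int]$TickMs = 1000)
    return [int][math]::Max(0, $TickMs - $ElapsedMs)
}

# Get-Counter alone took ~1019 ms in the Measure-Command run above:
Get-RemainingSleepMs -ElapsedMs 1018.9709   # 0
# A loop body that finishes in 250 ms would still sleep:
Get-RemainingSleepMs -ElapsedMs 250         # 750
```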

So now it works! I do not know yet when I will be able to release the new version, so stay tuned.

Final thoughts:

I have hit many, many problems in Windows during this testing. Note, for example, the use of ToLower() in ($_.InstanceName).ToLower(), but that is a topic for a new blog post. This one is about the TOP script.