Aria Operations – PART II: Deploy Open Source Telegraf agent on Windows VM.

In the previous post (PART I), I described how to deploy the Open Source Telegraf agent on a Linux VM. This time the focus is on Windows. For a short introduction and an overview of how the Telegraf agent can be used, see PART I.

STEP 1: Prerequisites

Prerequisites for the Windows OS:

  • PowerShell version 4.0 or above.
  • VMware Tools version >= 10.2 (the newest version is recommended).
  • IPv4 (only IPv4 is currently supported).

Required network traffic:

  • The VM/endpoint must have access to ports 8443 and 443 of the cloud proxy, or of the virtual IP of the collector group if application monitoring high availability is activated. Ports 4505 and 4506 are not used here; they are required only for the managed agent.
  • Remote Access (RDP) from your workstation to Windows Server.
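
The port requirements above can be checked from the VM before installing anything. This is a minimal sketch, assuming the Test-NetConnection cmdlet is available (Windows Server 2012 R2 or newer). The host name is a placeholder for your cloud proxy or the collector group virtual IP:

```powershell
# Placeholder: replace with your cloud proxy FQDN/IP or the collector group virtual IP.
$cloudProxy = 'cloudproxy.example.com'

foreach ($port in 8443, 443) {
    # TcpTestSucceeded is $true when a TCP connection to the port could be opened.
    $result = Test-NetConnection -ComputerName $cloudProxy -Port $port -WarningAction SilentlyContinue
    '{0}:{1} reachable: {2}' -f $cloudProxy, $port, $result.TcpTestSucceeded
}
```

If either port reports False, check the firewall rules between the VM and the cloud proxy before continuing.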

STEP 2: Open Source agent installation.

To install the Telegraf agent, go to https://www.influxdata.com/time-series-platform/telegraf/ and click the Download Telegraf v1.30.1 button (the version may be different).
You will see instructions for various operating systems. We will use the Windows version.

If your server does not have Internet access, you can download the binaries on another Windows machine and copy the files to the target machine.
To install the Telegraf agent on Windows Server, run the following commands in a PowerShell console (run the console as Administrator):

PS C:\Users\Administrator.BLANKETVM> hostname
dns02
PS C:\Users\Administrator.BLANKETVM> mkdir c:\TMP\
    Directory: C:\
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----         4/15/2024   5:37 PM                TMP
PS C:\Users\Administrator.BLANKETVM> cd C:\TMP\
PS C:\TMP> wget https://dl.influxdata.com/telegraf/releases/telegraf-1.30.1_windows_amd64.zip -UseBasicParsing -OutFile telegraf-1.30.1_windows_amd64.zip
PS C:\TMP> Expand-Archive .\telegraf-1.30.1_windows_amd64.zip -DestinationPath 'C:\Program Files\InfluxData\telegraf'
PS C:\TMP> mkdir 'C:\Program Files\InfluxData\telegraf\telegraf.d'
    Directory: C:\Program Files\InfluxData\telegraf
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----         4/15/2024   5:43 PM                telegraf.d
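
Before moving on, it may be worth confirming that the archive was unpacked where the later steps expect it. This is a small sketch, assuming the zip created a telegraf-1.30.1 subfolder (as the paths used later in this post suggest):

```powershell
$telegrafExe = 'C:\Program Files\InfluxData\telegraf\telegraf-1.30.1\telegraf.exe'

if (Test-Path -LiteralPath $telegrafExe) {
    # Print the version of the unpacked binary.
    & $telegrafExe --version
} else {
    Write-Warning "telegraf.exe not found at $telegrafExe; check the folder created by Expand-Archive."
}
```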

STEP 3: Configure Telegraf to send data to cloud proxy

Now we need to configure the Telegraf agent to send data to the Cloud Proxy. To do that, we will download a helper script from the Cloud Proxy, run it with the required parameters, and prepare the config file. Let’s go.

Download the helper script from the cloud proxy, located at https://<CloudProxy-IP>/downloads/salt/telegraf-utils.ps1

NOTE: Run the commands below in the Command Prompt (cmd.exe), not in PowerShell. In PowerShell the download may fail with a certificate error. A workaround is described here:
https://docs.vmware.com/en/VMware-Aria-Operations/8.17.1/Configuring-Operations/GUID-FA70A401-C338-4CDF-AD14-F31185D6138B.html#GUID-FA70A401-C338-4CDF-AD14-F31185D6138B

Microsoft Windows [Version 10.0.20348.2322]
(c) Microsoft Corporation. All rights reserved.
C:\Users\Administrator.BLANKETVM>cd c:/TMP
c:\TMP>curl -k  "https://vcf-md01-vrops01cp01.blanketvm.com/downloads/salt/telegraf-utils.ps1" -o telegraf-utils.ps1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 72834  100 72834    0     0   728k      0 --:--:-- --:--:-- --:--:--  790k

Run the helper script to update the Telegraf configuration.
To run the script we will need:

  • Cloud Proxy IP or collector group name (in my case I will use the group name because I have enabled High Availability on the Cloud Proxies).
    [Screenshot: CloudProxyHACollectionGroup]
  • Aria Operations authentication token, acquired with a REST call. The authSource name must be the same as configured in Aria Operations.
    [Screenshots: AcquireTokenRestCall, AcquireTokenResponse, AuthSourceName]
  • Aria Operations FQDN.
  • Path for config files and binaries: C:\Program Files\InfluxData\telegraf\telegraf.d and C:\Program Files\InfluxData\telegraf\telegraf-1.30.1
PS C:\TMP> .\telegraf-utils.ps1  opensource -c "CloudProxy Collection Group" -t 5e16db73-8086-433a-89ac-0a2e562dc0b8::68200f37-d58b-4b91-9673-0c23e13e11c0 -d "C:\Program Files\InfluxData\telegraf\telegraf.d"  -e "C:\Program Files\InfluxData\telegraf\telegraf-1.30.1\telegraf.exe" -v vcf-md01-vrops01.blanketvm.com
Telegraf configuration to post metrics to cloud proxy succeeded. Please restart telegraf.
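
The value passed with -t above is an Aria Operations authentication token. One way to get it from PowerShell is sketched below. It assumes the Suite API endpoint /suite-api/api/auth/token/acquire, a certificate that the machine trusts, and credentials entered interactively. The authSource value 'local' is an assumption and must be replaced with the name configured in your Aria Operations:

```powershell
# Placeholders: adjust the FQDN and authSource to your environment.
$vrops = 'vcf-md01-vrops01.blanketvm.com'
$cred  = Get-Credential -Message 'Aria Operations user'

$body = @{
    username   = $cred.UserName
    password   = $cred.GetNetworkCredential().Password
    authSource = 'local'   # assumption: must match the auth source name in Aria Operations
} | ConvertTo-Json

$response = Invoke-RestMethod -Method Post `
    -Uri "https://$vrops/suite-api/api/auth/token/acquire" `
    -ContentType 'application/json' `
    -Headers @{ Accept = 'application/json' } `
    -Body $body

# The token string is what the helper script expects after -t.
$response.token
```

With a self-signed certificate the call may fail in Windows PowerShell with the same kind of certificate error mentioned earlier for the script download.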

NOTE:
By default, the InfluxDB output plugin is active in the telegraf.conf file, so Telegraf tries to send data to an InfluxDB server. To avoid repeated warning messages in the logs about the missing InfluxDB server, comment out the “[[outputs.influxdb]]” line. Otherwise the following warning message is displayed: W! [outputs.influxdb] When writing to [http://localhost:8086]: database "telegraf" creation failed: Post "http://localhost:8086/query": dial tcp [::1]:8086: connect: connection refused
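
To see which output plugins are still active in telegraf.conf, and on which line the InfluxDB one sits, a read-only check like the following may help. It only assumes the config path used in this post:

```powershell
$conf = 'C:\Program Files\InfluxData\telegraf\telegraf-1.30.1\telegraf.conf'

# List uncommented [[outputs.*]] table headers together with their line numbers.
Select-String -LiteralPath $conf -Pattern '^\s*\[\[outputs\.' |
    Select-Object LineNumber, Line
```

If [[outputs.influxdb]] shows up, prefix that line with # in an editor, and make sure no uncommented settings are left directly below it.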

STEP 4: Create a service

To create a Windows service for the Telegraf agent (so that it starts automatically with the OS), run this command:

PS C:\TMP> & "C:\Program Files\InfluxData\telegraf\telegraf-1.30.1\telegraf.exe"  --config "C:\Program Files\InfluxData\telegraf\telegraf-1.30.1\telegraf.conf" --config-directory "C:\Program Files\InfluxData\telegraf\telegraf.d"  --service install
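
To confirm that the service was registered, something like the following can be used. It is a sketch that assumes the service was installed under Telegraf's default name, "telegraf":

```powershell
# Assumption: the service is registered under the default name "telegraf".
Get-CimInstance -ClassName Win32_Service -Filter "Name='telegraf'" |
    Select-Object Name, State, StartMode, PathName
```

PathName should show the telegraf.exe path together with the --config and --config-directory arguments passed during installation.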

STEP 5: Ensure that OS plugin is configured properly

Go to the C:\Program Files\InfluxData\telegraf\telegraf.d directory and open the file win-perf-counters.conf.
Confirm that the file contains the configuration below:

##########################################################################################
[[inputs.win_perf_counters]]
PrintValid=true

[[inputs.win_perf_counters.object]]
  ObjectName = "Processor"
  Instances = ["*"]
  Counters = ["% Idle Time", "% Interrupt Time", "% Privileged Time", "% Processor Time", "% User Time", "Interrupts/sec", "% DPC Time"]
  Measurement = "win.cpu"
  IncludeTotal = true

[[inputs.win_perf_counters.object]]
  ObjectName = "LogicalDisk"
  Instances = ["*"]
  Counters = ["% Disk Read Time", "% Disk Write Time", "% Free Space", "% Idle Time", "Avg. Disk Bytes/Read", "Avg. Disk Bytes/Write", "Avg. Disk Queue Length", "Avg. Disk sec/Read", "Avg. Disk sec/Write", "Avg. Disk Write Queue Length", "Avg. Disk Read Queue Length", "Free Megabytes", "Split IO/Sec"]
  Measurement = "win.disk"

[[inputs.win_perf_counters.object]]
  ObjectName = "Memory"
  Counters = ["Available Bytes", "Cache Bytes", "Committed Bytes", "Cache Faults/sec", "Demand Zero Faults/sec", "Page Faults/sec", "Pages/sec", "Transition Faults/sec", "Pool Nonpaged Bytes", "Pool Paged Bytes"]
  Instances = ["------"]
  Measurement = "win.mem"

[[inputs.win_perf_counters.object]]
  ObjectName = "Network Interface"
  Counters = ["Bytes Received/sec", "Bytes Sent/sec", "Packets Outbound Discarded", "Packets Outbound Errors", "Packets Received Discarded", "Packets Received Errors", "Packets Received/sec", "Packets Sent/sec", "Connections Established"]
  Instances = ["*"]
  Measurement = "win.net"
  IncludeTotal = true

[[inputs.win_perf_counters.object]]
  ObjectName = "Paging File"
  Counters = ["% Usage"]
  Instances = ["*"]
  Measurement = "win.paging"
  IncludeTotal = true

[[inputs.win_perf_counters.object]]
  ObjectName = "Process"
  Counters = ["% Privileged Time", "% Processor Time", "% User Time", "Elapsed Time", "Handle Count", "IO Read Bytes/sec", "IO Read Operations/sec", "IO Write Bytes/sec", "IO Write Operations/sec", "Private Bytes", "Thread Count", "Virtual Bytes", "Working Set", "Working Set - Private"]
  Instances = ["_Total", "telegraf", "w3wp"]  # Replace this with a list of process names that you want to monitor. "_Total" is all processes combined
  Measurement = "win.process"

[[inputs.win_perf_counters.object]]
  ObjectName = "System"
  Counters = ["Context Switches/sec", "Processes", "Processor Queue Length", "System Calls/sec", "System Up Time", "Threads"]
  Instances = ["------"]
  Measurement = "win.system"

[[inputs.win_perf_counters.object]]
  ObjectName = "TCPv4"
  Counters = ["Connection Failures", "Connections Active", "Connections Established", "Connections Passive", "Connection Reset", "Segments Received/sec", "Segments Retransmitted/sec", "Segments Sent/sec"]
  Instances = ["------"]
  Measurement = "win.net.tcp"

[[inputs.win_perf_counters.object]]
  ObjectName = "TCPv6"
  Counters = ["Connection Failures", "Connections Active", "Connections Established", "Connections Passive", "Connection Reset", "Segments Received/sec", "Segments Retransmitted/sec", "Segments Sent/sec"]
  Instances = ["------"]
  Measurement = "win.net.tcp"

[[inputs.win_perf_counters.object]]
  ObjectName = "UDPv4"
  Counters = ["Datagrams No Port/sec", "Datagrams Received/Errors", "Datagrams Received/sec", "Datagrams Sent/sec"]
  Instances = ["------"]
  Measurement = "win.net.udp"

[[inputs.win_perf_counters.object]]
  ObjectName = "UDPv6"
  Counters = ["Datagrams No Port/sec", "Datagrams Received/Errors", "Datagrams Received/sec", "Datagrams Sent/sec"]
  Instances = ["------"]
  Measurement = "win.net.udp"
##########################################################################################

Remember that this is the default configuration. We will adjust it to our requirements in the next parts.
Start the Telegraf service and confirm that everything works.

[Screenshot: ServiceTelegrafStart]
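
The same can be done from an elevated PowerShell console. This is a sketch, assuming the default service name "telegraf" and the paths used in this post. The optional dry run uses Telegraf's --test flag, which gathers the inputs once and prints the metrics without sending them to any output:

```powershell
$telegrafDir = 'C:\Program Files\InfluxData\telegraf'

# Optional dry run: gather the inputs once, print the metrics and exit.
& "$telegrafDir\telegraf-1.30.1\telegraf.exe" `
    --config "$telegrafDir\telegraf-1.30.1\telegraf.conf" `
    --config-directory "$telegrafDir\telegraf.d" `
    --test

# Assumption: default service name "telegraf".
Start-Service -Name telegraf
Get-Service -Name telegraf | Select-Object Name, Status
```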

STEP 6: Check that all config files are loaded

Open Event Viewer on Windows and go to Windows Logs -> Application.
You should see a few entries with the source "telegraf". Check that all configuration files are listed:

Loading config: C:\Program Files\InfluxData\telegraf\telegraf-1.30.1\telegraf.conf
Loading config: C:\Program Files\InfluxData\telegraf\telegraf.d\win-perf-counters.conf
Loading config: C:\Program Files\InfluxData\telegraf\telegraf.d\cloudproxy-http.conf

[Screenshot: ServiceTelegrafStart2]
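
The same entries can be read from PowerShell instead of Event Viewer. This is a minimal sketch, assuming the events are written under the provider (source) name "telegraf", as described above:

```powershell
# Assumption: events are written with the provider (source) name "telegraf".
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = 'telegraf' } -MaxEvents 50 -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -like '*Loading config*' } |
    Select-Object TimeCreated, Message
```

If nothing is returned, check in Event Viewer whether the service has written any entries at all since it was started.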

Conclusion

Now we need to wait a few collection cycles for the changes to appear in the Aria Operations UI. After some time you should be able to see the agent and the OS object.

[Screenshots: TelegrafWindowsOverview1, TelegrafWindowsOverview2, TelegrafWindowsOverview3]

Have fun!
If you have any questions about the procedure, leave a comment.

2 thoughts on “Aria Operations – PART II: Deploy Open Source Telegraf agent on Windows VM.”

  1. Hello!!

    I tried configuring it based on what you wrote and found that Aria Operations collects metrics for a single VM + single GPU just fine. Thank you.
    However, there seems to be an issue with Single VM + Multiple GPUs where only the values for a specific GPU are collected and not for all GPUs, is there anything you know about this?

    1. Hi,
      It looks like a wrong configuration of the Telegraf plugins.
      I see two options here:
      a) If you have GPU metrics in Windows Performance Monitor, you can add the additional counters to the Windows metrics and gather them with the Windows Performance Counters input plugin. Remember to add all GPU instances.
      https://github.com/influxdata/telegraf/blob/release-1.33/plugins/inputs/win_perf_counters/README.md
      b) If it is NVIDIA, you can use NVIDIA Plugin:
      https://github.com/influxdata/telegraf/blob/release-1.33/plugins/inputs/nvidia_smi/README.md

      Unfortunately, I do not have a lab with multiple GPUs to check it on my side.
