ControlUp 8.1 Native VMware Horizon integration

In part 2 we highlight the native support for VMware Horizon 7 and higher environments in ControlUp 8.1. The integration is based on the SOAP API. Adding a VMware Horizon environment is easy: click ‘Add EUC Environment’, enter the name of a Horizon Connection Server, and click ‘OK’. ControlUp automatically discovers Horizon components such as Connection Servers, Cloud Pod Architecture (CPA), desktop pools, and sessions.

Horizon Connection Servers

At the top level, you see the stress level of all the Horizon Connection Servers together. In the view below that, each individual Horizon Connection Server is listed with its metrics.

For all the Connection Servers, the following metrics are added to the view:

  • Horizon Pods
  • Stress Level
  • Connection Servers
  • Connection Server health
  • Connection Server Max connections
  • Average machine memory
  • Machine disk IO average latency
  • Machine Disk Transfers/sec
  • Machine Net Total

Per Horizon Connection Server, metrics such as the following are added:

  • Connection Server health
  • Number of Connection Servers
  • Active connections
  • External URL
  • Connection Server certificate valid
  • Connection Server certificate expiration date
  • License model
  • Connection Server version
  • Horizon Pod
  • Horizon Site

When the ControlUp agent is installed on the Connection Servers or VDI desktops, the hypervisor and in-guest metrics are combined with the Horizon metrics.

Desktop Pools

Below the Connection Servers, the desktop pools are displayed.

Each desktop pool in the Horizon environment is displayed with metrics such as:

  • Pool name
  • Pool type
  • Stress level
  • Pool state
  • Provisioning enabled
  • Number of machines
  • Number of machines enabled
  • Sessions
  • Disconnects
  • Problem machines
  • Default protocol
  • Power policy
  • Logoff timeout

Per Horizon pool, you can view the VDI desktops and Horizon sessions with metrics such as:

  • Pool name
  • Session type
  • Machine name
  • State
  • Session start time
  • Protocol
  • Desktop source
  • Client name
  • Horizon client version
  • Horizon agent version

And from the Horizon session, you can dive deeper into the processes view to troubleshoot further.

The Virtual Expert in ControlUp includes Horizon-specific suggestions, for example about the number of available desktops remaining in a desktop pool.

As you can see, the Horizon integration adds a lot of Horizon-specific metrics. All these metrics give great insight into what happens in the Horizon environment.

Automation

ControlUp can use automation to solve Horizon issues for you. For example, it is possible to check the Horizon agent state of each VDI desktop. If the Horizon agent state goes bad (for example agent unreachable, error, unknown, or already used), an automated action can be configured to resolve the problem. Automated actions are configured in ControlUp by using triggers.

In this example (demoed by Trentent Tye), three automation triggers are created:

  • Trigger 1 operates at 10 minutes, action: restart the Horizon Agent if the Horizon agent state is wrong
  • Trigger 2 operates at 15 minutes, action: restart the VM if the Horizon agent state is wrong
  • Trigger 3 operates at 20 minutes, action: cold boot the VM if the Horizon agent state is wrong

Trigger 1: operate at 10 minutes

When the VDI machine boots up, it has 10 minutes to register the Horizon agent state with the Horizon Connection Server. A normal VDI desktop has a READY state and is available. After 10 minutes, the trigger checks whether the Horizon agent reports a wrong state such as:

  • UNKNOWN
  • *ERROR
  • ALREADY USED
  • DOMAIN FAILURE
  • AGENT UNREACHABLE

If the Horizon agent state is wrong, the following action is executed: Restart the VMware Horizon Agent.

The ‘Restart VMware Horizon Agent’ action is a PowerShell script that restarts the VMware Horizon Agent service.
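To illustrate what such an action boils down to, here is a minimal Python sketch that wraps the standard PowerShell `Restart-Service` cmdlet. It is not the script ControlUp ships. The service name `WSNM` is an assumption (check it with `Get-Service` on a VDI desktop), and the function names are hypothetical.

```python
import subprocess

# Assumption: "WSNM" is the Windows service name of the Horizon Agent.
# Verify it on your own desktops before using this.
HORIZON_AGENT_SERVICE = "WSNM"


def build_restart_command(service_name=HORIZON_AGENT_SERVICE):
    """Return the PowerShell command line that restarts one Windows service."""
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"Restart-Service -Name '{service_name}' -Force",
    ]


def restart_agent(service_name=HORIZON_AGENT_SERVICE):
    """Run the restart on the local Windows machine; raises if the command fails."""
    subprocess.run(build_restart_command(service_name), check=True)
```

Building the command separately from running it makes it possible to inspect or log the exact command line before it is executed on a desktop.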

It’s easy to create scripts in PowerShell, VBS, BAT, and CMD. ControlUp also offers a large library of predefined and community scripts that can be used.

Trigger 2: operate at 15 minutes

This trigger looks at the same wrong Horizon Agent states as the 10-minute trigger. As an action, the VDI desktop VM is restarted using a simple command.

Trigger 3: operate at 20 minutes

This trigger looks at the same wrong Horizon Agent states as the 10-minute trigger. As an action, a hard reboot (cold boot) is executed on the VDI desktop using a simple command.
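The three triggers form an escalation ladder, which can be sketched in a few lines of Python. This is only an illustration of the logic, not how ControlUp evaluates triggers internally. The function names are hypothetical, and treating "*ERROR" as a wildcard for any state ending in ERROR is an assumption.

```python
# States taken from the list in the text; "*ERROR" is handled separately below.
KNOWN_BAD_STATES = {"UNKNOWN", "ALREADY USED", "DOMAIN FAILURE", "AGENT UNREACHABLE"}

# (minutes since boot, action), most severe step first.
ESCALATION = [
    (20, "cold boot VM"),
    (15, "restart VM"),
    (10, "restart Horizon Agent"),
]


def is_bad_state(agent_state):
    """True if the reported Horizon agent state is one of the wrong states."""
    state = agent_state.strip().upper()
    # Assumption: "*ERROR" means any state whose name ends in ERROR.
    return state in KNOWN_BAD_STATES or state.endswith("ERROR")


def pick_action(agent_state, minutes_since_boot):
    """Return the action for the highest escalation step reached, or None."""
    if not is_bad_state(agent_state):
        return None
    for threshold, action in ESCALATION:
        if minutes_since_boot >= threshold:
            return action
    return None  # still inside the 10-minute grace period
```

For example, `pick_action("AGENT UNREACHABLE", 12)` returns `"restart Horizon Agent"`, while a desktop in the READY state never triggers an action.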

Because of all the Horizon metrics available, it is possible to check and repair the Horizon agent states. For IT departments, morning checks can easily be automated to ensure the VDI desktops are ready to accept connections.

Besides the example above, there is a huge list of other Horizon items/metrics that can be used for automated actions. Here is a short overview of some:

This large list of Horizon metrics and items, combined with custom scripted actions, makes ControlUp very powerful.

Conclusion

ControlUp 8.1 adds native VMware Horizon integration and automatically discovers Horizon components such as Connection Servers, Cloud Pod Architecture (CPA), pools, and sessions. This integration gives great insight into what happens in the Horizon environment. Combining automated actions (triggers) with the Horizon metrics and scripted actions makes it a very powerful tool for automating actions and solving specific issues, as shown in the example above.

More information and a trial can be found here, link.

ControlUp 8.1 Monitor Cluster

Today ControlUp version 8.1 is released with two new major features:

  • Monitor Cluster. This new cluster model adds support for monitoring more VDI endpoints per site.
  • VMware Horizon integration. ControlUp now has native integration with VMware Horizon environments.

In this part of the blog post, I explain the basics of the new Monitor Cluster.

Monitor Cluster

The new Monitor Cluster model lets you add more active monitor nodes to the monitor cluster to increase VDI scalability. Each node can support up to 5000 VDI endpoints. The number of supported VDI endpoints depends on the processes that are active in the VDI. Below is a simple overview of the new Monitor Cluster model.

Monitors that belong to the same site automatically balance monitoring loads. With this new model, more VDI endpoints per site can be monitored than in previous versions.

Adding extra sites allows monitors to monitor resources on an isolated network or on remote networks with low bandwidth links.  All monitors deployed to these sites must be able to route back to the initial site to receive instructions.

For a monitor node that supports up to 5000 VDI endpoints, ControlUp recommends the following sizing:

  • Windows Server OS
  • 8 vCPUs
  • 32 GB memory

When designing for High Availability, use the N+1 rule. For example, a customer with 8000 VMs needs three monitor nodes: two to carry the load at up to 5000 endpoints each, plus one spare.
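The N+1 rule above comes down to simple arithmetic, sketched here in Python. The function name and parameters are hypothetical, and the 5000-endpoint capacity is the upper bound quoted in the text. Real capacity depends on the number of processes per endpoint.

```python
import math


def monitor_nodes_needed(endpoints, per_node=5000, spare_nodes=1):
    """Nodes needed to carry the load, plus spare node(s) for High Availability."""
    if endpoints < 0 or per_node <= 0 or spare_nodes < 0:
        raise ValueError("endpoints and spare_nodes must be >= 0, per_node > 0")
    return math.ceil(endpoints / per_node) + spare_nodes


print(monitor_nodes_needed(8000))  # 3: two nodes for the load, one spare
```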

The number of VDI endpoints a monitor node supports depends on the processes that are active in the VDI endpoint. For example, a Windows 7 VDI has an average of 120 processes. For Windows 10 the average is 200 processes.

More information on sizing ControlUp can be found here, link.

In the screenshots below an extra monitor node is added to the monitor cluster. After adding it, you see two dedicated monitor nodes in the same site. The load is automatically balanced between the nodes.

In ‘Manage ControlUp Monitors’ (see right screenshot above) you can now manage monitors and sites.

Data Collector

A Data Collector is responsible for collecting metrics from VMware vCenter, VMware Horizon, Citrix Delivery Controllers, XenServer Poolmasters, AHV Clusters, and NetScaler appliances. By default, the ControlUp Monitor/Console will be the Data Collector. In the screenshot below we see that the Data Collector for the VMware Horizon environment is the ControlUp Console / Monitor.

This means that when the console acts as Data Collector for more sources, for example also for the VMware hypervisor, it initiates several API requests each interval. In larger environments this traffic can be substantial. When managing over 500 VDI endpoints, it’s recommended to use dedicated Data Collector(s). Per Data Collector, you can designate another machine on your network to gather the data, for example from the VMware Horizon environment. More on sizing Data Collectors can be found here, link.

For a Data Collector make sure the ControlUp agent (including the .NET framework) is installed on the machine(s).

After adding the new Data Collector, remove the ControlUp Console/Monitor as Data Collector.

More information on the data collector can be found here, link.

Conclusion

With ControlUp 8.1 it is now possible to add more VDI endpoints with the new Monitor Cluster model. Because of this, it’s now possible to monitor large VMware Horizon environments with ControlUp. In part 2 of this blog post, I will highlight the new VMware Horizon integration.

Monitor vSAN with ControlUp

One of the enhancements in ControlUp 7.3 is vSAN monitoring support. ControlUp detects the vSAN cluster(s) and objects and displays real-time vSAN-specific metrics and metadata. In this blog post I highlight the features of the new vSAN integration in ControlUp 7.3.

Installation

The vSAN cluster is automatically recognized by ControlUp when the following requirements are met:

  • PowerShell version 5.0 or higher
  • VMware PowerCLI 10.1.1.x
  • .NET Framework version 4.5
  • The vSAN Performance service must be turned on for the cluster
  • The user account configured for the hypervisor connection requires the “storage.View” permission.
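The version requirements in this list can be checked with a small comparison routine. The Python sketch below is a hypothetical helper, not part of ControlUp. It assumes that all three versions are minimums (the text only says so explicitly for PowerShell) and that you have already collected the installed version strings yourself.

```python
# Minimum versions from the requirements list (treated as minimums by assumption).
MINIMUMS = {
    "PowerShell": "5.0",
    "VMware PowerCLI": "10.1.1",
    ".NET Framework": "4.5",
}


def parse_version(text):
    """Turn '10.1.1.8827524' into (10, 1, 1, 8827524)."""
    return tuple(int(part) for part in text.split("."))


def version_at_least(found, minimum):
    """Compare dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unmet_requirements(installed):
    """Return the names of components that are missing or too old.

    `installed` maps a component name to its version string.
    """
    return [
        name
        for name, minimum in MINIMUMS.items()
        if name not in installed or not version_at_least(installed[name], minimum)
    ]
```

Calling `unmet_requirements({"PowerShell": "4.0", ".NET Framework": "4.5"})` reports PowerShell as too old and PowerCLI as missing. The Performance service and the permission still have to be checked in vCenter.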

Running ControlUp is easy: no installation is needed, you simply run a single executable (ControlUpConsole.exe). After starting ControlUp, add the vCenter server and the vSAN cluster(s) are automatically recognized. When you click the vSAN cluster, you see real-time metadata and performance metrics.

Views

There are several preset views available with vSAN metrics such as:

  • vSAN Performance. Includes vSAN performance metrics such as IOPS, latency, cache and buffers.
  • vSAN Health. Includes the vSAN health checks.
  • vSAN Host Network. Includes vSAN network I/O and packet loss metrics.

You can easily switch between predefined views in the “Column Preset”. Here is an overview of vSAN metrics used by ControlUp:

Datastores: Name, Type, Capacity, Read/Write IOPS, Read/Write Rate, Read/Write Latency, Compression, Capacity Deduplication, Congestion, Outstanding IO, Disk Configuration, Total Used Capacity, Total Used – Physically Written, Total Used – VM Overreserved, Total Used – System Overhead, vSAN Free Capacity, vSAN Health, vSAN Cluster Health, vSAN Network Health, vSAN Physical Disk Health, vSAN Data Health, vSAN Limits Health, vSAN Hardware Compatibility Health, vSAN Performance Service Health, vSAN Build Recommendation, vSAN Online Health.
Datastores on Hosts: Name, Type, Capacity, Read/Write IOPS, Read/Write Rate, Read/Write Latency, Compression, Capacity Deduplication, Congestion, Outstanding IO, Local Client Cache Hit IOPS, Local Client Cache Hit Rate, vSAN Max Read Cache Read Latency, vSAN Max Write Buffer Write Latency, vSAN Max Read Cache Write Latency, vSAN Max Write Buffer Read Latency, vSAN Min Read Cache Hit Rate, vSAN Write Buffer Min Free Percentage, vSAN Host Network Inbound/Outbound I/O Throughput, vSAN Host Network Inbound/Outbound Packets Per Second, vSAN Host Network Inbound/Outbound Packet Loss Rate

When navigating, you see all those metrics in the vSAN cluster, vSAN datastores on hosts, virtual disks, and vSAN host network utilization views. You can easily drill down by double-clicking from the vSAN datastore to the disk group(s) on each ESXi host and then to the virtual disk(s). From the virtual disk(s) you can drill down to the Windows process.

Example: Find the root cause of high IOPS load on the vSAN cluster.

In the following example we will identify a Windows process that is causing high IOPS stress on the vSAN cluster. We drill down from the vSAN cluster to the vSAN diskgroup of the ESXi host to the virtual disk to the process level in the VM to find the root cause of the high IOPS.

  • In the vSAN Performance view we see the stress level has changed and a high IOPS load.

  • In the IOPS column we see that the threshold of 2000 is crossed. This is the default threshold and it can be adjusted. The Virtual Expert suggests navigating to the “Datastore on Hosts (IOPS detailed View)”.

  • When double clicking on the “Datastore on Host” we see that “esxin04.lab.local” is generating the IOPS load.

  • The vSAN diskgroup of the “esxin04.lab.local” host has a virtual disk that belongs to the “ControlUp-vSAN-Test” VM that is causing the high IOPS load.

  • When double clicking on the virtual disk we go to the “Processes” view and see that the “diskspd.exe” process is causing the high IOPS load.

  • Optional: Right click on the process and select kill to end the “diskspd.exe” process. This stops the IOPS load on the vSAN cluster.

This example shows how easy it is to identify what process is causing stress on the vSAN cluster.
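The drill-down in this example follows the busiest object at every level. That idea can be sketched in Python with a nested structure. The IOPS numbers and the names esxin03.lab.local, other-vm, and svchost.exe are made up for illustration. Only esxin04.lab.local, ControlUp-vSAN-Test, and diskspd.exe come from the example above.

```python
def drill_down(node):
    """Follow the child with the highest IOPS at every level; return the path."""
    path = [node["name"]]
    while node.get("children"):
        node = max(node["children"], key=lambda child: child["iops"])
        path.append(node["name"])
    return path


# Illustrative data: cluster -> hosts -> VMs (virtual disks) -> processes.
cluster = {
    "name": "vSAN cluster", "iops": 2600,
    "children": [
        {"name": "esxin03.lab.local", "iops": 150, "children": []},
        {"name": "esxin04.lab.local", "iops": 2450, "children": [
            {"name": "ControlUp-vSAN-Test", "iops": 2400, "children": [
                {"name": "diskspd.exe", "iops": 2390},
                {"name": "svchost.exe", "iops": 10},
            ]},
            {"name": "other-vm", "iops": 50, "children": []},
        ]},
    ],
}

print(drill_down(cluster))
# ['vSAN cluster', 'esxin04.lab.local', 'ControlUp-vSAN-Test', 'diskspd.exe']
```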

Alerting and reporting

For alerting you can add triggers in ControlUp to notify you when something happens on the vSAN cluster such as a change in the stress level for a period of time.
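A trigger that fires only when a condition holds "for a period of time" is essentially a check for a run of consecutive samples above a threshold. The Python sketch below shows that idea with IOPS samples and the 2000 default threshold mentioned earlier. It is an illustration with hypothetical names, not ControlUp's trigger engine, which works on stress levels and configured durations.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """True if at least `min_consecutive` samples in a row exceed `threshold`."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    run = 0
    for value in samples:
        # Extend the current run on a breach, reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

With one sample per interval, `sustained_breach([1500, 2500, 2600, 2700], 2000, 3)` is true, while a single spike that drops back below the threshold does not fire.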

With triggers, you can start investigating right away when something happens on the vSAN cluster. All the vSAN data is transferred to ControlUp Insight for historical reporting and analytics. This is great for analyzing data and trends over time and can be very useful when investigating issues and understanding what is going on in your environment.

Conclusion

ControlUp is easy to set up and great for fast troubleshooting. Version 7.3 adds vSAN support. As shown in this blog post, with a couple of double clicks you can perform a root cause analysis and find which process is causing the high IOPS on the vSAN cluster.

There is a free trial available. Give it a try here: link