Monitor vSAN with ControlUp

One of the new enhancements in ControlUp 7.3 is vSAN monitoring support. ControlUp detects the vSAN cluster(s) and objects and displays real-time vSAN-specific metrics and metadata. In this blog post I highlight the features of the new vSAN integration in ControlUp 7.3.

Installation

The vSAN cluster is automatically recognized by ControlUp when the following requirements are met:

  • PowerShell version 5.0 or higher
  • VMware PowerCLI 10.1.1.x
  • .NET Framework 4.5
  • The vSAN Performance Service must be enabled on the cluster
  • The user account configured for the hypervisor connection requires the “storage.View” permission.

Running ControlUp is easy: no installation is needed, you simply execute a single executable (ControlUpConsole.exe). After starting ControlUp, add the vCenter Server and the vSAN cluster(s) are automatically recognized. When clicking on the vSAN cluster you see real-time metadata and performance metrics.
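
If you want to verify these prerequisites before adding the hypervisor connection, a quick PowerCLI check can help. This is a minimal sketch and not part of ControlUp itself; the vCenter name (“vcsa.lab.local”) and cluster name (“vSAN-Cluster”) are placeholders for your own environment, and property names can differ slightly per PowerCLI version:

  # Check the PowerShell and PowerCLI versions required by ControlUp 7.3
  $PSVersionTable.PSVersion                                  # should be 5.0 or higher
  Get-Module -ListAvailable VMware.PowerCLI | Select-Object Name, Version

  # Connect with the account used for the hypervisor connection
  # (this account needs the "storage.View" permission)
  Connect-VIServer -Server "vcsa.lab.local"

  # Verify that the vSAN Performance Service is enabled on the cluster
  Get-VsanClusterConfiguration -Cluster (Get-Cluster -Name "vSAN-Cluster") |
      Select-Object VsanEnabled, PerformanceServiceEnabled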

Views

There are several preset views available with vSAN metrics such as:

  • vSAN Performance. Includes vSAN performance metrics such as IOPS, latency, cache and buffers.
  • vSAN Health. Includes the vSAN health checks.
  • vSAN Host Network. Includes vSAN network I/O and packet loss metrics.

You can easily switch between the predefined views in the “Column Preset”. Here is an overview of the vSAN metrics used by ControlUp:

Datastores: Name, Type, Capacity, Read/Write IOPS, Read/Write Rate, Read/Write Latency, Compression, Capacity Deduplication, Congestion, Outstanding IO, Disk Configuration, Total Used Capacity, Total Used – Physically Written, Total Used – VM Overreserved, Total Used – System Overhead, vSAN Free Capacity, vSAN Health, vSAN Cluster Health, vSAN Network Health, vSAN Physical Disk Health, vSAN Data Health, vSAN Limits Health, vSAN Hardware Compatibility Health, vSAN Performance Service Health, vSAN Build Recommendation, vSAN Online Health.
Datastores on Hosts: Name, Type, Capacity, Read/Write IOPS, Read/Write Rate, Read/Write Latency, Compression, Capacity Deduplication, Congestion, Outstanding IO, Local Client Cache Hit IOPS, Local Client Cache Hit Rate, vSAN Max Read Cache Read Latency, vSAN Max Write Buffer Write Latency, vSAN Max Read Cache Write Latency, vSAN Max Write Buffer Read Latency, vSAN Min Read Cache Hit Rate, vSAN Write Buffer Min Free Percentage, vSAN Host Network Inbound/Outbound I/O Throughput, vSAN Host Network Inbound/Outbound Packets Per Second, vSAN Host Network Inbound/Outbound Packet Loss Rate

When navigating you see all these metrics in the vSAN cluster, vSAN datastores on hosts, virtual disks and vSAN host network utilization views. You can easily drill down by double-clicking from the vSAN datastore to the disk group(s) on each ESXi host and then to the virtual disk(s). From the virtual disk(s) you can drill down to the Windows process.

Example: Find the root cause of high IOPS load on the vSAN cluster.

In the following example we identify a Windows process that is causing high IOPS stress on the vSAN cluster. We drill down from the vSAN cluster to the vSAN disk group on the ESXi host, then to the virtual disk, and finally to the process level in the VM to find the root cause of the high IOPS.

  • In the vSAN Performance view we see that the stress level has changed and that there is a high IOPS load.

  • In the IOPS column we see that the threshold of 2000 is crossed. This threshold is the default and can be adjusted. The Virtual Expert suggests navigating to the “Datastores on Hosts (IOPS detailed)” view.

  • When double-clicking on “Datastores on Hosts” we see that “esxin04.lab.local” is generating the IOPS load.

  • The vSAN disk group of the “esxin04.lab.local” host contains a virtual disk belonging to the “ControlUp-vSAN-Test” VM, which is causing the high IOPS load.

  • When double-clicking on the virtual disk we go to the “Processes” view and see that the “diskspd.exe” process is causing the high IOPS load.

  • Optional: right-click on the process and select Kill to end the “diskspd.exe” process. This stops the IOPS load on the vSAN cluster.

This example shows how easy it is to identify which process is causing stress on the vSAN cluster.
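
Outside of ControlUp, a rough cross-check of the same question is possible with PowerCLI by ranking the powered-on VMs on the vSAN datastore by their real-time virtual disk IOPS. This is only a sketch and not the ControlUp method; the datastore name (“vsanDatastore”) is a placeholder and the performance counters are assumed to be available in your environment:

  # Rank powered-on VMs on the vSAN datastore by their current virtual disk IOPS
  $vms = Get-Datastore -Name "vsanDatastore" | Get-VM |
      Where-Object { $_.PowerState -eq "PoweredOn" }

  $report = foreach ($vm in $vms) {
      # Real-time read and write IOPS samples for all virtual disks of the VM
      $stats = Get-Stat -Entity $vm -Realtime -MaxSamples 15 -Stat `
          "virtualdisk.numberreadaveraged.average", "virtualdisk.numberwriteaveraged.average"

      $read  = ($stats | Where-Object { $_.MetricId -like "*read*" }  | Measure-Object Value -Average).Average
      $write = ($stats | Where-Object { $_.MetricId -like "*write*" } | Measure-Object Value -Average).Average

      [PSCustomObject]@{ VM = $vm.Name; IOPS = [math]::Round($read + $write, 0) }
  }

  # The top entries point to the VM (and thus the host and disk group) driving the load
  $report | Sort-Object IOPS -Descending | Select-Object -First 5

In ControlUp the same drill-down happens in the console; a script like this is just a way to confirm the finding from the command line.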

Alerting and reporting

For alerting you can add triggers in ControlUp to notify you when something happens on the vSAN cluster, such as a change in the stress level that lasts for a period of time.

When using triggers you’re able to start investigating right away when something is happening on the vSAN cluster. All the vSAN data is also transferred to ControlUp Insights for historical reporting and analytics. This is great for analyzing data and trends over time and can be very useful when investigating issues and understanding what is going on in your environment.

Conclusion

ControlUp is easy to set up and great for fast troubleshooting. vSAN support is added in version 7.3. As shown in this blog post, with a couple of double clicks you’re able to perform a root cause analysis and find which process is causing the high IOPS on the vSAN cluster.

There is a free trial available. Give it a try here: link

VMware Unified Access Gateway (UAG) 3.4 RADIUS license change

The VMware Unified Access Gateway (UAG) acts as a reverse proxy and tunnels sessions (PCoIP and Blast) to desktops and remote apps. Besides Horizon support, new features have been added for AirWatch and Identity Manager. With version 3.4, the VMware Unified Access Gateway is offered in three editions, based on the Horizon or Workspace ONE licenses.

  • Standard
  • Advanced
  • Enterprise

Each edition supports a different set of features.

One of the new features is high availability support for the Unified Access Gateway. A UAG high availability environment can be created without the use of load balancers. This makes the environment less complex and is available as an Enterprise feature.

Another feature is RADIUS support. RADIUS is not a new feature and has been available for a very long time. RADIUS offers two-factor authentication and is practically always a requirement for production environments. When looking at the editions table you see that this is now an Advanced feature. Before version 3.4 of the UAG, with VMware Access Point and VMware Security Server, RADIUS was supported in all editions!

In my opinion RADIUS is not an advanced feature and belongs in all editions of Horizon. This was always the case!

I have a lot of customers who are using Horizon Standard with RADIUS support for two-factor authentication. Now they are stuck with the UAG 3.3.1 appliance or must invest heavily ($$$$$) in the Advanced (or higher) edition of Horizon.

I hope VMware will reconsider and make RADIUS support available in all editions of Horizon.

Update, March 13, 2019: VMware Unified Access Gateway 3.5 has been released. In this version there is no longer a license requirement based on the edition; all features are available in all Workspace ONE and Horizon editions. This is great news! RADIUS support is available in all editions in version 3.5 of the UAG. More information: link.

Using the Shuttle SH370R6 plus as home lab server

For my home lab I needed a new host to replace my Intel NUC (link), which acts as the management host. The hardware resources (CPU and memory) of the NUC were limiting my lab activities.

I had the following requirements for the new home lab host:

  • Ability to run the latest VMware ESXi 6.7 U1 version
  • Support for 64 GB memory
  • Fast storage support (M.2 NVMe SSD)
  • Room for PCI-Express add-on cards
  • Run nested environments
  • Low power consumption
  • Limited budget (around 1300 euro).

I did some research for a new home lab host and investigated options such as the Intel NUC, Apple Mac mini 2018, Supermicro and Shuttle. My search ended with the following Bill of Materials (BOM) shopping list:

Parts                                                                             ~Price (€)   Link
Barebone: Shuttle XPC cube barebone SH370R6 Plus, 500 Watt (80 PLUS Silver) PSU      291,00    Link
CPU: Intel Core i7 8700, 6 cores / 12 threads, 65 W TDP                              333,45    Link
Memory: 4 x 16 GB Kingston ValueRAM KVR26N19D8/16                                    488,00    Link
Disk: Samsung 970 EVO 1 TB M.2 NVMe SSD                                              217,99    Link
USB stick: Kingston DataTraveler 100 G3 32 GB                                          7,30    Link
Total                                                                               1337,74

Barebone System

A barebone is a pre-assembled system with a mainboard, GPU, Power Supply Unit (PSU), CPU cooler and cables in a small form factor. You’ll need to pick the remaining parts such as CPU, memory and disk(s) to match your needs. The Shuttle XPC cube barebone SH370R6 Plus has the following specifications:

  • Chassis: black aluminium (33.2 x 21.5 x 19 cm)
  • Bays: 1 x 5.25″ and 2 x 3.5″
  • CPU: Socket LGA 1151 v2. Supports 8th and 9th generation Intel Core “Coffee Lake” processors such as Core i9 / i7 / i5 / i3, Pentium or Celeron. Shuttle has an I.C.E. heatpipe cooling system. A CPU with a maximum of 95 Watt Thermal Design Power (TDP) is supported.
  • Integrated Graphics: Intel UHD graphics 610/630 (in the processor). Supports three digital UHD displays at once
  • Chipset: Intel H370 PCH
  • Memory: up to 4 x 16 GB DDR4-2400/2666 DIMM modules. Max 64 GB
  • Slots: 1 x PCIe X16 (v3.0) supports dual-slot graphics cards up to 273 mm length, 1 x PCIe X4 (v3.0), 1 x M.2-2280 (SATA / PCIe X4) supports M.2 SSDs, 1 x M.2-2230 supports WLAN cards
  • SATA: 4 x SATA 3.0 (6 Gb/s) supports RAID and RST
  • Video: HDMI 2.0a and 2 x DisplayPort 1.2
  • Connections: 4 x USB 3.1 Gen 2, 4 x USB 3.1 Gen 1, 4 x USB 2.0
  • Audio: 5 x Audio (2 x front, 3 x rear)
  • Network: Intel Gigabit I211 adapter
  • PSU: Integrated 300 Watt (80 PLUS Bronze) in the SH370R6 or 500 Watt (80 PLUS Silver) in the SH370R6 Plus version.

The Shuttle SH370R6 Plus has a 500 W power supply, so a GPU can be added if needed. For now I will use the onboard GPU. There is one Intel Gigabit I211 NIC onboard.

Slots

The Shuttle has two slots available for add-on cards:

  • 1 x PCIe X16 (v3.0) supports dual-slot graphics cards up to 273 mm length
  • 1 x PCIe X4 (v3.0)

I will use both slots with spare parts: one slot for an Intel Gigabit NIC and the other for an extra M.2 adapter. In the future I can swap the 1 Gigabit NIC for a 10 Gigabit NIC, for example.

CPU

As CPU I decided to use an Intel Core i7 8700, an 8th generation Intel Core “Coffee Lake” processor with 6 cores and 12 threads. It has a Thermal Design Power (TDP) of 65 W. The decision was based on pricing, horsepower and power consumption.

Memory

The maximum amount of memory the Shuttle can handle is 64 GB. This is quite unique for a barebone system at the moment. There are 4 DIMM slots for 4 x 16 GB. I used Kingston ValueRAM KVR26N19D8/16 memory, which is not on the Shuttle hardware compatibility list (*1) but can be found on the Kingston compatibility list.

Storage

The Shuttle has one M.2 2280 slot. The Samsung 970 EVO 1 TB NVMe SSD will be used as the local datastore for storing VMs. Besides the SSD I use an existing QNAP NAS with an NFS connection to this host. ESXi will be installed on the USB stick.

(*1) Shuttle has a hardware compatibility list available (link).
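
For reference, formatting the NVMe SSD as a local VMFS datastore can also be done with PowerCLI once ESXi is up. This is a minimal sketch assuming a direct connection to the host; the host name (“shuttle01.lab.local”) and datastore name are placeholders, and selecting the device by its model string is just one way to find the right canonical name:

  # Connect directly to the ESXi host (or to vCenter)
  Connect-VIServer -Server "shuttle01.lab.local"
  $esx = Get-VMHost -Name "shuttle01.lab.local"

  # List the local disks and pick the Samsung 970 EVO by its model string
  Get-ScsiLun -VmHost $esx -LunType disk | Select-Object CanonicalName, CapacityGB, Model
  $nvme = Get-ScsiLun -VmHost $esx -LunType disk |
      Where-Object { $_.IsLocal -and $_.Model -match "Samsung" } | Select-Object -First 1

  # Create a VMFS 6 datastore on the NVMe SSD for storing VMs
  New-Datastore -VMHost $esx -Name "NVMe-970EVO" -Path $nvme.CanonicalName -Vmfs -FileSystemVersion 6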

Hardware installation

Once all the hardware parts were delivered it was time to build the Shuttle XPC SH370R6 Plus. Shuttle provides clear step-by-step documentation for the build.

  • Open the barebone Shuttle by removing 3 screws on the back and sliding the aluminium case off.
  • Install the Intel CPU and add some thermal paste.

  • Shuttle has its own Integrated Cooling Engine (I.C.E.) heatpipe technology that cools the CPU with a 92 mm fan.

  • Mount the I.C.E. heatpipe cooler on the CPU and attach the fan.

  • The Shuttle supports up to 64 GB DDR4 memory. Insert the 4 memory modules in the DIMM slots.

  • Insert the Samsung 970 EVO 1 TB NVMe SSD into the M.2 2280 slot.

  • The mainboard has two slots for PCIe cards: 1 x PCIe X16 (v3.0), which supports dual-slot graphics cards up to 273 mm in length, and 1 x PCIe X4 (v3.0). I added the following spare parts that I had lying around:
    • Intel Gigabit NIC (dedicated for my storage connection to my QNAP NAS).
    • PCIe to M.2 adapter (Lycom DT-120 M.2 PCIe to PCIe 3.0 x4 adapter) with a Samsung 950 Pro 512 GB M.2 SSD installed.

The documentation that Shuttle provides is very clear, so the hardware installation went without any problems. Once the hardware parts were installed it was time for the first power-on. After powering on I entered the BIOS and modified some settings such as:

  • Boot order
  • Disabled unused hardware devices such as audio
  • Set Smart FAN mode for controlling the fan speed
  • Enabled “Power-On after Power-Fail” in the power management settings

Operating Systems

The Shuttle has no remote management functionality, so you’ll need a monitor, mouse and keyboard physically connected for the software installation. Once the installation is done, remote management can be enabled in the software. Windows Server 2019 with the Hyper-V role and VMware ESXi 6.7 U1 were installed on the Shuttle SH370R6 Plus. Neither OS is officially supported. Here are my installation experiences:

Windows Server 2019

As a test I installed Windows Server 2019 on the Samsung 970 EVO 1 TB. The installation of Windows Server 2019 finished within a couple of minutes. The onboard Intel Gigabit I211 NIC is not recognized by Windows. The Intel driver is only for Windows 10 and won’t install on a server OS by default. This issue has existed for a long time (see link): Intel does not want desktop NICs to be used on Windows Server OSes. With some hacking (link) it is possible to enable the NIC. The add-on Intel Gigabit NIC is recognized by default in Windows Server 2019. I enabled the Hyper-V role and installed some VMs without problems. The user experience is fast, very fast!

VMware ESXi

After the Windows Server 2019 test, it was time to create a USB stick (link) and install VMware ESXi 6.7 U1. The installation is a piece of cake. The onboard Intel Gigabit I211 NIC is recognized by ESXi and is used for management and VM traffic in my configuration. The add-on Intel 1 GbE NIC is configured for the NFS connection to my existing QNAP NAS. After some initial configuration (network and storage connections, NTP, and licensing) I migrated all the VMs from my old NUC to the new Shuttle. Most VMs are placed on the NVMe SSDs. I really like the performance boost of the VMs on the new hardware. When the load increases on the Shuttle the fan makes more noise (Smart FAN mode). This can be annoying when you’re in the same room. In the BIOS you can tweak the fan speed. For me this is not a problem because I have a separate room where my home lab servers reside.
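
For reference, the initial ESXi configuration (NTP and the NFS mount to the QNAP NAS) can also be scripted with PowerCLI. This is a minimal sketch under my assumptions: the host name, NTP server, NAS address and export path below are placeholders for your own environment:

  # Connect to the ESXi host
  Connect-VIServer -Server "shuttle01.lab.local"
  $esx = Get-VMHost -Name "shuttle01.lab.local"

  # Configure NTP and start the NTP daemon
  Add-VMHostNtpServer -VMHost $esx -NtpServer "pool.ntp.org"
  $ntp = Get-VMHostService -VMHost $esx | Where-Object { $_.Key -eq "ntpd" }
  Set-VMHostService -HostService $ntp -Policy "on"
  Start-VMHostService -HostService $ntp

  # Mount the QNAP NFS export as a datastore over the dedicated 1 GbE NIC
  New-Datastore -VMHost $esx -Nfs -Name "QNAP-NFS" -NfsHost "192.168.1.50" -Path "/share/VMs"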

The performance difference with the old Intel NUC is huge. It’s a great user experience!

vSAN

By adding a PCIe to M.2 adapter and a 10 Gigabit NIC it should be possible (not tested yet) to create an All-Flash NVMe vSAN node. I added a PCIe to M.2 adapter with an NVMe SSD, so there are now two NVMe SSDs recognized by ESXi (see the storage screenshot above).
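
Purely as an illustration of that idea (untested, as noted above): once the host is part of a vSAN-enabled cluster, the two NVMe SSDs could be claimed as a disk group with PowerCLI. The host name and canonical device names below are placeholders; check the output of Get-ScsiLun for the real names:

  $esx = Get-VMHost -Name "shuttle01.lab.local"

  # Identify the two local NVMe SSDs
  Get-ScsiLun -VmHost $esx -LunType disk | Where-Object { $_.IsLocal } |
      Select-Object CanonicalName, CapacityGB, Model

  # Placeholder canonical names, to be replaced with the values listed above
  $cacheSsd    = "t10.NVMe____Samsung_SSD_950_PRO_512GB_placeholder"   # cache tier
  $capacitySsd = "t10.NVMe____Samsung_SSD_970_EVO_1TB_placeholder"     # capacity tier

  # Claim the SSDs as an all-flash vSAN disk group
  New-VsanDiskGroup -VMHost $esx -SsdCanonicalName $cacheSsd -DataDiskCanonicalName $capacitySsd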

Power Consumption

When the Shuttle is booted with ESXi and the two add-on cards installed, but without any VMs powered on, the power consumption is around 20-24 W. The power consumption depends on the amount of resources being used. In my configuration, running 10 VMs, the host uses between 35 and 70 W.

Conclusion

The Shuttle SH370R6 Plus home lab host is now running for a couple of weeks. Here are my findings:

  • Depending on your requirements (such as budget) you can customize the SH370R6 Plus for your needs.
  • The hardware installation was easy and without problems. The documentation that Shuttle provides is very clear.
  • There are two versions of the Shuttle: the SH370R6 and the SH370R6 Plus. The difference is the PSU; the Plus version has a 500 W PSU (useful for adding a GPU).
  • The Shuttle barebone has a lightweight aluminium case (33.2 x 21.5 x 19 cm). This is a lot bigger than an Intel NUC but gives more room for adding hardware.
  • Using an 8th generation Intel Core i7 8700 with 6 cores, 12 threads and a TDP of 65 W is cost efficient and consumes less power while still providing a powerful CPU (VMs with 12 vCPUs can be created).
  • The Shuttle barebone system supports up to 64 GB memory. For a barebone this is quite unique at the moment; most barebone systems support a maximum of 32 GB!
  • There is one M.2-2280 slot for an NVMe SSD on the mainboard. By adding a PCIe to M.2 adapter you can add an extra NVMe SSD. Great for vSAN All-Flash (AF) use cases.
  • There is room for two 3.5″ HDDs. I didn’t test whether the SATA controller is recognized by ESXi because I use M.2 SSDs and NFS storage.
  • Two PCI-Express slots are available: 1 x PCIe X16 (v3.0) and 1 x PCIe X4 (v3.0). This makes it possible to add a GPU and a 10 Gigabit adapter, for example.
  • When the load increases on the Shuttle the fan makes more noise (Smart FAN mode). This can be annoying when you’re in the same room. In the BIOS you can tweak the fan speed. For me this is not a problem because I have a separate room for the home lab servers.
  • The Shuttle is not officially certified or supported by VMware and Microsoft.
  • The performance and user experience are fast, very fast!
  • More information about the Shuttle SH370R6 Plus can be found here, link.

I’m happy with my new home lab member, the Shuttle SH370R6 Plus, because all my requirements are met and I really like the performance boost of the VMs.