VMware patch alert

VMware has released the following six new patches:

Patch name             Description                                Type
ESX350-200811401-SG    Updates VMkernel, hostd, and Other RPMs    Security
ESX350-200811402-SG    Updates ESX Scripts                        General
ESX350-200811405-SG    Security Update to libxml2                 Security
ESX350-200811406-SG    Security Update to bzip2                   Security
ESX350-200811408-BG    Updates QLogic Software Driver             Critical
ESX350-200811409-BG    Updates Kernel Source and VMNIX            Critical


ESX350-200811401-SG – PATCH fixes spontaneous reboots that occur when the virtual machine setting “Check and upgrade Tools before each power-on” is enabled. The summaries and symptoms are below.

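Before patching, it can help to inventory which VMs actually have this checkbox enabled. The following is a minimal sketch, assuming the current pyVmomi API and placeholder VirtualCenter credentials (ESX 3.5-era environments were typically scripted with the VI Perl Toolkit instead, so treat this as illustrative only):

    # Sketch: list VMs with "Check and upgrade Tools before each power-on" enabled.
    # Hostname and credentials are placeholders, not from the original post.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()              # lab use only
    si = SmartConnect(host="vc.example.local", user="administrator",
                      pwd="secret", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in view.view:
            tools = vm.config.tools if vm.config else None
            # "upgradeAtPowerCycle" is the API value behind the VI Client checkbox
            if tools and tools.toolsUpgradePolicy == "upgradeAtPowerCycle":
                print(vm.name)
        view.Destroy()
    finally:
        Disconnect(si)
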
This patch fixes the following issues:

  • A memory corruption condition may occur in the virtual machine hardware. A malicious request sent from the guest operating system to the virtual hardware may cause the virtual hardware to write to uncontrolled physical memory.

    The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the name CVE-2008-4917 to this issue.

  • VMotion might trigger VMware Tools to upgrade automatically. This issue occurs on virtual machines that have the setting Check and upgrade Tools before each power-on enabled, when the affected virtual machines are moved, using VMotion, to a host with a newer version of VMware-esx-tools.

    Symptoms seen without this patch:

    • Virtual machines unexpectedly restart during a VMotion migration.
    • The guest operating systems might stall (reported on forums).

    Note: After patching the ESX host, you need to upgrade VMware Tools in the affected guests that reside on the host.

  • Swapping active and standby NICs results in a loss of connectivity to the virtual machine. 
  • A race issue caused an ASSERT_BUG to run unnecessarily, crashing the ESX host. This change removes the invalid ASSERT_BUG.

    Symptoms seen without this patch: The ESX host crashes with an ASSERT message that includes fs3DiskLock.c:1423. Example:

    ASSERT /build/mts/release/bora-77234/bora/modules/vmkernel/vmfs3/fs3DiskLock.c:1423 bugNr=147983

  • A virtual machine can become registered on multiple hosts due to a .vmdk file locking issue. This issue occurs when network errors cause HA to power on the same virtual machine on multiple hosts, and when SAN errors cause the host on which the virtual machine was originally running to lose its heartbeat. The original virtual machine becomes unresponsive.

    With this patch, the VI Client displays a dialog box warning you that a .vmdk lock is lost. The virtual machine is powered off after you click OK.

  • This change fixes confusing VMkernel log messages in cases where one of the storage processors (SPs) of an EMC CLARiiON CX storage array is hung. The messages now correctly identify which SP is hung.

    Example of the confusing messages:
    vmkernel: 1:23:09:57.886 cpu3:1056)WARNING: SCSI: 2667: CX SP B is hung.
    vmkernel: 1:23:09:57.886 cpu3:1056)SCSI: 2715: CX SP A for path vmhba1:2:2 is hung.

    vmkernel: 1:23:09:57.886 cpu3:1056)WARNING: SCSI: 4282: SP of path vmhba1:2:2 is
    hung. Mark all paths using this SP as dead. Causing full path failover.

    In this case, research revealed that SP A was hung, but SP B was not.

  • This patch allows VMkernel to successfully boot on unbalanced NUMA configurations—that is, those with some nodes having no CPU or memory. When such unbalanced configuration is detected, VMkernel shows an alert and continues booting. Previously, VMkernel failed to load on such NUMA configurations.

    Sample alert message when memory is missing from one of the nodes (here, node 2):

    No memory detected in SRAT node 2. This can cause very bad performance.

  • When the zpool create command is run in a Solaris 10 virtual machine against a LUN that is exported to that virtual machine as a raw device mapping (RDM), the command creates a partition table of type GPT (GUID partition table) on that LUN as part of creating the ZFS filesystem. When a LUN rescan is later run on the ESX server, through VirtualCenter or the command line, the rescan takes significantly longer to complete because the VMkernel fails to read the GUID partition table. This patch fixes the problem.
    Symptoms seen without this patch: Rescanning HBAs takes a long time and an error message similar to the following is logged in /var/log/vmkernel (a quick way to spot these timeouts appears after this list):

    Oct 31 18:10:38 vmkernel: 0:00:45:17.728 cpu0:8293)WARNING: SCSI: 255: status Timeout for vml.02006500006006016033d119005c8ef7b7f6a0dd11524149442030. residual R 800, CR 80, ER 3

  • A race in LVM resignaturing code can cause volumes to disappear on a host when a snapshot is presented to multiple ESX hosts, such as in SRM environments.
    Symptoms: After rescanning, VMFS volumes are not visible.
  • This change resolves a rare VMotion instability.

    Symptoms: During a VMotion migration, certain 32-bit applications running in 64-bit guests might crash due to access violations.

  • Solaris 10 Update 4, 64-bit graphical installation fails with the default virtual machine RAM size of 512MB.
  • DRS robustness and performance improvements. This change prevents unexpected migration behavior.
  • In a DRS cluster environment, the hostd service reaches a hard limit for memory usage, which causes hostd to restart itself.

    Symptoms: The hostd service restarts and temporarily disconnects from VirtualCenter. The ESX host stops responding before hostd reconnects.

  • Fixes for supporting Site Recovery Manager (upcoming December 2008 release) on ESX 3.5 Update 2 and Update 3.
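As mentioned in the RDM/GPT item above, slow rescans leave SCSI timeout warnings in /var/log/vmkernel. A small log scan can show which devices are affected; this is my own illustration, not VMware tooling:

    # Scan /var/log/vmkernel for "status Timeout" SCSI warnings and count
    # them per vml.* device ID. Log path and message format as quoted above.
    import re

    PATTERN = re.compile(r"WARNING: SCSI: \d+: status Timeout for (vml\.\S+)")

    def scsi_timeouts(path="/var/log/vmkernel"):
        """Return a dict mapping vml.* device IDs to timeout counts."""
        hits = {}
        with open(path, errors="replace") as log:
            for line in log:
                match = PATTERN.search(line)
                if match:
                    device = match.group(1)
                    hits[device] = hits.get(device, 0) + 1
        return hits

    if __name__ == "__main__":
        for device, count in sorted(scsi_timeouts().items()):
            print(f"{device}: {count} timeout(s)")
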
When installing this patch you must reboot your ESX server(s) and then update the VMware Tools in the VMs, which requires a guest reboot too. A scripted sketch of the Tools upgrade follows below.
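Upgrading Tools guest by guest through the VI Client gets tedious, so a scripted pass is an option. Again a hedged sketch against the current pyVmomi API, with placeholder credentials; UpgradeTools_Task only works while the VM is powered on with Tools running:

    # Sketch: trigger an automated VMware Tools upgrade in powered-on VMs
    # that report outdated Tools. The guest may reboot when the task finishes.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()              # lab use only
    si = SmartConnect(host="vc.example.local", user="administrator",
                      pwd="secret", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in view.view:
            if (vm.runtime.powerState == "poweredOn"
                    and vm.guest.toolsStatus == "toolsOld"):
                print(f"Upgrading Tools in {vm.name}")
                vm.UpgradeTools_Task()
        view.Destroy()
    finally:
        Disconnect(si)
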
 
The Virtual Machine Monitoring reboot problem (see the post https://www.ivobeerens.nl/?p=180) is NOT fixed :-(. I hope VMware fixes this soon.

VMware ESX 3.5 Update 3: VMs spontaneously reboot

A week ago I upgraded a customer's environment to VMware ESX 3.5 Update 3 and VirtualCenter 2.5 Update 3. After the upgrade, some virtual machines (VMs) spontaneously rebooted. After investigating the problem, we saw that the spontaneous reboots occurred after a VMotion action.

We disabled the “Virtual Machine Monitoring” option in HA and set DRS to manual. The problem with Virtual Machine Monitoring is:

The virtual machine heartbeats are dropped during a VMotion migration, and HA resets the VM because it thinks the VM has gone offline. With Virtual Machine Monitoring turned off, it should be safe to turn DRS back on. A scripted sketch of this workaround follows below.
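For reference, here is how that workaround could be scripted. This is a sketch using today's pyVmomi names (the VC 2.5-era API differed), and the cluster name and credentials are placeholders:

    # Sketch: disable HA's Virtual Machine Monitoring and set DRS to manual
    # for one cluster. Cluster name and connection details are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()              # lab use only
    si = SmartConnect(host="vc.example.local", user="administrator",
                      pwd="secret", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.ClusterComputeResource], True)
        cluster = next(c for c in view.view if c.name == "Production")  # placeholder
        view.Destroy()

        spec = vim.cluster.ConfigSpecEx()
        spec.dasConfig = vim.cluster.DasConfigInfo(vmMonitoring="vmMonitoringDisabled")
        spec.drsConfig = vim.cluster.DrsConfigInfo(defaultVmBehavior="manual")
        WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
    finally:
        Disconnect(si)
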


More people have this problem; read the following post on the VMware forum: “3.5U3 – any guinea pigs yet?”.

I filed a support request with VMware. They told me today that patch 10 for VMware ESX 3.5 Update 3 will be released on 20 November. Patch 10 fixes SOME of the random reboot problems in Update 3. I hope it resolves this nasty issue.