VMware ESX uptime 570 days

Today I did a health check of a VMware ESX environment. The environment was still running VMware ESX 3.0.0 build 42531. In ESXTOP the uptime was 570 days! Wow, the last time I saw systems with such a long uptime was in the NetWare days.

[Screenshot: ESXTOP showing the 570-day uptime on host ESX01]
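
By the way, if you don't have ESXTOP open, a quick way to check this is the standard Linux uptime command on the ESX service console (just a quick check, assuming you have console access):

    # on the ESX service console
    uptime
    # prints how long the host has been up, plus load averages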

Site Recovery Manager (SRM) Update 1 released

SRM 1.0 Update 1 has the following new features:

    • New Permission Required to Run a Recovery Plan
      SRM now distinguishes between permission to test a recovery plan and permission to run a recovery plan. After an SRM server is updated to this release, existing users of that server who had permission to run a recovery plan no longer have that permission. You must grant Run permission to these users after the update is complete. Until you do, no user can run a recovery plan. (Permission to test a recovery plan is unaffected by the update.)
    • Full Support for RDM devices
      SRM now provides full support for virtual machines that use raw disk mapping (RDM) devices. This enables support of several new configurations, including Microsoft Cluster Server. (Virtual machine templates cannot use RDM devices.)
    • Batch IP Property Customization
      This release of SRM includes a tool that allows you to specify IP properties (network settings) for any or all of the virtual machines in a recovery plan by editing a comma-separated-value (csv) file that the tool generates (see the example snippet after this list).
    • Limits Checking and Enforcement
      A single SRM server can support up to 500 protected virtual machines and 150 protection groups. This release of SRM prevents you from exceeding those limits when you create a new protection group. If a configuration created in an earlier release of SRM exceeds these limits, SRM displays a warning, but allows the configuration to operate.
    • Improved Support for Virtual Machines that Span Multiple Datastores
      This release provides improved support for virtual machines whose disks reside on multiple datastores.
    • Single Action to Reconfigure Protection for Multiple Virtual Machines
      This release introduces a Configure All button that applies existing inventory mappings to all virtual machines that have a status of Not Configured.
    • Simplified Log Collection
      This release introduces new utilities that retrieve log and configuration files from the server and collect them in a compressed (zipped) folder on your desktop.
    • Improved Acceptance of Non-ASCII Characters
      Non-ASCII characters are now allowed in many fields during installation and operation.
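
To give an idea of what the Batch IP Property Customization file looks like, here is a made-up snippet. The column names are an assumption on my part (based on how the tool generates the file in later versions), so always start from the csv file that the SRM tool itself generates and only edit the values:

    VM Name,Adapter ID,IP Address,Subnet Mask,Gateway,Primary DNS,Secondary DNS
    web01,1,192.168.10.21,255.255.255.0,192.168.10.1,192.168.10.5,192.168.10.6
    db01,1,192.168.10.31,255.255.255.0,192.168.10.1,192.168.10.5,192.168.10.6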

SRM 1.0 Update 1 has great improvements, such as:

– RDM support, including Microsoft Cluster Server (MSCS)

– Support for VMs that span multiple datastores

– Support for ESX 3.5 Update 3 (You must have the latest patches!)

– Support for VC 2.5 Update 3

Before you begin with SRM, check the compatibility matrix.

The release notes are here.

VMware patch alert

VMware released the following six new patches:

Patch name          | Description                              | Type
ESX350-200811401-SG | Updates VMkernel, hostd, and Other RPMs  | Security
ESX350-200811402-SG | Updates ESX Scripts                      | General
ESX350-200811405-SG | Security Update to libxml2               | Security
ESX350-200811406-SG | Security Update to bzip2                 | Security
ESX350-200811408-BG | Updates QLogic Software Driver           | Critical
ESX350-200811409-BG | Updates Kernel Source and VMNIX          | Critical
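
If you don't use VMware Update Manager, patching from the service console with esxupdate roughly looks like this. This is only a sketch (I'm assuming the usual unzip-and-run procedure), so check the README that ships with each patch bundle for the exact steps:

    # copy the patch bundle to the ESX host (e.g. with scp), then on the service console:
    unzip ESX350-200811401-SG.zip
    cd ESX350-200811401-SG
    esxupdate update     # install the patch; reboot the host afterwards
    esxupdate query      # list the patch bundles that are installed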

 

ESX350-200811401-SG solves spontaneous reboots that can occur when the setting “Check and upgrade Tools before each power-on” is enabled; see below for the summaries and symptoms:

This patch fixes the following issues:

  • A memory corruption condition may occur in the virtual machine hardware. A malicious request sent from the guest operating system to the virtual hardware may cause the virtual hardware to write to uncontrolled physical memory.

    The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the name CVE-2008-4917 to this issue.

  • VMotion might trigger VMware Tools to automatically upgrade. This issue occurs on virtual machines that have the setting Check and upgrade Tools before each power-on enabled and that are moved, using VMotion, to a host with a newer version of VMware-esx-tools.

    Symptoms seen without this patch:

    • Virtual machines unexpectedly restart during a VMotion migration.
    • The guest operating systems might stall (reported on forums).

    Note: After patching the ESX host, you need to upgrade VMware Tools in the affected guests that reside on the host.

  • Swapping active and standby NICs results in a loss of connectivity to the virtual machine.
  • A race issue caused an ASSERT_BUG to unnecessarily run and caused the ESX host to crash. This change removes the invalid ASSERT_BUG.

    Symptoms seen without this patch: The ESX host crashes with an ASSERT message that includes fs3DiskLock.c:1423. Example:

    ASSERT /build/mts/release/bora-77234/bora/modules/vmkernel/vmfs3/fs3DiskLock.c:1423 bugNr=147983

  • A virtual machine can become registered on multiple hosts due to a .vmdk file locking issue. This issue occurs when network errors cause HA to power on the same virtual machine on multiple hosts, and when SAN errors cause the host on which the virtual machine was originally running to lose its heartbeat. The original virtual machine becomes unresponsive.

    With this patch, the VI Client displays a dialog box warning you that a .vmdk lock is lost. The virtual machine is powered off after you click OK.

  • This change fixes confusing VMkernel log messages in cases where one of the storage processors (SP) of an EMC CLARiiON CX storage array is hung. The messages now correctly identify which SP is hung. Example of confusing messages:

    vmkernel: 1:23:09:57.886 cpu3:1056)WARNING: SCSI: 2667: CX SP B is hung.
    vmkernel: 1:23:09:57.886 cpu3:1056)SCSI: 2715: CX SP A for path vmhba1:2:2 is hung.

    vmkernel: 1:23:09:57.886 cpu3:1056)WARNING: SCSI: 4282: SP of path vmhba1:2:2 is
    hung. Mark all paths using this SP as dead. Causing full path failover.

    In this case, research revealed that SP A was hung, but SP B was not.

  • This patch allows VMkernel to successfully boot on unbalanced NUMA configurations (that is, configurations where some nodes have no CPU or memory). When such an unbalanced configuration is detected, VMkernel shows an alert and continues booting. Previously, VMkernel failed to load on such NUMA configurations.

    Sample alert message when memory is missing from one of the nodes (here, node 2):

    No memory detected in SRAT node 2. This can cause very bad performance.

  • When the zpool create command is run from a Solaris 10 virtual machine on a LUN that is exported as a raw device mapping (RDM) to that virtual machine, the command creates a partition table of type GPT (GUID partition table) on that LUN as part of creating the ZFS filesystem. Later, when a LUN rescan is run on the ESX server through VirtualCenter or through the command line, the rescan takes a very long time to complete because the VMkernel fails to read the GUID partition table. This patch fixes this problem.
    Symptoms seen without this patch: Rescanning HBAs takes a long time and an error message similar to the following is logged in /var/log/vmkernel:

    Oct 31 18:10:38 vmkernel: 0:00:45:17.728 cpu0:8293)WARNING: SCSI: 255: status Timeout for vml.02006500006006016033d119005c8ef7b7f6a0dd11524149442030. residual R 800, CR 80, ER 3

  • A race in LVM resignaturing code can cause volumes to disappear on a host when a snapshot is presented to multiple ESX hosts, such as in SRM environments.
    Symptoms: After rescanning, VMFS volumes are not visible.
  • This change resolves a rare VMotion instability.

    Symptoms: During a VMotion migration, certain 32-bit applications running in 64-bit guests might crash due to access violations.

  • Solaris 10 Update 4, 64-bit graphical installation fails with the default virtual machine RAM size of 512MB.
  • DRS development and performance improvement. This change prevents unexpected migration behavior.
  • In a DRS cluster environment, the hostd service reaches a hard limit for memory usage, which causes hostd to restart itself.

    Symptoms: The hostd service restarts and temporarily disconnects from VirtualCenter. The ESX host stops responding before hostd reconnects.

  • Fixes for supporting Site Recovery Manager (upcoming December 2008 release) on ESX 3.5 Update 2 and Update 3.

When installing this patch you must reboot your ESX server(s) and update the VMware Tools on the VMs, which requires a reboot too.
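
The VMotion issue above revolves around the “Check and upgrade Tools before each power-on” setting, which is stored in the .vmx file as tools.upgrade.policy (upgradeAtPowerCycle when enabled, manual when disabled). If you want a quick, unofficial way to see which VMs on a host have it enabled, something like this from the service console works, assuming the usual one-folder-per-VM layout on your datastores:

    # list the .vmx files that have the automatic Tools upgrade policy enabled
    grep -l upgradeAtPowerCycle /vmfs/volumes/*/*/*.vmx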
 
The Virtual Machine Monitoring reboot problem (see post https://www.ivobeerens.nl/?p=180) is NOT fixed :-(. I hope VMware will fix this soon.