Best practices for HP EVA, vSphere 4 and Round Robin multi-pathing

 

Balancing the load between your HP EVA storage system and your VMware ESX hosts gives you better performance. Here are some best practices to get there.

VMware vSphere and the HP EVA 4x00, 6x00 and 8x00 series are ALUA compliant. In simple words, ALUA compliance means that there is no need to manually identify preferred I/O paths between the VMware ESX hosts and the storage controllers.

When you create a new Vdisk on the HP EVA, the LUN is set to No Preference by default. The No Preference policy means the following:

  • Controller ownership is non-deterministic. The unit ownership is alternated between controllers during initial presentation or when controllers are restarted

  • On controller failover (owning controller fails), the units are owned by the surviving controller

  • On controller failback (previous owning controller returns), the units remain on the surviving controller. No failback occurs unless explicitly triggered.

To get a good distribution between the controllers the following Vdisk policies can be used:

 Path A-Failover/failback

– At presentation, the units are brought online to controller A

– On controller failover, the units are owned by the surviving controller (B)

– On controller failback, the units are brought online on controller A implicitly.

 Path B-Failover/failback

– At presentation, the units are brought online to controller B

– On controller failover, the units are owned by surviving controller (A)

– On controller failback, the units are brought online on controller B implicitly.

On the HP EVA, set half of the Vdisks to Path A-Failover/failback and the other half to Path B-Failover/failback, so that they alternate between controller A and B. This can be done from HP Command View EVA. Now that the Vdisks are distributed between the two controllers, we can move on to the vSphere configuration. On every vSphere host, perform a rescan or a reboot.
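A rescan can for example be done from the Service Console. The adapter names below are just examples (they are not from the HP paper), so list your own HBAs first:

# List the HBAs on this host (adapter names differ per host)
esxcfg-scsidevs -a
# Rescan each Fibre Channel HBA so the host picks up the changed controller ownership
esxcfg-rescan vmhba1
esxcfg-rescan vmhba2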

In VMware vSphere the Most Recently Used (MRU) and Round Robin (RR) multi-pathing policies are ALUA compliant. Round Robin load balancing is now officially supported.  These multi-path policies have the following characteristics:

MRU:

– Will give preference to an optimal path to the LUN

– When all optimal paths are unavailable, it will use a non-optimal path

– When an optimal path becomes available, it will fail over to that optimal path

– Although each ESX server may use a different port through the optimal controller to the LUN, only a single controller port is used for LUN access per ESX server

 Round Robin:

– Will queue I/O to LUNs on all ports of the owning controller in a round robin fashion, providing an instant bandwidth improvement

– Will continue queuing I/O in a round robin fashion to the optimal controller ports until none are available, and will then fail over to the non-optimal paths

– Once an optimal path returns, it will fail back to it

– Can be configured to round robin I/O to all controller ports for a LUN by ignoring the optimal path preference, which may be suitable for a write-intensive environment due to the increased controller port bandwidth (see the sketch below)
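As a rough sketch of that last option (this is not from the HP paper, and it assumes the useANO option of the Round Robin PSP is available in your ESX 4 build; the naa identifier is a placeholder):

# Assumption: --useANO makes the Round Robin PSP also use the non-optimized
# (non-owning) controller ports for this LUN; check the exact option name with
# esxcli nmp roundrobin setconfig --help before relying on it
esxcli nmp roundrobin setconfig --useANO=1 --device naa.600xxxxxxxxxxxxxxxxx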

The Fixed multi-path policy is not ALUA compliant and is therefore not recommended.

In vSphere 4 there is a new multi-pathing framework. It has three core components:

– Native Multi-pathing Plugin (NMP): handles the multi-pathing configuration and communicates with the SATP and PSP to identify path failure conditions.

– Storage Array Type Plugin (SATP), handles specific operations such as device discovery, error codes and failover.

– Path Selection Plugin (PSP): selects the best available path. There are three policies: Fixed, MRU and Round Robin.
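To inspect the current configuration from the Service Console, the following commands can be used (a sketch, using the same esxcli nmp namespace as the commands below; the output format differs per ESX build):

# List the loaded SATPs and the default PSP assigned to each of them
esxcli nmp satp list
# List all devices together with the SATP and PSP each one is actually using
esxcli nmp device list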

The PSP is set per LUN, meaning that it is possible to have some LUNs use the MRU policy and others use Round Robin. The HP best practice is to change the default PSP from MRU to Round Robin, which is done with the following command in the Service Console:

esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR
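This changes the default PSP for devices claimed by the ALUA SATP, so it takes effect when LUNs are claimed, for example after a reboot of the host. For a LUN that is already claimed, the policy can also be set per device (a sketch; the naa identifier is a placeholder):

# Switch one already-claimed LUN to Round Robin without changing the SATP default
esxcli nmp device setpolicy --psp VMW_PSP_RR --device naa.600xxxxxxxxxxxxxxxxx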

Another best practice is to set the IOPS value, which controls how many I/Os are sent down a given path before vSphere switches to the next path, from its default of 1000 to 1 for every LUN by using the following command:

for i in `ls /vmfs/devices/disks/ | grep naa.600` ; 
do esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device $i ;done

But there is a bug: when the VMware ESX server is rebooted, the IOPS value reverts to a random value. More information can be found on the Virtual Geek blog by Chad Sakac. To check the IOPS values on all LUNs, use the following command:

for i in `ls /vmfs/devices/disks/ | grep naa.600` ; 
do esxcli nmp roundrobin getconfig --device $i ;done


To work around this IOPS bug, edit the /etc/rc.local file on every VMware ESX host and add the IOPS=1 command. The rc.local file is executed after all the init scripts have run.
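In practice this means appending the same loop used above to /etc/rc.local, for example (adjust the naa.600 filter if your LUN identifiers differ):

# Re-apply IOPS=1 to all EVA LUNs on every boot (workaround for the reset bug)
for i in `ls /vmfs/devices/disks/ | grep naa.600` ;
do esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device $i ;done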


After adding the IOPS=1 command, restart the VMware ESX host and check the IOPS values when it is back online.


Now you can check whether the Round Robin policy is active and the LUNs are spread over the two controllers.


Here are some great PowerCLI one-liners created by Luc Dekens. Thanks for creating these one-liners for me so quickly!

Set the multi-path policy to Round Robin for all hosts:

Get-VMHost|Get-ScsiLun -LunType "disk"|where {$_.MultipathPolicy -ne 
"RoundRobin"}|Set-ScsiLun -MultipathPolicy "RoundRobin" 

Get the multi-path policy for one host:

Get-VMHost <ESXname> | Get-ScsiLun | Select CanonicalName, MultiPathPolicy

 Get the multi-path policy for all the hosts:

Get-VMHost | %{$_.Name; $_ | Get-ScsiLun | Select CanonicalName, MultiPathPolicy}


 

source: Configuration best practices for HP StorageWorks Enterprise Virtual Array (EVA) family and VMware vSphere 4

 

 

25 thoughts on “Best practices for HP EVA, vSphere 4 and Round Robin multi-pathing”

  1. I hardly believe that setting path IO limit to 1 is best practice, actually I would strongly advise against that.

  2. A colleague and I authored a white-paper on EVA best practices for EVA4.0. In that document we do recommend the setting of IOPS=1, so I assume that is where the author of this blog found that information. The link to the paper is: http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA1-2185ENW&cc=us&lc=en

    Now, the reason for the IOPS=1 recommendation is that during lab tests that setting showed a nice, even distribution of I/Os through all EVA ports used. If you experiment with this you can see the queue depth for all EVA ports used being very much even, and also the throughput through the various ports. Additionally, with the workloads we experimented with, we noticed better overall performance with this setting. This is why we recommended it as a starting point for EVA configurations. As we know, best practices are not one recommendation that fits all cases. So we also recommend that end customers experiment when possible with various settings and tune their environment accordingly. But as a starting point, IOPS=1 showed itself to be adequate for many workload types.

    R/
    Aboubacar.

  3. Very nice post.
    Would it be possible to do "esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR" in PowerCLI?

  4. Losing the IOPS=1 setting still happens today with ESX 4.0.0 build 236512 (latest).

    The command mentioned above to set the IOPS gave me errors. The one below works:
    for i in `esxcli nmp device list | grep ^naa.600`;
    do esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device $i;
    done;
    Difference is : ^naa.600 instead of naa.600

  5. Hi,
    I'm new to PowerCLI and scripting altogether, so I might be making a rookie mistake here.
    When I try to run the one-liner to get the MultiPath policy for all hosts, I get the following error:
    At line:1 char:41
    + Get-VMHost | %{$_.Name; $_ | Get-ScsiLun <<<< | Select CanonicalName, MultiPathPolicy}
    + CategoryInfo : NotSpecified: (:) [Get-ScsiLun], ArgumentException
    + FullyQualifiedErrorId : System.ArgumentException,VMware.VimAutomation.VimAutomation.Commands.Host.GetScsiLun

  6. Hi,

    We had an issue where the ESXi hosts were throwing "Unknown device naa.600" errors, even though the LUNs were fine and the VMs on them were working.

    After a long analysis, we disabled the following in the rc.local:

    for i in `ls /vmfs/devices/disks/ | grep naa.600` ; do esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device $i; done

    Doing this stopped the errors.
    Can somebody explain why this best practice has caused this issue?

    regards,
    Ziddi.

  7. We just had vmware tell us that they don’t recommend round robin on vSphere with EVA8000. Anyone have any problems with their round robin setup on an EVA?

  8. It looks like the PowerCLI script to set the multipath policy to round robin will also affect local drives on the host. Is this correct? And if so, will this affect anything, since there is just one path to those drives anyway?

    Thanks,

    Michael

  9. No issues reported; we had previously seen issues with fixed paths and the 8000 series that were leading to LUN drops at random points.

  10. Should you also adjust the path selection of the storage array controllers? E.g. an HP Fibre Channel RAID controller (naa.5001 etc.)

    Can this be both RR and MRU?

  11. Use the nmp plugin device list output for the device list.
    Enable round robin for all LUNs:
    for dev in $(esxcli nmp device list | grep "^naa"); do esxcli nmp device setpolicy --psp VMW_PSP_RR --device $dev; done

    Get config:
    for dev in $(esxcli nmp device list | grep "^naa"); do esxcli nmp roundrobin getconfig --device $dev; done

  12. Pingback: vSphere Round Robin MultiPathing « Phil the Virtualizer
  13. This is what I used for ESXi 4.1 update1 attached to EVA4400 and P2000. Rebooted server after the first command.

    esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR

    for i in `esxcli nmp device list | grep ^naa.600` ; do esxcli nmp roundrobin setconfig --type iops --iops=1 --device $i; done

  14. Hello.

    The correct format of this command via SSH is:

    ~ # esxcli storage nmp psp roundrobin deviceconfig set -t iops -I 1 -d naa.600c0ff000115f100db4c

    The command from the HP whitepaper is totally in the woods.

  15. We were told the following to make it stick.

    NOTE: Setting the IOPS with a value of 1 does not persist across reboots. Upon reboot the value will jump to the default of 1000. The following command is recommended to make the setting permanent.

    # esxcli nmp psp setconfig --device --config "policy=iops;iops=XXX"

  16. After I did what you wrote, my ESXi 4.1 went into total lockdown. Can't even re-image it anymore either. Still checking what just happened… but this was the only change that was made. FYI.

  17. ESXi5 finally retains the IO limit setting!
    I used the following commands:

    Determine IO limit
    for i in `ls /vmfs/devices/disks/ | grep naa.600` ; do esxcli storage nmp psp roundrobin deviceconfig get -d $i ;done

    Set IO limit to 1
    for i in `ls /vmfs/devices/disks/ | grep naa.600` ; do esxcli storage nmp psp roundrobin deviceconfig set -t iops -I 1 -d $i ;done

  18. ESXi5 script to apply PSP default Roundrobin for SATP ALUA, update any assigned LUNs to Roundrobin and change IOPS to 1.

    I hacked this out to apply on my ESXi 5 host as we upgraded from ESX 4.0 to ESXi 5.0. You will need to change the list value depending on your system ("naa.600508b4*"). Hope it helps,

    # Enter ESX Host to Connect to
    write-host ""
    $VCServerName = Read-Host "Enter ESX Host to Connect to"
    Write-host "Enter User ID Root and Password"
    write-host ""
    Connect-VIServer $VCServerName
    $psp = "VMW_PSP_RR"
    $satp = "VMW_SATP_ALUA"
    Write-host ""
    Write-Host "Updating IOPS to 1 for all san luns $vcservername"
    $esxcli= get-esxcli
    $esxcli.storage.nmp.device.list() | where {$_.device -like "naa.600508b4*"} | %{
    $esxcli.storage.nmp.device.set($null, $_.Device, "VMW_PSP_RR")
    $esxcli.storage.nmp.psp.roundrobin.deviceconfig.set($null, $_.device, 1, "iops", $null)
    }
    #Display Device Paths
    $esxcli.storage.nmp.device.list() | where {$_.device -like "naa.600508b4*"} | Select Device,PathSelectionPolicy,StorageArrayType, pathselectionpolicydeviceconfig

    #Change the default PSP for my SATP
    $esxcli.storage.nmp.satp.set($null,$psp,$satp) | Out-Null

    disconnect-viserver $VCServerName -Confirm:$false | Out-Null

  19. Pingback: VMWARE TECHNICAL BEST PRACTICE DOCUMENT « Logeshkumar Marudhamuthu
