vSphere 6 vMotion Enhancements

Starting with vSphere 6, VMware has added some new enhancements to the existing vMotion capabilities. Let us first look at the history of vMotion over the last couple of vSphere versions:

  • vSphere 5.0
    • Multi-NIC vMotion, which allows you to dedicate multiple NICs for vMotion
    • SDPS (Stun During Page Send) was introduced. SDPS ensures that vMotion does not fail due to memory copy convergence issues. Previously, vMotion could fail if the virtual machine modified memory faster than it could be transferred. SDPS slows down the virtual machine to avoid such a case
  • vSphere 5.1
    • vMotion without shared storage – vMotion can now migrate virtual machines to a different host and datastore simultaneously. Also, the storage device no longer needs to be shared between the source host and destination host

So, what are the new vMotion enhancements in vSphere 6? There are three major enhancements:

vMotion across vCenters

  • Simultaneously change compute, storage, networks and management
  • Leverage vMotion with unshared storage
  • Support local, metro and cross-continental distances


Requirements for vMotion across vCenter Servers:

  • Supported only starting with vSphere 6
  • The destination vCenter Server instance must be in the same Single Sign-On (SSO) domain when using the UI; a different SSO domain is possible if you use the API
  • At least 250 Mbps of network bandwidth per vMotion operation

 

vMotion across vSwitches (aka x-vSwitch vMotion)

  • x-vSwitch vMotion is fully transparent to the guest
  • Requires L2 VM network connectivity
  • Transfers vSphere Distributed Switch (VDS) port metadata
  • Works with a mix of virtual switch types (standard and distributed)


Long-distance vMotion

  • Allows cross-continental vMotion with up to 100 ms round-trip latency
  • Does not require vVols
  • Use Cases:
    • Permanent migrations
    • Disaster avoidance
    • SRM/DA testing
    • Multi-site load balancing
  • vMotion network will cross L3 boundaries
  • The NFC (Network File Copy) network, which carries cold migration traffic, is configurable (see the sketch after this list)
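
As a minimal sketch of that last point, the provisioning/NFC traffic can be pinned to a dedicated VMkernel interface from the ESXi shell. The interface names vmk1/vmk2 are placeholders, and the exact tag names available depend on your ESXi 6.0 build, so verify them with esxcli network ip interface tag add --help before relying on this:

    # List the VMkernel interfaces and the tags currently assigned to vmk2 (vmk2 is a placeholder)
    esxcli network ip interface list
    esxcli network ip interface tag get -i vmk2

    # Tag vmk2 for provisioning/NFC (cold migration) traffic; the tag name is assumed to be vSphereProvisioning
    esxcli network ip interface tag add -i vmk2 -t vSphereProvisioning

    # Keep vMotion traffic on a separate interface
    esxcli network ip interface tag add -i vmk1 -t VMotion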


 

What’s required for Long-distance vMotion?

  • If you use vMotion across multiple vCenter Servers, the vCenter Server instances must be connected via L3
  • VM network:
    • L2 connection
    • Same VM IP address available at destination
  • vMotion network:
    • L3 connection (see the connectivity check after this list)
    • Secure (dedicated or encrypted)
    • 250 Mbps per vMotion operation
  • NFC network:
    • Routed L3 through Management Network or L2 connection
    • Networking L4-L7 services manually configured at destination
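
To sanity-check the vMotion network requirements above, L3 reachability and latency can be tested from the ESXi shell with vmkping. The VMkernel interface vmk2 and the destination address 192.0.2.50 are placeholders for your own vMotion vmknic and the remote host's vMotion IP:

    # Ping the remote vMotion address through the local vMotion vmknic
    vmkping -I vmk2 192.0.2.50

    # Optionally send a large, non-fragmented payload to validate the MTU end to end (8972 bytes for a 9000 MTU)
    vmkping -I vmk2 -s 8972 -d 192.0.2.50

The reported round-trip times should stay within the 100 ms limit mentioned earlier.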

Long-distance vMotion supports Storage Replication Architectures

  • Active-active replicated storage appears as shared storage to the VM
  • Migration over active-active replication is classic vMotion
  • vVols are required for geo distances

vSphere 6 Fault Tolerance

VMware vSphere Fault Tolerance (FT) provides continuous availability for applications on virtual machines.

FT creates a live clone instance of a virtual machine that is always up-to-date with the primary virtual machine. In the event of a host or hardware failure, vSphere Fault Tolerance automatically triggers a failover, ensuring zero downtime and no data loss. VMware vSphere Fault Tolerance utilizes heartbeats between the primary virtual machine and the live clone to ensure availability. In case of a failover, a new live clone is created to deliver continuous protection for the VM.

The VMware vSphere Fault Tolerance FAQ can be found here.


 

At first glance, VMware vSphere Fault Tolerance seems like a great addition to vSphere HA clusters to ensure continuous availability within your VMware vSphere environment.

However, in vSphere 4.x and 5.x, only one virtual CPU per protected virtual machine is supported. If your VM uses more than one virtual CPU, you will not be able to enable VMware vSphere Fault Tolerance on that machine. Obviously, this is an enormous shortcoming and explains why many companies are not using VMware's FT capability.

So what’s new with vSphere 6 in regards to Fault Tolerance?

  • Up to 4 virtual CPUs per virtual machine
  • Up to 64 GB RAM per virtual machine
  • HA, DRS, DPM, SRM and VDS are supported
  • Protection for high performance multi-vCPU VMs
  • Faster check-pointing to keep primary and secondary VM in sync
  • VMs with FT enabled can now be backed up with vStorage APIs for Data Protection (VADP)

With the new features in vSphere 6, Fault Tolerance will surely get much more traction, since you can finally enable FT on VMs with up to 4 vCPUs.

vSphere 6 NFSv4.1

As most of us know, VMware supports many storage protocols – FC, FCoE, iSCSI and NFS.
However, only NFSv3 was supported in vSphere 4.x and 5.x. NFSv3 has many limitations and shortcomings, such as:

  • No multipathing support
  • Proprietary advisory locking, because the protocol lacks proper locking
  • Limited security
  • Performance limited by the single server head

Starting with vSphere 6, VMware introduces NFSv4.1. Compared to NFSv3, v4.1 brings a bunch of new features:

  • Session Trunking/Multipathing
    • Increased performance from parallel access (load balancing)
    • Better availability from path failover
  • Improved Security
    • Kerberos, Encryption and Signing is supported
    • User authentication and non-root access becomes available
  • Improved Locking
    • In-band mandatory locks, no longer proprietary advisory locking
  • Better Error Recovery
    • Client and server are no longer stateless; context can be recovered after failures
  • Efficient Protocol
    • Less chatty, no file lock heartbeat
    • Session leases

Note: NFSv4.1 does not support SDRS, SIOC, SRM and vVols.

Supportability of NFSv3 and NFSv4.1:

  • NFSv3 locking is not compatible with NFS 4.1
    • NFSv3 uses propriety client side locking
    • NFSv4.1 uses server side locking
  • Single protocol access for a datastore
    • Use either NFSv3 or NFSv4.1 to mount the same NFS share across all ESXi hosts within a vSphere HA cluster (see the example after this list)
    • Mounting one NFS share as NFSv3 on one ESX host and the same share as NFSv4.1 on another host is not supported!
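
As an illustration, here is a rough sketch of mounting a share with the NFS 4.1 namespace of esxcli introduced in ESXi 6.0. The server address, export path and volume name are placeholders, the same command (and the same protocol version) would be repeated on every host in the cluster, and the exact option names should be confirmed with esxcli storage nfs41 add --help:

    # Mount the share as NFS 4.1 on this host (repeat identically on every host in the cluster)
    esxcli storage nfs41 add --hosts=192.0.2.10 --share=/exports/ds01 --volume-name=nfs41-ds01

    # Confirm which datastores are mounted via NFS 4.1 and which via NFSv3
    esxcli storage nfs41 list
    esxcli storage nfs list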

Kerberos Support for NFSv4.1:

  • NFSv3 only supports AUTH_SYS
  • NFSv4.1 supports AUTH_SYS and Kerberos (see the example after this list)
  • Requires Microsoft Active Directory for the KDC
  • Supports RPC header authentication (rpc_gss_svc_none or krb5)
  • Only supports DES-CBC-MD5
    • Weaker but widely used
    • AES-HMAC not supported by many vendors
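
Below is a hedged sketch of how the Kerberos security flavor would be selected when mounting an NFS 4.1 datastore. The --sec option name and the SEC_KRB5 value are quoted from memory, so verify them against esxcli storage nfs41 add --help on your build; the host must also already be joined to Active Directory with an NFS Kerberos user configured:

    # Mount an NFS 4.1 share with Kerberos authentication instead of the default AUTH_SYS
    # (server address, export path and volume name are placeholders)
    esxcli storage nfs41 add --hosts=192.0.2.10 --share=/exports/secure01 --volume-name=nfs41-krb01 --sec=SEC_KRB5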

Implications of using Kerberos:

  • NFSv3 to NFSv4.1
    • Be aware of the uid, gid on the files
    • For NFSv3 the uid & gid will be root
    • Accessing files created with NFSv3 from a Kerberized NFSv4.1 client will result in permission denied errors (see the check after this list)
  • Always use the same user on all hosts
    • vMotion and other features might fail if two hosts use different users
    • Host Profiles can be used to automate consistent user configuration across hosts
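
A quick way to spot the uid/gid issue described above before switching a datastore over to a Kerberized NFSv4.1 client is to inspect the ownership of the existing files from the ESXi shell (the datastore name nfs-ds01 is a placeholder):

    # Show the owner and group of files created while the share was mounted as NFSv3
    ls -l /vmfs/volumes/nfs-ds01/

Files still owned by root are the ones that will trigger permission denied errors once they are accessed by the non-root Kerberos user.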

 

Remove VIB – Device or resource busy

Today, I was playing around with some vSphere Installation Bundles (VIBs) and ran into an issue when I tried to remove a VIB:

esxcli software vib remove -n vmware-esx-KoraPlugin 
 [InstallationError]
 Error in running rm /tardisks/:
 Return code: 1
 Output: rm: can't remove '/tardisks/': Device or resource busy

Even adding the --force flag did not help in this situation.

The following workaround worked for me:

  1. Stop hostd on the ESXi host – this will be non-disruptive to your VMs
    /etc/init.d/hostd stop
  2. Run localcli to uninstall the VIB
    localcli software vib remove -n vmware-esx-KoraPlugin

    Note: We need to run localcli, since esxcli is not available if hostd is stopped

  3. Start hostd on the ESXi host
    /etc/init.d/hostd start
  4. Verify that vmware-esx-KoraPlugin no longer shows up
    esxcli software vib list | grep vmware-esx-KoraPlugin

You should no longer see the VIB installed on your ESXi host.
Localcli is not widely known within the community and is mainly used by VMware's Technical Support. Because it bypasses hostd, it still provides troubleshooting capabilities when hostd is not running.
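
For reference, here is the whole workaround condensed into one copy-and-paste sequence, together with a look at /tardisks, where the still-loaded tardisk behind the "Device or resource busy" message shows up. The VIB name is the one from my environment; substitute your own:

    # List the loaded tardisks; the plugin's tardisk being in use is what produces the "Device or resource busy" error
    ls /tardisks

    # Stop hostd (non-disruptive to running VMs) and confirm it is down
    /etc/init.d/hostd stop
    /etc/init.d/hostd status

    # Remove the VIB with localcli, then bring hostd back up
    localcli software vib remove -n vmware-esx-KoraPlugin
    /etc/init.d/hostd start

    # Verify that the VIB is gone
    esxcli software vib list | grep vmware-esx-KoraPlugin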

 

 

Unmount VMware Datastore – Device Busy

Welcome back, I hope everyone had some time to relax and spend the Christmas holidays with their families. I was lucky enough to have some time off to play with my lab.

After playing around with some newly deployed NFS datastores, I tried to unmount them and got a device busy error; on the CLI I got Sysinfo error on operation returned status : Busy. Please see the VMkernel log for detailed error information.

Let me show you the steps I ran through:

  1. Mounted a new NFS datastore through the ESXCLI
    1. esxcli storage nfs add  --host=nas-ip --share=share-name --volume-name=volume_name
  2. List all NFS shares
    1. ~ # esxcli storage nfs list
  3. Verify that all VMs on this datastore are either powered off or have been vMotioned to another datastore
  4. Try to unmount the datastore
    1. esxcli storage nfs remove -v b3c1
      Sysinfo error on operation returned status : Busy. Please see the VMkernel log for detailed error information
  5. Looking through the vmkernel.log doesn't help much either. The only message printed there is
    1. 2015-01-12T23:10:09.357Z cpu2:49350 opID=abdf676b)WARNING: NFS: 1921: b3c1 has open files, cannot be unmounted
  6. After some searching, I found this article on VMware
  7. Basically, the issue seems to be that vSphere HA Datastore Heartbeats are enabled on this datastore, which is causing the device to be busy.
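
A quick way to see what vSphere HA keeps open on the datastore is to look for the hidden .vSphere-HA directory that the HA agent (FDM) creates on its heartbeat datastores; the datastore name b3c1 is the one from my lab:

    # The HA agent stores its heartbeat files in a hidden .vSphere-HA directory on heartbeat datastores
    ls -la /vmfs/volumes/b3c1/
    ls -la /vmfs/volumes/b3c1/.vSphere-HA/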

The solution to this problem is pretty simple. Open your vSphere Client, select your vSphere HA cluster and edit the vSphere HA settings. Within the settings, make sure to set the vSphere HA heartbeat datastores to Use datastores only from the specified list and deselect the datastore that you are trying to unmount.


After changing the setting, I was able to successfully unmount the NFS share with the following command: esxcli storage nfs remove -v datastore_name
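
To double-check from the ESXi shell that the unmount went through (assuming the datastore was named b3c1 as in my lab):

    # The removed datastore should no longer show up in the NFS mount list
    esxcli storage nfs list | grep b3c1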