vSphere 6 vMotion Enhancements

Starting with vSphere 6, VMware has added some new enhancements to the existing vMotion capabilities. Let us first look at the history of vMotion over the last couple of vSphere versions:

  • vSphere 5.0
    • Multi-NIC vMotion, which allows you to dedicate multiple NICs for vMotion
    • SDPS (Stun During Page Send) was introduced. SDPS ensures that vMotion does not fail due to memory copy convergence issues. Previously, vMotion could fail if the virtual machine modified memory faster than it could be transferred. SDPS slows down the virtual machine to avoid such a case
  • vSphere 5.1
    • vMotion without shared storage – vMotion can now migrate virtual machines to a different host and datastore simultaneously. Also, the storage device no longer needs to be shared between the source host and destination host

So, what are the new vMotion enhancements in vSphere 6? There are three major enhancements:

vMotion across vCenters

  • Simultaneously change compute, storage, networks and management
  • Leverage vMotion with unshared storage
  • Support local, metro and cross-continental distances


Requirements for vMotion across vCenter Servers:

  • Supported only starting with vSphere 6
  • The destination vCenter Server instance must be in the same Single Sign-On (SSO) domain when using the UI; a different SSO domain is possible if you use the API
  • At least 250 Mbps of network bandwidth per vMotion operation

 

vMotion across vSwitches (aka x-vSwitch vMotion)

  • x-vSwitch vMotion is fully transparent to the guest
  • Requires L2 VM network connectivity
  • Transfers vSphere Distributed Switch (VDS) port metadata
  • Works with a mix of virtual switch types (standard and distributed)


Long-distance vMotion

  • Allows cross-continental vMotion with up to 100 ms round-trip latency
  • Does not require vVols
  • Use Cases:
    • Permanent migrations
    • Disaster avoidance
    • SRM/DA testing
    • Multi-site load balancing
  • vMotion network will cross L3 boundaries
  • The NFC (Network File Copy) network, which carries cold migration traffic, is configurable (see the sketch after this list)
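
As a minimal sketch of that last point, the provisioning/NFC traffic can be pinned to a dedicated VMkernel interface from the ESXi shell. The interface names vmk1/vmk2 are placeholders, and the exact tag names available depend on your ESXi 6.0 build, so verify them with esxcli network ip interface tag add --help before relying on this:

    # List the VMkernel interfaces and the tags currently assigned to vmk2 (vmk2 is a placeholder)
    esxcli network ip interface list
    esxcli network ip interface tag get -i vmk2

    # Tag vmk2 for provisioning/NFC (cold migration) traffic; the tag name is assumed to be vSphereProvisioning
    esxcli network ip interface tag add -i vmk2 -t vSphereProvisioning

    # Keep vMotion traffic on a separate interface
    esxcli network ip interface tag add -i vmk1 -t VMotion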


 

What’s required for Long-distance vMotion?

  • If you use vMotion across multiple vCenter Servers, the vCenter Server instances must be connected via L3
  • VM network:
    • L2 connection
    • Same VM IP address available at destination
  • vMotion network:
    • L3 connection (see the connectivity check after this list)
    • Secure (dedicated or encrypted)
    • 250 Mbps per vMotion operation
  • NFC network:
    • Routed L3 through Management Network or L2 connection
    • Networking L4-L7 services manually configured at destination
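
To sanity-check the vMotion network requirements above, L3 reachability and latency can be tested from the ESXi shell with vmkping. The VMkernel interface vmk2 and the destination address 192.0.2.50 are placeholders for your own vMotion vmknic and the remote host's vMotion IP:

    # Ping the remote vMotion address through the local vMotion vmknic
    vmkping -I vmk2 192.0.2.50

    # Optionally send a large, non-fragmented payload to validate the MTU end to end (8972 bytes for a 9000 MTU)
    vmkping -I vmk2 -s 8972 -d 192.0.2.50

The reported round-trip times should stay within the 100 ms limit mentioned earlier.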

Long-distance vMotion supports Storage Replication Architectures

  • Active-active replicated storage appears as shared storage to the VM
  • Migration over active-active replication is classic vMotion
  • vVols are required for geo distances

vSphere 6 Fault Tolerance

VMware vSphere Fault Tolerance (FT) provides continuous availability for applications on virtual machines.

FT creates a live clone instance of a virtual machine that is always up-to-date with the primary virtual machine. In the event of a host or hardware failure, vSphere Fault Tolerance automatically triggers a failover, ensuring zero downtime and no data loss. VMware vSphere Fault Tolerance utilizes heartbeats between the primary virtual machine and the live clone to ensure availability. In case of a failover, a new live clone is created to deliver continuous protection for the VM.

The VMware vSphere Fault Tolerance FAQ can be found here.


 

At first glance, VMware vSphere Fault Tolerance seems like a great addition to vSphere HA clusters to ensure continuous availability within your VMware vSphere environment.

However, in vSphere 4.x and 5.x, only one virtual CPU per protected virtual machine is supported. If your VM uses more than one virtual CPU, you will not be able to enable VMware vSphere Fault Tolerance on that machine. Obviously, this is an enormous shortcoming and explains why many companies are not using VMware's FT capability.

So what’s new with vSphere 6 in regards to Fault Tolerance?

  • Up to 4 virtual CPUs per virtual machine
  • Up to 64 GB RAM per virtual machine
  • HA, DRS, DPM, SRM and VDS are supported
  • Protection for high performance multi-vCPU VMs
  • Faster check-pointing to keep primary and secondary VM in sync
  • VMs with FT enabled can now be backed up with vStorage APIs for Data Protection (VADP)

With the new features in vSphere 6, Fault Tolerance will surely get much more traction, since you can finally enable FT on VMs with up to 4 vCPUs.

vSphere 6 NFSv4.1

As most of us know, VMware supports many storage protocols – FC, FCoE, iSCSI and NFS.
However, only NFSv3 was supported in vSphere 4.x and 5.x. NFSv3 has many limitations and shortcomings, such as:

  • No multipathing support
  • Proprietary advisory locking, because the protocol lacks proper locking
  • Limited security
  • Performance limited by the single server head

Starting with vSphere 6, VMware introduces NFSv4.1. Compared to NFSv3, v4.1 brings a bunch of new features:

  • Session Trunking/Multipathing
    • Increased performance from parallel access (load balancing)
    • Better availability from path failover
  • Improved Security
    • Kerberos, Encryption and Signing is supported
    • User authentication and non-root access becomes available
  • Improved Locking
    • In-band mandatory locks, no longer proprietary advisory locking
  • Better Error Recovery
    • Client and server are no longer stateless; context can be recovered after failures
  • Efficient Protocol
    • Less chatty, no file lock heartbeat
    • Session leases

Note: NFSv4.1 does not support SDRS, SIOC, SRM and vVols.

Supportability of NFSv3 and NFSv4.1:

  • NFSv3 locking is not compatible with NFS 4.1
    • NFSv3 uses propriety client side locking
    • NFSv4.1 uses server side locking
  • Single protocol access for a datastore
    • Use either NFSv3 or NFSv4.1 to mount the same NFS share across all ESXi hosts within a vSphere HA cluster (see the example after this list)
    • Mounting one NFS share as NFSv3 on one ESX host and the same share as NFSv4.1 on another host is not supported!
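
As an illustration, here is a rough sketch of mounting a share with the NFS 4.1 namespace of esxcli introduced in ESXi 6.0. The server address, export path and volume name are placeholders, the same command (and the same protocol version) would be repeated on every host in the cluster, and the exact option names should be confirmed with esxcli storage nfs41 add --help:

    # Mount the share as NFS 4.1 on this host (repeat identically on every host in the cluster)
    esxcli storage nfs41 add --hosts=192.0.2.10 --share=/exports/ds01 --volume-name=nfs41-ds01

    # Confirm which datastores are mounted via NFS 4.1 and which via NFSv3
    esxcli storage nfs41 list
    esxcli storage nfs list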

Kerberos Support for NFSv4.1:

  • NFSv3 only supports AUTH_SYS
  • NFSv4.1 supports AUTH_SYS and Kerberos (see the example after this list)
  • Requires Microsoft Active Directory for the KDC
  • Supports RPC header authentication (rpc_gss_svc_none or krb5)
  • Only supports DES-CBC-MD5
    • Weaker but widely used
    • AES-HMAC not supported by many vendors
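
Below is a hedged sketch of how the Kerberos security flavor would be selected when mounting an NFS 4.1 datastore. The --sec option name and the SEC_KRB5 value are quoted from memory, so verify them against esxcli storage nfs41 add --help on your build; the host must also already be joined to Active Directory with an NFS Kerberos user configured:

    # Mount an NFS 4.1 share with Kerberos authentication instead of the default AUTH_SYS
    # (server address, export path and volume name are placeholders)
    esxcli storage nfs41 add --hosts=192.0.2.10 --share=/exports/secure01 --volume-name=nfs41-krb01 --sec=SEC_KRB5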

Implications of using Kerberos:

  • NFSv3 to NFSv4.1
    • Be aware of the uid, gid on the files
    • For NFSv3 the uid & gid will be root
    • Accessing files created with NFSv3 from a Kerberized NFSv4.1 client will result in permission denied errors (see the check after this list)
  • Always use the same user on all hosts
    • vMotion and other features might fail if two hosts use different users
    • Host Profiles can be used to automate consistent user configuration across hosts
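
A quick way to spot the uid/gid issue described above before switching a datastore over to a Kerberized NFSv4.1 client is to inspect the ownership of the existing files from the ESXi shell (the datastore name nfs-ds01 is a placeholder):

    # Show the owner and group of files created while the share was mounted as NFSv3
    ls -l /vmfs/volumes/nfs-ds01/

Files still owned by root are the ones that will trigger permission denied errors once they are accessed by the non-root Kerberos user.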

 

Remove VIB – Device or resource busy

Today, I was playing around with some vSphere Installation Bundles (VIBs) and ran into an issue when I tried to remove a VIB:

esxcli software vib remove -n vmware-esx-KoraPlugin 
 [InstallationError]
 Error in running rm /tardisks/:
 Return code: 1
 Output: rm: can't remove '/tardisks/': Device or resource busy

Even adding the --force flag did not help in this situation.

The following workaround worked for me:

  1. Stop hostd on the ESXi host – this will be non-disruptive to your VMs
    /etc/init.d/hostd stop
  2. Run localcli to uninstall the VIB
    localcli software vib remove -n vmware-esx-KoraPlugin

    Note: We need to run localcli, since esxcli is not available if hostd is stopped

  3. Start hostd on the ESXi host
    /etc/init.d/hostd start
  4. Verify that vmware-esx-KoraPlugin no longer shows up
    esxcli software vib list | grep vmware-esx-KoraPlugin

You should no longer see the VIB installed on your ESXi host.
Localcli is not widely known within the community and is mainly used by VMware's Technical Support. Because it bypasses hostd, it still provides troubleshooting capabilities when hostd is not running.
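
For reference, here is the whole workaround condensed into one copy-and-paste sequence, together with a look at /tardisks, where the still-loaded tardisk behind the "Device or resource busy" message shows up. The VIB name is the one from my environment; substitute your own:

    # List the loaded tardisks; the plugin's tardisk being in use is what produces the "Device or resource busy" error
    ls /tardisks

    # Stop hostd (non-disruptive to running VMs) and confirm it is down
    /etc/init.d/hostd stop
    /etc/init.d/hostd status

    # Remove the VIB with localcli, then bring hostd back up
    localcli software vib remove -n vmware-esx-KoraPlugin
    /etc/init.d/hostd start

    # Verify that the VIB is gone
    esxcli software vib list | grep vmware-esx-KoraPlugin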

 

 

Unmount VMware Datastore – Device Busy

Welcome back, I hope everyone had some time to relax and spend the Christmas holidays with their families. I was lucky enough to have some time off to play with my lab.

After playing around with some newly deployed NFS datastores, I tried to unmount them and got a device busy error; on the CLI I got Sysinfo error on operation returned status : Busy. Please see the VMkernel log for detailed error information.

Let me show you the steps I ran through:

  1. Mounted a new NFS datastore through the ESXCLI
    1. esxcli storage nfs add  --host=nas-ip --share=share-name --volume-name=volume_name
  2. List all NFS shares
    1. ~ # esxcli storage nfs list
  3. Verify that all VMs on this datastore are either powered off or have been vMotioned to another datastore
  4. Try to unmount the datastore
    1. esxcli storage nfs remove -v b3c1
      Sysinfo error on operation returned status : Busy. Please see the VMkernel log for detailed error information
  5. Looking through the vmkernel.log doesn't help much either. The only message printed there is
    1. 2015-01-12T23:10:09.357Z cpu2:49350 opID=abdf676b)WARNING: NFS: 1921: b3c1 has open files, cannot be unmounted
  6. After some searching, I found this article on VMware
  7. Basically, the issue seems to be that vSphere HA Datastore Heartbeats are enabled on this datastore, which is causing the device to be busy.
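
A quick way to see what vSphere HA keeps open on the datastore is to look for the hidden .vSphere-HA directory that the HA agent (FDM) creates on its heartbeat datastores; the datastore name b3c1 is the one from my lab:

    # The HA agent stores its heartbeat files in a hidden .vSphere-HA directory on heartbeat datastores
    ls -la /vmfs/volumes/b3c1/
    ls -la /vmfs/volumes/b3c1/.vSphere-HA/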

The solution to this problem is pretty simple. Open your vSphere Client, select your vSphere HA cluster and edit the vSphere HA settings. Within the settings, make sure to set the vSphere HA heartbeat datastores to Use datastores only from the specified list and deselect the datastore that you are trying to unmount.


After changing the setting, I was able to successfully unmount the NFS share with the following command: esxcli storage nfs remove -v datastore_name
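
To double-check from the ESXi shell that the unmount went through (assuming the datastore was named b3c1 as in my lab):

    # The removed datastore should no longer show up in the NFS mount list
    esxcli storage nfs list | grep b3c1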