Jumbo Frames – Do It Right

Configuring jumbo frames can be a real pain if it isn’t done properly. Over the last couple of years, I have seen many customers with mismatched MTUs due to improperly configured jumbo frames. Done right, jumbo frames can increase the overall network performance between your hosts and your storage array, and they are recommended if you have a 10GbE connection to your storage device. If they are not configured properly, however, jumbo frames quickly become your worst nightmare. I have seen them cause performance issues, connection drops, and even ESXi hosts losing their storage devices.

Now that we know what kind of issues jumbo frames can cause and when they are worth using, let’s go over a few details about jumbo frames:

  • Larger than 1500 bytes
  • Many devices support up to 9216 bytes
    • Refer to your switch manual for the proper setting
  • Most people refer to jumbo frames as an MTU of 9000 bytes
  • Misconfiguration often leads to an MTU mismatch


The steps below offer guidance on how to set up jumbo frames properly:

Note: I recommend scheduling a maintenance window for this change!

On your Cisco Switch:

Please take a look at this Cisco page which lists the syntax for most of their switches.
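
For reference only, here is a rough sketch of the switch side; the exact syntax differs between Catalyst and Nexus platforms, so treat the Cisco page above as authoritative. On many Nexus-style switches the MTU is set per interface (Ethernet1/1 is just an example port):

! Example only; verify the commands for your specific switch model
switch# configure terminal
switch(config)# interface Ethernet1/1
switch(config-if)# mtu 9216
switch(config-if)# end
! Verify the configured MTU on the port
switch# show interface Ethernet1/1 | include MTU
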
Once the switch ports have been configured properly, we can go ahead and change the networking settings on the storage device.

On Nimble OS 1.4.x:

  1. Go to Manage -> Array -> Edit Network Addresses
  2. Change the MTU of your data interfaces from 1500 to jumbo

[Screenshot: Nimble OS 1.4.x – Edit Network Addresses with jumbo MTU]

On Nimble OS 2.x:

  1. Go to Administration -> Network Configuration -> Active Settings -> Subnets
  2. Select your data subnet and click on Edit. Change the MTU of your data interfaces from 1500 to jumbo.

[Screenshot: Nimble OS 2.x – subnet MTU set to jumbo]


On ESXi 5.x:

  1. Connect to your vCenter using the vSphere Client
  2. Go to Home -> Inventory -> Hosts and Clusters
  3. Select your ESXi host and click on Configuration -> Networking
  4. Click on Properties of the vSwitch which you want to configure for jumbo frames
  5. Select the vSwitch and click on Edit.
  6. Under “Advanced Properties”, change the MTU from 1500 to 9000 and click OK.
  7. Next, select your vmkernel port and click on Edit.
  8. Under “NIC settings”, change the MTU to 9000.
  9. Repeat steps 7 & 8 for all your vmkernel ports within this vSwitch.
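
If you prefer the command line, the same change can be made with esxcli from an SSH session on the host. This is a minimal sketch, assuming vSwitch1 and vmk1 are the vSwitch and VMkernel port that carry your storage traffic:

# Raise the MTU on the standard vSwitch
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
# Raise the MTU on each VMkernel port used for storage traffic
esxcli network ip interface set --interface-name=vmk1 --mtu=9000
# Verify the new MTU values
esxcli network vswitch standard list
esxcli network ip interface list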

After you have changed the settings on your storage device, switch, and ESXi host, log in to your ESXi host via SSH and run the following command to verify that jumbo frames work end to end (8972 bytes of payload plus 20 bytes of IP header and 8 bytes of ICMP header add up to exactly 9000 bytes, and -d sets the don’t-fragment bit):

vmkping -d -s 8972 -I vmkport_with_MTU_9000 storage_data_ip

If the ping succeeds, you’ve configured jumbo frames correctly.

Windows 2012 with e1000e could cause data corruption

A couple of days ago, I spent two hours setting up two Windows Server 2012 VMs on my ESXi 5.1 cluster and tried to get some performance tests done. When copying multiple ISOs across the network between those two VMs, I received an error that none of my 5 ISOs could be opened on the destination.

After checking the settings of my VMs, I saw that I had used the default e1000e vNICs. Apparently, this is a known issue with Windows Server 2012 VMs using e1000e vNICs, running on top of VMware ESXi 5.0 and 5.1.

The scary part is that the e1000e vNIC is the default vNIC VMware selects when creating a new VM. This means that if you don’t carefully select the correct vNIC type when creating your VM, you could potentially run into this data corruption issue.
The easiest workaround is to change the vNIC type from e1000e to e1000 or VMXNET3. However, if you use DHCP, your VM will get a new IP address assigned, as the DHCP server will see the replacement as a new NIC.
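
If you manage your VMs with PowerCLI, the vNIC swap can also be scripted. A minimal sketch, assuming a VM named Win2012-01 (power the VM off before changing the adapter type):

# Replace the existing adapter type with VMXNET3 (VM name is an example)
Get-VM "Win2012-01" | Get-NetworkAdapter | Set-NetworkAdapter -Type Vmxnet3 -Confirm:$false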

If you prefer not to change the vNIC type, you might just want to disable segmentation and checksum offloading on the e1000e vNIC inside the Windows Server 2012 VMs.
There are three settings that should be changed:

  • IPv4 Checksum Offload
  • Large Send Offload (IPv4)
  • TCP Checksum Offload (IPv4)
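
These settings live in the advanced properties of the vNIC inside the guest. They can also be changed with PowerShell on Windows Server 2012; the sketch below assumes an adapter named Ethernet0, and the exact DisplayName values can differ slightly between driver versions, so list them first:

# List the advanced properties of the adapter to confirm the exact names
Get-NetAdapterAdvancedProperty -Name "Ethernet0"
# Disable the three offload settings listed above
Set-NetAdapterAdvancedProperty -Name "Ethernet0" -DisplayName "IPv4 Checksum Offload" -DisplayValue "Disabled"
Set-NetAdapterAdvancedProperty -Name "Ethernet0" -DisplayName "Large Send Offload (IPv4)" -DisplayValue "Disabled"
Set-NetAdapterAdvancedProperty -Name "Ethernet0" -DisplayName "TCP Checksum Offload (IPv4)" -DisplayValue "Disabled"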


Further details can be found in VMware KBA 2058692.


SCSI UNMAP – VMware ESXi and Nimble Storage Array

Starting with VMware ESXi 5.0, VMware introduced the SCSI UNMAP primitive (VAAI Thin Provisioning Block Reclaim) to its VAAI feature set for thin provisioned LUNs. In ESXi 5.0 the UNMAP process was even automated; starting with ESXi 5.0 U1, however, SCSI UNMAP became a manual process. SCSI UNMAP also needs to be supported by your underlying SAN array. Nimble Storage has supported SCSI UNMAP since Nimble OS version 1.4.3.0.


What is the problem?

When you delete a file from a thin provisioned VMFS5 datastore, the usage reported on the datastore and on the underlying Nimble Storage volume will no longer match. The Nimble Storage volume is not aware of any space freed within the VMFS5 datastore. This applies to a single file such as an ISO as well as to the deletion of a whole virtual machine.

What version of VMFS is supported?

You can run SCSI UNMAPs against VMFS5 and upgraded VMFS3-to-VMFS5 datastores.

What needs to be done on the Nimble Storage array?

SCSI UNMAP is supported by Nimble Storage arrays starting from version 1.4.3.0 and later.
There is nothing to be done on the array.

How do I run SCSI UNMAP on VMware ESXi 5.x?

  1. Establish a SSH session to your ESXi host which has the datastore mounted.
  2. Run esxcli storage core path list | grep -e 'Device Display Name' -e 'Target Transport Details' to get a list of volumes including the EUI identifier.
  3. Run VAAI status get to verify if SCSI UNMAP (Delete Status) is supported for the volume.
    esxcli storage core device vaai status get -d eui.e5f46fe18c8acb036c9ce900c48a7f60
    eui.e5f46fe18c8acb036c9ce900c48a7f60
    VAAI Plugin Name:
    ATS Status: supported
    Clone Status: unsupported
    Zero Status: supported
    Delete Status: supported
  4. Change to the datastore directory.
    cd /vmfs/volumes/datastore_name
  5. Run vmkfstools to trigger SCSI UNMAPs.
    vmkfstools -y 60
    Note: the value for the percentage has to be between 0 and 100. Generally, I recommend starting with 60.
    For ESXi 5.5, use the following instead:
    esxcli storage vmfs unmap -l datastore_name
  6. Wait until the ESXi host returns “Done”.
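
For reference, a complete reclaim run on ESXi 5.5 might look like the following sketch; the datastore label NimbleDS01 is just an example, and --reclaim-unit (the number of VMFS blocks unmapped per iteration, default 200) is optional:

# Reclaim free space on the datastore (label is an example)
esxcli storage vmfs unmap --volume-label=NimbleDS01
# Optionally control how many VMFS blocks are reclaimed per iteration
esxcli storage vmfs unmap --volume-label=NimbleDS01 --reclaim-unit=200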


Further details for ESXi 5.0 and 5.1 can be found here, and for ESXi 5.5, please click here.


Crucial Data In Your VMware ESXi 5 Log Files

As an Escalation Engineer, part of my daily work is reviewing log files of various systems and vendors. In my first blog post, I would like to show which VMware ESXi 5 log files are most relevant for troubleshooting storage and networking related problems.

All current ESXi 5 logs are located under /var/log; as they rotate, they’ll be available under /scratch/logs.


[Screenshot: ESXi log files under /var/log]

/var/log/vmkernel.log:

  • VMkernel related activities, such as:
    • Rescan and unmount of storage devices and datastores
    • Discovery of new storage like iSCSI and FCP LUNs
    • Networking (vmnic and vmks connectivity)

/var/log/vmkwarning.log:

  • Extracted warning and alert messages from the vmkernel.log

/var/log/hostd.log:

  • Logs related to the host management service
  • SDK connections
  • vCenter tasks and events
  • Connectivity to vpxa service, which is the vCenter agent on the ESXi server

/var/log/vobd.log:

  • VMkernel observations
  • Useful for network and performance issues

Also, if a particular VM is affected, it might be worth looking into the vmware.log file which is stored with the virtual machine. You can find the log under /vmfs/volumes/datastore_name/VM_name/vmware.log.
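
To pull the relevant entries out of these files quickly, a few BusyBox commands from an SSH session usually go a long way; a short sketch (the grep patterns are just examples):

# Follow VMkernel messages live while reproducing an issue
tail -f /var/log/vmkernel.log
# Show only SCSI/iSCSI related lines from the current log
grep -i -e scsi -e iscsi /var/log/vmkernel.log
# Show the most recent warnings and alerts
tail -n 50 /var/log/vmkwarning.log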

The location of ESXi 3.5 and 4.x log files can be found here.

Silicon Valley VMUG – Double-Take & VSAN

Today, I attended my first Silicon Valley VMUG at the Biltmore Hotel and Suites in San Jose, CA. Vision Solutions presented their software Double-Take, which provides real-time high availability. Joe Cook, Senior Technical Marketing Manager at VMware, provided an overview of VSAN and its requirements.

[Photo: Silicon Valley VMUG]

I took a couple of notes for both presentations and summarized the most important points below:

Double-Take Availability

  • Allows P2V, V2P, P2P, and V2V migrations across hypervisors
  • Provides hardware- and application-independent failover
  • Monitors availability and provides alerting via SNMP and email
  • Supports VMware vSphere 5.0 and 5.1, as well as Microsoft Hyper-V (Server and Role) 2008 R2 and 2012
  • Full server migration and failover only available for Windows. Linux version will be available in Q4.

Double-Take Replication

  • Uses byte-level replication which continuously looks out for changes and transfers them
  • Either real-time or scheduled
  • Replication can be throttled

Double-Take Move

  • Provides file and folder migration
  • Does NOT support mounted file shares. Disk needs to show as a local drive


VMware Virtual SAN (VSAN) by Joe Cook

Hardware requirements:

  • Any Server on the VMware Compatibility Guide
  • At least 1 of each
    • SAS/SATA/PCIe SSD
    • SAS/NL-SAS/SATA HDD
  • 1Gb/10Gb NIC
  • SAS/SATA Controllers (RAID Controllers must work in “pass-through” mode or RAID 0)
  • 4GB to 8GB (preferred) USB, SD Cards

Implementation requirements:

  • Minimum of 3 hosts in a cluster configuration
  • All 3 hosts must contribute storage
  • vSphere 5.5 U1 or later
  • Maximum of 32 hosts
  • Locally attached disks
    • Magnetic disks (HDD)
    • Flash-based devices (SSD)
  • 1Gb or 10Gb (preferred) Ethernet connectivity

Virtual SAN Datastore

  • Distributed datastore capacity, aggregating disk groups found across multiple hosts within the same vSphere cluster
  • Total capacity is based on magnetic disks (HDDs) only.
  • Flash based devices (SSDs) are dedicated to VSAN’s caching layer

Virtual SAN Network

  • Requires a dedicated VMkernel interface for Virtual SAN traffic
    • Used for intra-cluster communication and data replication
  • Standard and Distributed vSwitches are supported
  • NIC teaming – used for availability not for bandwidth
  • Layer 2 Multicast must be enabled on physical switches

Virtual SAN Scalable Architecture

  • VSAN provides scale up and scale out architecture
    • HDDs are used for capacity
    • SSDs are used for performance
    • Disk Groups are used for performance and capacity
    • Nodes are used for compute capacity

Additional information

  • VSAN is a cluster level feature like DRS and HA
  • VSAN is deployed, configured, and managed through the vSphere Web Client only
  • Hands-on labs are available here