Well, that’s one objective dealt with. Just another 30-odd to go.
Don’t forget that reading is not everything; you need to have done as much of this stuff as possible.
There are links to VMware’s product documentation throughout. Be sure to be familiar with it, as the language and methods used there should be the same as you’ll find in the VCAP exam (I hope).
To put an application into a virtual environment you must first understand its I/O requirements to make sure that it will perform adequately on the storage that you have configured. This is very much like determining how much CPU and memory an application will need: it’s a necessary step. Of course you can skip it, and in the majority of cases that won’t be an issue, but to keep your finger on capacity management and to avoid possible problems it is best to follow a defined and repeatable set of steps.
It’s also useful to be able to profile the I/O workload of applications already on a virtual platform.
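To make the idea of an I/O profile a bit more concrete, here is a minimal Python sketch of the sort of summary I’d want before sizing storage. The `IoSample` fields and the `summarise` function are my own hypothetical names, not the output format of any particular tool; you would feed in whatever counters your monitoring tool actually gives you.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IoSample:
    """One monitoring interval. Field names are hypothetical."""
    reads_per_sec: float
    writes_per_sec: float
    bytes_per_sec: float


def summarise(samples):
    """Reduce a list of IoSample objects to a few sizing figures."""
    if not samples:
        raise ValueError("need at least one sample")
    iops = [s.reads_per_sec + s.writes_per_sec for s in samples]
    total_ops = sum(iops)
    total_reads = sum(s.reads_per_sec for s in samples)
    total_bytes = sum(s.bytes_per_sec for s in samples)
    return {
        "avg_iops": mean(iops),
        "peak_iops": max(iops),
        # Share of operations that are reads (0.0 to 1.0).
        "read_fraction": total_reads / total_ops if total_ops else 0.0,
        # Average transfer size per operation, in KB.
        "avg_io_size_kb": total_bytes / total_ops / 1024 if total_ops else 0.0,
    }
```

The read/write split and the average I/O size matter as much as the raw IOPS figure, because they decide how hard the workload will be on a given RAID level.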
So, what do we need to know? Well, although I have a fairly well-rounded set of skills, I am more of a Microsoft guy than a Unix guy. As such, I’m better at looking at I/O on Windows systems than anything else. Obviously this is something that I need to address, but it’s not really in the scope of this post.
Kevin Kline, a SQL MVP, has a short video hosted here that is targeted at physical SQL servers – the kind of load you’re likely to want to do this with.
There’s also a good MS Technet article about determining the I/O requirements for Exchange 2003 that’s a useful read.
Next up, I’d suggest having a look at SWAT. It’s Sun’s Java-based I/O monitoring tool.
Also, get to know vscsiStats. Scott Drummonds (a guru in this area) has a VMware communities page that is a must-read. Next, read the useful posts by Duncan Epping (YellowBricks), Gabrie van Zanten (Gabe’s Virtual World) and Gabe’s other post on making the output data Excel-friendly.
What is LUN masking? explains what LUN masking is in layman’s terms (in case you have a NAS only background). See Storage Masking? for the Yellow Bricks advice on LUN masking.
For an overview of PSA and commands, see VMware vSphere 4.1 PSA.
Also see the vSphere CLI guide, vSphere Command-Line Interface Installation and Reference Guide.
VMFS Resignaturing is worth a read. See also the section on Managing Duplicate VMFS Datastores in the ESX Configuration Guide.
From the ESX Configuration Guide.
When you perform VMFS datastore management operations, vCenter Server uses default storage filters. The filters help you to avoid storage corruption by retrieving only the storage devices, or LUNs, that can be used for a particular operation. Unsuitable LUNs are not displayed for selection. You can turn off the filters to view all LUNs.
Before making any changes to the LUN filters, consult with the VMware support team. You can turn off the filters only if you have other methods to prevent LUN corruption.
- In the vSphere Client, select Administration > vCenter Server Settings.
- In the settings list, select Advanced Settings.
- In the Key text box, type a key.
| Key | Filter Name |
| --- | --- |
| config.vpxd.filter.vmfsFilter | VMFS Filter |
| config.vpxd.filter.rdmFilter | RDM Filter |
| config.vpxd.filter.SameHostAndTransportsFilter | Same Host and Transports Filter |
| config.vpxd.filter.hostRescanFilter | Host Rescan Filter |

NOTE If you turn off the Host Rescan Filter, your hosts continue to perform a rescan each time you present a new LUN to a host or a cluster.
- In the Value text box, type False for the specified key.
- Click Add.
- Click OK.
You are not required to restart the vCenter Server system.
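If you want the keys handy in a script or a checklist, something like the following Python sketch would do. It only builds the key/value pairs that you would type into the Advanced Settings dialog; it does not talk to vCenter at all. The function name `settings_to_disable` is hypothetical, and the keys are copied from the list above.

```python
# Keys copied from the ESX Configuration Guide list above.
STORAGE_FILTER_KEYS = {
    "VMFS Filter": "config.vpxd.filter.vmfsFilter",
    "RDM Filter": "config.vpxd.filter.rdmFilter",
    "Same Host and Transports Filter":
        "config.vpxd.filter.SameHostAndTransportsFilter",
    "Host Rescan Filter": "config.vpxd.filter.hostRescanFilter",
}


def settings_to_disable(filter_names):
    """Return (key, value) pairs to enter in Advanced Settings.

    Raises KeyError if a filter name is not one of the four known filters.
    """
    return [(STORAGE_FILTER_KEYS[name], "False") for name in filter_names]


if __name__ == "__main__":
    for key, value in settings_to_disable(["RDM Filter"]):
        print(f"Key: {key}  Value: {value}")
```

Remember the warning from the guide: only turn a filter off if you have some other way of preventing LUN corruption.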
Read Performance Characterization of VMFS and RDM Using a SAN. It was written for ESX 3.5 but still holds true. The conclusion from the document is:
VMware ESX Server offers two options for disk access management—VMFS and RDM. Both options provide clustered file system features such as user‐friendly persistent names, distributed file locking, and file permissions. Both VMFS and RDM allow you to migrate a virtual machine using VMotion. This study compares the performance characteristics of both options and finds only minor differences in performance. For random workloads, VMFS and RDM produce similar I/O throughput. For sequential workloads with small I/O block sizes, RDM provides a small increase in throughput compared to VMFS. However, the performance gap decreases as the I/O block size increases. For all workloads, RDM has slightly better CPU cost.
The test results described in this study show that VMFS and RDM provide similar I/O throughput for most of the workloads we tested. The small differences in I/O performance we observed were with the virtual machine running CPU‐saturated. The differences seen in these studies would therefore be minimized in real life workloads because most applications do not usually drive virtual machines to their full capacity. Most enterprise applications can, therefore, use either VMFS or RDM for configuring virtual disks when run in a virtual machine.
However, there are a few cases that require use of raw disks. Backup applications that use such inherent SAN features as snapshots or clustering applications (for both data and quorum disks) require raw disks. RDM is recommended for these cases. We recommend use of RDM for these cases not for performance reasons but because these applications require lower level disk control.
And read Use RDMs for Practical Reasons and Not Performance Reasons too.
There is a section in the ESX Configuration Guide that is relevant.
Obviously the choice of storage vendor and the underlying technologies play a part here but there are some general guidelines that apply regardless. VMware themselves have a short page on this which I have copied below:
Many of the best practices for physical storage environments also apply to virtual storage environments. It is best to keep in mind the following rules of thumb when configuring your virtual storage infrastructure:
Configure and size storage resources for optimal I/O performance first, then for storage capacity.
This means that you should consider throughput capability and not just capacity. Imagine a very large parking lot with only one lane of traffic for an exit. Regardless of capacity, throughput is affected. It’s critical to take into consideration the size and storage resources necessary to handle your volume of traffic—as well as the total capacity.
Aggregate application I/O requirements for the environment and size them accordingly.
As you consolidate multiple workloads onto a set of ESX servers that have a shared pool of storage, don’t exceed the total throughput capacity of that storage resource. Looking at the throughput characterization of physical environment prior to virtualization can help you predict what throughput each workload will generate in the virtual environment.
Base your storage choices on your I/O workload.
Use an aggregation of the measured workload to determine what protocol, redundancy protection and array features to use, rather than using an estimate. The best results come from measuring your applications’ I/O throughput and capacity for a period of several days prior to moving them to a virtualized environment.
Remember that pooling storage resources increases utilization and simplifies management, but can lead to contention.
There are significant benefits to pooling storage resources, including increased storage resource utilization and ease of management. However, at times, heavy workloads can have an impact on performance. It’s a good idea to use a shared VMFS volume for most virtual disks, but consider placing heavy I/O virtual disks on a dedicated VMFS volume or an RDM to reduce the effects of contention.
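The “performance first, then capacity” advice can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Workload` tuple, the `check_datastore` function and the 20% headroom default are hypothetical, and the datastore’s IOPS ceiling is something you would have to get from your array vendor or from testing.

```python
from typing import NamedTuple


class Workload(NamedTuple):
    name: str
    peak_iops: float      # measured over several days, as suggested above
    capacity_gb: float


def check_datastore(workloads, datastore_iops, datastore_gb, headroom=0.2):
    """Check aggregate I/O first, then capacity.

    Returns "io" if the I/O demand does not fit, "capacity" if only the
    space does not fit, and "ok" otherwise. `headroom` is the fraction of
    each resource deliberately left unused (an assumed safety margin).
    """
    need_iops = sum(w.peak_iops for w in workloads)
    need_gb = sum(w.capacity_gb for w in workloads)
    if need_iops > datastore_iops * (1 - headroom):
        return "io"
    if need_gb > datastore_gb * (1 - headroom):
        return "capacity"
    return "ok"
```

A workload that pushes the result to "io" on a shared volume is a candidate for the dedicated VMFS volume or RDM mentioned in the last point.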
As far as vendor specific configuration goes, NetApp’s TR-3428 document is worth a read and maybe also this document from EMC. On the subject of EMC, Alan Renouf and Simon Seagrave ran a session at a recent London VMUG meeting that may also be of interest. Find out about it here.
There isn’t a single rule for this; there are more like thousands of rules! Basically, have an idea of what workloads your VMs are generating in terms of I/O and try to balance them out. Also bear in mind that write-intensive loads will perform better on RAID 10 than on RAID 5, but RAID 10 needs more disks than RAID 5 for the same usable capacity.
Although it is not specifically about RAID and it talks about EMC storage, Optimal VM Placement offers some interesting thoughts and mentions the alarms that can be set in vCenter that are useful for monitoring problems:
- VM Disk Usage (KBps)
- Total Disk Latency (ms)
- VM Disk Abort
- VM Disk resets
Also, as a rule of thumb, if a server consistently generates a certain number of IOPS as either reads or writes on physical hardware, it will probably generate the same on virtual hardware. So it follows that if you’d use RAID 10 for that physical server, you should use a RAID 10 LUN for the virtual server. It’s common sense gained from experience, really.
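The RAID 10 versus RAID 5 trade-off can be put into numbers with the commonly quoted write penalties (2 back-end I/Os per write for RAID 10, 4 for RAID 5). The Python sketch below is a rough illustration only: the penalty figures are rules of thumb rather than anything from VMware’s documentation, real arrays with write caches will behave differently, and the function names are my own.

```python
# Commonly quoted back-end I/Os per front-end write (rule-of-thumb values).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4}


def backend_iops(frontend_iops, write_fraction, raid_level):
    """Estimate the disk-level IOPS needed to serve a front-end workload."""
    reads = frontend_iops * (1 - write_fraction)
    writes = frontend_iops * write_fraction
    return reads + writes * WRITE_PENALTY[raid_level]


def usable_disks(total_disks, raid_level):
    """Number of disks' worth of usable capacity in a single RAID group."""
    if raid_level == "RAID10":
        return total_disks // 2   # half the disks hold mirror copies
    if raid_level == "RAID5":
        return total_disks - 1    # one disk's worth of parity
    raise ValueError(f"unknown RAID level: {raid_level}")


if __name__ == "__main__":
    for level in ("RAID10", "RAID5"):
        print(level, backend_iops(1000, 0.5, level), usable_disks(8, level))
```

So a 50% write workload costs noticeably more back-end I/O on RAID 5, while RAID 10 gives up more raw capacity to get there.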
From How NPIV-Based LUN Access Works:
SAN objects, such as switches, HBAs, storage devices, or virtual machines can be assigned World Wide Name (WWN) identifiers. WWNs uniquely identify such objects in the Fibre Channel fabric. When virtual machines have WWN assignments, they use them for all RDM traffic, so the LUNs pointed to by any of the RDMs on the virtual machine must not be masked against its WWNs. When virtual machines do not have WWN assignments, they access storage LUNs with the WWNs of their host’s physical HBAs. By using NPIV, however, a SAN administrator can monitor and route storage access on a per virtual machine basis. The following section describes how this works.
NPIV enables a single FC HBA port to register several unique WWNs with the fabric, each of which can be assigned to an individual virtual machine. When a virtual machine has a WWN assigned to it, the virtual machine’s configuration file (.vmx) is updated to include a WWN pair (consisting of a World Wide Port Name, WWPN, and a World Wide Node Name, WWNN). As that virtual machine is powered on, the VMkernel instantiates a virtual port (VPORT) on the physical HBA which is used to access the LUN. The VPORT is a virtual HBA that appears to the FC fabric as a physical HBA, that is, it has its own unique identifier, the WWN pair that was assigned to the virtual machine. Each VPORT is specific to the virtual machine, and the VPORT is destroyed on the host and it no longer appears to the FC fabric when the virtual machine is powered off.
If NPIV is enabled, four WWN pairs (WWPN & WWNN) are specified for each virtual machine at creation time. When a virtual machine using NPIV is powered on, it uses each of these WWN pairs in sequence to try to discover an access path to the storage. The number of VPORTs that are instantiated equals the number of physical HBAs present on the host up to the maximum of four. A VPORT is created on each physical HBA that a physical path is found on. Each physical path is used to determine the virtual path that will be used to access the LUN. Note that HBAs that are not NPIV-aware are skipped in this discovery process because VPORTs cannot be instantiated on them.
Note: If a user has four physical HBAs as paths to the storage, all physical paths must be zoned to the virtual machine by the SAN administrator. This is required to support multipathing even though only one path at a time will be active.
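The VPORT counting rule described above is easy to lose in the prose, so here is how I read it, as a small Python sketch. The dictionary keys `npiv_aware` and `has_path` are hypothetical names for illustration; this models the rule as the documentation words it, not how the VMkernel actually implements it.

```python
MAX_WWN_PAIRS = 4  # four WWN pairs are generated per virtual machine


def vport_count(hbas):
    """Return how many VPORTs would be instantiated for one VM.

    `hbas` is a list of dicts with two booleans (hypothetical keys):
      npiv_aware - the physical HBA supports NPIV
      has_path   - a physical path to the LUN was found on this HBA
    """
    eligible = [h for h in hbas if h["npiv_aware"] and h["has_path"]]
    return min(len(eligible), MAX_WWN_PAIRS)


if __name__ == "__main__":
    host = [
        {"npiv_aware": True, "has_path": True},
        {"npiv_aware": True, "has_path": True},
        {"npiv_aware": False, "has_path": True},  # skipped: not NPIV-aware
    ]
    print(vport_count(host))
```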
That’s NPIV in a nutshell. For more detail and the requirements, read How to Configure NPIV on VMware vSphere 4.0.
DirectPath places some limitations on VMs, so it should be used with caution. Generally, any VM that uses DirectPath becomes tied to an ESX host, so vMotion and DRS will not work.
DirectPath relies on hardware I/O virtualisation support (Intel VT-d or AMD IOMMU), which must first be enabled in the ESX host’s BIOS. As a consequence, only certain systems support it. A PCI device can only be assigned to one VM at a time, and that device cannot also be used by the host. A VM can have up to two directly connected devices.
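The assignment rules above (one VM per device, and a small cap on devices per VM) can be checked mechanically. The Python sketch below is just a planning aid with hypothetical names; the default of two devices per VM is taken from the text above, so check the configuration maximums for the version you are running.

```python
def validate_assignments(assignments, max_devices_per_vm=2):
    """Check a planned list of (vm_name, pci_address) pairs.

    Returns a list of human-readable problems; an empty list means the
    plan respects both rules.
    """
    problems = []
    owner = {}    # pci_address -> vm_name
    per_vm = {}   # vm_name -> set of pci_addresses
    for vm, device in assignments:
        if device in owner and owner[device] != vm:
            problems.append(
                f"{device} is already assigned to {owner[device]}; "
                f"it cannot also be given to {vm}"
            )
            continue
        owner[device] = vm
        per_vm.setdefault(vm, set()).add(device)
    for vm, devices in per_vm.items():
        if len(devices) > max_devices_per_vm:
            problems.append(
                f"{vm} has {len(devices)} devices; "
                f"the limit is {max_devices_per_vm}"
            )
    return problems
```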
The advantage that DirectPath gives is the ability for devices not directly supported by VMware to be attached to VMs. Also, by circumventing the virtualisation layer, greater performance can be achieved by a VM using a directly connected device. Typically DirectPath is used to assign high speed, dedicated NICs to high performance VMs. Other use cases include attaching locally attached USB devices to a VM.
Simon Long explains DirectPath well in VMware DirectPath I/O. An example of using USB devices can be seen in Enable USB Support for ESXi with VMDirectPath.
See also Configuration Examples for DirectPath for more detail and examples.