
Troubleshooting: Missing file after vMotion attempt

I apologise in advance if this doesn’t make much sense to you. It took me a while to unravel what was wrong and I still don’t know why.
Update 5 was being applied to a 3.5 cluster and one of the hosts was being placed into maintenance mode. Most of the VMs were migrated to other hosts but one failed part way through and was powered off. At first I thought that one of the two hosts had a grumpy moment but the VM then refused to power on again and the following message was shown:

Ooops. I had a good rummage in the hostd.log file on the host that attempted to power on the VM and found the following messages:

[text][2010-01-27 12:33:09.101 ‘BaseLibs’ 133225392 info] DISKLIB-VMFS : "/vmfs/volumes/4a081291-4fb12f12-bef0-001e0bcdc996/myvm/mydisk_1-000001-delta.vmdk" : open successful (21) size = 16106127360, hd = 0. Type 8
[2010-01-27 12:33:09.103 ‘BaseLibs’ 133225392 info] DISKLIB-VMFS : "/vmfs/volumes/4a081291-4fb12f12-bef0-001e0bcdc996/myvm/mydisk_1-000001-delta.vmdk" : closed.
[2010-01-27 12:33:09.151 ‘BaseLibs’ 133225392 info] SNAPSHOT: Unable to find all files for ‘/vmfs/volumes/4992b455-063c9aec-5e36-001e0bcdc996/mytemplate/mydisk_1.vmdk’
[2010-01-27 12:33:56.219 ‘vm:/vmfs/volumes/4a081291-4fb12f12-bef0-001e0bcdc996/myvm/myvm.vmx’ 20868016 info] Question info: VMware ESX Server cannot find the virtual disk "/vmfs/volumes/4992b455-063c9aec-5e36-001e0bcdc996/mytemplate/mydisk_1.vmdk". Please verify the path is valid and try again.
Cannot open the disk ‘/vmfs/volumes/4a081291-4fb12f12-bef0-001e0bcdc996/myvm/mydisk_1-000001.vmdk’ or one of the snapshot disks it depends on.
[2010-01-27 12:33:56.240 ‘ha-eventmgr’ 20868016 info] Event 81 : Message on myvm on myhost.local in ha-datacenter: VMware ESX Server cannot find the virtual disk "/vmfs/volumes/4992b455-063c9aec-5e36-001e0bcdc996/mytemplate/mydisk_1.vmdk". Please verify the path is valid and try again.
Cannot open the disk ‘/vmfs/volumes/4a081291-4fb12f12-bef0-001e0bcdc996/myvm/mydisk_1-000001.vmdk’ or one of the snapshot disks it depends on.[/text]

(I’ve sanitised this log file snippet so the names aren’t accurate but they are consistent with the issue that I discovered.)
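When the log is long, the missing-disk messages can be pulled out with a short script rather than read by eye. This is only a sketch: the function name `missing_disks` is made up for illustration, and the pattern assumes the message wording shown in the snippet above.

```python
import re

# Matches the quoted path in: cannot find the virtual disk "<path>"
# (wording taken from the hostd.log snippet above; other releases may differ)
MISSING_DISK = re.compile(r'cannot find the virtual disk "([^"]+)"')


def missing_disks(log_text):
    """Return the distinct disk paths reported missing, in first-seen order."""
    seen = []
    for path in MISSING_DISK.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


sample = (
    'Question info: VMware ESX Server cannot find the virtual disk '
    '"/vmfs/volumes/4992b455-063c9aec-5e36-001e0bcdc996/mytemplate/mydisk_1.vmdk". '
    'Please verify the path is valid and try again.\n'
    'Event 81 : Message on myvm on myhost.local in ha-datacenter: '
    'VMware ESX Server cannot find the virtual disk '
    '"/vmfs/volumes/4992b455-063c9aec-5e36-001e0bcdc996/mytemplate/mydisk_1.vmdk". '
    'Please verify the path is valid and try again.\n'
)

for path in missing_disks(sample):
    print(path)
```

Both log entries name the same file, so the path is reported once.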

Firstly, the log file shows a delta file, which means that the VM is running from a snapshot. I hadn’t noticed this beforehand and the Snapshot Manager did not list it. Most likely VCB (or the backup software using it) didn’t clean up after itself. Browsing the datastore where the VM resides showed that the snapshot was nearly two weeks old.

Secondly, you can see the issue from the third line onwards. It looks like the base disk file has gone missing. However, reading more closely, the base disk is on a different datastore and is actually part of a different VM! For some reason, when this VM was deployed from a template it retained one of the template’s disks as its own. Looking into that datastore I could see the mydisk_1-flat.vmdk file but there was no mydisk_1.vmdk file. (Just to explain, the former is the actual disk file, 15GB in size and containing the VM’s data. The latter is a small text file that contains configuration data. I’ll call it the disk descriptor file.) So, it was a missing disk descriptor file that was the issue. I did a quick Google search and didn’t find anything immediately helpful so I ran through the following steps:

  1. Copied the mydisk_1-flat.vmdk file from the template VM’s datastore to the broken VM’s datastore.
  2. Knowing that the disk was supposed to be 15GB in size, I created a quick VM with a single 15GB disk and copied its disk descriptor file to the broken VM’s datastore.
  3. Next I made a note of the parentCID from the mydisk_1-000001.vmdk disk descriptor file. This value (from the snapshot delta’s disk descriptor file) is the CID of the parent disk.

    [text]# Disk DescriptorFile
    version=1
    CID=de54d5dd
    parentCID=1bb73626
    createType="vmfsSparse"
    parentFileNameHint="/vmfs/volumes/4992b455-063c9aec-5e36-001e0bcdc996/mytemplate/mydisk_1.vmdk"
    # Extent description
    RW 31457280 VMFSSPARSE "mydisk_1-000001-delta.vmdk"

    # The Disk Data Base
    #DDB

    ddb.toolsVersion = "7302"[/text]

  4. I also modified the file above to correct the parentFileNameHint value so that it referred to the local datastore and became:

    [text]parentFileNameHint="mydisk_1.vmdk"[/text]

  5. I modified the newly created 15GB disk descriptor file so that its CID matched the parentCID value from step 3, and made sure that the Extent description was correct.

    [text]# Disk DescriptorFile
    version=1
    CID=1bb73626
    parentCID=ffffffff
    createType="vmfs"

    # Extent description
    RW 31457280 VMFS "mydisk_1-flat.vmdk"

    # The Disk Data Base
    #DDB

    ddb.virtualHWVersion = "4"
    ddb.uuid = "60 00 C2 91 07 97 77 cb-87 9e 5d 9f 95 95 2c 46"
    ddb.geometry.cylinders = "1958"
    ddb.geometry.heads = "255"
    ddb.geometry.sectors = "63"
    ddb.adapterType = "lsilogic"[/text]

  6. I saved the file as mydisk_1.vmdk
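The numbers in the base disk descriptor are not arbitrary: the extent size is the flat file size in 512-byte sectors, and the cylinder count follows from that with 255 heads and 63 sectors per track. The sketch below shows the arithmetic and rebuilds a descriptor of the same shape as the one above. The helper name `base_descriptor` is hypothetical, the `ddb.uuid` line is left out on purpose (I would not want to invent one), and I am assuming the cylinder count is simply rounded down.

```python
SECTOR_BYTES = 512        # descriptor extents are counted in 512-byte sectors
HEADS = 255               # geometry values copied from the descriptor above
SECTORS_PER_TRACK = 63


def base_descriptor(flat_name, flat_size_bytes, cid, adapter_type="lsilogic"):
    """Build descriptor text for a base (non-snapshot) VMFS disk. Illustrative only."""
    if flat_size_bytes % SECTOR_BYTES:
        raise ValueError("flat file size is not a whole number of sectors")
    sectors = flat_size_bytes // SECTOR_BYTES
    # Assumption: cylinders is the sector count divided by heads*sectors, rounded down.
    cylinders = sectors // (HEADS * SECTORS_PER_TRACK)
    lines = [
        "# Disk DescriptorFile",
        "version=1",
        f"CID={cid}",
        "parentCID=ffffffff",   # value used for the base disk in the post
        'createType="vmfs"',
        "",
        "# Extent description",
        f'RW {sectors} VMFS "{flat_name}"',
        "",
        "# The Disk Data Base",
        "#DDB",
        "",
        'ddb.virtualHWVersion = "4"',
        f'ddb.geometry.cylinders = "{cylinders}"',
        f'ddb.geometry.heads = "{HEADS}"',
        f'ddb.geometry.sectors = "{SECTORS_PER_TRACK}"',
        f'ddb.adapterType = "{adapter_type}"',
    ]
    return "\n".join(lines) + "\n"


# A 15GB flat file, with the CID taken from the snapshot's parentCID.
print(base_descriptor("mydisk_1-flat.vmdk", 15 * 1024**3, "1bb73626"))
```

For a 15GB disk this gives 31457280 sectors and 1958 cylinders, which matches the descriptor copied from the throwaway VM.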

The VM then powered on successfully. I checked the disks after successful boot up and they’re there.
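Before powering on, the link between the snapshot and its base disk can be sanity-checked by comparing the two descriptor files: the snapshot's parentCID has to equal the base disk's CID. A minimal sketch of that check, with hypothetical function names and abbreviated descriptor text:

```python
def parse_descriptor(text):
    """Collect key=value pairs from descriptor text, skipping comments and extent lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def chain_is_consistent(child_text, parent_text):
    """True when the child's parentCID matches the parent's CID."""
    child = parse_descriptor(child_text)
    parent = parse_descriptor(parent_text)
    parent_cid = parent.get("CID")
    if parent_cid is None:
        return False
    return child.get("parentCID", "").lower() == parent_cid.lower()


snapshot = """# Disk DescriptorFile
version=1
CID=de54d5dd
parentCID=1bb73626
createType="vmfsSparse"
parentFileNameHint="mydisk_1.vmdk"
RW 31457280 VMFSSPARSE "mydisk_1-000001-delta.vmdk"
"""

base = """# Disk DescriptorFile
version=1
CID=1bb73626
parentCID=ffffffff
createType="vmfs"
RW 31457280 VMFS "mydisk_1-flat.vmdk"
"""

print(chain_is_consistent(snapshot, base))
```

The check only looks at the CID values; it does not confirm that the file named in parentFileNameHint actually exists.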

Now all that remains is to sort out the snapshot. It still doesn’t register in snapshot manager.

This has been a bit of a hack but it worked. And before anyone comments, I just modified my google search terms and found the answer in a VMware KB – first hit! Recreating a missing virtual disk (VMDK) header/descriptor file

Beginning with Storage vMotion

Storage vMotion was introduced with ESX 3.5 about a year ago but for various reasons I’ve never had the opportunity or need to use it until now. The project that required its use has been knocking around for a few months now. It started as a single ESX server that was to be used to build a small number of VMs for use by my customer as an HR system. Three months or so after putting in that first host I was finally called back to add another and set up VirtualCenter. Installing ESX, VC and doing all of the necessary configuration was straightforward (mostly); I’ve done it a number of times before. Migrating the live VMs from their location on local storage on the first host to a SAN LUN was the new bit, the bit that required Storage vMotion. It didn’t go smoothly all the way.

vMotion (the movement of a running VM from one host to another without downtime) is an established technology. It works and it’s built into the VI client. Storage vMotion doesn’t have a button or context menu entry by default. It needs to be invoked from a command line or a plugin needs to be installed. I investigated two options, tried them both and still had a few issues.

Simple SVMotion GUI by Alexander Gaiswinkler

In the hunt for svmotion utilities, this is the one that I found first. It can be downloaded from a link in a VMware communities thread here and requires the Remote CLI for VMware to be installed. Installation of the SVMotion GUI isn’t too difficult. It is merely a zip file containing a Perl script and a Windows executable. The Perl script has to be moved to the bin directory of the Remote CLI installation. The executable can be placed anywhere and just executed to launch the GUI.

Simple SVMotion GUI

The GUI is quite straightforward to use although there are a couple of problems that I encountered. The first was with the datacenter name. VirtualCenter will allow you to create a datacenter with a name that contains certain characters that the Simple SVMotion GUI will not work with, spaces for instance. The datacenter name had been selected as “xxx – London”. The GUI was unable to connect and return a list of datastores when this was entered so I had to rename the datacenter in VirtualCenter to be just “xxx”.

The second issue I hit was with the source datastores. When ESX was installed a VMFS partition was created automatically with the remainder of the disk space after all of the other partitions had been sized. ESX creates this datastore with a name of “[hostname:storage1]” or something like that. However, the colon caused a problem with the svmotion.pl script, so I had to rename the datastore to “[hostname_storage1]”.

With those two issues sorted I was able to svMotion two of the five VMs stored locally on one host to the SAN space. The other three would not go but that’s a story for later.
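If there are a lot of datacenters and datastores to move between, it may be worth checking the names up front. The sketch below flags only the two characters that tripped me up, the space and the colon. It is not a complete list of what the GUI or svmotion.pl dislikes, and the function name is made up for illustration.

```python
# Only the two characters reported in this post; other characters may also cause trouble.
PROBLEM_CHARS = {" ": "space", ":": "colon"}


def problem_characters(name):
    """Return the kinds of problem character found in a datacenter or datastore name."""
    return sorted({PROBLEM_CHARS[ch] for ch in name if ch in PROBLEM_CHARS})


for name in ["xxx – London", "hostname:storage1", "xxx", "hostname_storage1"]:
    found = problem_characters(name)
    status = "rename before svMotion: " + ", ".join(found) if found else "ok"
    print(f"{name!r}: {status}")
```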

Simple SVMotion GUI is certainly simple but it does its job. There are limitations though.

vip-svmotion by Andrew Kutz

This is actually a plugin for the VI client and looks a lot like what you would expect VMware to create themselves. Who knows, the next release of VirtualCenter and the VI client may include it. The plugin can be downloaded from SourceForge here and the installation is straightforward. In fact, once installed, the plugin is automatically enabled within the VI client.

Using it is pretty simple: just right-click on a VM and select Migrate Storage… from the context menu.

vip-svmotion plugin context menu entry

This opens a window that shows the currently configured storage and allows a new location to be selected. My apologies, I would have included a screenshot with lots of detail but I have to be careful about pasting customer-specific data into a public blog!

vip-svmotion properties

The top four disk icons are the local storage on the two hosts. The VM (and its single Hard Disk) is stored on the second local VMFS of the first host. Relocating the storage is simply a case of dragging the VM (the grey box) to the SAN LUN (the bottom disk icon) and applying the change.

This is more like how svMotion should be done. It’s all kept in the VI client and is easy to follow.

Other svMotion Issues

I mentioned that I had some other issues with svMotion. Three of the VMs that I had to move would not go. In case anyone else runs into these problems, I will describe them here.

1. Floppy drive

A handful of the VMs had actually been migrated from an old ESX 2.x environment and the default hardware configuration had probably not been changed when they were first set up. The new ESX hosts (HP DL585s) did not have floppy drives fitted, yet the .vmx files for two of the VMs were trying to map virtual floppy drives to a real one on the host. For some reason this was causing the following message to appear when svMotion was attempted:

Device ‘Floppy Drive 1’ has a backing type that is not supported. This is a general limitation of the host.

VirtualCenter wouldn’t let me remove the floppy drive while the VM was running and it also suggested that the .vmx file had been edited by hand at some point. So, to fix this I had to shut down the VM, which was unfortunate as avoiding downtime was the whole point of using svMotion in the first place.

Uneditable device

Once I got change approval to shut the VM down, I downloaded the .vmx file and changed the floppy configuration to remove the floppy drive.

floppy0.present = "true"

became

floppy0.present = "false"

The VM had to be removed from inventory and then re-added with the modified .vmx file but after that Storage vMotion worked fine and the VM booted without issue.
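With more than a couple of VMs in this state, the same edit can be scripted against a downloaded copy of the .vmx file. This is a sketch only: `disable_floppy` is a made-up helper, it works on the file's text rather than talking to the host, and the `floppy0.fileName` line in the sample is invented to show that other settings are left alone. The VM still has to be powered off and re-registered as described above.

```python
import re

# Matches a line such as: floppy0.present = "true" (straight quotes, any case)
FLOPPY_PRESENT = re.compile(
    r'^(\s*floppy0\.present\s*=\s*)"[^"]*"',
    re.IGNORECASE | re.MULTILINE,
)


def disable_floppy(vmx_text):
    """Return the .vmx text with floppy0.present set to "false"."""
    new_text, count = FLOPPY_PRESENT.subn(r'\1"false"', vmx_text)
    if count == 0:
        raise ValueError("no floppy0.present line found")
    return new_text


# Made-up sample content for illustration.
sample = 'floppy0.present = "true"\nfloppy0.fileName = "/dev/fd0"\n'
print(disable_floppy(sample))
```

Raising an error when the line is absent is deliberate; silently writing back an unchanged file would hide the fact that nothing was fixed.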

2. Independent Disks

Again I believe that this is a VM from an older 2.x host. This time the hard disk is set to be in independent-persistent mode. I will have to shut down the VM and change the setting to allow svMotion to work. The error message that VMware provides is pretty clear as to what the problem is:

A general system error occurred: The virtual machine has virtual disk in indepedent-persistent mode that prevents migration.
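A downloaded .vmx file can be checked for this setting before svMotion is attempted. The sketch below assumes the disk mode is stored in a line of the form `scsi0:0.mode = "independent-persistent"`; that key format is my assumption rather than something shown in this post, so check it against a real .vmx file first. The function name and the sample content are made up.

```python
import re

# Assumed .vmx form: <device>.mode = "independent-<something>"
MODE_LINE = re.compile(
    r'^\s*(\S+)\.mode\s*=\s*"(independent-[a-z]+)"',
    re.IGNORECASE | re.MULTILINE,
)


def independent_disks(vmx_text):
    """Return (device, mode) pairs for disks set to an independent mode."""
    return [(device, mode.lower()) for device, mode in MODE_LINE.findall(vmx_text)]


# Made-up sample content for illustration.
sample = (
    'scsi0:0.present = "true"\n'
    'scsi0:0.fileName = "mydisk.vmdk"\n'
    'scsi0:0.mode = "independent-persistent"\n'
    'scsi0:1.mode = "persistent"\n'
)

for device, mode in independent_disks(sample):
    print(f"{device} is {mode}; change the mode before svMotion")
```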