The Situation:
1. We have two SuperMicro servers running ESXi 5.1 that back up to each other each night using vmkfstools. Each server has a Windows 2003 Server VM running Windows Services for UNIX Version 3.5, with a system virtual hard drive and a data virtual hard drive; the data drive is an NTFS compressed volume shared over NFS. All VMs are thin-provisioned. This combination has worked fine since 2008. I expanded the NFS volume on Server2 from 200 GB to 500 GB. I unmounted and remounted it afterward, and it shows 499 GB.
The Problem:
1. When Server2 backs up to the NFS volume on Server1, the clones go fine. However, when Server1 backs up to Server2, the largest VM (55 GB thin, 80 GB declared) fails at the 90% point on each attempt, even if I clear space. It shouldn't need more space, as the drive only holds about 51 GB of data on a 500 GB NFS volume, and the VMware Client shows the same amount of free space no matter which server I check it from.
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/datastore1/my_vm/my_vm.vmdk'...
Clone: 90% done.Failed to clone disk: There is not enough space on the file system for the selected operation (13).
All of the other VMs after it back up fine. Only the larger VM fails, and I can back it up fine to a local directory.
2. When I check the size of the virtual hard drive that holds the NFS volume, it shows a provisioned size of 500 GB with an on-disk size of 398 GB. That is odd, because the drive was only 200 GB before I expanded it (I expanded it because I could no longer back up the larger VM), and I never copied anywhere near enough data to it to inflate it like that. It also shows only 51 GB in use, which is too small, even compressed.
Thoughts: Windows sees the large volume and ESXi sees the large volume, but the usable space must not be any larger. I could just remove the NFS sharing from the volume and re-add it, but the size on disk of the thin vmdk shouldn't be anywhere near that large and the contents should be more than 51 GB, which made me wonder if there was something fishy with the .vmdk.
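One way to sanity-check a thin disk from the ESXi shell is to compare the size the file claims to be with the blocks it actually occupies. The sketch below is only an illustration: `vmdk_usage` is a hypothetical helper name, and it assumes you point it at the extent file (typically the `-flat.vmdk`) and that `ls`, `du` and `awk` are available in your shell.

```shell
# Hypothetical helper: compare provisioned (apparent) size with allocated size.
# Assumption: $1 is the path to the disk's extent file, e.g. a -flat.vmdk.
vmdk_usage() {
    file="$1"
    # Apparent size in bytes, taken from the size column of ls -l.
    apparent_bytes=$(ls -l "$file" | awk '{print $5}')
    # Space actually allocated on the datastore, in KiB.
    allocated_kb=$(du -k "$file" | awk '{print $1}')
    echo "provisioned_kb=$((apparent_bytes / 1024)) allocated_kb=$allocated_kb"
}
```

If the allocated figure is far above what the guest says it is storing, as with the 398 GB versus 51 GB above, that is a hint the vmdk itself deserves a closer look.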
Solution:
- Went into settings and removed virtual hard drive 2 (this acts like unplugging a USB drive from the Windows server)
- Made another virtual hard disk and saved the settings.
- Removed the virtual hard drive I just made
- Logged in over SSH and renamed the nas-2 directory to nas-2.old
- Made a new nas-2 directory
- vmkfstools -i admin2_1.vmdk -d thin ./nas-2/nas-2.vmdk
- Deleted admin2_1.vmdk
- Added the new nas-2.vmdk virtual hard drive (shows provisioned 500 GB, space on disk 0)
- Inside 2003 Server I initialized and did a quick format on the drive, shared it as nas-2 again, and added NFS (space on disk is now 82 MB)
- I kicked off the GhettoVCB backup from the 2003 Task Scheduler to repeat last night's backup, and it worked. No errors, and the size on disk for the .vmdk stands at 201 GB, which is just about right. I had expanded it from 200 GB because I couldn't quite back up the large VM anymore.
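The SSH part of the steps above can be sketched as a small script. This is a hedged outline, not the exact session: it assumes the working directory is the datastore folder holding nas-2 and admin2_1.vmdk, the `run` and `rebuild_nas_disk` names are made up for illustration, and removing the temporary disk with `vmkfstools -U` is my assumption about how the delete could be done. It prints the commands by default so you can review them first.

```shell
# Dry-run by default: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper mirroring the SSH steps from the list above.
rebuild_nas_disk() {
    run mv nas-2 nas-2.old      # keep the old directory as a fallback
    run mkdir nas-2             # fresh directory for the new disk
    # Clone the temporary disk into a new thin-provisioned vmdk.
    run vmkfstools -i admin2_1.vmdk -d thin ./nas-2/nas-2.vmdk
    # Assumption: remove the temporary disk with vmkfstools rather than rm.
    run vmkfstools -U admin2_1.vmdk
}
```

Calling `rebuild_nas_disk` prints the four commands; setting `DRY_RUN=0` first would actually run them, so only do that once the VM no longer has the old disk attached.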
Lessons learned:
- The best way to expand a virtual hard drive in ESXi is to start with a new one.
- If the size of the vmdk doesn't make sense, there is something fishy with the vmdk.
- You can rely on vmkfstools for straight answers.