I’m currently working with a customer, and we are using HCX’s ability to vMotion VMs between their legacy datacenter and a brand new VCF on VxRail platform. HCX doesn’t require Enhanced Linked Mode between the vCenter servers and can work back to vSphere 5.0 U3 (the caveat being that you need vCenter Server 5.1+, as HCX requires SSO). You can also use it to extend Layer 2 networks so that, for example, a VM can be vMotioned between datacenters while staying on the same network, which makes it an incredibly versatile and powerful tool.
The way that HCX vMotion works is that the Interconnect Appliance (IX) creates a dummy host on the source and destination clusters where the Service Mesh is deployed. The vMotion starts with the VM being moved to this dummy host at the source side. In the background the data is copied to the destination dummy host, and in turn to the final destination host. There’s a lot of cleverness behind the scenes, but the basic premise of a standard vMotion still applies.
We encountered an error where a VM wouldn’t migrate and the vMotion process failed pretty much straight away. In the vSphere client you’ll see something like this error message against the VM:
I did some basic troubleshooting. First I checked whether the service mesh was working, and it was; in fact we had already migrated several VMs from that cluster, which suggested to me that the issue was with the VM. I then tried to clone the VM and vMotion the clone (with the network disconnected, of course!) and that succeeded. This further cemented my suspicion that it was the VM itself. I checked whether it was protected by SRM or vSphere Replication, and it wasn’t. It also wasn’t part of any DRS rules, which can cause an HCX vMotion to fail (with a slightly different but more explanatory error message). I then checked the VM config, VMX file and VM directory but didn’t spot anything obvious.
Running out of things to check, I logged a support call on behalf of the customer, and one of our HCX engineers reached out and asked me to check for the presence of core dump files in the VM folder. The reason is that there is a bug in HCX (up to version R141 at the time of this post) where an HCX vMotion will fail if a core dump file is present in the VM directory. I hadn’t thought to check this because it usually isn’t an issue for a regular vMotion within the cluster, as any core dump files are simply moved as part of the VM’s configuration. I checked the directory of one of the problem VMs and, sure enough, there was a core dump file present, in this case vmware64-core0.gz.
I simply deleted this file and afterwards I was able to proceed with the HCX vMotion without any errors.
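If you have a lot of VMs to migrate, it may be worth checking for these files up front rather than waiting for a failure. Below is a minimal Python sketch of how that could be done, not an official tool. It assumes the datastore is reachable as an ordinary filesystem path (for example an NFS export mounted on an admin workstation), and the file name pattern is my own guess based on the one file I found (vmware64-core0.gz), so extend it if your core dumps are named differently. The function name `find_core_dumps` and the example path are made up for illustration.

```python
import sys
from fnmatch import fnmatch
from pathlib import Path

# Assumed pattern: derived only from the file seen in this post
# (vmware64-core0.gz). Add more patterns if your dumps are named differently.
CORE_DUMP_PATTERNS = ("vmware*-core*.gz",)


def find_core_dumps(root, patterns=CORE_DUMP_PATTERNS):
    """Return a sorted list of files under root whose names match a pattern."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and any(fnmatch(path.name.lower(), pattern) for pattern in patterns)
    )


if __name__ == "__main__":
    # The path is supplied by the caller, e.g. a mounted datastore directory.
    search_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for dump in find_core_dumps(search_root):
        # Report only; deleting is left as a deliberate manual step.
        print(dump)
```

Run it as `python find_core_dumps.py /mnt/datastore1` (a hypothetical mount point) and it prints one path per matching file. It only lists the files and never deletes anything, so you can decide per VM whether the core dump is still needed before removing it.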