Instance Live Migration

  • by Xiao Hua Shen, Ying Tang May 26th, 2017
  • Tags: nova, live migration

Beginning with IBM’s Bluemix Private Cloud 4.0 release, you can migrate a VM instance from one compute host to another while keeping the VM running. Live migration is useful when you do not want to interrupt the workload running on the VM.

Depending on your needs and the type of VM being migrated, you might also consider non-live migration or other types of live migration. For details about these migration types, see the OpenStack documentation.

Before you begin, you should know:

  • Only the cloud_admin role can live migrate instances.
  • You can only reliably do volume-backed live migration, that is, live migration for instances that use volumes rather than ephemeral disks.
  • Do not live migrate a VM while it is running a heavy workload. Live migration may time out if the CPU or memory usage on the VM is high.

Another type of live migration, block live migration, requires the disks to be copied from the source to the destination host. It takes more time, puts more load on the network, and has a much bigger impact on performance of the cloud.

Take the following steps to check whether your VM is volume-backed:

  1. Log on to the Horizon dashboard as cloud_admin.
  2. Under the Admin topic, expand System and click Instances. You can view a list of your instances with their details.
  3. Click the name of the VM to open the VM detail page, which has the Overview, Log, Console, and Action Log tabs.
  4. At the bottom of the Overview tab, look for a description of the instance like the following. If it is present, your VM is volume-backed:

     Volumes Attached
    
     Attached To
         <vm name> on /dev/vda
    

Live migration steps:

You can do live migration for your instance either with the Horizon dashboard or with the command line client.

Live migration using the Horizon dashboard:

  1. Log on to the Horizon dashboard as cloud_admin.
  2. Under the Admin topic, expand System, and click Instances.
  3. In the instance list table, find the source host of the VM in the Host column.
  4. In the Actions drop-down list beside the VM, click Live Migrate Instance, and you will see a pop-up window.
  5. In the pop-up window, set the following options and then click Submit to start the migration.

    • New Host: You can either use the default Automatically Schedule New Host option to determine the new host, or select a specific new host.
    • Disk Over Commit: Leave it unchecked.
    • Block Migration: Leave it unchecked.
  6. You can watch the migration progress on the top-right of the page.

    • If you see a message like Info: The instance is preparing the live migration to a new host, live migration has started. You can go to the next step to verify the result.
    • If you see an error message like Error: Failed to live migrate instance to host <New Host Name>, live migration has failed. See the Troubleshooting section for more information about fixing the problem.
  7. Back in the instance list table, check whether the Host value has changed. If it has, live migration completed successfully. Otherwise, see the Troubleshooting section for more information about fixing the problem.

Live migration using the command line:

You have two options to do live migration with the command line: either with the OpenStack client (recommended, version 3.4.0 or higher) or the Nova client.

In the OpenStack client, use the following syntax:

openstack server migrate <server id or name> # nova selects the host for you; note that without --live, the upstream OpenStack client performs a non-live (cold) migration
openstack server migrate <server id or name> --live <new host name> # specify new host

In the Nova client, use the following syntax:

nova live-migration <server id or name> # automatically scheduled to a host nova selects for you
nova live-migration <server id or name> <new host name> # specify new host

The command line does not tell you whether the migration completed successfully. Run the following command before and after the migration and compare the hypervisor hostname. If the hostname has changed, the status is ACTIVE, and you can log in to the VM the same as before, the live migration was successful.

nova list --fields name,OS-EXT-SRV-ATTR:hypervisor_hostname,status
+--------------------------------------+--------------------------+--------------------------------------+-----------+
| ID                                   | Name                     | OS-EXT-SRV-ATTR: Hypervisor Hostname | Status    |
+--------------------------------------+--------------------------+--------------------------------------+-----------+
| 91be3c43-9cc3-4a6c-8e8c-307bee1670e9 | cx-volume-test           | compute2-p.blueboxgrid.com           | ACTIVE    |
| 8042d926-a6d4-4788-8c49-f2ba585d057e | lm_test                  | compute2-p.blueboxgrid.com           | ACTIVE    |
| f31c08bb-230c-4f4c-b27d-b120b3497c7a | lm_test                  | controller1-p.blueboxgrid.com        | ACTIVE    |
| ae755449-d7c3-4674-99c5-54d521634fea | test_instance1           | None                                 | ERROR     |
| 54f87e4f-5bb0-472f-a577-996b102d18bf | walter_vm_20170525090424 | compute2-p.blueboxgrid.com           | ACTIVE    |
+--------------------------------------+--------------------------+--------------------------------------+-----------+
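
The before-and-after comparison can be scripted. The following is a minimal sketch, not part of the product: the function name host_of is hypothetical, and it assumes the column order shown in the output above (ID, Name, Hypervisor Hostname, Status). It looks up the server by ID rather than by name, because names are not unique (lm_test appears twice in the listing).

```shell
#!/bin/sh
# host_of ID: read `nova list --fields ...` table output on stdin and
# print the hypervisor hostname of the server with the given ID.
# Assumption: columns are ID, Name, Hypervisor Hostname, Status.
host_of() {
    awk -F'|' -v id="$1" '
        NF >= 5 {
            sid = $2; host = $4
            gsub(/^ +| +$/, "", sid)    # trim padding around the ID cell
            gsub(/^ +| +$/, "", host)   # trim padding around the hostname cell
            if (sid == id) print host
        }'
}

# Usage sketch (requires admin credentials in the environment):
#   fields=name,OS-EXT-SRV-ATTR:hypervisor_hostname,status
#   before=$(nova list --fields "$fields" | host_of "$server_id")
#   ... start the live migration and wait for it to finish ...
#   after=$(nova list --fields "$fields" | host_of "$server_id")
#   [ "$before" != "$after" ] && echo "host changed: $before -> $after"
```

A changed hostname is only one of the three checks described above; you still need to confirm that the status is ACTIVE and that you can log in to the VM.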

Warning:

If you use the Nova command line client and run the nova live-migration <server id or name> <new host name> command with a host name that does not exist, the VM is moved into a Migrating but suspended status. If this happens, open a ticket with the IBM Bluemix Private Cloud support team to have the issue resolved. The OpenStack client, with the syntax openstack server migrate <server id or name> --live <new host name>, does not have this issue.
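
One way to reduce the risk of a mistyped host name is to check it against the hypervisor list before you start the migration. The following is a minimal sketch under stated assumptions: host_exists is a hypothetical helper, and it only assumes that nova hypervisor-list prints its usual pipe-delimited table with the hostname in a cell of its own.

```shell
#!/bin/sh
# host_exists HOST: read `nova hypervisor-list` table output on stdin and
# succeed only if HOST matches a complete table cell (no partial matches).
host_exists() {
    awk -F'|' -v host="$1" '
        {
            for (i = 2; i < NF; i++) {
                cell = $i
                gsub(/^ +| +$/, "", cell)   # trim cell padding
                if (cell == host) found = 1
            }
        }
        END { exit (found ? 0 : 1) }'
}

# Usage sketch (requires admin credentials in the environment):
#   if nova hypervisor-list | host_exists "$new_host"; then
#       nova live-migration "$server" "$new_host"
#   else
#       echo "unknown host: $new_host" >&2
#   fi
```

The helper compares whole cells, so a truncated name such as a hostname without its domain is rejected rather than silently accepted.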

Troubleshooting

This section lists possible issues, their causes, and resolutions. If you still cannot resolve the issue, open a ticket with the IBM Bluemix Private Cloud support team.

Incompatible CPU

Cause: The live migration might fail because the destination host does not have a compatible CPU.

Resolution: Use the Nova command line client to check the CPU information of the source host and the destination host, select another destination host with a compatible CPU type, and try live migration again.

For example, use the following command to get a list of current hypervisors:

nova hypervisor-list

For a specific hypervisor, use the following command to display its details:

nova hypervisor-show <hypervisor-hostname>

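For example, instances cannot be live migrated between the following two hypervisors. Their CPU models differ (Haswell-noTSX and Broadwell), and Hypervisor 2 reports features such as smap, rtm, hle, adx, 3dnowprefetch, and rdseed that Hypervisor 1 does not have.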

Hypervisor 1:

| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "clflush", "sep",         |
|                           | "syscall", "vme", "dtes64", "invpcid",   |
|                           | "tsc", "fsgsbase", "xsave", "vmx",       |
|                           | "erms", "xtpr", "cmov", "smep", "ssse3", |
|                           | "est", "pat", "monitor", "smx", "pbe",   |
|                           | "lm", "msr", "nx", "fxsr", "tm",         |
|                           | "sse4.1", "pae", "sse4.2", "pclmuldq",   |
|                           | "acpi", "fma", "tsc-deadline", "mmx",    |
|                           | "osxsave", "cx8", "mce", "de", "tm2",    |
|                           | "ht", "dca", "lahf_lm", "abm", "popcnt", |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "rdtscp",  |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "bmi1", "bmi2", "pcid", "fpu", "cx16",   |
|                           | "pse36", "mtrr", "movbe", "pdcm",        |
|                           | "rdrand", "x2apic"]                      |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 12                                       |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 2                                        |
| cpu_info_vendor           | Intel                                    |

Hypervisor 2:

| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["smap", "avx", "clflush", "sep", "rtm", |
|                           | "vme", "dtes64", "invpcid", "tsc",       |
|                           | "fsgsbase", "xsave", "pge", "vmx",       |
|                           | "erms", "xtpr", "cmov", "hle", "smep",   |
|                           | "ssse3", "est", "pat", "monitor", "smx", |
|                           | "pbe", "lm", "msr", "adx",               |
|                           | "3dnowprefetch", "nx", "fxsr",           |
|                           | "syscall", "tm", "sse4.1", "pae",        |
|                           | "sse4.2", "pclmuldq", "acpi", "fma",     |
|                           | "tsc-deadline", "mmx", "osxsave", "cx8", |
|                           | "mce", "de", "tm2", "ht", "dca",         |
|                           | "lahf_lm", "abm", "rdseed", "popcnt",    |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "rdtscp",  |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "bmi1", "bmi2", "pcid", "fpu", "cx16",   |
|                           | "pse36", "mtrr", "movbe", "pdcm",        |
|                           | "rdrand", "x2apic"]                      |
| cpu_info_model            | Broadwell                                |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 14                                       |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 2                                        |
| cpu_info_vendor           | Intel                                    |
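
Comparing two long feature lists by eye is error-prone, so a small script can help. The following is a rough sketch, not an authoritative compatibility test: the function names cpu_features and missing_features are hypothetical, the script assumes that you saved the nova hypervisor-show output of each host to a file, and the actual compatibility decision is still made by Nova and the hypervisor during migration.

```shell
#!/bin/sh
# cpu_features: read `nova hypervisor-show` table output on stdin and print
# the entries of the cpu_info_features field, one per line, sorted.
cpu_features() {
    awk -F'|' '
        NF >= 3 {
            name = $2
            gsub(/ /, "", name)
            # A non-empty first column starts a new field; an empty one
            # is a continuation row of the field above it.
            if (name != "") in_features = (name == "cpu_info_features")
            if (in_features) print $3
        }' | grep -o '"[^"]*"' | tr -d '"' | LC_ALL=C sort -u
}

# missing_features SRC_FILE DST_FILE: print the features that the source
# hypervisor reports but the destination hypervisor does not.
missing_features() {
    tmp=$(mktemp -d) || return 1
    cpu_features < "$1" > "$tmp/src"
    cpu_features < "$2" > "$tmp/dst"
    LC_ALL=C comm -23 "$tmp/src" "$tmp/dst"
    rm -rf "$tmp"
}

# Usage sketch (requires admin credentials in the environment):
#   nova hypervisor-show <source hostname>      > src.txt
#   nova hypervisor-show <destination hostname> > dst.txt
#   missing_features src.txt dst.txt
```

An empty result means that every feature listed for the source also appears on the destination. It does not guarantee that the migration will succeed, because the CPU model and other host properties also play a part.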

The destination host does not have enough capacity for migration.

Cause: The new host does not have enough resources for migration.

Resolution: Check the capacity of the destination host. If it does not have enough resources left, select another eligible host and try live migration again.

Take the following steps to check the capacity of a host:

  1. Log on to the Horizon dashboard as cloud_admin.
  2. Under the Admin topic, expand System and click Hypervisors. You can view the used and total VCPU, RAM, and local storage resources of each host. You can also open a support ticket to have us verify the resources across your cluster.
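
If you prefer the command line, the same figures appear in the nova hypervisor-show output. The following is a minimal sketch: field_of and free_vcpus are hypothetical helper names, and the script assumes the two-column table layout of nova hypervisor-show with rows such as vcpus, vcpus_used, free_ram_mb, and free_disk_gb.

```shell
#!/bin/sh
# field_of NAME: read `nova hypervisor-show` table output on stdin and print
# the value of the first row whose name column is exactly NAME.
field_of() {
    awk -F'|' -v name="$1" '
        NF >= 3 {
            key = $2; val = $3
            gsub(/^ +| +$/, "", key)   # trim padding around the name
            gsub(/^ +| +$/, "", val)   # trim padding around the value
            if (key == name) { print val; exit }
        }'
}

# free_vcpus: read hypervisor-show output on stdin and print
# vcpus minus vcpus_used.
free_vcpus() {
    details=$(cat)
    total=$(printf '%s\n' "$details" | field_of vcpus)
    used=$(printf '%s\n' "$details" | field_of vcpus_used)
    echo $((total - used))
}

# Usage sketch (requires admin credentials in the environment):
#   nova hypervisor-show <hypervisor-hostname> | free_vcpus
#   nova hypervisor-show <hypervisor-hostname> | field_of free_ram_mb
```

Treat these numbers as a rough guide only. The scheduler may apply overcommit ratios, so a low or even negative value does not necessarily mean that the host will be rejected.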

The instance has an invalid availability zone property in the Nova DB.

Cause: This is a known issue in the OpenStack community. The instance may have an invalid availability zone property in the Nova DB, which causes a failure during the migration or resizing process. See the related OpenStack bug report for more information.

Resolution: This invalid property is not visible in Horizon or through the command line; it can be found only by querying the database. Contact IBM Bluemix Private Cloud support to have the availability zone database property for the instance updated.