Introduction
This article provides the steps to recover a VM when you are unable to SSH into it and have not yet set a Serial Console password.
There can be many reasons why you are unable to SSH to a VM. The most common causes of an unresponsive SSH daemon are the boot disk reaching maximum capacity or the local NVMe drive being mounted incorrectly in /etc/fstab. To resolve this, the recommended step is to connect to the VM through the Serial Console and restart the SSH daemon, as described in the article How-To Re-Enable SSHd for Connection Refused errors, or to perform a soft reset on the VM.
However, if you have not created a Serial Console password, you can recover the VM by using a separate "Rescue VM" and following the steps below to set a password. The Rescue VM can be a new VM of any instance type (c1a, s1a, etc.) or an existing VM. Using an existing production VM is not recommended, because you will need to install extra tools and packages on it.
Prerequisites
- Crusoe Cloud account
- Access to the Crusoe Cloud UI or Crusoe CLI
Instructions
Step 1: Create a new VM to use as the "Rescue VM". Any VM that is already running can be used, but creating a new VM is highly recommended to avoid issues. The Rescue VM can be of any instance type (c1a, s1a, etc.).
Note: The “Rescue VM” must be in the same project and the same location as the “Broken VM”
Step 2: Ensure the “Rescue VM” is in a RUNNING state and the “Broken VM” is in a STOPPED state
Step 3: Contact Support to attach the OS boot disk of the "Broken VM" to the "Rescue VM"
Note:
1. You will be unable to STOP the "Rescue VM" while more than one boot disk is attached to it. Additionally, you will be unable to START or delete the "Broken VM" until its boot disk is detached from the "Rescue VM".
2. The status of the “Broken VM” will then appear as “Blocked”.
Step 4: SSH to the "Rescue VM" and run the lsblk command. You should see that the disk from the "Broken VM" has been attached to the "Rescue VM".
Note: The "Rescue VM" disk will appear as vda and the "Broken VM" disk as vdb.
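Device letters such as vdb can depend on attach order, so it may be worth confirming which device belongs to the Broken VM before modifying it. The following is a minimal sketch, assuming the disk is exposed under /dev/disk/by-id/virtio-<SERIAL> as in the command in Step 6. The function name resolve_disk and the BY_ID_DIR override are hypothetical and introduced here only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve a virtio disk serial to its device node.
# BY_ID_DIR is an assumed override that exists only so the function can be
# tried against a scratch directory instead of real disks.
resolve_disk() {
  local serial="$1"
  local by_id_dir="${BY_ID_DIR:-/dev/disk/by-id}"
  local link="${by_id_dir}/virtio-${serial}"

  # Fail early if no disk with this serial is visible on the Rescue VM.
  if [ ! -e "$link" ]; then
    echo "no disk found for serial ${serial}" >&2
    return 1
  fi

  # Follow the udev symlink to the real device node.
  readlink -f "$link"
}

# Example usage (not run automatically):
#   resolve_disk <SERIAL>
```

If the resolved device is not the one lsblk shows as the second disk, stop and re-check the serial number before continuing.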
Step 5: Run the following command to update available packages and download a set of tools that will allow you to inspect and modify VM disk images:
sudo apt update -y && sudo apt install nano wget curl libguestfs-tools -y
Step 6: Acquire the serial number of the “Broken VM” disk and substitute it into the following command:
sudo virt-customize --format raw -a /dev/disk/by-id/virtio-<SERIAL> --root-password password:<PASSWORD>
- Update the <SERIAL> field with the serial number of the “Broken VM” disk and the <PASSWORD> field with the desired root password
- The Serial Number can be acquired from the UI by selecting the Broken VM. Under the “Disks” section the third column will list the “Serial Number”
- Or from the CLI using the following command:
crusoe compute vms get <Broken-VM-Name> -f json | jq -r '.disks[] | select(.attachment_type | contains("os")).serial_number'
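Passing the password as password:<PASSWORD> leaves it in the shell history. As an alternative, virt-customize also accepts file:<FILENAME> as a password selector, which reads the password from the first line of a file. The sketch below wraps Step 6 with two sanity checks. The function name set_root_password and the example file path are illustrative assumptions, not part of the Crusoe tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the root password on the Broken VM disk.
# Arguments: <disk serial> <file whose first line is the new password>
set_root_password() {
  local serial="$1" pw_file="$2"
  local disk="/dev/disk/by-id/virtio-${serial}"

  # Refuse to continue unless the serial maps to an attached block device.
  if [ ! -b "$disk" ]; then
    echo "not a block device: ${disk}" >&2
    return 1
  fi
  # Refuse a missing or empty password file.
  if [ ! -s "$pw_file" ]; then
    echo "password file missing or empty: ${pw_file}" >&2
    return 1
  fi

  # Same command as Step 6, with the password read from a file.
  sudo virt-customize --format raw -a "$disk" --root-password "file:${pw_file}"
}

# Example usage (not run automatically; the file path is an assumption):
#   SERIAL=$(crusoe compute vms get <Broken-VM-Name> -f json | jq -r '.disks[] | select(.attachment_type | contains("os")).serial_number')
#   set_root_password "$SERIAL" /root/new-root-password.txt
```

Delete the password file once the command has completed.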
Step 7: Once completed, detach the disk from the Rescue VM.
This can be done from the Crusoe Cloud UI by selecting the "Rescue VM" and clicking the X action on the right of the "Broken VM's BootDisk (recovery mode)" or by running the following command:
crusoe compute vms detach-disks <Rescue_VM_Name> --disks <Disk-SERIAL-Number>
- The <Disk-SERIAL-Number> is the serial number from the "Broken VM" BootDisk acquired above in step 6.
Note:
1. You will NOT be able to STOP the Rescue VM while the Broken VM's boot disk is still attached to it
2. You will NOT be able to START or Delete the Broken VM if the Rescue VM still has the boot disk attached
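Before moving on, it can help to confirm from inside the Rescue VM that the detach has actually completed. The following is a rough sketch, assuming the /dev/disk/by-id/virtio-<SERIAL> entry disappears once the disk is detached. The function name wait_for_detach and the BY_ID_DIR override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until the Broken VM disk is no longer visible.
# Arguments: <disk serial> [timeout in seconds, default 60]
# BY_ID_DIR is an assumed override used only for trying the function out.
wait_for_detach() {
  local serial="$1" timeout="${2:-60}"
  local link="${BY_ID_DIR:-/dev/disk/by-id}/virtio-${serial}"
  local waited=0

  # Poll once per second while the disk entry still exists.
  while [ -e "$link" ] || [ -L "$link" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "disk ${serial} still attached after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done

  echo "disk ${serial} is detached"
}

# Example usage (not run automatically):
#   crusoe compute vms detach-disks <Rescue_VM_Name> --disks <Disk-SERIAL-Number>
#   wait_for_detach <Disk-SERIAL-Number> 120
```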
Step 8: After you have detached the boot disk from the "Rescue VM", you can then STOP the "Rescue VM" and START the "Broken VM" again and use the newly created password to serial console into the VM as a root user.
Additional Use Case:
The following steps can be used if you need to access the home directory of the "Broken VM". This can be used to add a public SSH key to the authorized_keys file.
Step 1: Recommended to switch to root user for elevated privileges:
sudo su
Step 2: Create a recovery directory:
mkdir /mnt/recovery
Step 3: Mount the Broken VM disk to the newly created recovery directory:
sudo mount -t ext4 /dev/vdb1 /mnt/recovery
Step 4: Change to the .ssh directory in the user's home directory under the recovery mount point (the example below assumes the ubuntu user):
cd /mnt/recovery/home/ubuntu/.ssh
Step 5: Edit the authorized_keys file using nano/vim to include your public SSH key
Step 6: Unmount the Broken VM disk:
sudo umount /mnt/recovery
Step 7: Once completed, detach the disk from the Rescue VM
Step 8: After you have detached the boot disk from the "Rescue VM", you can then STOP the "Rescue VM" and START the "Broken VM" again
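The manual edit in Steps 4 and 5 can also be scripted. The following is a minimal sketch, assuming the Broken VM disk is already mounted as in Step 3 and that the commands are run as root. The function name add_authorized_key and the example key path are hypothetical. The sketch appends the key only if it is not already present and tightens permissions on the .ssh directory and the authorized_keys file. It does not change file ownership.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to a user's authorized_keys
# under a mounted root filesystem.
# Arguments: <mount point> <user> <public key file>
add_authorized_key() {
  local mnt="$1" user="$2" key_file="$3"
  local home_dir="${mnt}/home/${user}"
  local auth="${home_dir}/.ssh/authorized_keys"
  local key

  if [ ! -d "$home_dir" ]; then
    echo "no home directory for ${user} under ${mnt}" >&2
    return 1
  fi
  if [ ! -s "$key_file" ]; then
    echo "public key file missing or empty: ${key_file}" >&2
    return 1
  fi

  # Use only the first line of the key file.
  key="$(head -n 1 "$key_file")"
  mkdir -p "${home_dir}/.ssh"
  touch "$auth"

  # Make sure the existing file ends with a newline before appending.
  if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
    echo >> "$auth"
  fi
  # Append only if this exact key line is not already present.
  if ! grep -qxF -- "$key" "$auth"; then
    printf '%s\n' "$key" >> "$auth"
  fi

  chmod 700 "${home_dir}/.ssh"
  chmod 600 "$auth"
}

# Example usage (not run automatically; the key path is an assumption):
#   mount -t ext4 /dev/vdb1 /mnt/recovery
#   add_authorized_key /mnt/recovery ubuntu /root/id_ed25519.pub
#   umount /mnt/recovery
```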
Important Notes / FAQs
1. The “Rescue VM” must be in the same project and the same location as “Broken VM”
2. Ensure the “Rescue VM” is in a RUNNING state and the “Broken VM” is in a STOPPED state during the process
3. You will not be able to STOP the Rescue VM while more than one boot disk is attached to it
4. You will not be able to START or Delete the Broken VM until the boot disk of the Broken VM is detached from the Rescue VM
5. It is highly recommended to create a new instance to use as the “Rescue VM”. An existing VM can be used, but you will need to install libguestfs-tools and other packages on it, which is best avoided on a production VM
6. A “Broken VM” cannot be recovered by more than one “Rescue VM” at a time
7. One “Rescue VM” can only recover one “Broken VM” at a time
i. But the same “Rescue VM” can be used to recover another “Broken VM” once the first one is completed fully