Some of you may know that exporting a live instance off EC2 is not easy. Of all the
limitations, the most important is that an instance can only be exported if it was originally imported from another virtualization environment.
In my opinion these are artificial limitations designed to make it harder for you to leave the Amazon ecosystem. Legitimate, but not nice. It gets even worse because the accepted formats are all tied to paid virtualization solutions.
Since I do not care about paid virtualization solutions and only wanted a perfect copy of the instance for backups and testing, I decided to research how to dump an instance the "monkey" way.
Dumping and running the instance locally
For my purposes, I treat an instance as a virtual private server with a single main storage device. That device should contain everything required to boot and start "as if" the instance were still running in the cloud.
There is a quick and dirty way, and there is a proper way
The quick and dirty way involves dumping the disk while it is in use. This will almost certainly produce a slightly corrupted image, but it is usually easy to recover if you call sync first and the instance is not doing any particularly intensive or important IO.
The proper way is just a few extra steps before the dirty way.
- Start a temporary instance to which you have SSH credentials and root access.
- Stop the instance you want to back up.
- Detach the EBS volume from your main instance and attach it to the temporary instance.
- Keep the temporary instance running, but do not mount the attached EBS volume.
- Get the device name of the EBS volume.
lsblk
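The console steps above can also be scripted with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured; prepare_volume and run are hypothetical helper names, and the instance IDs, volume ID and device name you pass are your own. With DRY_RUN=1 it only prints the commands. On newer (Nitro-based) instance types the volume shows up as an NVMe device such as /dev/nvme1n1 whatever name you request, which is why checking lsblk is still worthwhile.

```shell
#!/bin/sh
# Sketch of the "proper way" preparation using the AWS CLI.
# Hypothetical arguments: source instance, temporary instance, volume ID,
# device name to request on the temporary instance.
# DRY_RUN=1 prints each command instead of executing it.
set -u

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_volume() {
  src=$1 tmp=$2 vol=$3 dev=$4
  # Stop the instance to back up and wait until it is fully stopped.
  run aws ec2 stop-instances --instance-ids "$src" || return 1
  run aws ec2 wait instance-stopped --instance-ids "$src" || return 1
  # Detach its volume and wait until the volume is free.
  run aws ec2 detach-volume --volume-id "$vol" || return 1
  run aws ec2 wait volume-available --volume-ids "$vol" || return 1
  # Attach it to the temporary instance (it is not mounted there).
  run aws ec2 attach-volume --volume-id "$vol" --instance-id "$tmp" --device "$dev"
}
```

For example, DRY_RUN=1 prepare_volume i-SOURCE i-TEMP vol-ID /dev/sdf shows the five commands with your placeholders filled in; drop DRY_RUN to run them.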
Start of the quick and dirty way
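- On your local machine, download the device contents (the remote command is double-quoted so $EBS_DEVICE is expanded locally, not on the instance, where it is not set).
ssh -T "$INSTANCE_ADDRESS" "dd bs=16M if=$EBS_DEVICE | bzip2 -c" | bzcat > "$IMAGE_DESTINATION"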
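A plain pipeline only reports the exit status of its last command (bzcat here), so a failed ssh or dd can go unnoticed. If you went the proper way (source instance stopped, volume not mounted), you can compare checksums of the device and the local image. This is a minimal sketch, assuming sha256sum (GNU coreutils) exists on both ends and the device is readable over ssh; remote, dump_image and verify_image are hypothetical helper names. It reads the device a second time, so expect it to take roughly as long as the dump.

```shell
#!/bin/sh
# Dump an EBS device over ssh and check the local copy against the source.
# Uses INSTANCE_ADDRESS, EBS_DEVICE and IMAGE_DESTINATION as in the text.
# Checksums only match if the volume is not mounted or written to meanwhile.

# Run a command string on the instance (hypothetical helper).
remote() {
  ssh -T "$INSTANCE_ADDRESS" "$1"
}

dump_image() {
  remote "dd bs=16M if=$EBS_DEVICE | bzip2 -c" | bzcat > "$IMAGE_DESTINATION"
}

# Returns 0 only if both checksums are non-empty and identical.
verify_image() {
  remote_sum=$(remote "sha256sum < $EBS_DEVICE" | cut -d ' ' -f 1)
  local_sum=$(sha256sum < "$IMAGE_DESTINATION" | cut -d ' ' -f 1)
  echo "remote: $remote_sum"
  echo "local:  $local_sum"
  [ -n "$remote_sum" ] && [ "$remote_sum" = "$local_sum" ]
}
```

Run dump_image, then verify_image; a non-zero exit status from the latter means the image should not be trusted.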
- (Optional) Convert image to qcow2
qemu-img convert -f raw -O qcow2 $IMAGE_DESTINATION $IMAGE_DESTINATION.qcow2
- Boot the image with QEMU (if you converted it, point file= at the .qcow2 file and use format=qcow2).
qemu-system-x86_64 -drive format=raw,file="$IMAGE_DESTINATION" -enable-kvm -serial mon:stdio -vga virtio -device rtl8139,netdev=net0 -netdev user,id=net0,hostfwd=tcp::10022-:22,hostfwd=tcp::10443-:443,hostfwd=tcp::1080-:80 -m 2G
Notes and small explanations
Different compression algorithms
Instead of bzip2 you can use other compression algorithms, as well as different compression levels. Examples are the gzip -> zcat pair or the xz -> xzcat pair.
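To choose a pair, you can try them on a local sample first. This is a minimal sketch, assuming the tools are in PATH; roundtrip is a hypothetical helper, and the sample (zeros plus random data, roughly like a partly used disk) is only a stand-in for a real image.

```shell
#!/bin/sh
# roundtrip FILE COMP DECOMP: compress FILE with "COMP -c", decompress with
# DECOMP, print the compressed size, and return 0 only if the result is
# byte-identical. Returns 2 if either tool is missing.
roundtrip() {
  file=$1 comp=$2 decomp=$3
  if ! command -v "$comp" >/dev/null 2>&1 || ! command -v "$decomp" >/dev/null 2>&1; then
    echo "$comp: not installed, skipped"
    return 2
  fi
  packed=$(mktemp) && unpacked=$(mktemp) || return 1
  "$comp" -c < "$file" > "$packed"      # what runs on the instance
  "$decomp" < "$packed" > "$unpacked"   # what runs on your machine
  echo "$comp: $(wc -c < "$packed" | tr -d ' ') bytes compressed"
  cmp -s "$file" "$unpacked"
  status=$?
  rm -f "$packed" "$unpacked"
  return $status
}

sample=$(mktemp)
# 1 MiB of zeros (like unused blocks) followed by 256 KiB of random data.
{ head -c 1048576 /dev/zero; head -c 262144 /dev/urandom; } > "$sample"
for pair in "bzip2 bzcat" "gzip zcat" "xz xzcat"; do
  roundtrip "$sample" $pair   # unquoted on purpose: splits into COMP DECOMP
done
rm -f "$sample"
```

Whichever pair wins, only the two ends of the pipe change, for example xz -c -6 inside the ssh command and xzcat locally. Higher levels shrink the transfer but cost more CPU on the instance.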
Convert image
What you download is a raw disk image. You may prefer a smarter storage format such as qcow2, which only grows as data is written and supports snapshots. Other formats are possible too, including VDI for use with VirtualBox. See the qemu-img documentation for more information about qemu-img convert.
System interaction
Unlike on Amazon EC2, you get a live view of the boot process and the kernel messages. This is very useful to check whether something went wrong with the dump. For example, if you went the quick and dirty way, your file systems may need to be checked, or some services may fail because the file system was dumped while online. This is survivable in most cases, but you will be glad to have this information.
The part of the command line that redirects the guest's serial console (multiplexed with the QEMU monitor) to the terminal running QEMU is
-serial mon:stdio
Connectivity
QEMU's networking options are extensive and highly configurable. Anybody working with an instance will presumably want to connect to it over SSH or reach the services running on it. There are multiple ways to do this, with different degrees of difficulty. In the qemu command above, the ports of the guest (the instance) are exposed on localhost:
- Port 22 (SSH) of the guest mapped to localhost:10022
- Port 443 (HTTPS) mapped to localhost:10443
- Port 80 (HTTP) mapped to localhost:1080
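If you forward more ports, the -netdev argument gets long quickly. This is a minimal sketch that builds it from "hostport:guestport" pairs using QEMU's user-mode hostfwd=tcp::HOSTPORT-:GUESTPORT syntax; hostfwd_netdev is a hypothetical helper name.

```shell
#!/bin/sh
# Build the -netdev value for QEMU user-mode networking from
# "hostport:guestport" pairs (TCP only).
hostfwd_netdev() {
  arg="user,id=net0"
  for pair in "$@"; do
    host=${pair%%:*}    # text before the first colon
    guest=${pair#*:}    # text after the first colon
    arg="$arg,hostfwd=tcp::$host-:$guest"
  done
  printf '%s\n' "$arg"
}

# Reproduces the mapping used in the qemu command of this post.
netdev=$(hostfwd_netdev 10022:22 10443:443 1080:80)
echo "qemu-system-x86_64 ... -device rtl8139,netdev=net0 -netdev $netdev"
```

With the mapping above, ssh -p 10022 user@localhost reaches the guest's SSH server (user being whatever account exists on the instance).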
Note that mapping ports to different numbers may break things, for example web services. Redirects (from the web application or the browser) can drop the port number, so you end up at the guest's original port on the host, where nothing is forwarded. I did not map the guest ports to the same ports on the host because ports below 1024, like 443 and 80, require root privileges, so QEMU would need to run as root.
Another trick to make the redirections on your host point to your virtual machine is to add an entry to the /etc/hosts file mapping, for example, the domain name of your real instance to your loopback address. This has worked really well for me, but remember to remove the entry when you are done, otherwise you will be very confused :D
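Adding and removing that entry can be scripted so it is harder to forget. This is a minimal sketch, assuming you run it as root; hosts_add, hosts_remove and the marker comment are hypothetical, and HOSTS_FILE defaults to /etc/hosts. Keep in mind the hosts file only maps names to addresses, not ports, so a redirect that drops the port still lands on the default port of your host.

```shell
#!/bin/sh
# Tag added lines with a marker comment so they can be removed reliably.
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}
MARKER="# qemu-ec2-dump"   # hypothetical tag

# hosts_add NAME: point NAME at the loopback address.
hosts_add() {
  printf '127.0.0.1 %s %s\n' "$1" "$MARKER" >> "$HOSTS_FILE"
}

# hosts_remove: drop every line carrying the marker.
hosts_remove() {
  tmp=$(mktemp) || return 1
  grep -vF -e "$MARKER" "$HOSTS_FILE" > "$tmp"
  if [ $? -gt 1 ]; then     # status 2 means grep failed: leave the file alone
    rm -f "$tmp"
    return 1
  fi
  # Write back in place (not mv) to keep the file's owner and permissions.
  cat "$tmp" > "$HOSTS_FILE"
  rm -f "$tmp"
}
```

Source it in a root shell, call hosts_add with your instance's domain before testing, and hosts_remove when you are done.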