ILande - Migrating an NT4 VM from VMWare to qemu

One of the tasks I have been working on over recent weeks is the migration of an NT4 VM from VMWare over to qemu 0.9.1. Since this entire process was fraught with pitfalls and dangers (some of which could not be resolved with Google), I felt it would be useful to document it somewhere in a corner of the web to help other people stuck in the same situation.

VMWare disk images are stored within vmdk files, so the first step in the qemu migration process was to convert the disk image from vmdk format to qemu raw format. Fortunately this is easily done with the qemu-supplied qemu-img program, which allows conversion between a number of different virtualisation disk formats including vmdk. The initial conversion was performed by simply doing:

qemu-img convert disk.vmdk -O raw disk.img

The disk image was 2GB in size, so the conversion took just a few minutes. A quick test using dd and hexdump showed that the resulting image had a valid MBR in place, so it was time to fire up the image in qemu... only to find that it wouldn’t boot. Qemu could see the disk without any problems, but instead of bringing up the NT boot menu as expected, the console sat waiting with the error message “A disk read error occurred”. So it was time to begin some detective work.
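One part of a dd/hexdump check like this is looking for the 0x55 0xAA boot signature in the last two bytes of the first sector. A minimal sketch in C, assuming 512-byte sectors; the function name has_mbr_signature is made up for illustration, and a matching signature only says the sector is laid out as a boot sector, not that its boot code or partition table is correct:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the buffer holds at least one
 * 512-byte sector that ends with the 0x55 0xAA boot signature,
 * and 0 otherwise. */
int has_mbr_signature(const unsigned char *sector, size_t len)
{
        if (len < 512)
                return 0;
        return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

The buffer would be filled by reading the first 512 bytes of disk.img.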

The first port of call was to mount the resulting disk.img using the Linux loopback device to make sure that the disk image looked relatively sane. My first suspicion was that somehow the drive geometry had been mis-translated from the .vmdk file to the raw .img file. The image attached without a problem, and I was able to run fdisk and view the partition table. Everything looked good, and I was able to see a single NTFS partition and mount it using the ntfs-3g driver. Navigating through the file system for a few minutes suggested that the drive geometry looked correct, as I wasn’t seeing disk read errors whilst running files through cat/dd, so the conversion appeared to have been successful except for the fact that it didn’t boot.

My next thought was that there was a bug in qemu that was preventing the NT bootloader from running under emulation. So I dug out the original NT4 CD and did a basic installation on my laptop with qemu 0.9.1 to make sure that it worked as expected - and of course it did. This pointed the finger back towards the migrated image as the culprit (rather than qemu) and so it was back to the drawing board once again.

Given that I was able to mount the NTFS partition in the image using ntfs-3g but not boot successfully, I began to wonder if the MBR had been corrupted during the migration process. This meant I needed to dig out various members of the *disk family in order to check the drive geometries within the MBR. My normal tool of choice is sfdisk because it is easy to switch over to working in absolute sectors rather than getting involved with the whole CHS game. Interestingly enough, running sfdisk -l immediately gave a warning stating that it looked like the NTFS partition was made for a geometry of X*2 cylinders, 128 heads and 63 sectors as opposed to X cylinders, 255 heads and 63 sectors. Hmmm.

The next thing to look at was the CHS values encoded within the MBR, using dd/hexdump to determine whether the number of heads was 128 or 255. These clearly showed that the MBR thought the partition should be using 255 heads. So where on earth was sfdisk getting this idea of 128 heads from? I wondered if there was any special boot code within the first sector of the NTFS partition (the VBR) as opposed to the MBR. Sure enough, some googling showed that NTFS stores a copy of the BPB (BIOS Parameter Block) within the first sector of the partition, and from the hexdump output it was clear that the “A disk read error occurred” message was coming from within the VBR. I then realised that I had previously installed a test copy of NT4 on my laptop with the same partition sizes that did boot, and so I should compare the two VBRs to see if there was a difference.
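For anyone repeating the MBR check by hand, each partition table entry packs a CHS address into three bytes: the head, then the sector in the low six bits with the top two cylinder bits above it, then the low eight cylinder bits. The following is a minimal sketch of that decoding, not the exact procedure used above; the struct and function names are made up for illustration:

```c
/* One CHS address as stored in an MBR partition table entry. */
struct chs {
        unsigned cylinder;      /* 10 bits, 0-1023 */
        unsigned head;          /* 8 bits, 0-255 */
        unsigned sector;        /* 6 bits, numbered from 1 */
};

/* Hypothetical helper: decode the three packed CHS bytes.
 * b[0] = head
 * b[1] = sector in bits 0-5, cylinder bits 8-9 in bits 6-7
 * b[2] = cylinder bits 0-7 */
struct chs decode_chs(const unsigned char b[3])
{
        struct chs r;

        r.head = b[0];
        r.sector = b[1] & 0x3f;
        r.cylinder = ((unsigned)(b[1] & 0xc0) << 2) | b[2];
        return r;
}
```

The first partition entry starts at byte 446 of the MBR, with the starting CHS at bytes 1-3 of the entry and the ending CHS at bytes 5-7. Heads are numbered from zero, so an ending head of 254 implies a 255-head geometry. As an illustration (these bytes are invented, not taken from the image above), fe 7f 04 decodes to cylinder 260, head 254, sector 63.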

Firstly from the migrated image file:


00000000  eb 5b 90 4e 54 46 53 20  20 20 20 00 02 01 00 00  |.[.NTFS    .....|
00000010  00 00 00 00 00 f8 00 00  3f 00 ff 00 3f 00 00 00  |........?...?...|
00000020  00 00 00 00 80 00 80 00  85 fa 3f 00 00 00 00 00  |..........?.....|
00000030  e1 63 02 00 00 00 00 00  42 fd 1f 00 00 00 00 00  |.c......B.......|
00000040  02 00 00 00 08 00 00 00  67 2f 3a f4 60 3a f4 ea  |........g/:.`:..|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 fa 33 c0  |..............3.|

Secondly from the working local NT4 installation:


00000000  eb 5b 90 4e 54 46 53 20  20 20 20 00 02 01 00 00  |.[.NTFS    .....|
00000010  00 00 00 00 00 f8 00 00  3f 00 80 00 3f 00 00 00  |........?...?...|
00000020  00 00 00 00 80 00 80 00  85 fa 3f 00 00 00 00 00  |..........?.....|
00000030  e1 63 02 00 00 00 00 00  42 fd 1f 00 00 00 00 00  |.c......B.......|
00000040  02 00 00 00 08 00 00 00  67 2f 3a f4 60 3a f4 ea  |........g/:.`:..|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 fa 33 c0  |..............3.|

Can you spot the deliberate error? The two dumps differ in a single byte at offset 0x1a, the number of heads field of the BPB. For some reason, when NT4 is installed under qemu, the NTFS BPB specifies a value of 128 heads (and boots), whereas the migrated VMware image specifies a value of 255 heads (matching the MBR) but does not boot. Perhaps this is a bug in the qemu BIOS emulation or its IDE disk emulation? Anyway, the next thing to do was to try changing the number of heads on the migrated VMWare image and see if it would then boot. Now this is where it gets strange. I really wanted to find a quick disk editor that would allow me to open up a raw device, tweak the byte in question, and then write back the changes. But after lots of searching, I couldn’t find a program that would do this - which is very unusual for Linux-based systems. Therefore I came up with the following glorious hack in C to switch the byte over (where the disk was attached as /dev/sdb, with the NTFS partition as /dev/sdb1):

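#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        unsigned char data[512];
        ssize_t count;
        int disk;       /* file descriptor from open(), not a FILE * */

        printf("Disk sector editor\n");

        /* Read the first sector of the NTFS partition (the VBR). */
        disk = open("/dev/sdb1", O_RDONLY);
        if (disk < 0) {
                perror("open for read");
                return 1;
        }
        count = read(disk, data, sizeof data);
        close(disk);

        printf("count: %zd\n", count);
        if (count != (ssize_t)sizeof data)
                return 1;

        /* Offset 26 (0x1a) is the low byte of the BPB heads field:
           change 0xff (255 heads) to 0x80 (128 heads). */
        data[26] = 0x80;

        /* Write the patched sector back to the start of the partition. */
        disk = open("/dev/sdb1", O_WRONLY);
        if (disk < 0) {
                perror("open for write");
                return 1;
        }
        count = write(disk, data, sizeof data);
        close(disk);

        printf("count: %zd\n", count);
        return count == (ssize_t)sizeof data ? 0 : 1;
}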
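Before booting, the change can be confirmed by reading the geometry fields back out of the sector. The sketch below assumes the standard BPB layout visible in the hexdumps above, with sectors per track at offset 0x18 and number of heads at offset 0x1a, both 16-bit little-endian; the function names are made up for illustration:

```c
/* Read a 16-bit little-endian value from a byte buffer. */
static unsigned le16(const unsigned char *p)
{
        return p[0] | ((unsigned)p[1] << 8);
}

/* Hypothetical helper: BPB sectors-per-track field at offset 0x18. */
unsigned bpb_sectors_per_track(const unsigned char *vbr)
{
        return le16(vbr + 0x18);
}

/* Hypothetical helper: BPB number-of-heads field at offset 0x1a. */
unsigned bpb_heads(const unsigned char *vbr)
{
        return le16(vbr + 0x1a);
}
```

Given the first sector of the partition, the migrated image dumped above yields 63 sectors per track and 255 heads (bytes 3f 00 ff 00), while the working installation yields 63 and 128 (bytes 3f 00 80 00).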

The patch program appeared to run without error, and so it was time to see whether the image would now work with qemu. Everything was crossed as I launched qemu and prayed... success! The NT boot loader appeared and I was finally able to get to the NT4 blue loader screen... except that the boot process stopped with a HAL error (more on this in another post). I still have no idea why the BPB in the NTFS VBR needs a different number of heads from the one listed in the MBR in order to boot in qemu. Any takers?

Migrating an NT4 VM from VMWare to qemu - Part 1