Saturday, September 6. 2008
At Sirius, we have a number of boxes running Xen which we use to run virtual machines for infrastructure & testing. A couple of weeks ago it came up that we needed to test some packages under FC8, and I was about to install the OS under Qemu when I realised that, since Fedora have been shipping Xen kernels for a while, there must be a way of getting a working FC8 DomU running on our existing Debian Etch Dom0. Unfortunately it wasn't as straightforward as I had hoped (the information was scattered in small snippets all over the place), so I've included the instructions below in the hope that someone else will find them useful.
1. Obtain an FC8 installation image
The easiest way to do this was to use the Rinse utility to create an FC8 image in a suitable directory on the virtualisation host.
2. Chroot to the installation image directory
3. Install the Xen DomU kernel
Rinse didn't install the Xen kernel by default, so I ran "yum install kernel-xen" to install the kernel image and initrd. These were then copied outside of the chroot so that they could be loaded by the Dom0.
4. Rebuild the initrd
The initrd built by Rinse appeared to have missing filenames within its startup script; however, rebuilding the initrd within the chroot appeared to resolve the issue. I also found it necessary to explicitly include the Xen block and network drivers in the initrd by passing "--with=xenblk --with=xennet" on the mkinitrd command line.
5. Change the TTY numbers in /etc/inittab
Xen seems to use TTYs named "xvcX" rather than "ttyX", so I altered the mingetty entries in /etc/inittab to use "xvcX" instead.
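For anyone who prefers to script this edit, it amounts to replacing the "ttyN" argument of each mingetty line with "xvcN". Below is a minimal C sketch, assuming the usual Fedora "N:2345:respawn:/sbin/mingetty ttyN" layout; the function name rewrite_inittab_line is hypothetical, and a one-line sed substitution would do the same job.

```c
#include <stdio.h>
#include <string.h>

/*
 * Rewrite one /etc/inittab line so that a mingetty entry uses a Xen
 * console device ("xvcN") instead of a virtual terminal ("ttyN").
 * Lines without a mingetty " ttyN" argument are copied unchanged.
 * Returns 1 if the line was rewritten, 0 otherwise.
 */
int rewrite_inittab_line(const char *in, char *out, size_t outlen)
{
    const char *getty = strstr(in, "mingetty");
    /* Search for " tty" after the program name; the leading space stops
     * us matching the "tty" inside "mingetty" itself. */
    const char *tty = getty ? strstr(getty, " tty") : NULL;

    if (tty == NULL) {
        snprintf(out, outlen, "%s", in);
        return 0;
    }
    tty++; /* skip the space: tty now points at "ttyN" */

    /* Prefix up to "tty", then "xvc", then whatever followed "tty". */
    snprintf(out, outlen, "%.*sxvc%s", (int)(tty - in), in, tty + 3);
    return 1;
}
```

Each line of /etc/inittab would be passed through the function and the result written to a replacement file.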
6. Add devpts to /etc/fstab
Fedora kernels seem to need devpts, and Rinse didn't add the required entry to /etc/fstab. Hence I added the following line:
devpts /dev/pts devpts gid=5,mode=620 0 0
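The options on this line are worth decoding: mode=620 is an octal permission value, and gid=5 is conventionally the tty group on Fedora. The hedged sketch below (all function names hypothetical) parses the fstab entry and expands the mode into rwx notation, showing that new pseudo-terminals are readable and writable by their owner and writable by the tty group.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The six whitespace-separated columns of one /etc/fstab entry. */
struct fstab_entry {
    char spec[64], file[64], vfstype[32], mntopts[128];
    int freq, passno;
};

/* Parse an fstab line; returns 1 if all six columns were found. */
int parse_fstab_line(const char *line, struct fstab_entry *e)
{
    return sscanf(line, "%63s %63s %31s %127s %d %d",
                  e->spec, e->file, e->vfstype, e->mntopts,
                  &e->freq, &e->passno) == 6;
}

/* Return the octal value of a "mode=" option, or -1 if absent.
 * (Simplistic: an option such as "fmode=" would also match.) */
long option_mode(const char *opts)
{
    const char *m = strstr(opts, "mode=");
    return m ? strtol(m + 5, NULL, 8) : -1;
}

/* Expand a permission value such as 0620 into "rw--w----". */
void mode_to_string(unsigned mode, char out[10])
{
    const char *flags = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)
        out[i] = (mode & (0400u >> i)) ? flags[i] : '-';
    out[9] = '\0';
}
```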
7. Create the domU & enjoy
With all the above changes in place, I was able to successfully start the Fedora Core 8 Xen kernel under a Debian Etch Dom0. I hope that this article will save people time if they are required to use a mixed distribution Xen environment.
Saturday, May 31. 2008
This post is really just a follow-up to the last one, which explained the problems arising from running NT4 under qemu. Having successfully booted the VM (with some help from http://scottcross.blogspot.com/2006/03/haldll-requires-mps-version-11-system.html), I was seeing a couple of issues in general use. It also seems that these were exactly the same issues being experienced here: http://www.debian-administration.org/users/ajt/weblog/115.
1) White Mouse Pointer
This is an interesting one. My current workstation runs Debian lenny, whereas the server cluster the VM will run on is Debian etch. Both machines are running qemu 0.9.1; the only real difference is that the etch server is 32-bit whereas my workstation is 64-bit. When mounting the VM image locally on my workstation and running qemu, the mouse pointer appears fine with both the VGA driver and the Cirrus driver. However, when running the image on the 32-bit etch server, the mouse pointer appears fine using the built-in VGA driver, but appears as a white box when using the Cirrus driver. This is a pain, since the application running on the server requires 256 colours to look respectable. It does make one wonder whether this is a compiler or word-size bug of some description.
After some Google research, I found out that VBE ReactOS drivers were available, which should be able to drive the VM in 256 colours. So I ended up downloading the VGA drivers from here: http://www.bearwindows.boot-land.net/vbemp.htm. Now after installing these video drivers, the mouse pointer appears fine on the VM in 256 colours.
2) Mouse pointer goes crazy
The symptoms of this bug were very simple: after several minutes of using the VM, the mouse pointer would go crazy, and one small touch would send it flying all over the screen. The only cure AFAICT is to reboot the VM, which is fairly annoying. Reasoning that if NT4 itself had a bug of this nature someone would have complained by now, I suspected the qemu mouse emulation. After adding some extra debug output on the console, I could see that an extra ENABLE/ACK pair was being embedded in the middle of the PS/2 mouse data stream, and it was soon after this that the mouse would deteriorate.
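One plausible mechanism, offered as an illustration rather than a diagnosis: in IntelliMouse mode the mouse sends 4-byte movement packets, and the acknowledgement byte 0xFA that a mouse returns for the Enable Data Reporting command (0xF4) has no packet framing of its own. If it turns up mid-stream, every following packet is read one byte out of step. The C sketch below assumes the standard PS/2 packet layout (buttons, sign and overflow bits in byte 0; X in byte 1; Y in byte 2; wheel in byte 3); all names are hypothetical.

```c
#include <stdint.h>

#define PS2_ACK 0xFA  /* acknowledgement byte sent by the mouse */

/* One decoded IntelliMouse (4-byte) movement packet. */
struct mouse_packet {
    int buttons;   /* bit 0 left, bit 1 right, bit 2 middle */
    int dx, dy;    /* 9-bit two's complement movement */
    int dz;        /* wheel movement */
    int overflow;  /* nonzero if the X or Y overflow bit is set */
    int sync_ok;   /* bit 3 of byte 0 is always 1 in a genuine packet */
};

/* Decode the 4 bytes at p as one packet. Apart from the bit 3 sanity
 * check, nothing tells the decoder whether p really starts a packet. */
struct mouse_packet decode_packet(const uint8_t p[4])
{
    struct mouse_packet m;
    m.buttons = p[0] & 0x07;
    m.dx = p[1] - ((p[0] & 0x10) ? 256 : 0);  /* bit 4: X sign */
    m.dy = p[2] - ((p[0] & 0x20) ? 256 : 0);  /* bit 5: Y sign */
    m.dz = (int8_t)p[3];
    m.overflow = (p[0] & 0xC0) != 0;          /* bits 6-7: overflow */
    m.sync_ok = (p[0] & 0x08) != 0;
    return m;
}
```

Decoding {0xFA, 0x08, 0x01, 0x01} as a packet yields a right-button press with both overflow bits set and movements of -248 and -255, which would indeed send the pointer flying. Bit 3 happens to be set in 0xFA, so a driver relying on that bit alone would not notice the misalignment.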
My first thought was that, since qemu emulates an IntelliMouse (rather than a standard PS/2 mouse), there was a problem either with the qemu mouse emulation or with the IntelliMouse part of the standard PS/2 driver. Thus I came up with the following patch against qemu 0.9.1 to disable IntelliMouse emulation:
** hw/ps2.c.orig 2008-05-29 17:25:53.000000000 +0100
I'm pleased to report that so far, with this patch, the mouse pointer has stopped exhibiting this behaviour, so I suspect that this is a bug in qemu somewhere. I must admit to being slightly disappointed with the qemu development team. Even though I have posted these questions to the development mailing list, no-one has really given me any sensible answers, even when I have asked what extra information they would need to pass it on. It seems they are all too busy with the dyngen->TCG migration to really care.
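For context on what disabling IntelliMouse emulation means: a PS/2 mouse only switches into IntelliMouse mode after the host sets the sample rate to 200, 100 and then 80, after which Get Device ID (0xF2) returns 0x03 instead of 0x00 and movement packets grow from 3 bytes to 4. The sketch below models that negotiation in C with a flag that refuses the switch; the structure and function names are hypothetical and do not reflect QEMU's hw/ps2.c.

```c
#include <stdint.h>

/* Hypothetical emulated-mouse state (not QEMU's data structures). */
struct emu_mouse {
    int allow_intellimouse;  /* 0 = plain PS/2 only, the idea behind the patch */
    uint8_t last_rates[3];   /* last three Set Sample Rate values, oldest first */
    uint8_t device_id;       /* 0x00 = standard PS/2, 0x03 = IntelliMouse */
};

/* Handle Set Sample Rate (command 0xF3 plus a rate byte) and watch for
 * the 200/100/80 sequence that enables IntelliMouse mode. */
void emu_set_sample_rate(struct emu_mouse *m, uint8_t rate)
{
    m->last_rates[0] = m->last_rates[1];
    m->last_rates[1] = m->last_rates[2];
    m->last_rates[2] = rate;
    if (m->allow_intellimouse && m->last_rates[0] == 200 &&
        m->last_rates[1] == 100 && m->last_rates[2] == 80)
        m->device_id = 0x03;
}

/* Reply to Get Device ID (command 0xF2). */
uint8_t emu_get_device_id(const struct emu_mouse *m)
{
    return m->device_id;
}

/* Number of bytes per movement packet the guest driver should expect. */
int emu_packet_size(const struct emu_mouse *m)
{
    return m->device_id == 0x03 ? 4 : 3;
}
```

With the flag cleared, the guest driver keeps seeing a plain 3-byte PS/2 mouse no matter what sample rates it sets.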
Wednesday, May 21. 2008
One of the tasks I have been working on over recent weeks is the migration of an NT4 VM from VMWare over to qemu 0.9.1. Since this entire process was fraught with pitfalls and dangers (some of which could not be answered by Google), I felt it would be useful to document it in a corner of the web to help other people stuck in the same situation.
VMWare disk images are stored within vmdk files, so the first step in the qemu migration process was to convert the disk image from vmdk format to qemu raw format. Fortunately this is easily done with the qemu-supplied qemu-img program, which allows conversion between a number of different virtualisation disk formats including vmdk. The initial conversion was performed by simply doing:
qemu-img convert disk.vmdk -O raw disk.img
The disk image itself was 2G in size, and so the conversion itself took just a few minutes. A quick test using dd and hexdump showed that the resulting image had a valid MBR in place, so it was time to fire up the image in qemu.... to find that it wouldn't boot. Qemu could see the disk without any problems, but instead of firing up the NT boot menu as would be expected, the console sat waiting with the error message "A disk read error occurred". So it was time to begin some detective work.
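The dd/hexdump MBR check can also be expressed as a small C sketch, assuming the classic PC MBR layout: four 16-byte partition entries starting at offset 446 and the 0x55 0xAA signature at offset 510. Function names are hypothetical; read_sector0 would be pointed at the converted disk.img.

```c
#include <stdint.h>
#include <stdio.h>

struct mbr_part {
    uint8_t status;      /* 0x80 = bootable */
    uint8_t type;        /* 0x07 = NTFS/HPFS */
    uint32_t lba_start;  /* first sector of the partition */
    uint32_t sectors;    /* partition length in sectors */
};

/* Read a little-endian 32-bit value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return 1 if the sector ends with the 0x55AA boot signature. */
int mbr_valid(const uint8_t sec[512])
{
    return sec[510] == 0x55 && sec[511] == 0xAA;
}

/* Decode partition table entry i (0-3) from an MBR sector. */
struct mbr_part mbr_entry(const uint8_t sec[512], int i)
{
    const uint8_t *e = sec + 446 + 16 * i;
    struct mbr_part p = { e[0], e[4], le32(e + 8), le32(e + 12) };
    return p;
}

/* Load the first sector of a disk image; returns 1 on success. */
int read_sector0(const char *path, uint8_t sec[512])
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(sec, 1, 512, f);
    fclose(f);
    return n == 512;
}
```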
The first port of call was to mount the resulting disk.img using the Linux loopback device to make sure that the disk image looked relatively sane, since my first suspicion was that the drive geometry had somehow been mis-translated from the .vmdk file to the raw .img file. The image mounted without a problem, and I was able to run fdisk and view the partition table. Everything looked good: I could see a single NTFS partition and mount it using the ntfs-3g driver. Navigating through the file system for a few minutes suggested that the drive geometry was correct, as I wasn't seeing disk read errors whilst running files through cat/dd, so the conversion appeared to have been successful except for the fact that the image didn't boot.
My next thought was that there was a bug in qemu that was preventing the NT bootloader from running under emulation. So I dug out the original NT4 CD and did a basic installation on my laptop with qemu 0.9.1 to make sure that it worked as expected, and of course it did. This pointed the finger back at the migrated image as the culprit (rather than qemu), and so it was back to the drawing board once again.
Given that I was able to mount the NTFS image partition using ntfs-3g but not boot successfully, I began to wonder if the MBR had been corrupted during the migration process. This meant I needed to dig out various members of the *disk family in order to check the drive geometries within the MBR. My normal tool of choice is sfdisk because it is easy to switch over to working in absolute sectors rather than getting involved with the whole CHS game. Interestingly enough, running sfdisk -l immediately gave a warning stating that it looked like the NTFS partition was made for a partition size of X*2 cylinders, 128 heads and 63 sectors as opposed to X cylinders, 255 heads and 63 sectors. Hmmm.
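Reading the head count out of the MBR means unpacking its 3-byte CHS fields: the head sits in the first byte, the sector in the low six bits of the second, and the cylinder is ten bits split between the top two bits of the second byte and all of the third. A partition that ends on a cylinder boundary has an end head of (heads - 1), so an end head of 254 points to a 255-head geometry. A minimal C decoder (names hypothetical):

```c
#include <stdint.h>

struct chs_addr {
    unsigned cyl, head, sect;
};

/* Decode the packed 3-byte CHS field of an MBR partition entry
 * (bytes 1-3 hold the start address, bytes 5-7 the end address). */
struct chs_addr decode_mbr_chs(const uint8_t b[3])
{
    struct chs_addr a;
    a.head = b[0];
    a.sect = b[1] & 0x3F;                              /* bits 0-5 */
    a.cyl = ((unsigned)(b[1] & 0xC0) << 2) | b[2];     /* bits 8-9 + low byte */
    return a;
}

/* Infer the geometry's head count from a partition's end address,
 * assuming the partition ends on a cylinder boundary (usually the case
 * for partitions created by DOS/NT-era tools). */
unsigned heads_from_end_chs(const uint8_t end[3])
{
    return decode_mbr_chs(end).head + 1;
}
```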
The next step was to use dd/hexdump to look at the CHS values encoded within the MBR and determine whether the number of heads was 128 or 255. These clearly showed that the MBR expected a geometry of 255 heads. So where on earth was sfdisk getting this idea of 128 heads from? I wondered whether there was any special boot code within the first sector of the NTFS partition (the VBR) as opposed to the MBR. Sure enough, some googling showed that NTFS stores a BIOS Parameter Block (BPB), including its own copy of the drive geometry, within the first sector of the partition, and from the hexdump output it was clear that the "A disk read error occurred" message was coming from within the VBR. I then realised that I had previously installed a test copy of NT4 under qemu on my laptop, with the same partition sizes, that did boot, and so I should compare the two VBRs to see if there was a difference.
Firstly from the migrated image file:
Secondly from the working local NT4 installation:
Can you spot the deliberate error? For some reason, when NT4 is installed under qemu, the NTFS BPB specifies a value of 128 heads (and boots), whereas the migrated VMware image specifies a value of 255 heads (matching the MBR) but does not boot??! Perhaps this is a bug in qemu's BIOS or IDE disk emulation? Anyway, the next thing to do was to try changing the number of heads in the migrated VMWare image to see if it would then boot. Now this is where it gets strange. I really wanted to find a quick disk editor that would allow me to open up a raw device, tweak the byte in question, and then write back the changes. But after lots of searching, I couldn't find a program that would do this, which is very unusual for Linux-based systems. Therefore I came up with the following glorious hack in C to switch the byte over (where the disk was mounted as /dev/sdb):
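A byte-patching tool of this kind might look like the minimal sketch below. It is not the original program: it assumes the NTFS boot sector's standard layout (OEM ID "NTFS    " at offset 3, number of heads as a 16-bit little-endian value at offset 0x1A) and that the caller knows the byte offset of the partition. The function name patch_bpb_heads is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BPB_NUM_HEADS 0x1A  /* offset of the 16-bit "number of heads" field */

/*
 * Rewrite the number-of-heads field in the NTFS boot sector (VBR) found
 * at byte offset vbr_offset of a disk image or block device opened for
 * update ("r+b"). Returns the previous head count, or -1 on error.
 */
long patch_bpb_heads(FILE *disk, long vbr_offset, uint16_t new_heads)
{
    uint8_t vbr[512];

    if (fseek(disk, vbr_offset, SEEK_SET) != 0 ||
        fread(vbr, 1, sizeof vbr, disk) != sizeof vbr)
        return -1;

    /* Refuse to touch anything that is not an NTFS boot sector. */
    if (memcmp(vbr + 3, "NTFS    ", 8) != 0)
        return -1;

    long old = vbr[BPB_NUM_HEADS] | (vbr[BPB_NUM_HEADS + 1] << 8);
    uint8_t le[2] = { (uint8_t)(new_heads & 0xFF), (uint8_t)(new_heads >> 8) };

    /* fseek is required when switching from reading to writing. */
    if (fseek(disk, vbr_offset + BPB_NUM_HEADS, SEEK_SET) != 0 ||
        fwrite(le, 1, 2, disk) != 2 || fflush(disk) != 0)
        return -1;
    return old;
}
```

On a real device the stream would be opened with fopen("/dev/sdb", "r+b"); for a partition assumed to start at sector 63, the call would be patch_bpb_heads(disk, 63L * 512, 128). Trying it on a copy of the image first would be prudent.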
The program appeared to run without error, so it was time to see whether the image would now work with qemu. Everything was crossed as I launched qemu and prayed.... success! The NT boot loader appeared and I was finally able to get to the NT4 blue loader screen... except that the boot process stopped with a HAL error (more on this in another post). I still have no idea why the NTFS VBR BPB requires a different number of heads from the one listed in the MBR in order to boot in qemu. Any takers?
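One way to reason about why the head count matters, offered as a hypothesis rather than an answer: if the boot code computes CHS addresses from its BPB geometry while the BIOS translates them using a different head count, the same sector number maps to different CHS triples. The standard conversion is LBA = (C * heads + H) * sectors_per_track + (S - 1); the C sketch below (names hypothetical) implements it in both directions.

```c
#include <stdint.h>

struct chs {
    uint32_t c, h, s;  /* cylinders and heads count from 0, sectors from 1 */
};

/* CHS -> LBA for a given geometry. */
uint32_t chs_to_lba(uint32_t c, uint32_t h, uint32_t s,
                    uint32_t heads, uint32_t spt)
{
    return (c * heads + h) * spt + (s - 1);
}

/* LBA -> CHS for a given geometry. */
struct chs lba_to_chs(uint32_t lba, uint32_t heads, uint32_t spt)
{
    struct chs r;
    r.c = lba / (heads * spt);
    r.h = (lba / spt) % heads;
    r.s = lba % spt + 1;
    return r;
}
```

For example, LBA 16065 is C/H/S 1/0/1 with 255 heads and 63 sectors per track, but 1/127/1 with 128 heads, so code and BIOS that disagree on the head count would address different sectors.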
Sunday, May 18. 2008
As you all know, I've been working on PostGIS for quite a while now, and one of the things that has really been bugging me is the amount of legacy code in the codebase. Hands up all those of you who remember having to type "SELECT update_geometry_stats()" to build the optimiser index statistics on your database manually...!
After various discussions on the PostGIS mailing lists, I finally got the go-ahead to commit a new build system using new autoconf code and PGXS, the discussion of which can be found here. The main reasons for rewriting the build system are to move most of the version detection logic from the Makefile into autoconf/autoheader, so that we can start to think about other toolchains (think MSVC in particular), and to improve the maintainability of the project in the long run. Because there were so many different compile options, it was fairly easy for one developer to miss something and hence break someone else's build environment. The new build system helps by making PROJ.4 and GEOS compulsory, which considerably reduces the overhead of maintaining multiple versions of the regression tests.
One of the side effects of this is that in order to use PGXS, we require PostgreSQL > 8.0. Since last time I asked about this on-list, I received just 1 reply asking for older versions - so I don't think too many people will lose sleep over it. It also gives me a chance to remove a lot of legacy code from the codebase - anyone browsing lwpostgis.sql.in from the 1.3 series branch will go cross-eyed very quickly...
Sunday, May 18. 2008
It's been quite an exciting year for me so far (now that I have moved to Weybridge to join Sirius), and I realise that because of this I have been neglecting this site for quite a while. Hence I am now going to make a concerted effort to keep everyone informed of what I am currently doing, so I am now officially back from my blogging holiday.
Tuesday, February 27. 2007
The latest version of ProxyTunnel, released at FOSDEM over the weekend, includes a patch I wrote to allow an SSH tunnel to work over a secure HTTP connection. In brief, there are now two ways to use SSL encryption in ProxyTunnel:
1) Standard (--encrypt option)
i) Connect to internal proxy server using HTTP
ii) Issue CONNECT command
iii) Perform SSL handshake with remote server
iv) Start tunnel
2) Modified (--encrypt-proxy option)
i) Connect to external proxy server using HTTPS
ii) Issue CONNECT command
iii) Start tunnel
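Both modes hinge on the same HTTP CONNECT exchange; what differs is whether it travels in the clear (option 1, with the SSL handshake to the remote server afterwards) or inside an SSL session already established with the proxy (option 2). A minimal C sketch of building the request and checking the proxy's status line follows; the function names are hypothetical and this is not ProxyTunnel's own code.

```c
#include <stdio.h>
#include <string.h>

/* Build the CONNECT request asking the proxy to open a TCP connection
 * to host:port. Returns the request length, or -1 if buf is too small. */
int build_connect(char *buf, size_t len, const char *host, int port)
{
    int n = snprintf(buf, len, "CONNECT %s:%d HTTP/1.0\r\n\r\n", host, port);
    return (n > 0 && (size_t)n < len) ? n : -1;
}

/* Return 1 if the proxy's status line reports success (any 2xx code),
 * after which the connection carries raw tunnel traffic. */
int connect_succeeded(const char *status_line)
{
    int major, minor, code;
    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return 0;
    return code >= 200 && code < 300;
}
```

For example, build_connect(buf, sizeof buf, "ssh.example.com", 22) (a placeholder host) produces "CONNECT ssh.example.com:22 HTTP/1.0" followed by a blank line.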
Option 1 is the existing option, which is useful for layer 7 inspection firewalls. By wrapping the tunnel in SSL at both ends, the firewall perceives that the traffic represents a valid SSL connection regardless of its contents, and so lets it pass. Option 2 is the new option added by my patch, and allows an HTTPS site running on Apache somewhere on the internet (which may also be used to host an existing secure website) to host a tunnel to another machine. For more information, I highly recommend reading Dag Wieers' SSH tunnelling page here.
One final aspect of the patch, which is likely to be overlooked, is that it introduces an abstraction API called streams to handle both SSL and non-SSL traffic. This made the final modifications trivial, since the only thing that differed between the two options was the timing of the SSL handshake on the connection. It also allowed most of the #ifdef USE_SSL ... #endif statements to be moved into one location rather than scattered arbitrarily through many different files, which should make the application much more robust in the face of future changes.
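The general shape of such an abstraction is a struct of function pointers, so that callers read and write without knowing which transport is underneath. The C sketch below is hypothetical (ProxyTunnel's actual stream API may look quite different) and implements only the plain, unencrypted variant; an SSL variant would install functions wrapping OpenSSL's SSL_read and SSL_write instead.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Transport-independent stream: callers only use read/write. */
struct stream {
    ssize_t (*read)(struct stream *s, void *buf, size_t len);
    ssize_t (*write)(struct stream *s, const void *buf, size_t len);
    int fd;     /* underlying socket (or pipe) */
    void *tls;  /* would hold an SSL handle in an SSL variant */
};

static ssize_t plain_read(struct stream *s, void *buf, size_t len)
{
    return read(s->fd, buf, len);
}

static ssize_t plain_write(struct stream *s, const void *buf, size_t len)
{
    return write(s->fd, buf, len);
}

/* Wrap a file descriptor in an unencrypted stream. */
void stream_init_plain(struct stream *s, int fd)
{
    s->read = plain_read;
    s->write = plain_write;
    s->fd = fd;
    s->tls = NULL;
}

/* Write a whole string through whichever transport is installed.
 * Returns 0 on success, -1 on error. */
int stream_write_all(struct stream *s, const char *msg)
{
    size_t left = strlen(msg);
    while (left > 0) {
        ssize_t n = s->write(s, msg, left);
        if (n <= 0)
            return -1;
        msg += n;
        left -= (size_t)n;
    }
    return 0;
}
```

Under this design the two modes differ only in when the plain functions are swapped for SSL ones: after the CONNECT for option 1, before it for option 2.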
Monday, February 26. 2007
One of the hardest parts of maintaining the Win32 port of PostGIS is that changes to the underlying GEOS library can unintentionally stop it from compiling on Windows. A full compile of GEOS using MingW/MSYS on my reasonably fast Athlon X2 system takes 40 mins, which makes testing fixes for the GEOS inlines problem (where the entire compile would complete but the build would then fail at the link stage) extremely time-consuming. So I figured that it would be an interesting exercise to see whether I could use a cross-compiler under my installation of Ubuntu to speed up this process.
A quick search on Google for "gcc cross compile" gave back lots of resources to get me started; however, some were lacking in detail as to how to set up the cross-compiler in parallel with an existing GCC compiler. Since there wasn't much help on installing into a specific directory root, I reproduce below the steps required to set up a GCC/MingW cross compiler:
1. Download binutils to $HOME/mingw-build (latest version was 2.17).
2. Build the binutils package:
./configure --prefix=$HOME/mingw-build/rel --target=i686-pc-mingw32
Note that I had to run "make" followed by "make install", since running "make install" directly didn't work for me.
3. Download the latest MingW runtime and API files from the MingW project. At the time of writing these were w32api-3.6.tar.gz and mingw-runtime-3.9.tar.gz. Extract these in the directory $HOME/mingw-build/rel/i686-pc-mingw32 which was created in step 2.
4. Download and install GCC 3.4.2 (this is the version currently supported by the MingW project)
PATH=$HOME/mingw-build/rel/bin:$PATH ./configure --prefix=$HOME/mingw-build/rel --target=i686-pc-mingw32
And that was it. Now that the cross-compiler was set up, it was time to test it on GEOS to see if it would compile any faster. The version tested was GEOS 3.0.0rc4. Running ./configure for GEOS didn't work the first time, since autotools refuses to detect 64-bit integer types in a cross-compile environment, where the capabilities of the target and host systems can differ. I eventually got GEOS to work with the following ./configure statement:
PATH=$HOME/mingw-build/rel/bin:$PATH CXXFLAGS="-g -O2 -DHAVE_INT64_T_64" ./configure --host=i686-pc-mingw32 --target=i686-pc-mingw32 --prefix=$HOME/tmp/geos/rel
I then invoked "make" using the following:
So how long did it take to build the cross-compiled DLL on Ubuntu instead of using MSYS on Windows? 7 mins! During testing, I did find that the resulting DLL still had the issue where C++ exceptions are not correctly propagated back up to the C code in PostgreSQL. It appears that this is because the linker option "-lstdc++" doesn't quite work correctly in the cross-compile environment, so if anyone can shed light on this, that would be great. However, I can now check that the latest GEOS builds don't break the MingW compile process without having to wait 40 mins each time, and that is a good thing.
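The HAVE_INT64_T_64 workaround is presumably needed because a configure test that has to run a compiled program cannot do so when that program is built for Windows on a Linux host. Checks that only need to compile, such as the classic negative-array-size trick sketched below, still work under a cross-compiler. This is an illustrative C sketch, not part of GEOS; the typedef and function names are hypothetical (C11 compilers could use _Static_assert instead).

```c
#include <limits.h>
#include <stdint.h>

/*
 * Compile-time check that needs no execution on the target: if int64_t
 * were not exactly 64 bits wide, the array size would be -1 and the
 * compiler would reject this line. Works with old compilers such as
 * GCC 3.4, which predate _Static_assert.
 */
typedef char int64_is_64_bits[(sizeof(int64_t) * CHAR_BIT == 64) ? 1 : -1];

/* Runtime helper, only to show the type behaves as a 64-bit integer. */
int64_t int64_max(void)
{
    return INT64_MAX;
}
```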