XEN Cluster HowTo

I have tried to run both Debian Etch and Ubuntu 8.04 Server on the cluster nodes, in Dom0. I started my tests with Debian, but I had some issues with slow Samba performance in one VM that I couldn't fix, so I decided to try Ubuntu Server for the first time. Both installations went OK; the main difference was that I built mainly from source on Debian, but used only packages on Ubuntu. I actually ran into more problems with Ubuntu due to some early bugs in the 8.04 release, and I will describe them below as I go along.

2011-02-27: The setup in Ubuntu was very stable and I initially had very few problems. I ran into some trouble later on when it turned out that my heartbeat setup in combination with XEN was negatively affected by some system updates. I never had time to resolve it, so I eventually switched off heartbeat as it was just causing issues.
Instead I have started with a new install consisting of Debian Squeeze and XEN version 4. I'm migrating the data by upgrading one node at a time. I am documenting the migration phase and will publish it together with a fresh install guide.

Installation How to

Index:

Linux - Ubuntu 8.04 Server
Ubuntu - Base configuration
Installing XEN
Configure LVM
Configure XEN-tools
Using XEN-tools to install DomU
Install and configure DRBD
Configure DomU to use your DRBD device
Configure DomU on your other Dom0 node
Configure Live Migration
Install and Configure Heartbeat
Bug in default xendomains script
Troubleshooting

Linux - Ubuntu 8.04 Server

I am not going to go through each and every step of the Ubuntu installation as it is really straightforward. I downloaded the 8.04 Server version from http://www.ubuntu.com/getubuntu/download and started the installation on one of the nodes.

During disk partitioning, select manual mode and delete any existing partitions (if there are any).
Create one partition for your main system and make it 5GB. This partition will be used for your Dom0 and 5GB should be more than enough.
Create a second partition for the Dom0 swap. Make this 512MB, as we will later configure Dom0 to use 512MB of RAM.
Create a third partition with all your remaining space.

At the packages selection step do not select any packages. Leave everything deselected.

A few minutes later you should have a basic installation of Ubuntu 8.04 Server running.

Ubuntu - Base configuration

A few things need to be done to the freshly installed Ubuntu Server before we start installing XEN.

Most of the commands need to be executed with super user rights, so we start by typing

sudo su

I installed SSH early on so I wouldn't need to run everything locally on the console:

apt-get install ssh

Configure network interfaces for static IPs:

nano /etc/network/interfaces

and configure it so it looks something like this:

auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
address 10.1.1.100
netmask 255.255.255.0
network 10.1.1.0
broadcast 10.1.1.255
gateway 10.1.1.1

# Secondary interface used between the two cluster nodes
auto eth1
iface eth1 inet static
address 192.168.1.100
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255

This setup uses eth0 to connect to my network and eth1 as the direct link between the two nodes. For node ha2 use the IP addresses 10.1.1.101 and 192.168.1.101.
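
For reference, here is a minimal sketch of what /etc/network/interfaces on ha2 would look like with those addresses (same layout as above, only the two address lines differ):

# /etc/network/interfaces on ha2 (sketch)
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address 10.1.1.101
netmask 255.255.255.0
network 10.1.1.0
broadcast 10.1.1.255
gateway 10.1.1.1

auto eth1
iface eth1 inet static
address 192.168.1.101
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255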

Edit /etc/hosts and add both cluster nodes

nano /etc/hosts

My file looks like this:

127.0.0.1 localhost
127.0.1.1 ha1.domain.local ha1
10.1.1.100 ha1.domain.local ha1
10.1.1.101 ha2.domain.local ha2
192.168.1.100 ha1X
192.168.1.101 ha2X

For node ha2, change the second line to: 127.0.1.1 ha2.domain.local ha2

Edit the /etc/hostname file to match your Fully Qualified Domain Name (FQDN)

nano /etc/hostname
ha1.domain.local

and for node ha2 change this to: ha2.domain.local

Reboot the machine to confirm everything is working:

reboot

You should now be able to connect to your static IP address and your hostname should be your FQDN. Run:

hostname
hostname -f

and they should both return your FQDN, like this:

root@ha1:/# hostname
ha1.domain.local
root@ha1:/# hostname -f
ha1.domain.local

Install NTP to have your time synchronized. This is important for your VMs, especially when migrating a VM to the other node:

apt-get install ntp 

Now we are done with the basic configuration of the system and can move on to the fun part: installing Xen, DRBD & Heartbeat.

Installing XEN

I will use only packages during this installation, which makes installing XEN on Ubuntu a quick and hassle-free process. I will only cover the important parts and comment where needed; if you want a more comprehensive guide for installing XEN on Ubuntu, please check this link: http://howtoforge.com/ubuntu-8.04-server-install-xen-from-ubuntu-repositories

apt-get install ubuntu-xen-server 

Now you should have Xen installed.

Even though you might not use loop devices in your XEN setup, it can be a good idea to increase the number of allowed loop devices so you don't run into trouble later if you decide to use them. Edit /etc/modules and modify the line with "loop" as below:

nano /etc/modules 

loop max_loop=64

Now it is time to reboot your system so it will boot with the new xen kernel:

reboot 

After reboot, run:

uname -r

to confirm your system is using the new xen kernel. It should look like this:

root@ha1:/# uname -r
2.6.24-19-xen

Configure LVM

To install LVM run:

apt-get install lvm2 

We will now create a Volume Group on the third partition created during installation. On my system this is /dev/sda3. You can run:

fdisk -l 

to confirm this.

Do the following to create a volume group called "vg":

pvcreate /dev/sda3 
vgcreate vg /dev/sda3

To check the status of the volume group, run:

vgdisplay 

and then reboot:

reboot 

Within this volume group we will later create logical volumes that will be used by our VMs. You can create the logical volumes manually when installing a new VM, but below I will use xen-tools, which will do all that for us.
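
For reference, creating such volumes by hand would look something like this (the names and sizes here are just examples, matching the test DomU we create later):

lvcreate -L 5G -n test-disk vg
lvcreate -L 384M -n test-swap vg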

Configure XEN-tools

If you have a cluster with two machines like mine, it is not necessary to configure xen-tools on both of them. You can decide to always use one of them when adding a new DomU. For flexibility I have both machines configured the same way, so I can run xen-tools on either.

Xen-tools is installed automatically as part of the Xen installation above. So now we just need to configure it. The configuration for xen-tools is stored in /etc/xen-tools/xen-tools.conf

Run:

nano /etc/xen-tools/xen-tools.conf 

and perform the following changes in the xen-tools.conf file:

We first uncomment the line with LVM and specify the volume group we created above:

lvm = vg

Configure the "Disk and Sizing options" section. Mine looks like this at the moment:

size = 5Gb # Disk image size.
memory = 384Mb # Memory size
swap = 384Mb # Swap size
# noswap = 1 # Don't use swap at all for the new system.
fs = ext3 # use the EXT3 filesystem for the disk image.
dist = hardy # Default distribution to install.
image = sparse # Specify sparse vs. full disk images.

The disk size and the swap size will be used to create logical volumes (LVs) in the VG specified above.

Next we edit the "Networking" section. Mine looks like this:

gateway = 10.1.1.1
netmask = 255.255.255.0
broadcast = 10.1.1.255

We will specify the IP address of the VM and its hostname when creating the VM with xen-tools.

Uncomment the line with passwd so you are always asked to set a password for your VM. It should look like:

passwd = 1

As I have a 64-bit AMD CPU I set the following value for my architecture:

arch=amd64

Next, and last, we change the mirrors used when installing Debian and Ubuntu. This is how mine looks, using mirrors in the Netherlands:

# The default mirror for debootstrap to install Debian-derived distributions
#
mirror = http://ftp.nl.debian.org/debian/

#
# A mirror suitable for use when installing the Dapper release of Ubuntu.
#
mirror = http://nl.archive.ubuntu.com/ubuntu/

#
# If you like you could use per-distribution mirrors, which will
# be more useful if you're working in an environment where you want
# to regularly use multiple distributions:
#
mirror_sid=http://ftp.nl.debian.org/debian
mirror_sarge=http://ftp.nl.debian.org/debian
mirror_etch=http://ftp.nl.debian.org/debian
# mirror_dapper=http://archive.ubuntu.com/ubuntu
# mirror_edgy=http://archive.ubuntu.com/ubuntu
# mirror_feisty=http://archive.ubuntu.com/ubuntu
# mirror_gutsy=http://archive.ubuntu.com/ubuntu

For Debian I had to enable the per-distribution mirrors as shown above; otherwise installing sid, sarge or etch would fail.

Now we are done configuring /etc/xen-tools/xen-tools.conf

The default setting in xen-tools.conf uses the debootstrap installation method. For installing Fedora and CentOS you need to use rinse. If you want to try that later on, install rinse by running:

apt-get install rinse 

Now we are ready to use xen-tools!

Using XEN-tools to install DomU

Perform this step on only one of your machines, either HA1 or HA2, to create your first DomU.

Time to install our first DomU. The xen-tools application we use for this is called xen-create-image. The only two parameters we supply are the IP address and hostname of the DomU; all other settings will be taken from xen-tools.conf. You can, however, override any of these settings on the command line. For example, to change the distribution you would add --dist=etch to install Debian etch instead of Ubuntu Hardy.
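
As a hypothetical example (the hostname and IP below are made up), a command that overrides a few of the xen-tools.conf defaults could look like this:

xen-create-image --hostname=web1 --ip=10.1.1.51 --dist=etch --size=10Gb --memory=512Mb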

Now, run the following to create a DomU with IP: 10.1.1.50 and with hostname: test:

xen-create-image --hostname=test --ip=10.1.1.50 

After a while the process should complete successfully and your DomU is ready. For me it took about 5 minutes.

xen-create-image has now automatically created two LVs in your VG. If you used the hostname "test", you will have one LV called "test-disk" and another called "test-swap". The full paths to these are /dev/vg/test-disk and /dev/vg/test-swap.

Run:

lvdisplay

and you will see the details of your two LVs.

You should now be able to start your DomU called test. The config files for your DomUs are stored in /etc/xen/, so we first change to that folder:

root@ha1:/# cd /etc/xen 

In /etc/xen/ you will find test.cfg which is the config file for your newly created DomU.

Start the DomU:

root@ha1:/etc/xen# xm create test.cfg 

You can also add -c to the command above; that will automatically bring you to the console of the DomU. If you didn't, you can check the status of the running DomU with:

xm list 

To change to the console of the running DomU, run:

xm console test

If everything went fine you will now see the login prompt of your DomU test. To leave the console of your DomU, press Ctrl and ]:

Ctrl + ]

Perfect! You now have your first virtual machine running.

NEXT

We will soon go ahead and configure DRBD and Heartbeat to allow for live migration and high availability. This is the fun part! But before we do that we need to duplicate the LV disk setup on the other machine.

So on HA2 we need to create one LV with 5GB and another one with 384MB.

Run:

root@ha2:/# lvcreate -L 5G -n test-disk vg 

to create "test-disk" with 5GB of space. We use the same name here for simplicity, but the names don't need to match, as we will map them explicitly later in the DRBD configuration.

Run:

root@ha2:/# lvcreate -L 384M -n test-swap vg 

to create "test-swap" with 384MB of space.

Now we have the same disk setup on both machines.

 

Install and configure DRBD

To install DRBD run:

apt-get install drbd8-utils 

The configuration file for DRBD is /etc/drbd.conf. We will now configure it to use the LVs we created above, and later we will change the Xen configuration of our test DomU to use the DRBD devices instead of the LVs directly. Edit /etc/drbd.conf:

root@ha1:/# nano /etc/drbd.conf 

Below is a copy of my settings with all default comments removed from the file:

global {
    usage-count yes;
}

common {
    syncer { rate 90M; }
}

resource test-disk {
    protocol C;
    startup {
        wfc-timeout 120;      ## 2 minutes.
        degr-wfc-timeout 120; ## 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;

        timeout 60;
        connect-int 10;
        ping-int 10;
        max-buffers 2048;
        max-epoch-size 2048;
    }
    syncer {
    }

    on ha1.domain.local {
        address 192.168.1.100:7789;
        device /dev/drbd1;
        disk /dev/vg/test-disk;
        meta-disk /dev/vg/meta[0];
    }

    on ha2.domain.local {
        address 192.168.1.101:7789;
        device /dev/drbd1;
        disk /dev/vg/test-disk;
        meta-disk /dev/vg/meta[0];
    }
}

resource test-swap {
    protocol C;
    startup {
        wfc-timeout 120;      ## 2 minutes.
        degr-wfc-timeout 120; ## 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;

        timeout 60;
        connect-int 10;
        ping-int 10;
        max-buffers 2048;
        max-epoch-size 2048;
    }
    syncer {
    }

    on ha1.domain.local {
        address 192.168.1.100:7790;
        device /dev/drbd2;
        disk /dev/vg/test-swap;
        meta-disk /dev/vg/meta[1];
    }
    on ha2.domain.local {
        address 192.168.1.101:7790;
        device /dev/drbd2;
        disk /dev/vg/test-swap;
        meta-disk /dev/vg/meta[1];
    }
}

I have given the DRBD resource the same name as its corresponding LV. So DRBD resource: test-disk is using LV: test-disk.

Next we will create a separate volume to store DRBD's meta data. Meta data is used by DRBD to store information about the device, and it can be either internal or external. Internal meta data is easier to set up for a new device, but requires a resize operation when used with an already formatted device. Read more about this at: http://www.drbd.org/users-guide/ch-internals.html

As we already have data on our LVs, created by XEN tools, we will use external meta data. So we create another LV with 1GB of space:

lvcreate -L 1G -n meta vg 

In drbd.conf above you will see that the meta-disk lines use /dev/vg/meta[0] and /dev/vg/meta[1]. The same device can be used to store meta data for several DRBD resources; that is done by adding an index in square brackets after the device name.
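
As a hypothetical example, if you later add a third resource to drbd.conf, its host sections could simply reuse the same meta volume with the next free index:

meta-disk /dev/vg/meta[2];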

To initialize the meta data, run:

drbdadm create-md test-disk
drbdadm create-md test-swap

Redo the DRBD configuration above on the other node if you haven't already done so.
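
Assuming you have root SSH access between the nodes (adapt to your setup otherwise), the quickest way is to copy the file over the crossover link:

scp /etc/drbd.conf root@192.168.1.101:/etc/drbd.conf

Remember to also create the meta LV and run the two drbdadm create-md commands on that node.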

Now we will start up DRBD on both nodes. Run:

/etc/init.d/drbd start 

To check status of DRBD use the following commands:

/etc/init.d/drbd status
cat /proc/drbd

Before the replication of data begins, we have to make one node the primary for each DRBD resource. Run the following on the node where you installed your DomU above, so its data is replicated to the other node (do NOT run it on the other node):

drbdsetup /dev/drbd1 primary -o
drbdsetup /dev/drbd2 primary -o

Check the status again to see that the replication started:

/etc/init.d/drbd status
cat /proc/drbd

If everything is OK, then once the data is replicated you should see something like this for each DRBD device when running /etc/init.d/drbd status. It should state Primary/Secondary and UpToDate/UpToDate:

/etc/init.d/drbd status
1:test-disk Connected Primary/Secondary UpToDate/UpToDate C
2:test-swap Connected Primary/Secondary UpToDate/UpToDate C

Now we are done setting up the DRBD devices for our LVs. Next we configure our DomU to use DRBD.

 

Configure DomU to use your DRBD device

The configuration files for your DomUs are stored in /etc/xen/. So we first change to that directory:

cd /etc/xen 

In here you have your test.cfg file. Edit it by running:

/etc/xen# nano test.cfg 

You will find a section that looks like below:

disk = [
'phy:/dev/vg/test-swap,xvda1,w',
'phy:/dev/vg/test-disk,xvda2,w',
]

Edit this section so it looks like this:

disk = [
'drbd:test-swap,xvda1,w',
'drbd:test-disk,xvda2,w',
]

Done! That is all, now your DomU is ready to be started using the DRBD device.

If your DomU is still running we first have to stop it. Run:

xm shutdown test 

Try to start your DomU again:

xm create test.cfg -c

Hopefully everything went fine. Log in and then shut down your DomU:

shutdown -h now 

When you are back to your Dom0 prompt check the DRBD status again:

/etc/init.d/drbd status

You will now see the status of the DRBD devices as Secondary/Secondary:

1:test-disk Connected Secondary/Secondary UpToDate/UpToDate C
2:test-swap Connected Secondary/Secondary UpToDate/UpToDate C

The good thing about this is that Xen takes care of your DRBD devices and will automatically bring them up and down as needed. The same applies when you start a DomU: Xen will first make sure the DRBD devices are in primary mode before starting the DomU.

 

Configure DomU on your other Dom0 node

Now we need to prepare the other Dom0 node to be able to run the DomU called test. This is simply done by copying /etc/xen/test.cfg from ha1 to ha2. Copy the whole file or just its contents, whatever is easier for you. When the file is copied we will try to start the DomU on ha2.
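
As with drbd.conf earlier, assuming root SSH access between the nodes, a quick way to copy it is:

scp /etc/xen/test.cfg root@192.168.1.101:/etc/xen/test.cfg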

First make sure that DomU test is not running on ha1:

root@ha1:/# xm list 

If it is running, then stop it:

root@ha1:/# xm shutdown test 

Then go to ha2 and start it:

root@ha2:/# xm create test.cfg 

If everything is fine you should see DomU test running. Check with xm list:

root@ha2:/# xm list 

Perfect! You can now start the DomU on both nodes. (Note: Don't try to start the DomU on both nodes at the same time.)

 

Next we will get to the really cool stuff!
We will go ahead and configure live migration so we can move a DomU between the two Dom0 nodes with hardly any downtime.

 

Configure Live Migration

By default XEN does not allow live migration; we have to enable it in /etc/xen/xend-config.sxp. Make sure the following line is commented out, so it looks like this:

#(xend-relocation-hosts-allow '^localhost$ ^localhost\\.localdomain$')

and that the following line is not commented out, so it looks like this:

(xend-relocation-port 8002)

Restart xend (a reload has no effect); restarting xend will not kill running DomUs. Run:

 /etc/init.d/xend restart 

Make sure to make these two changes on both nodes, both ha1 & ha2.
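
Leaving xend-relocation-hosts-allow commented out means any host that can reach port 8002 may request a relocation, which is fine on a trusted network. If you want to tighten it, an untested sketch that only allows the two crossover addresses from our hosts file would look something like this:

(xend-relocation-hosts-allow '^192\\.168\\.1\\.10[01]$')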

NOW, let's try a live migration!

If everything is in order your DomU called test should be running on your ha2 node. Please confirm this with xm list:

root@ha2:/# xm list
Name ID Mem VCPUs State Time(s)
Domain-0 0 500 2 r----- 10268.0
test 7 384 1 -b---- 4694.7

To migrate a DomU we use the xm migrate command. Run:

root@ha2:/# xm migrate test ha1X --live

So first we write xm migrate, then the name of the DomU we want to migrate, in this case test, then the hostname or IP of the other node, in this case ha1X (remember that we added ha1X and ha2X to the hosts file on both nodes, mapped to the IPs of the crossover link between ha1 and ha2). We end the command with --live, which instructs Xen to perform a live migration.

If everything went fine your DomU test should be running on ha1. Run:

root@ha1:/# xm list
Name ID Mem VCPUs State Time(s)
Domain-0 0 500 2 r----- 10260.1
test 11 384 1 -b---- 2.2

Try the migration a few times between the two nodes. Try accessing the DomU during a live migration: ping it from another machine on your network, open an SSH session, and see how incredibly fast it migrates. I have not timed it precisely, but running a normal ping from a Windows machine I lose at most one packet during a migration.

 

Install and Configure Heartbeat

We will use Heartbeat, part of the Linux High Availability project (http://www.linux-ha.org/), to monitor our XEN resources and provide failover between our two nodes. I currently use version 2 of heartbeat, but with version 1's configuration files, so that is what I will describe below. Read more at the link above about the differences between the configuration formats.

To install Heartbeat we will use the package from the Ubuntu repository. Run this on both nodes:

apt-get install heartbeat 

The configuration files for Heartbeat are stored in /etc/ha.d/. We need to configure the following files: authkeys, haresources and ha.cf.

We will start with authkeys and ha.cf as they are the easiest to explain.
The authkeys file configures authentication between the cluster nodes. Configure it to look something like this:

root@ha1:/etc/ha.d# nano authkeys 
auth 1
1 sha1 SecretKey123!!!

This tells heartbeat to use the sha1 method with the supplied key. Note: make sure to put an exact copy of this file on your ha2 node.
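
If you prefer a random key instead of a made-up one, you can generate one like this. Heartbeat will also refuse to start if the file is readable by others, so tighten the permissions as well:

# generate a random key to paste into authkeys
dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}'
chmod 600 /etc/ha.d/authkeys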

Let's continue with ha.cf. This file contains the configuration for heartbeat: the nodes in the cluster, how they communicate, and the timer settings. I use the version 1 configuration format. Configure the file to look something like this:

root@ha1:/etc/ha.d# nano ha.cf 
logfacility local0
udpport 694
keepalive 1
deadtime 10
warntime 3
initdead 20
ucast eth0 10.1.1.100
ucast eth0 10.1.1.101
auto_failback on
watchdog /dev/watchdog
debugfile /var/log/ha-debug
node ha1.domain.local
node ha2.domain.local

The two lines above that begin with "ucast eth0" configure the heartbeat communication. The reason I have listed both nodes' IP addresses is so the file can be identical on both nodes; heartbeat ignores the IP of the local machine, so this is perfectly fine. Note: make sure to put an exact copy of this file on your ha2 node.
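
Since we already have the dedicated crossover link, you could just as well (or additionally, for redundancy) send the heartbeat over eth1; an untested sketch using the addresses from our setup:

ucast eth1 192.168.1.100
ucast eth1 192.168.1.101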

Now we will continue with the haresources file. The file itself is very simple, but we also need to set up the resources used by heartbeat, which requires some explanation. We begin with the file; configure it to look like this:

root@ha1:/etc/ha.d# nano haresources

ha1.domain.local xendomainsHA1
ha2.domain.local xendomainsHA2

Looks very simple, doesn't it? What this means is that the resource (the script that will control our DomUs) xendomainsHA1 will default to node HA1 and xendomainsHA2 to node HA2. These two scripts are copies of the /etc/init.d/xendomains script, modified for our two-node cluster. We need this to be able to differentiate between the DomUs that belong on HA1 and those that belong on HA2.

First we copy /etc/init.d/xendomains twice to create xendomainsHA1 and xendomainsHA2:

root@ha1:/# cp /etc/init.d/xendomains /etc/ha.d/resource.d/xendomainsHA1
root@ha1:/# cp /etc/init.d/xendomains /etc/ha.d/resource.d/xendomainsHA2

Now we edit both files and change two lines so they look like below:

root@ha1:/# nano /etc/ha.d/resource.d/xendomainsHA1 
LOCKFILE=/var/lock/xendomainsHA1
XENDOM_CONFIG=/etc/default/xendomainsHA1
root@ha1:/# nano /etc/ha.d/resource.d/xendomainsHA2 
LOCKFILE=/var/lock/xendomainsHA2
XENDOM_CONFIG=/etc/default/xendomainsHA2

Please make sure that all the Heartbeat configuration above is exactly the same on node HA2. Below the configuration will differ slightly.

As you noticed above we specified different configuration files for the resources. There is a default configuration file already located in /etc/default, called xendomains. We will copy it as below:

root@ha1:/# cp /etc/default/xendomains /etc/default/xendomainsHA1

We copy only xendomainsHA1 to begin with. We will modify it and then later use it for xendomainsHA2.

root@ha1:/# nano /etc/default/xendomainsHA1

XENDOMAINS_MIGRATE="ha2X --live"

This will allow live migration to HA2 when the node is shut down.

XENDOMAINS_SAVE=

Disable save feature.

XENDOMAINS_SHUTDOWN_ALL=

Disable this to prevent ALL DomUs from being shut down, including those not controlled by this script.

XENDOMAINS_RESTORE=false

Disable as we don't save DomUs

XENDOMAINS_AUTO=/etc/xen/auto/HA1

Point to the location of the DomU configuration files that will be controlled by this script.

XENDOMAINS_AUTO_ONLY=true

Only DomUs started via config files in XENDOMAINS_AUTO will be managed
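
Putting it all together, the changed lines in /etc/default/xendomainsHA1 on node HA1 end up looking like this (all other settings keep their defaults):

XENDOMAINS_MIGRATE="ha2X --live"
XENDOMAINS_SAVE=
XENDOMAINS_SHUTDOWN_ALL=
XENDOMAINS_RESTORE=false
XENDOMAINS_AUTO=/etc/xen/auto/HA1
XENDOMAINS_AUTO_ONLY=true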

Now we copy xendomainsHA1 to create xendomainsHA2:

root@ha1:/# cp /etc/default/xendomainsHA1 /etc/default/xendomainsHA2

And we modify xendomainsHA2 to point to correct folder for DomU configuration files:

root@ha1:/# nano /etc/default/xendomainsHA2

XENDOMAINS_AUTO=/etc/xen/auto/HA2

Now we can copy /etc/default/xendomainsHA1 and /etc/default/xendomainsHA2 to node HA2. Do that by any means you want, either file transfer or copy and paste.

On HA2 we need to modify the settings for live migration:

root@ha2:/# nano /etc/default/xendomainsHA1

XENDOMAINS_MIGRATE="ha1X --live"

root@ha2:/# nano /etc/default/xendomainsHA2

XENDOMAINS_MIGRATE="ha1X --live"

Next we need to create the two folders /etc/xen/auto/HA1 and /etc/xen/auto/HA2 referred to above. Do this on both nodes, HA1 & HA2:

mkdir /etc/xen/auto/HA1
mkdir /etc/xen/auto/HA2

Create a symlink on both nodes in /etc/xen/auto/HA1 pointing to our test.cfg file in /etc/xen/

ln -s /etc/xen/test.cfg /etc/xen/auto/HA1/test 

Whenever you create a new DomU you need to decide whether it should run on HA1 or HA2 by default; the location of the symlink determines that.
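
For example, if you later create a DomU called web (a hypothetical name) that should normally run on HA2, you would link its config into the HA2 directory instead:

ln -s /etc/xen/web.cfg /etc/xen/auto/HA2/web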

Remove the default xendomains script from automatic startup; heartbeat will now control this for us. Do this on both nodes:

update-rc.d -f xendomains remove 

Shutdown DomU test if it is running:

xm shutdown test

Start heartbeat manually on both nodes:

/etc/init.d/heartbeat start 

Hopefully everything is fine. Try rebooting one node at a time and check that DomU test is migrated between the two nodes.
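
A handy way to follow what happens is to keep the domain list refreshing on the surviving node while you reboot the other one:

watch -n 1 xm list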

Bug in default xendomains script

There is a bug in the default xendomains script that breaks the script when there is more than one domain in the AUTO directory. This only applies to non-standard configurations like mine above, where XENDOMAINS_AUTO_ONLY=true in /etc/default/xendomains. By default this setting is false, hence I guess nobody noticed it. I have been too lazy to create a proper patch for it, but I will try to get around to doing that.

Solution:

The rdnames() function now looks like this:

rdnames()
{
    NAMES=
    if ! contains_something "$XENDOMAINS_AUTO"
    then
        return
    fi
    for dom in $XENDOMAINS_AUTO/*; do
        rdname $dom
        if test -z "$NAMES"; then
            NAMES=$NM;
        else
            NAMES="$NAMES $NM";
        fi
    done
}

And the beginning of the stop() function looks like this:

stop()
{
    # Collect list of domains to shut down
    if test "$XENDOMAINS_AUTO_ONLY" = "true"; then
        rdnames
    fi
    echo -n "Shutting down Xen domains:"
    while read LN; do
        parseln "$LN"
        if test "$id" = "0"; then continue; fi
        echo -n " $name"
        found="0"
        if test "$XENDOMAINS_AUTO_ONLY" = "true"; then
            for i in ${NAMES[@]}
            do
                if test "$found" = "0"; then
                    if test "$i" = "$name"; then
                        found=1
                    fi
                fi
            done
            if test "$found" = "0"; then
                echo -n "(skip)"
                continue
            fi
        fi
        # XENDOMAINS_SYSRQ should be something like just "s"

The script should now work with more than one DomU.

Troubleshooting

Here is a command I use in my Dom0 to follow the relevant logs of the system, Xen and heartbeat. Keep it running in a separate shell when testing migration and heartbeat failover:

tail -f /var/log/syslog /var/log/xen/xend.log /var/log/xen/xend-debug.log /var/log/ha-debug  
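
Besides the logs, a few quick state checks help when something looks off (cl_status ships with heartbeat; adjust if your version differs):

cat /proc/drbd        # DRBD connection state and roles
xm list               # which DomUs are running on this node
cl_status hbstatus    # is the local heartbeat daemon up?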

more to come about troubleshooting...

Last Updated on Tuesday, 01 March 2011 21:55  
Comments (46)
THANKS
1 Friday, 08 August 2008 12:20
kilolima
Got the link from the ubuntu forums. Thank you very much. Tried it out and it works really nice. Only had one little problem, but that was my fault Smile
Kernel panic on link failure
2 Wednesday, 03 September 2008 15:58
Federico Fanton
Hi! Thanks for your guide, it helped me greatly. Anyway, didn't you have any issue with kernel panics on ethernet link failure? (Due to kernel 2.6.24.. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=399981)
Re: Kernel panic on link failure
3 Wednesday, 03 September 2008 18:09
Daniel
I have no issues with kernel panics as described in the bugreport at normal reboots. Have still to do some more extensive testing with heartbeat and effects of certain failures and also I am not sure the DRBD config is optimal.

Are you also running Ubuntu 8.04 Server 64 bit or something else?
Re: Kernel panic on link failure
4 Thursday, 04 September 2008 13:26
Federico Fanton
I'm running Ubuntu 8.04 Server, but the 32bit version.. Could you please try a link failure with your setup?
Steps to reproduce the crash on my system:
- Shutdown -h on the "master" node
- Wait a few minutes
- Panic!

If I ping the vm during the shutdown phase, I get just a 4-seconds gap before the connection is up again (until the kernel crashes, of course) so I think everything is set up correctly.
Many thanks for your help!
Re: Kernel panic on link failure
5 Friday, 05 September 2008 21:34
Daniel
I followed your steps, but that didn't cause my system to crash. When I checked the logs I found out that crossover link didn't go down as I had Wake on LAN enabled. So I did another test where I disconnected the crossover cable just after shutdown of the master node. But I still did not experience a crash. Let me know if you want to try something else.
Errors in xendomains script
6 Friday, 05 September 2008 21:37
Federico Fanton
For reference, there's a bug report on Launchpad for xendomains: https://bugs.launchpad.net/ubuntu/+source/xen-3.2/+bug/216761
Re: Errors in xendomains script
7 Friday, 05 September 2008 22:14
Daniel
I think I ran that patch on my system as well, but I didn't have any notes to confirm if I did or not. Should add that to my instructions. But that patch is not resolving the issue described above. Did you confirm if you have the same behavior when running more than two DomUs?
relocation
8 Friday, 05 September 2008 23:01
paras
Instead of live migration, relocating is going on when I reboot the primary node. Paras.
Re: relocation
9 Friday, 05 September 2008 23:22
Daniel
Look in your /etc/defaults/xendomains file(s) and check your line with: XENDOMAINS_MIGRATE=
You need to have --live added to that string, like this:
XENDOMAINS_MIGRATE="ha1 --live"

I see now, when looking at this line of the article using Firefox, it looks like a single long dash "-" instead of two dashes. If a manual live migration is working I guess this is your problem, or maybe your /etc/ha.d/haresources file is pointing to the wrong resource-file.
Re: Kernel panic on link failure
10 Monday, 08 September 2008 08:03
Federico Fanton
I solved the crashing problem by sheer luck: I changed the NIC (I had a 3Com SOHO100TX, switched to a Realtek RTL-8169) and the problem went away..
Many thanks for your helpfulness Smile
Re: Errors in xendomains script
11 Monday, 08 September 2008 08:07
Federico Fanton
I didn't try with more than one DomU, but I had to apply the patch because of many scripting errors :/
re: relocation
12 Monday, 08 September 2008 15:59
Paras Pradhan
Yes, manual migration is working fine. It is not live-migrated when I reboot or shut down ha1. I have checked haresources and the associated files in resource.d and /etc/default/xendomains; all of them have the --live option.

Can anyone tell me one more thing. Is the automatic live migration of Virtual machine possible from ha1 to ha2 if I pull the network cable immediately from ha1?

Thanks
Paras.
Re: relocation
13 Tuesday, 14 October 2008 17:27
Federico Fanton
As I understand it, live migration is a maintenance/balancing tool, not a high-availability one.. So everything must be in place for it to work, no pulled cables Smile
Expected behavior on primary node failure
14 Tuesday, 14 October 2008 17:33
Federico Fanton
I'm sorry, what's the expected behavior on primary node failure, with your setup? For example, imagine the following:
I have a Samba server on a VM, I pull the network cables from the primary node, and the VM starts on the secondary.. After a while the Samba server comes up again, and I copy a file to it. Then I re-attach the cables.
From my tests, heartbeat shuts the VM down on node2 and restarts it on node1, while DRBD goes in StandAlone mode on both nodes..
What should I do now to prevent losing the file that I copied during node1 failure?
Many thanks for your time, I'm really a HA-newbie Embarrassed
Re: Expected behavior on primary node failure
15 Thursday, 16 October 2008 07:29
Federico Fanton
I figured it out (by asking on the drbd ML actually).. What I had when I reattached the cables was a split-brain situation, that's why drbd couldn't resynchronize. Thanks all the same!
Re: Expected behavior on primary node failure
16 Thursday, 16 October 2008 14:15
Daniel
Did you improve you configuration to prevent this state?

This is something I need to look into myself, but I have had no time so far. I'm currently not using fencing like STONITH.
Re: Expected behavior on primary node failure
17 Friday, 17 October 2008 09:00
Federico Fanton
Well I tried setting up dopd to prevent splitbrains, but it didn't work as expected.. Maybe I did something wrong, I didn't investigate the matter because I had already spent a lot of time to build the cluster :/ So for the moment I extended deadtime to one minute and wrote notes to *watch out* in case of an unplugged cable.. When there's more time I'd like to try to bind to xendomainsX a script that would check if the actual node is the off-the-net one (by pinging a router maybe) and then shutting down the VMs..
Another xendomains bug
18 Monday, 20 October 2008 10:06
Federico Fanton
In case of more than one DomU in XENDOMAINS_AUTO, be sure to apply the patch at http://xenbits.xensource.com/xen-unstable.hg?diff/01c8ccb551b0/tools/examples/init.d/xendomains otherwise Heartbeat won't be able to migrate VMs during "xendomains stop", and you'll end up with the VMs on both nodes.
Re: Another xendomains bug
19 Monday, 20 October 2008 19:04
Daniel
I believe you are referring to the same problem as above.
But this patch looks very neat and way simpler than mine. Did you try it and can confirm it works?
Re: Another xendomains bug
20 Tuesday, 21 October 2008 08:03
Federico Fanton
Oops, it's the same problem, you're right Embarrassed Anyway I tested the patch yesterday, works nicely.
Why not LVM on DRBD?
21 Monday, 10 November 2008 20:58
Nathan Stratton
Why not run LVM on DRBD? With this method you need to build a DRBD config for every DomU, if you move LVM up a notch you don't need to worry about that. Did you find your method to be faster? I am currently running LVM on DRBD in production for BlinkMind, http://www.blinkmind.com The only downside I found to running LVM on DRBD is the 4 TB limit.

-Nathan

Re: Another xendomains bug
22 Thursday, 04 December 2008 23:31
Federico Fanton
I think I found another bug.. If you put all your VMs inside one of the /etc/xen/auto/* dirs and leave the other one empty, during failback $NAMES (line 333) becomes empty and the script throws a syntax error (resulting in unintended migration of the VMs, in my case Wink )
I patched the scripts and wrote to xen-devel about it.
vm running on both nodes (primary/primary) after reboot
23 Tuesday, 23 December 2008 23:28
Stephan
First of all thx for the great howto - and merry chrismas Wink

I just followed your how-to line by line and everything seems to be perfect, but after a little bit more testing I run into a problem:

If I reboot ha1 (with vm test running on it), the vm will be migrated correctly to ha2. But when ha1 comes back up it starts a SECOND vm test and drbd says primary/primary.

Do you have a clue what may be wrong.

Thanks in advance!
Stephan
Re: vm running on both nodes (primary/primary) after reboot
24 Thursday, 25 December 2008 11:57
Federico Fanton
I'd try looking for clues at /var/log/ha-debug on both nodes, maybe xendomains* is getting something wrong :/
Re: Re: vm running on both nodes (primary/primary) after reboot
25 Friday, 26 December 2008 18:52
Stephan
Thanks for your answer! I think you are right with your guess that something is wrong with the xendomains* script. I found another post where somebody else has exactly the same problem. The problem was that i updated everything before this howto and now the xendomain script is somehow buggy.

May I ask for posting a running xendomains* script or sending it per email (stephanheck'AT'gmx'dot'de)? Then I could test if thats the problem.

Thanks!
Stephan
Re: vm running on both nodes (primary/primary) after reboot
26 Monday, 29 December 2008 23:39
Federico Fanton
Sure, here's the version I'm using -> http://pastebin.com/m3c9625a0
All scripts Idea
27 Sunday, 04 January 2009 11:26
Daniel
Even though I have listed all modification to the default scripts it might be a good idea to list all scripts in full. I will make a section for that and post it when I get some time over.

Cheers, Daniel
Nice howto, here is a fix
28 Tuesday, 20 January 2009 21:18
First of all, great howto. I did as many others and ran apt-get upgrade first, and that broke a script. Well, here is a quick fix:


###CUT###
--- xendomains 2008-06-04 21:21:55.000000000 +0200
+++ xendomains.fix 2008-06-04 21:23:06.000000000 +0200
@@ -183,7 +183,7 @@
{
name=`echo "$1" | cut -d\ -f1`
name=${name%% *}
- rest=`echo "$1" | cut cut -d\ -f2-`
+ rest=`echo "$1" | cut -d\ -f2-`
read id mem cpu vcpu state tm
Re: Nice howto, here is a fix
29 Monday, 23 February 2009 11:03
Daniel
Kim, this is covered in the following bug report: https://bugs.launchpad.net/ubuntu/+source/xen-3.2/+bug/216761 Haven't added this in the main HowTo, but the link was provided by Federico above. I will probably wait to update the HowTo until I perform an upgrade to Ubuntu 8.10.
Script for adding domU's ?
30 Wednesday, 10 June 2009 14:02
Hi and greetings from Sweden!

I've successfully used the guide above and have it set up and running. However, I wonder if anyone has a nice script or workflow to easily add new domUs to the config?

Hälsningar /Johan
Two primaries
31 Wednesday, 30 September 2009 13:42
Stas

Hi. Super help-full explanation! I followed article closely, and have only one issue - when I start the heartbeat on both servers, I'm getting the VM running on both machines - and the DRBD switches to both primary mode. Does anyone has an idea about this - or can provide a 100% working xendomains scripts? Thanks!

Stonith
32 Wednesday, 07 October 2009 10:12
vlad
Hi!
Thanks for your guide!
I have an issue:
When I pull out the LAN cable on the first node, the VMs start up fine on the second node, but they keep running on the first node too, because it considers itself primary. If I reconnect the cable I get duplicates of the VMs running on the second node.
What can I do so that the first node shuts itself down, or shuts down heartbeat and drbd, so that I can start them manually?
xm migrate fails
33 Monday, 09 November 2009 23:48
linux n00b
when I get to the xm migrate step I receive the following error
Error: /usr/lib64/xen/bin/xc_save 22 1 0 0 1 failed

when I cat /var/log/xen/xend.log I see the same error on the primary and pretty much the same on the secondary
XendError: /usr/lib/xen/bin/xc_restore 16 3 1 2 0 0 0 failed

an xm save returns the same error. Any thoughts on where to look?
Added layer of iSCSI
34 Friday, 27 November 2009 00:14
Lennart Rolland

Hi! I discovered your great project just after starting to build my own xen cluster setup similar to yours. The only difference is that my storage is on a separate storage cluster (two identical servers with software raid + DRBD + LVM + Enterprise iSCSI Target + Heartbeat) and the number of xen dom0 nodes is 9 instead of 2 (11 computers in total). I chose iSCSI as the storage interface because it is very scalable and a "safe choice", since there are a lot of large companies selling huge hardware iSCSI-enabled SAN solutions. So when I grow out of my current homebuilt iSCSI SAN I can seamlessly migrate to a more powerful solution. I really love XEN because it seems to have everything I could possibly dream up. For example, XEN supports live migration with iSCSI. Lovely! Anyway, if you are interested, here is a little bit of info on how to set up booting your XEN Dom0 and DomUs from iSCSI with Debian Lenny: http://www.etherboot.org/wiki/sanboot/debian_lenny_iscsi I went through quite a bit getting that to work in my config and I edited the wiki while working, so it's a bit messy. But it should be complete! PS: The XEN specific info is at the bottom of the article. Good luck with your project!

Very Helpful HowTo for Xen-Cluster! Thank you!
35 Friday, 12 February 2010 10:17
Hansi H

Here you'll find too some helping advices, but not so detailed as the howto here! http://www.thomas-krenn.com/de/wiki/Kategorie:Xen Go cluster! Laughing

sudo su
36 Sunday, 24 April 2011 17:49
Ben

"sudo su"?! Surprised What have you done, my eyes are burning!! Use "sudo -i" instead. Rolling Eyes

Tremendous Job
37 Saturday, 14 January 2012 20:58
Its a marvellous how-to...

Very descriptive and cut to cut...

Thank you very much...
Heartbeat launches both domUs
38 Thursday, 05 April 2012 23:50
This tutorial has been very helpful. With some tribulations, I've got 2 nodes of xen ubuntu 11.10 dom0s, drbd, and live migration works great.

I have 2 VMs right now, I can migrate them at will.

Heartbeat though... it starts both my VMs on both nodes! Any ideas?
Re: Heartbeat launches both domUs
39 Friday, 06 April 2012 09:45
Daniel
I'm afraid I can't be of any help as I haven't had heartbeat running for a long time. I actually got issues with heartbeat after some system updates in Ubuntu but I never investigated so I don't know the root cause.
So I guess you have to check the logs and trust in google. Wink

Instead I have moved to Debian but still haven't finalised the setup, heartbeat is one outstanding thing. Preparing a tutorial for this new setup.
live migration
40 Wednesday, 30 May 2012 09:38
darya

Hi, I need your help. I want to do live migration. For this I tried to install CentOS 6 on 2 PCs, and then install Xen on one of them. Is that right so far? After that I don't really know what I should do.

Live migration
41 Tuesday, 04 December 2012 14:45
Digvijay
Hello, I want to do TCP/UDP analysis on Xen using live migration of a virtual machine. I installed Fedora 17 on 2 PCs and then installed Xen. But now I am not sure what to do next, so please help. How do I install a domU on the host, and what should I do after that?
pull out the cable from the primary
42 Wednesday, 05 December 2012 23:05
Felipe Oliveira Gutierrez
Hi Daniel,

Thanks for the tutorial! It is really good!
I am trying to do live migration when I pull out the cable from the master. As other people said, this is not working. I believe I need to use pacemaker with stonith. I am finding it too hard to configure pacemaker. Do you have some links that could help?

Thanks
Felipe
Re: Live migration
43 Saturday, 15 December 2012 11:27
Daniel

Hi Digvijay, I'm not sure how far you got with your installation from reading your post, but it sounds like you have installed and can properly boot the XEN kernel. What method did you try to install a domU? I have always used xen-tools, which came with the xen packages on Ubuntu, but you could also look at virt-install: http://wiki.xen.org/wiki/DomU_Install_with_Virt-Install

Re: pull out the cable from the primary
44 Saturday, 15 December 2012 12:01
Daniel
Hi Felipe,

Honestly I stopped using heartbeat after a system update that broke my configuration and I haven't been bothered redoing it. If I would implement HA again I would probably opt for HA for services running on the VM instead of HA for the VM itself. But of course that would put other requirements on the underlying storage.

A live migration when there is a failure will never work: as the source is dead, there is nowhere to migrate from. So what should happen is that the second host detects the failure and starts the VM fresh. But I guess that is what you meant.

I don't have any additional information that can be of help unfortunately. If I find time I will try to configure HA again, and if I keep my current setup I will do it properly including stonith.
Ganeti : Another solution for clustering with HA
45 Monday, 21 January 2013 22:32
Ganeti is another useful tool for xen clustering with HA features. A web interface allows management of the cluster and VM creation.
Problem with auto
46 Monday, 23 September 2013 13:48
Rafal
Great tutorial. But I have one problem. I've installed it on Debian Wheezy and almost everything is fine, but the ln -s into /etc/xen/auto is not working properly. I have one link in HA1 and two in HA2. All is fine when HA1 is down and everything runs on HA2. But when HA1 comes back again, everything still stays on HA2 and doesn't move back. Maybe some auto problem? I tried to find and apply your fix, but I don't know where. Could you help me with that?