Hi! I discovered your great project just after I had started building my own Xen cluster, which is similar to yours. There are two differences. My storage is on a separate storage cluster (two identical servers with software RAID + DRBD + LVM + iSCSI Enterprise Target + Heartbeat), and I have 9 Xen dom0 nodes instead of 2 (11 computers in total). I chose iSCSI as the storage interface because it is very scalable. It is also a "safe choice": many large companies sell big hardware SAN solutions with iSCSI, so when I outgrow my current homebuilt iSCSI SAN I can migrate seamlessly to a more powerful solution. I really love Xen because it seems to have everything I could possibly dream up. For example, Xen supports live migration with iSCSI. Lovely!
Anyway, if you are interested, here is some info on how to boot your Xen Dom0 and DomUs from iSCSI with Debian Lenny: http://www.etherboot.org/wiki/sanboot/debian_lenny_iscsi It took quite a bit of work to get that running in my config. I edited the wiki while I was working, so it's a bit messy, but it should be complete! PS: The Xen-specific info is at the bottom of the article. Good luck with your project!
xm migrate fails
3
Monday, 09 November 2009 23:48
linux n00b
When I get to the xm migrate step, I receive the following error:
Error: /usr/lib64/xen/bin/xc_save 22 1 0 0 1 failed
When I cat /var/log/xen/xend.log, I see the same error on the primary and almost the same one on the secondary:
XendError: /usr/lib/xen/bin/xc_restore 16 3 1 2 0 0 0 failed
An xm save returns the same error. Any thoughts on where to look?
Stonith
4
Wednesday, 07 October 2009 10:12
vlad
Hi!
Thanks for your guide!
I have an issue:
When I unplug the LAN cable on the first node, the VMs start fine on the second node. However, they also start on the first node, because it considers itself primary. When I reconnect the cable, I get duplicates of the VMs that are running on the second node.
What can I do so that the first node shuts down, or at least stops heartbeat and DRBD, so that I can start the VMs manually?
Two primaries
5
Wednesday, 30 September 2009 13:42
Stas
Hi.
Super helpful explanation!
I followed the article closely and have only one issue. When I start heartbeat on both servers, the VM ends up running on both machines and DRBD switches to primary/primary mode.
Does anyone have an idea about this? Or can anyone provide a 100% working xendomains script?
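Several comments in this thread report DRBD ending up primary/primary. A quick check can at least detect that state. The sketch below assumes DRBD 8.x, where `drbdadm role <res>` prints `<local>/<peer>` (e.g. `Primary/Secondary`). The resource name is a placeholder. The check does not fix the underlying xendomains or heartbeat problem.

```shell
#!/usr/bin/env bash
# Sketch: detect a DRBD resource that is Primary on both nodes.
# Assumes DRBD 8.x output of `drbdadm role RES`, e.g. "Primary/Secondary".

# is_dual_primary ROLE_STRING -> exit 0 only if local and peer are Primary
is_dual_primary() {
    case $1 in
        */*) ;;          # expected "<local>/<peer>" form
        *) return 1 ;;   # any other format: cannot tell, assume not dual
    esac
    local mine=${1%%/*}   # text before the slash
    local peer=${1#*/}    # text after the slash
    [ "$mine" = "Primary" ] && [ "$peer" = "Primary" ]
}

# check_resource RES -> 0 if fine, 1 on dual-primary, 2 if drbdadm failed
check_resource() {
    local res=$1 role
    role=$(drbdadm role "$res") || return 2
    if is_dual_primary "$role"; then
        echo "WARNING: $res is $role (both nodes primary)" >&2
        return 1
    fi
    return 0
}
```

Usage, with `r0` standing in for your own resource name: `check_resource r0`. It could be run from cron or by hand after a failback.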
I've successfully used the guide above and have it set up and running. However, I wonder if anyone has a nice script or workflow to easily add new DomUs to the config?
Regards /Johan
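One possible workflow for the xendomains part is to keep each DomU config in one place and symlink it into the auto directory of the node that should normally run it. This is a hypothetical sketch. The `AUTO_BASE` path and the per-node subdirectory names are assumptions, so match them to the directories your own xendomains copies use. The DRBD resource and the heartbeat entries for a new DomU still have to be added separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: register a DomU config with ONE node's xendomains
# auto directory via a symlink. AUTO_BASE and node dir names are assumptions.
AUTO_BASE=${AUTO_BASE:-/etc/xen/auto}

# add_domu CONFIG_FILE NODE_DIR
add_domu() {
    local cfg=$1 node=$2
    local target_dir="$AUTO_BASE/$node"
    local base other
    if [ ! -f "$cfg" ]; then
        echo "add_domu: config '$cfg' not found" >&2
        return 1
    fi
    base=$(basename "$cfg")
    # Refuse to register the same DomU under two nodes; otherwise both
    # xendomains instances could try to start it.
    for other in "$AUTO_BASE"/*/"$base"; do
        if [ -e "$other" ] && [ "$(dirname "$other")" != "$target_dir" ]; then
            echo "add_domu: $base already registered in $(dirname "$other")" >&2
            return 1
        fi
    done
    mkdir -p "$target_dir" || return 1
    ln -sf "$(readlink -f "$cfg")" "$target_dir/$base"
}
```

For example, `add_domu /etc/xen/web1.cfg node1` would link `web1.cfg` into the (hypothetical) `node1` directory. A second call for `node2` would be refused.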
Re: Nice howto, here is a fix
7
Monday, 23 February 2009 11:03
Daniel
Kim, this is covered in the following bug report: https://bugs.launchpad.net/ubuntu/+source/xen-3.2/+bug/216761 I haven't added this to the main HowTo, but Federico posted the link above. I will probably wait to update the HowTo until I upgrade to Ubuntu 8.10.
Even though I have listed all modifications to the default scripts, it might be a good idea to list all the scripts in full. I will make a section for that and post it when I have some spare time.
Cheers, Daniel
Re: vm running on both nodes (primary/primary) after reboot
Re: Re: vm running on both nodes (primary/primary) after reboot
11
Friday, 26 December 2008 18:52
Stephan
Thanks for your answer! I think you are right that something is wrong with the xendomains* script. I found another post where somebody else has exactly the same problem. The problem is that I updated everything before following this howto, and now the xendomains script is somehow buggy.
Could you post a working xendomains* script or send it by email (stephanheck'AT'gmx'dot'de)? Then I could test whether that's the problem.
Thanks!
Stephan
Re: vm running on both nodes (primary/primary) after reboot
12
Thursday, 25 December 2008 11:57
Federico Fanton
I'd try looking for clues at /var/log/ha-debug on both nodes, maybe xendomains* is getting something wrong :/
vm running on both nodes (primary/primary) after reboot
13
Tuesday, 23 December 2008 23:28
Stephan
First of all, thanks for the great howto, and merry Christmas!
I followed your howto line by line and everything seemed perfect, but after a bit more testing I ran into a problem:
If I reboot ha1 (with vm test running on it), the vm is migrated correctly to ha2. But when ha1 comes back up, it starts a SECOND vm test and drbd says primary/primary.
Do you have a clue what may be wrong?
Thanks in advance!
Stephan
Re: Another xendomains bug
14
Thursday, 04 December 2008 23:31
Federico Fanton
I think I found another bug. If you put all your VMs inside one of the /etc/xen/auto/* dirs and leave the other one empty, $NAMES (line 333) becomes empty during failback and the script throws a syntax error. In my case, this resulted in an unintended migration of the VMs.
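The actual code around line 333 of xendomains is not reproduced here. The sketch below only illustrates the general guard against an empty name list in a failback loop. `failback` and `migrate_one` are hypothetical names, and `migrate_one` just prints what a real script would do. An unquoted empty variable inside `[ ]` can produce a malformed test such as `[ = x ]`. It can also produce a test that is always true, such as `[ -n ]`. Either way, the script may do something you did not intend.

```shell
#!/usr/bin/env bash
# Illustration only, not the real xendomains code.

# Hypothetical stand-in for the per-domain action (e.g. an xm migrate call)
migrate_one() {
    echo "migrate $1 -> $2"
}

# failback NAMES TARGET: NAMES is a space-separated list, possibly empty
failback() {
    local names=$1 target=$2 n
    # Quote the expansion and test for empty first, so an empty list
    # means "do nothing" instead of a broken test expression.
    if [ -z "$names" ]; then
        return 0
    fi
    for n in $names; do      # unquoted on purpose: one domain per word
        migrate_one "$n" "$target"
    done
}
```

With an empty list, `failback "" ha1` prints nothing and succeeds. `failback "vm1 vm2" ha1` handles each domain in turn.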
I patched the scripts and wrote to xen-devel about it.
Why not LVM on DRBD?
15
Monday, 10 November 2008 20:58
Nathan Stratton
Why not run LVM on DRBD? With your method, you need a separate DRBD config for every DomU. If you put LVM on top of DRBD instead, you don't need to worry about that. Did you find your method to be faster? I am currently running LVM on DRBD in production for BlinkMind, http://www.blinkmind.com. The only downside I have found to running LVM on DRBD is the 4 TB limit.
-Nathan
Re: Another xendomains bug
16
Tuesday, 21 October 2008 08:03
Federico Fanton
Oops, it's the same problem, you're right. Anyway, I tested the patch yesterday and it works nicely.
Re: Another xendomains bug
17
Monday, 20 October 2008 19:04
Daniel
I believe you are referring to the same problem as above. But this patch looks very neat and much simpler than mine. Have you tried it, and can you confirm that it works?
Well, I tried setting up dopd to prevent split-brains, but it didn't work as expected. Maybe I did something wrong. I didn't investigate because I had already spent a lot of time building the cluster :/ So for the moment, I extended deadtime to one minute and wrote myself notes to *watch out* in case of an unplugged cable. When I have more time, I'd like to try binding a script to xendomainsX. The script would check whether the current node is the one that has lost the network (maybe by pinging a router) and, if so, shut down the VMs.
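The isolation check described above could look roughly like the following sketch. It is not dopd and not STONITH. The reference host address is a placeholder, and what to do once the node is isolated is left as a commented-out example. `PING_CMD` is a hypothetical override hook, so the logic can be exercised without a network. It assumes Linux iputils `ping`, where `-c` is the count and `-W` the timeout in seconds.

```shell
#!/usr/bin/env bash
# Sketch: decide whether this node is "off the net" by pinging reference
# hosts. Host list and follow-up action are assumptions.
PING_CMD=${PING_CMD:-ping}
REF_HOSTS=${REF_HOSTS:-"192.168.1.1"}   # hypothetical router address

# node_is_isolated -> exit 0 if none of REF_HOSTS answers
node_is_isolated() {
    local h
    for h in $REF_HOSTS; do
        if "$PING_CMD" -c 2 -W 2 "$h" >/dev/null 2>&1; then
            return 1    # at least one host answered: still on the network
        fi
    done
    return 0
}

# Example action, commented out on purpose (xendomainsX is a placeholder):
# if node_is_isolated; then /etc/init.d/xendomainsX stop; fi
```

Pinging more than one reference host is a deliberate choice: if the single router itself went down, both nodes would consider themselves isolated and stop their VMs.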
Re: Expected behavior on primary node failure
20
Thursday, 16 October 2008 14:15
Daniel
Did you improve your configuration to prevent this state?
This is something I need to look into myself, but I have had no time so far. I'm currently not using any fencing, such as STONITH.
Re: Expected behavior on primary node failure
21
Thursday, 16 October 2008 07:29
Federico Fanton
I figured it out (by asking on the DRBD mailing list, actually). When I reattached the cables, I had a split-brain situation; that's why DRBD couldn't resynchronize. Thanks all the same!
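Once DRBD reports split-brain, it has to be resolved by hand: you decide which node's changes to throw away (the "victim") and which to keep (the "survivor"). In the Samba scenario posted below, the file was written while the VM ran on node2. Keeping it would therefore mean treating node2 as the survivor and node1 as the victim, assuming nothing needed was written on node1 after the cables were pulled. The sketch assumes DRBD 8.0 to 8.3 command syntax (8.4 uses `drbdadm connect --discard-my-data RES`). `RES` is a placeholder. By default it only prints the commands (`DRY_RUN=1`). The VM must be shut down on the victim first, because `drbdadm secondary` fails while the device is in use.

```shell
#!/usr/bin/env bash
# Sketch of manual DRBD 8.x split-brain recovery, run by hand on each node.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, execute it otherwise
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# On the node whose changes will be discarded (the split-brain victim)
recover_victim() {
    local res=$1
    run drbdadm secondary "$res"
    run drbdadm -- --discard-my-data connect "$res"
}

# On the node whose data survives, if it is also StandAlone
recover_survivor() {
    local res=$1
    run drbdadm connect "$res"
}
```

After both sides reconnect, DRBD resynchronizes the victim from the survivor. Check `/proc/drbd` before letting heartbeat start the VM again.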
Expected behavior on primary node failure
22
Tuesday, 14 October 2008 17:33
Federico Fanton
I'm sorry, what's the expected behavior on primary node failure, with your setup? For example, imagine the following:
I have a Samba server on a VM. I pull the network cables from the primary node, and the VM starts on the secondary. After a while the Samba server comes up again, and I copy a file to it. Then I reattach the cables.
From my tests, heartbeat shuts the VM down on node2 and restarts it on node1, while DRBD goes into StandAlone mode on both nodes.
What should I do now to avoid losing the file that I copied during the node1 failure?
Many thanks for your time, I'm really an HA newbie.
Re: relocation
23
Tuesday, 14 October 2008 17:27
Federico Fanton
As I understand it, live migration is a maintenance/balancing tool, not a high-availability one. So everything must be in place for it to work: no pulled cables.
re: relocation
24
Monday, 08 September 2008 15:59
Paras Pradhan
Yes, manual migration works fine. However, the VMs are not live-migrated when I reboot or shut down ha1. I have checked haresources, the associated files in resources.d, and /etc/default/xendomains; all of them include the --live option.
Can anyone tell me one more thing: is automatic live migration of a virtual machine from ha1 to ha2 possible if I suddenly pull the network cable from ha1?
Thanks
Paras.
Re: Errors in xendomains script
25
Monday, 08 September 2008 08:07
Federico Fanton
I didn't try with more than one DomU, but I had to apply the patch because of many scripting errors :/
Re: Kernel panic on link failure
26
Monday, 08 September 2008 08:03
Federico Fanton
I solved the crashing problem by sheer luck: I changed the NIC (I had a 3Com SOHO100TX and switched to a Realtek RTL-8169), and the problem went away.
Many thanks for your helpfulness
Re: relocation
27
Friday, 05 September 2008 23:22
Daniel
Look in your /etc/default/xendomains file(s) and check the line that starts with XENDOMAINS_MIGRATE=
You need to have --live added to that string, like this:
XENDOMAINS_MIGRATE="ha1 --live"
I see now that, in Firefox, this line of the article looks like it has a single long dash instead of two hyphens ("--"). If manual live migration works, I guess this is your problem. Alternatively, your /etc/ha.d/haresources file may be pointing to the wrong resource file.
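Because a long dash pasted from a web page looks almost identical to "--", a small check of the defaults file can help. This is a sketch: it takes the file path as an argument (on Debian/Ubuntu usually /etc/default/xendomains, or one file per xendomains copy in this setup). It only verifies that the XENDOMAINS_MIGRATE line contains a literal ASCII `--live`.

```shell
#!/usr/bin/env bash
# Sketch: verify XENDOMAINS_MIGRATE contains a literal ASCII "--live".

# check_migrate_line FILE -> 0 if --live is present, 1 otherwise
check_migrate_line() {
    local file=$1 line
    line=$(grep '^XENDOMAINS_MIGRATE=' "$file") || {
        echo "no XENDOMAINS_MIGRATE line in $file" >&2
        return 1
    }
    case $line in
        *--live*) return 0 ;;
        *) echo "no ASCII '--live' in: $line (check for a pasted long dash)" >&2
           return 1 ;;
    esac
}
```

Usage: `check_migrate_line /etc/default/xendomains`. A line copied with an en dash (`–live`) fails the check.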
relocation
28
Friday, 05 September 2008 23:01
paras
Instead of a live migration, a relocation happens when I reboot the primary node. Paras.
Re: Errors in xendomains script
29
Friday, 05 September 2008 22:14
Daniel
I think I applied that patch on my system as well, but I don't have any notes to confirm it. I should add that to my instructions. However, that patch does not resolve the issue described above. Did you check whether you get the same behavior when running more than two DomUs?
I followed your steps, but that didn't cause my system to crash. When I checked the logs, I found out that the crossover link didn't go down because I had Wake on LAN enabled. So I did another test where I disconnected the crossover cable just after shutting down the master node, but I still did not get a crash. Let me know if you want me to try something else.
Re: Kernel panic on link failure
32
Thursday, 04 September 2008 13:26
Federico Fanton
I'm running Ubuntu 8.04 Server, but the 32-bit version. Could you please try a link failure with your setup?
Steps to reproduce the crash on my system:
- shutdown -h on the "master" node
- Wait a few minutes
- Panic!
If I ping the vm during the shutdown phase, I get only a 4-second gap before the connection is up again (until the kernel crashes, of course), so I think everything is set up correctly.
Many thanks for your help!
Re: Kernel panic on link failure
33
Wednesday, 03 September 2008 18:09
Daniel
I don't get kernel panics like the ones described in the bug report on normal reboots. I still have to do more extensive testing with heartbeat and the effects of certain failures, and I'm not sure the DRBD config is optimal.
Are you also running Ubuntu 8.04 Server 64 bit or something else?
You'll also find some helpful advice here, though it is not as detailed as this howto: http://www.thomas-krenn.com/de/wiki/Kategorie:Xen Go cluster!
###CUT###
--- xendomains 2008-06-04 21:21:55.000000000 +0200
+++ xendomains.fix 2008-06-04 21:23:06.000000000 +0200
@@ -183,7 +183,7 @@
 {
     name=`echo "$1" | cut -d\  -f1`
     name=${name%% *}
-    rest=`echo "$1" | cut cut -d\  -f2-`
+    rest=`echo "$1" | cut -d\  -f2-`
     read id mem cpu vcpu state tm
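The patch removes a duplicated `cut`. With `cut cut ...`, the first `cut` receives the second `cut` as a file name, so `rest` never gets the remaining columns. The sketch below shows what the corrected lines do on one line of `xm list`-style output. The sample line is made up, and `split_xm_line` is a hypothetical name. `-d' '` is the same single-space delimiter that the patch writes as `-d\ ` (a backslash followed by a space).

```shell
#!/usr/bin/env bash
# Sketch of the patched parsing: split a line into name and remaining columns.

split_xm_line() {
    local name rest
    name=$(printf '%s\n' "$1" | cut -d' ' -f1)
    name=${name%% *}                              # same trimming as the script
    rest=$(printf '%s\n' "$1" | cut -d' ' -f2-)   # fixed: one "cut", not two
    printf '%s|%s\n' "$name" "$rest"
}
```

For example, `split_xm_line 'test 1 512 1 -b---- 12.3'` separates the name `test` from the remaining columns.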