
Ganeti, dealing with node failure

  • Get paged
  • Stop panicking
  • Be sure to log into the broken node to verify that it actually died. If the VMs are still running correctly on it and it’s simply a networking problem, bringing them up elsewhere will put you in a bad state known as ‘Split Brain’. This is difficult to recover from, so verify that the dead node is truly dead.
  • If there is more than one node left, try logging into the cluster IP rather than an individual node (kvm.infra.scl1.mozilla.com vs. kvm1.infra.scl1.mozilla.com)
  • If there is only one remaining node, voting won’t work, so ganeti-masterd will have to be started by hand:
root@vm1-1:~# ganeti-masterd --no-voting

Once your master node is online, we need to set the failed node to Offline mode.
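On a typical Ganeti 2.x cluster that should look something like the following; the node name here is an example, so substitute your actual failed node:

root@vm1-1:~# gnt-node modify --offline=yes kvm2.infra.scl1.mozilla.com

With the node marked offline the cluster stops trying to contact it, and the instances that had it as their primary can then be failed over to their secondaries (gnt-node failover handles a whole node at once).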

Sharp Netwalker Ubuntu Lucid Image

I wanted a recent Ubuntu distribution, though, which I couldn’t find online. So I rolled my own, and am reasonably pleased with the results. All of the hardware works, although there are some annoyances: there is no battery meter in the GNOME notification area, and the wireless card isn’t supported by NetworkManager, though it still works with iwconfig and wpa_supplicant. Additionally, the hotkeys along the top are not bound to any programs.

Part II: The Munin-master

First off, you’re going to have to install the munin package. You’ll also need a web server installed; I prefer Apache:

# For Gentoo
emerge munin apache
# For CentOS/Fedora
yum install munin httpd
# For Debian
aptitude install munin apache2

Gentoo installs the HTTP root in /var/www/localhost/htdocs/munin, and CentOS uses /var/www/html/munin. After setup, the graphs will be available at http://localhost/munin.
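The master also needs to be told which nodes to poll. That happens in /etc/munin/munin.conf, with one host-tree entry per node; the hostname and address below are placeholders:

# /etc/munin/munin.conf -- one entry per node to poll
# (hostname and IP here are examples)
[node1.example.com]
    address 192.0.2.20
    use_node_name yes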

Debian requires a bit more attention. On Debian, munin installs the HTTP root in /var/www/munin, so after setup a vhost has to be created to access it. To do this on Gentoo I created an Apache vhost file, /etc/apache2/vhosts.d/08_munin.conf; on Debian the file is the same, but the path is different, /etc/apache2/sites-available/08munin
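A minimal vhost for this looks something like the following; the ServerName is a placeholder, and the Order/Allow directives assume the Apache 2.2 syntax current at the time:

# /etc/apache2/vhosts.d/08_munin.conf (Gentoo) or
# /etc/apache2/sites-available/08munin (Debian)
<VirtualHost *:80>
    ServerName munin.example.com
    DocumentRoot /var/www/munin
    <Directory /var/www/munin>
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>

On Debian, enable the site with a2ensite 08munin and reload Apache.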

Part I: Setting up Munin-Node

Munin showed the most promise and compatibility with many of the services we run at the OSL, such as memcached and varnish. I liked how the plugin system is set up independently on each host, and that each plugin can be managed, configured, and consolidated through symlinks.
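As a concrete example of that symlink scheme (paths are the Debian/Ubuntu defaults and vary by distribution), enabling a plugin is just a symlink, and wildcard plugins are configured by the name of the symlink itself:

# Enable the load plugin
ln -s /usr/share/munin/plugins/load /etc/munin/plugins/load
# Wildcard plugin: the suffix after if_ selects the interface to graph
ln -s /usr/share/munin/plugins/if_ /etc/munin/plugins/if_eth0
# Restart munin-node to pick up the new plugins
/etc/init.d/munin-node restart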

For the benefit of the uninitiated, the setup on each node goes something like this:

# For Gentoo
emerge munin
# For Fedora/CentOS
yum install munin-node
# For Debian/Ubuntu
apt-get install munin-node

Each client is a “node”, and runs the daemon ‘munin-node’. Its configuration file lives at /etc/munin/munin-node.conf. This file merely tells the daemon which user to run as, which interface/port to listen on, where to log, and which munin masters are allowed to poll statistics.
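Trimmed down, a typical munin-node.conf covers exactly those items; the second allow regexp below is a placeholder for your master’s IP:

# /etc/munin/munin-node.conf (abridged)
log_level 4
log_file /var/log/munin/munin-node.log
user root            # user the daemon runs as
group root
host *               # interface to bind to (* = all)
port 4949            # default munin-node port
# munin masters allowed to poll, as regexps on the peer address
allow ^127\.0\.0\.1$
allow ^192\.0\.2\.10$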