Thursday 9 July 2009

Building a highly available solution using RHEL cluster suite

“When mission critical applications fail, so does your business.” This holds true in today’s environments, where organizations spend millions of dollars keeping their services available 24x7x365. Whether they serve external or internal customers, organizations are deploying high availability solutions to make their applications highly available.

In view of the growing demand for high availability, almost every IT vendor now offers a high availability solution for its platform. Well-known commercial offerings include IBM HACMP, Veritas Cluster Server and HP Serviceguard.

If you are looking for a commercially supported high availability solution on Red Hat Enterprise Linux, the best choice is probably Red Hat Cluster Suite itself.
In early 2002, Red Hat introduced the first member of its Red Hat Enterprise Linux family of products, Red Hat Enterprise Linux AS (originally called Red Hat Linux Advanced Server). Since then the family of products has grown steadily and now includes Red Hat Enterprise Linux ES (for entry/mid-range servers) and Red Hat Enterprise Linux WS (for desktops/workstations). These products are designed specifically for use in enterprise environments to deliver superior application support, performance, availability and scalability.
The original release of Red Hat Enterprise Linux AS, version 2.1, included a high availability clustering feature as part of the base product. This feature was not included in the smaller Red Hat Enterprise Linux ES product. However, with the success of the Red Hat Enterprise Linux family it became clear that high availability clustering was a feature that should be made available for both AS and ES server products. Consequently, with the release of Red Hat Enterprise Linux, version 3, in October 2003, the high availability clustering feature was packaged into an optional layered product, called Red Hat Cluster Suite, and certified for use on both the Enterprise Linux AS and Enterprise Linux ES products.

Note that RHEL Cluster Suite is a separately licensed product and must be purchased from Red Hat on top of the base RHEL ES license.


1.0 Red-Hat Cluster Suite Overview


Red Hat Cluster Suite comprises two major features: Cluster Manager, which provides high availability, and IP load balancing (originally called Piranha). Cluster Manager and IP Load Balancing (Piranha) are complementary high availability technologies that can be used separately or in combination, depending on application requirements. Both technologies are integrated in Red Hat Cluster Suite.


In this article, we will focus on Cluster Manager, as it is the feature mainly used for building high availability solutions.

1.1 Software Components

From a software subsystem point of view, the major components of RHEL Cluster Manager are:

Software Subsystem | Components | Purpose
Fence | fenced | Provides the fencing infrastructure for specific hardware platforms
DLM | libdlm, dlm-kernel | Contains the Distributed Lock Manager (DLM) library
CMAN | cman | Contains the Cluster Manager (CMAN), which is used for managing cluster membership, messaging and notification
GFS and related locks | Lock_NoLock | Contains shared filesystem support that can be concurrently mounted on multiple nodes
GULM | gulm | Contains the GULM lock management userspace tools and libraries (an alternative to using CMAN and DLM)
Rgmanager | clurgmgrd, clustat | Manages cluster services and resources
CCS | ccsd, ccs_test, ccs_tool | Contains the cluster configuration services daemon (ccsd) and associated files
Cluster Configuration Tool | system-config-cluster | Contains the Cluster Configuration Tool, used to graphically configure the cluster and display the current status of nodes, resources, fencing agents and cluster services
Magma | magma, magma-plugins | Contains an interface library for cluster lock management and the required plug-ins
IDDEV | iddev | Contains libraries used to identify the filesystem (or volume manager) with which a device is formatted


1.2 Shared Storage & Data Integrity

Lock management is a common cluster-infrastructure service that provides a mechanism for other cluster infrastructure components to synchronize their access to shared resources. In a Red Hat cluster, DLM (Distributed Lock Manager) or, alternatively, GULM (Grand Unified Lock Manager) are the possible lock manager choices. GULM is a server-based unified cluster/lock manager for GFS, GNBD and CLVM, and it can be used in place of CMAN and DLM. A single GULM server can be run in stand-alone mode but introduces a single point of failure for GFS. Three or five GULM servers can also be run together, in which case the failure of one or two servers, respectively, can be tolerated. GULM servers are usually run on dedicated machines, although this is not a strict requirement.
In my cluster implementation, I used DLM, which runs on each cluster node. DLM is a good choice for small clusters (up to two nodes), as it removes the quorum requirements imposed by the GULM mechanism.
Based on the DLM or GULM locking functionality, there are two basic techniques a RHEL cluster can use to ensure data integrity in concurrent-access environments. The traditional way is CLVM, which works well in most RHEL cluster implementations with LVM-based logical volumes.

Another technique is GFS, a cluster filesystem that allows a cluster of nodes to simultaneously access a block device shared among the nodes. It employs distributed metadata and multiple journals for optimal operation in a cluster. To maintain filesystem integrity, GFS uses a lock manager (DLM or GULM) to coordinate I/O. When one node changes data on a GFS filesystem, that change is immediately visible to the other cluster nodes using that filesystem.

Hence, when you are implementing a RHEL cluster with concurrent data access requirements (as in the case of an Oracle RAC implementation), you can use either GFS or CLVM. In most such Red Hat cluster implementations, GFS is used with a direct-access configuration to the shared SAN from all cluster nodes. However, for the same purpose, you can also deploy GFS in a cluster that is connected to a LAN with servers that use GNBD (Global Network Block Device) or iSCSI (Internet Small Computer System Interface) devices.
Note that both GFS and CLVM use locks from the lock manager. GFS uses them to synchronize access to filesystem metadata (on shared storage), while CLVM uses them to synchronize updates to LVM volumes and volume groups (also on shared storage).
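For illustration, here is a minimal sketch of how a GFS filesystem would typically be created and mounted on a two-node, DLM-based cluster. The cluster name (Commcluster) matches the one used later in this article, but the device path (/dev/sda3), the filesystem name (gfs1) and the mount point are assumptions for this example only; my own setup, described below, uses ext3 rather than GFS.

# Create a GFS filesystem using DLM locking, with one journal per node
gfs_mkfs -p lock_dlm -t Commcluster:gfs1 -j 2 /dev/sda3

# Mount it on each node (the cluster services, including clvmd and gfs, must be running)
mount -t gfs /dev/sda3 /shared/gfsdata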
For non-concurrent RHEL cluster implementations, you can rely on CLVM or use the native RHEL journaling filesystems (such as ext3). Because data integrity issues are minimal in non-concurrent access clusters, I tried to keep my cluster implementation simple by using the native RHEL OS techniques.

1.3 Fencing Infrastructure

Fencing is also an important component of every RHEL-based cluster implementation. Its main purpose is to ensure data integrity in a clustered environment.
In fact, to ensure data integrity, only one node can run a cluster service and access the cluster-service data at a time. The use of power switches in the cluster hardware configuration enables a node to power-cycle another node before restarting that node's cluster services during the failover process. This prevents two systems from simultaneously accessing the same data and corrupting it. It is strongly recommended that fence devices (hardware or software solutions that remotely power, shut down and reboot cluster nodes) be used to guarantee data integrity under all failure conditions. Software-based watchdog timers are an alternative for ensuring correct operation of cluster service failover; however, in most RHEL cluster implementations, hardware fence devices are used, such as HP iLO, APC power switches, IBM BladeCenter devices and the Bull NovaScale Platform Administration Processor (PAP) interface.
Note that for RHEL cluster solutions with shared storage, implementation of the fence infrastructure is a mandatory requirement.


2.0 Step-by-Step Implementation of a RHEL cluster

Implementation of a RHEL cluster starts with the selection of proper hardware and its connectivity. In most implementations (without IP load balancing), shared storage is used with two or more servers running the RHEL operating system and RHEL Cluster Suite.
A properly designed cluster, whether RHEL-based or IBM HACMP-based, should not contain any single point of failure. Keeping this in mind, you have to remove any single point of failure from your cluster design. For this purpose, you can place your servers in two separate racks with redundant power supplies. You also have to remove single points of failure from the network infrastructure used for the cluster. Ideally, you should have at least two network adapters on each cluster node, and two network switches should be used for building the cluster's network infrastructure.



2.1 Software Installation

Building a RHEL cluster starts with the installation of RHEL on both cluster nodes. In my setup, I have two HP ProLiant DL740 servers with shared fiber storage (an HP MSA1000 array).
I started with RHEL 4 installation on both nodes. It is always better to install the latest available operating system version and update; I selected RHEL 4 Update 4, which was the latest version available while I was building this cluster. If you have a valid software subscription from Red Hat, you can log in to Red Hat Network and go to the software channels to download the latest update. Once you download the ISO images, you can burn them to CDs using any appropriate software.
During RHEL OS installation, you will go through various configuration selections, the most important of which are date and time zone configuration, root user password, firewall settings and OS security level. Another important configuration option is network settings; these can be left for a later stage, especially when building a high availability solution with an EtherChannel (Ethernet bonding) configuration.

After OS installation, it is always a good idea to install the necessary drivers and hardware support packages. In my case, as an HP hardware platform was used, I downloaded the RHEL support package for the DL740 servers (the HP ProLiant Support Pack, which is available from http://h18004.www1.hp.com/products/servers/linux/dl740-drivers-cert.html).

The next step is installation of the cluster software package itself. This package is again available from Red Hat Network, and you should select the latest available cluster package. I selected rhel-cluster-2.4.0.1 for my setup, which was the latest cluster suite available at the time.
Once downloaded, it will be in tar format. Extract it, and then install at least the following RPMs so that a RHEL cluster with DLM can be installed and configured (a sketch of the extraction and installation commands follows the list):

magma and magma-plugins
perl-net-telnet
rgmanager
system-config-cluster
dlm and dlm-kernel
dlm-kernel-hugemem and SMP support for DLM
iddev and ipvsadm
cman, cman-smp, cman-hugemem and cman-kernelheaders
ccs
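A minimal sketch of the extraction and installation commands, assuming the tarball name given above; the RPM file names below are indicative only and will vary with the architecture and kernel flavour (smp/hugemem) of your nodes:

# Extract the downloaded cluster suite bundle
tar -xvf rhel-cluster-2.4.0.1.tar
cd rhel-cluster-2.4.0.1   # directory name assumed to match the tarball

# Install the packages listed above; choose the dlm-kernel/cman variants
# that match your running kernel (smp/hugemem)
rpm -ivh magma*.rpm perl-Net-Telnet*.rpm rgmanager*.rpm \
    system-config-cluster*.rpm dlm*.rpm iddev*.rpm ipvsadm*.rpm \
    cman*.rpm ccs*.rpm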

It is always a good idea to restart both RHEL cluster nodes after installing the vendor hardware support drivers and the RHEL cluster suite.

2.2 Network Configuration

For network configuration, the best way is to use the network configuration GUI. However, if you plan to use Ethernet channel bonding, the configuration steps are slightly different.

Ethernet channel bonding allows for a fault-tolerant network connection by combining two Ethernet devices into one virtual device. The resulting channel-bonded interface ensures that if one Ethernet device fails, the other device becomes active. Ideally, connections from these Ethernet devices should go to separate Ethernet switches or hubs, so that single points of failure are eliminated even at the switch or hub level.
To configure two network devices for channel bonding, perform the following on node1:
1. Create bonding devices in /etc/modules.conf. For example, I used the following entries on each cluster node:
alias bond0 bonding
options bonding miimon=100 mode=1
2. This loads the bonding device with the bond0 interface name, as well as passes options to the bonding driver to configure it as an active-backup master device for the enslaved network interfaces.
3. Edit the /etc/sysconfig/network-scripts/ifcfg-eth0 configuration file for eth0 and /etc/sysconfig/network-scripts/ifcfg-eth1 for eth1, so that both files show identical contents, as shown below:
DEVICE=ethx
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
4. This will enslave ethX (replace X with the assigned number of the Ethernet devices) to the bond0 master device.
5. Create a network script for the bonding device (for example, /etc/sysconfig/network-scripts/ifcfg-bond0), which would appear like the following example:
DEVICE=bond0
USERCTL=no
ONBOOT=yes
BROADCAST=172.16.2.255
NETWORK=172.16.2.0
NETMASK=255.255.255.0
GATEWAY=172.16.2.1
IPADDR=172.16.2.182
6. Reboot the system for the changes to take effect.

Similarly, on node2, repeat the same steps; the only difference is that the /etc/sysconfig/network-scripts/ifcfg-bond0 file will contain an IPADDR entry with the value 172.16.2.183.

As a result of these configuration steps, you end up with two RHEL cluster nodes with IP addresses 172.16.2.182 and 172.16.2.183 assigned to the virtual Ethernet channels (each backed by two physical Ethernet adapters).
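As a quick sanity check (not part of the original procedure), the state of the bonded interface can be verified on each node once the bonding driver is loaded:

# Show bonding mode, MII status and the enslaved interfaces
cat /proc/net/bonding/bond0

# Confirm the IP address assigned to the bonded interface
ip addr show bond0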

Now you can easily use the network configuration GUI on the cluster nodes to set other details, such as hostname and primary/secondary DNS servers. I set Commsvr1 and Commsvr2 as the hostnames for the cluster nodes and verified that name resolution worked for both long and short names, from the DNS server as well as from the /etc/hosts file.

Note that a RHEL cluster, by default, uses /etc/hosts for node name resolution. The cluster node name needs to match the output of uname -n or the value of HOSTNAME in /etc/sysconfig/network.

############################################################
Contents of /etc/hosts file in each server are as follows:
############################################################
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
172.16.2.182 Commsvr1 Commsvr1.kmefic.com.kw
172.16.2.183 Commsvr2
172.16.1.186 Commilo1 Commilo1.kmefic.com.kw
172.16.1.187 Commilo2 Commilo2.kmefic.com.kw
172.16.2.188 Commserver
192.168.10.1 node1
192.168.10.2 node2
172.16.2.4 KMETSM


################################################################

If you have an additional Ethernet interface in each cluster node, it is always a good idea to configure a separate IP network as an additional network for heartbeats between the cluster nodes. By default, the RHEL cluster uses eth0 on the cluster nodes for heartbeats; however, it is still possible to use other interfaces for additional heartbeat exchanges.
For this type of configuration, you can simply use the network configuration GUI to assign IP addresses, such as 192.168.10.1 and 192.168.10.2 on eth2, and have them resolved from the /etc/hosts file.


2.3 Setup of Fencing Device



In my case, as HP hardware was being used, I relied on the HP iLO devices as the fencing devices for my cluster. You may, however, consider other fencing devices, depending on the hardware used in your cluster configuration.

To configure HP iLO, you have to reboot your servers and press the F8 key to enter the iLO configuration menus. Basic configuration is relatively simple: you just assign IP addresses and names to the iLO devices. I assigned 172.16.1.100 with the name Commilo1 to the iLO device on node1, and 172.16.1.101 with the name Commilo2 on node2. Be sure, however, to connect Ethernet cables to the iLO adapters, which are usually marked noticeably on the back of HP servers.
Once rebooted, you can use a browser on your Linux servers to access the iLO devices. The default user name is Administrator, with a password that is usually printed on a hard-copy tag physically attached to the HP server. Later, you can change the Administrator password to one of your own choice, using the same web-based iLO administration interface.
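Before relying on iLO for fencing, it is worth confirming that the cluster's fence agent can reach and control the iLO interfaces. The following is a sketch, assuming the fence_ilo agent shipped with the cluster suite and the iLO addresses above; exact option names may differ between versions, so check man fence_ilo on your nodes:

# Query node1's power state through its iLO (substitute your real iLO password)
fence_ilo -a 172.16.1.100 -l Administrator -p <ilo-password> -o status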

2.4 Setup of Shared storage drive and Quorum partitions


In my environment, I used HP MSA1000 fiber-based shared storage for the cluster setup. I configured a RAID 1 array of 73.5GB using the HP Smart Array utility and then assigned it to both cluster nodes using the selective host presentation feature.
After rebooting both nodes, I used the HP fiber utilities, such as hp_scan, so that both servers could see the array physically.
To verify the physical availability of the shared storage to both cluster nodes, you can look in the /proc/partitions file for an entry such as /dev/sda or /dev/sdb, depending on your environment.
Once you find your shared storage at the OS level, you have to partition it according to your cluster storage requirements. I used the parted tool on one of my cluster nodes to partition the shared storage.

I created two small primary partitions to hold the raw devices, while a third primary partition was created to hold the shared data filesystem:


parted> select /dev/sda
parted> mklabel msdos
parted> mkpart primary ext3 0 20
parted> mkpart primary ext3 20 40
parted> mkpart primary ext3 40 40000

I rebooted both cluster nodes, followed by creation of /etc/sysconfig/rawdevices file with following contents:
#####################################################################
/dev/raw/raw1 /dev/sda1
/dev/raw/raw2 /dev/sda2
#######################################################################
A restart of the rawdevices service on both nodes will configure the raw devices as quorum partitions:


/home/root> service rawdevices restart
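To confirm that the bindings took effect, the kernel's raw device bindings can be queried on each node (a quick check, not part of the original procedure):

# List all currently bound raw devices; raw1 and raw2 should map to sda1 and sda2
raw -qa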

I then created an ext3 filesystem on the third primary partition using the mke2fs command. Its related entry should not be put into the /etc/fstab file on either cluster node, as this shared filesystem will be under the control of the Rgmanager of the cluster suite:

/home/root> mke2fs -j -b 4096 /dev/sda3

You can now create a directory called /shared/data on both nodes and then verify the accessibility of the shared filesystem from both cluster nodes by mounting it on each node in turn (mount /dev/sda3 /shared/data). However, never try to mount this filesystem on both cluster nodes simultaneously, as that might corrupt the filesystem itself.



2.5 Cluster Configuration

Almost everything required for the cluster infrastructure has now been done, so the next step is configuring the cluster itself.

A RHEL cluster can be configured in many ways. However, the easiest way is to use the RHEL GUI (System Management -> Cluster Management -> Create a Cluster).
I created a cluster named Commcluster, with node names Commsvr1 and Commsvr2.
I added fencing to both nodes, using fencing devices Commilo1 and Commilo2 respectively, so that each node has one fence level with one fence device. In your environment, if you have multiple fence devices, you can add another fence level with more fence devices to each node.
I also added a shared IP address of 172.16.2.188, which will be used as the service IP address for this cluster. This is the IP address that should also be used as the service IP address for applications or databases (for example, for the listener configuration if you are going to use an Oracle database in the cluster).
I added a failover domain, namely Kmeficfailover, with priorities given in the following sequence:
Commsvr1
Commsvr2

I added a service called CommSvc and then put that service in the failover domain defined above. The next step is adding resources to this service. I added a private resource of filesystem type with device=/dev/sda3, a mount point of /shared/data and a mount type of ext3.
I also added a private resource of script type (/root/CommS.sh) to the CommSvc service. This script starts my C-based application and therefore has to be present in the /root directory on both cluster nodes. It is very important that it has correct (root) ownership and permissions; otherwise, you can expect unpredictable behavior during cluster startup and shutdown.
Application or database startup and shutdown scripts are very important for the proper functioning of a RHEL-based cluster. The RHEL cluster uses these same scripts for application/database monitoring and high availability, so every application script used in a RHEL cluster should follow a specific format.
All such scripts should at least have start and stop subsections, along with a status subsection. When the application or database is running and available, the status subsection should return a value of 0; when it is not running or available, it should return a value of 1. The script should also contain a restart subsection, which tries to restart services if the application is found to be dead.
Note that the RHEL cluster always tries to restart the application on the node that previously owned it, before trying to move the application to the other cluster node.
A sample application script, which was used in my RHEL cluster implementation (to provide high availability to a legacy C-based application), is as follows:

#!/bin/sh
##################################################################
#Script Name: CommS.sh
#Script Purpose: To provide application start/stop/status under Cluster
#Script Author: Khurram Shiraz
##################################################################
basedir=/home/kmefic/KMEFIC/CommunicationServer
case $1 in
'start')
cd $basedir
su kmefic -c "./CommunicationServer -f Dev-CommunicationServer.conf"
exit 0
;;
'stop')
z=`ps -ef | grep Dev-CommunicationServer | grep -v "grep" | awk '{ print $2 }'`
if [[ -n "$z" ]]
then
kill -9 $z
fuser -mk /home/kmefic
fi
exit 0
;;
'restart')
/root/CommS.sh stop
sleep 2
echo Now starting......
/root/CommS.sh start
echo "restarted"
;;

'status')
ps -U kmefic | grep CommunicationSe 1>/dev/null
if [[ $? = 0 ]]
then
exit 0
else
exit 1
fi
;;
esac

################################################################




Finally, you have to add the shared IP address (172.16.2.188) to the service in the failover domain, so that the service ultimately contains three resources: two private resources (one filesystem and one script) and one shared resource, which is the service IP address for the cluster.

The last step is synchronization of the cluster configuration across the cluster nodes. The RHEL cluster administration and configuration tool provides a “save configuration to cluster” option, but it appears only once the cluster services are started. Hence, for the first synchronization, it is better to copy the cluster configuration file manually to all cluster nodes. You can easily use the scp command to synchronize the /etc/cluster/cluster.conf file across the cluster nodes:

/home/root> scp /etc/cluster/cluster.conf Commsvr2:/etc/cluster/cluster.conf


Once synchronized, you can start the cluster services on both cluster nodes. You should start and stop the RHEL cluster services in a specific sequence.
To start:
service ccsd start
service cman start
service fenced start
service rgmanager start
To stop:
service rgmanager stop
service fenced stop
service cman stop
service ccsd stop

(Please note that if you use GFS, then the startup/shutdown of the gfs and clvmd services has to be included in this sequence.)

I therefore prepared three simple shell scripts to start and stop the RHEL cluster services and to report their status; a minimal sketch of such a wrapper script follows.
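This sketch is illustrative only (the original scripts are not reproduced here); it assumes the service names and the start/stop order listed above, and you would add gfs and clvmd if you use GFS:

#!/bin/sh
# cluster-services.sh - start/stop/status wrapper for RHEL cluster services
# (illustrative sketch; adjust the service lists to your setup)

SERVICES_START="ccsd cman fenced rgmanager"
SERVICES_STOP="rgmanager fenced cman ccsd"

case "$1" in
start)
    for s in $SERVICES_START; do
        service $s start
    done
    ;;
stop)
    for s in $SERVICES_STOP; do
        service $s stop
    done
    ;;
status)
    for s in $SERVICES_START; do
        service $s status
    done
    clustat
    ;;
*)
    echo "Usage: $0 start|stop|status"
    exit 1
    ;;
esac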

2.6 Additional Considerations

In my environment, I decided not to start the cluster services at RHEL boot time and not to shut them down automatically when the RHEL box shuts down. However, depending on the 24x7 service-availability requirements of your business, you can easily do this with the chkconfig command.
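For example, enabling the cluster services at boot time on both nodes would look like the following sketch (use off instead of on to disable them):

# Enable the cluster services in their default runlevels on both nodes
chkconfig ccsd on
chkconfig cman on
chkconfig fenced on
chkconfig rgmanager on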

Another consideration is logging cluster messages to a different log file. By default, all cluster messages go into the RHEL messages file (/var/log/messages), which makes cluster troubleshooting somewhat difficult in some scenarios.
For this purpose, I edited the /etc/syslog.conf file to enable the cluster to log events to a file different from the default log file, adding the following line:

daemon.* /var/log/cluster

To apply this change, I restarted syslogd with the service syslog restart command. Another important step is specifying the rotation period for the cluster log file. This can be done by adding the name of the cluster log file to the /etc/logrotate.conf file (the default is weekly rotation):
-------------------------------------------------------------------------------------------------
/var/log/messages /var/log/secure /var/log/maillog /var/log/spooler
/var/log/boot.log /var/log/cron /var/log/cluster {
sharedscripts
postrotate
/bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true
endscript
}

-------------------------------------------------------------------------------------------------

You also have to pay special attention to keeping UIDs and GIDs synchronized across the cluster nodes. This is important for maintaining proper permissions, especially with reference to the shared data filesystem.
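One simple way to keep them synchronized is to create the application user and group with explicit, identical numeric IDs on both nodes; the UID/GID values below are hypothetical examples:

# Run identically on both nodes so files on the shared filesystem map to the same owner
groupadd -g 600 kmefic
useradd -u 600 -g 600 kmefic

# Verify that the IDs match on both nodes
id kmefic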

GRUB also needs to be configured to suit environment-specific needs. For instance, many system administrators in RHEL cluster environments reduce the GRUB selection timeout to a lower value, such as 2 seconds, to accelerate system restart times.
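On RHEL this is a one-line change in /boot/grub/grub.conf, for example:

# /boot/grub/grub.conf (excerpt)
default=0
timeout=2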


3.0 Databases Integration with RHEL cluster


The same RHEL cluster infrastructure can be used to provide high availability for databases such as Oracle, MySQL and IBM DB2.
The most important thing is to remember to base your database-related services on the shared IP address; for example, you have to configure the Oracle listener based on the shared service IP address.
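As an illustration only, an Oracle listener.ora entry bound to the shared service IP address of this cluster might look like the following; the listener name and port are generic defaults, not taken from my setup:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 172.16.2.188)(PORT = 1521))
    )
  )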
In the last section of this article, I will walk you through simple steps that demonstrate how an already-configured RHEL cluster can be used to provide high availability to a MySQL database server, which is no doubt one of the most commonly used databases on RHEL.

I am assuming that the MySQL-related RPMs are installed on both cluster nodes and that the RHEL cluster is already configured with a service IP address of 172.16.2.188.
Now, proceeding ahead, you simply have to define a failover domain using the cluster configuration tool (with the cluster node of your choice having a higher priority than the other). This failover domain will have the MySQL service, which in turn will have two private resources and one shared resource (the service IP address).

One of the private resources should be of filesystem type (in my configuration, with a mount point of /shared/mysqld), while the other private resource should be of script type, pointing to the /etc/init.d/mysql.server script. The contents of this script, which should be available on both cluster nodes, are as follows:



##################################################################

#!/bin/sh
# Copyright Abandoned 1996 TCX DataKonsult AB & Monty Program KB & Detron HB
# This file is public domain and comes with NO WARRANTY of any kind

# MySQL daemon start/stop script.

# Usually this is put in /etc/init.d (at least on machines SYSV R4 based
# systems) and linked to /etc/rc3.d/S99mysql and /etc/rc0.d/K01mysql.
# When this is done the mysql server will be started when the machine is
# started and shut down when the systems goes down.

# Comments to support chkconfig on RedHat Linux
# chkconfig: 2345 64 36
# description: A very fast and reliable SQL database engine.
###################################################################
# Comments to support LSB init script conventions
### BEGIN INIT INFO
# Provides: mysql
# Required-Start: $local_fs $network $remote_fs
# Required-Stop: $local_fs $network $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: start and stop MySQL
# Description: MySQL is a very fast and reliable SQL database engine.
### END INIT INFO

# If you install MySQL on some other places than /usr/local/mysql, then you
# have to do one of the following things for this script to work:
#
# - Run this script from within the MySQL installation directory
# - Create a /etc/my.cnf file with the following information:
# [mysqld]
# basedir=
# - Add the above to any other configuration file (for example ~/.my.ini)
# and copy my_print_defaults to /usr/bin
# - Add the path to the mysql-installation-directory to the basedir variable
# below.
#
# If you want to affect other MySQL variables, you should make your changes
# in the /etc/my.cnf, ~/.my.cnf or other MySQL configuration files.

# If you change base dir, you must also change datadir. These may get
# overwritten by settings in the MySQL configuration files.

basedir=
datadir=

# The following variables are only set for letting mysql.server find things.

# Set some defaults
pid_file=
server_pid_file=
use_mysqld_safe=1
user=mysql
if test -z "$basedir"
then
basedir=/usr/local/mysql
bindir=./bin
if test -z "$datadir"
then
datadir=/shared/mysqld/data
#datadir=/usr/local/mysql/data
fi
sbindir=./bin
libexecdir=./bin
else
bindir="$basedir/bin"
if test -z "$datadir"
then
datadir="$basedir/data"
fi
sbindir="$basedir/sbin"
libexecdir="$basedir/libexec"
fi

# datadir_set is used to determine if datadir was set (and so should be
# *not* set inside of the --basedir= handler.)
datadir_set=

#
# Use LSB init script functions for printing messages, if possible
#
lsb_functions="/lib/lsb/init-functions"
if test -f $lsb_functions ; then
source $lsb_functions
else
log_success_msg()
{
echo " SUCCESS! $@"
}
log_failure_msg()
{
echo " ERROR! $@"
}
fi

PATH=/sbin:/usr/sbin:/bin:/usr/bin:$basedir/bin
export PATH

mode=$1 # start or stop

case `echo "testing\c"`,`echo -n testing` in
*c*,-n*) echo_n= echo_c= ;;
*c*,*) echo_n=-n echo_c= ;;
*) echo_n= echo_c='\c' ;;
esac

parse_server_arguments() {
for arg do
case "$arg" in
--basedir=*) basedir=`echo "$arg" | sed -e 's/^[^=]*=//'`
bindir="$basedir/bin"
if test -z "$datadir_set"; then
datadir="$basedir/data"
fi
sbindir="$basedir/sbin"
libexecdir="$basedir/libexec"
;;
--datadir=*) datadir=`echo "$arg" | sed -e 's/^[^=]*=//'`
datadir_set=1
;;
--user=*) user=`echo "$arg" | sed -e 's/^[^=]*=//'` ;;
--pid-file=*) server_pid_file=`echo "$arg" | sed -e 's/^[^=]*=//'` ;;
--use-mysqld_safe) use_mysqld_safe=1;;
--use-manager) use_mysqld_safe=0;;
esac
done
}

parse_manager_arguments() {
for arg do
case "$arg" in
--pid-file=*) pid_file=`echo "$arg" | sed -e 's/^[^=]*=//'` ;;
--user=*) user=`echo "$arg" | sed -e 's/^[^=]*=//'` ;;
esac
done
}

wait_for_pid () {
i=0
while test $i -lt 35 ; do
sleep 1
case "$1" in
'created')
test -s $pid_file && i='' && break
;;
'removed')
test ! -s $pid_file && i='' && break
;;
*)
echo "wait_for_pid () usage: wait_for_pid created|removed"
exit 1
;;
esac
echo $echo_n ".$echo_c"
i=`expr $i + 1`
done

if test -z "$i" ; then
log_success_msg
else
log_failure_msg
fi
}

# Get arguments from the my.cnf file,
# the only group, which is read from now on is [mysqld]
if test -x ./bin/my_print_defaults
then
print_defaults="./bin/my_print_defaults"
elif test -x $bindir/my_print_defaults
then
print_defaults="$bindir/my_print_defaults"
elif test -x $bindir/mysql_print_defaults
then
print_defaults="$bindir/mysql_print_defaults"
else
# Try to find basedir in /etc/my.cnf
conf=/etc/my.cnf
print_defaults=
if test -r $conf
then
subpat='^[^=]*basedir[^=]*=\(.*\)$'
dirs=`sed -e "/$subpat/!d" -e 's//\1/' $conf`
for d in $dirs
do
d=`echo $d | sed -e 's/[ ]//g'`
if test -x "$d/bin/my_print_defaults"
then
print_defaults="$d/bin/my_print_defaults"
break
fi
if test -x "$d/bin/mysql_print_defaults"
then
print_defaults="$d/bin/mysql_print_defaults"
break
fi
done
fi

# Hope it's in the PATH ... but I doubt it
test -z "$print_defaults" && print_defaults="my_print_defaults"
fi

#
# Read defaults file from 'basedir'. If there is no defaults file there
# check if it's in the old (depricated) place (datadir) and read it from there
#

extra_args=""
if test -r "$basedir/my.cnf"
then
extra_args="-e $basedir/my.cnf"
else
if test -r "$datadir/my.cnf"
then
extra_args="-e $datadir/my.cnf"
fi
fi

parse_server_arguments `$print_defaults $extra_args mysqld server mysql_server mysql.server`

# Look for the pidfile
parse_manager_arguments `$print_defaults $extra_args manager`

#
# Set pid file if not given
#
if test -z "$pid_file"
then
pid_file=$datadir/mysqlmanager-`/bin/hostname`.pid
else
case "$pid_file" in
/* ) ;;
* ) pid_file="$datadir/$pid_file" ;;
esac
fi
if test -z "$server_pid_file"
then
server_pid_file=$datadir/`/bin/hostname`.pid
else
case "$server_pid_file" in
/* ) ;;
* ) server_pid_file="$datadir/$server_pid_file" ;;
esac
fi

# Safeguard (relative paths, core dumps..)
cd $basedir

case "$1" in
start)
# Start daemon
manager=$bindir/mysqlmanager
if test -x $libexecdir/mysqlmanager
then
manager=$libexecdir/mysqlmanager
elif test -x $sbindir/mysqlmanager
then
manager=$sbindir/mysqlmanager
fi

echo $echo_n "Starting MySQL"
if test -x $manager -a "$use_mysqld_safe" = "0"
then
# Give extra arguments to mysqld with the my.cnf file. This script may
# be overwritten at next upgrade.
$manager --user=$user --pid-file=$pid_file >/dev/null 2>&1 &
wait_for_pid created

# Make lock for RedHat / SuSE
if test -w /var/lock/subsys
then
touch /var/lock/subsys/mysqlmanager
fi
elif test -x $bindir/mysqld_safe
then
# Give extra arguments to mysqld with the my.cnf file. This script
# may be overwritten at next upgrade.
pid_file=$server_pid_file
$bindir/mysqld_safe --datadir=$datadir --pid-file=$server_pid_file >/dev/null 2>&1 &
wait_for_pid created

# Make lock for RedHat / SuSE
if test -w /var/lock/subsys
then
touch /var/lock/subsys/mysql
echo "mysql.server" > /var/lock/mysql
fi
else
log_failure_msg "Couldn't find MySQL manager or server"
fi
echo "I was here `date`" >> /var/log/rhcs.debug
;;

stop)
# Stop daemon. We use a signal here to avoid having to know the
# root password.

# The RedHat / SuSE lock directory to remove
lock_dir=/var/lock/subsys/mysqlmanager

# If the manager pid_file doesn't exist, try the server's
if test ! -s "$pid_file"
then
pid_file=$server_pid_file
lock_dir=/var/lock/subsys/mysql
fi

if test -s "$pid_file"
then
mysqlmanager_pid=`cat $pid_file`
echo $echo_n "Shutting down MySQL"
kill $mysqlmanager_pid
echo "stopped" > /var/lock/mysql
# mysqlmanager should remove the pid_file when it exits, so wait for it.
wait_for_pid removed

# delete lock for RedHat / SuSE
if test -f $lock_dir
then
rm -f $lock_dir
fi
else
log_failure_msg "MySQL manager or server PID file could not be found!"
fi
;;

restart)
# Stop the service and regardless of whether it was
# running or not, start it again.
$0 stop
$0 start
;;

reload)
if test -s "$server_pid_file" ; then
mysqld_pid=`cat $server_pid_file`
kill -HUP $mysqld_pid && log_success_msg "Reloading service MySQL"
touch $server_pid_file
else
log_failure_msg "MySQL PID file could not be found!"
fi
;;

status)
(mysql -e "select 1" > /var/mysql)||exit 1
state=`cat /var/mysql|head -1`
#echo "state is $state"
#if [$state=1]; then
if [ "$state" = "1" ]; then
touch /var/lock/subsys/mysql
echo "mysql.server" > /var/lock/mysql
cat /var/lock/mysql
exit 0
fi
;;
*)
# usage
echo "Usage: $0 start|stop|status|restart|reload"
exit 1
;;
esac

########################################################################

As you can see, this script sets the data directory to /shared/mysqld/data, which is in fact located on our shared RAID array and is available to both cluster nodes.

Testing high availability of the MySQL database can easily be carried out with the help of any MySQL client. I used SQLyog, a Windows-based MySQL client, to connect to the MySQL database on Commsvr1 and then crashed that cluster node using the halt command. As a result of this system crash, the RHEL cluster events triggered and the MySQL database automatically restarted on Commsvr2. The whole failover process took one to two minutes and happened quite seamlessly.
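The failover can also be observed and exercised from the command line with the cluster suite's own tools; a sketch (substitute the service name you defined for MySQL):

# Watch cluster and service status during the test; the MySQL service should move to Commsvr2
clustat

# A controlled relocation, without crashing a node, can be done with clusvcadm
clusvcadm -r <mysql-service-name> -m Commsvr2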


Summary: RHEL clustering technology provides a reliable, highly available infrastructure that can be used to meet 24x7 business requirements for databases as well as legacy applications. The most important thing to keep in mind is that it is always better to plan carefully before the actual implementation, and to test your cluster and all possible failover scenarios thoroughly before going live. A well-documented cluster test plan can also be very helpful in this regard.






About Author: Khurram Shiraz is a senior system administrator at KMEFIC, Kuwait. In his eight years of IT experience, he has worked mainly with IBM technologies and products, especially AIX, HACMP clustering, Tivoli and IBM SAN/NAS storage. He has also worked with the IBM Integrated Technology Services group. His areas of expertise include the design and implementation of high availability and DR solutions based on pSeries, Linux and Windows infrastructures. He can be reached at aix_tiger@yahoo.com.

Note: This is one of my printed articles, originally published in Linux Journal, October 2007.
