Friday 8 May 2009

Monitor Data Centre Temperatures with AIX Servers

Temperature inside the data centre is one of the most important things to monitor in any IT infrastructure. Most data centres house a large number of servers, and each of them contributes to the temperature of the room. Few people realise that as the processing load on servers increases, the temperature of their processors rises, and with it the average temperature of the data centre.
On the other hand, if the data centre lacks proper air conditioning and air flow, your business could face severe problems in keeping its services continuously available.

In this article, I will show how you can use your AIX servers to monitor the average temperature within the data centre and generate email alerts when that temperature rises into the danger zone.


Building up the monitoring solution

When you build a temperature monitoring solution with your AIX servers, you first have to decide which servers to use. For this purpose, I divide the available AIX servers into two main categories. The first category consists of AIX servers which are not HMC managed but possess different kinds of environmental sensors (including thermal sensors). These servers are a bit old, but they still provide a reliable way of determining the average temperature inside a data centre.
The second category consists of relatively new pSeries servers which are managed by an HMC. Although both categories can be part of a temperature monitoring solution, I will first develop the solution using a p440 server, which belongs to the first category, and then apply the same basic technique to the second category.

Let's start with the old p440 server. This kind of old pSeries server has built-in environmental sensors which can give you valuable information about the whole environment (fan speeds and the temperature of your system).

To check whether such environmental sensors are available, execute the following command on your pSeries server:

[root@sys /] /usr/lpp/diagnostics/bin/uesensor -a
3 0 11 31 P1
9001 0 11 2100 F1
9001 1 11 2760 F2
9001 2 11 1890 F3
9001 3 11 1890 F4
9002 0 11 5129 P1
9002 1 11 3129 P1
9002 2 11 5129 P1
9002 3 11 12077 P1
9004 0 11 3 P3-V1
9004 1 11 3 P3-V2
9004 2 11 3 P3-V3
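
The columns in this output appear to be the sensor token, the sensor index, a status code, the measured value and the location code, with token 3 being the thermal sensor. That reading of the columns is an assumption, so check it against your own system before relying on it. Under that assumption, a minimal sketch for pulling out just the temperature from the -a format could look like this (the function name thermal_value is made up for illustration, and the sample lines are copied from the output above):

```shell
#!/bin/sh
# Hypothetical helper: print the measured value (4th column) of the first
# line whose token (1st column) is 3, assumed here to be the thermal sensor.
thermal_value() {
    awk '$1 == 3 { print $4; exit }'
}

# Sample lines copied from the uesensor -a output shown above.
sample_output='3 0 11 31 P1
9001 0 11 2100 F1
9002 0 11 5129 P1'

printf '%s\n' "$sample_output" | thermal_value
```

On a real system the function would read from the command directly, for example: /usr/lpp/diagnostics/bin/uesensor -a | thermal_value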


However, if your pSeries server does not support these sensors, you will get a message like the following:
/home/root> /usr/lpp/diagnostics/bin/uesensor -l


Based on this tool, I wrote the following script. It sends an SMS to the predefined mobile numbers of operations staff if the processor temperature (which is roughly equal to the average temperature inside the data centre) rises above 25 C, and it sends email alerts to data centre personnel if the temperature is greater than 22 C. The script runs every half hour through the AIX crontab facility and also writes temperature logs which can be used for future analysis.

------------------------------------------------------------------------------------------------
#!/bin/ksh
# Script Name: checktemp.sh
# Script Purpose: To monitor average temperature inside data center
# Script Author: Khurram Shiraz
export NSORDER=local
dt=`date`
tmplogs=/tmp/templogs
tmpmess=/tmp/tempmess
# Extract the value reported by the thermal sensor (grep -p is the AIX paragraph match)
z=`/usr/lpp/diagnostics/bin/uesensor -l | grep -p "thermal sensor" | grep Value | awk '{ print $3 }'`
echo "$z at $dt" >> $tmplogs
# Above 25 C: log the event, send email and SMS, then stop
if [[ $z -gt 25 ]]
then
echo " Data Center temperature is critical, $z at $dt" >> $tmplogs
echo " Data Center temperature is critical, $z at $dt" > $tmpmess
mail -s " Data Center Temperature" mishry@kmefic.com.kw khurram@fic.com.kw < $tmpmess
cd /home/kmetsm
./SMS ### A java program for sending SMS to predefined mobile numbers
cd -
exit 0
fi
# Above 22 C (but not above 25 C): log the event and send email only
if [[ $z -gt 22 ]]
then
echo "Data Center temperature is alarming, $z at $dt" >> $tmplogs
echo " Data Center temperature is alarming, $z at $dt" > $tmpmess
mail -s " Data Center Temperature" khurram@fic.com.kw < $tmpmess
fi
exit 0


------------------------------------------------------------------------------------------------
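
One thing the script does not guard against is an empty or non-numeric sensor reading, for example when the sensor command fails. The sketch below shows one possible way to classify a reading before acting on it. The function name classify_temp and the "invalid" category are made up for illustration; the 25 and 22 thresholds are the ones used in the script. It only accepts non-negative whole numbers.

```shell
#!/bin/sh
# Hypothetical helper: map a reading to critical, alarming, normal or invalid.
classify_temp() {
    case "$1" in
        ''|*[!0-9]*) echo invalid; return 0 ;;   # empty or non-numeric reading
    esac
    if [ "$1" -gt 25 ]; then
        echo critical
    elif [ "$1" -gt 22 ]; then
        echo alarming
    else
        echo normal
    fi
}

classify_temp 31    # prints: critical
classify_temp 24    # prints: alarming
classify_temp 21    # prints: normal
classify_temp ""    # prints: invalid
```

To run the monitoring script every half hour, a crontab entry along the lines of "0,30 * * * * /usr/local/bin/checktemp.sh" should work; the path is hypothetical and depends on where you keep the script.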

The data this script logs about computer room temperature has the following format:

21 at Thu May 17 15:17:11 SAUST 2007
21 at Thu May 17 15:30:00 SAUST 2007
21 at Thu May 17 16:00:00 SAUST 2007
21 at Thu May 17 16:30:00 SAUST 2007
21 at Thu May 17 17:00:00 SAUST 2007
21 at Thu May 17 17:30:00 SAUST 2007
21 at Thu May 17 18:00:00 SAUST 2007
21 at Thu May 17 18:30:00 SAUST 2007
21 at Thu May 17 19:00:00 SAUST 2007
21 at Thu May 17 19:30:00 SAUST 2007
21 at Thu May 17 20:00:00 SAUST 2007
21 at Thu May 17 20:30:00 SAUST 2007
21 at Thu May 17 21:00:01 SAUST 2007
21 at Thu May 17 21:30:00 SAUST 2007
20 at Thu May 17 22:00:00 SAUST 2007
20 at Thu May 17 22:30:00 SAUST 2007
20 at Thu May 17 23:00:00 SAUST 2007
20 at Thu May 17 23:30:00 SAUST 2007


This data can be used to prepare a chart for management review (or it can be placed on an intranet web site for internal review by the operations or IT department). A simple way of doing this is to use Microsoft Excel, which can create many different types of charts.
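
Before the log can be charted in a spreadsheet, it helps to convert it to CSV. The following is a minimal sketch of that step, assuming the "temperature at timestamp" log format shown above. The function name log_to_csv and the column headers are made up for illustration, and the alert line in the sample input is invented to show that such lines are skipped.

```shell
#!/bin/sh
# Hypothetical helper: convert "<temp> at <timestamp>" lines to "timestamp,temp" CSV.
# Alert lines are skipped because their first field is not a number.
log_to_csv() {
    awk '
        BEGIN { print "timestamp,temperature_c" }
        $1 ~ /^[0-9]+$/ && $2 == "at" {
            ts = $3
            for (i = 4; i <= NF; i++) ts = ts " " $i
            print ts "," $1
        }
    '
}

# Sample input: two readings in the logged format plus one invented alert line.
printf '%s\n' \
    '21 at Thu May 17 21:30:00 SAUST 2007' \
    ' Data Center temperature is critical, 26 at Thu May 17 21:45:00 SAUST 2007' \
    '20 at Thu May 17 22:00:00 SAUST 2007' | log_to_csv
```

With the log file used by the script, the call would be something like: log_to_csv < /tmp/templogs > /tmp/templogs.csv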

The only remaining question is how to achieve the same objective if you don't have an old pSeries server with environmental sensors. As discussed earlier, if you have any HMC-managed pSeries server, you can easily do this using the sensors available through the HMC.
The key to the solution in that case is the "lshwinfo" command, which is run on the HMC.

The lshwinfo command displays hardware information, such as the temperature, of the managed system.

The format of this command is as follows:


lshwinfo -r sys -e frame-name { -n object-name | --all } [ -F format ] [ --help ]
where:

• -r – the resource type to display. A valid value is sys for system.

• -e – the name of the frame the system is in.

• -n – the name of the object to perform the listing on. This parameter cannot be specified with --all.

• --all – list all the objects of a particular resource type. This parameter cannot be used with -n.

• -F – if specified, has a delimiter-separated list of property names to be queried. Valid values are temperature, current, voltage, power, and total_power.

So, to get the temperature from the HMC, you can execute the following command on the HMC:

lshwinfo -r sys -e "frame1" -n "object name" -F temperature


Based on this technique, I first set up SSH between an AIX server (or LPAR) and the HMC, so that I could execute commands on the HMC from one of my AIX LPARs without any password prompt.
The main steps for setting up SSH between an AIX LPAR and the HMC are as follows.

The first step is to install OpenSSH on the AIX LPAR. For this purpose I used the OpenSSH software available on the Bull web site (freeware.openssh.rte 3.8.1.0) and installed it along with the OpenSSL library (openssl 0.9.6.7). I then created a user named hscadmin on the AIX LPAR and also created the same user on the HMC, HMC1. I assigned the "Managed system profile" to the hscadmin user on the HMC. I also enabled remote command execution, so that the HMC accepts remote SSH connections.
On the AIX LPAR "aqbtest", I generated an RSA key pair with the following commands:
/home/root> su - hscadmin
/home/hscadmin> ssh-keygen -t rsa ( accept default values with blank passphrase )
/home/hscadmin> export hscadminkey=`cat .ssh/id_rsa.pub`
/home/hscadmin> ssh hscadmin@HMC1 mkauthkeys -a \"$hscadminkey\"
The above command copies the public key from the AIX LPAR aqbtest to HMC1. Once it is copied, you can also log in to the HMC directly as hscadmin using ssh and verify that the key was copied successfully by executing the "cat .ssh/authorized_keys2" command.
You should now be able to log in to the HMC from the AIX management LPAR without any password prompt. You can verify this by executing:
/home/hscadmin> ssh HMC1 lsusers

which will show all users present on the HMC.
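
If you want to repeat this check from a script, it should never stop at a password prompt. One possible approach, sketched below, is the OpenSSH BatchMode option, which makes ssh fail instead of prompting. The function name check_hmc_login is made up for illustration; the lsusers command and the host name are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if key-based login works without any prompt.
# BatchMode=yes makes ssh fail rather than ask for a password or passphrase.
check_hmc_login() {
    if ssh -o BatchMode=yes "$1" lsusers >/dev/null 2>&1; then
        echo "passwordless login to $1 works"
    else
        echo "passwordless login to $1 failed" >&2
        return 1
    fi
}

# Example call, run as hscadmin on the AIX LPAR:
# check_hmc_login hscadmin@HMC1
```

The function returns a non-zero status on failure, so a monitoring script can test it before trying to read the temperature.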

If you face any problem while logging in to the HMC using ssh, you can always empty the authorized_keys2 file and then repeat the above procedure. To empty this file, use the following command sequence on the AIX management LPAR:
/home/hscadmin> touch /tmp/mykeyfile ( an empty file )
/home/hscadmin> scp /tmp/mykeyfile hscadmin@HMC1:.ssh/authorized_keys2

Once you have tested prompt-less remote login from the AIX LPAR to the HMC, you can use the following shell script to get the average data centre temperature as seen by the HMC.
----------------------------------------------------------------------------------------------

#!/bin/ksh

export NSORDER=local
dt=`date`
tmplogs=/tmp/templogs
tmpmess=/tmp/tempmess
# Ask the HMC for the temperature of the managed system over SSH
z=`ssh hscadmin@HMC1 "lshwinfo -r sys -e frame1 -n KMEobj -F temperature"`
echo "$z at $dt" >> $tmplogs
# Above 25 C: log the event, send email and SMS, then stop
if [[ $z -gt 25 ]]
then
echo " Data Center temperature is critical, $z at $dt" >> $tmplogs
echo " Data Center temperature is critical, $z at $dt" > $tmpmess
mail -s " Data Center Temperature" mishry@kmefic.com.kw khurram@fic.com.kw < $tmpmess
cd /home/kmetsm
./SMS ### A java program for sending SMS to predefined mobile numbers
cd -
exit 0
fi
# Above 22 C (but not above 25 C): log the event and send email only
if [[ $z -gt 22 ]]
then
echo "Data Center temperature is alarming, $z at $dt" >> $tmplogs
echo " Data Center temperature is alarming, $z at $dt" > $tmpmess
mail -s " Data Center Temperature" khurram@fic.com.kw < $tmpmess
fi
exit 0

------------------------------------------------------------------------------------------------------------------------


Summary:
It should now be clear that data centres equipped with older RISC servers, the latest pSeries servers, or both can easily be monitored for average temperature. Once you have these temperature values, you can produce charts or feed the values into any small database for long-term recording and analysis.
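
As a small example of such analysis, the sketch below computes the number of readings and their minimum, maximum and average from log lines in the format produced by the scripts. The function name temp_stats and the output layout are made up for illustration; the sample lines are taken from the log excerpt shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: summarise the readings in a "<temp> at <timestamp>" log.
temp_stats() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 == "at" {
            v = $1 + 0                      # force numeric comparison
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v
            n++
        }
        END {
            if (n > 0)
                printf "readings=%d min=%d max=%d avg=%.1f\n", n, min, max, sum / n
        }
    '
}

printf '%s\n' \
    '21 at Thu May 17 21:00:01 SAUST 2007' \
    '21 at Thu May 17 21:30:00 SAUST 2007' \
    '20 at Thu May 17 22:00:00 SAUST 2007' \
    '20 at Thu May 17 22:30:00 SAUST 2007' | temp_stats
```

For the four sample lines this prints readings=4 min=20 max=21 avg=20.5. Against the real log the call would be something like: temp_stats < /tmp/templogs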


Note: This is one of my articles, which was published in the June 2007 issue of AIX Update. I hope you enjoyed it and the uniqueness of the idea behind it.
