Wednesday, 8 April 2009

Using FlashCopy with AIX for online backups

Automated Online Backup Solution Using FlashCopy in an AIX Environment

Designing and implementing a foolproof backup strategy has been an important topic for companies for years. With data volumes now reaching terabytes, companies want backup solutions that are not only foolproof but can also complete the whole backup process in the shortest possible time, whatever the size of the data.

Think of a bank that has to complete its end-of-day operations before 8:00 AM so that the next business day can start normally. In such environments, end-of-day processing is usually accompanied by “before end of day” and “after end of day” backup operations. If the organization's data runs to terabytes, it is very difficult to complete these processes within a few hours unless some “snapshot” technique is used.

IBM storage solutions, including high-end subsystems such as the DS4300, DS6800 and DS8000, come with an advanced feature called FlashCopy that helps customers meet these business needs. The feature is a data snapshot technique implemented at the storage hardware level that produces a block-level copy of the data. FlashCopy also supports incremental operations, which are indeed much faster than normal FlashCopy operations. Keep in mind that even normal FlashCopy operations are so quick that a consistent snapshot of terabytes of data can be made available in a minute or two.

The one thing to keep in mind is consistency at the database and operating system levels, because these snapshots are taken at the hardware (storage) level. This article describes procedures for using the FlashCopy feature for consistent, automated backup operations in an AIX environment.

Technical Review of FlashCopy Feature

The IBM FlashCopy technique provides an instant point-in-time copy of LUNs on DS storage subsystems. The point-in-time copy function gives an instantaneous copy, or “view”, of the original data at a specific point in time. This is also known as the T0 (time-zero) copy of the original data.

When FlashCopy is invoked, the command returns to the operating system as soon as the FlashCopy pair relationship has been established and the necessary control bitmaps have been created. This process takes only a few seconds. Thereafter, we have access to a T0 copy of the original logical volumes. As soon as the relationship between the two copies has been established, read and write operations can be performed on both the source and the target volumes. One great advantage of FlashCopy is therefore that the source data remains online and available to users during the operation (although writes must be suspended temporarily at the application or database level to ensure data consistency). Moreover, because the operation is carried out at the storage hardware level, it does not affect server performance and usually completes within seconds, regardless of the size of the data, which may be terabytes in some scenarios.

Because of all these benefits, the point-in-time copy created by FlashCopy is typically used when a copy of the production system is needed with minimum downtime. This state-of-the-art feature is also used for fast online backups of production systems with minimal impact on system performance.

[Figure: illustration of the FlashCopy concept]

Reference: IBM white paper “Storage Solutions for Oracle Database: Snapshot Backup and Recovery with IBM TotalStorage Enterprise Storage Server”

Enabling and activating FlashCopy Feature

FlashCopy is a premium feature that can be purchased separately for IBM DS4000, DS6000 and DS8000 series storage subsystems. Although the way the feature is used differs from one storage family to another, the basic technology behind it is the same.

How you obtain a feature key file for a premium feature, including FlashCopy, varies with the DS4000 packaging procedures in the country where the storage subsystem was purchased and with the time of the order:

  • If you bought a premium feature together with the DS4000, the feature key file might be included in the installation package (usually on CD).
  • If no feature key file was supplied on the installation media and only a proof of license was supplied, you can generate a key on the Web, using the feature enabling identifier on the proof-of-license card and the serial number of the storage subsystem, at:

https://www-912.ibm.com/PremiumFeatures

Reference: IBM Redbook, DS4000 Series, Storage Manager and Copy Services

Possible Operations with FlashCopy drives

Four operations are possible with the FlashCopy drives created on DS4000 storage subsystems: creation, deletion, re-creation (also referred to as enabling) and disabling.

You can create FlashCopy logical drives either through the Create FlashCopy Logical Drive Wizard or by using the command line interface (CLI) with the create command. The latter can be scripted to support automatic operations. This operation also creates a repository drive associated with the FlashCopy LUN.

It is usually recommended to stop application access to the base logical drive and unmount it before creating a FlashCopy drive, to ensure consistency. In practice, however, unmounting the base logical drive is often not possible, especially in a 24x7 environment, so stopping only write access to the base logical drive during FlashCopy creation suffices in most cases.

After creation, the FlashCopy drive has to be assigned to a host using the logical drive-to-host mappings in the Mappings View of the Subsystem Management window of the FAStT (DS4000) Storage Manager software.

The deletion process simply deletes the FlashCopy drive and its associated repository drive. It also deletes the host mappings for the FlashCopy drive, without any impact on I/O or host access to the base logical drive.

Disabling a FlashCopy drive needs a little more explanation. If a FlashCopy logical drive is no longer needed, it can be disabled. As long as a FlashCopy logical drive is enabled, the performance of the DS storage subsystem is slightly affected by the continuous copy-on-write activity to the associated FlashCopy repository logical drive. When the FlashCopy logical drive is disabled, the copy-on-write activity stops and performance returns to its optimal level.

The main advantage of disabling a FlashCopy logical drive instead of deleting it is that the FlashCopy drive, its repository drive and its host mappings are retained. When you later need a new FlashCopy of the same base logical drive, you can use the re-create option to reuse the disabled FlashCopy. This takes less time than creating a new one and gives a fresh snapshot of the changed data.

When you re-create a FlashCopy logical drive, please note that:

  • The FlashCopy logical drive must be in either an optimal or a disabled state.
  • All copy-on-write data on the FlashCopy repository logical drive is deleted.
  • The FlashCopy and FlashCopy repository logical drive parameters remain the same as those of the previously disabled FlashCopy logical drive and its associated FlashCopy repository logical drive. After the FlashCopy logical drive is re-created, you can change parameters on the FlashCopy repository logical drive through the appropriate menu options.

For automated FlashCopy operations, I created the FlashCopy drives for all required base LUNs once, using the Storage Manager software. For subsequent operations I chose disabling and re-creating the FlashCopy drives rather than creating and deleting them, because disable/re-create retains the host-to-LUN mappings.
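
The disable/re-create cycle can be wrapped in a small helper so that both operations share one drive list. The following is only a minimal sketch: the controller addresses, drive names and SMcli commands are taken from the scripts in the appendix, while the DRY_RUN switch and the function names flash_op and flash_all are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: one helper for the SMcli disable / re-create calls.
# Controller addresses and drive names are copied from the appendix scripts;
# DRY_RUN, flash_op and flash_all are hypothetical names for this illustration.

SMCLI=${SMCLI:-/usr/SMclient/SMcli}
CTRL_A=${CTRL_A:-192.168.10.208}
CTRL_B=${CTRL_B:-192.168.10.209}
FLASH_DRIVES=${FLASH_DRIVES:-"Disk1-1 Disk2-1 Disk4-1 Disk5-1"}
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands, do not run them

# flash_op <recreateFlashCopy|disableFlashCopy> <drive>
flash_op() {
    script="$1 logicalDrive [\"$2\"];"
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$SMCLI $CTRL_A $CTRL_B -c '$script'"
    else
        "$SMCLI" "$CTRL_A" "$CTRL_B" -c "$script"
    fi
}

# flash_all <operation>: apply one operation to every drive, stop at the first failure
flash_all() {
    for drive in $FLASH_DRIVES; do
        flash_op "$1" "$drive" || return 1
    done
}

# In dry-run mode this only prints the four disable commands.
flash_all disableFlashCopy
```

With DRY_RUN=0 the same call would issue the commands against the storage subsystem, so it is worth reviewing the printed commands first.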

Implementation Description



My environment comprised AIX 5.3, Oracle 9.2 and SAP. However, FlashCopy can be used with any relational database that supports online backups.

I used the DS4300 FlashCopy disable and re-create functions to make an instant image of all data filesystems, along with the archive log filesystem, and made these target filesystems available on the same host (the SAP DB/CI server). Hence the source and target filesystems are mounted on the same server in my implementation (although the target filesystems can be mounted on any AIX node other than the source node). The target filesystems are then backed up to the TSM server using the TSM B/A AIX client with the help of the TSM scheduler.

I automated all these daily operations by combining UNIX shell scripting with the Storage Manager command line interface (SMcli). The two shell scripts used in this implementation, flashrecreate.sh and flashdisable.sh, are listed in the appendix. The scripts are scheduled with the AIX cron facility so that the FlashCopy re-creation runs every night at 1:00 AM and the disable operation runs every morning at 10:00 AM. The disable operation is equally important: it stops the unnecessary tracking of data changes and so helps avoid filling up the associated FlashCopy repository drives.
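
As a rough illustration of that schedule, the snippet below prints the two crontab entries. It assumes the scripts live in /scripts (the directory the appendix scripts use for the lock file) and that output is appended to log files whose names are made up here.

```shell
#!/bin/sh
# Sketch: print the crontab entries for the nightly FlashCopy cycle.
# The /scripts location and the log file names are assumptions.

SCRIPT_DIR=${SCRIPT_DIR:-/scripts}

flash_cron_entries() {
    # fields: minute hour day-of-month month day-of-week command
    printf '%s\n' "0 1 * * * $SCRIPT_DIR/flashrecreate.sh >> $SCRIPT_DIR/flashrecreate.log 2>&1"
    printf '%s\n' "0 10 * * * $SCRIPT_DIR/flashdisable.sh >> $SCRIPT_DIR/flashdisable.log 2>&1"
}

flash_cron_entries
```

The printed lines can be reviewed and then added to root's crontab with crontab -e.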

To keep such snapshots consistent at the operating system level, AIX 5L provides a “freeze” option for JFS2 filesystems. It can be used with the chfs command to freeze I/O to a mounted JFS2 filesystem before the FlashCopy operation is initiated. The trickiest part is making FlashCopy operations consistent in plain JFS environments (as in my case). For that purpose I used the following techniques:

  1. First, I put the whole Oracle database into hot backup mode before performing the FlashCopy enable operation, so that the datafiles captured by the snapshot can be recovered consistently. (In hot backup mode Oracle keeps the database open and logs additional redo, so a datafile copied while it is changing can still be recovered.)
  2. I flushed the filesystem cache to disk using the sync command and then waited about one minute before starting the FlashCopy task, so that any data in the filesystem cache would be written to disk. Note that the AIX sync command does not guarantee that all cached data has reached the disk when it returns, but it is still a useful tool for this purpose.
  3. After the sync command, a further short delay (say 10 seconds) was added so that updates to the Oracle redo log files could also complete before the FlashCopy commands actually start.
  4. Finally, before mounting the target filesystems on the AIX server, I ran the fsck command against every target filesystem. In my case the total fsck time was around 35 minutes for 500 GB of filesystems (run sequentially for five filesystems). Depending on the size of the backup window, fsck can be started in parallel on all target filesystems to save time. In my environment this delay was acceptable, so I ran the checks sequentially and did not start the TSM scheduled archive operations until all target filesystems were mounted.
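
Where the source filesystems are JFS2 rather than JFS, the freeze option mentioned before the list could take the place of the sync-and-wait steps. The sketch below shows the idea with the chfs freeze attribute; the timeout value, the DRY_RUN switch, the function names and the source mount points (the target paths from the appendix without the /fs prefix) are assumptions, and I did not use this in my own JFS environment.

```shell
#!/bin/sh
# Sketch: freeze JFS2 filesystems around the FlashCopy re-create step.
# FREEZE_TIMEOUT, DRY_RUN, run, freeze_fs and thaw_fs are hypothetical names;
# the mount points below are assumed source paths, not taken from a real system.

FREEZE_TIMEOUT=${FREEZE_TIMEOUT:-60}   # seconds after which AIX thaws the filesystem by itself
DRY_RUN=${DRY_RUN:-1}                  # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Freeze every filesystem given as argument; stop at the first failure.
freeze_fs() {
    for fs in "$@"; do
        run chfs -a freeze="$FREEZE_TIMEOUT" "$fs" || return 1
    done
}

# Thaw every filesystem given as argument; keep going even if one fails.
thaw_fs() {
    rc=0
    for fs in "$@"; do
        run chfs -a freeze=off "$fs" || rc=1
    done
    return $rc
}

freeze_fs /oracle/R3P/sapdata1 /oracle/R3P/sapdata2
# ... the SMcli recreateFlashCopy commands would run here ...
thaw_fs /oracle/R3P/sapdata1 /oracle/R3P/sapdata2
```

The timeout acts as a safety net: if the script dies between the two calls, the filesystems do not stay frozen indefinitely.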

To prevent the FlashCopy enable operation from running while the existing FlashCopy drives for the base logical drives have not been disabled, I placed a logical lock in the shell scripts so that enabling (that is, re-creating the FlashCopy) is done if and only if FlashCopy is already disabled for that logical drive.
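
The lock is nothing more than a marker file that the disable script creates and the re-create script removes. A minimal sketch of that logic follows; /scripts/lockfile is the path used in the appendix, while the function names and the LOCKFILE override are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of the lock-file guard between the disable and re-create scripts.
# flash_is_disabled, mark_disabled and mark_enabled are hypothetical names.

LOCKFILE=${LOCKFILE:-/scripts/lockfile}

# Succeeds only when the disable script has left its marker behind.
flash_is_disabled() {
    [ -f "$LOCKFILE" ]
}

# Called at the end of the disable script.
mark_disabled() {
    touch "$LOCKFILE"
}

# Called at the end of the re-create script.
mark_enabled() {
    rm -f "$LOCKFILE"
}

if flash_is_disabled; then
    echo "FlashCopy drives are disabled; safe to re-create"
else
    echo "FlashCopy drives still appear enabled; refusing to re-create" >&2
fi
```

A marker file like this only protects against the two scripts running out of order; it does not detect a FlashCopy that was enabled by hand through the Storage Manager GUI.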

Backups Restoration

Another important concern with every backup strategy is the ease and flexibility with which each backup can be restored. No backup strategy can guarantee 100% restoration success; the only way to gain confidence is to restore backups on a regular basis.

With the FlashCopy backup technique, the target filesystems can be mounted on the same AIX server that holds the source filesystems or on a different AIX server. The important thing to note is that these target FlashCopy filesystems are mounted on AIX hosts with mount points prefixed by /fs by default. When backed up to the TSM server using the TSM B/A client (or even with a simple tar command), the filesystems are archived under these mount points. Therefore, after restoration on the target AIX host, the /fs prefix has to be removed from the mount points with the chfs command (for example, /fs/oracle/R3P/sapdata1 becomes /oracle/R3P/sapdata1) before the application or database is started on the target server.

A simple shell script can also be written to change the mount points of all restored filesystems with the chfs command, thereby automating the restoration process. I observed a restoration time of around 45 to 55 minutes for 500 GB of data using the TSM B/A retrieve function over a Gigabit Ethernet network.
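
Such a script could look roughly like the sketch below, which strips the /fs prefix and calls chfs -m for each filesystem. The DRY_RUN switch and the function names are assumptions, and the filesystems are assumed to be unmounted while the mount points are changed.

```shell
#!/bin/sh
# Sketch: remove the /fs prefix from the mount points of restored filesystems.
# DRY_RUN, strip_fs_prefix and fix_mount_points are hypothetical names.

DRY_RUN=${DRY_RUN:-1}          # 1 = only print the chfs commands

# /fs/oracle/R3P/sapdata1 -> /oracle/R3P/sapdata1 ; other paths are returned unchanged
strip_fs_prefix() {
    case "$1" in
        /fs/*) printf '%s\n' "${1#/fs}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

# Change the mount point of each filesystem given as argument.
fix_mount_points() {
    for fs in "$@"; do
        new=$(strip_fs_prefix "$fs")
        [ "$new" = "$fs" ] && continue          # no /fs prefix, nothing to change
        if [ "$DRY_RUN" = "1" ]; then
            printf '%s\n' "chfs -m $new $fs"
        else
            chfs -m "$new" "$fs" || return 1
        fi
    done
}

fix_mount_points /fs/oracle/R3P/sapdata1 /fs/oracle/R3P/sapdata2
```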

Possible Issues and Resolutions

Many issues were faced in the early implementation of this strategy. Some of them were resolved as follows:

  1. Sometimes logical drives on the storage subsystem move away from their preferred controllers because of temporary hardware problems on the SAN or the storage subsystem. Although AIX handles this change of controller ownership through the RDAC driver without any impact on access from the operating system, such an event can cause problems when FlashCopy is driven through SMcli in unattended mode. To resolve this, I passed the IP addresses of both storage controllers to the SMcli commands so that the re-create and disable operations work in any such case.
  2. FlashCopy operations might fail because the repository drive fills up. This was avoided by disabling FlashCopy daily, once the backups are done.
  3. FlashCopy re-create tasks can cause unpredictable problems if they are run without first disabling the existing enabled FlashCopy drives. This was avoided with simple logic in the shell scripts used for the re-create and disable operations.
  4. Filesystem corruption might lead to several OS issues, including even a crash of the AIX server. This was avoided by running a full fsck before mounting the target filesystems on AIX.
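
Where the backup window is tight, the fsck runs from the last item can be started in parallel, as noted earlier. The following is a sketch only: the function name and the FSCK_CMD/MOUNT_CMD overrides are assumptions, and it treats any non-zero fsck exit status as a failure, which is stricter than the sequential script in the appendix.

```shell
#!/bin/sh
# Sketch: check the target filesystems in parallel and mount each one only
# if its check succeeded. check_and_mount_parallel, FSCK_CMD and MOUNT_CMD
# are hypothetical names introduced for this illustration.

FSCK_CMD=${FSCK_CMD:-"fsck -y"}
MOUNT_CMD=${MOUNT_CMD:-mount}

check_and_mount_parallel() {
    pids=""
    for fs in "$@"; do
        # one background job per filesystem: fsck first, mount only on success
        ( $FSCK_CMD "$fs" && $MOUNT_CMD "$fs" ) &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        # wait returns the exit status of that background job
        wait "$pid" || rc=1
    done
    return $rc
}

# Example call (not executed here):
#   check_and_mount_parallel /fs/oracle/R3P/sapdata1 /fs/oracle/R3P/sapdata2
```

The caller should start the TSM archive only when the function returns zero, since a non-zero status means at least one filesystem was not mounted.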


Appendix A - Scripts

------------------------------------------------------------------------------------------------------------------
-- Written : For R3PSAP AIX node
-- Date    : Mar 2005
-- Script  : begin_backup.sql
-- Purpose : Places all SAP tablespaces into backup mode to ensure
--           database consistency before the online backup is taken
--           using FlashCopy
------------------------------------------------------------------------------------------------------------------
connect / as sysdba
alter tablespace PSAPBTABD begin backup;
alter tablespace PSAPBTABI begin backup;
alter tablespace PSAPCLUD begin backup;
alter tablespace PSAPCLUI begin backup;
alter tablespace PSAPDDICD begin backup;
alter tablespace PSAPDDICI begin backup;
alter tablespace PSAPDOCUD begin backup;
alter tablespace PSAPDOCUI begin backup;
alter tablespace PSAPEL46CD begin backup;
alter tablespace PSAPEL46CI begin backup;
alter tablespace PSAPES46CD begin backup;
alter tablespace PSAPES46CI begin backup;
alter tablespace PSAPLOADD begin backup;
alter tablespace PSAPLOADI begin backup;
alter tablespace PSAPPOOLD begin backup;
alter tablespace PSAPPOOLI begin backup;
alter tablespace PSAPPROTD begin backup;
alter tablespace PSAPPROTI begin backup;
alter tablespace PSAPROLL begin backup;
alter tablespace PSAPSOURCED begin backup;
alter tablespace PSAPSOURCEI begin backup;
alter tablespace PSAPSTABD begin backup;
alter tablespace PSAPSTABI begin backup;
alter tablespace PSAPTEMP begin backup;
alter tablespace PSAPUSER1D begin backup;
alter tablespace PSAPUSER1I begin backup;
alter tablespace SYSTEM begin backup;
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;

--------------------------------------------------------------------------------------------------------------
-- Written : For R3PSAP AIX node
-- Date    : Mar 2005
-- Script  : end_backup.sql
-- Purpose : To bring all SAP tablespaces back to normal state
--------------------------------------------------------------------------------------------------------------
connect / as sysdba
alter tablespace PSAPBTABD end backup;
alter tablespace PSAPBTABI end backup;
alter tablespace PSAPCLUD end backup;
alter tablespace PSAPCLUI end backup;
alter tablespace PSAPDDICD end backup;
alter tablespace PSAPDDICI end backup;
alter tablespace PSAPDOCUD end backup;
alter tablespace PSAPDOCUI end backup;
alter tablespace PSAPEL46CD end backup;
alter tablespace PSAPEL46CI end backup;
alter tablespace PSAPES46CD end backup;
alter tablespace PSAPES46CI end backup;
alter tablespace PSAPLOADD end backup;
alter tablespace PSAPLOADI end backup;
alter tablespace PSAPPOOLD end backup;
alter tablespace PSAPPOOLI end backup;
alter tablespace PSAPPROTD end backup;
alter tablespace PSAPPROTI end backup;
alter tablespace PSAPROLL end backup;
alter tablespace PSAPSOURCED end backup;
alter tablespace PSAPSOURCEI end backup;
alter tablespace PSAPSTABD end backup;
alter tablespace PSAPSTABI end backup;
alter tablespace PSAPTEMP end backup;
alter tablespace PSAPUSER1D end backup;
alter tablespace PSAPUSER1I end backup;
alter tablespace SYSTEM end backup;
---------------------------------------------------------------------------------------------------------------------
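
Because the begin and end scripts must list exactly the same tablespaces, both files could also be generated from a single list. This is only a sketch of that idea: the function name backup_mode_sql and the shortened default list are assumptions, and the real list would contain all the tablespaces shown above.

```shell
#!/bin/sh
# Sketch: generate begin_backup.sql / end_backup.sql from one tablespace list
# so that the two files cannot drift apart. backup_mode_sql is a hypothetical
# name and the default list is shortened for illustration.

TABLESPACES=${TABLESPACES:-"PSAPBTABD PSAPBTABI SYSTEM"}

# backup_mode_sql <begin|end>: print the SQL*Plus statements for that mode
backup_mode_sql() {
    echo "connect / as sysdba"
    for ts in $TABLESPACES; do
        echo "alter tablespace $ts $1 backup;"
    done
    if [ "$1" = "begin" ]; then
        # archive the current redo log once all tablespaces are in backup mode
        echo "alter system switch logfile;"
    fi
}

backup_mode_sql begin
```

Redirecting the output of backup_mode_sql begin and backup_mode_sql end into the two SQL files would keep them in step whenever a tablespace is added.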

-----------------------------------------------------------------------------------------------------------------
#!/bin/ksh
#
# Script name : flashrecreate.sh
# Written for : R3PSAP AIX node
# Date        : Mar 2005
# Created by  : Khurram Shiraz
# Purpose     : UNIX shell script for re-creating the flash drives and making
#               them available for the TSM client to be backed up to the TSM server
#
TESTFILE="/scripts/lockfile"

# The lock file is created by flashdisable.sh. If it is missing, the
# FlashCopy drives are still enabled and must not be re-created.
if [ ! -f "$TESTFILE" ]
then
    echo "Please ensure that Flash Copy Pairs are already disabled"
    echo "It seems that they are not disabled"
    echo 'therefore exiting!!!!'
    exit 1
else
    echo "Putting Oracle into hot backup Mode"
    echo "please wait ............................"
    su - orar3p -c "sqlplus /nolog < /scripts/begin_backup.sql"

    # Flush the filesystem cache and give pending writes time to reach disk
    sync
    sleep 10

    # Execution of SMcli commands
    cd /usr/SMclient
    ./SMcli 192.168.10.208 192.168.10.209 -c 'recreateFlashCopy logicalDrive ["Disk1-1"];'
    ./SMcli 192.168.10.208 192.168.10.209 -c 'recreateFlashCopy logicalDrive ["Disk2-1"];'
    ./SMcli 192.168.10.208 192.168.10.209 -c 'recreateFlashCopy logicalDrive ["Disk4-1"];'
    ./SMcli 192.168.10.208 192.168.10.209 -c 'recreateFlashCopy logicalDrive ["Disk5-1"];'

    # Now working on the flashed data: discover the FlashCopy disks
    cfgmgr
    #z=`lsdev -Cc disk | grep Snapshot | awk ' { printf $1 }`

    # Preparation of LVM & VGs for mounting of filesystems
    echo sleeping
    sleep 10
    chdev -l hdisk9 -a pv=clear
    chdev -l hdisk11 -a pv=clear
    chdev -l hdisk12 -a pv=clear
    chdev -l hdisk13 -a pv=clear
    recreatevg -y copyvg1 -Y cpy_ hdisk9
    recreatevg -y copyvg2 -Y cpy_ hdisk11
    recreatevg -y copyvg3 -Y cpy_ hdisk12
    recreatevg -y copyvg4 -Y cpy_ hdisk13

    # Putting Oracle back to normal Mode
    su - orar3p -c "sqlplus /nolog < /scripts/end_backup.sql"

    # Check every target filesystem before mounting it
    echo "now running fsck & mounting fs"
    fsck -y /fs/oracle/R3P/sapdata1
    mount /fs/oracle/R3P/sapdata1
    fsck -y /fs/oracle/R3P/sapdata2
    mount /fs/oracle/R3P/sapdata2
    fsck -y /fs/oracle/R3P/sapdata3
    mount /fs/oracle/R3P/sapdata3
    fsck -y /fs/oracle/R3P/sapdata4
    mount /fs/oracle/R3P/sapdata4
    fsck -y /fs/oracle/R3P/sapdata5
    mount /fs/oracle/R3P/sapdata5

    # Remove the lock file: the FlashCopy drives are now enabled
    cd /scripts
    rm lockfile
    exit 0
fi

---------------------------------------------------------------------------------------------------------------------
#!/bin/ksh
#
# Script name : flashdisable.sh
# Written for : R3PSAP AIX node
# Date        : Mar 2005
# Purpose     : Shell script for disabling the flash target drives on the TSM
#               client node and removing all related OS information
#

# Unmount all filesystems which were created during the FlashCopy operation
unmount /fs/oracle/R3P/sapdata1
unmount /fs/oracle/R3P/sapdata2
unmount /fs/oracle/R3P/sapdata3
unmount /fs/oracle/R3P/sapdata4
unmount /fs/oracle/R3P/sapdata5

# Vary off all FlashCopy volume groups
varyoffvg copyvg1
varyoffvg copyvg2
varyoffvg copyvg3
varyoffvg copyvg4

# Export all FlashCopy volume groups
exportvg copyvg1
exportvg copyvg2
exportvg copyvg3
exportvg copyvg4

# Remove all snapshot logical drives (hdisk devices)
rmdev -dl hdisk9
rmdev -dl hdisk11
rmdev -dl hdisk12
rmdev -dl hdisk13

# Disable the FlashCopy logical drives on the storage subsystem
cd /usr/SMclient
./SMcli 192.168.10.208 192.168.10.209 -c 'disableFlashCopy logicalDrive["Disk1-1"];'
./SMcli 192.168.10.208 192.168.10.209 -c 'disableFlashCopy logicalDrive["Disk2-1"];'
./SMcli 192.168.10.208 192.168.10.209 -c 'disableFlashCopy logicalDrive["Disk4-1"];'
./SMcli 192.168.10.208 192.168.10.209 -c 'disableFlashCopy logicalDrive["Disk5-1"];'

# Create the lock file that allows flashrecreate.sh to run again
cd /scripts
touch lockfile
exit 0
