System-wide separated shell history files for each user and session
Here's how you can set up your /etc/profile and /etc/environment to create a separate shell history file for each user and each login session. This is very useful when you need to know exactly who ran a specific command at a certain point in time. Put the following in /etc/profile on all servers:
# HISTFILE
# execute only if interactive
if [ -t 0 -a "${SHELL}" != "/bin/bsh" ]
then
   d=`date "+%H%M.%m%d%y"`
   t=`tty | cut -c6-`
   u=$(ps -fp $(proctree $PPID | grep "\-ksh" | grep -v grep | \
       awk '{print $1}' | head -1) | tail -1 | awk '{print $1}')
   w=`who -ms | awk '{print $NF}' | sed "s/(//g" | sed "s/)//g"`
   y=`tty | cut -c6- | sed "s/\//-/g"`
   mkdir $HOME/.history.$LOGIN 2>/dev/null
   export HISTFILE=$HOME/.history.$LOGIN/.sh_history.$LOGIN.$u.$w.$y.$d
   find $HOME/.history.$LOGIN/.s* -type f -ctime +91 -exec rm {} \; \
       2>/dev/null
   H=`uname -n`
   mywhoami=`whoami`
   if [ ${mywhoami} = "root" ] ; then
      PS1='${USER}@(${H}) ${PWD##/*/} # '
   else
      PS1='${USER}@(${H}) ${PWD##/*/} $ '
   fi
fi
# Time out after 60 minutes.
# Use readonly if you don't want users to be able to change it:
# readonly TMOUT=3600
TMOUT=3600
export TMOUT
This way, *every* user on the system will have a separate shell history in the .history directory of their home directory. Each shell history file name shows you which account was used to log in, which account was switched to, on which tty this happened, and at what date and time. Shell history files are also timestamped internally (you can run "fc -t" to show the shell history with timestamps), and old shell history files are cleaned up after 3 months. Plus, user accounts will be logged out automatically after 60 minutes (3600 seconds) of inactivity. You can avoid running into a time-out by simply typing "read" or "\" followed by ENTER on the command line, or by adding "TMOUT=0" to a user's .profile, which essentially disables the time-out for that particular user.
One issue that you may now run into is that, because a separate history file is created for each login session, it becomes difficult to run "fc -t": the fc command will only list the commands from the current session, and not those written to a different history file. To overcome this, set the HISTFILE variable to the file you want to run "fc -t" for:
# export HISTFILE=.sh_history.root.user.10.190.41.116.pts-4.1706.120210
Then, to list all the commands for this history file, make sure you start a new shell and run the "fc -t" command:
# ksh
# fc -t -10
This will list the last 10 commands for that history file.
TOPICS: AIX, SYSTEM ADMINISTRATION
To check whether a system has power or cooling problems, run:

# machstat -f
0 0 0
If it returns all zeroes, everything is fine. Anything else is not good. The first digit (the so-called EPOW event) indicates the type of problem:

EPOW Event   Description
0            normal operation
1            non-critical cooling problem
2            non-critical power problem
3            severe power problem - halt system
4            severe problems - halt immediately
5            unhandled issue
7            unhandled issue
Another way to determine if the system may have a power or cooling issue, is by looking at a crontab entry in the root user's crontab:
# crontab -l root | grep -i powerfail
0 00,12 * * * wall%rc.powerfail:2::WARNING!!! The system is now operating with a power problem. This message will be walled every 12 hours. Remove this crontab entry after the problem is resolved.
If a powerfail message is present in the crontab of user root, this may indicate that there is an issue to be looked into. Contact your IBM representative to check the system out. Afterwards, make sure to remove the powerfail entry from the root user's crontab.
TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
To view the history of LVM commands that were run on a system, read the LVM configuration log:

# alog -o -t lvmcfg
Normally, you can suspend a process that is running in the foreground of your terminal by pressing Ctrl-Z, for example:

# sleep 400
But what if you wish to suspend a process that is not attached to a terminal, and is running in the background? This is where the kill command is useful. Using signal 17 (SIGSTOP on AIX), you can suspend a process, and using signal 19 (SIGCONT on AIX) you can resume it.
This is how it works: First look up the process ID you wish to suspend:
# sleep 400 &
[1] 8913102
# ps -ef | grep sleep
    root  8913102 10092788   0 07:10:30  pts/1  0:00 sleep 400
    root      ...      ...   0 07:10:34  pts/1  0:00 grep sleep
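Putting the two signals together, the sketch below suspends and resumes a background process with kill, using the portable signal names (STOP and CONT correspond to signal numbers 17 and 19 on AIX). The PID is captured via $! rather than copied from ps output, and the ps flags follow the Linux/procps form, so the exact ps invocation may differ slightly on AIX:

```shell
# Start a background process and capture its process ID
sleep 400 &
pid=$!

# Suspend it (SIGSTOP; signal 17 on AIX)
kill -s STOP "$pid"

# While stopped, ps reports the process state as "T" (stopped)
ps -o pid= -o stat= -p "$pid"

# Resume it (SIGCONT; signal 19 on AIX)
kill -s CONT "$pid"

# Clean up the example process
kill "$pid"
```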
The Korn shell provides a $RANDOM built-in that returns a random integer each time it is referenced:

# echo $RANDOM
The $RANDOM Korn shell built-in can also be used to generate numbers within a certain range, for example, if you want to run the sleep command for a random number of seconds.
For example, to sleep between 1 and 600 seconds (up to 10 minutes):

# sleep $(print $((RANDOM%600+1)))
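The range arithmetic can be sanity-checked in any POSIX shell: RANDOM % 600 yields 0 through 599, and adding 1 shifts that to 1 through 600. A quick check (in shells without a $RANDOM built-in, the expression evaluates with 0, which still lands in range):

```shell
# Sample the expression repeatedly and verify it stays within 1..600
i=0
while [ "$i" -lt 100 ] ; do
    n=$((RANDOM % 600 + 1))
    if [ "$n" -lt 1 ] || [ "$n" -gt 600 ] ; then
        echo "out of range: $n"
        exit 1
    fi
    i=$((i + 1))
done
echo "all 100 samples within 1..600"
```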
# echo vpm | kdb
...
VSD Thread State.
CPU  VP_STATE  SLEEP_STATE  PROD_TIME: SECS  NSECS  CEDE_LAT
  0    00      ...
  1    00      ...
  2    00      ...
  3    00      ...
  4    00      ...
  5    02      ...
  6    02      ...
  7    02      ...
To help your users, you cannot simply change the default security settings on your AIX systems; your auditor will make sure of that. Instead, there are some "tricks" you can use to ensure that a user account is, and stays, available to your end user. We've put all those tricks together in one simple script that can fix a user account, and we called it fixuser.ksh.
You can run this script as often as you like and for any user that you like. It will help you to ensure that a user account is not locked, that AIX won't bug the user to change their password, that the user doesn't have a failed login count (from typing too many passwords), and a bunch of other stuff that usually will keep your users from logging in and getting pesky "Access Denied" messages.
The script will not change any default security settings, and it can easily be adjusted to run for several user accounts, or be run from a crontab so user accounts stay enabled for your users. The script is a win-win for everyone: your auditor is happy, because security settings remain strict on your system; your users are happy, because they can just log in without any hassle; and the sys admin is happy, because login issues no longer have to be resolved manually.
The script:
#!/usr/bin/ksh

function fixit
{
if [ ! -z "${myid}" ] ; then
   # Unlock account
   printf "Unlocking account for ${user}..."
   chuser account_locked=false ${user}
   echo " Done."

   # Remove password history
   printf "Removing password history for ${user}..."
   d=`lssec -f /etc/security/user -s default -a histsize | cut -f2 -d=`
   chuser histsize=0 ${user}
   chuser histsize=${d} ${user}
   echo " Done."

   # Reset failed login count
   printf "Reset failed login count for ${user}..."
   chuser unsuccessful_login_count=0 ${user}
   echo " Done."

   # Reset expiration date
   printf "Reset expiration date for ${user}..."
   chuser expires=0 ${user}
   echo " Done."

   # Allow the user to login
   printf "Enable login for ${user}..."
   chuser login=true ${user}
   echo " Done."

   # Allow the user to login remotely
   printf "Enable remote login for ${user}..."
   chuser rlogin=true ${user}
   echo " Done."

   # Reset maxage
   printf "Reset the maxage for ${user}..."
   m=`lssec -f /etc/security/user -s default -a maxage | cut -f2 -d=`
   chuser maxage=${m} ${user}
   echo " Done."

   # Clear password change requirement
   printf "Clear password change requirement for ${user}..."
   pwdadm -c ${user}
   echo " Done."

   # Reset password last update
   printf "Reset the password last update for ${user}..."
   let sinceepoch=`perl -e 'printf(time)' | awk '{print $1}'`
   n=`lssec -f /etc/security/user -s default -a minage | cut -f2 -d=`
   let myminsecs="${n}*7*24*60*60"
   let myminsecs="${myminsecs}+1000"
   let newdate="${sinceepoch}-${myminsecs}"
   chsec -f /etc/security/passwd -s ${user} -a lastupdate=${newdate}
   echo " Done."
fi
}

# The account to fix is passed as the first argument
unset user
user=${1}
unset myid
myid=`id ${user} 2>/dev/null`
if [ ! -z "${myid}" ] ; then
   echo "Fixing account ${user}..."
   fixit ${user}
   echo "Done."
else
   echo "User ${user} does not exist."
fi
There are two ways to read the diagnostics log file, located in /var/adm/ras/diag. The first option uses the diag tool. Run:

# diag

Then hit ENTER and select "Task Selection", followed by "Display Previous Diagnostic Results" and "Display Previous Results". The second option is to use diagrpt. Run:

# /usr/lpp/diagnostics/bin/diagrpt -s 010101
TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION, VIRTUAL I/O SERVER, VIRTUALIZATION
# /usr/ios/cli/ioscli viosbr -backup -file vios_config_bkup -frequency daily -numfiles 10
# /usr/ios/cli/ioscli backupios -nomedialib -file /mksysb/$(hostname).mksysb -mksysb
The first command (viosbr) will create a backup of the configuration information to /home/padmin/cfgbackups. It will also schedule the command to run every day, and keep up to 10 files in /home/padmin/cfgbackups.
The second command is the mksysb equivalent for a Virtual I/O Server: backupios. This command will create the mksysb image in the /mksysb folder, and exclude any ISO repository in rootvg, plus anything else excluded in /etc/exclude.rootvg.
TOPICS: AIX, BACKUP & RESTORE, STORAGE, SYSTEM ADMINISTRATION
Add these commands to your mksysb script, just before running the mksysb command. What this does is run the mkvgdata command for each online volume group, which generates output for each volume group in /tmp/vgdata. The resulting output is then tar'd and stored in the /sysadm folder or file system. This allows information regarding your volume groups, logical volumes, and file systems to be included in your mksysb image.
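The commands themselves are not shown above; a minimal sketch, wrapped in a helper function so it can be dropped into a mksysb script, could look like this (lsvg, mkvgdata, and the /sysadm target are AIX-specific and follow the description above; the function is only defined here, not run):

```shell
# Sketch: save vgdata for all online volume groups before running mksysb.
# AIX-only commands; call this function just before the mksysb command.
save_vgdata()
{
    # mkvgdata writes /tmp/vgdata/<vg>/<vg>.data for each online volume group
    for vg in $(lsvg -o) ; do
        mkvgdata "$vg"
    done

    # Archive the generated vgdata so it is included in the mksysb image
    tar -cvf /sysadm/vgdata.tar /tmp/vgdata
}
```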
Run:
# tar -xvf /sysadm/vgdata.tar
Now edit the /tmp/vgdata/{volume group name}/{volume group name}.data file and look for the line with "VG_SOURCE_DISK_LIST=". Change the line to contain the hdisks, vpaths, or hdiskpowers as needed. Run:
# restvg -r -d /tmp/vgdata/{volume group name}/{volume group name}.data
Make sure to remove file systems with the rmfs command before running restvg, or it will not run correctly. Or, you can just run it once, run the exportvg command for the same volume group, and run the restvg command again. There is also a "-s" flag for restvg that lets you shrink the file system to its minimum size needed, but depending on when the vgdata was created, you could run out of space, when restoring the contents of the file system. Just something to keep in mind.
Erasing disks
During a system decommission process, it is advisable to format or at least erase all drives. There are two ways of accomplishing that.

If you have time: AIX allows disks to be erased via the Format media service aid in the AIX diagnostic package. To erase a hard disk, run the following command:
# diag -T format
This will start the Format media service aid in a menu-driven interface. If prompted, choose your terminal. You will then be presented with a resource selection list. Choose the hdisk devices you want to erase from this list and commit your changes according to the instructions on the screen. Once you have committed your selection, choose Erase Disk from the menu. You will then be asked to confirm your selection. Choose Yes. You will be asked if you want to Read data from drive or Write patterns to drive. Choose Write patterns to drive. You will then have the opportunity to modify the disk erasure options. After you specify the options you prefer, choose Commit Your Changes. The disk is now erased. Please note that it can take a long time for this process to complete.

If you want to do it quick-and-dirty:
For each disk, use the dd command to overwrite the data on the disk. For example:
for disk in $(lspv | awk '{print $1}') ; do
    dd if=/dev/zero of=/dev/r${disk} bs=1024 count=10
    echo $disk wiped
done
This does the trick, as it reads zeroes from /dev/zero and writes 10 blocks of 1024 zero bytes to each disk. That overwrites anything at the start of the disk, rendering the disk useless.
TOPICS: AIX, SYSTEM ADMINISTRATION
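The effect of that dd invocation can be demonstrated safely on a scratch file instead of a real disk (same bs and count as above; the temp file stands in for /dev/rhdiskN):

```shell
# Create a scratch file containing some recognizable data
tmpf=$(mktemp)
echo "important data that should be destroyed" > "$tmpf"

# Overwrite it the same way the wipe loop overwrites a disk
dd if=/dev/zero of="$tmpf" bs=1024 count=10 2>/dev/null

# Every remaining byte is now a zero byte
nonzero=$(tr -d '\0' < "$tmpf" | wc -c)
echo "non-zero bytes left: $nonzero"
```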
To determine what the child devices are, use the -p option of the lsdev command. From the man page of the lsdev command:
-p Parent  Specifies the device logical name from the Customized Devices object class for the parent of devices to be displayed. The -p Parent flag can be used to show the child devices of the given Parent. The Parent argument to the -p flag may contain the same wildcard characters that can be used with the odmget command. This flag cannot be used with the -P flag.
For example:
# lsdev -p fcs3
fcnet3 Defined 07-01-01 Fibre Channel Network Protocol Device
To remove the device, and all child devices, use the -R option. From the man page for the rmdev command:
-R Unconfigures the device and its children. When used with the -d or -S flags, the children are undefined or stopped, respectively.
The command to remove adapter fcs3 and all child devices, will be:
# rmdev -Rdl fcs3
mkpasswd
An interesting open source project is Expect. It's a tool that can be used to automate interactive applications. The RPM for Expect can be downloaded from http://www.perzl.org/aix/index.php?n=Main.Expect, and the home page for Expect is http://www.nist.gov/el/msid/expect.cfm. A very interesting tool that is part of the Expect RPM is "mkpasswd". It is a little Tcl script that uses Expect to work with the passwd program to generate a random password and set it immediately. A somewhat adjusted version of "mkpasswd" can be downloaded here. The adjusted version of mkpasswd will generate a random password for a user, with a length of 8 characters (the default maximum password length for AIX), if you run, for example:
# /usr/local/bin/mkpasswd username
sXRk1wd3
To see the interactive work performed by Expect for mkpasswd, use the -v option:
# /usr/local/bin/mkpasswd -v username
spawn /bin/passwd username
Changing password for "username"
username's New password:
Enter the new password again:
password for username is s8qh1qWZ
By using mkpasswd, you'll never have to come up with a random password yourself again, and it prevents Unix system admins from assigning easily guessable passwords to accounts, such as "changeme" or "abc1234". Now, what if you want to allow "other" (non-root) users to run this utility, while at the same time preventing them from resetting the password of user root? Let's say you want user pete to be able to reset other users' passwords. Add the following entries to the /etc/sudoers file by running visudo:
# visudo

And add this entry:

pete ALL=(ALL) NOPASSWD: /usr/local/bin/mkpasswd, !/usr/local/bin/mkpasswd root

This will allow pete to run the /usr/local/bin/mkpasswd utility, which he can use to reset passwords, but not for user root. To check what he can run, use the "sudo -l" command:
# su - pete
$ sudo -l
User pete may run the following commands on this host:
    (ALL) NOPASSWD: /usr/local/bin/mkpasswd, !/usr/local/bin/mkpasswd root
Then, an attempt, using pete's account, to reset another user's password (which is successful):
$ sudo /usr/local/bin/mkpasswd mark
oe09'ySMj
When you copy the /etc/passwd and /etc/group files, make sure they contain at least a minimum set of essential user and group definitions. Listed specifically as users are the following: root, daemon, bin, sys, adm, uucp, guest, nobody, lpd. Listed specifically as groups are the following: system, staff, bin, sys, adm, uucp, mail, security, cron, printq, audit, ecs, nobody, usr.
If the bos.compat.links fileset is installed, you can copy the /etc/security/mkuser.defaults file over. If it is not installed, the file is located as mkuser.default in the /usr/lib/security directory. If you copy over mkuser.defaults, changes must be made to the stanzas: replace group with pgrp, and program with shell. A proper stanza should look like the following:
user:
        pgrp = staff
        groups = staff
        shell = /usr/bin/ksh
        home = /home/$USER
The following files may also be copied over, as long as the AIX version on the new machine is the same:

/etc/security/login.cfg
/etc/security/user

NOTE: If you decide to copy these two files, open the /etc/security/user file and make sure that variables such as tty, registry, auth1 and so forth are set properly for the new machine. Otherwise, do not copy these two files, and just add all the user stanzas to the newly created files on the new machine. Once the files are moved over, execute the following:
# usrck -t ALL
# pwdck -t ALL
# grpck -t ALL
This will clear up any discrepancies (such as uucp not having an entry in /etc/security/passwd). Ideally, this should be run on the source system before copying over the files, as well as after porting these files to the new system.
NOTE: It is possible to find user ID conflicts when migrating users from older versions of AIX to newer versions. AIX has added new user IDs in different release cycles. These are reserved IDs and should not be deleted. If your old user IDs conflict with the newer AIX system user IDs, it is advised that you assign new user IDs to these older IDs.
From: http://www-01.ibm.com/support/docview.wss?uid=isg3T1000231
TOPICS: AIX, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
The max_xfer_size attribute controls the maximum transfer size of the fibre channel devices. It depends on the type of fibre channel adapter, but usually the possible sizes are: 0x100000, 0x200000, 0x400000, 0x800000 and 0x1000000. To view the current setting, type the following command:
# lsattr -El fcsX -a max_xfer_size
Replace the X with the fibre channel adapter number. You should get an output similar to the following:
max_xfer_size 0x100000 Maximum Transfer Size True
The value can be changed as follows, after which the server needs to be rebooted:
# chdev -l fcsX -a max_xfer_size=0x1000000 -P
VLAN to be set up: PVID 4. This number is basically randomly chosen; it could have been 23 or 67 or whatever, as long as it is not yet in use. Proper documentation of your VIO setup and the defined networks is therefore important.

Steps to set this up: Log in to the HMC GUI as hscroot. Change the default profile of server1, and add a new virtual Ethernet adapter: set the port virtual Ethernet to 4 (PVID 4), and select "This adapter is required for virtual server activation". That is: Configuration -> Manage Profiles -> Select "Default" -> Actions -> Edit -> Select "Virtual Adapters" tab -> Actions -> Create Virtual Adapter -> Ethernet adapter -> Set "Port Virtual Ethernet" to 4 -> Select "This adapter is required for virtual server activation." -> Click Ok -> Click Ok -> Click Close. Do the same for server2.
Now do the same for both VIO clients, but this time do "Dynamic Logical Partitioning". This way, we don't have to restart the nodes (as we previously have only updated the default profiles of both servers), and still get the virtual adapter.
Run cfgmgr on both nodes, and see that you now have an extra Ethernet adapter, in my case ent1. Run "lscfg -vl ent1", and note the adapter ID (in my case C5) on both nodes. This should match the adapter IDs as seen on the HMC. Now configure the IP address on this interface on both nodes. Add the entries for server1priv and server2priv in /etc/hosts on both nodes. Run a ping: ping server2priv (from server1) and vice versa. Done! Steps to throw it away:
Remove the virtual adapter with ID 5 from the default profile in the HMC GUI for server1 and server2. DLPAR the adapter with ID 5 out of server1 and server2. Run cfgmgr on both nodes to confirm the adapter does not re-appear. Check with:
# lsdev -Cc adapter
There are a number of possible causes:
- clinfoES or snmpd subsystems are not active.
- snmp is unresponsive.
- snmp is not configured correctly.
- Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.
Additional information for verifying the SNMP configuration on AIX 6:
To resolve this, first of all, go ahead and read the README that is referred to. You'll find that you have to enable an entry in /etc/snmpdv3.conf:
Commands clstat or cldump will not start if the internet MIB tree is not enabled in snmpdv3.conf file. This behavior is usually seen in AIX 6.1 onwards where this internet MIB entry was intentionally disabled as a security issue. This internet MIB entry is required to view/resolve risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub tree which is used by clstat or cldump functionality.
There are two ways to enable this MIB sub tree (risc6000clsmuxpd):

1) Enable the main internet MIB tree by adding the corresponding entry in the /etc/snmpdv3.conf file.

2) Enable only the MIB sub tree for risc6000clsmuxpd, without enabling the main MIB tree, by adding the corresponding entry in the /etc/snmpdv3.conf file.

Note: After enabling the MIB entry, the snmp daemon must be restarted with the commands shown further below. After snmp is restarted, leave the daemon running for about two minutes before attempting to start clstat or cldump.
Sometimes, even after doing this, clstat or cldump still don't work. The next thing may sound silly, but edit the /etc/snmpdv3.conf file, and take out the comments. Change this:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password # gated
To:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
Then, recycle the daemons on all cluster nodes. This can be done while the cluster is up and running:
# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s aixmibd
# stopsrc -s snmpd
# sleep 4
# chssys -s hostmibd -a "-c public"
# chssys -s aixmibd -a "-c public"
# chssys -s snmpmibd -a "-c public"
# sleep 4
# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s snmpmibd
# startsrc -s hostmibd
# sleep 120
# stopsrc -s clinfoES
# startsrc -s clinfoES
# sleep 120
Now, to verify that it works, run either clstat or cldump, or the following command:
# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
Still not working at this point? Then run an Extended Verification and Synchronization:
# smitty cm_ver_and_sync.select
After that, clstat, cldump and snmpinfo should work.
TOPICS: AIX, SYSTEM ADMINISTRATION
Note: The interval is in seconds, 1800 for 30 minutes. This output does not give the actual file names to which the handles are open. It provides only the name of the file system (directory) in which they are contained. The lsof command indicates if the open file is associated with an open socket or a file. When it references a file, it identifies the file system and the inode, not the file name. Run the following command to determine the file name:
# df -kP filesystem_from_lsof | awk '{print $6}' | tail -1
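Once the mount point is known, the inode number reported by lsof can be mapped back to a file name with find's -inum option. A small sketch, using a scratch directory in place of the real mount point:

```shell
# Create a scratch directory with a file in it
tmpdir=$(mktemp -d)
echo "some data" > "$tmpdir/example.txt"

# Determine the file's inode number (lsof would report this for an open file)
inode=$(ls -i "$tmpdir/example.txt" | awk '{print $1}')

# Map the inode back to the file name within that file system
find "$tmpdir" -xdev -inum "$inode"
```

The -xdev flag keeps find from descending into other file systems, since inode numbers are only unique within a single file system.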
To increase the number, change or add the nofiles=XXXXX parameter in the /etc/security/limits file, run:
# chuser nofiles=XXXXX user_id
This lists open files in the format: filesystem_device:inode. Use the same procedure as above for finding the actual file name.
TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
The difference between the two connections is that dsh uses the FQDN, and the FQDN needs to be added to the known_hosts file for SSH. Therefore, you must first make an ssh connection to the host using the FQDN:
# ssh server.domain.com date
The authenticity of host server.domain.com can't be established.
RSA key fingerprint is 1b:b1:89:c0:63:d5:f1:f1:41:fa:38:14:d8:60:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added server.domain.com (RSA) to the list of known hosts.
Tue Sep 6 11:56:34 EDT 2011
Now try to use dsh again, and you'll see it will work:
# dsh -n server date
server.domain.com: Tue Sep 6 11:56:38 EDT 2011
In this example, we're using the mksysb image of a Virtual I/O Server, created using backupios. This is basically the same as a mksysb image from a regular AIX system. The image file for this mksysb backup is called vio1.mksysb.
First, try to locate the file you're looking for. For example, if you're looking for the file nimbck.ksh:
# restore -T -q -l -f vio1.mksysb | grep nimbck.ksh
New volume on vio1.mksysb:
Cluster size is 51200 bytes (100 blocks).
The volume number is 1.
The backup date is: Thu Jun  9 23:00:28 MST 2011
Files are backed up by name.
The user is padmin.
-rwxr-xr-x  10 staff  1801 May 23 08:37 ./home/padmin/nimbck.ksh
Here you can see the original file was located in /home/padmin. Now recover that one single file:
# restore -x -q -f vio1.mksysb ./home/padmin/nimbck.ksh x ./home/padmin/nimbck.ksh
Note that it is important to add the dot before the file name that needs to be recovered; otherwise it won't work. Your file is now restored to ./home/padmin/nimbck.ksh, which is a folder relative to the current directory you're in right now:
# cd ./home/padmin
# ls -als nimbck.ksh
4 -rwxr-xr-x 1 10 staff 1801 May 23 08:37 nimbck.ksh
This will create a copy of logical volume "lvname" to a file "lvname.dd" in file system /file/system:

# dd if=/dev/lvname of=/file/system/lvname.dd

Make sure that wherever you write your output file to (in the example above, /file/system) has enough disk space available to hold a full copy of the logical volume. If the logical volume is 100 GB, you'll need 100 GB of file system space for the copy.
If you want to test how this works, you can create a logical volume with a file system on top of it, and create some files in that file system. Then unmount the file system, and use dd to copy the logical volume as described above. Then, throw away the file system using "rmfs -r", and after that has completed, recreate the logical volume and the file system. If you now mount the file system, you will see that it is empty. Unmount the file system, and use the following dd command to restore your backup copy:
# dd if=/file/system/lvname.dd of=/dev/lvname
Then, mount the file system again, and you will see that the contents of the file system (the files you've placed in it) are back. TOPICS: AIX, HARDWARE, SYSTEM ADMINISTRATION
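The same dd round trip can be rehearsed safely with a plain file standing in for the logical volume device (all paths below are illustrative):

```shell
# Create a 1 MB "logical volume" stand-in filled with random data.
dd if=/dev/urandom of=/tmp/lvname.orig bs=4096 count=256 2>/dev/null

# Back it up, the way dd copies /dev/lvname to /file/system/lvname.dd ...
dd if=/tmp/lvname.orig of=/tmp/lvname.dd bs=4096 2>/dev/null

# ... simulate destroying the original (like "rmfs -r" plus recreate) ...
dd if=/dev/zero of=/tmp/lvname.orig bs=4096 count=256 conv=notrunc 2>/dev/null

# ... and restore it from the backup copy.
dd if=/tmp/lvname.dd of=/tmp/lvname.orig bs=4096 conv=notrunc 2>/dev/null

# Verify the restored content is identical to the backup:
cmp /tmp/lvname.dd /tmp/lvname.orig && echo "contents match"
```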
Keep in mind that activating the LED of a particular device does not activate the LED of the system panel. You can achieve that if you omit the device parameter. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
configuration manager on a specific adapter, like: cfgmgr -vl fcs0. This complicated procedure is not needed any more since AIX 7.1 and AIX 6.1 TL6, because a new command has been made available, called rendev, which is very easy to use for renaming devices:
# lspv
hdisk0   00c8b12ce3c7d496   rootvg   active
hdisk1   00c8b12cf28e737b   None

# rendev -l hdisk1 -n hdisk99

# lspv
hdisk0    00c8b12ce3c7d496   rootvg   active
hdisk99   00c8b12cf28e737b   None
Lsmksysb
There's a simple command to list information about a mksysb image, called lsmksysb:
# lsmksysb -lf mksysb.image
VOLUME GROUP:       rootvg
BACKUP DATE/TIME:   Mon Jun 6 04:00:06 MST 2011
UNAME INFO:         AIX testaix1 1 6 0008CB1A4C00
BACKUP OSLEVEL:     6.1.6.0
MAINTENANCE LEVEL:  6100-06
BACKUP SIZE (MB):   49920
SHRINK SIZE (MB):   17377
VG DATA ONLY:       no
rootvg:
LV NAME     TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5         boot       1    2    2  closed/syncd  N/A
hd6         paging    32   64    2  open/syncd    N/A
hd8         jfs2log    1    2    2  open/syncd    N/A
hd4         jfs2       8   16    2  open/syncd    /
hd2         jfs2      40   80    2  open/syncd    /usr
hd9var      jfs2      40   80    2  open/syncd    /var
hd3         jfs2      40   80    2  open/syncd    /tmp
hd1         jfs2       8   16    2  open/syncd    /home
hd10opt     jfs2       8   16    2  open/syncd    /opt
dumplv1     sysdump   16   16    1  open/syncd    N/A
dumplv2     sysdump   16   16    1  open/syncd    N/A
hd11admin   jfs2       1    2    2  open/syncd    /admin
The scalable VG implementation in AIX 5L Version 5.3 provides configuration flexibility with respect to the number of PVs and LVs that can be accommodated by a given instance of the new VG type. The configuration options allow any scalable VG to contain 32, 64, 128, 256, 512, 768, or 1024 disks and 256, 512, 1024, 2048, or 4096 LVs. You do not need to configure the maximum values of 1024 PVs and 4096 LVs at the time of VG creation to account for potential future growth. You can always increase the initial settings at a later date as required. The System Management Interface Tool (SMIT) and the Web-based System Manager graphical user interface fully support the scalable VG. Existing SMIT panels, which are related to VG management tasks, have been changed and many new panels added to account for the scalable VG type. For example, you can use the new SMIT fast path _mksvg to directly access the Add a Scalable VG SMIT menu.
The user commands mkvg, chvg, and lsvg have been enhanced in support of the scalable VG type. For more information: http://www.ibm.com/developerworks/aix/library/au-aix5l-lvm.html. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
To resolve this: clear the boot logical volumes from the disks:
# chpv -c hdisk2
# chpv -c hdisk3
Verify that the disks can no longer be used to boot from by running:
# ipl_varyon -i
This will set the correct boot logical volume, but the error will show up if you ever run the bootlist command again without the blv attribute. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
Of course, make sure the new disk is included in the boot list for the rootvg:
# bootlist -m normal hdisk0 hdisk1
Now rootvg is mirrored, but not yet synced. Run "lsvg -l rootvg", and you'll see this. So run the syncvg command yourself. With the -P option you can specify the number of threads that should be started to perform the sync process. Usually, you can specify at least 2 to 3 times the number of cores in the system. Using the -P option has an extra feature: there will be no lock on the volume group, allowing you to run "lsvg rootvg" within another window, to check the status of the sync process.
# syncvg -P 4 -v rootvg
# lsfs /opt
Name           Nodename  Mount Pt  VFS   Size     Options
/dev/hd10opt   --        /opt      jfs2  4194304  --
So file system /opt is located on logical volume hd10opt. Then run the getlvcb command:
# getlvcb -AT hd10opt
AIX LVCB
intrapolicy = c
copies = 2
interpolicy = m
lvid = 00f69a1100004c000000012f9dca819a.9
lvname = hd10opt
label = /opt
machine id = 69A114C00
number lps = 8
relocatable = y
strict = y
stripe width = 0
stripe size in exponent = 0
type = jfs2
upperbound = 32
fs = vfs=jfs2:log=/dev/hd8:vol=/opt:free=false:quota=no
time created = Thu Apr 28 20:26:36 2011
You can clearly see the "time created" for this file system in the example above. TOPICS: AIX, ORACLE, SDD, STORAGE, SYSTEM ADMINISTRATION
root@node2 /root # lspv | grep vpath | grep -i none
vpath4    00f69a11a2f620c5    None
vpath5    00f69a11a2f622c8    None
vpath6    00f69a11a2f624a7    None
vpath9    00f69a11a2f62f1f    None
vpath10   00f69a11a2f63212    None
As you can see, vpath6 on node 1 is the same disk as vpath4 on node 2. You can determine this by looking at the PVID. Check the major and minor numbers of each device:
root@node1 # cd /dev
root@node1 # lspv|grep vpath|grep None|awk '{print $1}'|xargs ls -als
0 brw------- 1 root system 47,  6 Apr 28 18:56 vpath6
0 brw------- 1 root system 47,  7 Apr 28 18:56 vpath7
0 brw------- 1 root system 47,  8 Apr 28 18:56 vpath8
root@node2 # cd /dev
root@node2 # lspv|grep vpath|grep None|awk '{print $1}'|xargs ls -als
0 brw------- 1 root system 47,  4 Apr 29 13:33 vpath4
Now, on each node set up a consistent naming convention for the OCR and VOTE devices. For example, if you wish to set up 2 OCR and 3 VOTE devices: On server node1:
# mknod /dev/ocr_disk01 c 47 6
# mknod /dev/ocr_disk02 c 47 7
# mknod /dev/voting_disk01 c 47 8
# mknod /dev/voting_disk02 c 47 13
# mknod /dev/voting_disk03 c 47 14
On server node2:
# mknod /dev/ocr_disk01 c 47 4
# mknod /dev/ocr_disk02 c 47 5
# mknod /dev/voting_disk01 c 47 6
# mknod /dev/voting_disk02 c 47 9
# mknod /dev/voting_disk03 c 47 10
This will result in a consistent naming convention for the OCR and VOTE devices on both nodes:
root@node1 # ls -als /dev/*_disk*
0 crw-r--r-- 1 root system 47,  6 May 13 07:18 /dev/ocr_disk01
0 crw-r--r-- 1 root system 47,  7 May 13 07:19 /dev/ocr_disk02
0 crw-r--r-- 1 root system 47,  8 May 13 07:19 /dev/voting_disk01
0 crw-r--r-- 1 root system 47, 13 May 13 07:19 /dev/voting_disk02
0 crw-r--r-- 1 root system 47, 14 May 13 07:19 /dev/voting_disk03
root@node2 # ls -als /dev/*_disk*
0 crw-r--r-- 1 root system 47,  4 May 13 07:20 /dev/ocr_disk01
0 crw-r--r-- 1 root system 47,  5 May 13 07:20 /dev/ocr_disk02
0 crw-r--r-- 1 root system 47,  6 May 13 07:21 /dev/voting_disk01
0 crw-r--r-- 1 root system 47,  9 May 13 07:21 /dev/voting_disk02
0 crw-r--r-- 1 root system 47, 10 May 13 07:21 /dev/voting_disk03
This may be caused by the volume group being varied on on the other node. If it should not be varied on there, run:
# varyoffvg vg
And then retry the LVM command. If it continues to be a problem, then stop HACMP on both nodes, export the volume group and re-import the volume group on both nodes, and then restart the cluster.
If the logical volume was created with, or has been modified to use, customized owner/group/mode values, the dev_uid/dev_gid/dev_perm values in the VGDA will show the current settings, for example:
# chlv -U user -G staff -P 777 testlv
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 14:39 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 14:39 /dev/testlv
# readvgda vpath51 | egrep "lvname|dev_|Logical"
lvname:    testlv (i=2)
dev_uid:   3878
dev_gid:   1
dev_perm:  511
When the volume group is exported, and re-imported, this information is lost:
# errpt
# exportvg testvg
# importvg -y testvg vpath51
testvg
# ls -als /dev/*testlv
0 crw-rw---- 1 root system 57, 3 Mar 10 15:11 /dev/rtestlv
To prevent this from happening, make sure to use the -R option, which restores any customized settings:
# chlv -U user -G staff -P 777 testlv
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 15:11 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 15:11 /dev/testlv
# readvgda vpath51 | egrep "lvname|dev_|Logical"
lvname:    testlv (i=2)
dev_uid:   3878
dev_gid:   1
dev_perm:  511
# varyoffvg testvg
# exportvg testvg
# importvg -Ry testvg vpath51
testvg
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/testlv
Never use the chown/chmod/chgrp commands to change the same settings on the logical volume. It will appear to work; however, the updates will not be written to the VGDA, and as soon as the volume group is exported and re-imported on the system, the updates will be gone:
# chlv -U root -G system -P 660 testlv
# ls -als /dev/*testlv
0 crw-rw---- 1 root system 57, 3 Mar 10 15:14 /dev/rtestlv
0 brw-rw---- 1 root system 57, 3 Mar 10 15:14 /dev/testlv
# chown user.staff /dev/testlv /dev/rtestlv
# chmod 777 /dev/testlv /dev/rtestlv
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/testlv
# readvgda vpath51 | egrep "lvname|dev_|Logical"
lvname:    testlv (i=2)
dev_uid:   0
dev_gid:   0
dev_perm:  360
Notice above how the chlv command changed the owner to root, the group to system, and the permissions to 660. Even after the chown and chmod commands are run, and the changes are visible on the device files in /dev, the changes are not seen in the VGDA. This is confirmed when the volume group is exported and imported, even when using the -R option:
# varyoffvg testvg
# exportvg testvg
# importvg -Ry testvg vpath51
testvg
# ls -als /dev/*testlv
0 crw-rw---- 1 root system 57, 3 Mar 10 15:23 /dev/rtestlv
0 brw-rw---- 1 root system 57, 3 Mar 10 15:23 /dev/testlv
So, when you have customized user/group/mode settings for logical volumes, and you need to export and import the volume group, always make sure to use the -R option when running importvg. Also, make sure never to use the chmod/chown/chgrp commands on logical volume block and character devices in /dev, but use the chlv command instead, to make sure the VGDA is updated accordingly.

Note: A regular volume group does not store any customized owner/group/mode settings in the VGDA; this is only done for Big and Scalable volume groups. In case you're using a regular volume group with customized owner/group/mode settings for logical volumes, you will have to use the chmod/chown/chgrp commands to update them, especially after exporting and re-importing the volume group. TOPICS: AIX, SYSTEM ADMINISTRATION
## Colors:
BLACK="\033[0;30m"
GRAY="\033[1;30m"
RED="\033[0;31m"
LRED="\033[1;31m"
GREEN="\033[0;32m"
LGREEN="\033[1;32m"
YELLOW="\033[0;33m"
LYELLOW="\033[1;33m"
BLUE="\033[0;34m"
LBLUE="\033[1;34m"
PURPLE="\033[0;35m"
PINK="\033[1;35m"
CYAN="\033[0;36m"
LCYAN="\033[1;36m"
LGRAY="\033[0;37m"
WHITE="\033[1;37m"
NORM="\033[0m"
UNDERLINE="\033[4m"
Just copy everything above and paste it into your shell or in a script. Then, you can use the defined variables:
## Example - Red underlined
echo "${RED}${UNDERLINE}This is a test!${NORM}"
## Create a rotating thingy
CUR_LEFT="\033[1D"   # cursor-left escape; assumed here, define it with the colors above
while true ; do
   printf "${CUR_LEFT}/"
   perl -e "use Time::HiRes qw(usleep); usleep(100000)"
   printf "${CUR_LEFT}-"
   perl -e "use Time::HiRes qw(usleep); usleep(100000)"
   printf "${CUR_LEFT}\\"
   perl -e "use Time::HiRes qw(usleep); usleep(100000)"
   printf "${CUR_LEFT}|"
   perl -e "use Time::HiRes qw(usleep); usleep(100000)"
done
Note that the perl command used above will cause a sleep of 0.1 seconds. Perl is used here, because the sleep command can't be used to sleep less than 1 second. TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
Compare_report
The compare_report command is a very useful utility to compare the software installed on two systems, for example for making sure the same software is installed on two nodes of a PowerHA cluster. First, create the necessary reports:
# ssh node2 "lslpp -Lc" > /tmp/node2
# lslpp -Lc > /tmp/node1
Next, generate the report. There are four interesting options: -l, -h, -m and -n:

-l  Generates a report of base system installed software that is at a lower level.
-h  Generates a report of base system installed software that is at a higher level.
-m  Generates a report of filesets not installed on the other system.
-n  Generates a report of filesets not installed on the base system.

For example:
# compare_report -b /tmp/node1 -o /tmp/node2 -l
#(baselower.rpt)
#Base System Installed Software that is at a lower level
#Fileset_Name:Base_Level:Other_Level
bos.msg.en_US.net.ipsec:6.1.3.0:6.1.4.0
bos.msg.en_US.net.tcp.client:6.1.1.1:6.1.4.0
bos.msg.en_US.rte:6.1.3.0:6.1.4.0
bos.msg.en_US.txt.tfs:6.1.1.0:6.1.4.0
xlsmp.msg.en_US.rte:1.8.0.1:1.8.0.3
# compare_report -b /tmp/node1 -o /tmp/node2 -h
#(basehigher.rpt)
#Base System Installed Software that is at a higher level
#Fileset_Name:Base_Level:Other_Level
idsldap.clt64bit62.rte:6.2.0.5:6.2.0.4
idsldap.clt_max_crypto64bit62.rte:6.2.0.5:6.2.0.4
idsldap.cltbase62.adt:6.2.0.5:6.2.0.4
idsldap.cltbase62.rte:6.2.0.5:6.2.0.4
idsldap.cltjava62.rte:6.2.0.5:6.2.0.4
idsldap.msg62.en_US:6.2.0.5:6.2.0.4
idsldap.srv64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srv_max_cryptobase64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srvbase64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srvproxy64bit62.rte:6.2.0.5:6.2.0.4
idsldap.webadmin62.rte:6.2.0.5:6.2.0.4
idsldap.webadmin_max_crypto62.rte:6.2.0.5:6.2.0.4
AIX-rpm:6.1.3.0-6:6.1.3.0-4
# compare_report -b /tmp/node1 -o /tmp/node2 -m
#(baseonly.rpt)
#Filesets not installed on the Other System
#Fileset_Name:Base_Level
Java6.sdk:6.0.0.75
Java6.source:6.0.0.75
Java6_64.samples.demo:6.0.0.75
Java6_64.samples.jnlp:6.0.0.75
Java6_64.source:6.0.0.75
WSBAA70:7.0.0.0
WSIHS70:7.0.0.0
# compare_report -b /tmp/node1 -o /tmp/node2 -n
#(otheronly.rpt)
#Filesets not installed on the Base System
#Fileset_Name:Other_Level
xlC.sup.aix50.rte:9.0.0.1
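If compare_report is not at hand (for instance on a non-AIX workstation holding the two dumps), a similar difference report can be approximated with standard join and awk. The file contents below are made-up samples, reduced to fileset:level pairs:

```shell
# Two simplified, sorted dumps in "fileset:level" form (in real life you
# would first post-process the `lslpp -Lc` output into this shape).
cat > /tmp/node1.lvl <<'EOF'
bos.msg.en_US.rte:6.1.3.0
bos.rte.libc:6.1.4.0
xlsmp.msg.en_US.rte:1.8.0.1
EOF
cat > /tmp/node2.lvl <<'EOF'
bos.msg.en_US.rte:6.1.4.0
bos.rte.libc:6.1.4.0
xlsmp.msg.en_US.rte:1.8.0.3
EOF

# Join on fileset name and print the ones whose levels differ, mimicking
# compare_report's "Fileset_Name:Base_Level:Other_Level" report lines.
join -t: /tmp/node1.lvl /tmp/node2.lvl | awk -F: '$2 != $3 {print $1":"$2":"$3}'
```

Note that this only flags differing levels; compare_report's -l and -h options additionally decide which side is lower or higher using version-aware comparison.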
FIRMWARE_EVENT
If FIRMWARE_EVENT entries appear in the AIX error log without FRU or location code callout, these events are likely attributed to an AIX memory page deconfiguration event, which is the result of a single memory cell being marked as unusable by the system firmware. The actual error is and will continue to be handled by ECC; however, notification of the unusable bit is also passed up to AIX. AIX in turn migrates the data and deallocates the memory page associated with this event from its memory map. This process is an AIX RAS feature which became available in AIX 5.3 and provides extra memory resilience and is no cause for alarm. Since the failure represents a single bit, a hardware action is NOT warranted.
To suppress logging, the following command will have to be entered and the partition will have to be rebooted to make the change effective:
# chdev -l sys0 -a log_pg_dealloc=false
More information about this function can be found in the "Highly Available POWER Servers for Business-Critical Applications" document, which is available at the following link: ftp://ftp.software.ibm.com/common/ssi/rep_wh/n/POW03003USEN/POW03003USEN.PDF (see pages 17-22 specifically). TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
Using iptrace
The iptrace command can be very useful to find out what network traffic flows to and from an AIX system. You can use any combination of these options, but you do not need to use them all:

-a                  Do NOT print out ARP packets.
-s [source IP]      Limit trace to source/client IP address, if known.
-d [destination IP] Limit trace to destination IP, if known.
-b                  Capture bidirectional network traffic (send and receive packets).
-p [port]           Specify the port to be traced.
-i [interface]      Only trace for network traffic on a specific interface.

Example: Run iptrace on AIX interface en1 to capture port 80 traffic to file trace.out from a single client IP to a server IP:
# iptrace -a -i en1 -s clientip -b -d serverip -p 80 trace.out
This trace captures both directions of the port 80 traffic on interface en1 between the clientip and the serverip, and writes it to the raw trace file trace.out. To stop the trace:
# ps -ef | grep iptrace
# kill <pid>
The ipreport command can be used to transform the trace file generated by iptrace to human readable format:
# ipreport trace.out > trace.report
AIX-rpm is a "virtual" package which reflects what has been installed on the system by installp. It is created by the /usr/sbin/updtvpkg script when the rpm.rte is installed, and can be run anytime the administrator chooses (usually after installing something with installp that is required to satisfy some dependency by an RPM package). Since AIX-rpm has to have some sort of version number, it simply reflects the level of bos.rte on the system where /usr/sbin/updtvpkg is being run. It's just informational - nothing should be checking the level of AIX-rpm. AIX doesn't just automatically run /usr/sbin/updtvpkg every time that something gets installed or deinstalled because on some slower systems with lots of software installed, /usr/sbin/updtvpkg can take a LONG time. If you want to run the command manually:
# /usr/sbin/updtvpkg
If you get an error similar to "cannot read header at 20760 for lookup" when running updtvpkg, rebuild the RPM database:
# rpm --rebuilddb
Once you run updtvpkg, you can run rpm -qa to see your new AIX-rpm package. TOPICS: AIX, SYSTEM ADMINISTRATION
Now stop and start the SSH daemon again, and retry if ssh works.
# stopsrc -s sshd
# startsrc -s sshd
If this still doesn't allow users to use ssh and the same message is produced, or if devices /dev/random and/or /dev/urandom are missing:
# stopsrc -s sshd
# rm -rf /dev/random
# rm -rf /dev/urandom
# mknod /dev/random c 39 0
# mknod /dev/urandom c 39 1
# randomctl -l
# ls -ald /dev/random /dev/urandom
# startsrc -s sshd
Using lvmstat
One of the best tools to look at LVM usage is lvmstat. It can report the bytes read and written to logical volumes. Using that information, you can determine which logical volumes are used the most. Gathering LVM statistics is not enabled by default:
# lvmstat -v data2vg 0516-1309 lvmstat: Statistics collection is not enabled for this logical device. Use -e option to enable.
As you can see by the output here, it is not enabled, so you need to actually enable it for each volume group prior to running the tool using:
# lvmstat -v data2vg -e
The following command takes a snapshot of LVM information every second for 10 intervals:
# lvmstat -v data2vg 1 10
This view shows the most utilized logical volumes on your system since you started the data collection. This is very helpful when drilling down to the logical volume layer when tuning your systems.
# lvmstat -v data2vg
What are you looking at here?

iocnt:    Reports back the number of read and write requests.
Kb_read:  Reports back the total data (kilobytes) from your measured interval that is read.
Kb_wrtn:  Reports back the amount of data (kilobytes) from your measured interval that is written.
Kbps:     Reports back the amount of data transferred in kilobytes per second.

You can use the -d option for lvmstat to disable the collection of LVM statistics.
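Since lvmstat output is plain columns, a quick awk pass can rank the logical volumes by activity. The sample below mimics lvmstat's column layout (the LV names and numbers are invented for illustration):

```shell
# Sample in lvmstat's column layout: Logical Volume, iocnt, Kb_read, Kb_wrtn, Kbps.
cat > /tmp/lvmstat.sample <<'EOF'
Logical Volume  iocnt  Kb_read  Kb_wrtn  Kbps
datalv01        8192   65536    131072   420.5
datalv02         512    4096      2048    12.3
loglv00         2048       0     16384    98.7
EOF

# Skip the header, sort by iocnt (column 2) descending, show the busiest LV:
awk 'NR > 1' /tmp/lvmstat.sample | sort -rnk2 | head -1 | awk '{print $1}'
```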
A good idea is to spread the logical volumes on a disk over multiple disks. That way, the logical volume manager will spread the disk I/O over all the disks that are part of the logical volume, utilizing the queue_depth of all disks, greatly improving performance where disk I/O is concerned. Let's say you have a logical volume called prodlv of 128 LPs, which is sitting on one disk, vpath408. To see the allocation of the LPs of logical volume prodlv, run:
# lslv -m prodlv
Let's also assume that you have a large number of disks in the volume group in which prodlv is configured. Disk I/O usually works best if you have a large number of disks in a volume group. For example, if you need to have 500 GB in a volume group, it is usually a far better idea to assign 10 disks of 50 GB to the volume group, instead of only one disk of 500 GB. That gives you the possibility of spreading the I/O over 10 disks instead of only one. To spread the disk I/O of prodlv over 8 disks instead of just one disk, you can create an extra logical volume copy on these 8 disks, and then later on, when the logical volume is synchronized, remove the original logical volume copy (the one on the single disk vpath408). So, divide 128 LPs by 8, which gives you 16 LPs. You can assign 16 LPs for logical volume prodlv on each of the 8 disks, giving it a total of 128 LPs. First, check if the upper bound of the logical volume is set to at least 9. Check this by running:
# lslv prodlv
The upper bound limit determines on how many disks a logical volume can be created. You'll need the 1 disk, vpath408, on which the logical volume is already located, plus the 8 other disks that you're creating a new copy on. Never ever create a copy on the same disk: if that single disk fails, both copies of your logical volume will fail as well. It is usually a good idea to set the upper bound of the logical volume a lot higher, for example to 32:
# chlv -u 32 prodlv
The next thing you need to determine is that you actually have 8 disks with at least 16 free LPs in the volume group. You can do this by running:
# lsvg -p prodvg | sort -nk4 | grep -v vpath408 | tail -8
vpath188 active 959  40 00..00..00..00..40
vpath163 active 959  42 00..00..00..00..42
vpath208 active 959  96 00..00..96..00..00
vpath205 active 959 192 102..00..00..90..00
vpath194 active 959 240 00..00..00..48..192
vpath24  active 959 243 00..00..00..51..192
vpath304 active 959 340 00..89..152..99..00
vpath161 active 959 413 14..00..82..125..192
Note how in the command above the original disk, vpath408, was excluded from the list. Any of the disks listed, using the command above, should have at least 1/8th of the size of the logical volume free, before you can make a logical volume copy on it for prodlv. Now create the logical volume copy. The magical option you need to use is "-e x" for the logical volume commands. That will spread the logical volume over all available disks. If you want to make sure that the logical volume is spread over only 8 available disks, and not all the available disks in a volume group, make sure you specify the 8 available disks:
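That 1/8th check can also be scripted. Here's a sketch against the sample lsvg -p listing from above, saved to a file (columns: disk, state, total PPs, free PPs, free distribution):

```shell
# Sample "lsvg -p prodvg" listing, one candidate disk per line.
cat > /tmp/prodvg.pps <<'EOF'
vpath188 active 959  40 00..00..00..00..40
vpath163 active 959  42 00..00..00..00..42
vpath208 active 959  96 00..00..96..00..00
vpath205 active 959 192 102..00..00..90..00
vpath194 active 959 240 00..00..00..48..192
vpath24  active 959 243 00..00..00..51..192
vpath304 active 959 340 00..89..152..99..00
vpath161 active 959 413 14..00..82..125..192
EOF

# prodlv is 128 LPs spread over 8 disks, so each disk needs 128/8 = 16 free PPs.
needed=$((128 / 8))

# List the disks that qualify (free PPs in column 4), and count them:
awk -v need="$needed" '$4 >= need {print $1}' /tmp/prodvg.pps
awk -v need="$needed" '$4 >= need {print $1}' /tmp/prodvg.pps | wc -l
```

In this sample all 8 candidate disks qualify; any disk falling short would simply drop out of the list.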
# mklvcopy -e x prodlv 2 vpath188 vpath163 vpath208 \ vpath205 vpath194 vpath24 vpath304 vpath161
Now check again with "lslv -m prodlv" if the new copy is correctly created:
# lslv -m prodlv | awk '{print $5}' | grep vpath | sort -dfu | \
while read pv ; do
   result=`lspv -l $pv | grep prodlv`
   echo "$pv $result"
done
vpath304 prodlv
Now, what if you have to extend the logical volume prodlv later on with another 128 LPs, and you still want to maintain the spreading of the LPs over the 8 disks? Again, you can use the "-e x" option when running the logical volume commands:
# extendlv -e x prodlv 128 vpath188 vpath163 vpath208 \ vpath205 vpath194 vpath24 vpath304 vpath161
You can also use the "-e x" option with the mklv command to create a new logical volume from the start with the correct spreading over disks. TOPICS: AIX, PERFORMANCE, SYSTEM ADMINISTRATION
node=`hostname`
rm -f /tmp/cpu_all.tmp /tmp/zzzz.tmp /tmp/${node}_nmon_cpu.csv
for nmon_file in `ls /var/msgs/nmon/*nmon`
do
   datestamp=`echo ${nmon_file} | cut -f2 -d"_"`
   grep CPU_ALL, $nmon_file > /tmp/cpu_all.tmp
   grep ZZZZ $nmon_file > /tmp/zzzz.tmp
   grep -v "CPU Total " /tmp/cpu_all.tmp | sed "s/,/ /g" | \
   while read NAME TS USER SYS WAIT IDLE rest
   do
      timestamp=`grep ${TS} /tmp/zzzz.tmp | awk -F, '{print $4" "$3}'`
      TOTAL=`echo "scale=1;${USER}+${SYS}" | bc`
      echo $timestamp,$USER,$SYS,$WAIT,$IDLE,$TOTAL >> \
         /tmp/${node}_nmon_cpu.csv
   done
   rm -f /tmp/cpu_all.tmp /tmp/zzzz.tmp
done
Note: the script assumes that you've stored the NMON output files in /var/msgs/nmon. Update the script to the folder you're using to store NMON files.
In the list above: rmt1 is a standalone IBM 3592 tape drive; rmt0 is an LTO4 drive of a tape library; smc0 is the medium changer (or robotic part) of the above tape library. Now look at their major and minor numbers:
# ls -l /dev/rmt* /dev/smc*
crw-rw-rwT 1 root system 38,  0 Nov 13 17:40 /dev/rmt0
crw-rw-rwT 1 root system 38,  1 Nov 13 17:40 /dev/rmt0.1
crw-rw-rwT 1 root system 38,128 Nov 13 17:40 /dev/rmt1
crw-rw-rwT 1 root system 38, 66 Nov 13 17:40 /dev/smc0
All use the IBM tape device driver (and so have the same major number of 38), but they are actually different entities (with minor numbers of 0, 128 and 66 respectively). Also, compare rmt0 and rmt0.1: it's the same device, but with a different mode of operation. TOPICS: AIX, SYSTEM ADMINISTRATION
Or use lsattr:
# lsattr -El sys0 -a max_logname max_logname 9 Maximum login name length at boot time True
To change the value, simply adjust the v_max_logname parameter (shown as max_logname in lsattr) using chdev to the maximum number of characters desired plus one to accommodate the terminating character. For example, if you want to have user names that are 128 characters long, you would adjust the v_max_logname parameter to 129:
# chdev -l sys0 -a max_logname=129
sys0 changed
Please note that this change will not go into effect until you have rebooted the operating system. Once the server has been rebooted, you can verify that the change has taken effect:
# getconf LOGIN_NAME_MAX 128
Keep in mind, however, that if your environment includes IBM RS/6000 servers prior to AIX version 5.3 or operating systems that cannot handle user names longer than eight characters and you rely on NIS or other authentication measures, it would be wise to continue with the eight-character user names. TOPICS: AIX, INSTALLATION, NIM, SYSTEM ADMINISTRATION
Nimadm
A very good article about migrating AIX from version 5.3 to 6.1 can be found on the following page of IBM developerWorks: http://www.ibm.com/developerworks/aix/library/au-migrate_nimadm/index.html?ca=drs For a smooth nimadm process, make sure that you clean up as many filesets on your server as possible (get rid of the things you no longer need). The more filesets that need to be migrated, the longer the process will take. Also make sure that openssl/openssh is up-to-date on the server to be migrated; this is likely to break when you have old versions installed. Very useful is also a gigabit Ethernet connection between the NIM server and the server to be upgraded, as the nimadm process copies the client rootvg over to the NIM server and back. The log file for a nimadm process can be found on the NIM server in /var/adm/ras/alt_mig. TOPICS: AIX, INSTALLATION, NIM, SYSTEM ADMINISTRATION
No output is shown. The fileset is not part of the SPOT. Check if the LPP Source has the file set:
# nim -o showres LPPaix61tl05sp03 | grep -i bos.alt
bos.alt_disk_install.boot_images   6.1.5.2   I   N   usr
bos.alt_disk_install.rte           6.1.5.1   I   N   usr,root
Install the first fileset (bos.alt_disk_install.boot_images) in the SPOT. The other fileset is a prerequisite of the first fileset and will be automatically installed as well.
# nim -o cust -a filesets=bos.alt_disk_install.boot_images -a lpp_source=LPPaix61tl05sp03 SPOTaix61tl05sp03
Note: Use the -F option to force a fileset into the SPOT, if needed (e.g. when the SPOT is in use for a client). Check if the SPOT now has the fileset installed:
# nim -o showres SPOTaix61tl05sp03 | grep -i bos.alt
bos.alt_disk_install.boot_images   6.1.5.2   C   F   Alternate Disk Installation
bos.alt_disk_install.rte           6.1.5.1   C   F   Alternate Disk Installation
In general, the file /etc/ftpusers lists accounts that are denied FTP access to a server. So, if this file exists, make sure the account is not listed in it. Here's an example of what you would set in the ftpaccess.ctl if you wanted user ftp to have login to /home/ftp. The user will be able to change directory forward, but not outside this directory. Also, when user ftp logs in and runs pwd it will show only "/" and not "/home/ftp".
# cat /etc/ftpaccess.ctl useronly: ftp
If the user is required to write files to the server with specific access, for example, read and write access for user, group and others, then this can be accomplished by the user itself by running the FTP command:
ftp> site umask 111
200 UMASK set to 111 (was 027)
ftp> site umask
200 Current UMASK is 111
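What the umask does to new files is easy to verify locally: with umask 111 only the execute bits are masked, so a newly created file (default mode 666) still comes out read/write for user, group and others. The file names below are just for the demonstration:

```shell
rm -f /tmp/demo_umask_027 /tmp/demo_umask_111

umask 027          # restrictive default: 666 & ~027 = 640 (rw-r-----)
touch /tmp/demo_umask_027

umask 111          # the FTP example: 666 & ~111 = 666 (rw-rw-rw-)
touch /tmp/demo_umask_111

ls -l /tmp/demo_umask_027 /tmp/demo_umask_111
```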
To further restrict the FTP account to a server, especially for accounts that are only used for FTP purposes, make sure to disable login and remote login for the account via smitty user. TOPICS: AIX, SYSTEM ADMINISTRATION
PS1
The following piece of code fits nicely in the /etc/profile file. It makes sure that PS1, the prompt variable, is set in such a way that you can see who is logged in, on which system, and what the current path is. At the same time it also sets the window title the same way.
H=`uname -n`
if [ $(whoami) = "root" ] ; then
   PS1='^[]2;${USER}@(${H}) ${PWD##/*/}^G^M${USER}@(${H}) ${PWD##/*/} # '
else
   PS1='^[]2;${USER}@(${H}) ${PWD##/*/}^G^M${USER}@(${H}) ${PWD##/*/} $ '
fi
Note: to type the special characters, such as ^[, you have to type first CTRL-V, and then CTRL-[. Likewise for ^G: type it as CTRL-V and then CTRL-G. Second note: the escape characters only work properly when setting the window title using PuTTY. If you or any of your users use Reflection to access the servers, the escape codes don't work. In that case, shorten it to:
H=`uname -n`
if [ $(whoami) = "root" ] ; then
   PS1='${USER}@(${H}) ${PWD##/*/} # '
else
   PS1='${USER}@(${H}) ${PWD##/*/} $ '
fi
IP alias
To configure IP aliases on AIX: Use the ifconfig command to create an IP alias. To have the alias created when the system starts, add the ifconfig command to the /etc/rc.net script. The following example creates an alias on the en1 network interface. The alias must be defined on the same subnet as the network interface.
# ifconfig en1 alias 9.37.207.29 netmask 255.255.255.0 up
This is caused when you have ESS driver filesets installed, but no ESS (type 2105) disks in use on the system. Check the type of disks by running:
# lsdev -Cc disk | grep 2105
If no type 2105 disks are found, you can uninstall any ESS driver filesets:
# installp -u ibm2105.rte ibmpfe.essutil.fibre.data ibmpfe.essutil.rte
This enhancement leverages the uniform time zone naming convention of the Olson database to offer an intuitive set of time zone values that can be assigned to the TZ time zone environment variable. Note: Time zone definitions conforming to the POSIX specification are still supported and recognized by AIX. AIX checks the TZ environment variable to determine if the environment variable follows the POSIX specification rules. If the TZ environment variable does not match the POSIX convention, AIX calls the ICU library to get the Olson time zone translation.

The use of the Olson database for time zone support within AIX provides significant advantages over the traditional POSIX rules. One of the biggest advantages is that the Olson database maintains a historical record of what the time zone rules were at given points in time, so that if the rules change in a particular location, dates and times can be interpreted correctly both in the present and past. A good example of this is the US state of Indiana, which only began using daylight saving time in the year 2006. Under the POSIX implementation, Indiana would have to set its time zone value to EST5EDT, which would format current dates correctly using daylight saving time, but would also format times from previous years as though they were on daylight saving time, which is incorrect.

Use of the ICU API set for time zones also allows for localized display names for time zones. For example, Central Daylight Saving Time would have an abbreviation of CDT for all locales under a POSIX implementation, but under ICU/Olson, it displays properly as HAC (Heure Avancée du Centre) in a French locale.

As in previous AIX releases, system administrators can rely on the Systems Management Interface Tool (SMIT) to configure the time zone by using system defined values for the TZ environment variable. 
To accomplish this task, enter the main SMIT menu and select System Environments, Change / Show Date and Time to access the Change Time Zone Using System Defined Values menu. Alternatively, the SMIT fast path chtz_date will directly open the Change / Show Date and Time menu. Selecting the Change Time Zone Using System Defined Values option will prompt SMIT to open the Select COUNTRY or REGION menu.
SMIT uses the undocumented /usr/lib/nls/lstz -C command to produce the list of available countries and regions. Note that undocumented commands and features are not officially supported for customer use, are not covered by the AIX compatibility statement, and may be subject to change without notice. After you have chosen the country or region in the Select COUNTRY or REGION menu, a new selection menu will list all available time zones for the country or region in question.
The selected value of the first column will be passed by SMIT to the chtz command, which in turn will change the TZ variable value in the /etc/environment system level configuration file. As with previous AIX releases, time zone configuration changes always require a system reboot to become effective. SMIT uses the internal /usr/lib/nls/lstz -c command to produce the list of available time zones for a given country or region. The -c flag uses a country or region designation as the input parameter. The /usr/lib/nls/lstz -C command provides a list of available input parameters. The /usr/lib/nls/lstz command used without any flag provides a full list of all Olson time zones available on AIX.
Sendmail tips
Or:
# refresh -s sendmail
Use the -v flag on the mail command for "verbose" output. This is especially useful if you can't deliver mail, but also don't get any errors. E.g.:
# cat /etc/motd | mailx -v -s "test" email@address.com
To get sendmail to work on a system without DNS, create and/or edit /etc/netsvc.conf. It should contain 1 line only:
hosts=local
If you see the following error in the error report when starting sendmail:
DETECTING MODULE 'srchevn.c'@line:'355' FAILING MODULE sendmail
Then verify that your /etc/mail/sendmail.cf file is correct, and/or try starting the sendmail daemon as follows (instead of just running "startsrc -s sendmail"):
# startsrc -s sendmail -a "-bd -q30m"
More tips can be found here: http://www.angelfire.com/il2/sgillen/sendmail.html TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION, VERITAS NETBACKUP
Then, you have to change the default debug level in /usr/openv/netbackup/bp.conf, by adding:
VERBOSE = 2
By default, VERBOSE is set to one, which means there isn't any logging at all, so that is not helpful. You can go up to "VERBOSE = 5", but that may create very large log files, and this
may fill up the file system. In any case, check how much disk space is available in /usr before enabling the logging of the Veritas NetBackup client. Backups through Veritas NetBackup are initiated through inetd:
# egrep "bpcd" /etc/services
bpcd    13782/tcp    # VERITAS NetBackup
bpcd    13782/udp    # VERITAS NetBackup
# grep bpcd /etc/inetd.conf
bpcd stream tcp nowait root /usr/openv/netbackup/bin/bpcd bpcd
Now all you have to do is wait for the NetBackup server (the one listed in /usr/openv/netbackup/bp.conf) to start the backup on the AIX client. After the backup has run, you should at least find a log file in the bpcd and bpbkar folders in /usr/openv/netbackup. TOPICS: AIX, HMC, SYSTEM ADMINISTRATION
root 409632 188534 0 11:46:45 0:00 /usr/sbin/xntpd
The system will assign the next available RAM disk. Since this is our first one, it will be assigned the name ramdisk0:
# ls -l /dev/ram*
brw-------    1 root     system       46,  0 Sep 22 08:01 /dev/ramdisk0
If there isn't sufficient available memory to create the RAM disk you have requested, the mkramdisk command will alert you. Free up some memory or create a smaller RAM disk. You can use Dynamic LPAR on the HMC or IVM to assign more memory to your partition. We could use the RAM disk /dev/ramdisk0 as a raw logical volume, but here we're going to create and mount a JFS2 file system. Here's how to create the file system using the RAM disk as its logical volume:
# mkfs -V jfs2 /dev/ramdisk0
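On AIX 6.1 you can then mount the file system with logging disabled; a sketch, assuming the mount point /ramdisk0 is used as an example:

```
# mkdir /ramdisk0
# mount -V jfs2 -o log=NULL /dev/ramdisk0 /ramdisk0
```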
Note: mounting a JFS2 file system with logging disabled (log=NULL) only works in AIX 6.1. On AIX 5.3, here are the steps to create the ramdisk:
# mkramdisk 4G
# mkfs -V jfs /dev/ramdisk0
# mkdir /ramdisk0
# mount -V jfs -o nointegrity /dev/ramdisk0 /ramdisk0
You should now be able to see the new file system using df and you can write to it as you would any other file system. When you're finished, unmount the file system and then remove the ramdisk using the rmramdisk command.
# rmramdisk ramdisk0
After setting CORE_NAMING, you can disable this feature by setting the variable to the NULL value. For example, if you are using the Korn shell, do the following:
export CORE_NAMING=
After setting CORE_NAMING, all new core files will be stored in files of the format core.pid.ddhhmmss, where:

pid: Process ID
dd: Day of the month
hh: Hours
mm: Minutes
ss: Seconds

In the following example, two core files are generated by a process identified by PID 30480 at different times:
# ls -l core*
-rw-r--r--  1 user group 8179 Jan 28 2010 core.30480.28232347
-rw-r--r--  1 user group 8179 Jan 28 2010 core.30482.28232349
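Since the format is fixed, the fields of such a core file name can be pulled apart with standard tools. A small sketch, using the first file name from the example above:

```shell
# Split a CORE_NAMING-style file name (core.pid.ddhhmmss) into its fields.
f="core.30480.28232347"
pid=$(echo "$f" | cut -d. -f2)   # process ID
ts=$(echo "$f" | cut -d. -f3)    # ddhhmmss time stamp (GMT)
dd=$(echo "$ts" | cut -c1-2)     # day of the month
hh=$(echo "$ts" | cut -c3-4)     # hours
mm=$(echo "$ts" | cut -c5-6)     # minutes
ss=$(echo "$ts" | cut -c7-8)     # seconds
echo "PID $pid dumped core on day $dd at $hh:$mm:$ss GMT"
```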
The time stamp used is in GMT; your time zone is not used. Also check out the lscore and chcore commands, which can also be used to list and set core naming. These commands can also define a core file location, and turn on core compression. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
Actually, this command not only removes the password history, but also changes the histsize setting for the account to zero, meaning that the user is never again checked for re-use of old passwords. After running the command above, you may want to set histsize back to the default value:
# grep -p ^default /etc/security/user | grep histsize
        histsize = 20
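You can restore the default with chuser; a sketch, assuming the account is called username and your default stanza uses 20:

```
# chuser histsize=20 username
```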
Sudosh
Sudosh is designed specifically to be used in conjunction with sudo or by itself as a login shell. Sudosh allows the execution of a root or user shell with logging: every command the user types within the root shell is logged, as well as the output.

This is different from "sudo -s" or "sudo /bin/sh", because when you use one of these instead of sudosh to start a new shell, the new shell does not log the commands typed in it to syslog; only the fact that a new shell was started is logged. If this newly started shell supports command-line history, you can still find the commands in a file such as .sh_history, but if you use a shell such as csh that does not support command-line logging, you are out of luck. Sudosh fills this gap: no matter what shell you use, all of the command lines are logged to syslog (including vi keystrokes). In fact, sudosh uses the script command to log all keystrokes and output.

Setting up sudosh is fairly easy. For a Linux system, first download the RPM of sudosh, for example from rpm.pbone.net. Then install it on your Linux server:
# rpm -ihv sudosh-1.8.2-1.2.el4.rf.i386.rpm
Preparing...   ########################################### [100%]
   1:sudosh    ########################################### [100%]
Then, go to the /etc file system and open up /etc/sudosh.conf. Here you can adjust the default shell that is started, and the location of the log files. By default, the log directory is /var/log/sudosh. Make sure this directory exists on your server, or change it to another existing directory in the sudosh.conf file. This command will set the correct authorizations on the log directory:
# sudosh -i
[info]: chmod 0733 directory /var/log/sudosh
Then, if you want to assign a user sudosh access, edit the /etc/sudoers file by running visudo, and add the following line:
username ALL=PASSWD:/usr/bin/sudosh
Now, the user can login, and run the following command to gain root access:
$ sudo sudosh
Password:
# whoami
root
Now, as a sys admin, you can view the log files created in /var/log/sudosh, but it is much cooler to use the sudosh-replay command to replay (like a VCR) the actual session, as run by the user with sudosh access. First, run sudosh-replay without any parameters, to get a list of sessions that took place using sudosh:
# sudosh-replay
Date        Duration  From  To    ID
====        ========  ====  ==    ==
09/16/2010  6s        root  root  root-root-1284653707-GCw26NSq

Usage: sudosh-replay ID [MULTIPLIER] [MAXWAIT]
See 'sudosh-replay -h' for more help.
Example: sudosh-replay root-root-1284653707-GCw26NSq 1 2
Now, you can actually replay the session, by (for example) running:
# sudosh-replay root-root-1284653707-GCw26NSq 1 5
The first parameter is the session ID, and the second parameter is the multiplier. Use a higher multiplier value to speed up the replay; "1" is the actual speed. The third parameter is the max-wait. Where there were wait times in the actual session, this parameter caps the wait at max-wait seconds; in the example above, 5 seconds. For AIX, you can find the necessary RPM here. It is slightly different, because it installs in /opt/freeware/bin, and the sudosh.conf file is located in this directory as well. Both Linux and AIX of course require sudo to be installed, before you can install and use sudosh. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
SUID
Always watch out for files with the SUID bit set, especially files that are not on the AIX system by default. Before any vendor or application team installs additional software on the AIX system, it may be worthwhile to run the following command, to discover any files with the SUID bit set:
# find / \( -perm -2000 -o -perm -4000 \) -type f -ls
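Comparing a before and an after snapshot can be scripted with comm. The sketch below exercises the idea in a scratch directory with made-up file names, so it can be tried safely; against a real system you would run the find command above on / instead:

```shell
# Take a baseline of SUID/SGID files, simulate a vendor install that adds
# one, and report what is new. Directory and file names are hypothetical.
dir=$(mktemp -d)
touch "$dir/vendor_tool" "$dir/normal_file"
find "$dir" \( -perm -2000 -o -perm -4000 \) -type f | sort > "$dir/suid.before"
chmod 4755 "$dir/vendor_tool"   # the "install" sets the SUID bit
find "$dir" \( -perm -2000 -o -perm -4000 \) -type f | sort > "$dir/suid.after"
comm -13 "$dir/suid.before" "$dir/suid.after"   # lines only in the after list
```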
Save the output of this command for later reference. Once the vendor or application team is done installing their application and/or database software, run the same command again, to discover if any newly created files exist, especially those that are owned by user root and have the SUID bit set. This allows other users to run the command as if they were root.

The SUID bit can only be set on binary executables on AIX (starting with release 3.2.5 of AIX). Other UNIX operating systems, such as Fedora, may allow scripts to run with the SUID bit set. On AIX you are allowed to set the SUID bit on a script, but AIX simply ignores it, and runs the script as the user who started the script, not as the account that owns the script, because this would be a huge security hole. However, it is still very easy to write a C program that does the trick. The following example is a program called "sushi". The source code of the program, sushi.c, looks like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>

#define LOG_FILE "/tmp/sushilog"

char *getlogin();

main(argc, argv)
int argc;
char **argv;
{
        char buf[1024], *p = buf;
        int i, t;
        FILE *log;
        char msg[BUFSIZ], *ct, *name;

        /* join all command line arguments into one command string */
        for (i = 1; i < argc; i++) {
                sprintf(p, "%s ", argv[i]);
                p += strlen(p);
        }

        t = time(0);
        ct = ctime(&t);
        name = getlogin();

        setuid(0);

        log = fopen(LOG_FILE, "a");
        if (!log)
                printf("Couldn't open log file!\n");
        else {
                sprintf(msg, "SUSHI: %s %s %s\n", name, buf, ct);
                fputs(msg, log);
                fclose(log);
                system(buf);
        }
}
The makefile looks like this (and makes sure the SUID bit is set when running "make"):
################################################
# Make rules
################################################

all: sushi

sushi: sushi.c
	cc -o /bin/sushi sushi.c
	chmod u+s /bin/sushi

################################################
Now, if this file is compiled as user root, a program called /bin/sushi will exist; it will be owned by user root, and the SUID bit will be set:
# ls -als /bin/sushi
8 -rwsr-xr-x 1 root root 6215 Sep 9 09:21 /bin/sushi
The sushi program basically takes everything entered as a parameter on the command line, and runs it. So if the file is owned by user root, it will run the parameter as user root. For example, if you would want to open a Korn shell as a regular user, and get root access:
$ /bin/sushi ksh
# whoami
root
This is something that you want to avoid. Even vendors are known to build backdoors like these into their software. The find command shown at the beginning of this article will help you discover commands like these. Note that one good thing about the sushi program shown above is that it writes an entry into log file /tmp/sushilog each time someone uses the command. To prevent users from running commands with the SUID bit set, you may want to add the "nosuid" option in /etc/filesystems for each file system:
/exports/install:
        dev      = "/exports/install"
        vfs      = nfs
        nodename = fileserver.company.com
        mount    = true
        options  = ro,bg,hard,intr,nodev,nosuid,sec=sys
        account  = false
Especially for (permanently) NFS-mounted file systems, it is a VERY good idea to have this nosuid option set. It prevents someone from creating a sushi-like program on an NFS server and running it as a regular user on the NFS client system to gain root access on the NFS client. If you want to mount an NFS share on a client temporarily, enable nosuid by running:
# mount -o nosuid server:/filesystem /mountpoint
# mount | grep nfs
server /filesystem /mountpoint nfs3 Sep 09 09:30 nosuid
Truss
To get more information on what a specific process is doing, you can use the truss command. That may be very useful, for example when a process appears to be hanging. For example, if you want to know what the "recover" process is doing, first look up the PID of this process:
# ps -ef | grep -i recover | grep -v grep
root 348468 373010 0 17:30:25 pts/1 0:00 recover -f -a /etc
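Then attach truss to the PID found above; it prints each system call the process makes until you detach with Ctrl-C:

```
# truss -p 348468
```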
This way, you can see the process is actually sleeping. TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
2-Port 10/100/1000 Base-TX PCI-X Adapter:
        Network Address.............001125C5E831
        ROM Level.(alterable).......DV0210
        Hardware Location Code......U788C.001.AAC1535-P1-T2
PLATFORM SPECIFIC
Name:  ethernet
Node:  ethernet@1,1
Device Type:  network
Physical Location:  U788C.001.AAC1535-P1-T2
This ent1 device is an 'Internal Port'. If we check ent2 on the same box:
# lscfg -pvl ent2
ent2    U788C.001.AAC1535-P1-C13-T1    2-Port 10/100/1000 Base-TX PCI-X
2-Port 10/100/1000 Base-TX PCI-X Adapter:
        Part Number.................03N5298
        FRU Number..................03N5298
        EC Level....................H138454
        Brand.......................H0
        Manufacture ID..............YL1021
        Network Address.............001A64A8D516
        ROM Level.(alterable).......DV0210
        Hardware Location Code......U788C.001.AAC1535-P1-C13-T1
PLATFORM SPECIFIC
Name:  ethernet
Node:  ethernet@1
Device Type:  network
Physical Location:  U788C.001.AAC1535-P1-C13-T1
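To illustrate how such a location code breaks down, here is a hypothetical helper (not an AIX tool) that splits the hyphen-separated fields:

```shell
# Split an AIX physical location code into unit, bus, slot and port.
loc="U788C.001.AAC1535-P1-C13-T1"
unit=$(echo "$loc" | cut -d- -f1)   # system unit/drawer
bus=$(echo "$loc" | cut -d- -f2)    # PCI bus
slot=$(echo "$loc" | cut -d- -f3)   # card slot
port=$(echo "$loc" | cut -d- -f4)   # port on the card
echo "unit=$unit bus=$bus slot=$slot port=$port"
```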
This is a device on a PCI I/O card. For a physical address like U788C.001.AAC1535-P1-C13-T1:

U788C.001.AAC1535 - This part identifies the 'system unit/drawer'. If your system is made up of several drawers, then look on the front and match the ID to this section of the address. Now go round the back of the server.
P1 - This is the PCI bus number. You may only have one.
C13 - Card slot C13. They are numbered on the back of the server.
T1 - This is port 1 of 2 that are on the card.

Your internal ports won't have the card slot numbers, just the T number, representing the port. This should be marked on the back of your server. E.g.: U788C.001.AAC1535-P1-T2 means unit U788C.001.AAC1535, PCI bus P1, port T2, and you should be able to see T2 printed on the back of the server. TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
install_all_updates
A useful command to update software on your AIX server is install_all_updates. It is similar to running smitty update_all, but it works from the command line. The only thing you need to provide is the directory name, for example:
# install_all_updates -d .
This installs all the software updates from the current directory. Of course, you will have to make sure the current directory contains any software. Don't worry about generating a Table Of Contents (.toc) file in this directory, because install_all_updates generates one for you. By default, install_all_updates will apply the filesets; use -c to commit any software. Also, by default, it will expand any file systems (use -x to prevent this behavior). It will install any requisites by default (use -n to prevent). You can use -p to run a preview, and you can use -s to skip the recommended maintenance or technology level verification at the end of the install_all_updates output. You may have to use the -Y option to agree to all license agreements. To install all available updates from the cdrom, agree to all license agreements, and skip the recommended maintenance or technology level verification, run:
# install_all_updates -d /cdrom -Y -s
In case you're using EMC SAN storage, make sure you configure your disks correctly as hdiskpower devices. Or, if you're using a mksysb image, you may want to follow the EMC ODM cleanup procedure. Steps:
Enter a cluster name and select the nodes you're going to use. It is vital here to have the hostnames and IP addresses correctly entered in the /etc/hosts file of both nodes. Create an IP service label:
# smitty hacmp
  Initialization and Standard Configuration
    Configure Resources to Make Highly Available
      Configure Service IP Labels/Addresses
        Add a Service IP Label/Address
Enter an IP Label/Address (press F4 to select one), and enter a Network name (again, press F4 to select one). Set up a resource group:
# smitty hacmp
  Initialization and Standard Configuration
    Configure HACMP Resource Groups
      Add a Resource Group
Enter the name of the resource group. It's a good habit to make sure that a resource group name ends with "rg", so you can recognize it as a resource group. Also, select the participating nodes. For the "Fallback Policy", it is a good idea to change it to "Never Fallback". This way, when the primary node in the cluster comes up, and the resource group is up-and-running on the secondary node, you won't see a failover occur from the secondary to the primary node. Note: The order of the nodes is determined by the order you select the nodes here. If you put in "node01 node02" here, then "node01" is the primary node. If you want to have this any other way, now is a good time to correctly enter the order of node priority. Add the Service IP/Label to the resource group:
# smitty hacmp
  Initialization and Standard Configuration
    Configure HACMP Resource Groups
      Change/Show Resources for a Resource Group (standard)
Select the resource group you've created earlier, and add the Service IP/Label.
Run a verification/synchronization:
# smitty hacmp
  Extended Configuration
    Extended Verification and Synchronization
Just hit [ENTER] here. Resolve any issues that may come up from this synchronization attempt, and repeat this process until the verification/synchronization process returns "Ok". It's a good idea here to select "Automatically correct errors". Start the HACMP cluster:
# smitty hacmp
  System Management (C-SPOC)
    Manage HACMP Services
      Start Cluster Services
Select both nodes to start. Make sure to also start the Cluster Information Daemon. Check the status of the cluster:
# clstat -o
# cldump
Wait until the cluster is stable and both nodes are up. Basically, the cluster is now up-and-running. However, during the Verification & Synchronization step, it will complain about not having a non-IP network.

The next part is for setting up a disk heartbeat network, which will allow the nodes of the HACMP cluster to exchange disk heartbeat packets over a SAN disk. We're assuming here that you're using EMC storage. The process on other types of SAN storage is more or less similar, except for some differences; e.g., SAN disks are called "hdiskpower" devices on EMC storage, and "vpath" devices on IBM SAN storage.

First, look at the available SAN disk devices on your nodes, and select a small disk that won't be used to store any data, but only for the purpose of doing the disk heartbeat. It is a good habit to request your SAN storage admin to zone a small LUN as a disk heartbeating device to both nodes of the HACMP cluster. Make a note of the PVID of this disk device; for example, if you choose to use device hdiskpower4:
# lspv | grep hdiskpower4
hdiskpower4     000a807f6b9cc8e5    None
So, we're going to set up the disk heartbeat network on device hdiskpower4, with PVID 000a807f6b9cc8e5. Create a concurrent volume group:
# smitty hacmp
  System Management (C-SPOC)
    HACMP Concurrent Logical Volume Management
      Concurrent Volume Groups
        Create a Concurrent Volume Group
Select both nodes to create the concurrent volume group on by pressing F7 for each node. Then select the correct PVID. Give the new volume group a name, for example "hbvg". Set up the disk heartbeat network:
# smitty hacmp
  Extended Configuration
    Extended Topology Configuration
      Configure HACMP Networks
        Add a Network to the HACMP Cluster
Select "diskhb" and accept the default Network Name. Run a discovery:
# smitty hacmp
  Extended Configuration
    Discover HACMP-related Information from Configured Nodes
Select the disk device on both nodes by selecting the same disk on each node by pressing F7. Run a Verification & Synchronization again, as described earlier above. Then check with clstat and/or cldump again, to check if the disk heartbeat network comes online. TOPICS: AIX, POWERHA / HACMP, SYSTEM ADMINISTRATION
The next thing you will want to check on the NFS server is whether the node names of your HACMP cluster nodes are correctly added to the /etc/exports file. If they are, run:
# exportfs -va
The last, and trickiest, item you will want to check is whether a service IP label is defined as an IP alias on the same adapter as your node's hostname, e.g.:
# netstat -nr
Routing tables
Destination     Gateway        Flags  Refs  Use     If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default         10.251.14.1    UG     4     180100  en1  -
10.251.14.0     10.251.14.50   UHSb   0     0       en1  -
10.251.14.50    127.0.0.1      UGHS   3     791253  lo0  -
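To pull the interface of the default route out of such a listing programmatically, an awk one-liner over the 'default' line will do; the sample line below follows the AIX netstat -nr column layout, with the interface in the sixth column:

```shell
# Extract the interface of the default route from a netstat -nr style line.
route_line='default         10.251.14.1    UG     4     180100  en1  -'
iface=$(echo "$route_line" | awk '$1 == "default" {print $6}')
echo "$iface"
```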
The example above shows you that the default gateway is defined on the en1 interface. The next command shows you where your Service IP label lives:
# netstat -i
Name  Mtu   Network  Address    Ipkts    Ierrs  Opkts
en1   1500  link#2              940024   ...    ...
en1   1500  ...      ...        940024   ...    ...
en1   1500  ...      serviceip  940024   ...    ...
lo0   ...   ...      ...        1914185  ...    ...
As you can see, the Service IP label (in the example above called "serviceip") is defined on en1. In that case, for NFS to work, you also want to add the "serviceip" to the /etc/exports file on the NFS server and re-run "exportfs -va". And you should also make sure that hostname "serviceip" resolves to an IP address correctly (and of course the IP address resolves to the correct hostname) on both the NFS server and the client. TOPICS: AIX, SYSTEM ADMINISTRATION
Note: csum can't handle files larger than 2 GB. TOPICS: AIX, PERFORMANCE, SYSTEM ADMINISTRATION
You can download the tool here: http://www.ibm.com/developerworks/aix/library/au-nmon_analyser/ TOPICS: AIX, BACKUP & RESTORE
# Remove unwanted entries from the inittab
rmitab hacmp 2>/dev/null
rmitab tsmsched 2>/dev/null
rmitab tsm 2>/dev/null
rmitab clinit 2>/dev/null
rmitab pst_clinit 2>/dev/null
rmitab qdaemon 2>/dev/null
rmitab sddsrv 2>/dev/null
rmitab nimclient 2>/dev/null
rmitab nimsh 2>/dev/null
rmitab naviagent 2>/dev/null
# copy inetd.conf
cp /etc/inetd.conf /etc/inetd.conf.org
# take out unwanted items
cat /etc/inetd.conf.org | grep -v bgssd > /etc/inetd.conf
The next thing you need to do, is to configure this script as a 'script resource' in NIM. Run:
# smitty nim_mkres
Select 'script' and complete the form afterwards. For example, if you called it 'UnConfig_Script':
# lsnim -l UnConfig_Script
UnConfig_Script:
   class      = resources
   type       = script
   comments   =
   Rstate     = ready for use
   prev_state = unavailable for use
   location   = /export/nim/cust_scripts/custom.ksh
Then, when you are ready to perform the actual mksysb recovery using "smitty nim_bosinst", you can add this script resource on the following line:
Customization SCRIPT to run after installation [UnConfig_Script]
Using the image_data resource to restore a mksysb without preserving mirrors using NIM
Specify the 'image_data' resource when running the 'bos_inst' operation from the NIM master. From the command line on the NIM master:
# nim -o bos_inst -a source=mksysb -a lpp_source=[lpp_source] -a spot=[SPOT] -a mksysb=[mksysb] -a image_data=mksysb_image_data -a accept_licenses=yes server1
Select the client to install. Select 'mksysb' as the type of install. Select a SPOT at the same level as the mksysb you are installing. Select an lpp_source at the same level as the mksysb you are installing. NOTE: It is recommended to use an lpp_source at the same AIX Technology Level, but if you use an lpp_source at a higher level than the mksysb, the system will be updated to the level of the lpp_source during installation. This will only update Technology Levels; it will not migrate the version of AIX you are running to a higher version. If you're using an AIX 5300-08 mksysb, you cannot use an AIX 6.1 lpp_source; but if you allocate a 5300-09 lpp_source, this will update your target system to 5300-09.
Install the Base Operating System on Standalone Clients
Type or select values in entry fields. Press Enter AFTER making all desired changes.
[]
[server1_image_data]
Creating an image_data resource without preserving mirrors for use with NIM
Transfer the /image.data file to the NIM master and store it in the location you desire. It is a good idea to place the file, or any NIM resource for that matter, in a descriptive manner, for example: /export/nim/image_data. This will ensure you can easily identify your "image_data" NIM resource file locations, should you need multiple "image_data" resources. Make sure your image.data filenames are descriptive also. A common way to name the file would be in relation to your client name, for example: server1_image_data. Run the nim command, or use smitty and the fast path 'nim_mkres', to define the file that you have edited using the steps above. From the command line on the NIM master:
# nim -o define -t image_data -a server=master -a location=/export/nim/image_data/server1_image_data -a comments="image.data file with broken mirror for server1" server1_image_data
NOTE: "server1_image_data" is the name given to the 'image_data' resource. Using smit on the NIM master:
# smit nim_mkres
Select 'image_data' as the Resource Type. Then complete the following screen:
Define a Resource
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                        [Entry Fields]
* Resource Name                 [server1_image_data]
* Resource Type                 image_data
* Server of Resource            [master]
* Location of Resource          [/export/nim/image_data/server1_image_data]
  Comments                      []
Run the following command to make sure the 'image_data' resource was created:
# lsnim -t image_data
Run the following command to get information about the 'image_data' resource:
# lsnim -l server1_image_data
server1_image_data:
   class      = resources
   type       = image_data
   Rstate     = ready for use
   prev_state = unavailable for use
   location   = /export/nim/image_data/server1_image_data
Edit the image.data file to break the mirror, by running the following command:
# vi /image.data
What you are looking for are the "lv_data" stanzas. There will be one for every logical volume associated with rootvg. The following is an example of an lv_data stanza from an image.data file of a mirrored rootvg. The lines that need changing are LV_SOURCE_DISK_LIST, COPIES, and PP:
lv_data:
        VOLUME_GROUP= rootvg
        LV_SOURCE_DISK_LIST= hdisk0 hdisk1
        LV_IDENTIFIER= 00cead4a00004c0000000117b1e92c90.2
        LOGICAL_VOLUME= hd6
        VG_STAT= active/complete
        TYPE= paging
        MAX_LPS= 512
        COPIES= 2
        LPs= 124
        STALE_PPs= 0
        INTER_POLICY= minimum
        INTRA_POLICY= middle
        MOUNT_POINT=
        MIRROR_WRITE_CONSISTENCY= off
        LV_SEPARATE_PV= yes
        PERMISSION= read/write
        LV_STATE= opened/syncd
        WRITE_VERIFY= off
        PP_SIZE= 128
        SCHED_POLICY= parallel
        PP= 248
        BB_POLICY= non-relocatable
        RELOCATABLE= yes
        UPPER_BOUND= 32
        LABEL=
        MAPFILE= /tmp/vgdata/rootvg/hd6.map
        LV_MIN_LPS= 124
        STRIPE_WIDTH=
        STRIPE_SIZE=
        SERIALIZE_IO= no
        FS_TAG=
        DEV_SUBTYP=
Note: There are two disks in the 'LV_SOURCE_DISK_LIST', the 'COPIES' value reflects two copies, and the 'PP' value is double that of the 'LPs' value. The following is an example of the same lv_data stanza after manually breaking the mirror; the changed lines are LV_SOURCE_DISK_LIST, COPIES, and PP. Edit each 'lv_data' stanza in the image.data file as shown below to break the mirrors.
lv_data:
        VOLUME_GROUP= rootvg
        LV_SOURCE_DISK_LIST= hdisk0
        LV_IDENTIFIER= 00cead4a00004c0000000117b1e92c90.2
        LOGICAL_VOLUME= hd6
        VG_STAT= active/complete
        TYPE= paging
        MAX_LPS= 512
        COPIES= 1
        LPs= 124
        STALE_PPs= 0
        INTER_POLICY= minimum
        INTRA_POLICY= middle
        MOUNT_POINT=
        MIRROR_WRITE_CONSISTENCY= off
        LV_SEPARATE_PV= yes
        PERMISSION= read/write
        LV_STATE= opened/syncd
        WRITE_VERIFY= off
        PP_SIZE= 128
        SCHED_POLICY= parallel
        PP= 124
        BB_POLICY= non-relocatable
        RELOCATABLE= yes
        UPPER_BOUND= 32
        LABEL=
        MAPFILE= /tmp/vgdata/rootvg/hd6.map
        LV_MIN_LPS= 124
        STRIPE_WIDTH=
        STRIPE_SIZE=
        SERIALIZE_IO= no
        FS_TAG=
        DEV_SUBTYP=
Note: The 'LV_SOURCE_DISK_LIST' has been reduced to one disk, the 'COPIES' value has been changed to reflect one copy, and the 'PP' value has been changed so that it is equal to the 'LPs' value. Save the edited image.data file.

At this point you can use the edited image.data file to create a new mksysb to file, tape, or DVD. E.g.: To file or tape: place the edited image.data file in the / (root) directory and rerun your mksysb command without using the "-i" flag. If running the backup through SMIT, make sure you set the option "Generate new /image.data file?" to 'no' (by default it is set to 'yes'). To DVD: use the -i flag and specify the [/location] of the edited image.data file. If running through SMIT, specify the edited image.data file location in the "User supplied image.data file" field. Within NIM, you would create an 'image_data' resource for use with NIM to restore a mksysb without preserving mirrors.

Note: If you don't want to edit the image.data file manually, here's a script that you can use to have it updated to a single disk for you, assuming your image_data file is called /image.data:
#!/bin/ksh
# Rewrite /image.data for a single disk: set COPIES to 1 and
# halve the PP value of each mirrored logical volume stanza.
COPIESFLAG=0
cat /image.data | while read LINE ; do
   if [ "${LINE}" = "COPIES= 2" ] ; then
      COPIESFLAG=1
      echo "COPIES= 1"
   else
      if [ ${COPIESFLAG} -eq 1 ] ; then
         PP=`echo ${LINE} | awk '{print $1}'`
         if [ "${PP}" = "PP=" ] ; then
            PPNUM=`echo ${LINE} | awk '{print $2}'`
            ((PPNUMNEW=$PPNUM/2))
            echo "PP= ${PPNUMNEW}"
            COPIESFLAG=0
         else
            echo "${LINE}"
         fi
      else
         echo "${LINE}"
      fi
   fi
done > /image.data.1disk
If you want to list the files in a mksysb image first, you can run the following command:
# restore -Tqvf [/location/of/mksysb/file]
Check to make sure the block size of the tape drive has been changed:
# tctl -f /dev/rmt0 status
block_size      512     BLOCK size (0=variable length)
compress        yes     Use data COMPRESSION
density_set_1   71      DENSITY setting #1
density_set_2   ...     DENSITY setting #2
extfm           ...     Use EXTENDED file marks
mode            ...     Use DEVICE BUFFERS during writes
ret             ...     RETENSION on tape change or reset
Change to the /tmp directory (or a directory where you would like to store the /image.data file from the mksysb image) and restore the /image.data file from the tape:
# cd /tmp
# restore -s2 -xqvf /dev/rmt0.1 ./image.data
The fix is to uninstall/reinstall Powerpath, but you won't be able to until you remove the hdiskpower devices with this procedure:
1. # odmdelete -q name=hdiskpowerX -o CuDv
6. You must remove the modified files installed by PowerPath and then reboot the server. You will then be able to uninstall PowerPath after the reboot via the "installp -u EMCpower" command. The files to be removed are as follows (do not be concerned if some of the removals do not work, as PowerPath may not be fully configured properly):
# rm ./usr/lib/drivers/powerdiskdd
# rm ./usr/lib/libpn.a
# rm ./usr/lib/methods/cfgpower
# rm ./usr/lib/methods/cfgpowerdisk
# rm ./usr/lib/methods/chgpowerdisk
# rm ./usr/lib/methods/power.cat
# rm ./usr/lib/methods/ucfgpower
# rm ./usr/lib/methods/ucfgpowerdisk
# rm ./usr/lib/nls/msg/en_US/power.cat
# rm ./usr/sbin/powercf
# rm ./usr/sbin/powerprotect
# rm ./usr/sbin/pprootdev
# rm ./usr/lib/drivers/cgext
# rm ./usr/lib/drivers/mpcext
# rm ./usr/lib/libcg.so
# rm ./usr/lib/libcong.so
# rm ./usr/lib/libemcp_mp_rtl.so
# rm ./usr/lib/drivers/mpext
# rm ./usr/lib/libmp.a
# rm ./usr/sbin/emcpreg
# rm ./usr/sbin/powermt
# rm ./usr/share/man/man1/emcpreg.1
# rm ./usr/share/man/man1/powermt.1
# rm ./usr/share/man/man1/powerprotect.1
Re-install PowerPath.
You then would need to make sure that all the local adapter IP addresses are entered in /etc/hosts. After that is complete, for every adapter on the system you would apply:
# host
This will ensure a host command generates the same output (the hostname) with and without /etc/netsvc.conf. That way, you'll know you can continue to do certain things while troubleshooting a DNS problem. TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
[garbled svmon -G output; the recoverable fragments are the pers, pgsp and other page counts (pers 0, pgsp 12885, other 163295) and the 4 KB and 64 KB PageSize pool lines]
In this example, the memory-virtual value is 2982321, and the memory-size value is 5079040. Note that the actual memory-inuse is nearly the same as the memory-size value. This is simply AIX caching as much as possible in its memory. Hence, the memory-free value is typically very low. Now, to determine the actual memory consumption, divide memory-virtual by memory-size:
# bc
scale=2
2982321/5079040
.58
Thus, the actual memory consumption is 58% of the memory (5079040 blocks of 4 KB = 19840 MB). The free memory is thus: (100% - 58%) * 19840 MB = 8332 MB. Try to keep the value of memory consumption less than 90%. Above that, you will generally start seeing paging activity using the vmstat command. By that time, it is a good idea to lower the load on the system or to get more memory in your system.
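The same calculation can be scripted instead of typed into bc; a minimal sketch using awk (the virtual and size page counts are the values from the example above, and the variable names are mine):

```shell
# Memory consumption from svmon figures: "virtual" and "size"
# are counts of 4 KB pages, as in the example above.
virtual=2982321
size=5079040
awk -v v="$virtual" -v s="$size" 'BEGIN {
    pct     = 100 * v / s            # percent of real memory in use
    free_mb = (s - v) * 4 / 1024     # unused pages, converted to MB
    printf "consumption: %.0f%%, free: %.0f MB\n", pct, free_mb
}'
```

The free-memory figure differs slightly from the 8332 MB in the text because the text rounds the percentage to 58% before multiplying.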
Using NFS
The Networked File System (NFS) is one of a category of filesystems known as distributed filesystems. It allows users to access files resident on remote systems without even knowing that a network is involved, and thus allows filesystems to be shared among computers. These remote systems could be located in the same room or could be miles away. In order to access such files, two things must happen. First, the remote system must make the files available to other systems on the network. Second, these files must be mounted on the local system to be able to access them. The mounting process makes the remote files appear as if they are resident on the local system. The system that makes its files available to others on the network is called a server, and the system that uses a remote file is called a client.
NFS Server
NFS consists of a number of components, including a mounting protocol, a file locking protocol, an export file and daemons (mountd, nfsd, biod, rpc.lockd, rpc.statd) that coordinate basic file services.
Systems using NFS make the files available to other systems on the network by "exporting" their directories to the network. An NFS server exports its directories by putting the names of these directories in the /etc/exports file and executing the exportfs command. In its simplest form, /etc/exports consists of lines of the form:
pathname -option, option ...
Where pathname is the name of the file or directory to which network access is to be allowed; if pathname is a directory, then all of the files and directories below it within the same filesystem are also exported, but not any filesystems mounted within it. The next fields in the entry consist of various options that specify the type of access to be given and to whom. For example, a typical /etc/exports file may look like this:
/cyclop/users -access=homer:bart,root=homer
/usr/share/man -access=marge:maggie:lisa
/usr/mail
This export file permits the filesystem /cyclop/users to be mounted by homer and bart, and allows root access to it from homer. In addition, it lets /usr/share/man be mounted by marge, maggie and lisa. The filesystem /usr/mail can be mounted by any system on the network. Filesystems listed in the export file without a specific set of hosts are mountable by all machines. This can be a sizable security hole. When used with the -a option, the exportfs command reads the /etc/exports file and exports all the directories listed to the network. This is usually done at system startup time.
# exportfs -va
If the contents of /etc/exports change, you must tell mountd to reread it. This can be done by re-executing the exportfs command after the export file is changed. The exact attributes that can be specified in the /etc/exports file vary from system to system. The most common attributes are:
-access=list : Colon-separated list of hostnames and netgroups that can mount the filesystem.
-anon : Specifies the UID that should be used for requests coming from an unknown user. Defaults to nobody.
-hostname : Allow hostname to mount the filesystem.
-ro : Export read-only; no clients may write on the filesystem.
-root=list : Lists hosts permitted to access the filesystem as root. Without this option, root access from a client is equivalent to access by the user nobody (usually UID -1).
-rw=list : List enumerates the hosts allowed to mount for writing; all others must mount read-only.
For example:

/cyclop/users -rw=moe,anon=-1
/usr/inorganic -ro
This allows moe to mount /cyclop/users for reading and writing, and maps anonymous users (users from other hosts that do not exist on the local system, and the root user from any remote system) to the UID -1. This corresponds to the nobody account, and it tells NFS not to allow such users access to anything.
NFS Clients
After the files, directories and/or filesystems have been exported, an NFS client must explicitly mount them before it can use them. The mount request is handled by the mountd daemon (sometimes called rpc.mountd). The server examines the mount request to be sure the client has proper authorization. The following syntax is used for the mount command. Note that the name of the server is followed by a colon and the directory to be mounted:
# mount server1:/usr/src /src
Here, the directory structure /usr/src resident on the remote system server1 is mounted on the /src directory on the local system. When the remote filesystem is no longer needed, it is unmounted with the umount command:
# umount server1:/usr/src
The mount command can be used to establish temporary network mounts, but mounts that are part of a system's permanent configuration should be either listed in /etc/filesystems (for AIX) or handled by an automatic mounting service such as automount or amd.
NFS Commands
lsnfsexp : Displays the characteristics of directories that are exported with NFS:

# lsnfsexp
/software -ro
mknfsexp -d path -t ro : Exports a read-only directory to NFS clients and adds it to /etc/exports.
# mknfsexp -d /software -t ro
Exported /software
# lsnfsexp
/software -ro
rmnfsexp -d path : Unexports a directory from NFS clients and removes it from /etc/exports.
# rmnfsexp -d /software
Start/Stop/Status NFS daemons In the following discussion, reference to daemon implies any one of the SRC-controlled daemons (such as nfsd or biod). The NFS daemons can be automatically started at system (re)start by including the /etc/rc.nfs script in the /etc/inittab file. They can also be started manually by executing the following command:
# startsrc -s Daemon or startsrc -g nfs
Where the -s option will start the individual daemons and -g will start all of them. These daemons can be stopped one at a time or all at once by executing the following command:
# stopsrc -s Daemon or stopsrc -g nfs
You can get the current status of these daemons by executing the following commands:
# lssrc -s [Daemon]
# lssrc -a
If the /etc/exports file does not exist, the nfsd and the rpc.mountd daemons will not start. You can get around this by creating an empty /etc/exports file. This will allow the nfsd and the rpc.mountd daemons to start, although no filesystems will be exported. TOPICS: AIX, SYSTEM ADMINISTRATION
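That workaround amounts to nothing more than creating an empty file; a sketch (using /tmp/exports.demo as a stand-in path, since the real file is /etc/exports):

```shell
# nfsd and rpc.mountd will not start without an exports file;
# an empty one lets them start while exporting nothing.
EXPORTS=/tmp/exports.demo    # stand-in for /etc/exports
[ -f "$EXPORTS" ] || touch "$EXPORTS"
ls -l "$EXPORTS"
```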
show non-zero values over "long" periods of time, then the system is short on memory. (All systems will show occasional paging, which is not a concern.) Memory requirements for applications can be empirically determined using the AIX "rmss" command. The "rmss" command is a test tool that dynamically reduces usable memory. The onset of paging indicates an application's minimum memory requirement. Finally, the "svmon" command can be used to list how much memory is used by each process. The interpretation of the svmon output requires some expertise. See the AIX documentation for details. To test the performance gain of leaving a file in memory, a 40 MB file was read twice. The first read was from disk, the second was from memory. The first read took 10.0 seconds. The second read took 1.3 seconds: a 7.7x improvement. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
Each row describes one disk. The first column shows its name, followed by the PVID and the volume group it belongs to. "None" in the last column indicates that the disk does not belong to any volume group. "Active" in the last column indicates that the volume group is varied on. The presence of a PVID suggests that there may be data on the disk; it is possible that such a disk belongs to a volume group which is varied off. Executing lspv with a disk name generates information only about this device:
# lspv hdisk4
PHYSICAL VOLUME:  hdisk4                  VOLUME GROUP:  abc_vg
PV IDENTIFIER:    00c03c8a14fa936b        VG IDENTIFIER: 00c03b1a000
PV STATE:         active
STALE PARTITIONS: 0                       ALLOCATABLE:   yes
PP SIZE:          16 megabyte(s)
TOTAL PPs:        639 (10224 megabytes)
FREE PPs:         599 (9584 megabytes)
USED PPs:         40 (640 megabytes)
In the case of hdisk4, we can determine its size, the number of logical volumes, the number of physical partitions in need of synchronization (STALE PARTITIONS) and the number of VGDAs. Executing lspv against a disk without volume group membership produces nothing useful:
# lspv hdisk2
0516-304: Unable to find device id hdisk2 in the Device configuration database
How do you establish the capacity of a disk that does not belong to a volume group? The next command provides this in megabytes:
# bootinfo -s hdisk2
10240
The same (and much more) information can be retrieved by executing lsattr -El hdisk#:
# lsattr -El hdisk0
PCM             PCM/scsiscsd     Path Control Module            False
algorithm       fail_over        Algorithm                      True
dist_err_pcnt   0                Distributed Error Percentage   True
dist_tw_width   50               Distributed Error Sample Time  True
hcheck_interval 0                Health Check Interval          True
hcheck_mode     nonactive        Health Check Mode              True
max_transfer    0x40000          Maximum TRANSFER Size          True
pvid            00c609e0a5ec1460 Volume identifier              False
queue_depth     3                Queue DEPTH                    False
reserve_policy  single_path      Reserve Policy                 True
size_in_mb      73400            Size in Megabytes              False
unique_id       26080084C1AF0FHU Unique identifier              False
The last command can be limited to show only the size if executed as shown:
# lsattr -El hdisk0 -a size_in_mb
size_in_mb 73400 Size in Megabytes False
A disk can get a PVID in one of two ways: by virtue of membership in a volume group (when running the extendvg or mkvg commands), or as the result of executing the chdev command. The lqueryvg command helps to establish whether or not there is data on the disk.
# lqueryvg -Atp hdisk2
0516-320 lqueryvg: hdisk2 is not assigned to a volume group.

Run against a disk that does belong to a volume group, the command produces output like this:

Max LVs:         256             Total PPs:      1117
PP Size:         26              LTG size:       128
Free PPs:        1117            HOT SPARE:      0
LV count:        0               AUTO SYNC:      0
PV count:        3               VG PERMISSION:  0
Total VGDAs:     3               SNAPSHOT VG:    0
Conc Allowed:    0               IS_PRIMARY VG:  0
MAX PPs per PV:  1016            PSNFSTPP:       4352
MAX PVs:         32              VARYON MODE:    ???????
Quorum (disk):   1               VG Type:        0
Quorum (dd):     1               Max PPs:        32512
Auto Varyon ?:   1
Conc Autovaryon: 0
Varied on Conc:  0
Physical:        00c03b1a32e50767  1  0
                 00c03b1a32ee4222  1  0
                 00c03b1a9db2f183  1  0
It is easy to notice that a disk belongs to a volume group. Logical volume names are the best proof of this. To display data stored on a disk you can use the command lquerypv.
A PVID can be assigned to or removed from a disk if it does not belong to a volume group, by executing the command chdev.
# chdev -l hdisk2 -a pv=clear
hdisk2 changed
# lspv | grep hdisk2
hdisk2          none          None
At times, it is required to restrict access to a disk or to its capacity. You can use the chpv command for this purpose. To prevent I/O access to a disk:
# chpv -v r hdisk2
To allow I/O:
# chpv -v a hdisk2
AIX was created years ago, when disks were very expensive. I/O optimization, the decision which part of the data will be read/written faster than other data, was determined by its position on the disk. Between I/Os, disk heads are parked in the middle. Accordingly, the fastest I/O takes place in the middle. With this in mind, a disk is divided into five bands: outer edge, outer middle, center, inner middle and inner edge. This method of assigning physical partitions (logical volumes) as a function of a band on the disk is called the intra-physical allocation policy. This policy, and the policy defining the spread of a logical volume across disks (the inter-physical allocation policy), gain importance when creating logical volumes. Disk topology, the range of physical partitions in each band, is visualized with the commands lsvg -p vg_name and lspv hdisk#. Note the last two lines of the lspv output:
FREE DISTRIBUTION: 128..88..127..128..128
USED DISTRIBUTION: 00..40..00..00..00
The row labeled FREE DISTRIBUTION shows the number of free PPs in each band. The row labeled USED DISTRIBUTION shows the number of used PPs in each band. As you can see, some bands of this disk have no data. Presently, this policy has lost its meaning, as even the slowest disks are much faster than their predecessors. In the case of RAID or SAN disks, this policy has no meaning at all. For those who still use individual SCSI or SSA disks, it is good to remember that the data closest to the outer edge is read/written the slowest. To learn what logical volumes are located on a given disk, you can execute the command lspv -l hdisk#. The reverse relation is established by executing lslv -M lv_name. It is always a good idea to know what adapter and what bus any disk is attached to. Otherwise, if one of the disks breaks, how will you know which disk needs to be removed and replaced? AIX has many commands that can help you. It is customary to start from the adapter, and to identify all adapters known to the kernel:
# lsdev -Cc adapter | grep -i scsi
scsi0 Available 1S-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Available 1S-09 Wide/Ultra-3 SCSI I/O Controller
scsi2 Available 1c-08 Wide/Fast-20 SCSI I/O Controller
The last command produced information about the SCSI adapters present during the last execution of the cfgmgr command. This output also allows you to establish in which drawer the adapter is located. The listing tells us that there are three SCSI adapters. The second column shows the device state (Available: ready to be used; Defined: device needs further configuration). The next column shows its location (drawer/bus). The last column contains a short description. Executing a similar command against a disk from rootvg produces:
# lsdev -Cc disk -l hdisk0 hdisk0 Available 1S-08-00-8,0 16 Bit LVD SCSI Disk Drive
From both outputs we can determine what SCSI adapter controls this disk - scsi0. Also, we see that disk has SCSI ID 8,0. How to determine the type/model/capacity/part number, etc?
# lscfg -vl hdisk0 hdisk0 U0.1-P2/Z1-A8 16 Bit LVD SCSI Disk Drive (36400 MB)
Manufacturer................IBM Machine Type and Model......IC35L036UCDY10-0 FRU Number..................00P3831 ROS Level and ID............53323847 Serial Number...............E3WP58EC EC Level....................H32224 Part Number.................08K0293 Device Specific.(Z0)........000003029F00013A Device Specific.(Z1)........07N4972 Device Specific.(Z2)........0068 Device Specific.(Z3)........04050 Device Specific.(Z4)........0001 Device Specific.(Z5)........22 Device Specific.(Z6)........
You can get more details by executing command: lsattr -El hdisk0. This article has been based on an article published on wmduszyk.com. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
NOTE: before setting this environment variable, the previous commands in your history will have a question mark in the timestamp field. If you use the fc command, you will have to use the "-t" option to see the timestamp:
# fc -t
TOPICS: AIX, EMC, POWERHA / HACMP, STORAGE, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
To resolve this, you will have to make sure that the SCSI reset disk method is configured in HACMP. For example, when using EMC storage: Make sure emcpowerreset is present in /usr/lpp/EMC/Symmetrix/bin/emcpowerreset. Then add new custom disk method: Enter into the SMIT fastpath for HACMP "smitty hacmp". Select Extended Configuration. Select Extended Resource Configuration. Select HACMP Extended Resources Configuration.
Select Configure Custom Disk Methods. Select Add Custom Disk Methods.
Change/Show Custom Disk Methods
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                               [Entry Fields]
* Disk Type (PdDvLn field from CuDv)           disk/pseudo/power
* New Disk Type                                [disk/pseudo/power]
* Method to identify ghost disks               [SCSI3]
* Method to determine if a reserve is held     [SCSI_TUR]
* Method to break reserve                      [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset]
  Break reserves in parallel                   true
* Method to make the disk available            [MKDEV]
This will start it in the background and it will keep on running even if you log out. Option three: Run it with an ampersand: command & This will run it in the background. But the process will be killed if you log out. You can avoid the process being killed by running: nohup command &. Option four:
Schedule it one time in the crontab. With all options, make sure you redirect any output and errors to a file, like:
# command > command.out 2>&1
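The redirection idiom above can be tried with any command that writes to both streams; a small sketch (the echo commands stand in for a real job, and /tmp/job.out is a made-up log path):

```shell
# "> file 2>&1" sends stdout to the file, then points stderr
# at the same place, so both streams end up in one log.
{ echo "normal output"; echo "error output" >&2; } > /tmp/job.out 2>&1
cat /tmp/job.out
```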
The same type of information can be found using the fsdb command. Start the fsdb command with the file system where the file is located; in the example below, the root file system. Then type the number of the inode, followed by "i":
# fsdb /
File System:                  /
File System Size:             2097152  (512 byte blocks)
Disk Map Size:                20       (4K blocks)
Inode Map Size:               38       (4K blocks)
Fragment Size:                4096     (bytes)
Allocation Group Size:        2048     (fragments)
Inodes per Allocation Group:  4096
Total Inodes:                 524288
Total Fragments:              262144
szl: 6607
at: Tue May 04 14:00:37 2010
mt: Wed Jan 06 06:25:49 2010
ct: Wed Jan 06 06:25:49 2010
To learn more about the multiple page size support in AIX, please read the related whitepaper. TOPICS: AIX, SYSTEM ADMINISTRATION
An "unknown" entry appears when somebody tried to log on with a user id which is not known to the system. It would be possible to show the userid they attempted to use, but this is not done as a common mistake is to enter the password instead of the userid. If this was recorded it would be a security risk. TOPICS: AIX, SYSTEM ADMINISTRATION
df -I
The "-I" flag (a capital "i") of the df command shows the actual used space within file systems, instead of the percentages the regular df command gives you:
# df -g
Filesystem    GB blocks   Free %Used  Iused %Iused Mounted on
/dev/hd4           1.00   0.76   25%   5255     2% /
/dev/hd2           4.00   1.20   70%  55403     6% /usr
/dev/hd9var        1.00   0.74   27%   5324     3% /var
/dev/hd3           1.00   0.54   46%    325     1% /tmp
/dev/hd1           1.00   0.97    4%   1334     1% /home
/proc                 -      -     -      -      - /proc
/dev/hd10opt       0.50   0.31   39%   4162     4% /opt
# df -gI
Filesystem    GB blocks  Used  Free %Used Mounted on
/dev/hd4           1.00  0.24  0.76   25% /
...
/dev/hd10opt       0.50  0.19  0.31   39% /opt
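If your df lacks the -I flag, the Used column can be derived as size minus free with awk; a sketch run against a sample of the output above (the column positions are an assumption and may differ per system):

```shell
# Derive "Used" GB as (GB blocks - Free) from regular df-style output.
awk 'NR > 1 && $2 != "-" { printf "%-14s %5.2f GB used\n", $1, $2 - $3 }' <<'EOF'
Filesystem    GBblocks  Free  %Used  Mounted on
/dev/hd4          1.00  0.76    25%  /
/dev/hd2          4.00  1.20    70%  /usr
EOF
```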
On older AIX versions, or other UNIX operating systems, you may want to use the following command to get the same answer:
# perl -MPOSIX -le 'print time'
Getting this UNIX timestamp can be very useful when doing calculations with time stamps. If you need to convert a UNIX timestamp back to something readable:
now=`perl -MPOSIX -le 'print time'`
# 3 months ago =
# 30 days * 3 months * 24 hours * 60 minutes * 60 seconds = 7776000 seconds.
let threemonthsago="${now}-7776000"
perl -MPOSIX -le "print scalar(localtime($threemonthsago))"
Sdiff
A very useful command to compare 2 files is sdiff. Let's say you want to compare the lslpp output from 2 different hosts; sdiff -s shows the differences between the two files next to each other:
# sdiff -s /tmp/a /tmp/b
                                      >   bos.loc.com.utf         5.3.9.0
                                      >   bos.loc.utf.EN_US       5.3.0.0
gskta.rte              7.0.3.27       |   gskta.rte               7.0.3.17
lum.base.cli           5.1.2.0        |   lum.base.cli            5.1.0.0
rsct.compat.basic.sp   2.4.10.0       <
rsct.compat.clients.sp 2.4.10.0       <
Date/Time:       Sun May 17 22:11:46 PDT 2009
Sequence Number: 8539
Machine Id:      00GB214D4C00
Node Id:         blahblah
Class:           O
Type:            INFO
Resource Name:   RMCdaemon
Probable Causes The current default log file has been renamed and a new log file created.
Failure Causes The current log file has become too large.
This error report entry refers to a file that was created, called /var/ct/IW/log/mc/default. Actually, when the file reaches 256 Kb, a new one is created, and the old one is renamed to default.last. The following messages can be found in this file:
2610-217 Received 193 unrecognized messages in the last 10.183333 minutes. Service is rmc.
This message more or less means: "2610-217 Received count of unrecognized messages unrecognized messages in the last time minutes. Service is service_name. Explanation: The RMC daemon has received the specified number of unrecognized messages within the specified time interval. These messages were received on the UDP port, indicated by the specified service name, used for communication among RMC daemons. The most likely cause of this error is that this port number is being used by another application. User Response: Validate that the port number configured for use by the Resource Monitoring and Control daemon is only being used by the RMC daemon." Check if something else is using the port of the RMC daemon:
# grep RMC /etc/services
rmc             657/tcp    # RMC
rmc             657/udp    # RMC
# lsof -i :657
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
rmcd    1384574 root    3u                  0t0  UDP  *:rmc
rmcd    1384574 root   14u                  0t0  TCP  *:rmc (LISTEN)
# netstat -Aan | grep 657
f1000600022fd398 tcp    0  0  *.657  *.*  LISTEN
f10006000635f200 udp    0  0  *.657  *.*
No, it is actually the RMC daemon that is using this port, so this is fine. Start an IP trace to find out who's transmitting to this port:
# iptrace -a -d host1 -p 657 /tmp/trace.out
# ps -ef | grep iptrace
    root 2040018 iptrace -a -d host1 -p 657 /tmp/trace.out
# kill 2040018
iptrace: unload success!
# ipreport -n /tmp/trace.out > /tmp/trace.fmt
The IP trace report shows only messages from the RMC daemon of the HMC:
Packet Number 3
====( 458 bytes received on interface en4 )==== 12:12:34.927422418
ETHERNET packet : [14:5e:81:60:9d -> 14:5e:db:29:9a] type 800 (IP)
IP header breakdown:
    < SRC = 10.231.21.55 >  (hmc)
    < DST = 10.231.21.54 >  (host1)
    ip_v=4, ip_hl=20, ip_tos=0, ip_len=444, ip_id=0, ip_off=0 DF
    ip_ttl=64, ip_sum=f8ce, ip_p = 17 (UDP)
UDP header breakdown:
    [ udp length = 424 | udp checksum = 6420 ]
00000000  02007108 00000000 4a03134a 40000000  |..q.....J..J@...|
          ... (remainder of the 424-byte UDP payload hex dump omitted) ...
Monitor the /var/ct/3410054220/log/mc/default file on the LPAR and make sure you see NEW 2610-217 errors logged after starting the trace; you may need to wait up to 10 minutes, since one 2610-217 entry is logged every 10 minutes. To monitor the default file, do:
# tail -f /var/ct/3410054220/log/mc/default
When analyzing the data you may find several nodeids in the packets. On the HMC side, you can run /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc to find out if a nodeid such as 22758085eb959fec is managed by the HMC. You will need root access on the HMC to run this command; you can get a temporary password from IBM to use with the pesh command as the hscpe user to obtain this root access. This command will list the managed systems known to the HMC and their nodeids. Then, on the actual LPARs, run /usr/sbin/rsct/bin/lsnodeid to determine the nodeid of each LPAR. If you find any discrepancies between the HMC's listing of nodeids and the nodeids found on the LPARs, that is what is causing the errpt message about the change of the log file. To solve this, you have to recreate the RMC daemon databases on both the HMC and on the LPARs that have this issue. On the HMC side run:
# /usr/sbin/rsct/bin/rmcctrl -z
# /usr/sbin/rsct/bin/rmcctrl -A
# /usr/sbin/rsct/bin/rmcctrl -p
0513-071 The ctrmc Subsystem has been added.
0513-059 The ctrmc Subsystem has been started. Subsystem PID is 194568.
# /usr/sbin/rsct/bin/lsnodeid
6bcaadbe9dc8904f
Repeat this for every LPAR connected to the HMC. After that, you can run on the HMC again:
# /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc
# /usr/sbin/rsct/bin/lsrsrc IBM.ManagedNode Hostname UniversalId
After that, all you have to do is check on the LPARs if any messages are logged in 10 minute intervals:
# ls -als /var/ct/IW/log/mc/default
Create an NFS export on the NIM server (or another server where you have the necessary TL) and mount it on the p6. Proceed to do the upgrade, change the bootlist, and exit the shell. The server will then boot with the new TL on the p6. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
If you have lsof installed, you can get the same result with the lsof command:
# lsof -i :[PORT]
Example:
# lsof -i :5710
COMMAND     PID USER   FD  TYPE DEVICE SIZE/OFF NODE NAME
oracle  2638066 oracle 18u                      TCP  host:5710
SCP Stalls
When you encounter an issue where ssh through a firewall works perfectly, but scp of large files (for example mksysb images) stalls, there is a solution to this problem: add "-l 8192" to the scp command. The reason for scp to stall is that scp greedily grabs as much bandwidth of the network as possible when it transfers files; any delay caused by the network switch or the firewall can easily stall the TCP connection. Adding the option "-l 8192" limits the scp session bandwidth to 8192 Kbit/second, which seems to work safely and fast enough (up to 1 MB/second):
# scp -l 8192 SOURCE DESTINATION
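The -l value is in Kbit/second, which is why 8192 comes out at roughly 1 MB/second; a quick check of the arithmetic:

```shell
# 8192 Kbit/s divided by 8 bits/byte = 1024 KB/s = 1 MB/s.
echo "8192 Kbit/s = $((8192 / 8)) KB/s = $((8192 / 8 / 1024)) MB/s"
```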
First, run:
# lsslot -c pci
This will find the adapter involved. Then, find the parent device of a slot, by running:
# lsdev -Cl [adapter] -F parent
(Fill in the correct adapter, e.g. fcs0). Now, remove the parent device and all its children:
# rmdev -Rl [parentdevice] -d
For example:
# rmdev -Rl pci8 -d
Now you should be able to remove the adapter via the HMC from the LPAR. If you need to replace the adapter because it is broken, then you need to power down the PCI slot in which the adapter is placed: after issuing the "rmdev" command, run diag and go into "Task Selection", "Hot Plug Task", "PCI Hot Plug Manager", "Replace/Remove a PCI Hot Plug Adapter". Select the adapter and choose "remove". After the adapter has been replaced (usually by an IBM technician), run cfgmgr again to make the adapter known to the LPAR. TOPICS: AIX, SYSTEM ADMINISTRATION
Setting up for Kernel boot trace: When the debugger screen appears, set enter_dbg to the value we want to use:
************* Welcome to KDB *************
Call gimmeabreak...
Static breakpoint:
.gimmeabreak+000000    tweq  r8,r8    r8=0000000A
.gimmeabreak+000004    blr
<.kdb_init+0002C0>     r3=0
KDB(0)> mw enter_dbg
enter_dbg+000000:  00000000 = 42
xmdbg+000000:      00000000 = .
KDB(0)> g
Now, detailed boot output will be displayed on the console. If your system completes booting, you will want to turn enter_dbg off:
************* Welcome to KDB *************
Call gimmeabreak...
Static breakpoint:
.gimmeabreak+000000    tweq  r8,r8    r8=0000000A
.gimmeabreak+000004    blr
<.kdb_init+0002C0>     r3=0
KDB(0)> mw enter_dbg
enter_dbg+000000:  00000042 = 0
xmdbg+000000:      00000000 = .
KDB(0)> g
If you now restart your system from hdisk1, you will notice that the original rootvg has been renamed to old_rootvg. To delete this volume group (in case you're satisfied with the new rootvg), type:
# alt_rootvg_op -X old_rootvg
A very good article about alternate disk installs can be found on developerWorks. If you wish to copy a mirrored rootvg to two other disks, make sure to use quotes around the target disks, e.g. if you wish to create a copy on disks hdisk4 and hdisk5, run:
# alt_disk_copy -d "hdisk4 hdisk5"
Installation history
A very easy way to see what was installed recently on your system:
# lslpp -h
Change the name of the node, which changes what uname reports, by choosing one of the following methods. Command line method:
# uname -S [newhostname]
Change the /etc/hosts file to reflect the new hostname. Change DNS name server, if applicable. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
3. Create a local user, e.g. "share" (user IDs from Active Directory can not be used): Settings -> Control Panel -> User Accounts -> Advanced tab -> Advanced button -> Select Users -> Right click in the right window and select "New User" -> Enter the user name, enter the password twice, deselect "User must change password at next logon", and click on Create, Close and OK.
4. Make sure the folder on the D: drive (in this case "share") is shared, give the share a name (we'll use "share" again as the name in this example), and give "full control" permissions to "Everyone".
5. Create a mountpoint on the AIX machine to mount the Windows share on, e.g. /mnt/share.
6.
7.
You're done!
To switch from 32-bit mode to 64-bit mode run the following commands, in the given order:
# ln -sf /usr/lib/boot/unix_64 /unix
# ln -sf /usr/lib/boot/unix_64 /usr/lib/boot/unix
# bosboot -ad /dev/ipldevice
# shutdown -Fr
To switch from 64-bit mode to 32-bit mode run the following commands, in the given order:
# ln -sf /usr/lib/boot/unix_mp /unix
# ln -sf /usr/lib/boot/unix_mp /usr/lib/boot/unix
# bosboot -ad /dev/ipldevice
# shutdown -Fr
Bootinfo vs Getconf
The command /usr/sbin/bootinfo has traditionally been used to find out information regarding system boot devices, kernel versions, and disk sizes. This command has been deprecated in favor of the command /usr/bin/getconf. The bootinfo man page has been removed, and the command is only used in AIX by the booting and software installation utilities. It should not be used in customer-created shell scripts or run by hand. The getconf command will report much of the same information that bootinfo will. What was the device the system was last booted from?
# getconf BOOT_DEVICE hdisk0
For example, you can add this line to /etc/profile, and have the hostname of the PuTTY title set automatically. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
6.
7.
Now you've created an ISO image that you can burn to a DVD. Some specific information on burning this ISO image on AIX to a DVD-RAM: Burning a DVD-RAM is somewhat more difficult than burning a CD. First, it depends if you have a slim-line DVD-RAM drive in a Power5 system or a regular DVD-RAM drive in Power4 systems (not slimline). Use DLPAR to move the required SCSI controller to a LPAR, in order to be able to use the DVD-RAM drive. After the DLPAR action of the required SCSI controller is complete, execute: cfgmgr. After the configuration manager has run, you will end up with either 1 or 2 DVD drives (depending on the actual drives in the hardware frame):
# lsdev -Cc cdrom cd0 Available 3F-09-00-0,0 SCSI DVD-RAM Drive cd1 Available 3F-09-00-5,0 16 Bit LVD SCSI DVD-ROM Drive
As you can see, the first is the DVD-RAM, the second is a DVD-ROM. Therefore, we will use the first one (in this sample). Place a DVD-RAM single sided 4.7 GB Type II disc (part number 19P0862) in the drive. DO NOT USE ANY OTHER TYPE OF DVD-RAM DISCS. OTHER TYPES OF DISCS ARE NOT SUPPORTED BY IBM. In case you have a POWER4 system: be sure to keep the DVD-RAM in its case in order to burn the DVD. DVD-RAM drives in Power4 systems will NOT burn if you removed the DVD-RAM from its case. Also, be sure to have the latest firmware level on the DVD-RAM drive (see the website http://www14.software.ibm.com/webapp/set2/firmware for the correct level of the firmware for your drive). Without this firmware level these DVD-RAM drives are unable to burn Type II DVD-RAM discs. Using lscfg -vl cd0 you can check the firmware level:
# lscfg -vl cd0 cd0 U1.9-P2-I1/Z2-A0 SCSI DVD-RAM Drive (4700 MB)
The firmware level of this DVD-RAM drive is "A132". This level is too low in order to be able to burn Type II discs. Check the website for the latest level. The description on this webpage on how to install the DVD-RAM firmware was found to be inaccurate. Install firmware as follows: Download the firmware file and place it in /tmp on the server. You will get a filename with a "rpm" extension. Run:
# rpm -ihv --ignoreos <filename>
Example:
# rpm -ihv --ignoreos /tmp/ibm-scsi-dvdram.dvrm00203-A151.rpm
ibm-scsi-dvdram.dvrm00203 #############################
(Beware of the double dash before "ignoreos"!!). This command will place the microcode in /etc/microcode. Run:
# diag -d cd0 -c -T "download -s /etc/microcode -f"
This will install the firmware. Use the correct DVD-RAM drive (in this case cd0) to install the firmware!!
# diag -d cd0 -c -T "download -s /etc/microcode -f"
Installation of the microcode has completed successfully.
The current microcode for cd0 is IBM-DVRM00203.A151.
Please run diagnostics on the device to ensure that it is functioning properly.
Burning a DVD-RAM can take a long time. Variable burn times from 1 to 7 hours were seen!!! A DVD-RAM made in a slim-line DVD drive on a Power5 system can be read in a regular DVD drive on a Power4 system, if the latest firmware is installed on the DVD drive. On a Linux system you can use a tool like K3B to write the ISO image to a regular DVD+R disc. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
Portmir
A very nice command to use when you either want to show someone remotely how to do something on AIX, or to allow a non-root user to have root access, is portmir.
First of all, you need 2 users logged into the system, you and someone else. Ask the other person to run the tty command in his/her telnet session and to tell you the result. For example:
user$ tty
/dev/pts/1
(Of course, fill in the correct tty number of your system; it won't be /dev/pts/1 all the time everywhere!) Then, from your own session, start mirroring that tty:
# portmir -t /dev/pts/1
Now every command on screen 1 is repeated on screen 2, and vice versa. You can both run commands on 1 screen. You can stop it by running:
# portmir -o
If you're the root user and the other person temporarily requires root access to do something (and you can't solve it by giving the other user sudo access, hint, hint!), then you can su - to root in the portmir session, allowing the other person to have root access, while you can see what he/she is doing. You may run into issues when you resize a screen, or if you use different types of terminals. Make sure you both have the same $TERM setting, e.g. xterm. If you resize the screen, and the other doesn't, you may need to run the tset and/or the resize commands. TOPICS: AIX, BACKUP & RESTORE, STORAGE, SYSTEM ADMINISTRATION
JFS2 snapshots
JFS2 filesystems allow you to create file system snapshots. Creating a snapshot is actually creating a new file system, with a copy of the metadata of the original file system (the snapped FS). The snapshot (like a photograph) remains unchanged, so it's possible to backup the snapshot, while the original data can be used (and changed!) by applications. When data on the original file system changes, while a snapshot exists, the original data is copied to the snapshot to keep the snapshot in a consistent state. For these changes, you'll need temporary space, thus you need to create a snapshot of a specific size to allow updates while the snapshot exists. Usually 10% is enough. Database file systems are usually not very good candidates for creating snapshots, because all database files change constantly when the database is active, causing a lot of copying of data from the original to the snapshot file system. In order to have a snapshot you have to: Create and mount a JFS2 file system (source FS). You can find it in SMIT as "enhanced" file system.
Create a snapshot of a size big enough to hold the changes of the source FS by issuing smitty crsnap. Once you have created this snapshot as a logical device or logical volume, there's a read-only copy of the data in source FS. You have to mount this device in order to work with this data.
Mount your snapshot device by issuing smitty mntsnap. You have to provide a directory name over which AIX will mount the snapshot. Once mounted, this device will be read-only. Creating a snapshot of a JFS2 file system:
# snapshot -o snapfrom=$FILESYSTEM -o size=${SNAPSIZE}M
Where $FILESYSTEM is the mount point of your file system and $SNAPSIZE is the amount of megabytes to reserve for the snapshot. Check if a file system holds a snapshot:
# snapshot -q $FILESYSTEM
When the snapshot runs full, it is automatically deleted. Therefore, create it large enough to hold all changed data of the source FS. Mounting the snapshot: Create a directory:
# mkdir -p /snapshot$FILESYSTEM
Then mount the snapshot device (as reported by snapshot -q) read-only over that directory, for example:
# mount -v jfs2 -o snapshot /dev/fslv00 /snapshot$FILESYSTEM
Now you can backup your data from the mountpoint you've just mounted. When you're finished with the snapshot: Unmount the snapshot filesystem:
# unmount /snapshot$FILESYSTEM
When you restore data from a snapshot, be aware that the backup of the snapshot is actually a different file system in your backup system, so you have to specify a restore destination to restore the data to. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
PVID trouble
To add a PVID to a disk, enter:
# chdev -l vpathxx -a pv=yes
Cec Monitor
To monitor all lpars within 1 frame, use:
# topas -C
The yes command will continuously echo "y" to /dev/null. This is a single-threaded process, so it will put load on a single processor. If you wish to put load on multiple processors, why not run yes a couple of times? TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION
If you get this error from crontab, then you should escape the backslash before the semicolon with an extra backslash. Use the find command like this:
0 2 * * * find /tmp -mtime +5 -type f -exec rm {} \\;
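Before trusting cron with an rm, the construct can be dry-run interactively with echo in place of rm (the scratch path below is made up):

```shell
# Set up a scratch directory with one file, then do a harmless dry run
mkdir -p /tmp/demo.find
touch /tmp/demo.find/old.txt
# Same find construct, but printing instead of removing; note that on the
# command line a single backslash before the semicolon is sufficient
find /tmp/demo.find -type f -exec echo would-remove {} \;
```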
CuAt:
        name = "inet0"
        attribute = "route"
        value = "net,-hopcount,0,,0,192.168.0.2"
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 0
CuAt:
        name = "inet0"
        attribute = "route"
        value = "net,-hopcount,0,,0,192.168.0.1"
        type = "R"
        generic = "DU"
If there is more than one route attribute, you need to remove the excess route:
# chdev -l inet0 -a delroute="net,-hopcount,0,,0,192.168.0.2"
Method error (/usr/lib/methods/chginet):
        0514-068 Cause not known.
0821-279 writing to routing socket: The process does not exist.
route: not in table or multiple matches
0821-207 chginet: Cannot add route record to CuAt.
CuAt:
        name = "inet0"
        attribute = "route"
        value = "net,-hopcount,0,,0,192.168.0.1"
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 0
Determining microcodes
A very useful command to list microcodes is lsmcode:
# lsmcode -c
This should give you a number well over 1 billion. How many seconds is 1 week? (7 days, 24 hours a day, 60 minutes an hour, 60 seconds a minute):
# let week=7*24*60*60
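The arithmetic can be sanity-checked in any ksh or bash shell; the `now` value below is a made-up sample epoch, standing in for the output of date +%s:

```shell
# Seconds in one week: 7 days x 24 hours x 60 minutes x 60 seconds
week=$((7 * 24 * 60 * 60))
echo $week                  # 604800

# Subtract from a current epoch value to get "a week ago".
# 1126964426 is a made-up sample value (mid-September 2005).
now=1126964426
aweekago=$((now - week))
echo $aweekago
```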
# ./ctimed $aweekago
You should get something like: Sat Sep 17 13:50:26 2005 TOPICS: AIX, SYSTEM ADMINISTRATION
Printing to a file
To create a printer queue that dumps its contents to /dev/null:
# /usr/lib/lpd/pio/etc/piomkpq -A 'file' -p 'generic' -d '/dev/null' -D asc -q 'qnull'
This command will create a queue named "qnull", which dumps its output to /dev/null. To print to a file, do exactly the same, except change /dev/null to the complete path of the file you like to print to. Make sure the file you're printing to exists and has the proper access rights. Now you can print to this file queue (named "qfile" in this example):
# lpr -Pqfile /etc/motd
and the contents of your print will be written to a file. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
This will create a file called file.iso. Make sure you have enough storage space. Transfer this file to a PC with a CD-writer in it. Burn this ISO file to CD using Easy CD Creator or Nero. The CD will be usable in any AIX CD-ROM drive. TOPICS: AIX
Installing C for AIX License Use Management License Use Management - Library TOPICS: AIX, LINUX, SYSTEM ADMINISTRATION
You might also have run into the problem that, when FTP'ing CD software on a Windows PC to a remote AIX system, files with lowercase names suddenly change to uppercase file names. This is how to copy the complete contents of a CD on a Red Hat Linux system to a remote AIX system as a tar file: Login as root on the Linux system. Mount the CD-ROM:
# mount /mnt/cdrom
# cd /mnt/cdrom
Important note: make sure you can write with your user-ID in the target folder on the target system. Otherwise your tar file might end up in the home directory of the user-ID used. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
# ssh-keygen -t dsa
This will prompt you for a secret passphrase. If this is your primary identity key, use an empty passphrase (which is not secure, but the easiest to work with). If this works right, you will get two files called id_dsa and id_dsa.pub in your .ssh dir. Copy the id_dsa.pub file to the other host's .ssh dir with the name authorized_keys2:
# scp ~/.ssh/id_dsa.pub serverB:.ssh/authorized_keys2
Now serverB is ready to accept your ssh key. For a test, type:
# ssh serverB
This should let you in without typing a password or passphrase. Hooray! You can ssh and scp all you want and not have to type any password or passphrase. TOPICS: AIX, SYSTEM ADMINISTRATION
Fast IPL
Using FAST IPL only works on some RS6000 systems, like SP's or J/G/R30/40's. To configure FAST IPL:
# mpcfg -cf 11 1
If you can only use a terminal to configure the Fast IPL: Put the key into service mode, press [ENTER] on the keyboard. Then type: sbb. Using the menu you can configure Fast IPL. Then reboot and switch the key back to Normal. TOPICS: AIX, SSA, STORAGE
SSA batteries
To find the status of the batteries of an SSA adapter, enter as root:
# ssa_fw_status -a ssaX
This shows you the last 100 entries. TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION
DVD-RAM Backup
You can use a DVD-RAM to create a system backup. To do so, enter:
# smitty mkdvd
This works in AIX 5.2 and above. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
If you get an error, ensure /etc/vfs contains this line (and retry the mount command after validating):
udfs 34 /sbin/helpers/udfmnthelp
If the lines in your text file are too long, you may want to word wrap them. In AIX this command is called fold:
# fold -sw 72 longfile > shortfile
This command will wrap lines at up to 72 characters and, thanks to the -s option, will not break a word in half. Without -w 72, lines will be wrapped at 80 characters. TOPICS: AIX, SYSTEM ADMINISTRATION
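A quick way to see fold -s at work (the file names are made up): generate one long line of words and wrap it.

```shell
# Build a single 200-character line ("word " repeated 40 times), then wrap it
printf 'word %.0s' $(seq 40) > longfile
fold -sw 72 longfile > shortfile
# Verify: no resulting line exceeds 72 characters
awk 'length($0) > 72 { bad = 1 } END { exit bad }' shortfile && echo OK
```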
You can also use prtconf. TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION
/dev/ipldevice gone?
Sometimes, when you create an mksysb, you receive an error like this one:
/dev/ipldevice not found
Device /dev/ipldevice is a hard link to the disk your system booted from. The mksysb command tries to determine the size of the boot logical volume with the bosboot -qad /dev/ipldevice command. Via lslv -m hd5 you can see from which disk was booted (or via bootlist -m normal -o). To resolve this problem, re-create the hard link yourself:
# ln /dev/bootdevice /dev/ipldevice
For example:
# ln /dev/rhdisk0 /dev/ipldevice
Note: Use "rhdisk" and not "hdisk". Another way to solve this problem: reboot your system and the /dev/ipldevice will be created automatically for you (Your users may prefer the first solution...). TOPICS: AIX, SSA, STORAGE, SYSTEM ADMINISTRATION
Renaming pdisks
If, for some reason, the pdisk and hdisk numbering of SSA disks is not sequential anymore, then there's a way to bring order into the chaos. Usually, the pdisk and hdisk numbering order gets screwed up when you replace multiple disks together. Especially on HACMP clusters, a correct numbering of pdisks and hdisks on all nodes of the cluster comes in handy. Unmount all file systems on the specific disks, then varyoff the volume group:
# /usr/lib/methods/cfgssar -l ssar
If this doesn't help (it sometimes will), then renumber the disks manually:
Write down the pdisk names, hdisk names, location of the disks in the SSA drawer and the connection ID's of the disks. You can use lsdev -Cc pdisk to show you all the pdisks and the drawer and location codes. Use lsdev -Cl pdiskX -F connwhere to show the connection ID of a pdisk. Then figure out how you want all disks numbered. Remove the pdisks and hdisks with the rmdev -dl command. Create the pdisks again:
# mkdev -p ssar -t scsd -c pdisk -s ssar -w [connection-ID] -l pdisk1
Test with:
# ssaxlate -l pdisk1
It should show hdisk3 (usually the hdisk number is 2 higher than the pdisk number if you use 2 SCSI disks in the rootvg). If you've done all disks this way, check with lsdev -Cc pdisk. If you're happy, then varyon the volume group again and mount all filesystems. TOPICS: AIX, ODM, SYSTEM ADMINISTRATION
Date/Time: Tue Oct
Clear the error log again (because we logged a fake test-entry in the error report):
# errclear 0
Watch your email. You should receive the same error report entry in your email. By the way, you can delete this from the ODM like this:
# odmdelete -q 'en_name=mailgeorge' -o errnotify
Do a perform operation on the machine in NIM and set it to mksysb: Run smitty nim, Perform NIM Administration Tasks, Manage Machines, Perform Operations on Machines, select the machine, select bos_inst, set the Source for BOS Runtime Files to mksysb, set Remain NIM client after install to no, set Initiate Boot Operation on Client to no, set Accept new license agreements to yes.
Start up the system in SMS mode and boot from the NIM server, using a virtual terminal on the HMC. Select the disks to install to. Make sure that you set import user volume groups to "yes". Restore the system. By the way, another method to initiate a mksysb restore is by using:
# smitty nim_bosinst
TOPICS: AIX
AIX Introduction
AIX is short for Advanced Interactive eXecutive. AIX is the UNIX operating system from IBM for RS/6000, pSeries and the latest Power systems. Currently, it is called "System P". IBM is nowadays the largest UNIX hardware vendor worldwide. AIX and the RS/6000 were released on the 14th of February, 1990 in London. Currently, the latest release of AIX is version 6.1. AIX 5.3 also exists and is still supported by IBM. Older versions (e.g. 3.2.5, 4.3.3, 5.1 and 5.2) have reached end-of-program services and thus are no longer supported by IBM.
AIX supports Logical Partitioning (short: LPAR). With LPAR you can create multiple system environments on a single machine, thus sharing the processor and memory resources of a single machine by several operating system instances. From AIX 5.2 on, AIX supports DLPAR, Dynamic Logical Partitioning, which enables administrators to add, remove or move system resources such as memory, adapters and CPU between partitions without the need to reboot each partition. From AIX 5.3, AIX supports micro-partitioning. With LPAR, a single CPU can only be used by a single operating system instance. With micro-partitioning, a CPU can be shared by up to 10 operating system instances. From AIX 5.3 also the sharing of disk and network resources by several operating system instances is
supported. Later versions of AIX and Power hardware now also include the ability to share I/O amongst several operating system images, through the use of a Virtual I/O server (VIO). IBM used to supply Maintenance Levels for AIX. Nowadays, they supply Technology Levels, one in February and one in July each year. TOPICS: AIX, PERFORMANCE, SYSTEM ADMINISTRATION
PerfPMR
When you suspect a performance problem, PerfPMR can be run. This is a tool generally used by IBM support personnel to resolve performance-related issues. The download site for this tool is: ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
will show you whether tcp_pmtu_discover and udp_pmtu_discover are enabled (1) or disabled (0). Disable them with:
# no -p -o tcp_pmtu_discover=0
# no -p -o udp_pmtu_discover=0
If these are disabled, you shouldn't see any ICMP messages any more. When one system tries to optimize its transmissions by discovering the path MTU, a pmtu entry is created in a Path MTU (PMTU) table. You can display this table using the pmtu display command. To avoid the accumulation of pmtu entries, unused pmtu entries will expire and be deleted when the pmtu_expire time (no -o pmtu_expire) is exceeded; by default after 10 minutes. TOPICS: AIX, SYSTEM ADMINISTRATION
# vmstat -w
Without compression, sysdumpdev -e will estimate the system dump size. To turn compression on:
# sysdumpdev -C
This will reduce the required (estimated) dump size by a factor of 5 to 7. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
This will transfer a file of 32K * 1024 = 32 MB. The transfer information will be shown by FTP. TOPICS: AIX, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
This will create a file consisting of 2097152 blocks of 1024 bytes, which is 2GB. You can change the count value to anything you like. Be aware that if you wish to create files larger than 2GB, your file system needs to be created as a "large file enabled file system", otherwise the upper file size limit is 2GB (under JFS; under JFS2 the upper limit is 64GB). Also check the ulimit values of the user-id you use to create the large file: set the file limit to -1, which is unlimited. Usually, the file limit is set by default to 2097151 in /etc/security/limits, which stands for 2097151 blocks of 512 bytes = 1GB. Another way to create a large file is:
# /usr/sbin/lmktemp ./test.large.file 2147483648
This will create a file of 2147483648 bytes (which is 1024 * 2097152 = 2GB). You can use this large file for adapter throughput testing purposes: Write large sequential I/O test:
# cd /BIG
# time /usr/sbin/lmktemp 2GBtestfile 2147483648
Divide 2048/#seconds for MB/sec write speed. Read large sequential I/O test:
# umount /BIG
Divide 2048/#seconds for MB/sec read speed. Tip: Run nmon (select a for adapter) in another window. You will see the throughput for each adapter. More information on JFS and JFS2 can be found here. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
This command will permanently bring down the en0 interface (permanently means after reboot).
In each lv_data stanza of this file, change the values of the COPIES= line by one-half (i.e. copies = 2, change to copies = 1). Also change the PPs to match the LPs as well. Create a new mksysb, utilizing the /image.data file:
# mksysb /dev/rmt0
(Do not use smit and do not run with the -i flag, both will generate a new image.data file). Use this mksysb to restore your system on another box without mirroring. TOPICS: AIX, SYSTEM ADMINISTRATION
As of Hardware Management Console (HMC) Release 3, Version 2.3, the rexec command is no longer available on the HMC. Use the ssh command instead. From Version 2.5, users are required to enter a valid HMC user id/password when downloading the WebSM client from the HMC. The URL for the WebSM client is: http://[HMC fully qualified domain name]/remote_client.html. Standard users receive the restricted shell via a set -r in .mysshrc when logging in. Comment out the set -r command in /opt/hsc/data/ssh/hmcsshrc to get rid of the restricted shell for your users (it gets copied to $HOME/.mysshrc). For more information on commands that can be used in the restricted shell on the HMC, go to HMC Power 4 Hints & Tips. A special hscpe user ID can be created which has unrestricted shell access via the pesh command. Use lshmc -v to determine the serial number of the HMC (after *SE). Then call IBM support and request the password of the hscpe user for the pesh command. IBM is able to generate a password for the hscpe user for one day. TOPICS: AIX, SYSTEM ADMINISTRATION
In directory output an index.html and several gif files will be created. By accessing index.html in a web browser, you can view the graphs. For a sample, click here. Tip: use nweb as your web browser. TOPICS: AIX, SYSTEM ADMINISTRATION
The -xdev flag is used to only search within the same file system, instead of traversing the full directory tree. The amount specified (1024) is in blocks of 512 bytes. Adjust this value for the size of files you're looking for. TOPICS: AIX, SYSTEM ADMINISTRATION
when using iptrace or tcpdump, then this is probably caused by a kernel extension already loaded. To resolve this, run:
# iptrace -u
After this, the kernel extension is removed and iptrace or tcpdump will work again. TOPICS: AIX, SYSTEM ADMINISTRATION
E.g.
# ls | xargs grep -i "error"
Another way is to use who to check out your current users and their terminals. Kill all processes related to a specific terminal:
# fuser -k /dev/pts[#]
Yet another method: Su to the user-id you wish to kill all processes of and enter:
# su - [user-id] -c "kill -9 -1"
Automatic FTP
How to do an automatic FTP from within a script: -n prevents automatic login and -v puts it in verbose mode; asciorbin_Type should be set to either ascii or binary; grep for $PHRASE (a particular return code) in $LOGFILE to determine success or failure:
ftp -nv $REMOTE_MACHINE > $LOGFILE <<!
user $USER_NAME $USER_PASSWORD
$asciorbin_Type
cd $REMOTE_PATH
put $LOCAL_FILE $REMOTE_FILE
quit
!
grep $PHRASE $LOGFILE
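The heart of the script is the here-document: everything between <<! and the lone ! becomes standard input of ftp. The construct can be tried safely with cat standing in for ftp (the file name is made up):

```shell
# cat plays the role of ftp here; the commands land in the "log" file
LOGFILE=/tmp/demo_ftp.log
cat > "$LOGFILE" <<!
user myname mypassword
binary
cd /some/remote/path
put local.file remote.file
quit
!
grep -c "^user" "$LOGFILE"    # prints 1: the user line reached stdin
```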
Where script.sh is the script you've written. Modify your /etc/services file and add:
check 4321/tcp
You may change the port number to anything you like, as long as it's not in use. Now, you may run:
# telnet [system] 4321
And your script will be magically run and its output displayed on your screen. If the output of the script isn't displayed on your screen very long, just put a sleep command at the end of your script. TOPICS: AIX, SYSTEM ADMINISTRATION
Defunct processes
Defunct processes are commonly known as "zombies". You can't "kill" a zombie as it is already dead. Zombies are created when a process (typically a child process) terminates either abnormally or normally and its spawning process (typically a parent process) does not "wait" for it (or has yet to "wait" for it) to return an exit status. It should be noted that zombies DO NOT consume any system resources (except a process slot in the process table). They are there to stay until the server is rebooted. Zombies commonly occur in programs that were (incompletely) ported from old BSD systems to modern SysV systems, because the semantics of signals and/or waiting differ between these two OS families. See: http://www.hyperdictionary.com/dictionary/zombie+process TOPICS: AIX, SYSTEM ADMINISTRATION
All PCI devices still in use can't be removed. The one not in use is the PCI device on which the DVD-ROM drive was configured. You have to remove it before you can do a DLPAR remove operation on it. Now do your DLPAR remove operation. TOPICS: AIX, LINUX
Line numbers in VI
To display line numbers in VI: Press ESC, then type
:set number
For example:
# uuencode /etc/motd motd.b64 | mail -v -s "Message of the day" email@hostname.com
The .b64 extension gets recognized by Winzip. When you receive your email in Outlook, you will have an attachment, which can be opened by Winzip (or any other unzip tool). You can combine this into a one-liner:
# ( echo "This is the body";uuencode /etc/motd motd.b64 ) | mail -s "This is the subject" email@hostname.com
If you want to attach tar or gzip images to an e-mail, you can also simply use those extensions to send through email, as these extensions are also properly recognized by Winzip:
# uuencode file.tar file.tar | mailx -s "subject" email@hostname.com
# uuencode file.tar.gz file.tar.gz | mailx -s "subject" email@hostname.com
FTP umask
A way to change the default 027 umask of ftp is to change the entry in /etc/inetd.conf for ftpd:
ftp stream tcp6 nowait root /usr/sbin/ftpd -l -u 117
This will create files with umask 117 (mode 660). Using the -l option will make sure the FTP sessions are logged to the syslogd. If you want to see these FTP messages in the syslogd output, then you should add to /etc/syslog.conf:
daemon.info [filename]
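Coming back to the umask: the mapping from umask 117 to mode 660 (666 masked by 117) can be verified on any system; the file name below is made up:

```shell
# A file created while umask 117 is active comes out rw-rw---- (mode 660)
( umask 117
  rm -f /tmp/umask.demo
  touch /tmp/umask.demo
  ls -l /tmp/umask.demo )
```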
Control-M
When exchanging text files between Windows and AIX systems, you often run into ^M (CTRL-M) characters at the end of each line in a text file. To remove these ugly characters:
tr -d '^M' < [inputfile] > [outputfile]
To type the ^M character on the command line: hold down CTRL and press v, then m. Another way: download this zip archive: controlm.zip (1KB). This zip archive includes 2 files: unix2dos and dos2unix, which you can run on AIX. To convert a Windows file to a Unix file:
# dos2unix [filename]
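If you'd rather not type the literal control character, tr also accepts the \r escape; a self-contained round trip (file names made up):

```shell
# Create a DOS-style file with CR+LF line endings, then strip the CRs
printf 'line one\r\nline two\r\n' > dosfile.txt
tr -d '\r' < dosfile.txt > unixfile.txt
# od -c would show \r\n in dosfile.txt but only \n in unixfile.txt
od -c unixfile.txt
```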
Bootinfo
To find out if your machine has a 64 or 32 bit architecture:
# bootinfo -y
unix_mp: 32 bits, unix_64: 64 bits
To find out from which disk your system last booted:
# bootinfo -b
Determine which log device to increase. This can be determined by its Device Major/Minor Number in the error log:
# errpt -a
The preceding numbers are hexadecimal numbers and must be converted to decimal values. In this example, hexadecimal 000A 0003 equals decimal numbers 10 and 3. Determine which device corresponds with these Device Major/Minor Numbers:
# ls -al /dev | grep "10, 3"
If the output from the preceding command reveals that the log device that needs to be enlarged is /dev/hd8 (the default JFS log device for rootvg), then special actions are needed. See further on. Increase the size of /dev/hd8:
# extendlv hd8 1
If the jfslog device is /dev/hd8, then boot the machine into Service Mode, access the root volume group and start a shell. If the jfslog is a user created jfslog, then unmount all filesystems that use the jfslog in question (use mount to show the jfslog used for each filesystem).
For example:
# logform /dev/hd8
If the jfslog is a user created jfslog, then mount all filesystems again after the logform completed. TOPICS: AIX, SYSTEM ADMINISTRATION
This will clear the file, free up the disk space and the process logging to the file will just continue logging as nothing ever happened. TOPICS: AIX, SYSTEM ADMINISTRATION
Lpar tips
The uname -Ls command will show you the partition number and the partition (lpar) name. When setting the resource allocation for a partition profile, set the minimum to the absolute bare minimum, and set the maximum as high as possible. For memory there are special considerations: If you set the maximum too low and you wish to exceed above the maximum amount of memory defined in the active profile, you can't simply adjust the profile and put extra memory in via DLPAR, because the LPAR has been initialized with a certain page table size, based on the maximum amount of memory setting. Therefore, a reboot will be required when you wish to use more memory than defined in the active profile. If you do try it however, you'll receive the following error:
HMCERRV3DLPAR018: There is no memory available for dynamic logical partitioning on this partition.
If you set the maximum too high, the partition will be initialized with a large page table size, which wastes memory on overhead you might never use. TOPICS: AIX
But gnu-date has been known to sometimes have issues when shifting to or from daylight savings time. Another good solution to getting yesterday's date is:
# perl -MPOSIX -le 'print strftime "%m%d%y",localtime(time-(60*60*24))'
2. Make sure the commands in the crontab actually exist. An entry in a crontab with a command that does not exist will generate an email message from the cron daemon to the user, informing the user about this issue. This is something that may occur on HACMP clusters where crontab files are synchronized on all HACMP nodes. They need to be synchronized on all the nodes, just in case a resource group fails over to a standby node. However, the required file systems containing the commands may not be available on all the nodes at all times. To get around that, test if the command exists first:
0 * * * * [ -x /path/to/command ] && /path/to/command > /path/to/logfile 2>&1
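The guard is plain shell and can be tried interactively; /bin/ls exists on any Unix, the second path is deliberately bogus:

```shell
# Runs the echo only when the tested path exists and is executable
[ -x /bin/ls ] && echo "guard passed, command would run"
# A missing command is skipped silently -- no output, no cron email
[ -x /no/such/command ] && echo "this line is never printed"
echo "done"
```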
3. Clean up the email messages regularly. The last way of dealing with this is to add another cron entry to a user's crontab that cleans out the mailbox every night, for example the next command, which deletes all but the last 1000 messages from a user's mailbox:
0 * * * * echo d1-$(let num="$(echo f|mail|tail -1|awk '{print $2}')-1000";echo $num)|mail >/dev/null
4. Forward the email to the user. Very effective: Create a .forward file in the user's home directory, to forward all email messages to the user. If the user starts receiving many, many emails, he/she will surely do something about it when it gets annoying. TOPICS: AIX, NETWORKING, POWERHA / HACMP
First, stop HACMP or do a take-over of the resource groups to another node; this will avoid any problems with applications when you start fiddling with the network configuration. Then open up a virtual terminal window to the host on your HMC. Otherwise you would lose the connection as soon as you drop the current default gateway. Now you need to determine where your current default gateway is configured. You can do this by typing:
# lsattr -El inet0
# netstat -nr
The lsattr command will show you the current default gateway route and the netstat command will show you the interface it is configured on. You can also check the ODM:
# odmget -q"attribute=route" CuAt
If you would now use the route command to specify the default gateway on a specific interface, like this:
# route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX
You will have a working entry for the default gateway. But... the route command does not change anything in the ODM. As soon as your system reboots, the default gateway is gone again. Not a good idea. A better solution is to use the chdev command:
# chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]
This will set the default gateway to the first interface available. To specify the interface use:
# chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]
Substitute the correct interface for enX in the command above. If you previously used the route add command, and after that you use chdev to enter the default gateway, then this will fail. You have to delete it first by using route delete 0, and then give the chdev command. Afterwards, check if the new default gateway is properly configured:
And of course, try to ping the IP address of the default gateway and some outside address. Now reboot your system and check if the default gateway remains configured on the correct interface. And start up HACMP again! TOPICS: AIX, MONITORING, SYSTEM ADMINISTRATION
Then it means that you have the bootpd enabled on your server. There's nothing wrong with that. In fact, a NIM server for example requires you to have this enabled. However, these messages on the console can be annoying. There are systems on your network that are sending bootp requests (broadcast). Your system is listening to these requests and trying to answer. It is looking in the bootptab configuration (file /etc/bootptab) to see if their MAC addresses are defined. When they aren't, you are getting these messages. To solve this, either disable the bootpd daemon, or change the syslog configuration. If you don't need the bootpd daemon, then edit the /etc/inetd.conf file and comment out the entry for bootps. Then run:
# refresh -s inetd
If you do have a requirement for bootpd, then update the /etc/syslog.conf file and look for the entry that starts with daemon.notice:
#daemon.notice /dev/console
daemon.notice /nsr/logs/messages
By commenting the daemon.notice entry to /dev/console, and instead adding an entry that logs to a file, you can avoid seeing these messages on the console. Now all you have to do is refresh the syslogd daemon:
# refresh -s syslogd
This entry tells the TSM client to run script /usr/local/bin/RunTSMReport, as soon as it has completed its scheduled command. Now all you need is a script that creates a report from the dsmsched.log file, the file that is written to by the TSM scheduler:
#!/bin/bash
TSMLOG=/tmp/dsmsched.log
WRKDIR=/tmp
echo "TSM Report from `hostname`" >> ${WRKDIR}/tsmc
tail -100 ${TSMLOG} > ${WRKDIR}/tsma
grep -n "Elapsed processing time:" ${WRKDIR}/tsma > ${WRKDIR}/tsmb
CT2=`cat ${WRKDIR}/tsmb | awk -F":" '{print $1}'`
((CT3 = $CT2 - 14))
((CT5 = $CT2 + 1 ))
CT4=1
while read Line1 ; do
   if [ ${CT3} -gt ${CT4} ] ; then
      ((CT4 = ${CT4} + 1 ))
   else
      echo "${Line1}" >> ${WRKDIR}/tsmc
      ((CT4 = ${CT4} + 1 ))
      if [ ${CT4} -gt ${CT5} ] ; then
         break
      fi
   fi
done < ${WRKDIR}/tsma
mail -s "`hostname` Backup" email@address.com < ${WRKDIR}/tsmc
rm ${WRKDIR}/tsma ${WRKDIR}/tsmb ${WRKDIR}/tsmc
Create a location where to store all of the AIX filesets on the server:
# mkdir /sw_depot/5300-10-02-0943-full
Repeat the above 5 steps for both DVDs. You'll end up with a folder of at least 4 GB. Delete the iso logical volume:
# rmfs -r /testiso
# rmlv testiso
Check with:
# lsnim -l LPPaix53tl10sp2
A small note when you're using AIX 7 / AIX 6.1: significant changes have been made in AIX 7 and AIX 6.1 that add new support for NIM. In particular, there is now the capability to use the loopmount command to mount ISO images into file systems. As an example:
# loopmount -i aixv7-base.iso -m /aix -o "-V cdrfs -o ro"
The above mounts the AIX 7 base iso as a filesystem called /aix. You can now create an lpp_source or spot from the iso or you can simply read the files. TOPICS: AIX, MONITORING
Let's say you don't want any reports on errors with ID D1A1AE6F:
# errupdate [Enter]
=D1A1AE6F: [Enter]
Report=False [Enter]
[Ctrl-D]
[Ctrl-D]
With "Report=False", errors are still logged in your log file (usually /var/adm/ras/errlog). If you don't want them to be logged to the error log at all, for example when you have an errnotify method (which still starts an action, also for error IDs with "Report=False"), you can change "Report=False" to "Log=False". More info on this subject can be found here. TOPICS: AIX, LINUX
Nimesis
If you're trying to restore an mksysb through NIM and constantly get the same error on different systems:
0042-006 niminit (To-master) rcmd connection refused
This may be caused by the "nimesis" daemon not running on the NIM server. Make sure it's enabled in /etc/inittab on the NIM server:
# grep nim /etc/inittab
nim:2:wait:/usr/bin/startsrc -g nim >/dev/console 2>&1
Using a pipeline
The next part describes a problem where you would want to do a search on a file system to find all directories in it, and to start a backup session per directory found, but not more than 20 backup sessions at once. Usually you would use the "find" command to find those directories, with the "-exec" parameter to execute the backup command. But in this case, it would result in possibly more than 20 active backup sessions at once, which might overload the system. So, you can create a script that does a "find" and dumps the output to a file first, and then starts reading that file and initiating 20 backups in parallel. But then, the backup can't start, before the "find" command completes, which may take quite a long time, especially if run on a file system with a large number of files. So how do you do "find" commands and backups in parallel? Solve this problem with a pipeline. Create a pipeline:
# rm -f /tmp/pipe
# mknod /tmp/pipe p
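The writer side, the "find" command that feeds the pipeline, is started with its output redirected into the pipe. A minimal, self-contained sketch of both sides, using a throw-away directory tree (all /tmp paths here are examples):

```shell
# Create a small sample tree to stand in for the real file system.
mkdir -p /tmp/data/dir1 /tmp/data/dir2

# The named pipe, as created above.
rm -f /tmp/pipe
mknod /tmp/pipe p

# Writer: find blocks whenever nothing is reading from the pipe,
# so it runs in parallel with the backup script that consumes it.
find /tmp/data -type d -print > /tmp/pipe &

# Reader (in the real setup: the backup loop reading from the pipe)
# consumes one directory name per line.
cat /tmp/pipe > /tmp/dirs.out
wait
```

The key property is that neither side has to finish before the other starts.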
So now you have a command which writes to the pipeline, but can't continue until some other process is reading from the pipeline. Create another script that reads from the pipe and issues the backup sessions:
cat /tmp/pipe | while read entry
do
   # Wait until less than 20 backup sessions are active
   while [ $(jobs -p | wc -l | awk '{print $1}') -ge 20 ]
   do
      sleep 5
   done
   # start backup session in the background
   [backup-command] &
   echo Started backup of $entry at `date`
done
# wait for all backup sessions to end
wait
echo `date`: Backup complete
This way, backup sessions are already being started while the "find" command is still executing, saving the time you would otherwise spend waiting for the "find" command to complete.
TOPICS: AIX
Proctree
AIX 5.1 and higher includes a set of System V commands. E.g. the Sun Solaris command "ptree" has been included in AIX as "proctree". Information about all System V commands in AIX can be found in the AIX 5.x differences guides. This command will show you the process tree of a specific user:
# proctree username
This example restricts the number of logins to three. Make sure the user can't modify his/her own .profile by restricting access rights. TOPICS: AIX, LINUX
Temperature monitoring
Older pSeries systems (Power4) are equipped with environmental sensors. You can read the sensor values using:
# /usr/lpp/diagnostics/bin/uesensor -l
You can use these sensors to monitor your systems and your computer rooms. It isn't very difficult to create a script to monitor these environmental sensors regularly and to display it on a webpage, updating it automatically. Newer systems (LPAR based) are not equipped with these environmental sensors. For PC systems several products exist, which attach to either a RJ45 or a parallel port and which can be used to monitor temperatures.
If that doesn't work, try this: (in the example below logical volume hd7 is used). Save the ODM information of the logical volume:
# odmget -q name=hd7 CuDv | tee -a /tmp/CuDv.hd7.out
# odmget -q name=hd7 CuAt | tee -a /tmp/CuAt.hd7.out
If you mess things up, you can always use the following command to restore the ODM information:
# odmadd /tmp/[filename]
Then, remove the device entry of the logical volume in the /dev directory (if present at all). TOPICS: AIX, SSA, STORAGE
Fast write needs cache memory on the SSA adapter. Check your amount of cache memory on the SSA adapter:
# lscfg -vl ssax
Where 'x' is the number of your SSA adapter. 128MB of SDRAM will suffice. Having 128MB of SDRAM memory makes sure you can use the full 32MB of cache memory. To enable the fast write the disk must not be in use. So either the volume groups are varied offline, or the disk is taken out of the volume group. Use the following command to enable the fast write cache:
# smitty chgssardsk
TOPICS: AIX
TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION, VIRTUAL I/O SERVER, VIRTUALIZATION
The first command (viosbr) will create a backup of the configuration information to /home/padmin/cfgbackups. It will also schedule the command to run every day, and keep up to 10 files in /home/padmin/cfgbackups. The second command is the mksysb equivalent for a Virtual I/O Server: backupios. This command will create the mksysb image in the /mksysb folder, and exclude any ISO repository in rootvg, and anything else excluded in /etc/exclude.rootvg. TOPICS: AIX, BACKUP & RESTORE, STORAGE, SYSTEM ADMINISTRATION
Add these commands to your mksysb script, just before running the mksysb command. What this does is run the mkvgdata command for each online volume group, which generates output for each volume group in /tmp/vgdata. The resulting output is then tar'd and stored in the /sysadm folder or file system. This allows information regarding your volume groups, logical volumes, and file systems to be included in your mksysb image. To recreate the volume groups, logical volumes, and file systems, run:
# tar -xvf /sysadm/vgdata.tar
Now edit /tmp/vgdata/{volume group name}/{volume group name}.data file and look for the line with "VG_SOURCE_DISK_LIST=". Change the line to have the hdisks, vpaths or hdiskpowers as needed.
Run:
# restvg -r -d /tmp/vgdata/{volume group name}/{volume group name}.data
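For completeness: the commands to add to your mksysb script, as described at the start of this section, might look like the following sketch (the /sysadm location follows the text; verify the mkvgdata output location on your system, as this is an assumption):

```
# Run mkvgdata for each online volume group; output goes to /tmp/vgdata.
for vg in `lsvg -o` ; do
   mkvgdata $vg
done
# Tar up the vgdata so the mksysb includes it (path per the text above).
tar -cvf /sysadm/vgdata.tar /tmp/vgdata
```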
Make sure to remove file systems with the rmfs command before running restvg, or it will not run correctly. Or, you can just run it once, run the exportvg command for the same volume group, and run the restvg command again. There is also a "-s" flag for restvg that lets you shrink the file systems to their minimum required size, but depending on when the vgdata was created, you could run out of space when restoring the contents of the file systems. Just something to keep in mind. TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION
In this example, we're using the mksysb image of a Virtual I/O Server, created using iosbackup. This is basically the same as a mksysb image from a regular AIX system. The image file for this mksysb backup is called vio1.mksysb. First, try to locate the file you're looking for. For example, if you're looking for file nimbck.ksh:
# restore -T -q -l -f vio1.mksysb | grep nimbck.ksh
New volume on vio1.mksysb:
Cluster size is 51200 bytes (100 blocks).
The volume number is 1.
The backup date is: Thu Jun  9 23:00:28 MST 2011
Files are backed up by name.
The user is padmin.
-rwxr-xr-x- 10 staff 1801 May 23 08:37 ./home/padmin/nimbck.ksh
Here you can see the original file was located in /home/padmin. Now recover that one single file:
# restore -x -q -f vio1.mksysb ./home/padmin/nimbck.ksh x ./home/padmin/nimbck.ksh
Note that it is important to add the dot before the filename that needs to be recovered; otherwise it won't work. Your file is now restored to ./home/padmin/nimbck.ksh, which is a folder relative to the current directory you're in right now:
# cd ./home/padmin
# ls -als nimbck.ksh
4 -rwxr-xr-x 1 10 staff 1801 May 23 08:37 nimbck.ksh
This will create a copy of logical volume "lvname" to a file "lvname.dd" in file system /file/system. Make sure that wherever you write your output file (in the example above, /file/system), there is enough disk space available to hold a full copy of the logical volume: if the logical volume is 100 GB, you'll need 100 GB of file system space for the copy. If you want to test how this works, create a logical volume with a file system on top of it, and create some files in that file system. Then unmount the file system, and use dd to copy the logical volume as described above. Then throw away the file system using "rmfs -r", and after that has completed, recreate the logical volume and the file system. If you now mount the file system, you will see that it is empty. Unmount the file system, and use the following dd command to restore your backup copy:
# dd if=/file/system/lvname.dd of=/dev/lvname
Then, mount the file system again, and you will see that the contents of the file system (the files you've placed in it) are back. TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION
Lsmksysb
There's a simple command to list information about a mksysb image, called lsmksysb:
# lsmksysb -lf mksysb.image
VOLUME GROUP:       rootvg
BACKUP DATE/TIME:   Mon Jun 6 04:00:06 MST 2011
UNAME INFO:         AIX testaix1 1 6 0008CB1A4C00
BACKUP OSLEVEL:     6.1.6.0
MAINTENANCE LEVEL:  6100-06
BACKUP SIZE (MB):   49920
SHRINK SIZE (MB):   17377
VG DATA ONLY:       no

rootvg:
LV NAME  TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5      boot     1    2    2    closed/syncd  N/A
hd6      paging   32   64   2    open/syncd    N/A
hd8      jfs2log  1    2    2    open/syncd    N/A
hd4      jfs2     8    16   2    open/syncd    /
hd2      jfs2     40   80   2    open/syncd    /usr
hd9var   jfs2     40   80   2    open/syncd    /var
...
Using lvmstat
One of the best tools to look at LVM usage is with lvmstat. It can report the bytes read and written to logical volumes. Using that information, you can determine which logical volumes are used the most. Gathering LVM statistics is not enabled by default:
# lvmstat -v data2vg
0516-1309 lvmstat: Statistics collection is not enabled for this logical device. Use -e option to enable.
As you can see by the output here, it is not enabled, so you need to actually enable it for each volume group prior to running the tool using:
# lvmstat -v data2vg -e
The following command takes a snapshot of LVM information every second for 10 intervals:
# lvmstat -v data2vg 1 10
This view shows the most utilized logical volumes on your system since you started the data collection. This is very helpful when drilling down to the logical volume layer when tuning your systems.
# lvmstat -v data2vg
What are you looking at here?
iocnt: Reports back the number of read and write requests.
Kb_read: Reports back the total data (in kilobytes) read during the measured interval.
Kb_wrtn: Reports back the amount of data (in kilobytes) written during the measured interval.
Kbps: Reports back the amount of data transferred in kilobytes per second.
You can use the -d option for lvmstat to disable the collection of LVM statistics.
It is always a good idea to spread a logical volume over multiple disks. That way, the logical volume manager will spread the disk I/O over all the disks that are part of the logical volume, utilizing the queue_depth of all disks and greatly improving performance where disk I/O is concerned. Let's say you have a logical volume called prodlv of 128 LPs, which is sitting on one disk, vpath408. To see the allocation of the LPs of logical volume prodlv, run:
# lslv -m prodlv
Let's also assume that you have a large number of disks in the volume group in which prodlv is configured. Disk I/O usually works best if you have a large number of disks in a volume group. For example, if you need to have 500 GB in a volume group, it is usually a far better idea to assign 10 disks of 50 GB to the volume group, instead of only one disk of 512 GB. That gives you the possibility of spreading the I/O over 10 disks instead of only one. To spread the disk I/O of prodlv over 8 disks instead of just one disk, you can create an extra logical volume copy on these 8 disks, and then later on, when the logical volume is synchronized, remove the original logical volume copy (the one on the single disk vpath408). So, divide 128 LPs by 8, which gives you 16 LPs. You can assign 16 LPs for logical volume prodlv on each of the 8 disks, giving it a total of 128 LPs. First, check if the upper bound of the logical volume is set to at least 9. Check this by running:
# lslv prodlv
The upper bound limit determines on how many disks a logical volume can be created. You'll need the one disk, vpath408, on which the logical volume is already located, plus the 8 other disks that you're creating a new copy on. Never create a copy on the same disk: if that single disk fails, both copies of your logical volume will fail as well. It is usually a good idea to set the upper bound of the logical volume a lot higher, for example to 32:
# chlv -u 32 prodlv
The next thing you need to determine is, that you actually have 8 disks with at least 16 free LPs in the volume group. You can do this by running:
# lsvg -p prodvg | sort -nk4 | grep -v vpath408 | tail -8
vpath188  active  959  40   00..00..00..00..40
vpath163  active  959  42   00..00..00..00..42
vpath208  active  959  96   00..00..96..00..00
vpath205  active  959  192  102..00..00..90..00
vpath194  active  959  240  00..00..00..48..192
vpath24   active  959  243  00..00..00..51..192
vpath304  active  959  340  00..89..152..99..00
vpath161  active  959  413  14..00..82..125..192
Note how in the command above the original disk, vpath408, was excluded from the list. Any of the disks listed, using the command above, should have at least 1/8th of the size of the logical volume free, before you can make a logical volume copy on it for prodlv. Now create the logical volume copy. The magical option you need to use is "-e x" for the logical volume commands. That will spread the logical volume over all available disks. If you want to make sure that the logical volume is spread over only 8 available disks, and not all the available disks in a volume group, make sure you specify the 8 available disks:
# mklvcopy -e x prodlv 2 vpath188 vpath163 vpath208 \
vpath205 vpath194 vpath24 vpath304 vpath161
Now check again with "lslv -m prodlv" whether the new copy is correctly created:
# lslv -m prodlv | awk '{print $5}' | grep vpath | sort -dfu | \
while read pv ; do
   result=`lspv -l $pv | grep prodlv`
   echo "$pv $result"
done
vpath304 prodlv
Now, what if you have to extend the logical volume prodlv later on with another 128 LPs, and you still want to maintain the spreading of the LPs over the 8 disks? Again, you can use the "-e x" option when running the logical volume commands:
# extendlv -e x prodlv 128 vpath188 vpath163 vpath208 \
vpath205 vpath194 vpath24 vpath304 vpath161
You can also use the "-e x" option with the mklv command to create a new logical volume from the start with the correct spreading over disks. TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION, VERITAS NETBACKUP
Then, you have to change the default debug level in /usr/openv/netbackup/bp.conf, by adding:
VERBOSE = 2
By default, VERBOSE is set to one, which means there isn't any logging at all, so that is not helpful. You can go up to "VERBOSE = 5", but that may create very large log files, and this may fill up the file system. In any case, check how much disk space is available in /usr before enabling the logging of the Veritas NetBackup client. Backups through Veritas NetBackup are initiated through inetd:
# egrep "bpcd" /etc/services
bpcd  13782/tcp  # VERITAS NetBackup
bpcd  13782/udp  # VERITAS NetBackup
# grep bpcd /etc/inetd.conf
bpcd stream tcp nowait root /usr/openv/netbackup/bin/bpcd bpcd
Now all you have to do is wait for the NetBackup server (the one listed in /usr/openv/netbackup/bp.conf) to start the backup on the AIX client. After the backup has run, you should at least find a log file in the bpcd and bpbkar folders in /usr/openv/netbackup.
Nsradmin
Here is how to retrieve client and group information from EMC Networker using nsradmin: First, start nsradmin as user root:
# /bin/nsradmin -s networkerserver
(Note: replace "networkerserver" for the actual host name of your EMC Networker Server). To select information of a specific client, for example "testserver", type:
nsradmin> print type: nsr client; name: testserver
You can further limit the attributes that you're seeing by using the show sub-command. For example, if you only wish to see the save set and the group, type:
nsradmin> show save set
nsradmin> show group
nsradmin> show name
nsradmin> print type: nsr client; name: testserver
                    name: testserver.domain.com;
                   group: aixprod;
                save set: /, /usr, /var, /tmp, /home, /opt, /roothome;
If you'd like to get more information about the types you can print information of, type:
nsradmin> types
Restart Networker
This is how to stop EMC Networker:
# /bin/nsr_shutdown
Note: None of these versions is supported for AIX 4.3.3. Source: HACMP Version Compatibility Matrix TOPICS: AIX, POWERHA / HACMP, SYSTEM ADMINISTRATION
There are a number of possible causes: clinfoES or snmpd subsystems are not active. snmp is unresponsive. snmp is not configured correctly. Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information. Additional information for verifying the SNMP configuration on AIX 6 can be found in /usr/es/sbin/cluster/README5.5.0.UPDATE
To resolve this, first of all, go ahead and read the README that is referred to. You'll find that you have to enable an entry in /etc/snmpdv3.conf:
Commands clstat or cldump will not start if the internet MIB tree is not enabled in the snmpdv3.conf file. This behavior is usually seen from AIX 6.1 onwards, where this internet MIB entry was intentionally disabled for security reasons. This internet MIB entry is required to view/resolve the risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub-tree, which is used by the clstat and cldump functionality.
There are two ways to enable this MIB sub-tree (risc6000clsmuxpd):
1) Enable the main internet MIB entry by adding this line in /etc/snmpdv3.conf file
2) Enable only the MIB sub tree for risc6000clsmuxpd without enabling the main MIB tree by adding this line in /etc/snmpdv3.conf file
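The lines themselves are not quoted above. On AIX systems they typically take the form of VACM_VIEW entries like the following (shown here as an assumption; verify against the snmpdv3.conf documentation for your AIX level):

```
# 1) Enable the entire internet MIB tree:
VACM_VIEW defaultView  internet                  - included -

# 2) Or enable only the risc6000clsmuxpd sub-tree:
VACM_VIEW defaultView  1.3.6.1.4.1.2.3.1.2.1.5   - included -
```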
Note: After enabling the MIB entry above, the snmp daemon must be restarted with the commands shown below:
After snmp is restarted leave the daemon running for about two minutes before attempting to start clstat or cldump.
Sometimes, even after doing this, clstat or cldump still don't work. The next step may sound silly, but edit the /etc/snmpdv3.conf file and take out the comments. Change this:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password # gated
To:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
Then, recycle the daemons on all cluster nodes. This can be done while the cluster is up and running:
# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s aixmibd
# stopsrc -s snmpd
# sleep 4
# chssys -s hostmibd -a "-c public"
# chssys -s aixmibd -a "-c public"
# chssys -s snmpmibd -a "-c public"
# sleep 4
# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s snmpmibd
# startsrc -s hostmibd
# sleep 120
# stopsrc -s clinfoES
# startsrc -s clinfoES
# sleep 120
Now, to verify that it works, run either clstat or cldump, or the following command:
# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
Still not working at this point? Then run an Extended Verification and Synchronization:
# smitty cm_ver_and_sync.select
After that, clstat, cldump and snmpinfo should work. TOPICS: AIX, POWERHA / HACMP, SYSTEM ADMINISTRATION
This may be caused by the volume group being varied on, on the other node. If it should not be varied on, on the other node, run:
# varyoffvg vg
And then retry the LVM command. If it continues to be a problem, then stop HACMP on both nodes, export the volume group and re-import the volume group on both nodes, and then restart the cluster. TOPICS: AIX, POWERHA / HACMP, SYSTEM ADMINISTRATION
the node slower until the system clock and the reference clock are in sync (this is called "slewing" the clock) instead of resetting the time in one large increment. The behavior is configured with the -x flag for the xntpd daemon. To check the current running configuration of xntpd for the -x flag:
# ps -aef | grep xntpd | grep -v grep
    root 409632 188534 0 11:46:45 0:00 /usr/sbin/xntpd
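The output above shows xntpd running without the -x flag. If you want slewing enabled, you can add the flag to the subsystem definition and restart the daemon; a sketch using the same chssys pattern used elsewhere in this document (verify the arguments against your AIX documentation):

```
# Add the -x (slewing) flag to the xntpd subsystem, then restart it.
chssys -s xntpd -a "-x"
stopsrc -s xntpd
startsrc -s xntpd
```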
Make sure you have the following in place: the IP addresses and host names of both nodes, and of a service IP label. Add these to the /etc/hosts files on both nodes of the new HACMP cluster. Make sure you have the HACMP software installed on both nodes; just install all the filesets of the HACMP CD-ROM, and you should be good. Make sure you have this entry in /etc/inittab (as one of the last entries):
clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit
In case you're using EMC SAN storage, make sure you configure your disks correctly as hdiskpower devices. Or, if you're using a mksysb image, you may want to follow the EMC ODM cleanup procedure. Steps:
Enter a cluster name and select the nodes you're going to use. It is vital here to have the hostnames and IP address correctly entered in the /etc/hosts file of both nodes. Create an IP service label:
# smitty hacmp
Initialization and Standard Configuration
  Configure Resources to Make Highly Available
    Configure Service IP Labels/Addresses
      Add a Service IP Label/Address
Enter an IP Label/Address (press F4 to select one), and enter a Network name (again, press F4 to select one). Set up a resource group:
# smitty hacmp
Initialization and Standard Configuration
  Configure HACMP Resource Groups
    Add a Resource Group
Enter the name of the resource group. It's a good habit to make sure that a resource group name ends with "rg", so you can recognize it as a resource group. Also, select the participating nodes. For the "Fallback Policy", it is a good idea to change it to "Never Fallback". This way, when the primary node in the cluster comes up, and the resource group is up-and-running on the secondary node, you won't see a failover occur from the secondary to the primary node. Note: The order of the nodes is determined by the order in which you select the nodes here. If you put in "node01 node02" here, then "node01" is the primary node. If you want to have this any other way, now is a good time to correctly enter the order of node priority. Add the Service IP/Label to the resource group:
# smitty hacmp
Initialization and Standard Configuration
  Configure HACMP Resource Groups
    Change/Show Resources for a Resource Group (standard)
Select the resource group you've created earlier, and add the Service IP/Label. Run a verification/synchronization:
# smitty hacmp
Extended Configuration
  Extended Verification and Synchronization
Just hit [ENTER] here. Resolve any issues that may come up from this synchronization attempt. Repeat this process until the verification/synchronization process returns "Ok". It's a good idea here to select to "Automatically correct errors". Start the HACMP cluster:
# smitty hacmp
System Management (C-SPOC)
  Manage HACMP Services
    Start Cluster Services
Select both nodes to start. Make sure to also start the Cluster Information Daemon. Check the status of the cluster:
# clstat -o
# cldump
Wait until the cluster is stable and both nodes are up. Basically, the cluster is now up-and-running. However, during the Verification & Synchronization step, it will complain about not having a non-IP network. The next part is for setting up a disk heartbeat network, that will allow the nodes of the HACMP cluster to exchange disk heartbeat packets over a SAN disk. We're assuming here, you're using EMC storage. The process on other types of SAN storage is more or less similar, except for some differences, e.g. SAN disks on EMC storage are called "hdiskpower" devices, and they're called "vpath" devices on IBM SAN storage. First, look at the available SAN disk devices on your nodes, and select a small disk, that won't be used to store any data on, but only for the purpose of doing the disk heartbeat. It is a good habit, to request your SAN storage admin to zone a small LUN as a disk heartbeating device to both nodes of the HACMP cluster. Make a note of the PVID of this disk device, for example, if you choose to use device hdiskpower4:
# lspv | grep hdiskpower4
hdiskpower4     000a807f6b9cc8e5     None
So, we're going to set up the disk heartbeat network on device hdiskpower4, with PVID 000a807f6b9cc8e5. Create a concurrent volume group:
# smitty hacmp
System Management (C-SPOC)
  HACMP Concurrent Logical Volume Management
    Concurrent Volume Groups
      Create a Concurrent Volume Group
Select both nodes to create the concurrent volume group on by pressing F7 for each node. Then select the correct PVID. Give the new volume group a name, for example "hbvg". Set up the disk heartbeat network:
# smitty hacmp
Extended Configuration
  Extended Topology Configuration
    Configure HACMP Networks
      Add a Network to the HACMP Cluster
Select "diskhb" and accept the default Network Name. Run a discovery:
# smitty hacmp
Extended Configuration
  Discover HACMP-related Information from Configured Nodes
Select the disk device on both nodes by selecting the same disk on each node by pressing F7. Run a Verification & Synchronization again, as described earlier above. Then check with clstat and/or cldump again, to check if the disk heartbeat network comes online. TOPICS: AIX, POWERHA / HACMP, SYSTEM ADMINISTRATION
When you want to mount an NFS file system on a node of an HACMP cluster, there are a couple of items you need check, before it will work: Make sure the hostname and IP address of the HACMP node are resolvable and provide the correct output, by running:
# nslookup [hostname]
# nslookup [ip-address]
The next thing you will want to check on the NFS server is whether the node names of your HACMP cluster nodes are correctly added to the /etc/exports file. If they are, run:
# exportfs -va
The last, and tricky, item you will want to check is whether a service IP label is defined as an IP alias on the same adapter as your node's hostname, e.g.:
# netstat -nr
Routing tables
Destination      Gateway        Flags  Refs  Use     If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default          10.251.14.1    UG     4     180100  en1  -    -
10.251.14.0      10.251.14.50   UHSb   0     0       en1  -    -
10.251.14.50     127.0.0.1      UGHS   3     791253  lo0  -    -
The example above shows you that the default gateway is defined on the en1 interface. The next command shows you where your Service IP label lives:
# netstat -i
Name  Mtu    Network  Address    Ipkts    Ierrs  Opkts
en1   1500   link#2   ...        940024   ...    ...
en1   1500   ...      ...        940024   ...    ...
en1   1500   ...      serviceip  940024   ...    ...
lo0   ...    link#1   ...        1914185  ...    ...
lo0   ...    ...      ...        1914185  ...    ...
lo0   ...    ...      ...        1914185  ...    ...
As you can see, the Service IP label (in the example above called "serviceip") is defined on en1. In that case, for NFS to work, you also want to add the "serviceip" to the /etc/exports file on the NFS server and re-run "exportfs -va". And you should also make sure that hostname "serviceip" resolves to an IP address correctly (and of course the IP address resolves to the correct hostname) on both the NFS server and the client. TOPICS: MONITORING, POWERHA / HACMP
to get an overview of ALL clusters in a SINGLE look with clstat. IBM included a clstat.cgi in HACMP 5 to show the cluster status on a webpage. This still doesn't provide an overview in a single look, as the clstat.cgi shows a long listing of all clusters, and it is just like clstat limited to monitoring just 8 clusters. HACMP cluster status can be retrieved via SNMP (this is actually what clstat does too). Using the IP addresses of a cluster and the snmpinfo command, you can remotely retrieve cluster status information, and use that information to build a webpage. By using colors for the status of the clusters and the nodes (green = ok, yellow = something is happening, red = error), you can get a quick overview of the status of all the HACMP clusters.
Per cluster you can see: the cluster name, the cluster ID, HACMP version and the status of the cluster and all its nodes. It will also show you where any resource groups are active. You can download the script here. Untar the file. There is a readme in the package, that will tell you how you can configure the script. This script has been tested with HACMP version 4 and 5, up to version 5.5.0.5. TOPICS: AIX, EMC, POWERHA / HACMP, STORAGE, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
To resolve this, you will have to make sure that the SCSI reset disk method is configured in HACMP. For example, when using EMC storage: make sure emcpowerreset is present at /usr/lpp/EMC/Symmetrix/bin/emcpowerreset.
Then add a new custom disk method: enter the SMIT fastpath for HACMP ("smitty hacmp"), and select Extended Configuration, Extended Resource Configuration, HACMP Extended Resources Configuration, Configure Custom Disk Methods, Add Custom Disk Methods.
Change/Show Custom Disk Methods
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                             [Entry Fields]
* Disk Type (PdDvLn field from CuDv)          disk/pseudo/power
* New Disk Type                              [disk/pseudo/power]
* Method to identify ghost disks             [SCSI3]
* Method to determine if a reserve is held   [SCSI_TUR]
* Method to break reserve                    [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset]
  Break reserves in parallel                  true
* Method to make the disk available          [MKDEV]
2. Make sure the commands in the crontab actually exist. An entry in a crontab with a command that does not exist will generate an email message from the cron daemon to the user, informing the user about this issue. This is something that may occur on HACMP clusters where crontab files are synchronized on all HACMP nodes. They need to be synchronized on all the nodes, just in case a resource group fails over to a standby node. However, the required file systems containing the commands may not be available on all the nodes at all times. To get around that, test if the command exists first:
0 * * * * [ -x /path/to/command ] && /path/to/command > /path/to/logfile 2>&1
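You can see the effect of this guard outside cron with a quick experiment (all paths here are throw-away examples):

```shell
# Stand-in for the real command; a fresh file is not executable.
printf '#!/bin/sh\necho ran\n' > /tmp/democmd
rm -f /tmp/demo.log

# Guard is false: the command does not run and no log is written,
# so cron would not send any mail about a missing command.
[ -x /tmp/democmd ] && /tmp/democmd > /tmp/demo.log 2>&1 || true

chmod +x /tmp/democmd

# Guard is true: the command runs and the log is written.
[ -x /tmp/democmd ] && /tmp/democmd > /tmp/demo.log 2>&1
```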
3. Clean up the email messages regularly. The last way of dealing with this is to add another cron entry to a user's crontab that cleans out the mailbox every night, for example the next command, which deletes all but the last 1000 messages from a user's mailbox:
0 * * * * echo d1-$(let num="$(echo f|mail|tail -1|awk '{print $2}')-1000";echo $num)|mail >/dev/null
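The arithmetic in that one-liner builds a deletion range "d1-N" for the mail command, where N is the current message count minus 1000. The range construction can be checked in isolation (the message count is hard-coded here for illustration):

```shell
# Suppose the mailbox currently holds 4321 messages.
count=4321
# Keep the newest 1000: delete messages 1 through count-1000.
num=$((count - 1000))
range="d1-$num"
echo "$range"
```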
4. Forward the email to the user. Very effective: create a .forward file in the user's home directory, to forward all email messages to the user. If the user starts receiving many, many emails, he/she will surely do something about it when it gets annoying.
HACMP auto-verification
HACMP automatically runs a verification every night, usually around mid-night. With a very simple command you can check the status of this verification run:
# tail -10 /var/hacmp/log/clutils.log 2>/dev/null|grep detected|tail -1
If this shows a returncode of 0, the cluster verification ran without any errors. Anything else, you'll have to investigate. You can use this command on all your HACMP clusters, allowing you to verify your HACMP cluster status every day. With the following smitty menu you can change the time when the auto-verification runs and if it should produce debug output or not:
# smitty clautover.dialog
Be aware that if you change the runtime of the auto-verification that you have to synchronize the cluster afterwards to update the other nodes in the cluster. TOPICS: LVM, POWERHA / HACMP, SYSTEM ADMINISTRATION
host02 # lsattr -Z: -l testlv -a label -a copies -a size -a type -a strictness -Fvalue /test:1:806:jfs2:y:
Well, there you have it. One host reports testlv having a size of 806 LPs, the other says it's 809. Not good. You will run into this when you've used the extendlv and chfs commands to increase the size of a shared file system. You should have used the smitty menu. The good thing is, HACMP will sync the VGDAs if you do some kind of logical volume operation through the smitty hacmp menu. So, either increase the size of a shared logical volume through the smitty menu by just one LP (and of course, also increase the size of the corresponding file system); or create an additional shared logical volume of just one LP through smitty, and remove it again afterwards. When you've done that, simply re-run the verification/synchronization, and you'll notice that the warning message is gone. Make sure you run the lsattr command again on your shared logical volumes on all the nodes in your cluster to confirm. TOPICS: AIX, NETWORKING, POWERHA / HACMP
The lsattr command will show you the current default gateway route, and the netstat command will show you the interface it is configured on. You can also check the ODM:
# odmget -q"attribute=route" CuAt
# lsattr -El inet0 | awk '$2 ~ /hopcount/ { print $2 }' | read GW
# chdev -l inet0 -a delroute=${GW}
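What the awk in the command above extracts can be shown with a simulated lsattr output line; the gateway address below is a documentation placeholder:

```shell
# The route attribute's value is a comma-separated string ending in the
# gateway address; it always contains "hopcount", which is what the awk
# pattern keys on. This is a simulated line, not live lsattr output.
line="route net,-hopcount,0,,0,192.0.2.254 Route True"
GW=$(echo "$line" | awk '$2 ~ /hopcount/ { print $2 }')
echo "$GW"   # the value to hand to chdev -a delroute=...
```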
If you would now use the route command to specify the default gateway on a specific interface, like this:
# route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX
You will have a working entry for the default gateway. But... the route command does not change anything in the ODM. As soon as your system reboots, the default gateway is gone again. Not a good idea. A better solution is to use the chdev command:
# chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]
This will set the default gateway to the first interface available. To specify the interface use:
# chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]
Substitute the correct interface for enX in the command above. If you previously used the route add command, and after that you use chdev to enter the default gateway, this will fail. You have to delete the route first using route delete 0, and then issue the chdev command. Afterwards, check if the new default gateway is properly configured:
# lsattr -El inet0
# odmget -q"attribute=route" CuAt
And of course, try to ping the IP address of the default gateway and some outside address. Now reboot your system and check if the default gateway remains configured on the correct interface. And start up HACMP again! TOPICS: POWERHA / HACMP
clgetactivenodes - which nodes are active?
get_local_nodename - what is the name of the local node?
clconfig - check the HACMP ODM.
clRGmove - online/offline or move resource groups.
cldare - sync/fix the cluster.
cllsgrp - list the resource groups.
clsnapshotinfo - create a large snapshot of the HACMP configuration.
cllscf - list the network configuration of an HACMP cluster.
clshowres - show the resource group configuration.
cllsif - show network interface information.
cllsres - show short resource group information.
lssrc -ls clstrmgrES - list the cluster manager state.
lssrc -ls topsvcs - show heartbeat information.
cllsnode - list a node centric overview of the hacmp configuration.
TOPICS: POWERHA / HACMP
fail_standby
swap_adapter
config_too_long
event_error
You can set the notify method via:
# smitty hacmp
  Cluster Configuration
    Cluster Resources
      Cluster Events
        Change/Show Cluster Events
adapters on the same SCSI backplane or bus; Redundancy in SAN HBA's; Application monitoring in place. TOPICS: POWERHA / HACMP
QHA
The standard tool for cluster monitoring is clstat, which comes along with HACMP. Clstat is rather slow with its updates, and sometimes the required clinfo daemon needs restarting in order to get it operational, so this is, well, not good. There's a script which is a lot better. It is written by HACMP guru Alex Abderrazag. This script shows you the correct HACMP status, along with adapter and volume group information. It works fine on HACMP 5.2 through 6.1. You can download it here: qha. This is version 8.03 (latest update: 25/04/2007). For the latest version, check www.lpar.co.uk. This tiny but effective tool accepts two flags: -n (show network info) and -v (show shared online vg). So, you can run # qha, # qha -v, # qha -n or # qha -nv. A description of the possible cluster states:
ST_INIT: cluster configured and down
ST_JOINING: node joining the cluster
ST_VOTING: inter-node decision state for an event
ST_RP_RUNNING: cluster running recovery program
ST_BARRIER: clstrmgr waiting at the barrier statement
ST_CBARRIER: clstrmgr is exiting recovery program
ST_UNSTABLE: cluster unstable
NOT_CONFIGURED: HA installed but not configured
RP_FAILED: event script failed
ST_STABLE: cluster services are running with managed resources (stable cluster), or cluster services have been "forced" down with resource groups potentially in the UNMANAGED state (HACMP 5.4 only)
TOPICS: POWERHA / HACMP
minutes, however the impact on the business applications is more severe. It can lead to interruptions up to one hour in duration. Another way to achieve high availability of databases is to use a special version of the Oracle database software called Real Application Cluster, also called RAC. In a RAC cluster multiple systems (instances) are active (sharing the workload) and provide a near always-on database operation. The Oracle RAC software relies on IBM's HACMP software to achieve high availability for hardware and the operating system platform AIX. For storage it utilizes a concurrent filesystem called GPFS (General Parallel File System), a product of IBM. Oracle RAC 9 uses GPFS and HACMP. With RAC 10 you no longer need HACMP and GPFS. HACMP is used for network down notifications. Put all network adapters of 1 node on a single switch and put every node on a different switch. HACMP only manages the public and private network service adapters. There are no standby, boot or management adapters in a RAC HACMP cluster. It just uses a single hostname; Oracle RAC and GPFS do not support hostname take-over or IPAT (IP Address Take-over). There are no disks, volume groups or resource groups defined in an HACMP RAC cluster. In fact, HACMP is only necessary for event handling for Oracle RAC. Name your HACMP RAC clusters in such a way that you can easily recognize the cluster as a RAC cluster, by using a naming convention that starts with RAC_. On every GPFS node of an Oracle RAC cluster a GPFS daemon (mmfs) is active. These daemons need to communicate with each other. This is done via the public network, not via the private network. Cache Fusion Via SQL*Net an Oracle block is read in memory. If a second node in an HACMP RAC cluster requests the same block, it will first check if it already has it stored locally in its own cache. If not, it will use a private dedicated network to ask if another node has the block in cache. If not, the block will be read from disk.
This is called Cache Fusion or Oracle RAC interconnect. This is why on RAC HACMP clusters, each node uses an extra private network adapter to communicate with the other nodes, for Cache Fusion purposes only. All other communication, including the communication between the GPFS daemons on every node and the communication from Oracle clients, is done via the public network adapter. The throughput on the private network adapter can be twice as high as on the public network adapter. Oracle RAC will use its own private network for Cache Fusion. If this network is not available, or if one node is unable to access the private network, then the private network is no longer
used, but the public network will be used instead. If the private network returns to normal operation, then a fallback to the private network will occur. Oracle RAC uses cllsif of HACMP for this purpose. TOPICS: MONITORING, POWERHA / HACMP, SECURITY
HACMP 5.4: How to change SNMP community name from default "public" and keep clstat working
HACMP 5.4 supports changing the default community name from "public" to something else. SNMP is used for clstatES communications. Using the "public" SNMP community name can be a security vulnerability, so changing it is advisable. First, find out what version of SNMP you are using:
# ls -l /usr/sbin/snmpd lrwxrwxrwx 1 root system 9 Sep 08 2008 /usr/sbin/snmpd -> snmpdv3ne
(In this case, it is using version 3). Make a copy of your configuration file. It is located in /etc.
/etc/snmpd.conf <- Version 1 /etc/snmpdv3.conf <- Version 3
Edit the file and replace "public" with your new community name wherever it is mentioned. Make sure to use no more than 8 characters for the new community name. Change the subsystems and restart them:
# chssys -s snmpmibd -a "-c new"
# chssys -s hostmibd -a "-c new"
# chssys -s aixmibd -a "-c new"
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
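The "replace wherever public is mentioned" edit is best done on a scratch copy first and reviewed before moving it into place. A sketch, demonstrated on a generated sample file; the sample line and the name "newcomm" (within the 8-character limit) are made up:

```shell
# On the real system: copy /etc/snmpdv3.conf, run the sed on the copy,
# review, then move it into place. Here we use a temporary sample file.
conf=$(mktemp)
echo 'COMMUNITY public public noAuthNoPriv 0.0.0.0 0.0.0.0 -' > "$conf"
sed 's/public/newcomm/g' "$conf"    # every occurrence is replaced
rm -f "$conf"
```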
If the command hangs, something is wrong. Check the changes you made. If everything works fine, perform the same change in the other node and test again. Now you can test from one server to the other using the snmpinfo command above.
If you need to back out, put back the original configuration file and restart the subsystems. Note that in this case we use empty double quotes; there is no space between them.
# chssys -s snmpmibd -a ""
# chssys -s hostmibd -a ""
# chssys -s aixmibd -a ""
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
Okay, now make the change to clinfoES and restart it on both nodes:
# chssys -s clinfoES -a "-c new" # stopsrc -s clinfoES # startsrc -s clinfoES
Wait a few minutes and you should be able to use clstat again with the new community name. Disclaimer: If you have any other application other than clinfoES that uses snmpd with the default community name, you should make changes to it as well. Check with your application team or software vendor. TOPICS: POWERHA / HACMP
A system usually has at least 2 heartbeats: one through the network, net_ether_01, with a sensitivity of 10 missed beats x 1 second interval x 2 = 20 seconds for it to fail. The other heartbeat is usually the disk heartbeat, diskhb_0, with a sensitivity of 4 missed beats x 2 second interval x 2 = 16 seconds. Basically, if the other node has failed, HACMP will know once all heartbeating has failed, thus after 20 seconds. You can play around with the HACMP detection rates. Set it to normal:
# /usr/es/sbin/cluster/utilities/claddnim -oether -r2
(Ethernet heartbeating fails after 20 seconds). If you want to set it to slow: use -r3 instead of -r2, and it fails after 48 seconds. Set it to fast: use -r1, which will fail it after 10 seconds. To give you some more time, you can use a grace period:
# claddnim -oether -g 15
This will give you 15 seconds of grace time, which is the time within which a network fallover must be taken care of. You will have to synchronize the cluster after making any changes using claddnim:
# /usr/es/sbin/cluster/utilities/cldare -rt -V 'normal'
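The detection times quoted above follow directly from missed beats x interval x 2, using the figures given for the two heartbeat networks:

```shell
# Failure detection time = missed beats * interval * 2
echo "net_ether_01: $((10 * 1 * 2)) seconds"   # 10 beats, 1 s interval
echo "diskhb_0:     $(( 4 * 2 * 2)) seconds"   #  4 beats, 2 s interval
```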
Reservation bit
If you wish to get rid of the SCSI disk reservation bit on SCSI, SSA and VPATH devices, there are two ways of achieving this. Firstly, HACMP comes with some binaries that do this job:
# /usr/es/sbin/cluster/utilities/cl_SCSIdiskreset /dev/vpathx
Secondly, there is a little (not official) IBM binary tool called "lquerypr". This command is part of the SDD driver fileset. It can also release the persistent reservation bit and clear all reservations. First check if you have any reservations on the vpath:
# lquerypr -vh /dev/vpathx
Clear it as follows:
# lquerypr -ch /dev/vpathx
If you'd like to see more information about lquerypr, simply run lquerypr without any options, and it will display extensive usage information. For SDD, you should be able to use the following command to clear the persistent reservation:
# lquerypr -V -v -c /dev/vpathXX
Note that running the command above may result in an error. However, if you run the following command afterwards, you will notice that the dummy disk device indeed has been created:
# lsdev -Cc disk | grep hdisk1 hdisk1 Defined SSA Logical Disk Drive
Also note that the dummy disk device will not show up if you run the lspv command. That is no cause for concern. Now run the cfgmgr command to discover the new disk. You'll notice that the new disk will now be discovered as hdisk2, because hdisk0 and hdisk1 are already in use.
# cfgmgr # lsdev -Cc disk | grep hdisk2
usysident [-t]
Keep in mind that activating the LED of a particular device does not activate the LED of the system panel. You can achieve that if you omit the device parameter. TOPICS: HARDWARE, INSTALLATION, SYSTEM ADMINISTRATION
# script:
generate_survey.ksh
# user check
USER=`whoami`
if [ "$USER" != "root" ]; then
   echo "Only root can run this script."
   exit 1
fi
# get the latest catalog.mic file from IBM
# you need to have wget installed and accessible in $PATH
# you can download this on:
# www-03.ibm.com/systems/power/software/aix/linux/toolbox/download.html
wget techsupport.services.ibm.com/server/mdownload/catalog.mic
# You could also use curl here, e.g.:
# curl techsupport.services.ibm.com/server/mdownload/catalog.mic -LO
# move the catalog.mic file to this server's invscout directory
mv $TEMP/catalog.mic $MIC
# remove any old mup files
echo Remove any old mup files from hosts.
for server in `cat $SERVERS` ; do
   echo "${server}"
   ssh $server "rm -f $INV/*.mup"
done
# distribute this file to all other hosts
for server in `cat $SERVERS` ; do
   echo "${server}"
   scp -p $MIC $server:$MIC
done
# run invscout on all these hosts
# this will create a hostname.mup file
for server in `cat $SERVERS` ; do
   echo "${server}"
   ssh $server invscout
done
# collect the hostname.mup files
for server in `cat $SERVERS` ; do
   echo "${server}"
   scp -p $server:$INV/*.mup $TEMP
done
# concatenate all hostname.mup files to one file
cat ${TEMP}/*mup > ${TEMP}/muppet.$$
# upload the remaining file to IBM.
# you need to have curl installed for this
# you can download this on:
# www-03.ibm.com/systems/power/software/aix/linux/toolbox/download.html
# you can install it like this:
# rpm -ihv curl-7.9.3-2.aix4.3.ppc.rpm curl-devel-7.9.3-2.aix4.3.ppc.rpm
# more info on using curl can be found on:
# http://curl.haxx.se/docs/httpscripting.html
# more info on uploading survey files can be found on:
# www14.software.ibm.com/webapp/set2/mds/fetch?pop=progUpload.html
# Sometimes, the IBM website will respond with an # "Expectation Failed" error message. Loop the curl command until # we get valid output.
stop="false"
while [ "${stop}" = "false" ] ; do
   # run the curl upload command here (not shown), capturing its output
   # in ${mytest}, e.g. by grepping for the "Expectation Failed" message;
   # an empty result then means the upload went through
   if [ -z "${mytest}" ] ; then
      stop="true"
   fi
   sleep 10
done
# now it is very useful to have an apache2 webserver running
# so you can access the survey file
mv $TEMP/survey.html $APA
# tip: put in the crontab daily like this:
# 45 9 * * * /usr/local/sbin/generate_survey.ksh 1>/dev/null 2>&1
# mail the output
# need to make sure this is sent in html format
cat - ${APA}/survey.html <<HERE | sendmail -oi -t
From: ${FROM}
To: ${TO}
Subject: ${SUBJ}
Mime-Version: 1.0
Content-type: text/html
Content-transfer-encoding: 8bit
HERE
Each LPAR can only use 1 logical port per physical port. Different LPARs that use logical ports from the same port group can communicate without any external hardware needed, and thus communicate very fast. The IVE is not hot-swappable; it can and may only be replaced by certified IBM service personnel. First you need to configure an HEA (host ethernet adapter); not in promiscuous mode, because that is meant to be used if you wish to assign a physical port dedicated to an LPAR. After that, you need to assign an LHEA (logical host ethernet adapter) to an LPAR. The HEA needs to be configured, and the frame needs to be restarted, in order to function correctly (because of the setting of multi-core scaling on the HEA itself). So, to conclude: you can assign physical ports of the IVE adapter to separate LPARs (promiscuous mode). If you have an IVE with two ports, up to two LPARs can use these ports. But you can also configure it as an HEA and have up to 16 LPARs per physical port in a port group using the same interface (10Gb ports are recommended). There are different kinds of IVE adapters; some allow you to create more port groups and thus more network connectivity. The IVE is a method of virtualizing ethernet without the need for VIOS. TOPICS: AIX, HARDWARE, LOGICAL PARTITIONING
This will find the adapter involved. Then, find the parent device of a slot, by running:
# lsdev -Cl [adapter] -F parent
(Fill in the correct adapter, e.g. fcs0). Now, remove the parent device and all its children:
# rmdev -Rl [parentdevice] -d
For example:
# rmdev -Rl pci8 -d
Now you should be able to remove the adapter via the HMC from the LPAR. If the adapter is broken and needs to be replaced, then you need to power down the PCI slot in which the adapter is placed: after issuing the "rmdev" command, run diag and go into "Task Selection", "Hot Plug Task", "PCI Hot Plug Manager", "Replace/Remove a PCI Hot Plug Adapter". Select the adapter and choose "remove". After the adapter has been replaced (usually by an IBM technician), run cfgmgr again to make the adapter known to the LPAR. TOPICS: HARDWARE, INSTALLATION
(That is, if adapter "1" is failing, replace it with the correct adapter number.) If the adapter is still in a "degraded" status, open a call with IBM. They will most likely require you to take a snap from the system and send the snap file to IBM for analysis, and they will conclude whether the adapter needs to be replaced or not. Involve the SAN storage team if the adapter needs to be replaced. They will have to update the WWN of the failing adapter when it is replaced with a new one that has a new WWN. If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA adapter. Note the new WWN and supply that to the SAN storage team. Remove the adapter:
# datapath remove adapter 1
(replace the "1" with the correct adapter that is failing). Check if the vpaths now all have one less path:
De-configure the adapter (this will also de-configure all the child devices, so you won't have to do this manually), by running: diag, choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes", and "KEEP definition in database" to "no". Hit ENTER.
Replace the adapter: Run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful, you won't see the adapter name here, but only "Unknown", because the device was unconfigured).
Have the IBM CE replace the adapter. Close any events on the failing adapter on the HMC. Validate that the notification LED is now off on the system; if not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and Disable the attention LED.
(replace this with the actual adapter name). And if required, update the adapter firmware microcode. Validate if the adapter is still functioning correctly by running:
# errpt # lsdev -Cc adapter
(replace this with the correct adapter name). Add the paths to the device:
# addpaths
With POWER6, the default addresses of the service processors have changed. This only applies to environments where the managed system was powered on before the HMC was configured to act as a DHCP server. The service processors may get their IP addresses by three different mechanisms: 1. Addresses received from a DHCP server. 2. Fixed addresses given to the interfaces using the ASMI. 3. Default addresses if neither of the possibilities above is used.
The default addresses are different between POWER5 and POWER6 servers. With POWER5 we have the following addresses:
Port HMC1: 192.168.2.147/24 Port HMC2: 192.168.3.147/24
Link: System p Operations Guide for ASMI and for Nonpartitioned Systems. TOPICS: AIX, HARDWARE, MONITORING
Temperature monitoring
Older pSeries systems (Power4) are equipped with environmental sensors. You can read the sensor values using:
# /usr/lpp/diagnostics/bin/uesensor -l
You can use these sensors to monitor your systems and your computer rooms. It isn't very difficult to create a script to monitor these environmental sensors regularly and to display the results on a webpage, updating it automatically. Newer systems (LPAR based) are not equipped with these environmental sensors. For PC systems several products exist, which attach to either an RJ45 or a parallel port and which can be used to monitor temperatures.
This can be used to view the file systems of the HMC. Try using "proc", "mem" and "swap" as well. By default this command will loop forever and update the screen every 4 seconds. You can run it only once with the following command:
# monhmc -r disk -n 0
This will delete any temporary files. Extremely useful if the HMC calls home to IBM about high usage of one of its file systems. Open a virtual console from the HMC:
# vtmenu
Exit by typing "~." (tilde dot) or "~~." (tilde tilde dot). Force the closure of a virtual terminal session:
# rmvterm -m SYSTEM-9117-570-SN10XXXXX -p name
Install websm:
# ./wsmlinuxclient.exe -silent
Run websm:
# /opt/websm/bin/wsm
This command will show the LED code. TOPICS: HMC, SYSTEM ADMINISTRATION
3) 10ZZZZZ-ZZZZ
(q to quit): 3
----------------------------------------------------------
Here's where you may get stuck. Vtmenu allows you to select a frame, but won't show any partition to start a virtual terminal window on. Seems obvious, because there aren't any partitions available (fullSystemPartition only). The solution is to run: mkvterm -m 10ZZZZZ-ZZZZ. This opens the virtual terminal window all right. When you're done, you can log out by using "~.". And if someone else is using the virtual terminal window, and you wish to close that virtual terminal window, run rmvterm -m 10ZZZZZ-ZZZZ. In case you're wondering how to figure out the managed machine name to use with the mkvterm and rmvterm commands, simply run vtmenu first. It shows you a list of managed machines controlled by this HMC. TOPICS: HMC, SYSTEM ADMINISTRATION
You will have to enter a password to get into your HMC. To allow your root user direct access to the HMC without the need of logging in, you'll have to update the authorized_keys2 file in the .ssh subdirectory of the home directory of your HMC user. There's a problem: a regular user only gets a restricted shell on an HMC and therefore
is unable to edit the authorized_keys2 file in subdirectory .ssh. In an HMC version 3 it is possible to disable the restricted shell for users by editing file /opt/hsc/data/ssh/hmcsshrc. In an HMC version 4 and up you no longer get root access (except, you may get it, by contacting IBM), so you can no longer edit this file. But there's another way to accomplish it. Let's say your hmc user ID is hmcuser and you were able to logon to the HMC called hmcsystem using this ID and a password (like described above). First, get a valid authorized_keys2 file, that allows root at your current host access to the HMC. Place this file in /tmp. Then, use scp to copy the authorized_keys2 file to the HMC:
[Enter your hmcuser's password, when required] Now, just test if it works:
# ssh hmcuser@hmcsystem date
You should now be able to access the system without entering a password. TOPICS: AIX, HMC, SYSTEM ADMINISTRATION
Lpar tips
The uname -Ls command will show you the partition number and the partition (lpar) name. When setting the resource allocation for a partition profile, set the minimum to the absolute bare minimum, and set the maximum as high as possible. For memory there are special considerations: If you set the maximum too low and you wish to exceed above the maximum amount of memory defined in the active profile, you can't simply adjust the profile and put extra memory in via DLPAR, because the LPAR has been initialized with a certain page table size, based on the maximum amount of memory setting. Therefore, a reboot will be required when you wish to use more memory than defined in the active profile. If you do try it however, you'll receive the following error:
HMCERRV3DLPAR018: There is no memory available for dynamic logical partioning on this partition.
If you set the maximum too high, the partition will be initialized with a large page table size, which uses too much memory for overhead that you might never use. TOPICS: HMC, INSTALLATION
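As a rule of thumb (an assumption here, not stated in the text), the hardware page table is sized at roughly 1/64 of the profile's maximum memory setting, which is why an inflated maximum wastes memory on overhead:

```shell
# Approximate page table overhead for a given maximum memory setting,
# using the assumed 1/64 ratio. 256 GB is a hypothetical maximum.
max_mb=$((256 * 1024))                          # maximum memory in MB
echo "approx. page table size: $((max_mb / 64)) MB"
```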
Hint: if this procedure gets interrupted for any reason, you need to reboot your HMC before restarting it. Otherwise, some files will remain in the local download directory, which will lead to incorrect checksums. You can check the progress of the procedure using the command ls -l /hmcdump in a different terminal. Once it has finished, you will see a prompt without any additional messages and the directory will be empty (the files will be copied to a different location). Then tell the HMC to boot from alternate media by issuing the following command:
Finally reboot your HMC with the following command from the console:
# hmcshutdown -r -t now
The installation should start automatically with the reboot. Once it has finished, you should be able to login from remote again. The whole procedure takes up to one hour. Once you have finished, you should in any case add the mandatory efixes for HMC 7.3.5 as ISO images. You can update the HMC with these fixes through the HMC. For more information, please visit this page. TOPICS: HMC, SECURITY
When you've run the command above (and have logged in to your jumpserver), then point the browser to https://jumpserver.domain.com.
Then this may be caused by the fact that your system is still in MDC, or manufacturing default configuration mode. It can easily be resolved: power down your frame. Power it back up to standby status. Then, when activating the default LPAR, choose "exit the MDC". TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
Compare_report
The compare_report command is a very useful utility to compare the software installed on two systems, for example for making sure the same software is installed on two nodes of a PowerHA cluster. First, create the necessary reports:
# ssh node2 "lslpp -Lc" > /tmp/node2 # lslpp -Lc > /tmp/node1
Next, generate the report. There are four interesting options: -l, -h, -m and -n:
-l Generates a report of base system installed software that is at a lower level.
-h Generates a report of base system installed software that is at a higher level.
-m Generates a report of filesets not installed on the other system.
-n Generates a report of filesets not installed on the base system.
For example:
# compare_report -b /tmp/node1 -o /tmp/node2 -l
#(baselower.rpt)
#Base System Installed Software that is at a lower level
#Fileset_Name:Base_Level:Other_Level
bos.msg.en_US.net.ipsec:6.1.3.0:6.1.4.0
bos.msg.en_US.net.tcp.client:6.1.1.1:6.1.4.0
bos.msg.en_US.rte:6.1.3.0:6.1.4.0
bos.msg.en_US.txt.tfs:6.1.1.0:6.1.4.0
xlsmp.msg.en_US.rte:1.8.0.1:1.8.0.3
# compare_report -b /tmp/node1 -o /tmp/node2 -h
#(basehigher.rpt)
#Base System Installed Software that is at a higher level
#Fileset_Name:Base_Level:Other_Level
idsldap.clt64bit62.rte:6.2.0.5:6.2.0.4
idsldap.clt_max_crypto64bit62.rte:6.2.0.5:6.2.0.4
idsldap.cltbase62.adt:6.2.0.5:6.2.0.4
idsldap.cltbase62.rte:6.2.0.5:6.2.0.4
idsldap.cltjava62.rte:6.2.0.5:6.2.0.4
idsldap.msg62.en_US:6.2.0.5:6.2.0.4
idsldap.srv64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srv_max_cryptobase64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srvbase64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srvproxy64bit62.rte:6.2.0.5:6.2.0.4
idsldap.webadmin62.rte:6.2.0.5:6.2.0.4
idsldap.webadmin_max_crypto62.rte:6.2.0.5:6.2.0.4
AIX-rpm:6.1.3.0-6:6.1.3.0-4
# compare_report -b /tmp/node1 -o /tmp/node2 -m
#(baseonly.rpt)
#Filesets not installed on the Other System
#Fileset_Name:Base_Level
Java6.sdk:6.0.0.75
Java6.source:6.0.0.75
Java6_64.samples.demo:6.0.0.75
Java6_64.samples.jnlp:6.0.0.75
Java6_64.source:6.0.0.75
WSBAA70:7.0.0.0
WSIHS70:7.0.0.0
# compare_report -b /tmp/node1 -o /tmp/node2 -n
#(otheronly.rpt)
#Filesets not installed on the Base System
#Fileset_Name:Other_Level
xlC.sup.aix50.rte:9.0.0.1
AIX doesn't just automatically run /usr/sbin/updtvpkg every time something gets installed or deinstalled, because on some slower systems with lots of software installed, /usr/sbin/updtvpkg can take a LONG time. If you want to run the command manually:
# /usr/sbin/updtvpkg
If you get an error similar to "cannot read header at 20760 for lookup" when running updtvpkg, run a rpm rebuilddb:
# rpm --rebuilddb
Once you run updtvpkg, you can run a rpm -qa to see your new AIX-rpm package. TOPICS: AIX, INSTALLATION, NIM, SYSTEM ADMINISTRATION
Nimadm
A very good article about migrating AIX from version 5.3 to 6.1 can be found on the following page of IBM developerWorks: http://www.ibm.com/developerworks/aix/library/au-migrate_nimadm/index.html?ca=drs For a smooth nimadm process, make sure that you clean up as many filesets on your server as possible (get rid of the things you no longer need). The more filesets that need to be migrated, the longer the process will take. Also make sure that openssl/openssh is up-to-date on the server to be migrated; this is likely to break when you have old versions installed. Very useful is also a gigabit Ethernet connection between the NIM server and the server to be upgraded, as the nimadm process copies the client rootvg to the NIM server and back. The log file for a nimadm process can be found on the NIM server in /var/adm/ras/alt_mig. TOPICS: AIX, INSTALLATION, NIM, SYSTEM ADMINISTRATION
No output is shown. The fileset is not part of the SPOT. Check if the LPP Source has the file set:
# nim -o showres LPPaix61tl05sp03 | grep -i bos.alt
bos.alt_disk_install.boot_images  6.1.5.2  I  N usr
bos.alt_disk_install.rte          6.1.5.1  I  N usr,root
Install the first fileset (bos.alt_disk_install.boot_images) in the SPOT. The other fileset is a prerequisite of the first fileset and will be automatically installed as well.
# nim -o cust -a filesets=bos.alt_disk_install.boot_images -a lpp_source=LPPaix61tl05sp03 SPOTaix61tl05sp03
Note: Use the -F option to force a fileset into the SPOT, if needed (e.g. when the SPOT is in use for a client). Check if the SPOT now has the fileset installed:
# nim -o showres SPOTaix61tl05sp03 | grep -i bos.alt
bos.alt_disk_install.boot_images  6.1.5.2  C  F  Alternate Disk Installation
bos.alt_disk_install.rte          6.1.5.1  C  F  Alternate Disk Installation
2-Port 10/100/1000 Base-TX PCI-X Adapter: Network Address.............001125C5E831 ROM Level.(alterable).......DV0210 Hardware Location Code......U788C.001.AAC1535-P1-T2
PLATFORM SPECIFIC
Name: ethernet Node: ethernet@1,1 Device Type: network Physical Location: U788C.001.AAC1535-P1-T2
This ent1 device is an 'Internal Port'. If we check ent2 on the same box:
# lscfg -pvl ent2 ent2 U788C.001.AAC1535-P1-C13-T1 2-Port 10/100/1000 Base-TX PCI-X
2-Port 10/100/1000 Base-TX PCI-X Adapter: Part Number.................03N5298 FRU Number..................03N5298 EC Level....................H138454 Brand.......................H0 Manufacture ID..............YL1021 Network Address.............001A64A8D516 ROM Level.(alterable).......DV0210 Hardware Location Code......U788C.001.AAC1535-P1-C13-T1
PLATFORM SPECIFIC
Name: ethernet Node: ethernet@1 Device Type: network Physical Location: U788C.001.AAC1535-P1-C13-T1
This is a device on a PCI I/O card. For a physical address like U788C.001.AAC1535-P1-C13-T1:
U788C.001.AAC1535 - This part identifies the 'system unit/drawer'. If your system is made up of several drawers, then look on the front and match the ID to this section of the address. Now go round the back of the server.
P1 - This is the PCI bus number. You may only have one.
C13 - Card slot C13. They are numbered on the back of the server.
T1 - This is port 1 of 2 that are on the card.
Your internal ports won't have the card slot numbers, just the T number, representing the port. This should be marked on the back of your server. E.g.: U788C.001.AAC1535-P1-T2 means unit U788C.001.AAC1535, PCI bus P1, port T2, and you should be able to see T2 printed on the back of the server. TOPICS: INSTALLATION, NIM
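Splitting such a location code into its parts can be sketched with plain shell parameter expansion, using the example code from the text:

```shell
# Break U788C.001.AAC1535-P1-C13-T1 into unit and bus/slot/port using the
# dash-separated structure described above.
loc="U788C.001.AAC1535-P1-C13-T1"
unit=${loc%%-*}          # longest suffix stripped: the system unit/drawer
rest=${loc#*-}           # everything after the first dash: P1-C13-T1
echo "unit: $unit"
echo "bus/slot/port: $rest" | tr '-' ' '
```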
Set fields as follows: "Primary Network Interface for the NIM Master": selected interface
"Input device for installation images": "cd0" "Remove all newly added NIM definitions...": "yes" Press Enter. Exit when complete. Initialize each NIM client:
# smitty nim_mkmac
Enter the host name of the appropriate LPAR. Set fields as follows: "Kernel to use for Network Boot": "mp" "Cable Type": "tp" Press Enter. Exit when complete.
A more extensive document about setting up NIM can be found here: http://www01.ibm.com/support/docview.wss?context=SWG10q1=setup+guide&uid=isg3T1010383 Another interesting document which covers NIM is this: NIM Nutshell TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
install_all_updates
A useful command to update software on your AIX server is install_all_updates. It is similar to running smitty update_all, but it works from the command line. The only thing you need to provide is the directory name, for example:
# install_all_updates -d .
This installs all the software updates from the current directory. Of course, you will have to make sure the current directory actually contains software. Don't worry about generating a Table Of Contents (.toc) file in this directory, because install_all_updates generates one for you. By default, install_all_updates will apply the filesets; use -c to commit the software instead. Also, by default, it will expand any file systems; use -x to prevent this behavior. It will install any requisites by default (use -n to prevent this). You can use -p to run a preview, and you can use -s to skip the recommended maintenance or technology level verification at the end of the install_all_updates output. You may have to use the -Y option to agree to all license agreements. To install all available updates from the cdrom, agree to all license agreements, and skip the recommended maintenance or technology level verification, run:
# install_all_updates -d /cdrom -Y -s
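Combining these flags, a careful workflow is to preview first and apply second. A small sketch (the /cdrom directory is an example; the commands are assigned to variables and printed so you can review them before running them on a real AIX host):

```shell
# Sketch: preview first, apply second. DIR is an assumption; point it
# at the directory (or mounted media) that holds the updates.
DIR=/cdrom
PREVIEW="install_all_updates -p -d $DIR"   # preview only, installs nothing
APPLY="install_all_updates -d $DIR -Y -s"  # apply, accept licenses, skip TL verification
echo "$PREVIEW"   # printed for review; run these on a real AIX host
echo "$APPLY"
```

If the preview output looks clean, run the apply command as-is.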
TOPICS: AIX, EMC, INSTALLATION, POWERHA / HACMP, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
In case you're using EMC SAN storage, make sure you configure your disks correctly as hdiskpower devices. Or, if you're using a mksysb image, you may want to follow this procedure: EMC ODM cleanup. Steps:
Enter a cluster name and select the nodes you're going to use. It is vital here to have the hostnames and IP address correctly entered in the /etc/hosts file of both nodes. Create an IP service label:
# smitty hacmp Initialization and Standard Configuration Configure Resources to Make Highly Available Configure Service IP Labels/Addresses Add a Service IP Label/Address
Enter an IP Label/Address (press F4 to select one), and enter a Network name (again, press F4 to select one). Set up a resource group:
# smitty hacmp Initialization and Standard Configuration Configure HACMP Resource Groups Add a Resource Group
Enter the name of the resource group. It's a good habit to make sure that a resource group name ends with "rg", so you can recognize it as a resource group. Also, select the participating nodes. For the "Fallback Policy", it is a good idea to change it to "Never Fallback". This way, when the primary node in the cluster comes up, and the resource group is up-and-running on the secondary node, you won't see a failover occur from the secondary to the primary node. Note: The order of the nodes is determined by the order you select the nodes here. If you put in "node01 node02" here, then "node01" is the primary node. If you want to have this any other way, now is a good time to correctly enter the order of node priority. Add the Service IP/Label to the resource group:
# smitty hacmp Initialization and Standard Configuration Configure HACMP Resource Groups Change/Show Resources for a Resource Group (standard)
Select the resource group you've created earlier, and add the Service IP/Label. Run a verification/synchronization:
# smitty hacmp Extended Configuration Extended Verification and Synchronization
Just hit [ENTER] here. Resolve any issues that may come up from this synchronization attempt. Repeat this process until the verification/synchronization process returns "Ok". It's a good idea here to select to "Automatically correct errors". Start the HACMP cluster:
# smitty hacmp System Management (C-SPOC) Manage HACMP Services Start Cluster Services
Select both nodes to start. Make sure to also start the Cluster Information Daemon. Check the status of the cluster:
# clstat -o # cldump
Wait until the cluster is stable and both nodes are up. Basically, the cluster is now up-and-running. However, during the Verification & Synchronization step, it will complain about not having a non-IP network. The next part is for setting up a disk heartbeat network that will allow the nodes of the HACMP cluster to exchange disk heartbeat packets over a SAN disk. We're assuming here that you're using EMC storage. The process on other types of SAN storage is more or less similar, except for some differences, e.g. SAN disks on EMC storage are called "hdiskpower" devices, while they're called "vpath" devices on IBM SAN storage. First, look at the available SAN disk devices on your nodes, and select a small disk that won't be used to store any data, but only for the purpose of doing the disk heartbeat. It is a good habit to request your SAN storage admin to zone a small LUN as a disk heartbeating device to both nodes of the HACMP cluster. Make a note of the PVID of this disk device, for example, if you choose to use device hdiskpower4:
# lspv | grep hdiskpower4 hdiskpower4 000a807f6b9cc8e5 None
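Since the PVID must be identical on both nodes, a quick check pays off before creating the concurrent VG. A small sketch (node names and the PVID are placeholders; the helper only prints the ssh commands, so pipe its output to sh, or remove the echo wrapper, to actually run them):

```shell
# Sketch: print the commands that check the heartbeat disk's PVID on
# each cluster node. Node names and the PVID are assumptions.
pvid_check_cmds() {
    PVID=$1; shift
    for node in "$@"; do
        echo "ssh $node 'lspv | grep $PVID'"
    done
}
pvid_check_cmds 000a807f6b9cc8e5 node01 node02
```

Both nodes should report a device for that PVID; if one does not, have the SAN administrator verify the zoning before continuing.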
So, we're going to set up the disk heartbeat network on device hdiskpower4, with PVID 000a807f6b9cc8e5. Create a concurrent volume group:
# smitty hacmp System Management (C-SPOC) HACMP Concurrent Logical Volume Management Concurrent Volume Groups Create a Concurrent Volume Group
Select both nodes to create the concurrent volume group on by pressing F7 for each node. Then select the correct PVID. Give the new volume group a name, for example "hbvg". Set up the disk heartbeat network:
# smitty hacmp Extended Configuration Extended Topology Configuration Configure HACMP Networks Add a Network to the HACMP Cluster
Select "diskhb" and accept the default Network Name. Run a discovery:
# smitty hacmp Extended Configuration Discover HACMP-related Information from Configured Nodes
Select the disk device on both nodes by selecting the same disk on each node by pressing F7.
Run a Verification & Synchronization again, as described earlier above. Then check with clstat and/or cldump again, to check if the disk heartbeat network comes online. TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
(Replace "ks.cfg" with the actual Kickstart configuration file name) More information can be found here: https://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/sysadminguide/s1-kickstart2-startinginstall.html
http://www.redhat.com/docs/manuals/enterprise/RHEL-5manual/Installation_Guide-en-US/s1-kickstart2-startinginstall.html http://www.redhat.com/docs/manuals/enterprise/RHEL-5manual/Installation_Guide-en-US/ch-kickstart2.html This is similar to a bos_inst.data and lppsource definition in one file. Every Linux installation will generate one of these based on the selections made during the install. It usually can be found in /home/root/ks.cfg or sometimes /root/ks.cfg. The installer (anaconda) will tell you in which of the two locations to look. This file can then be used to do installations with various degrees of hands-on (or hands-off) interaction. Also, a vendor may supply a ks.cfg file of their own for you to use. TOPICS: AIX, INSTALLATION, LOGICAL PARTITIONING, VIRTUALIZATION
You should have an AIX jump server that allows you to access the other hosts as user root through SSH, so you should have set up your SSH keys for user root. This jump server must have access to the Internet. You need to have wget and curl installed; get them from the Linux Toolbox. Your servers should be AIX 5 or higher; it doesn't really work with AIX 4. Optional: a web server, like Apache 2, would be nice, so you can drop the resulting HTML file on your website every day. Also needed: an entry in the root crontab to run this script every day, and a list of servers you want to check. Here's the script:
#!/bin/ksh
# script: generate_survey.ksh
# user check
USER=`whoami`
if [ "$USER" != "root" ]; then
   echo "Only root can run this script."
   exit 1
fi
# create a temporary directory
rm -rf $TEMP 2>/dev/null
mkdir $TEMP 2>/dev/null
cd $TEMP
# get the latest catalog.mic file from IBM
# you need to have wget installed and accessible in $PATH
# you can download this on: www-03.ibm.com
#   /systems/power/software/aix/linux/toolbox/download.html
wget techsupport.services.ibm.com/server/mdownload/catalog.mic
# You could also use curl here, e.g.:
#curl techsupport.services.ibm.com/server/mdownload/catalog.mic -LO
# move the catalog.mic file to this server's invscout directory
mv $TEMP/catalog.mic $MIC
# remove any old mup files
echo Remove any old mup files from hosts.
for server in `cat $SERVERS` ; do
   echo "${server}"
   ssh $server "rm -f $INV/*.mup"
done
# distribute this file to all other hosts
for server in `cat $SERVERS` ; do
   echo "${server}"
   scp -p $MIC $server:$MIC
done
# run invscout on all these hosts
# this will create a hostname.mup file
for server in `cat $SERVERS` ; do
   echo "${server}"
   ssh $server invscout
done
# collect the hostname.mup files
for server in `cat $SERVERS` ; do
   echo "${server}"
   scp -p $server:$INV/*.mup $TEMP
done
# concatenate all hostname.mup files to one file
cat ${TEMP}/*mup > ${TEMP}/muppet.$$
# upload the remaining file to IBM.
# you need to have curl installed for this
# you can download this on: www-03.ibm.com
#   /systems/power/software/aix/linux/toolbox/download.html
# you can install it like this:
# rpm -ihv curl-7.9.3-2.aix4.3.ppc.rpm curl-devel-7.9.3-2.aix4.3.ppc.rpm
# more info on using curl can be found on:
# http://curl.haxx.se/docs/httpscripting.html
# more info on uploading survey files can be found on:
# www14.software.ibm.com/webapp/set2/mds/fetch?pop=progUpload.html
# Sometimes, the IBM website will respond with an
# "Expectation Failed" error message. Loop the curl command
# until we get valid output.
stop="false"
sleep 10
done
# now it is very useful to have an apache2 webserver running
# so you can access the survey file
mv $TEMP/survey.html $APA
# tip: put in the crontab daily like this:
# 45 9 * * * /usr/local/sbin/generate_survey.ksh 1>/dev/null 2>&1
# mail the output
# need to make sure this is sent in html format
cat - ${APA}/survey.html <<HERE | sendmail -oi -t
From: ${FROM}
To: ${TO}
Subject: ${SUBJ}
Mime-Version: 1.0
Content-type: text/html
Content-transfer-encoding: 8bit
HERE
If you have issues getting to the computer room easily, and you have to update an HMC on the raised floor, then you can also do that upgrade remotely. IBM describes two methods on their website: using the update media and using the recoverable media. Using the update media method, you may end up with a corrupted HMC. The only way to solve this is accessing the HMC in the computer room (*sigh*). Therefore, use the recoverable media option. That one works better. A link to the documentation and software can be found here. TOPICS: AIX, INSTALLATION, NIM, SYSTEM ADMINISTRATION
or less (if not, the alt_disk_copy command will fail). To create a copy on hdisk1, type:
# alt_disk_copy -d hdisk1
If you now restart your system from hdisk1, you will notice that the original rootvg has been renamed to old_rootvg. To delete this volume group (in case you're satisfied with the new rootvg), type:
# alt_rootvg_op -X old_rootvg
A very good article about alternate disk installs can be found on developerWorks. If you wish to copy a mirrored rootvg to two other disks, make sure to use quotes around the target disks, e.g. if you wish to create a copy on disks hdisk4 and hdisk5, run:
# alt_disk_copy -d "hdisk4 hdisk5"
Installation history
A very easy way to see what was installed recently on your system:
# lslpp -h
Now you've created an ISO image that you can burn to a DVD. Some specific information on burning this ISO image on AIX to a DVD-RAM: Burning a DVD-RAM is somewhat more difficult than burning a CD. First, it depends on whether you have a slim-line DVD-RAM drive in a Power5 system or a regular DVD-RAM drive in a Power4 system (not slimline).
Use DLPAR to move the required SCSI controller to an LPAR, in order to be able to use the DVD-RAM drive. After the DLPAR action of the required SCSI controller is complete, execute cfgmgr. After the configuration manager has run, you will end up with either 1 or 2 DVD drives (depending on the actual drives in the hardware frame):
# lsdev -Cc cdrom cd0 Available 3F-09-00-0,0 SCSI DVD-RAM Drive cd1 Available 3F-09-00-5,0 16 Bit LVD SCSI DVD-ROM Drive
As you can see, the first is the DVD-RAM, the second is a DVD-ROM. Therefore, we will use the first one (in this example). Place a DVD-RAM single sided 4.7 GB Type II disc (part number 19P0862) in the drive. DO NOT USE ANY OTHER TYPE OF DVD-RAM DISCS. OTHER TYPES OF DISCS ARE NOT SUPPORTED BY IBM. In case you have a POWER4 system: Be sure to keep the DVD-RAM in its case in order to burn the DVD. DVD-RAM drives in Power4 systems will NOT burn if you removed the DVD-RAM from its case. Also, be sure to have the latest firmware level on the DVD-RAM drive (see the website http://www14.software.ibm.com/webapp/set2/firmware for the correct level of the firmware for your drive). Without this firmware level these DVD-RAM drives are unable to burn Type II DVD-RAM discs. Using lscfg -vl cd0 you can check the firmware level:
# lscfg -vl cd0 cd0 U1.9-P2-I1/Z2-A0 SCSI DVD-RAM Drive (4700 MB)
Manufacturer................IBM Machine Type and Model......DVRM00203 ROS Level and ID............A132 Device Specific.(Z0)........058002028F000010 Part Number.................04N5272 EC Level....................F74471 FRU Number..................04N5967
The firmware level of this DVD-RAM drive is "A132". This level is too low in order to be able to burn Type II discs. Check the website for the latest level. The description on this webpage on how to install the DVD-RAM firmware was found to be inaccurate. Install firmware as follows:
Download the firmware file and place it in /tmp on the server. You will get a filename with a "rpm" extension. Run:
# rpm -ihv --ignoreos <filename>
Example:
# rpm -ihv --ignoreos /tmp/ibm-scsi-dvdram.dvrm00203-A151.rpm ibm-scsi-dvdram.dvrm00203 #############################
(Beware of the double dash before "ignoreos"!!). This command will place the microcode in /etc/microcode. Run:
# diag -d cd0 -c -T "download -s /etc/microcode -f"
This will install the firmware. Use the correct DVD-RAM drive (in this case cd0) to install the firmware!!
# diag -d cd0 -c -T "download -s /etc/microcode -f" Installation of the microcode has completed successfully. The current microcode for cd0 is IBM-DVRM00203.A151. Please run diagnostics on the device to ensure that it is functioning properly.
Burning a DVD-RAM can take a long time. Variable burn times from 1 to 7 hours were seen!!! A DVD-RAM made in a slim-line DVD drive on a Power5 system can be read in a regular DVD drive on a Power4 system, if the latest firmware is installed on the DVD drive. On a Linux system you can use a tool like K3B to write the ISO image to a regular DVD+R disc. TOPICS: HARDWARE, INSTALLATION
Make sure to have a recent mksysb image of the server, and before starting the updates to the rootvg, do an incremental TSM backup. It is also a good idea to prepare an alt_disk_install on the second boot disk.
For HACMP nodes: check the cluster status and log files to make sure the cluster is stable and ready for the upgrades. Update fileset devices.fcp.disk.ibm to the latest level using smitty update_all. For ESS environments: Update host attachment scripts ibm2105 and ibmpfe.essutil to the latest available levels using smitty update_all.
Enter the lspv command to find out all the SDD volume groups. Enter the lsvgfs command for each SDD volume group to find out which file systems are mounted, e.g.:
# lsvgfs vg_name
Enter the umount command to unmount all file systems belonging to the SDD volume groups. Enter the varyoffvg command to vary off the volume groups. If you are upgrading to an SDD version earlier than 1.6.0.0, or if you are upgrading to SDD 1.6.0.0 or later and your host is in a HACMP environment with nonconcurrent volume groups that are varied on on another host (that is, reserved by another host), run the vp2hd volume_group_name script to convert the volume group from SDD vpath devices to supported storage hdisk devices. Otherwise, skip this step.
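The unmount and vary-off steps above lend themselves to a small loop. A sketch (the volume group names are assumptions, and the helper only prints the commands; feed its output to sh on the real host to execute them):

```shell
# Sketch: for each SDD volume group, unmount its file systems and vary
# the group off. The VG names passed in below are examples.
sdd_offline_cmds() {
    for vg in "$@"; do
        echo "lsvgfs $vg | xargs -n1 umount"  # unmount every FS in the VG
        echo "varyoffvg $vg"                  # then vary the VG off
    done
}
sdd_offline_cmds datavg appvg
```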
Use the smitty command to uninstall the SDD. Enter smitty deinstall and press Enter. The uninstallation process begins. Complete the uninstallation process. If you need to upgrade the AIX operating system, you could perform the upgrade now. If required, reboot the system after the operating system upgrade. Use the smitty command to install the newer version of the SDD. Note: it is also possible to do smitty update_all to simply update the SDD fileset, without first uninstalling it; but IBM recommends doing an uninstall first, then patch the OS, and then do an install of the SDD fileset.
Use the smitty device command to configure all the SDD vpath devices to the Available state. Enter the lsvpcfg command to verify the SDD configuration. If you are upgrading to an SDD version earlier than 1.6.0.0, run the hd2vp volume_group_name script for each SDD volume group to convert the physical volumes from supported storage hdisk devices back to the SDD vpath devices.
Enter the varyonvg command for each volume group that was previously varied offline. Enter the lspv command to verify that all physical volumes of the SDD volume groups are SDD vpath devices. Check for any errors:
# errpt | more # lppchk -v # errclear 0
Enter the mount command to mount all file systems that were unmounted. Attention: If an SDD volume group's physical volumes are mixed hdisk devices and SDD vpath devices, you must run the dpovgfix utility to fix this problem. Otherwise, SDD will not function properly:
# dpovgfix vg_name
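A quick way to triage this condition is to look at the PV names of each volume group. A sketch, assuming the classic device naming (hdiskN vs vpathN); the helper below just inspects a list of PV names fed to it:

```shell
# check_mixed: read the PV names of one VG from stdin and report
# whether the list mixes plain hdisk and SDD vpath devices.
check_mixed() {
    pvs=`cat`
    case "$pvs" in
        *vpath*hdisk*|*hdisk*vpath*) echo mixed ;;
        *) echo ok ;;
    esac
}
# On a real host, for a given VG:
#   lsvg -p vg_name | awk 'NR>2 {print $1}' | check_mixed
```

A result of "mixed" means the VG is a candidate for dpovgfix.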
(path only configured in CuPath ODM table) fcs1 (wwn 71e5) -> ALL devices configured behind this fscsi1 driver instance
Looking at the MPIO path configuration, here is what we have for the rootvg disk:
# lspath -l hdisk2 -H -F"name parent path_id connection status" name parent path_id connection status
The fscsi1 driver instance is the second path (path_id 1). Remove the other 3 paths, keeping only the path corresponding to fscsi1:
# rmpath -l hdisk2 -p fscsi0 -d # rmpath -l hdisk2 -p fscsi2 -d # rmpath -l hdisk2 -p fscsi3 -d # lspath -l hdisk2 -H -F"name parent path_id connection status"
Afterwards, do a savebase to update the boot lv hd5. Set the bootlist to hdisk2 and reboot the host. It will come up successfully, with no more hang at LED 554. When checking the status of the rootvg disk, a new hdisk10 has been configured with the correct ODM definitions as shown below:
# lspv hdisk10 0003027f7f7ca7e2 rootvg active # lsdev -Cc disk hdisk2 Defined 00-09-01 MPIO Other FC SCSI Disk Drive
To summarize, it is recommended to set up ONLY ONE path when installing AIX to a SAN disk, then install the EMC ODM package, reboot the host, and only after that is complete, add the other paths. By doing that, we ensure that the fscsiX driver instance used for the boot process has the hdisk configured behind it. TOPICS: EMC, INSTALLATION, ODM, STORAGE, STORAGE AREA NETWORK
Check with lsvg -o that only rootvg is varied on. If there is no PowerPath, skip all steps involving power device names. 3. For a CLARiiON configuration, if the Navisphere Agent is running, stop it:
# /etc/rc.agent stop
Delete all hdisk devices. For Symmetrix devices, use this command:
# lsdev -CtSYMM* -Fname | xargs -n1 rmdev -dl
8. Confirm with lsdev -Cc disk that there are no EMC hdisks or hdiskpowers.
9. Remove all Fiber driver instances:
# rmdev -Rdl fscsiX
(X being the driver instance number, i.e. 0, 1, 2, etc.)
10. Verify through lsdev -Cc driver that there are no more fiber driver instances (fscsi).
11. Put the adapter instances into the Defined state:
# rmdev -l fcsX
(X being the adapter instance number, i.e. 0, 1, 2, etc.)
12. Create the hdisk entries for all EMC devices:
# emc_cfgmgr
or
# cfgmgr -vl fcsx
(x being each adapter instance which was rebuilt). Skip this part if there is no PowerPath.
13. Configure all EMC devices into PowerPath:
# powermt config
The default addresses are different between POWER5 and POWER6 servers. With POWER5 we have the following addresses:
Port HMC1: 192.168.2.147/24 Port HMC2: 192.168.3.147/24
Link: System p Operations Guide for ASMI and for Nonpartitioned Systems. TOPICS: HMC, INSTALLATION
Hint: If this procedure gets interrupted for any reason, you need to reboot your HMC before restarting it. Otherwise, some files will remain in the local download directory, which will lead to incorrect checksums. You can check the progress of the procedure using the command ls -l /hmcdump in a different terminal. Once it has finished, you will see a prompt without any additional messages and the directory will be empty (the files will be copied to a different location). Then tell the HMC to boot from an alternate media by issuing the following command:
# chhmc -c altdiskboot -s enable --mode upgrade
Finally reboot your HMC with the following command from the console:
# hmcshutdown -r -t now
The installation should start automatically with the reboot. Once it has finished you should be able to log in from remote again. The whole procedure takes up to one hour. Once you have finished you should in any case add the mandatory efixes for HMC 7.3.5 as ISO images. You can update the HMC with these fixes through the HMC. For more information, please visit this page. TOPICS: INSTALLATION, LINUX
Start it:
# /etc/init.d/pure-ftpd start
Following the above steps, the document root for Apache is /var/www/html/. Create a test PHP script (such as phpinfo.php) and place it in the document root. A useful test script sample:
<?php phpinfo(); ?>
Set the ServerName entry to hostname:80. Add "index.php" to the DirectoryIndex entry, so the webserver also recognizes index.php as an index file. Restart the http server:
# service httpd restart
Yum
Use yum to update your system (for Fedora or Red Hat):
# yum update
will update all the packages on your system. If you need to configure a proxy server for yum to use, then just add the following line to /etc/yum.conf:
proxy=http://address-of-proxyserver.org:80
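If the proxy requires authentication, yum can also read credentials from /etc/yum.conf; the user name and password below are placeholders:

```
proxy=http://address-of-proxyserver.org:80
proxy_username=yourproxyuser
proxy_password=yourproxypassword
```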
Of course set it to your correct proxy server and port number. TOPICS: INSTALLATION, LINUX
Extras in Ubuntu
Even though Ubuntu is very complete initially, you might want extras, like a media player. You can use a script to do this for you:
# wget http://download.ubuntuforums.org/ubuntusetup/ubuntusetup.sh # sudo sh ubuntusetup.sh
APT
APT is short for Advanced Packaging Tool and is found in Debian distributions of Linux, like Ubuntu. To be able to use APT, you first need to download a list of available software:
# sudo apt-get update
Then you can search for a specific program with apt-cache search. To upgrade your complete system:
# sudo apt-get upgrade
Besides the command-line, you can also use the graphical user interface Synaptic. TOPICS: AIX, INSTALLATION, NIM
Create a location where to store all of the AIX filesets on the server:
# mkdir /sw_depot/5300-10-02-0943-full
Repeat the above 5 steps for both DVDs. You'll end up with a folder of at least 4 GB. Delete the iso logical volume:
# rmfs -r /testiso # rmlv testiso
Check with:
# lsnim -l LPPaix53tl10sp2
A small note when you're using AIX 7 / AIX 6.1: Significant changes have been made in AIX 7 and AIX 6.1 that add new support for NIM. In particular there is now the capability to use the loopmount command to mount iso images into filesystems. As an example:
# loopmount -i aixv7-base.iso -m /aix -o "-V cdrfs -o ro"
The above mounts the AIX 7 base iso as a filesystem called /aix. You can now create an lpp_source or spot from the iso or you can simply read the files.
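For example, an lpp_source could then be defined straight from the mounted image. The resource name and location below are made up, and the command is printed for review rather than executed:

```shell
# Sketch: define a NIM lpp_source from the loopmount-ed /aix filesystem.
# Resource name and location are assumptions; adjust for your NIM setup.
NIMCMD="nim -o define -t lpp_source -a server=master -a location=/export/nim/lpp_aix7 -a source=/aix lpp_aix7"
echo "$NIMCMD"   # printed for review; run it on the NIM master
```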
This can be used to view the file systems of the HMC. Try using "proc", "mem" and "swap" as well. By default this command will loop forever and update the screen every 4 seconds. You can run it only once, with the following command:
# monhmc -r disk -n 0
This will delete any temporary files. Extremely useful if the HMC calls home to IBM about high usage of one of its file systems. Open a virtual console from the HMC:
# vtmenu
Exit by typing "~." (tilde dot) or "~~." (tilde tilde dot). Force the closure of a virtual terminal session:
# rmvterm -m SYSTEM-9117-570-SN10XXXXX -p name
Introduction to VIO
Prior to the introduction of POWER5 systems, it was only possible to create as many separate logical partitions (LPARs) on an IBM system as there were physical processors. Given that the largest IBM eServer pSeries POWER4 server, the p690, had 32 processors, 32 partitions were the most anyone could create. A customer could order a system with enough physical disks and network adapter cards, so that each LPAR would have enough disks to contain operating systems and enough network cards to allow users to communicate with each partition. The Advanced POWER Virtualization feature of POWER5 platforms makes it possible to allocate fractions of a physical CPU to a POWER5 LPAR. Using virtual CPUs and virtual I/O, a user can create many more LPARs on a p5 system than there are CPUs or I/O slots. The Advanced POWER Virtualization feature accounts for this by allowing users to create shared network adapters and virtual SCSI disks. Customers can use these virtual resources to provide disk space and network adapters for each LPAR they create on their POWER5 system.
There are three components of the Advanced POWER Virtualization feature: MicroPartitioning, shared Ethernet adapters, and virtual SCSI. In addition, AIX 5L Version 5.3 allows users to define virtual Ethernet adapters permitting inter-LPAR communication. Micro-Partitioning An element of the IBM POWER Virtualization feature called Micro-Partitioning can divide a single processor into many different processors. In POWER4 systems, each physical processor is dedicated to an LPAR. This concept of dedicated processors is still present in POWER5 systems, but so is the concept of shared processors. A POWER5 system administrator can use the Hardware Management Console (HMC) to place processors in a shared processor pool. Using the HMC, the administrator can assign fractions of a CPU to individual partitions. If one LPAR is defined to use processors in the shared processor pool, when those CPUs are idle, the POWER Hypervisor makes them available to other partitions. This ensures that these processing resources are not wasted. Also, the ability to assign fractions of a CPU to a partition means it is possible to partition POWER5 servers into many different partitions. Allocation of physical processor and memory resources on POWER5 systems is managed by a system firmware component called the POWER Hypervisor. Virtual Networking Virtual networking on POWER5 hardware consists of two main capabilities. One capability is provided by a software IEEE 802.1q (VLAN) switch that is implemented in the Hypervisor on POWER5 hardware. Users can use the HMC to add Virtual Ethernet adapters to their partition definitions. Once these are added and the partitions booted, the new adapters can be configured just like real physical adapters, and the partitions can communicate with each other without having to connect cables between the LPARs. Users can separate traffic from different VLANs by assigning different VLAN IDs to each virtual Ethernet adapter. 
Each AIX 5.3 partition can support up to 256 Virtual Ethernet adapters. In addition, a part of the Advanced POWER virtualization virtual networking feature allows users to share physical adapters between logical partitions. These shared adapters, called
Shared Ethernet Adapters (SEAs), are managed by a Virtual I/O Server partition which maps physical adapters under its control to virtual adapters. It is possible to map many physical Ethernet adapters to a single virtual Ethernet adapter, thereby eliminating a single physical adapter as a point of failure in the architecture. There are a few things users of virtual networking need to consider before implementing it. First, virtual networking ultimately uses more CPU cycles on the POWER5 machine than when physical adapters are assigned to a partition. Users should consider assigning a physical adapter directly to a partition when heavy network traffic is predicted over a certain adapter. Secondly, users may want to take advantage of larger MTU sizes that virtual Ethernet allows, if they know that their applications will benefit from the reduced fragmentation and better performance that larger MTU sizes offer. The MTU size limit for SEA is smaller than Virtual Ethernet adapters, so users will have to carefully choose an MTU size so that packets are sent to external networks with minimum fragmentation. Virtual SCSI The Advanced POWER Virtualization feature called virtual SCSI allows access to physical disk devices which are assigned to the Virtual I/O Server (VIOS). The system administrator uses VIOS logical volume manager commands to assign disks to volume groups. The administrator creates logical volumes in the Virtual I/O Server volume groups. Either these logical volumes or the physical disks themselves may ultimately appear as physical disks (hdisks) to the Virtual I/O Server's client partitions, once they are associated with virtual SCSI host adapters. 
While the Virtual I/O Server software is packaged as an additional software bundle that a user purchases separately from the AIX 5.3 distribution, the virtual I/O client software is a part of the AIX 5.3 base installation media, so an administrator does not need to install any additional filesets on a Virtual SCSI client partition. TOPICS: AIX, HARDWARE, LOGICAL PARTITIONING
This will find the adapter involved. Then, find the parent device of a slot, by running:
# lsdev -Cl [adapter] -F parent
(Fill in the correct adapter, e.g. fcs0). Now, remove the parent device and all its children:
# rmdev -Rl [parentdevice] -d
For example:
# rmdev -Rl pci8 -d
Now you should be able to remove the adapter via the HMC from the LPAR. If you need to replace the adapter because it is broken, then you need to power down the PCI slot in which the adapter is placed: After issuing the "rmdev" command, run diag and go into "Task Selection", "Hot Plug Task", "PCI Hot Plug Manager", "Replace/Remove a PCI Hot Plug Adapter". Select the adapter and choose "remove". After the adapter has been replaced (usually by an IBM technician), run cfgmgr again to make the adapter known to the LPAR. TOPICS: HMC, LOGICAL PARTITIONING, VIRTUALIZATION
This command will show the LED code. TOPICS: AIX, HMC, LOGICAL PARTITIONING
Lpar tips
The uname -Ls command will show you the partition number and the partition (lpar) name. When setting the resource allocation for a partition profile, set the minimum to the absolute bare minimum, and set the maximum as high as possible. For memory there are special considerations: If you set the maximum too low and you wish to exceed above the maximum amount of memory defined in the active profile, you can't simply adjust the profile and put extra memory in via DLPAR, because the LPAR has been initialized with a certain page table size, based on the maximum amount of memory setting. Therefore, a reboot
will be required when you wish to use more memory than defined in the active profile. If you do try it however, you'll receive the following error:
HMCERRV3DLPAR018: There is no memory available for dynamic logical partioning on this partition.
If you set the maximum too high, the partition will be initialized with a large page table, which adds memory overhead for a maximum you might never use.
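To see how much DLPAR headroom the current maximum setting allows, you can list current versus maximum memory per partition from the HMC command line. A sketch (the managed system name is made up; the command is printed here so you can review it before running it on the HMC):

```shell
# Sketch: compare each LPAR's current memory with its configured maximum.
# The managed system name below is an assumption; substitute your own.
HMCCMD="lshwres -m Server-9117-570-SN10XXXXX -r mem --level lpar -F lpar_name,curr_mem,curr_max_mem"
echo "$HMCCMD"   # printed for review; run it on the HMC itself
```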
This will create a copy of logical volume "lvname" to a file "lvname.dd" in file system /file/system. Make sure that wherever you write your output file (in the example above, /file/system) has enough disk space available to hold a full copy of the logical volume. If the logical volume is 100 GB, you'll need 100 GB of file system space for the copy. If you want to test how this works, create a logical volume with a file system on top of it, and create some files in that file system. Then unmount the file system, and use dd to copy the logical volume as described above. Then throw away the file system using "rmfs -r", and after that has completed, recreate the logical volume and the file system. If you now mount the file system, you will see that it is empty. Unmount the file system, and use the following dd command to restore your backup copy:
# dd if=/file/system/lvname.dd of=/dev/lvname
Then, mount the file system again, and you will see that the contents of the file system (the files you've placed in it) are back. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
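Before relying on such a copy, it is wise to verify that it is byte-identical to the source. A minimal sketch, using an ordinary file as a stand-in for the raw LV device (the filenames are illustrative; on AIX you would compare /dev/rlvname against the .dd file):

```shell
# Verify a dd copy by comparing checksums of source and copy.
# "source.img" stands in for the raw LV device; names are
# illustrative only.
dd if=/dev/zero of=source.img bs=1024 count=64 2>/dev/null
dd if=source.img of=source.img.dd bs=1024 2>/dev/null
sum1=$(cksum < source.img | awk '{print $1}')
sum2=$(cksum < source.img.dd | awk '{print $1}')
if [ "$sum1" = "$sum2" ]; then
    echo "copy verified"
else
    echo "copy differs" >&2
fi
rm -f source.img source.img.dd
```

Reading the checksums from stdin (rather than passing filenames) keeps the cksum output identical in format for both files, so only the checksum itself is compared.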
hdisk99 00c8b12cf28e737b
The maximum number of user-definable LVs is the maximum number of LVs per VG minus 1, because one LV is reserved for system use. Consequently, system administrators can configure 255 LVs in normal VGs, 511 in big VGs, and 4095 in scalable VGs.

VG type          Normal VG           Big VG                Scalable VG
Max PVs          32                  128                   1024
Max LVs          256                 512                   4096
Max PPs per VG   32,512 (1016 * 32)  130,048 (1016 * 128)  2,097,152
Max PP size      1 GB                1 GB                  128 GB
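The table translates into an upper bound on usable capacity per VG type: max PPs per VG times max PP size. A quick sketch of that arithmetic:

```shell
# Upper bound on VG capacity per type: max PPs * max PP size,
# using the figures from the table above.
echo "Normal VG:   $((32512 * 1)) GB"        # about 32 TB
echo "Big VG:      $((130048 * 1)) GB"       # about 127 TB
echo "Scalable VG: $((2097152 * 128)) GB"    # 268435456 GB, i.e. 256 PB
```

These are theoretical LVM addressing limits, not practical recommendations; real configurations hit other limits (disk sizes, adapter counts) long before these ceilings.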
The scalable VG implementation in AIX 5L Version 5.3 provides configuration flexibility with respect to the number of PVs and LVs that can be accommodated by a given instance of the new VG type. The configuration options allow any scalable VG to contain 32, 64, 128, 256, 512, 768, or 1024 disks and 256, 512, 1024, 2048, or 4096 LVs. You do not need to configure the maximum values of 1024 PVs and 4096 LVs at the time of VG creation to account for potential future growth. You can always increase the initial settings at a later date as required. The System Management Interface Tool (SMIT) and the Web-based System Manager graphical user interface fully support the scalable VG. Existing SMIT panels, which are related to VG management tasks, have been changed and many new panels added to account for the scalable VG type. For example, you can use the new SMIT fast path _mksvg to directly access the Add a Scalable VG SMIT menu. The user commands mkvg, chvg, and lsvg have been enhanced in support of the scalable VG type. For more information: http://www.ibm.com/developerworks/aix/library/au-aix5l-lvm.html. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
To resolve this: clear the boot logical volumes from the disks:
# chpv -c hdisk2
# chpv -c hdisk3
Verify that the disks can no longer be used to boot from by running:
# ipl_varyon -i
# bosboot -ad /dev/hdisk2
bosboot: Boot image is 38224 512 byte blocks.
# bosboot -ad /dev/hdisk3
bosboot: Boot image is 38224 512 byte blocks.
This will set the correct boot logical volume, but the error will show up if you ever run the bootlist command again without the blv attribute. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
Of course, make sure the new disk is included in the boot list for the rootvg:
# bootlist -m normal hdisk0 hdisk1
Now rootvg is mirrored, but not yet synced; run "lsvg -l rootvg" and you'll see the stale partitions. So run the syncvg command yourself. With the -P option you can specify the number of threads that should be started to perform the sync process. Usually, you can specify at least 2 to 3 times the number of cores in the system. Using the -P option has an extra benefit: there will be no lock on the volume group, allowing you to run "lsvg rootvg" in another window to check the status of the sync process.
# syncvg -P 4 -v rootvg
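Because the VG stays unlocked, the sync progress can be watched by pulling the stale PP count out of "lsvg rootvg". A sketch that parses a sample line (the here-document mimics one line of real lsvg output; on a live system pipe "lsvg rootvg" straight into the awk):

```shell
# Extract the stale PP count from "lsvg" style output while a
# sync runs. The sample line is illustrative; on a live system:
#   lsvg rootvg | awk '/STALE PPs/ {print $NF+0}'
stale=$(awk '/STALE PPs/ {print $NF+0}' <<'EOF'
STALE PVs:      1                        STALE PPs:      17
EOF
)
echo "stale PPs: $stale"   # 0 means the mirror is fully synced
```

Wrapping this in a loop with a sleep gives a simple progress monitor that exits when the count reaches zero.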
Name            Size
/dev/hd10opt    4194304
So file system /opt is located on logical volume hd10opt. Then run the getlvcb command:
# getlvcb -AT hd10opt
AIX LVCB
intrapolicy = c
copies = 2
interpolicy = m
lvid = 00f69a1100004c000000012f9dca819a.9
lvname = hd10opt
label = /opt
machine id = 69A114C00
number lps = 8
relocatable = y
strict = y
stripe width = 0
stripe size in exponent = 0
type = jfs2
upperbound = 32
fs = vfs=jfs2:log=/dev/hd8:vol=/opt:free=false:quota=no
time created = Thu Apr 28 20:26:36 2011
You can clearly see the "time created" for this file system in the example above. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
If the logical volume was created with, or has been modified to use, customized owner/group/mode values, the dev_ fields will show the current uid/gid/perm values, for example:
# chlv -U user -G staff -P 777 testlv
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 14:39 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 14:39 /dev/testlv
# readvgda vpath51 | egrep "lvname|dev_|Logical"
lvname:     testlv (i=2)
dev_uid:    3878
dev_gid:    1
dev_perm:   511
When the volume group is exported, and re-imported, this information is lost:
# errpt
# exportvg testvg
# importvg -y testvg vpath51
testvg
# ls -als /dev/*testlv
0 crw-rw---- 1 root system 57, 3 Mar 10 15:11 /dev/rtestlv
0 brw-rw---- 1 root system 57, 3 Mar 10 15:11 /dev/testlv
To prevent this from happening, make sure to use the -R option, which restores any customized settings:
# chlv -U user -G staff -P 777 testlv
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 15:11 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 15:11 /dev/testlv
# readvgda vpath51 | egrep "lvname|dev_|Logical"
lvname:     testlv (i=2)
dev_uid:    3878
dev_gid:    1
dev_perm:   511
# varyoffvg testvg
# exportvg testvg
# importvg -Ry testvg vpath51
testvg
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/testlv
Never use the chown/chmod/chgrp commands to change these settings on the logical volume device files. It will appear to work; however, the updates are not written to the VGDA, and as soon as the volume group is exported and re-imported on the system, the updates will be gone:
# chlv -U root -G system -P 660 testlv
# ls -als /dev/*testlv
0 crw-rw---- 1 root system 57, 3 Mar 10 15:14 /dev/rtestlv
0 brw-rw---- 1 root system 57, 3 Mar 10 15:14 /dev/testlv
# chown user.staff /dev/testlv /dev/rtestlv
# chmod 777 /dev/testlv /dev/rtestlv
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/testlv
# readvgda vpath51 | egrep "lvname|dev_|Logical"
lvname:     testlv (i=2)
dev_uid:    0
dev_gid:    0
dev_perm:   360
Notice above how the chlv command changed the owner to root, the group to system, and the permissions to 660. Even after the chown and chmod commands are run, and the changes are visible on the device files in /dev, the changes are not reflected in the VGDA. This is confirmed when the volume group is exported and imported, even when using the -R option:
# varyoffvg testvg
# exportvg testvg
# importvg -Ry testvg vpath51
testvg
# ls -als /dev/*testlv
0 crw-rw---- 1 root system 57, 3 Mar 10 15:23 /dev/rtestlv
0 brw-rw---- 1 root system 57, 3 Mar 10 15:23 /dev/testlv
So, when you have customized user/group/mode settings for logical volumes, and you need to export and import the volume group, always make sure to use the -R option when running importvg. Also, make sure never to use the chmod/chown/chgrp commands on logical volume block and character devices in /dev, but use the chlv command instead, to make sure the VGDA is updated accordingly. Note: A regular volume group does not store any customized owner/group/mode in the VGDA. It is only stored for Big or Scalable volume groups. In case you're using a regular volume group with customized owner/group/mode settings for logical volumes, you will have to use the chmod/chown/chgrp commands to update it, especially after exporting and reimporting the volume group. TOPICS: AIX, BACKUP & RESTORE, LVM, PERFORMANCE, STORAGE,SYSTEM ADMINISTRATION
Using lvmstat
One of the best tools to look at LVM usage is with lvmstat. It can report the bytes read and written to logical volumes. Using that information, you can determine which logical volumes are used the most.
As you can see by the output here, it is not enabled, so you need to actually enable it for each volume group prior to running the tool using:
# lvmstat -v data2vg -e
The following command takes a snapshot of LVM information every second for 10 intervals:
# lvmstat -v data2vg 1 10
This view shows the most utilized logical volumes on your system since you started the data collection. This is very helpful when drilling down to the logical volume layer when tuning your systems.
# lvmstat -v data2vg
What are you looking at here?
iocnt: the number of read and write requests.
Kb_read: the total data (in kilobytes) read during the measured interval.
Kb_wrtn: the amount of data (in kilobytes) written during the measured interval.
Kbps: the amount of data transferred, in kilobytes per second.
You can use the -d option of lvmstat to disable the collection of LVM statistics. TOPICS: AIX, BACKUP & RESTORE, LVM, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
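To find the busiest logical volumes quickly, the lvmstat output can simply be sorted on the iocnt column. A sketch over invented sample rows (the LV names and numbers are made up; on a live system pipe the real "lvmstat -v data2vg" output in):

```shell
# Rank logical volumes by iocnt (column 2 of "lvmstat -v" output).
# The sample rows are invented; on a live system use:
#   lvmstat -v data2vg | sort -rnk2
sort -rnk2 <<'EOF'
data2lv01  1024   5120  2048  12.5
data2lv02  8192  40960 16384  99.1
data2lv03   512   2560  1024   6.2
EOF
```

The -rnk2 flags sort numerically (-n) and in reverse (-r) on the second field (-k2), so the heaviest-hit LV lands on top.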
# lspv -l vpath23
It is always a good idea to spread a logical volume over multiple disks. That way, the logical volume manager will spread the disk I/O over all the disks that are part of the logical volume, utilizing the queue_depth of every disk and greatly improving disk I/O performance. Let's say you have a logical volume called prodlv of 128 LPs, which is sitting on one disk, vpath408. To see the allocation of the LPs of logical volume prodlv, run:
# lslv -m prodlv
Let's also assume that you have a large number of disks in the volume group in which prodlv is configured. Disk I/O usually works best with a large number of disks in a volume group. For example, if you need 500 GB in a volume group, it is usually far better to assign 10 disks of 50 GB to the volume group than only one disk of 512 GB. That gives you the possibility of spreading the I/O over 10 disks instead of just one. To spread the disk I/O of prodlv over 8 disks instead of one, you can create an extra logical volume copy on these 8 disks, and later, when the logical volume is synchronized, remove the original copy (the one on the single disk vpath408). So, divide 128 LPs by 8, which gives you 16 LPs. You can assign 16 LPs of logical volume prodlv to each of the 8 disks, for a total of 128 LPs. First, check that the upper bound of the logical volume is set to at least 9, by running:
# lslv prodlv
The upper bound determines on how many disks a logical volume can reside. You'll need the 1 disk, vpath408, on which the logical volume is already located, plus the 8 other disks that you're creating a new copy on. Never create a copy on the same disk: if that single disk fails, both copies of your logical volume fail with it. It is usually a good idea to set the upper bound of the logical volume a lot higher, for example to 32:
# chlv -u 32 prodlv
Next, determine that you actually have 8 disks with at least 16 free LPs each in the volume group. You can do this by running:
# lsvg -p prodvg | sort -nk4 | grep -v vpath408 | tail -8
vpath188          active    959    40     00..00..00..00..40
vpath163          active    959    42     00..00..00..00..42
vpath208          active    959    96     00..00..96..00..00
vpath205          active    959    192    102..00..00..90..00
vpath194          active    959    240    00..00..00..48..192
vpath24           active    959    243    00..00..00..51..192
vpath304          active    959    340    00..89..152..99..00
vpath161          active    959    413    14..00..82..125..192
Note how in the command above the original disk, vpath408, was excluded from the list. Each of the disks listed should have at least 1/8th of the size of the logical volume free before you can make a logical volume copy on it for prodlv. Now create the logical volume copy. The magic option you need is "-e x" on the logical volume commands: it spreads the logical volume over all available disks. If you want the logical volume spread over only 8 specific disks, and not all the available disks in the volume group, specify those 8 disks:
# mklvcopy -e x prodlv 2 vpath188 vpath163 vpath208 \
  vpath205 vpath194 vpath24 vpath304 vpath161
Now check again with "lslv -m prodlv" if the new copy is correctly created:
# lslv -m prodlv | awk '{print $5}' | grep vpath | sort -dfu | \
while read pv ; do
   result=`lspv -l $pv | grep prodlv`
   echo "$pv $result"
done
vpath304 prodlv
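The spread can also be verified numerically by counting the LPs per disk in the "lslv -m" output (the fifth column is the PV of the second copy, as in the loop above). A sketch over invented sample lines; on a live system feed the real "lslv -m prodlv" output in:

```shell
# Count LPs per disk from "lslv -m" style output
# (columns: LP, PP1, PV1, PP2, PV2). Sample lines are invented;
# on a live system:
#   lslv -m prodlv | awk 'NR>1 {print $5}' | sort | uniq -c
awk 'NR>1 {print $5}' <<'EOF' | sort | uniq -c
LP    PP1  PV1           PP2  PV2
0001  0100 vpath408      0010 vpath188
0002  0101 vpath408      0011 vpath188
0003  0102 vpath408      0020 vpath163
EOF
```

With a perfectly spread 128 LP volume on 8 disks, every disk should show a count of 16.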
Now, what if you have to extend the logical volume prodlv later with another 128 LPs, and you still want to maintain the spreading of the LPs over the 8 disks? Again, use the "-e x" option with the logical volume commands:
# extendlv -e x prodlv 128 vpath188 vpath163 vpath208 \
  vpath205 vpath194 vpath24 vpath304 vpath161
You can also use the "-e x" option with the mklv command to create a new logical volume from the start with the correct spreading over disks. TOPICS: LINUX, LVM, STORAGE
By omitting the size argument, resize2fs defaults to using the available space in the partition/lv. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
for the corrective action to run. This can happen when you've added additional space to a logical volume/file system from the command line instead of using the smitty hacmp menu. But you certainly don't want to take down the entire HACMP cluster to solve this message. First of all, you don't need to: the cluster will fail over nicely anyway, even without these VGDAs being in sync. But it is still an annoying warning that you'd like to get rid of. Have a look at your shared logical volumes. Using the lsattr command, you can see if they are actually in sync or not:
host01 # lsattr -Z: -l testlv -a label -a copies -a size -a type -a strictness -Fvalue
/test:1:809:jfs2:y:
host02 # lsattr -Z: -l testlv -a label -a copies -a size -a type -a strictness -Fvalue
/test:1:806:jfs2:y:
Well, there you have it: one host reports testlv having a size of 806 LPs, the other says it's 809. Not good. You will run into this when you've used the extendlv and chfs commands to increase the size of a shared file system; you should have used the smitty menu. The good thing is, HACMP will sync the VGDAs if you do some kind of logical volume operation through the smitty hacmp menu. So, either increase the size of a shared logical volume through the smitty menu by just one LP (and, of course, also increase the size of the corresponding file system); or create an additional shared logical volume of just one LP through smitty, and remove it again afterwards. When you've done that, simply re-run the verification/synchronization, and you'll notice that the warning message is gone. Make sure you run the lsattr command again on your shared logical volumes on all the nodes in your cluster to confirm. TOPICS: AIX, LVM, ODM
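Comparing the lsattr output across nodes can be automated: capture the attribute values of each shared LV to a file on every node and diff the captures. A minimal sketch with two illustrative capture files (the hostnames, LV names, and sizes are invented; each line represents what a loop around the lsattr command above might write):

```shell
# Compare per-LV attribute dumps captured on two cluster nodes.
# host01.out / host02.out are illustrative stand-ins for files
# produced by looping the lsattr command over all shared LVs.
cat > host01.out <<'EOF'
testlv:/test:1:809:jfs2:y:
applv:/app:1:120:jfs2:y:
EOF
cat > host02.out <<'EOF'
testlv:/test:1:806:jfs2:y:
applv:/app:1:120:jfs2:y:
EOF
diff host01.out host02.out >/dev/null && echo "VGDAs in sync" \
    || echo "VGDAs differ - resync via smitty hacmp"
rm -f host01.out host02.out
```

Because the lsattr output is colon-separated and one line per LV, a plain diff is enough; the differing line also tells you which LV is out of sync.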
If that doesn't work, try this: (in the example below logical volume hd7 is used). Save the ODM information of the logical volume:
# odmget -q name=hd7 CuDv | tee -a /tmp/CuDv.hd7.out
# odmget -q name=hd7 CuAt | tee -a /tmp/CuAt.hd7.out
If you mess things up, you can always use the following command to restore the ODM information:
# odmadd /tmp/[filename]
Then, remove the device entry of the logical volume in the /dev directory (if present at all).
Sudosh
Sudosh is designed specifically to be used in conjunction with sudo, or by itself as a login shell. Sudosh allows the execution of a root or user shell with logging: every command the user types within the root shell is logged, as well as the output. This is different from "sudo -s" or "sudo /bin/sh", because when you use one of these instead of sudosh to start a new shell, the new shell does not log commands typed in it to syslog; only the fact that a new shell started is logged. If this newly started shell supports command-line history, you can still find the commands called in the shell in a file such as .sh_history, but if you use a shell such as csh that does not support command-line logging, you are out of luck. Sudosh fills this gap: no matter what shell you use, all of the command lines are logged to syslog (including vi keystrokes). In fact, sudosh uses the script command to log all keystrokes and output. Setting up sudosh is fairly easy. For a Linux system, first download the RPM of sudosh, for example from rpm.pbone.net. Then install it on your Linux server:
# rpm -ihv sudosh-1.8.2-1.2.el4.rf.i386.rpm
Preparing...                ########################################### [100%]
   1:sudosh                 ########################################### [100%]
Then go to /etc and open up /etc/sudosh.conf. Here you can adjust the default shell that is started, and the location of the log files. By default, the log directory is /var/log/sudosh. Make sure this directory exists on your server, or change it to another existing directory in the sudosh.conf file. This command will set the correct authorizations on the log directory:
# sudosh -i
[info]: chmod 0733 directory /var/log/sudosh
Then, if you want to assign a user sudosh access, edit the /etc/sudoers file by running visudo, and add the following line:
username ALL=PASSWD:/usr/bin/sudosh
Now, the user can login, and run the following command to gain root access:
$ sudo sudosh
Password:
# whoami
root
Now, as a sys admin, you can view the log files created in /var/log/sudosh, but it is much cooler to use the sudosh-replay command to replay (like a VCR) the actual session, as run by the user with the sudosh access.
First, run sudosh-replay without any parameters, to get a list of sessions that took place using sudosh:
# sudosh-replay
Date       Duration  From  To    ID
====       ========  ====  ==    ==
09/16/2010 6s        root  root  root-root-1284653707-GCw26NSq

Usage: sudosh-replay ID [MULTIPLIER] [MAXWAIT]
See 'sudosh-replay -h' for more help.
Example: sudosh-replay root-root-1284653707-GCw26NSq 1 2
Now, you can actually replay the session, by (for example) running:
# sudosh-replay root-root-1284653707-GCw26NSq 1 5
The first parameter is the session ID; the second is the multiplier. Use a higher multiplier value to speed up the replay ("1" is the actual speed). The third parameter is the max-wait: where there may have been pauses in the actual session, this parameter limits the wait to at most max-wait seconds, in the example above 5 seconds. For AIX, you can find the necessary RPM here. It is slightly different, because it installs into /opt/freeware/bin, and the sudosh.conf is also located in this directory. Both Linux and AIX of course require sudo to be installed before you can install and use sudosh. TOPICS: MONITORING, POWERHA / HACMP
Per cluster you can see: the cluster name, the cluster ID, HACMP version and the status of the cluster and all its nodes. It will also show you where any resource groups are active. You can download the script here. Untar the file. There is a readme in the package, that will tell you how you can configure the script. This script has been tested with HACMP version 4 and 5, up to version 5.5.0.5. TOPICS: AIX, MONITORING, SYSTEM ADMINISTRATION
Cec Monitor
To monitor all lpars within 1 frame, use:
# topas -C
HACMP auto-verification
HACMP automatically runs a verification every night, usually around mid-night. With a very simple command you can check the status of this verification run:
# tail -10 /var/hacmp/log/clutils.log 2>/dev/null|grep detected|tail -1
If this shows a return code of 0, the cluster verification ran without any errors. Anything else, you'll have to investigate. You can use this command on all your HACMP clusters, allowing you to verify your HACMP cluster status every day. With the following smitty menu you can change the time when the auto-verification runs, and whether it should produce debug output:
# smitty clautover.dialog
Be aware that if you change the runtime of the auto-verification, you have to synchronize the cluster afterwards to update the other nodes in the cluster. TOPICS: MONITORING, POWERHA / HACMP
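The daily check can be scripted by pulling the return code out of the "detected" line of clutils.log. A sketch over an invented sample line (the wording and field positions of the log line below are illustrative only; check the exact format in your own /var/hacmp/log/clutils.log before relying on the awk):

```shell
# Parse the verification result from a clutils.log style line.
# The sample line is invented for illustration; on a node use:
#   tail -10 /var/hacmp/log/clutils.log | grep detected | tail -1
line="CONTINUING: cluster verification detected 0 errors, returned 0"
rc=$(echo "$line" | awk '{print $NF}')
if [ "$rc" = "0" ]; then
    echo "verification OK"
else
    echo "verification FAILED (rc=$rc)"
fi
```

Looped over a list of cluster nodes via ssh, this gives a one-line daily health report per cluster.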
HACMP provides events, which can be used to most accurately monitor the cluster status, for example via the Tivoli Enterprise Console. Each change in the cluster status is the result of an HACMP event. Each HACMP event has an accompanying notify method that can be used to handle the kind of notification we want. Interesting cluster events to monitor are:
node_up
node_down
network_up
network_down
join_standby
fail_standby
swap_adapter
config_too_long
event_error
You can set the notify method via:
# smitty hacmp
  Cluster Configuration
    Cluster Resources
      Cluster Events
        Change/Show Cluster Events
Then it means that you have the bootpd daemon enabled on your server. There's nothing wrong with that; in fact, a NIM server requires you to have it enabled. However, these messages on the console can be annoying. There are systems on your network that are sending bootp requests (broadcasts). Your system is listening to these requests and trying to answer. It looks in the bootptab configuration (file /etc/bootptab) to see if their MAC addresses are defined; when they aren't, you get these messages. To solve this, either disable the bootpd daemon, or change the syslog configuration. If you don't need the bootpd daemon, edit the /etc/inetd.conf file and comment out the entry for bootps. Then run:
# refresh -s inetd
If you do have a requirement for bootpd, then update the /etc/syslog.conf file and look for the entry that starts with daemon.notice:
#daemon.notice /dev/console
daemon.notice /nsr/logs/messages
By commenting the daemon.notice entry to /dev/console, and instead adding an entry that logs to a file, you can avoid seeing these messages on the console. Now all you have to do is refresh the syslogd daemon:
# refresh -s syslogd
This entry tells the TSM client to run script /usr/local/bin/RunTSMReport, as soon as it has completed its scheduled command. Now all you need is a script that creates a report from the dsmsched.log file, the file that is written to by the TSM scheduler:
#!/bin/bash
TSMLOG=/tmp/dsmsched.log
WRKDIR=/tmp
echo "TSM Report from `hostname`" >> ${WRKDIR}/tsmc
tail -100 ${TSMLOG} > ${WRKDIR}/tsma
grep -n "Elapsed processing time:" ${WRKDIR}/tsma > ${WRKDIR}/tsmb
CT2=`cat ${WRKDIR}/tsmb | awk -F":" '{print $1}'`
((CT3 = $CT2 - 14))
((CT5 = $CT2 + 1))
CT4=1
while read Line1 ; do
   if [ ${CT3} -gt ${CT4} ] ; then
      ((CT4 = ${CT4} + 1))
   else
      echo "${Line1}" >> ${WRKDIR}/tsmc
      ((CT4 = ${CT4} + 1))
      if [ ${CT4} -gt ${CT5} ] ; then
         break
      fi
   fi
done < ${WRKDIR}/tsma
mail -s "`hostname` Backup" email@address.com < ${WRKDIR}/tsmc
rm ${WRKDIR}/tsma ${WRKDIR}/tsmb ${WRKDIR}/tsmc
HACMP 5.4: How to change SNMP community name from default "public" and keep clstat working
HACMP 5.4 supports changing the default community name from "public" to something else. SNMP is used for clstat/clinfoES communications. Using the "public" SNMP community name can be a security vulnerability, so changing it is advisable. First, find out what version of SNMP you are using:
# ls -l /usr/sbin/snmpd
lrwxrwxrwx 1 root system 9 Sep 08 2008 /usr/sbin/snmpd -> snmpdv3ne
(In this case, it is using version 3.) Make a copy of your configuration file; it is located in /etc:
/etc/snmpd.conf   <- Version 1
/etc/snmpdv3.conf <- Version 3
Edit the file and replace "public" wherever it is mentioned with your new community name. Make sure to use no more than 8 characters for the new community name. Then change the subsystems and restart them:
# chssys -s snmpmibd -a "-c new"
# chssys -s hostmibd -a "-c new"
# chssys -s aixmibd -a "-c new"
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
If the command hangs, something is wrong. Check the changes you made. If everything works fine, perform the same change in the other node and test again. Now you can test from one server to the other using the snmpinfo command above.
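The replace-"public"-everywhere edit itself can be scripted with sed. A minimal sketch over an illustrative fragment (the COMMUNITY line below is modeled on the snmpdv3.conf format but is not a complete file; always back up the real /etc/snmpdv3.conf before editing it in place):

```shell
# Replace the "public" community name in an snmpd-style config.
# The fragment is illustrative; the new name "newcomm" is 7
# characters, within the 8-character limit mentioned above.
cat > snmpdv3.conf.sample <<'EOF'
COMMUNITY public public noAuthNoPriv 0.0.0.0 0.0.0.0 -
DEFAULT_SECURITY no-access - -
EOF
sed 's/public/newcomm/g' snmpdv3.conf.sample
rm -f snmpdv3.conf.sample
```

Writing the sed output to a new file and reviewing the diff before moving it over the original is safer than editing in place.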
If you need to back out, put back the original configuration file and restart the subsystems. Note that in this case we use empty double quotes; there is no space between them:
# chssys -s snmpmibd -a ""
# chssys -s hostmibd -a ""
# chssys -s aixmibd -a ""
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
Okay, now make the change to clinfoES and restart it on both nodes:
# chssys -s clinfoES -a "-c new"
# stopsrc -s clinfoES
# startsrc -s clinfoES
Wait a few minutes and you should be able to use clstat again with the new community name. Disclaimer: If you have any other application other than clinfoES that uses snmpd with the default community name, you should make changes to it as well. Check with your application team or software vendor. TOPICS: AIX, MONITORING
Let's say you don't want any reports on errors with ID D1A1AE6F:
# errupdate [Enter]
=D1A1AE6F: [Enter]
Report=False [Enter]
[Ctrl-D]
[Ctrl-D]
With "Report=False", errors are still logged in your log file (usually /var/adm/ras/errlog). If you don't want them to be logged to the error log at all, for example when you have an errnotify method (which still triggers an action, also for error IDs with "Report=False"), you can change "Report=False" to "Log=False". More info on this subject can be found here. TOPICS: AIX, HARDWARE, MONITORING
Temperature monitoring
Older pSeries systems (Power4) are equipped with environmental sensors. You can read the sensor values using:
# /usr/lpp/diagnostics/bin/uesensor -l
You can use these sensors to monitor your systems and your computer rooms. It isn't very difficult to create a script to monitor these environmental sensors regularly and to display it on a webpage, updating it automatically. Newer systems (LPAR based) are not equipped with these environmental sensors. For PC systems several products exist, which attach to either a RJ45 or a parallel port and which can be used to monitor temperatures.
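Such a monitoring script mostly comes down to parsing the uesensor output and comparing a value against a threshold. A minimal sketch over a sample line (the "Value = ..." line is modeled loosely on uesensor output; the exact field layout differs per machine type, so verify it before relying on the field position):

```shell
# Alert when a thermal sensor value exceeds a threshold.
# The sample line is an invented stand-in for one line of
# "/usr/lpp/diagnostics/bin/uesensor -l" output.
threshold=35
sample="Value = 38 degrees C (100 degrees F)"
temp=$(echo "$sample" | awk '{print $3}')
if [ "$temp" -gt "$threshold" ]; then
    echo "WARNING: temperature ${temp}C exceeds ${threshold}C"
else
    echo "temperature ${temp}C is OK"
fi
```

Run from cron every few minutes, with the echo replaced by a mail command, this gives a basic computer-room temperature alarm.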
TOPICS: WEBSPHERE
WebSphere MQ links
A number of external links, related to WebSphere MQ: Official IBM sites: WebSphere MQ Family (formerly - MQSeries family) WebSphere Trial downloads Related links: MQ Software - Q Pasa! monitoring software MQ Solutions UK TOPICS: WEBSPHERE
WebSphere MQ introduction
WebSphere MQ, previously known as MQSeries, is a tool to transfer messages and data from one system to another; a means of program-to-program communication. Basically, one program puts a message in a queue and the other program reads from the queue. This can be synchronous or asynchronous. It's time independent: communicating applications do not have to be active at the same time. MQSeries runs on a variety of platforms. The MQSeries products enable programs to communicate with each other across a network of unlike components, such as processors, subsystems, operating systems and communication protocols. MQSeries programs use a consistent application program interface (API) across all platforms.
The figure shows the main parts of an MQSeries application at run time. Programs use MQSeries API calls, that is the Message Queue Interface (MQI), to communicate with a queue manager (MQM), the run-time program of MQSeries. For the queue manager to do its work, it refers to objects, such as queues and channels. The queue manager itself is an object as well. What is Messaging and Queuing? Message queuing is a method of program-to-program communication. Programs within an application communicate by writing and retrieving application-specific data (messages)
to/from queues, without having a private, dedicated, logical connection to link them. Messaging means that programs communicate with each other by sending data in messages and not by calling each other directly. Queuing means that programs communicate through queues. Programs communicating through queues need not be executed concurrently.
This figure shows how two programs, A and B, communicate with each other. We see two queues; one is the "output" queue for A and at the same time the "input" queue for B, while the second queue is used for replies flowing from B to A. The squares between the queues and the programs represent the Message Queuing Interface (API) the program uses to communicate with MQSeries' run-time program, the queue manager. As said before, the API is a simple multi-platform API consisting of 13 calls. About the queue manager: the heart of MQSeries is the message queue manager (MQM), MQSeries' run-time program. Its job is to manage queues and messages for applications. It provides the Message Queuing Interface (MQI) for communication with applications. Application programs invoke functions of the queue manager by issuing API calls. For example, the MQPUT API call puts a message on an MQSeries queue to be read by another program using the MQGET API call. This scenario is shown in the next figure.
A program may send messages to another program that runs in the same machine as the
queue manager (shown above), or to a program that runs in a remote system, such as a server or a host. The remote system has its own queue manager with its own queues. This scenario is shown in the next figure.
The queue manager transfers messages to other queue managers via channels using existing network facilities, such as TCP/IP, SNA or SPX. Multiple queue managers can reside in the same machine. They also need channels to communicate. Application programmers do not need to know where the program to which they are sending messages runs. They put their messages on a queue and let the queue manager worry about the destination machine and how to get the messages there. MQSeries knows what to do when the remote system is not available or the target program is not running or busy. For the queue manager to do its work, it refers to objects that are defined by an administrator, usually when the queue manager is created or when a new application is added. MQSeries for Windows provides graphical user interfaces; other platforms use the command line interface or panels. TOPICS: DB2, IBM CONTENT MANAGER, WEBSPHERE
This command shows you the (log writer processes of the) DB2 databases that are active. To show the DB2 instance list:
# su - db2inst1 -c "db2 list active databases"
# su - db2inst1 -c db2ilist
Check if the db2agents and db2sysc processes are active via ps -ef. If one of these is not functioning, the database will not be active. Check connectivity to the DB2 database:
# su - db2inst1
# db2 connect to [database-instance] user [user-name] using [password]
# db2 list tables
# db2 list applications
# db2 connect reset
WebSphere check: Start a browser to http://[server]:9090/admin and login with ID wasadmin or icmadmin. Start a command window, go to the WebSphere directory (WebSphere/AppServer/bin) and run:
# serverstatus -all
Individual WebSphere applications can be stopped with stopserver icmrm (if you would like to stop the Resource Manager) and startserver icmrm (server1 is the default WebSphere server). Check the Resource Manager Start a browser to http://[server]/icmrm/ICMResourceManager. If you get output about a NULL request, it's fine. Other checks: http://[server]/icmrm/snoop should output information of the Snoop Servlet. https://[server]/icmrm/ICMRMAdminServlet should show RM diagnostic data (You'll probably have to logon with the rmadmin userid). TOPICS: WEBSPHERE
WebSphere links
A number of external links, related to WebSphere Application Server: Official IBM sites: WebSphere IBM page WebSphere Application Server WAS - Support page WAS 5.1 - Info Center WAS 6.0 - Info Center WAS - Prerequisites Search for WebSphere on www.redbooks.ibm.com WebSphere Redbooks Domain WebSphere Developer Domain
Websphere Advisor Magazine Related links: Eclipse.org JBoss WebSphere.org Java2 Platform Enterprise Edition standard Java Runtime Environment IBM HTTP Server - support TOPICS: WEBSPHERE
Using port 9080 bypasses the Web Server plug-in and uses the embedded HTTP server of WebSphere Application Server. In a browser, run the snoop servlet:
http://localhost/snoop
This uses port 80, which will test the Web Server plug-in of the HTTP Server, and the communication from the Web Server to WebSphere Application Server. TOPICS: WEBSPHERE
WebSphere introduction
WebSphere Application Server (WAS) is a server for deploying and managing applications on the web. It is a deployment environment for Java based applications (it is basically an environment for running Java code). For example, the eClient of IBM Content Manager uses Java Server Pages (JSPs). JSPs contain HTML and embedded Java code that is compiled and run by WAS, similar to PHP. WebSphere Application Server is actually part (or the foundation) of a huge range of products, called the WebSphere family of products. WAS is built on the services of a web server to provide additional services to support business applications and transactions on the web. A common example of this is persistence support for user sessions that cannot be provided by only using an HTTP server. In general, WAS is able to facilitate a multi-tiered, web enabled environment that provides security, reliability, availability, scalability, flexibility and performance. WAS can, of course, serve static HTML and dynamic content. Releases available: WAS Express (Single Server): Designed to support only a single hardware server. For small companies or individuals.
WAS Base: intended for production environments. Its purpose: a standalone, single machine, which is not scalable (1 process on 1 machine). WAS Network Deployment (ND): offers specific high-end functionality. Essentially the same as the Base version, but this version is scalable: it can be spread over several systems to offer load-balancing capabilities.
WAS Enterprise: the same as the ND version, but with added features. Nowadays it is also called the Business Integration Foundation version. The admin console of WAS is also a WebSphere application, usually available through:
http://[server]:9090/admin
WAS 5 has a connection pooling feature, which can result in a significant reduction in response time, especially for database connections. It reduces the overhead of creating a new connection for each user and disconnecting it afterwards, by using existing connections from a connection pool. WAS communicates with databases via JDBC, which is actually the driver for a database.
Using iptrace
The iptrace command can be very useful to find out what network traffic flows to and from an AIX system. You can use any combination of these options, but you do not need to use them all:

-a : Do NOT print out ARP packets.
-s [source IP] : Limit trace to source/client IP address, if known.
-d [destination IP] : Limit trace to destination IP, if known.
-b : Capture bidirectional network traffic (send and receive packets).
-p [port] : Specify the port to be traced.
-i [interface] : Only trace network traffic on a specific interface.

Example: Run iptrace on AIX interface en1 to capture port 80 traffic to file trace.out from a single client IP to a server IP:
# iptrace -a -i en1 -s clientip -b -d serverip -p 80 trace.out
This trace will capture both directions of the port 80 traffic on interface en1 between the clientip and serverip, and writes it to the raw trace file trace.out. To stop the trace:
# ps -ef | grep iptrace
# kill [PID]
The ipreport command can be used to transform the trace file generated by iptrace to human readable format:
# ipreport trace.out > trace.report
IP alias
To configure IP aliases on AIX: Use the ifconfig command to create an IP alias. To have the alias created when the system starts, add the ifconfig command to the /etc/rc.net script. The following example creates an alias on the en1 network interface. The alias must be defined on the same subnet as the network interface.
# ifconfig en1 alias 9.37.207.29 netmask 255.255.255.0 up
You need to edit/create files as follows:

/etc/sysconfig/network-scripts/ifcfg-eth0 : First Ethernet card configuration file
/etc/sysconfig/network-scripts/ifcfg-eth1 : Second Ethernet card configuration file

To edit/create the first NIC file, type the following command:
# cd /etc/sysconfig/network-scripts
# vi ifcfg-eth0
Append/modify as follows:
# Intel Corporation 82573E Gigabit Ethernet Controller (Copper)
DEVICE=eth0
BOOTPROTO=static
DHCPCLASS=
HWADDR=00:30:48:56:A6:2E
IPADDR=10.251.17.204
NETMASK=255.255.255.0
ONBOOT=yes
Save and close the file. Define the default gateway (router IP) and hostname in /etc/sysconfig/network file:
# vi /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=host.domain.com
GATEWAY=10.251.17.1
Make sure you have the correct DNS server defined in the /etc/resolv.conf file. Try to ping the gateway, and other hosts on your network. Also check if you can resolve host names:
# nslookup host.domain.com
And verify if the NTP servers are correct in /etc/ntp.conf, and if you can connect to the time server, by running the ntpdate command against one of the NTP servers:
# ntpdate 10.20.30.40
This should synchronize system time with time server 10.20.30.40. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
If you have lsof installed, you can get the same result with the lsof command:
# lsof -i :[PORT]
Example:
# lsof -i :5710
COMMAND   PID      USER    FD    TYPE   DEVICE      SIZE/OFF   NODE   NAME
oracle    2638066  oracle  18u   IPv4   f1df487f8   0t0        TCP    host:5710
First you need to configure an HEA (Host Ethernet Adapter) port; not in promiscuous mode, because that is meant to be used if you wish to assign a physical port dedicated to an LPAR. After that, you need to assign an LHEA (Logical Host Ethernet Adapter) to an LPAR. The HEA needs to be configured, and the frame needs to be restarted, in order to function correctly (because of the setting of multi-core scaling on the HEA itself). So, to conclude: You can assign physical ports of the IVE adapter to separate LPARs (promiscuous mode). If you have an IVE with two ports, up to two LPARs can use these ports. But you can also configure it as an HEA and have up to 16 LPARs per physical port in a port group using the same interface (10Gb ports are recommended). There are different kinds of IVE adapters; some allow you to create more port groups and thus more network connectivity. The IVE is a method of virtualizing Ethernet without the need for VIOS. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
SCP Stalls
When you encounter an issue where ssh through a firewall works perfectly, but scp of large files (for example mksysb images) stalls, then there's a solution to this problem: add "-l 8192" to the scp command. The reason for scp to stall is that scp greedily grabs as much network bandwidth as possible when it transfers files; any delay caused by the network switch or the firewall can easily stall the TCP connection. Adding the option "-l 8192" limits the scp session bandwidth to 8192 Kbit/second, which seems to work safely and fast enough (up to 1 MB/second):
# scp -l 8192 SOURCE DESTINATION
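The bandwidth cap is easy to sanity-check: 8192 Kbit/second divided by 8 bits per byte gives the throughput in KB/second:

```shell
# 8192 Kbit/s over 8 bits per byte = throughput cap in KB/s
echo "$(( 8192 / 8 )) KB/s"   # prints "1024 KB/s"
```

That is 1024 KB/s, i.e. roughly the 1 MB/second mentioned above.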
# ifconfig enX down
# ifconfig enX detach
# chdev -l entX -a use_alt_addr=no
# chdev -l entX -a alt_addr=0x000000000000
CuAt:
        name = "inet0"
        attribute = "route"
        value = "net,-hopcount,0,,0,192.168.0.2"
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 0

CuAt:
        name = "inet0"
        attribute = "route"
        value = "net,-hopcount,0,,0,192.168.0.1"
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 0
If there are more than one, you need to remove the excess route:
# chdev -l inet0 -a delroute="net,-hopcount,0,,0,192.168.0.2"
Method error (/usr/lib/methods/chginet):
        0514-068 Cause not known.
0821-279 writing to routing socket: The process does not exist.
route: not in table or multiple matches
0821-207 chginet: Cannot add route record to CuAt.
will show you whether tcp_pmtu_discover and udp_pmtu_discover are enabled (1) or disabled (0). Disable them with:
# no -p -o tcp_pmtu_discover=0 # no -p -o udp_pmtu_discover=0
If these are disabled, you shouldn't see any ICMP messages any more. When one system tries to optimize its transmissions by discovering the path MTU, a pmtu entry is created in a Path MTU (PMTU) table. You can display this table using the pmtu display command. To avoid the accumulation of pmtu entries, unused pmtu entries will expire and be deleted when the pmtu_expire time (no -o pmtu_expire) is exceeded; by default after 10 minutes. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
This will transfer a file of 32K * 1024 = 32 MB. The transfer information will be shown by FTP. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
This command will permanently bring down the en0 interface (permanently meaning it stays down after a reboot). TOPICS: AIX, NETWORKING, POWERHA / HACMP
The lsattr command will show you the current default gateway route and the netstat command will show you the interface it is configured on. You can also check the ODM:
# odmget -q"attribute=route" CuAt
If you would now use the route command to specify the default gateway on a specific interface, like this:
# route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX
You will have a working entry for the default gateway. But the route command does not change anything in the ODM: as soon as your system reboots, the default gateway is gone again. Not a good idea. A better solution is to use the chdev command:
# chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]
This will set the default gateway to the first interface available. To specify the interface use:
# chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]
Substitute the correct interface for enX in the command above. If you previously used the route add command, and after that you use chdev to enter the default gateway, then this will fail. You have to delete it first by using route delete 0, and then give the chdev command. Afterwards, check if the new default gateway is properly configured:
# lsattr -El inet0
# odmget -q "attribute=route" CuAt
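If you want to script this check, the gateway address can be pulled out of the odmget output with awk. A small sketch with a sample stanza embedded (on a live system you would pipe the odmget command in instead):

```shell
# Extract the gateway IP: it is the last comma-separated field of the
# route attribute's value string
cat <<'EOF' | grep 'value =' | awk -F'"' '{print $2}' | awk -F, '{print $NF}'
CuAt:
        name = "inet0"
        attribute = "route"
        value = "net,-hopcount,0,,0,192.168.0.1"
EOF
```

This prints 192.168.0.1.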
And of course, try to ping the IP address of the default gateway and some outside address. Now reboot your system and check if the default gateway remains configured on the correct interface. And start up HACMP again! TOPICS: LINUX, NETWORKING
Entry "mode=1" simply means active/standby. Entry "miimon" specifies, in milliseconds, how often the MII link state is checked to determine whether a link is dead. (Change eth0 to match your primary device, if it is different. Blades sometimes have eth4 as the primary device.) In /etc/sysconfig/network-scripts create ifcfg-bond0 with the following (of course, change the network info to match your own):
DEVICE=bond0
BROADCAST=10.250.19.255
IPADDR=10.250.19.194
NETMASK=255.255.255.0
GATEWAY=10.250.19.1
ONBOOT=yes
BOOTPROTO=none
Change ifcfg-eth0 and ifcfg-eth1 (or whatever they are) to resemble this:
DEVICE=eth0
HWADDR=00:22:64:9B:54:9C
USERCTL=no
Leave the value of HWADDR to whatever it is in your file. This is important: it is this device's MAC address. Run /etc/init.d/network restart. You will want to do at least this part from the console, in case something goes wrong. Once you get your "OK" and the prompt comes back, do an ifconfig -a. You should see bond0. Make sure you can ping your default gateway. After that, all should be good. Note: When making backup copies of the ifcfg-* files, you must either move the backup files out of this directory or change your backup copy strategy for these files. The primary network script that reads these files basically runs: ls ifcfg-*. It then creates an interface based on the part after the dash ("-"). So if you run, for example:
# cp ifcfg-eth0 ifcfg-eth0.bak
You will end up with an alias device of eth0 called eth0.bak. Instead do this:
# cp ifcfg-eth0 bak.$(date +%Y%m%d).ifcfg-eth0
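You can see the difference in a scratch directory; this sketch (dummy file names, nothing system-specific) just demonstrates which names the ifcfg-* glob picks up:

```shell
# A suffix-style backup still matches ifcfg-*; a prefix-style one does not
dir=$(mktemp -d) && cd "$dir"
touch ifcfg-eth0
cp ifcfg-eth0 ifcfg-eth0.bak                      # bad: matches the glob
cp ifcfg-eth0 "bak.$(date +%Y%m%d).ifcfg-eth0"    # good: stays out of it
ls ifcfg-*
```

The listing shows ifcfg-eth0 and ifcfg-eth0.bak, but not the bak.* copy; the network script would try to bring up an "eth0.bak" interface from the suffix-style name.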
That foils the configuration script and allows you to keep backup/backout copies in the same directory as the working copies. TOPICS: HARDWARE, INSTALLATION, NETWORKING
The default addresses are different between POWER5 and POWER6 servers. With POWER5 we have the following addresses:
Port HMC1: 192.168.2.147/24
Port HMC2: 192.168.3.147/24
Link: System p Operations Guide for ASMI and for Nonpartitioned Systems. TOPICS: LINUX, NETWORKING
Replace the above IP address with your actual IP address. Save the file and exit to the shell prompt. Now open the configuration files for eth0 and eth1 in the same directory using the vi text editor and make sure the file reads as follows for the eth0 interface:
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
Repeat the same for the ifcfg-eth1 file, of course, set the DEVICE to eth1. Then, make sure that the following two lines are added to either /etc/modprobe.conf or /etc/modules.conf (see this page or also this page for more information):
alias bond0 bonding
options bond0 mode=1 miimon=100
Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:0c:29:c6:be:59

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:0c:29:c6:be:63
Nimadm
A very good article about migrating AIX from version 5.3 to 6.1 can be found on the following page of IBM developerWorks: http://www.ibm.com/developerworks/aix/library/au-migrate_nimadm/index.html?ca=drs For a smooth nimadm process, make sure that you clean up as many filesets on your server as possible (get rid of the things you no longer need). The more filesets that need to be migrated, the longer the process will take. Also make sure that openssl/openssh is up-to-date on the server to be migrated; the migration is likely to break when you have old versions installed. Very useful is also a gigabit Ethernet connection between the NIM server and the server to be upgraded, as the nimadm process copies the client rootvg to the NIM server and back. The log file for a nimadm process can be found on the NIM server in /var/adm/ras/alt_mig. TOPICS: AIX, INSTALLATION, NIM, SYSTEM ADMINISTRATION
No output is shown. The fileset is not part of the SPOT. Check if the LPP Source has the file set:
# nim -o showres LPPaix61tl05sp03 | grep -i bos.alt
  bos.alt_disk_install.boot_images   6.1.5.2   I   N usr
  bos.alt_disk_install.rte           6.1.5.1   I   N usr,root
Install the first fileset (bos.alt_disk_install.boot_images) in the SPOT. The other fileset is a prerequisite of the first fileset and will be automatically installed as well.
# nim -o cust -a filesets=bos.alt_disk_install.boot_images -a lpp_source=LPPaix61tl05sp03 SPOTaix61tl05sp03
Note: Use the -F option to force a fileset into the SPOT, if needed (e.g. when the SPOT is in use for a client). Check if the SPOT now has the fileset installed:
# nim -o showres SPOTaix61tl05sp03 | grep -i bos.alt
  bos.alt_disk_install.boot_images   6.1.5.2   C   F   Alternate Disk Installation
  bos.alt_disk_install.rte           6.1.5.1   C   F   Alternate Disk Installation
Set fields as follows: "Primary Network Interface for the NIM Master": selected interface "Input device for installation images": "cd0" "Remove all newly added NIM definitions...": "yes" Press Enter. Exit when complete. Initialize each NIM client:
# smitty nim_mkmac
Enter the host name of the appropriate LPAR. Set fields as follows: "Kernel to use for Network Boot": "mp" "Cable Type": "tp" Press Enter. Exit when complete.
A more extensive document about setting up NIM can be found here: http://www-01.ibm.com/support/docview.wss?context=SWG10&q1=setup+guide&uid=isg3T1010383 Another interesting document which covers NIM is this: NIM Nutshell TOPICS: AIX, BACKUP & RESTORE, NIM, SYSTEM ADMINISTRATION
# Remove unwanted entries from the inittab
rmitab hacmp 2>/dev/null
rmitab tsmsched 2>/dev/null
rmitab tsm 2>/dev/null
rmitab clinit 2>/dev/null
rmitab pst_clinit 2>/dev/null
rmitab qdaemon 2>/dev/null
rmitab sddsrv 2>/dev/null
rmitab nimclient 2>/dev/null
rmitab nimsh 2>/dev/null
rmitab naviagent 2>/dev/null
# copy inetd.conf
cp /etc/inetd.conf /etc/inetd.conf.org
# take out unwanted items
cat /etc/inetd.conf.org | grep -v bgssd > /etc/inetd.conf
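You can dry-run the same filter against a sample file first (the entries below are made up for illustration):

```shell
# Verify that grep -v only drops the bgssd line
cat > /tmp/inetd.conf.sample <<'EOF'
ftp     stream  tcp6    nowait  root    /usr/sbin/ftpd   ftpd
bgssd   stream  tcp     nowait  root    /usr/sbin/bgssd  bgssd
EOF
grep -v bgssd /tmp/inetd.conf.sample
```

Only the ftp line remains.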
The next thing you need to do, is to configure this script as a 'script resource' in NIM. Run:
# smitty nim_mkres
Select 'script' and complete the form afterwards. For example, if you called it 'UnConfig_Script':
# lsnim -l UnConfig_Script
UnConfig_Script:
   class       = resources
   type        = script
   comments    =
   Rstate      = ready for use
   prev_state  = unavailable for use
   location    = /export/nim/cust_scripts/custom.ksh
Then, when you are ready to perform the actual mksysb recovery using "smitty nim_bosinst", you can add this script resource on the following line:
Customization SCRIPT to run after installation [UnConfig_Script]
Using the image_data resource to restore a mksysb without preserving mirrors using NIM
Specify using the 'image_data' resource when running the 'bosinst' command from the NIM master: From command line on the NIM master:
Select the client to install. Select 'mksysb' as the type of install. Select a SPOT at the same level as the mksysb you are installing. Select an lpp_source at the same level as the mksysb you are installing. NOTE: It is recommended to use an lpp_source at the same AIX Technology Level, but if using an lpp_source at a higher level than the mksysb, the system will be updated to the level of the lpp_source during installation. This will only update Technology Levels. If you're using an AIX 5300-08 mksysb, you cannot use an AIX 6.1 lpp_source. This will not migrate the version of AIX you are running to a higher version. If you're using an AIX 5300-08 mksysb and allocate a 5300-09 lpp_source, this will update your target system to 5300-09.
Install the Base Operating System on Standalone Clients
Type or select values in entry fields. Press Enter AFTER making all desired changes.
  image_data to use during installation          [server1_image_data]
Creating an image_data resource without preserving mirrors for use with NIM
Transfer the /image.data file to the NIM master and store it in the location you desire. It is a good idea to place the file, or any NIM resource for that matter, in a descriptive manner, for example: /export/nim/image_data. This will ensure you can easily identify your "image_data" NIM resource file locations, should you have the need for multiple "image_data" resources. Make sure your image.data filenames are descriptive also. A common way to name the file would be in relation to your client name, for example: server1_image_data.
Run the nim command, or use smitty and the fast path 'nim_mkres' to define the file that you have edited using the steps above: From command line on the NIM master:
# nim -o define -t image_data -a server=master -a location=/export/nim/image_data/server1_image_data -a comments="image.data file with broken mirror for server1" server1_image_data
NOTE: "server1_image_data" is the name given to the 'image_data' resource. Using smit on the NIM master:
# smit nim_mkres
Select 'image_data' as the Resource Type. Then complete the following screen:
Define a Resource
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                                 [Entry Fields]
* Resource Name                                  [server1_image_data]
* Resource Type                                  image_data
* Server of Resource                             [master]
* Location of Resource                           [/export/nim/image_data/server1_image_data]
  Comments                                       []
Run the following command to make sure the 'image_data' resource was created:
# lsnim -t image_data
Run the following command to get information about the 'image_data' resource:
# lsnim -l server1_image_data
server1_image_data:
   class       = resources
   type        = image_data
   Rstate      = ready for use
   prev_state  = unavailable for use
   location    = /export/nim/image_data/server1_image_data
Edit the image.data file to break the mirror, by running the following command:
# vi /image.data
What you are looking for are the "lv_data" stanzas. There will be one for every logical volume associated with rootvg. The following is an example of an lv_data stanza from an image.data file of a mirrored rootvg. The lines that need changing are 'LV_SOURCE_DISK_LIST', 'COPIES' and 'PP':
lv_data:
        VOLUME_GROUP= rootvg
        LV_SOURCE_DISK_LIST= hdisk0 hdisk1
        LV_IDENTIFIER= 00cead4a00004c0000000117b1e92c90.2
        LOGICAL_VOLUME= hd6
        VG_STAT= active/complete
        TYPE= paging
        MAX_LPS= 512
        COPIES= 2
        LPs= 124
        STALE_PPs= 0
        INTER_POLICY= minimum
        INTRA_POLICY= middle
        MOUNT_POINT=
        MIRROR_WRITE_CONSISTENCY= off
        LV_SEPARATE_PV= yes
        PERMISSION= read/write
        LV_STATE= opened/syncd
        WRITE_VERIFY= off
        PP_SIZE= 128
        SCHED_POLICY= parallel
        PP= 248
        BB_POLICY= non-relocatable
        RELOCATABLE= yes
        UPPER_BOUND= 32
        LABEL=
        MAPFILE= /tmp/vgdata/rootvg/hd6.map
        LV_MIN_LPS= 124
Note: There are two disks in the 'LV_SOURCE_DISK_LIST', the 'COPIES' value reflects two copies, and the 'PP' value is double that of the 'LPs' value. The following is an example of the same lv_data stanza after manually breaking the mirror. The lines that have been changed are 'LV_SOURCE_DISK_LIST', 'COPIES' and 'PP'. Edit each 'lv_data' stanza in the image.data file as shown below to break the mirrors.
lv_data:
        VOLUME_GROUP= rootvg
        LV_SOURCE_DISK_LIST= hdisk0
        LV_IDENTIFIER= 00cead4a00004c0000000117b1e92c90.2
        LOGICAL_VOLUME= hd6
        VG_STAT= active/complete
        TYPE= paging
        MAX_LPS= 512
        COPIES= 1
        LPs= 124
        STALE_PPs= 0
        INTER_POLICY= minimum
        INTRA_POLICY= middle
        MOUNT_POINT=
        MIRROR_WRITE_CONSISTENCY= off
        LV_SEPARATE_PV= yes
        PERMISSION= read/write
        LV_STATE= opened/syncd
        WRITE_VERIFY= off
        PP_SIZE= 128
        SCHED_POLICY= parallel
        PP= 124
        BB_POLICY= non-relocatable
        RELOCATABLE= yes
        UPPER_BOUND= 32
        LABEL=
        MAPFILE= /tmp/vgdata/rootvg/hd6.map
        LV_MIN_LPS= 124
        STRIPE_WIDTH=
        STRIPE_SIZE=
Note: The 'LV_SOURCE_DISK_LIST' has been reduced to one disk, the 'COPIES' value has been changed to reflect one copy, and the 'PP' value has been changed so that it is equal to the 'LPs' value. Save the edited image.data file. At this point you can use the edited image.data file to do one of the following: You can now use your newly edited image.data file to create a new mksysb to file, tape, or DVD. E.g.: To file or tape: place the edited image.data file in the / (root) directory and rerun your mksysb command without using the "-i" flag. If running the backup through SMIT, make sure you set the option "Generate new /image.data file?" to 'no' (By default it is set to 'yes'). To DVD: Use the -i flag and specify the [/location] of the edited image.data file. If running through SMIT specify the edited image.data file location in the "User supplied image.data file" field. Within NIM you would create an 'image_data' resource for use with NIM to restore a mksysb without preserving mirrors. Note: If you don't want to edit the image.data file manually, here's a script that you can use to have it updated to a single disk for you, assuming your image_data file is called /image.data:
# Writes the updated stanzas to /image.data.new; review it, then replace
# /image.data with it. Note: LV_SOURCE_DISK_LIST still needs manual editing.
COPIESFLAG=0
cat /image.data | while read LINE ; do
    if [ "${LINE}" = "COPIES= 2" ] ; then
        COPIESFLAG=1
        echo "COPIES= 1"
    else
        if [ ${COPIESFLAG} -eq 1 ] ; then
            PP=`echo ${LINE} | awk '{print $1}'`
            if [ "${PP}" = "PP=" ] ; then
                PPNUM=`echo ${LINE} | awk '{print $2}'`
                ((PPNUMNEW=$PPNUM/2))
                echo "PP= ${PPNUMNEW}"
                COPIESFLAG=0
            else
                echo "${LINE}"
            fi
        else
            echo "${LINE}"
        fi
    fi
done > /image.data.new
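You can dry-run the halving logic on a stanza fragment before touching the real file. The values below are taken from the hd6 example; the loop mirrors the same COPIES/PP idea in a compact form:

```shell
# Dry-run: COPIES= 2 becomes COPIES= 1, and the following PP= value is halved
printf 'COPIES= 2\nLPs= 124\nSTALE_PPs= 0\nPP= 248\n' | while read LINE ; do
    if [ "${LINE}" = "COPIES= 2" ] ; then
        COPIESFLAG=1
        echo "COPIES= 1"
    elif [ "${COPIESFLAG:-0}" -eq 1 ] && [ "${LINE%% *}" = "PP=" ] ; then
        echo "PP= $(( ${LINE##* } / 2 ))"
        COPIESFLAG=0
    else
        echo "${LINE}"
    fi
done
```

The output shows COPIES= 1 and PP= 124, with the other lines untouched.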
Do a perform operation on the machine in NIM and set it to mksysb: Run smitty nim, Perform NIM Administration Tasks, Manage Machines, Perform Operations on Machines, select the machine, select bos_inst, set the Source for BOS Runtime Files to mksysb, set Remain NIM client after install to no, set Initiate Boot Operation on Client to no, set Accept new license agreements to yes.
Start up the system in SMS mode and boot from the NIM server, using a virtual terminal on the HMC. Select the disks to install to. Make sure that you set import user volume groups to "yes". Restore the system. By the way, another method to initiate a mksysb restore is by using:
# smitty nim_bosinst
Create a location where to store all of the AIX filesets on the server:
# mkdir /sw_depot/5300-10-02-0943-full
Repeat the above 5 steps for both DVD's. You'll end up with a folder of at least 4 GB. Delete the iso logical volume:
# rmfs -r /testiso # rmlv testiso
Check with:
# lsnim -l LPPaix53tl10sp2
A small note when you're using AIX 7 / AIX 6.1: Significant changes have been made in AIX 7 and AIX 6.1 that add new support for NIM. In particular there is now the capability to use the loopmount command to mount iso images into filesystems. As an example:
# loopmount -i aixv7-base.iso -m /aix -o "-V cdrfs -o ro"
The above mounts the AIX 7 base iso as a filesystem called /aix. You can now create an lpp_source or spot from the iso or you can simply read the files. TOPICS: AIX, BACKUP & RESTORE, NIM
Nimesis
If you're trying to restore an mksysb through NIM and constantly get the same error when trying to restore a mksysb on different systems:
0042-006 niminit (To-master) rcmd connection refused
This may be caused by the "nimesis" daemon not running on the NIM server. Make sure it's enabled in /etc/inittab on the NIM server:
# grep nim /etc/inittab nim:2:wait:/usr/bin/startsrc -g nim >/dev/console 2>&1
Date/Time:       Tue Oct
Sequence Number: 585
Machine Id:      0004D6EC4C00
Node Id:         hostname
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
My coffee is cold
Clear the error log again (because we logged a fake test-entry in the error report):
# errclear 0
Watch your email. You should receive the same error report entry in your email. By the way, you can delete this from the ODM like this:
# odmdelete -q 'en_name=mailgeorge' -o errnotify
More info here: http://www.blacksheepnetworks.com/security/resources/aix-errornotification.html. TOPICS: EMC, INSTALLATION, ODM, STORAGE, STORAGE AREA NETWORK
Check with lsvg -o (confirm that only rootvg is varied on). If no PowerPath, skip all steps with power names. 3. For a CLARiiON configuration, if the Navisphere Agent is running, stop it:
# /etc/rc.agent stop
Delete all hdisk devices: For Symmetrix devices, use this command:
# lsdev -Ct "SYMM*" -F name | xargs -n1 rmdev -dl
8. Confirm with lsdev -Cc disk that there are no EMC hdisks or hdiskpowers.
9. Remove all Fiber driver instances:
# rmdev -Rdl fscsiX
10. Verify through lsdev -Cc driver that there are no more fiber driver instances (fscsi).
11. Change the adapter instances to Defined state:
# rmdev -l fcsX
(X being adapter instance number, i.e. 0,1,2, etc.) 12. Create the hdisk entries for all EMC devices:
# emc_cfgmgr
or
# cfgmgr -vl fcsx
(x being each adapter instance which was rebuilt). Skip this part if no PowerPath. 13. Configure all EMC devices into PowerPath:
# powermt config
14.
If that doesn't work, try this: (in the example below logical volume hd7 is used). Save the ODM information of the logical volume:
# odmget -q name=hd7 CuDv | tee -a /tmp/CuDv.hd7.out # odmget -q name=hd7 CuAt | tee -a /tmp/CuAt.hd7.out
If you mess things up, you can always use the following command to restore the ODM information:
# odmadd /tmp/[filename]
Then, remove the device entry of the logical volume in the /dev directory (if present at all).
root@node2 /root # lspv | grep vpath | grep -i none
vpath4        00f69a11a2f620c5        None
vpath5        00f69a11a2f622c8        None
vpath6        00f69a11a2f624a7        None
vpath9        00f69a11a2f62f1f        None
vpath10       00f69a11a2f63212        None
As you can see, vpath6 on node 1 is the same disk as vpath4 on node 2. You can determine this by looking at the PVID. Check the major and minor numbers of each device:
root@node1 # cd /dev
root@node1 # lspv | grep vpath | grep None | awk '{print $1}' | xargs ls -als
0 brw-------   1 root   system   47,  6 Apr 28 18:56 vpath6
0 brw-------   1 root   system   47,  7 Apr 28 18:56 vpath7
0 brw-------   1 root   system   47,  8 Apr 28 18:56 vpath8
root@node2 # cd /dev
root@node2 # lspv | grep vpath | grep None | awk '{print $1}' | xargs ls -als
0 brw-------   1 root   system   47,  4 Apr 29 13:33 vpath4
0 brw-------   1 root   system   47,  5 Apr 29 13:33 vpath5
0 brw-------   1 root   system   47,  6 Apr 29 13:33 vpath6
0 brw-------   1 root   system   47,  9 Apr 29 13:33 vpath9
0 brw-------   1 root   system   47, 10 Apr 29 13:33 vpath10
Now, on each node set up a consistent naming convention for the OCR and VOTE devices. For example, if you wish to set up 2 OCR and 3 VOTE devices: On server node1:
# mknod /dev/ocr_disk01 c 47 6
# mknod /dev/ocr_disk02 c 47 7
# mknod /dev/voting_disk01 c 47 8
# mknod /dev/voting_disk02 c 47 13
# mknod /dev/voting_disk03 c 47 14
On server node2:
# mknod /dev/ocr_disk01 c 47 4
# mknod /dev/ocr_disk02 c 47 5
# mknod /dev/voting_disk01 c 47 6
# mknod /dev/voting_disk02 c 47 9
# mknod /dev/voting_disk03 c 47 10
This will result in a consistent naming convention for the OCR and VOTE devices on both nodes:
root@node1 # ls -als /dev/*_disk*
0 crw-r--r--   1 root   system   47,  6 May 13 07:18 /dev/ocr_disk01
0 crw-r--r--   1 root   system   47,  7 May 13 07:19 /dev/ocr_disk02
0 crw-r--r--   1 root   system   47,  8 May 13 07:19 /dev/voting_disk01
0 crw-r--r--   1 root   system   47, 13 May 13 07:19 /dev/voting_disk02
0 crw-r--r--   1 root   system   47, 14 May 13 07:19 /dev/voting_disk03
root@node2 # ls -als /dev/*_disk*
0 crw-r--r--   1 root   system   47,  4 May 13 07:20 /dev/ocr_disk01
0 crw-r--r--   1 root   system   47,  5 May 13 07:20 /dev/ocr_disk02
0 crw-r--r--   1 root   system   47,  6 May 13 07:21 /dev/voting_disk01
0 crw-r--r--   1 root   system   47,  9 May 13 07:21 /dev/voting_disk02
0 crw-r--r--   1 root   system   47, 10 May 13 07:21 /dev/voting_disk03
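Rather than reading the minor numbers off the screen and typing each mknod by hand, they can be lifted from the ls -als output with awk. A sketch (the ocr_disk numbering here is hypothetical; field 6 holds the major number with a trailing comma, field 7 the minor):

```shell
# Generate mknod commands from ls -als style lines (sample input embedded)
cat <<'EOF' | awk '{ gsub(",", "", $6); printf "mknod /dev/ocr_disk%02d c %s %s\n", NR, $6, $7 }'
0 brw------- 1 root system 47, 4 Apr 29 13:33 vpath4
0 brw------- 1 root system 47, 5 Apr 29 13:33 vpath5
EOF
```

This prints "mknod /dev/ocr_disk01 c 47 4" and "mknod /dev/ocr_disk02 c 47 5"; pipe the output through sh once you have verified it.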
Active system, HACMP lets the standby system take over the resources, start Oracle and thus resume operation. This takeover is done with a downtime period of approx. 5 to 15 minutes; however, the impact on the business applications is more severe. It can lead to interruptions up to one hour in duration.

Another way to achieve high availability of databases is to use a special version of the Oracle database software called Real Application Cluster, also called RAC. In a RAC cluster multiple systems (instances) are active (sharing the workload) and provide a near always-on database operation. The Oracle RAC software relies on IBM's HACMP software to achieve high availability for hardware and the operating system platform AIX. For storage it utilizes a concurrent file system called GPFS (General Parallel File System), a product of IBM. Oracle RAC 9 uses GPFS and HACMP. With RAC 10 you no longer need HACMP and GPFS.

HACMP is used for network down notifications. Put all network adapters of one node on a single switch and put every node on a different switch. HACMP only manages the public and private network service adapters. There are no standby, boot or management adapters in a RAC HACMP cluster. It just uses a single hostname; Oracle RAC and GPFS do not support hostname take-over or IPAT (IP Address Take-over). There are no disks, volume groups or resource groups defined in an HACMP RAC cluster. In fact, HACMP is only necessary for event handling for Oracle RAC. Name your HACMP RAC clusters in such a way that you can easily recognize the cluster as a RAC cluster, by using a naming convention that starts with RAC_.

On every GPFS node of an Oracle RAC cluster a GPFS daemon (mmfs) is active. These daemons need to communicate with each other. This is done via the public network, not via the private network.

Cache Fusion: Via SQL*Net an Oracle block is read in memory.
If a second node in an HACMP RAC cluster requests the same block, it will first check if it already has it stored locally in its own cache. If not, it will use a private dedicated network to ask if another node has the block in cache. If not, the block will be read from disk. This is called Cache Fusion or Oracle RAC interconnect. This is why on RAC HACMP clusters, each node uses an extra private network adapter to communicate with the other nodes, for Cache Fusion purposes only. All other communication, including the communication between the GPFS daemons on every node and the communication from Oracle clients, is done via the public network adapter. The throughput on the private network adapter can be twice as high as on the public network adapter.
Oracle RAC will use its own private network for Cache Fusion. If this network is not available, or if one node is unable to access the private network, then the private network is no longer used, but the public network will be used instead. If the private network returns to normal operation, then a fallback to the private network will occur. Oracle RAC uses cllsif of HACMP for this purpose.
Using lvmstat
One of the best tools to look at LVM usage is lvmstat. It can report the bytes read from and written to logical volumes. Using that information, you can determine which logical volumes are used the most. Gathering LVM statistics is not enabled by default:
# lvmstat -v data2vg
0516-1309 lvmstat: Statistics collection is not enabled for this logical device.
        Use -e option to enable.
As you can see by the output here, it is not enabled, so you need to actually enable it for each volume group prior to running the tool using:
# lvmstat -v data2vg -e
The following command takes a snapshot of LVM information every second for 10 intervals:
# lvmstat -v data2vg 1 10
This view shows the most utilized logical volumes on your system since you started the data collection. This is very helpful when drilling down to the logical volume layer when tuning your systems.
# lvmstat -v data2vg
What are you looking at here?
iocnt: Reports back the number of read and write requests.
Kb_read: Reports back the total data (in kilobytes) read during your measured interval.
Kb_wrtn: Reports back the amount of data (in kilobytes) written during your measured interval.
Kbps: Reports back the amount of data transferred in kilobytes per second.
You can use the -d option for lvmstat to disable the collection of LVM statistics. TOPICS: AIX, BACKUP & RESTORE, LVM, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
If you suspect that this might be the case, first try to determine which disks are saturated on the server. Any disk that is in use more than 60% all the time should be considered. You can use commands such as iostat, sar -d, nmon and topas to determine which disks show high utilization. If they do, check which logical volumes are defined on that disk, for example on an IBM SAN disk:
# lspv -l vpath23
It is always a good idea to spread the logical volumes on a disk over multiple disks. That way, the logical volume manager will spread the disk I/O over all the disks that are part of the logical volume, utilizing the queue_depth of all disks, greatly improving performance where disk I/O is concerned. Let's say you have a logical volume called prodlv of 128 LPs, which is sitting on one disk, vpath408. To see the allocation of the LPs of logical volume prodlv, run:
# lslv -m prodlv
Let's also assume that you have a large number of disks in the volume group in which prodlv is configured. Disk I/O usually works best if you have a large number of disks in a volume group. For example, if you need to have 500 GB in a volume group, it is usually a far better idea to assign 10 disks of 50 GB to the volume group, instead of only one disk of 512 GB. That gives you the possibility of spreading the I/O over 10 disks instead of only one. To spread the disk I/O of prodlv over 8 disks instead of just one disk, you can create an extra logical volume copy on these 8 disks, and then later on, when the logical volume is synchronized, remove the original logical volume copy (the one on the single disk vpath408). So, divide 128 LPs by 8, which gives you 16 LPs. You can assign 16 LPs for logical volume prodlv on each of the 8 disks, giving it a total of 128 LPs. First, check if the upper bound of the logical volume is set to at least 9. Check this by running:
# lslv prodlv
The upper bound limit determines on how many disks a logical volume can be created. You'll need the 1 disk, vpath408, on which the logical volume is already located, plus the 8 other disks that you're creating a new copy on. Never ever create a copy on the same disk: if that single disk fails, both copies of your logical volume will fail as well. It is usually a good idea to set the upper bound of the logical volume a lot higher, for example to 32:
# chlv -u 32 prodlv
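The arithmetic behind this layout can be sanity-checked with plain shell (128 LPs and 8 copy disks are the example values from the text above):

```shell
# Example values from the text: a 128 LP logical volume copied onto 8 disks.
total_lps=128
copy_disks=8

# LPs to place on each of the copy disks:
lps_per_disk=$((total_lps / copy_disks))

# Minimum upper bound: the original disk plus the 8 copy disks:
min_upper_bound=$((copy_disks + 1))

echo "LPs per disk: $lps_per_disk"
echo "minimum upper bound: $min_upper_bound"
```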
The next thing you need to determine is, that you actually have 8 disks with at least 16 free LPs in the volume group. You can do this by running:
# lsvg -p prodvg | sort -nk4 | grep -v vpath408 | tail -8
vpath188        active    959    40    00..00..00..00..40
vpath163        active    959    42    00..00..00..00..42
Note how in the command above the original disk, vpath408, was excluded from the list. Any of the disks listed, using the command above, should have at least 1/8th of the size of the logical volume free, before you can make a logical volume copy on it for prodlv. Now create the logical volume copy. The magical option you need to use is "-e x" for the logical volume commands. That will spread the logical volume over all available disks. If you want to make sure that the logical volume is spread over only 8 available disks, and not all the available disks in a volume group, make sure you specify the 8 available disks:
# mklvcopy -e x prodlv 2 vpath188 vpath163 vpath208 \ vpath205 vpath194 vpath24 vpath304 vpath161
Now check again with "lslv -m prodlv" if the new copy is correctly created:
# lslv -m prodlv | awk '{print $5}' | grep vpath | sort -dfu | \
while read pv ; do
   result=`lspv -l $pv | grep prodlv`
   echo "$pv $result"
done
vpath304 prodlv
Now, what if you have to extend the logical volume prodlv later on with another 128 LPs, and you still want to maintain the spreading of the LPs over the 8 disks? Again, you can use the "-e x" option when running the logical volume commands:
# extendlv -e x prodlv 128 vpath188 vpath163 vpath208 \ vpath205 vpath194 vpath24 vpath304 vpath161
You can also use the "-e x" option with the mklv command to create a new logical volume from the start with the correct spreading over disks. TOPICS: AIX, PERFORMANCE, SYSTEM ADMINISTRATION
node=`hostname`
rm -f /tmp/cpu_all.tmp /tmp/zzzz.tmp /tmp/${node}_nmon_cpu.csv
for nmon_file in `ls /var/msgs/nmon/*nmon`
do
   datestamp=`echo ${nmon_file} | cut -f2 -d"_"`
   grep CPU_ALL, $nmon_file > /tmp/cpu_all.tmp
   grep ZZZZ $nmon_file > /tmp/zzzz.tmp
   grep -v "CPU Total " /tmp/cpu_all.tmp | sed "s/,/ /g" | \
   while read NAME TS USER SYS WAIT IDLE rest
   do
      timestamp=`grep ${TS} /tmp/zzzz.tmp | awk -F"," '{print $4" "$3}'`
      TOTAL=`echo "scale=1;${USER}+${SYS}" | bc`
      echo $timestamp,$USER,$SYS,$WAIT,$IDLE,$TOTAL >> \
         /tmp/${node}_nmon_cpu.csv
   done
   rm -f /tmp/cpu_all.tmp /tmp/zzzz.tmp
done
Note: the script assumes that you've stored the NMON output files in /var/msgs/nmon. Update the script to point to the folder you're using to store NMON files. TOPICS: LINUX, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
This will create a 20 Megabyte sized RAM file system, mounted on /mnt/tmp. If you leave out the "-o size" option, half of the memory will be allocated by default. However, the memory will not be used as long as no data is written to the RAM file system. TOPICS: AIX, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
The system will assign the next available RAM disk. Since this is our first one, it will be assigned the name ramdisk0:
# ls -l /dev/ram*
brw-------   1 root     system    46,  0 Sep 22 08:01 /dev/ramdisk0
If there isn't sufficient available memory to create the RAM disk you have requested, the mkramdisk command will alert you. Free up some memory or create a smaller sized RAM disk. You can use Dynamic LPAR on the HMC or IVM to assign more memory to your partition. We could use the RAM disk /dev/ramdisk0 as a raw logical volume, but here we're going to create and mount a JFS2 file system. Here's how to create the file system using the RAM disk as its logical volume:
# mkfs -V jfs2 /dev/ramdisk0
Note: mounting a JFS2 file system with logging disabled (log=NULL) only works in AIX 6.1. On AIX 5.3, here are the steps to create the ramdisk:
# mkramdisk 4G
# mkfs -V jfs /dev/ramdisk0
# mkdir /ramdisk0
# mount -V jfs -o nointegrity /dev/ramdisk0 /ramdisk0
You should now be able to see the new file system using df and you can write to it as you would any other file system. When you're finished, unmount the file system and then remove the ramdisk using the rmramdisk command.
# rmramdisk ramdisk0
[Fragment of svmon -G output; only the pers (0), other (163295) and pgsp (12885) columns, and the 4 KB / 64 KB PageSize rows, survived extraction.]
In this example, the memory-virtual value is 2983249, and the memory-size value is 5079040. Note that the actual memory-inuse is nearly the same as the memory-size value. This is simply AIX caching as much as possible in its memory. Hence, the memory-free value is typically very low. Now, to determine the actual memory consumption, divide memory-virtual by memory-size:
# bc
scale=2
2982321/5079040
.58
Thus, the actual memory consumption is 58% of the memory (5079040 blocks of 4 KB = 19840 MB). The free memory is thus: (100% - 58%) * 19840 MB = 8332 MB. Try to keep the value of memory consumption less than 90%. Above that, you will generally start seeing paging activity using the vmstat command. By that time, it is a good idea to lower the load on the system or to get more memory in your system. TOPICS: AIX, PERFORMANCE, SYSTEM ADMINISTRATION
The yes command will continuously echo "yes" to /dev/null. This is a single-threaded process, so it will put load on a single processor. If you wish to put load on multiple processors, why not run yes a couple of times? TOPICS: AIX, PERFORMANCE, SYSTEM ADMINISTRATION
PerfPMR
When you suspect a performance problem, PerfPMR can be run. This is a tool generally used by IBM support personnel to resolve performance related issues. The download site for this tool is: ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr TOPICS: AIX, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
This will create a file consisting of 2097152 blocks of 1024 bytes, which is 2GB. You can change the count value to anything you like. Be aware of the fact that, if you wish to create files larger than 2GB, your file system
needs to be created as a "large file enabled file system", otherwise the upper file size limit is 2GB (under JFS; under JFS2 the upper limit is 64GB). Also check the ulimit values of the user-id you use to create the large file: set the file limit to -1, which is unlimited. Usually, the file limit is set by default to 2097151 in /etc/security/limits, which stands for 2097151 blocks of 512 bytes = 1GB. Another way to create a large file is:
# /usr/sbin/lmktemp ./test.large.file 2147483648
This will create a file of 2147483648 bytes (which is 1024 * 2097152 = 2GB). You can use this large file for adapter throughput testing purposes: Write large sequential I/O test:
# cd /BIG
# time /usr/sbin/lmktemp 2GBtestfile 2147483648
Divide 2048/#seconds for MB/sec write speed. Read large sequential I/O test:
# umount /BIG
Divide 2048/#seconds for MB/sec read speed. Tip: Run nmon (select a for adapter) in another window. You will see the throughput for each adapter. More information on JFS and JFS2 can be found here. TOPICS: AIX, BACKUP & RESTORE, PERFORMANCE
Using a pipeline
The next part describes a problem where you would want to do a search on a file system to find all directories in it, and to start a backup session per directory found, but not more than 20 backup sessions at once. Usually you would use the "find" command to find those directories, with the "-exec" parameter to execute the backup command. But in this case, it would result in possibly more than 20 active backup sessions at once, which might overload the system. So, you could create a script that does a "find" and dumps the output to a file first, and then starts reading that file, initiating 20 backups in parallel. But then the backup can't start before the "find" command completes, which may take quite a long time, especially if run on a file system with a large number of files. So how do you run the "find" command and the backups in parallel?
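The step the text implies here, but which was not preserved, is creating a named pipe and backgrounding the "find" that writes into it. A minimal sketch (the pipe name /tmp/pipe is taken from the reader script below; the use of mkfifo and the find target directory are assumptions for the example):

```shell
# Create the named pipe (FIFO) that connects writer and reader.
PIPE=/tmp/pipe
rm -f "$PIPE"
mkfifo "$PIPE"

# Writer: emit each directory into the pipe, in the background, so the
# reader can start working before the find completes.
find /tmp -type d -print > "$PIPE" &

# Reader: consume entries as they arrive; the real script would start a
# backup session per entry here.
count=0
while read entry
do
   count=$((count + 1))
done < "$PIPE"
wait
rm -f "$PIPE"
echo "found $count directories"
```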
So now you have a command which writes to the pipeline, but can't continue until some other process is reading from the pipeline. Create another script that reads from the pipe and issues the backup sessions:
cat /tmp/pipe | while read entry
do
   # Wait until less than 20 backup sessions are active
   while [ $(jobs -p | wc -l | awk '{print $1}') -ge 20 ]
   do
      sleep 5
   done

   # start backup session in the background
   [backup-command] &
   echo Started backup of $entry at `date`
done

# wait for all backup sessions to end
wait
echo `date`: Backup complete
This way, backup sessions are already being started while the "find" command is still executing, so you don't have to wait for the "find" command to complete before the first backup begins.
Replace the X with the fibre channel adapter number. You should get an output similar to the following:
max_xfer_size 0x100000 Maximum Transfer Size True
The value can be changed as follows, after which the server needs to be rebooted:
# chdev -l fcsX -a max_xfer_size=0x1000000 -P
This is caused when you have ESS driver filesets installed, but no ESS (type 2105) disks in use on the system. Check the type of disks by running:
# lsdev -Cc disk | grep 2105
If no type 2105 disks are found, you can uninstall any ESS driver filesets:
# installp -u ibm2105.rte ibmpfe.essutil.fibre.data ibmpfe.essutil.rte
TOPICS: AIX, EMC, INSTALLATION, POWERHA / HACMP, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
Use this procedure to quickly configure an HACMP cluster, consisting of 2 nodes and disk heartbeating. Prerequisites: Make sure you have the following in place: Have the IP addresses and host names of both nodes, and for a service IP label. Add these into the /etc/hosts files on both nodes of the new HACMP cluster. Make sure you have the HACMP software installed on both nodes. Just install all the filesets of the HACMP CD-ROM, and you should be good. Make sure you have this entry in /etc/inittab (as one of the last entries):
clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit
In case you're using EMC SAN storage, make sure you configure your disks correctly as hdiskpower devices. Or, if you're using a mksysb image, you may want to follow the EMC ODM cleanup procedure. Steps:
Enter a cluster name and select the nodes you're going to use. It is vital here to have the hostnames and IP address correctly entered in the /etc/hosts file of both nodes. Create an IP service label:
# smitty hacmp Initialization and Standard Configuration Configure Resources to Make Highly Available Configure Service IP Labels/Addresses Add a Service IP Label/Address
Enter an IP Label/Address (press F4 to select one), and enter a Network name (again, press F4 to select one). Set up a resource group:
# smitty hacmp Initialization and Standard Configuration Configure HACMP Resource Groups Add a Resource Group
Enter the name of the resource group. It's a good habit to make sure that a resource group name ends with "rg", so you can recognize it as a resource group. Also, select the participating nodes. For the "Fallback Policy", it is a good idea to change it to "Never Fallback". This way, when the primary node in the cluster comes up, and the resource
group is up-and-running on the secondary node, you won't see a failover occur from the secondary to the primary node. Note: The order of the nodes is determined by the order you select the nodes here. If you put in "node01 node02" here, then "node01" is the primary node. If you want to have this any other way, now is a good time to correctly enter the order of node priority. Add the Service IP/Label to the resource group:
# smitty hacmp Initialization and Standard Configuration Configure HACMP Resource Groups Change/Show Resources for a Resource Group (standard)
Select the resource group you've created earlier, and add the Service IP/Label. Run a verification/synchronization:
# smitty hacmp Extended Configuration Extended Verification and Synchronization
Just hit [ENTER] here. Resolve any issues that may come up from this synchronization attempt. Repeat this process until the verification/synchronization process returns "Ok". It's a good idea here to select to "Automatically correct errors". Start the HACMP cluster:
# smitty hacmp System Management (C-SPOC) Manage HACMP Services Start Cluster Services
Select both nodes to start. Make sure to also start the Cluster Information Daemon. Check the status of the cluster:
# clstat -o # cldump
Wait until the cluster is stable and both nodes are up. Basically, the cluster is now up-and-running. However, during the Verification & Synchronization step, it will complain about not having a non-IP network. The next part is for setting up a disk heartbeat network, that will allow the nodes of the HACMP cluster to exchange disk heartbeat packets over a SAN disk. We're assuming here, you're using EMC storage. The process on other types of SAN storage is more or less similar, except for some differences, e.g. SAN disks on EMC storage are called "hdiskpower" devices, and they're called "vpath" devices on IBM SAN storage. First, look at the available SAN disk devices on your nodes, and select a small disk, that won't be used to store any data on, but only for the purpose of doing the disk heartbeat. It is a good
habit, to request your SAN storage admin to zone a small LUN as a disk heartbeating device to both nodes of the HACMP cluster. Make a note of the PVID of this disk device, for example, if you choose to use device hdiskpower4:
# lspv | grep hdiskpower4 hdiskpower4 000a807f6b9cc8e5 None
So, we're going to set up the disk heartbeat network on device hdiskpower4, with PVID 000a807f6b9cc8e5: Create a concurrent volume group:
# smitty hacmp System Management (C-SPOC) HACMP Concurrent Logical Volume Management Concurrent Volume Groups Create a Concurrent Volume Group
Select both nodes to create the concurrent volume group on by pressing F7 for each node. Then select the correct PVID. Give the new volume group a name, for example "hbvg". Set up the disk heartbeat network:
# smitty hacmp Extended Configuration Extended Topology Configuration Configure HACMP Networks Add a Network to the HACMP Cluster
Select "diskhb" and accept the default Network Name. Run a discovery:
# smitty hacmp Extended Configuration Discover HACMP-related Information from Configured Nodes
On each node, select the same disk device by pressing F7.
Run a Verification & Synchronization again, as described earlier above. Then check with clstat and/or cldump again, to check if the disk heartbeat network comes online. TOPICS: AIX, EMC, STORAGE, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
The fix is to uninstall/reinstall Powerpath, but you won't be able to until you remove the hdiskpower devices with this procedure:
# odmdelete -q name=hdiskpowerX -o CuDv
You must remove the modified files installed by PowerPath and then reboot the server. You will then be able to uninstall PowerPath after the reboot via the "installp -u EMCpower" command. The files to be removed are as follows (do not be concerned if some of the removals do not work, as PowerPath may not be fully configured properly):
# rm ./etc/PowerPathExtensions
# rm ./etc/emcp_registration
# rm ./usr/lib/boot/protoext/disk.proto.ext.scsi.pseudo.power
# rm ./usr/lib/drivers/pnext
# rm ./usr/lib/drivers/powerdd
# rm ./usr/lib/drivers/powerdiskdd
# rm ./usr/lib/libpn.a
# rm ./usr/lib/methods/cfgpower
# rm ./usr/lib/methods/cfgpowerdisk
# rm ./usr/lib/methods/chgpowerdisk
# rm ./usr/lib/methods/power.cat
# rm ./usr/lib/methods/ucfgpower
# rm ./usr/lib/methods/ucfgpowerdisk
# rm ./usr/lib/nls/msg/en_US/power.cat
# rm ./usr/sbin/powercf
# rm ./usr/sbin/powerprotect
# rm ./usr/sbin/pprootdev
# rm ./usr/lib/drivers/cgext
# rm ./usr/lib/drivers/mpcext
# rm ./usr/lib/libcg.so
# rm ./usr/lib/libcong.so
# rm ./usr/lib/libemcp_mp_rtl.so
# rm ./usr/lib/drivers/mpext
# rm ./usr/lib/libmp.a
# rm ./usr/sbin/emcpreg
# rm ./usr/sbin/powermt
# rm ./usr/share/man/man1/emcpreg.1
# rm ./usr/share/man/man1/powermt.1
# rm ./usr/share/man/man1/powerprotect.1
Re-install Powerpath.
TOPICS: AIX, EMC, POWERHA / HACMP, STORAGE, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
To resolve this, you will have to make sure that the SCSI reset disk method is configured in HACMP. For example, when using EMC storage: Make sure emcpowerreset is present in /usr/lpp/EMC/Symmetrix/bin/emcpowerreset. Then add new custom disk method: Enter into the SMIT fastpath for HACMP "smitty hacmp". Select Extended Configuration. Select Extended Resource Configuration. Select HACMP Extended Resources Configuration. Select Configure Custom Disk Methods. Select Add Custom Disk Methods.
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                                 [Entry Fields]
* Disk Type (PdDvLn field from CuDv)             disk/pseudo/power
* New Disk Type                                  [disk/pseudo/power]
* Method to identify ghost disks                 [SCSI3]
* Method to determine if a reserve is held       [SCSI_TUR]
* Method to break reserve                        [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset]
  Break reserves in parallel                     true
* Method to make the disk available              [MKDEV]
PVID trouble
To add a PVID to a disk, enter:
# chdev -l vpathxx -a pv=yes
=========== EMC PowerPath Registration ===========
Do you have a new registration key or keys to enter? [n] y
Enter the registration key(s) for your product(s), one per line,
pressing Enter after each key. After typing all keys, press Enter again.
Key (Enter if done): P6BV-4KDB-QET6-RF9A-QV9D-MN3V
1 key(s) successfully added.
Key successfully installed.
(Note: the license key used in this example is not valid). TOPICS: INSTALLATION, SDD, STORAGE AREA NETWORK
Make sure to have a recent mksysb image of the server, and before starting the updates to the rootvg, do an incremental TSM backup. It is also a good idea to prepare the alt_disk_install on the second boot disk.
For HACMP nodes: check the cluster status and log files to make sure the cluster is stable and ready for the upgrades. Update fileset devices.fcp.disk.ibm to the latest level using smitty update_all. For ESS environments: update the host attachment scripts ibm2105 and ibmpfe.essutil to the latest available levels using smitty update_all.
Enter the lspv command to find out all the SDD volume groups. Enter the lsvgfs command for each SDD volume group to find out which file systems are mounted, e.g.:
# lsvgfs vg_name
Enter the umount command to unmount all file systems belonging to the SDD volume groups. Enter the varyoffvg command to vary off the volume groups. If you are upgrading to an SDD version earlier than 1.6.0.0, or if you are upgrading to SDD 1.6.0.0 or later and your host is in an HACMP environment with nonconcurrent volume groups that are varied on on another host (that is, reserved by another host), run the vp2hd volume_group_name script to convert the volume group from the SDD vpath devices to supported storage hdisk devices. Otherwise, skip this step.
Use the smitty command to uninstall the SDD. Enter smitty deinstall and press Enter. The uninstallation process begins. Complete the uninstallation process. If you need to upgrade the AIX operating system, you could perform the upgrade now. If required, reboot the system after the operating system upgrade. Use the smitty command to install the newer version of the SDD. Note: it is also possible to do smitty update_all to simply update the SDD fileset, without first uninstalling it; but IBM recommends doing an uninstall first, then patch the OS, and then do an install of the SDD fileset.
Use the smitty device command to configure all the SDD vpath devices to the Available state. Enter the lsvpcfg command to verify the SDD configuration. If you are upgrading to an SDD version earlier than 1.6.0.0, run the hd2vp volume_group_name script for each SDD volume group to convert the physical volumes from supported storage hdisk devices back to the SDD vpath devices.
Enter the varyonvg command for each volume group that was previously varied offline. Enter the lspv command to verify that all physical volumes of the SDD volume groups are SDD vpath devices. Check for any errors:
# errpt | more # lppchk -v # errclear 0
Enter the mount command to mount all file systems that were unmounted. Attention: If the physical volumes of an SDD volume group are mixed hdisk devices and SDD vpath devices, you must run the dpovgfix utility to fix this problem. Otherwise, SDD will not function properly:
# dpovgfix vg_name
EMC Grab
EMC Grab is a utility that is run locally on each host, and gathers storage-specific information (driver version, storage-technical details, etc.). The EMC Grab report creates a zip file, which can be used by EMC support. You can download the "Grab Utility" from the following location: ftp://ftp.emc.com/pub/emcgrab/Unix/ When you've downloaded EMCgrab and stored it in a temporary location on the server, like /tmp/emc, untar it using:
Then run:
# /tmp/emc/emcgrab/emcgrab.sh
The script is interactive and finishes after a couple of minutes. TOPICS: EMC, STORAGE, STORAGE AREA NETWORK
Let's say we have an AIX system with four HBAs configured in the following order:
# lscfg -v | grep fcs
fcs2 (wwn 71ca) -> no devices configured behind this fscsi2 driver instance (path only configured in CuPath ODM table)
fcs3 (wwn 71cb) -> no devices configured behind this fscsi3 driver instance (path only configured in CuPath ODM table)
fcs0 (wwn 71e4) -> no devices configured behind this fscsi0 driver instance (path only configured in CuPath ODM table)
fcs1 (wwn 71e5) -> ALL devices configured behind this fscsi1 driver instance
Looking at the MPIO path configuration, here is what we have for the rootvg disk:
# lspath -l hdisk2 -H -F"name parent path_id connection status"
name parent path_id connection status
The fscsi1 driver instance is the second path (path_id 1), so remove the 3 other paths, keeping only the path corresponding to fscsi1:
# rmpath -l hdisk2 -p fscsi0 -d
# rmpath -l hdisk2 -p fscsi2 -d
# rmpath -l hdisk2 -p fscsi3 -d
# lspath -l hdisk2 -H -F"name parent path_id connection status"
Afterwards, do a savebase to update the boot logical volume hd5. Set the bootlist to hdisk2 and reboot the host. It will come up successfully, with no more hang at LED 554. When checking the status of the rootvg disk, a new hdisk10 has been configured with the correct ODM definitions, as shown below:
# lspv
hdisk10         0003027f7f7ca7e2           rootvg          active
# lsdev -Cc disk
hdisk2  Defined  00-09-01  MPIO Other FC SCSI Disk Drive
To summarize: it is recommended to set up ONLY ONE path when installing AIX to a SAN disk, then install the EMC ODM package, reboot the host, and only after that is complete, add the other paths. By doing that, we ensure that the fscsiX driver instance used for the boot process has the hdisk configured behind it. TOPICS: HARDWARE, SDD, STORAGE, STORAGE AREA NETWORK
(that is, if adapter "1" is failing, replace it with the correct adapter number). If the adapter is still in a "degraded" status, open a call with IBM. They most likely require you to take a snap from the system, and send the snap file to IBM for them to analyze and they will conclude if the adapter needs to be replaced or not. Involve the SAN storage team if the adapter needs to be replaced. They will have to update the WWN of the failing adapter when the adapter is replaced for a new one with a new WWN. If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA adapter. Note the new WWN and supply that to the SAN storage team. Remove the adapter:
# datapath remove adapter 1
(replace the "1" with the correct adapter that is failing). Check if the vpaths now all have one less path:
# datapath query device | more
De-configure the adapter (this will also de-configure all the child devices, so you won't have to do this manually), by running: diag, choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes", and "KEEP definition in database" to "no". Hit ENTER.
Replace the adapter: Run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful, you won't see the adapter name here, but only "Unknown", because the device was unconfigured).
Have the IBM CE replace the adapter. Close any events on the failing adapter on the HMC.
Validate that the notification LED is now off on the system. If not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and Disable the attention LED.
(replace this with the actual adapter name). If required, update the adapter firmware microcode. Validate that the adapter is still functioning correctly by running:
# errpt # lsdev -Cc adapter
(replace this with the correct adapter name). Add the paths to the device:
# addpaths
If you notice that there are "dead" paths, then these are the commands to run in order to set these paths back to "alive" again, of course AFTER ensuring that any SAN related issues are resolved. To have PowerPath scan all devices and mark any dead devices as alive, if it finds that a device is in fact capable of doing I/O commands, run:
# powermt restore
Check with lsvg -o that only rootvg is varied on. If PowerPath is not installed, skip all steps involving power device names. For a CLARiiON configuration, if the Navisphere Agent is running, stop it:
# /etc/rc.agent stop
Delete all hdisk devices: For Symmetrix devices, use this command:
# lsdev -CtSYMM* -Fname | xargs -n1 rmdev -dl
Confirm with lsdev -Cc disk that there are no EMC hdisks or hdiskpowers. Remove all Fiber driver instances:
# rmdev -Rdl fscsiX
(X being the driver instance number, i.e. 0, 1, 2, etc.) Verify through lsdev -Cc driver that there are no more fiber driver instances (fscsi). Then change the adapter instances to the Defined state:
# rmdev -l fcsX
or
# cfgmgr -vl fcsx
(x being each adapter instance which was rebuilt). Skip this part if no PowerPath. 13. Configure all EMC devices into PowerPath:
# powermt config
Vpath commands
Check the relation between vpaths and hdisks:
# lsvpcfg
Reservation bit
If you wish to get rid of the SCSI disk reservation bit on SCSI, SSA and VPATH devices, there are two ways of achieving this: Firstly, HACMP comes along with some binaries that do this job:
# /usr/es/sbin/cluster/utilities/cl_SCSIdiskreset /dev/vpathx
Secondly, there is a little (not official) IBM binary tool called "lquerypr". This command is part of the SDD driver fileset. It can also release the persistent reservation bit and clear all reservations. First check if you have any reservations on the vpath:
# lquerypr -vh /dev/vpathx
Clear it as follows:
# lquerypr -ch /dev/vpathx
If you'd like to see more information about lquerypr, simply run lquerypr without any options, and it will display extensive usage information. For SDD, you should be able to use the following command to clear the persistent reservation:
# lquerypr -V -v -c /dev/vpathXX
Emulex hbanyware
If you have Emulex HBAs and the hbanyware software installed, for example on Linux, then you can use the following commands to retrieve information about the HBAs. To run a GUI version:
# /usr/sbin/hbanyware/hbanyware
SAN introduction
SAN storage places the physical disk outside a computer system, connected to a Storage Area Network (SAN). In a Storage Area Network, storage is offered to many systems, including AIX systems. This is done via logical blocks of disk space (LUNs). In the case of an AIX system, every SAN disk is seen as a separate hdisk, with the advantage of easily expanding the AIX system with new SAN disks, avoiding buying and installing new physical hard disks.
Other advantages of SAN: Disk storage is no longer limited to the space in the computer system itself or the amount of available disk slots.
After the initial investment in the SAN network and storage, the costs of storage per gigabyte are less than disk space within the computer systems. Using two different SAN networks (fabrics), you can avoid having disruptions in your storage, the same as mirroring your data on separate disks. The two SAN fabrics should not be connected to each other.
Using two separate, geographically dispersed storage systems (e.g. ESS), a disruption in a computer center will not cause your computer systems to go down. When you place two SAN network adapters (called Host Bus Adapters, or HBAs, on Fibre Channel) in every computer system, you can connect your AIX system to two different fabrics, thus increasing the availability of the storage. You'll also be able to load balance the disk storage over these two host bus adapters. You'll need multipath I/O software (e.g. SDD or PowerPath) for this to work.
By using 2 HBAs, a defect in a single HBA will not cause downtime. AIX systems are able to boot from SAN disks.
System-wide separated shell history files for each user and session
Here's how you can set up your /etc/profile and /etc/environment, in order to create a separate shell history file for each user and each login session. This is very useful when you need to know exactly who ran a specific command at a point in time. Put this in /etc/profile on all servers:
# HISTFILE
# execute only if interactive
if [ -t 0 -a "${SHELL}" != "/bin/bsh" ]
then
   d=`date "+%H%M.%m%d%y"`
   t=`tty | cut -c6-`
   u=$(ps -fp $(proctree $PPID | grep "\-ksh" | grep -v grep | \
      awk '{print $1}' | head -1) | tail -1 | awk '{print $1}')
   w=`who -ms | awk '{print $NF}' | sed "s/(//g" | sed "s/)//g"`
   y=`tty | cut -c6- | sed "s/\//-/g"`
   mkdir $HOME/.history.$LOGIN 2>/dev/null
   export HISTFILE=$HOME/.history.$LOGIN/.sh_history.$LOGIN.$u.$w.$y.$d
   find $HOME/.history.$LOGIN/.s* -type f -ctime +91 -exec rm {} \; \
      2>/dev/null
   H=`uname -n`
   mywhoami=`whoami`
   if [ ${mywhoami} = "root" ] ; then
      PS1='${USER}@(${H}) ${PWD##/*/} # '
   else
      PS1='${USER}@(${H}) ${PWD##/*/} $ '
   fi
fi
# Time out after 60 minutes
# Use readonly if you don't want users to be able to change it.
# readonly TMOUT=3600
TMOUT=3600
export TMOUT
This way, *every* user on the system will have a separate shell history in the .history directory of their home directory. Each shell history file name shows you which account was used to login, which account was switched to, on which tty this happened, and at what date and time this happened. Shell history files are also timestamped internally (you can run "fc -t" to show the shell history timestamped), and old shell history files are cleaned up after 3 months. Plus, user accounts will log out automatically after 60 minutes (3600 seconds) of inactivity. You can avoid running into a time-out by simply typing "read" or "\" followed by ENTER on the command line, or by adding "TMOUT=0" to a user's .profile, which essentially disables the time-out for that particular user.

One issue that you may now run into is that, because a separate history file is created for each login session, it becomes difficult to run "fc -t": the fc command will only list the commands from the current session, and not those written to a different history file. To overcome this issue, set the HISTFILE variable to the file you want to run "fc -t" for:
# export HISTFILE=.sh_history.root.user.10.190.41.116.pts-4.1706.120210
Then, to list all the commands for this history file, make sure you start a new shell and run the "fc -t" command:
# ksh
# fc -t -10
This will list the last 10 commands for that history file. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
any hassle; and the sys admin will be happy not having to resolve login issues manually anymore. The script:
#!/usr/bin/ksh
if [ ! -z "${myid}" ] ; then
   # Unlock account
   printf "Unlocking account for ${user}..."
   chuser account_locked=false ${user}
   echo " Done."

   # Remove password history
   printf "Removing password history for ${user}..."
   d=`lssec -f /etc/security/user -s default -a histsize | cut -f2 -d=`
   chuser histsize=0 ${user}
   chuser histsize=${d} ${user}
   echo " Done."

   # Reset failed login count
   printf "Reset failed login count for ${user}..."
   chuser unsuccessful_login_count=0 ${user}
   echo " Done."

   # Reset expiration date
   printf "Reset expiration date for ${user}..."
   chuser expires=0 ${user}
   echo " Done."

   # Allow the user to login
   printf "Enable login for ${user}..."
   chuser login=true ${user}
   echo " Done."

   # Allow the user to login remotely
   printf "Enable remote login for ${user}..."
   chuser rlogin=true ${user}
   echo " Done."

   # Reset maxage
   printf "Reset the maxage for ${user}..."
   m=`lssec -f /etc/security/user -s default -a maxage | cut -f2 -d=`
   chuser maxage=${m} ${user}
   echo " Done."

   # Clear password change requirement
   printf "Clear password change requirement for ${user}..."
   pwdadm -c ${user}
   echo " Done."

   # Reset password last update
   printf "Reset the password last update for ${user}..."
   let sinceepoch=`perl -e 'printf(time)' | awk '{print $1}'`
   n=`lssec -f /etc/security/user -s default -a minage | cut -f2 -d=`
   let myminsecs="${n}*7*24*60*60"
   let myminsecs="${myminsecs}+1000"
   let newdate="${sinceepoch}-${myminsecs}"
   chsec -f /etc/security/passwd -s ${user} -a lastupdate=${newdate}
   echo " Done."
fi
}
unset user
unset myid
myid=`id ${user} 2>/dev/null`
if [ ! -z "${myid}" ] ; then
   echo "Fixing account ${user}..."
   fixit ${user}
mkpasswd
An interesting open source project is Expect. It's a tool that can be used to automate interactive applications. The RPM for Expect can be downloaded from http://www.perzl.org/aix/index.php?n=Main.Expect, and the home page for Expect is http://www.nist.gov/el/msid/expect.cfm. A very interesting tool that is part of the Expect RPM is "mkpasswd". It is a little Tcl script that uses Expect to work with the passwd program to generate a random password and set it immediately. A somewhat adjusted version of "mkpasswd" can be downloaded here. The adjusted version of mkpasswd will generate a random password for a user, with a length of 8 characters (the default maximum password length for AIX), if you run for example:
# /usr/local/bin/mkpasswd username
sXRk1wd3
To see the interactive work performed by Expect for mkpasswd, use the -v option:
# /usr/local/bin/mkpasswd -v username
spawn /bin/passwd username
Changing password for "username"
username's New password:
Enter the new password again:
password for username is s8qh1qWZ
By using mkpasswd, you'll never have to come up with a random password yourself again, and it will prevent Unix system admins from assigning new passwords to accounts that are easily guessable, such as "changeme" or "abc1234". Now, what if you would want to allow "other" (non-root) users to run this utility, and at the same time prevent them from resetting the password of user root? Let's say you want user pete to be able to reset other users' passwords. Add the following entries to the /etc/sudoers file by running visudo:
# visudo
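An entry like the following gives pete passwordless access to the mkpasswd utility, while explicitly excluding a reset of root's password (pete and the path to mkpasswd are the examples used in this article):

pete ALL=NOPASSWD: /usr/local/bin/mkpasswd, !/usr/local/bin/mkpasswd root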
This will allow pete to run the /usr/local/bin/mkpasswd utility, which he can use to reset passwords. First, to check what he can run, use the "sudo -l" command:
# su - pete
$ sudo -l
User pete may run the following commands on this host:
    (ALL) NOPASSWD: /usr/local/bin/mkpasswd, !/usr/local/bin/mkpasswd root
Then, an attempt, using pete's account, to reset another user's password (which is successful):
$ sudo /usr/local/bin/mkpasswd mark
oe09'ySMj
When you copy the /etc/passwd and /etc/group files, make sure they contain at least a minimum set of essential user and group definitions. Listed specifically as users are the following: root, daemon, bin, sys, adm, uucp, guest, nobody, lpd
Listed specifically as groups are the following: system, staff, bin, sys, adm, uucp, mail, security, cron, printq, audit, ecs, nobody, usr If the bos.compat.links fileset is installed, you can copy the /etc/security/mkuser.defaults file over. If it is not installed, the file is located as mkuser.default in the /usr/lib/security directory. If you copy over mkuser.defaults, changes must be made to the stanzas. Replace group with pgrp, and program with shell. A proper stanza should look like the following:
user:
        pgrp = staff
        groups = staff
        shell = /usr/bin/ksh
        home = /home/$USER
The following files may also be copied over, as long as the AIX version in the new machine is the same: /etc/security/login.cfg /etc/security/user NOTE: If you decide to copy these two files, open the /etc/security/user file and make sure that variables such as tty, registry, auth1 and so forth are set properly with the new machine. Otherwise, do not copy these two files, and just add all the user stanzas to the new created files in the new machine. Once the files are moved over, execute the following:
# usrck -t ALL
# pwdck -t ALL
# grpck -t ALL
This will clear up any discrepancies (such as uucp not having an entry in /etc/security/passwd). Ideally this should be run on the source system before copying over the files as well as after porting these files to the new system. NOTE: It is possible to find user ID conflicts when migrating users from older versions of AIX to newer versions. AIX has added new user IDs in different release cycles. These are reserved IDs and should not be deleted. If your old user IDs conflict with the newer AIX system user IDs, it is advised that you assign new user IDs to these older IDs. From: http://www-01.ibm.com/support/docview.wss?uid=isg3T1000231 TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
server.domain.com: Host key verification failed.
dsh: 2617-009 server.domain.com remote shell had exit code 255
The difference between the two connections is that dsh uses the FQDN, and the FQDN needs to be added to the known_hosts file for SSH. Therefore you must first make an ssh connection to the host using the FQDN:
# ssh server.domain.com date
The authenticity of host server.domain.com can't be established.
RSA key fingerprint is 1b:b1:89:c0:63:d5:f1:f1:41:fa:38:14:d8:60:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added server.domain.com (RSA) to the list of known hosts.
Tue Sep 6 11:56:34 EDT 2011
Now try to use dsh again, and you'll see it will work:
# dsh -n server date
server.domain.com: Tue Sep 6 11:56:38 EDT 2011
In general, file /etc/ftpusers lists the accounts that are *not* allowed to FTP to a server. So, if this file exists, make sure the account is not listed in it. Here's an example of what you would set in ftpaccess.ctl if you wanted user ftp to be restricted to /home/ftp. The user will be able to change directory further down, but not outside this directory. Also, when user ftp logs in and runs pwd, it will show only "/" and not "/home/ftp".
# cat /etc/ftpaccess.ctl
useronly: ftp
If the user is required to write files to the server with specific access, for example, read and write access for user, group and others, then this can be accomplished by the user itself by running the FTP command:
ftp> site umask 111
200 UMASK set to 111 (was 027)
ftp> site umask
200 Current UMASK is 111
To further restrict the FTP account to a server, especially for accounts that are only used for FTP purposes, make sure to disable login and remote login for the account via smitty user. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
Actually, this command not only removes the password history, but also changes the histsize setting for the account to zero, meaning that the user is never again checked for re-use of old passwords. After running the command above, you may want to set it back to the default value:
# grep -p ^default /etc/security/user | grep histsize
        histsize = 20
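Setting it back for a particular account can then be done with chuser; assuming the default of 20 shown above, and using "username" as a placeholder:

# chuser histsize=20 username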
Sudosh
Sudosh is designed specifically to be used in conjunction with sudo, or by itself as a login shell. Sudosh allows the execution of a root or user shell with logging. Every command the user types within the root shell is logged, as well as the output. This is different from "sudo -s" or "sudo /bin/sh", because when you use one of these instead of sudosh to start a new shell, this new shell does not log commands typed in it to syslog; only the fact that a new shell started is logged. If the newly started shell supports command-line history, you can still find the commands called in the shell in a file such as .sh_history, but if you use a shell such as csh that does not support command-line logging, you are out of luck. Sudosh fills this gap. No matter what shell you use, all of the command lines are logged to syslog (including vi keystrokes). In fact, sudosh uses the script command to log all keystrokes and output.
Setting up sudosh is fairly easy. For a Linux system, first download the RPM of sudosh, for example from rpm.pbone.net. Then install it on your Linux server:
# rpm -ihv sudosh-1.8.2-1.2.el4.rf.i386.rpm
Preparing...   ########################################### [100%]
   1:sudosh    ########################################### [100%]
Then, go to /etc and open up /etc/sudosh.conf. Here you can adjust the default shell that is started, and the location of the log files. By default, the log directory is /var/log/sudosh. Make sure this directory exists on your server, or change it to another existing directory in the sudosh.conf file. This command will set the correct authorizations on the log directory:
# sudosh -i
[info]: chmod 0733 directory /var/log/sudosh
Then, if you want to assign a user sudosh access, edit the /etc/sudoers file by running visudo, and add the following line:
username ALL=PASSWD:/usr/bin/sudosh
Now, the user can login, and run the following command to gain root access:
$ sudo sudosh
Password:
# whoami
root
Now, as a sys admin, you can view the log files created in /var/log/sudosh, but it is much cooler to use the sudosh-replay command to replay (like a VCR) the actual session, as run by the user with sudosh access. First, run sudosh-replay without any parameters, to get a list of sessions that took place using sudosh:
# sudosh-replay
Date       Duration From  To    ID
====       ======== ====  ==    ==
09/16/2010 6s       root  root  root-root-1284653707-GCw26NSq

Usage: sudosh-replay ID [MULTIPLIER] [MAXWAIT]
See 'sudosh-replay -h' for more help.
Example: sudosh-replay root-root-1284653707-GCw26NSq 1 2
Now, you can actually replay the session, by (for example) running:
# sudosh-replay root-root-1284653707-GCw26NSq 1 5
The first parameter is the session ID, and the second parameter is the multiplier. Use a higher multiplier value to speed up the replay; "1" is the actual speed. The third parameter is the max-wait. Where there may have been wait times in the actual session, this parameter limits the wait to a maximum of max-wait seconds; in the example above, 5 seconds.
For AIX, you can find the necessary RPM here. It is slightly different, because it installs in /opt/freeware/bin, and the sudosh.conf file is also located in this directory. Both Linux and AIX of course require sudo to be installed, before you can install and use sudosh. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
SUID
Always watch out for files with the SUID bit set, especially if these are files that are not on the AIX system by default. Before any vendor or application team installs additional software on the AIX system, it may be worthwhile to run the following command, to discover any files with the SUID bit set:
# find / \( -perm -2000 -o -perm -4000 \) -type f -ls
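To make the before/after comparison described next easier, you can capture the output in sorted lists and compare them; the file names in /tmp below are arbitrary examples:

# find / \( -perm -2000 -o -perm -4000 \) -type f | sort > /tmp/suid.before
(... vendor installs the software ...)
# find / \( -perm -2000 -o -perm -4000 \) -type f | sort > /tmp/suid.after
# comm -13 /tmp/suid.before /tmp/suid.after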
Save the output of this command for later reference. Once the vendor or application team is done installing their application and/or database software, run the same command again, to discover if any newly created files exist, especially those that are owned by user root and have the SUID bit set. Such files allow other users to run the command as if they were root. The SUID bit can only be set on binary executables on AIX (starting with release 3.2.5 of AIX). Other UNIX operating systems, such as Fedora, may allow scripts to run with the SUID bit set. On AIX you are allowed to set the SUID bit on a script, but AIX simply ignores it and runs the script as the user who started the script, not as the account that owns the script, because that would be a huge security hole. However, it is still very easy to write a C program that does the trick. The following example is a program called "sushi". The source code of the program, sushi.c, looks like this:
#include <stdio.h>
#include <time.h>

#define LOG_FILE "/tmp/sushilog"    /* log of sushi usage */

char *getlogin();

main(argc, argv)
int argc;
char **argv;
{
   char buf[1024], *p=buf;
   int i;
   time_t t;
   FILE *log;
   char msg[BUFSIZ], *ct, *name;

   /* concatenate all arguments into one command line */
   *p='\0';
   for (i=1; i<argc; ++i) {
      strcpy(p, argv[i]);
      p += strlen(argv[i]);
      if (i < argc-1) {
         *p = ' ';
         ++p;
         *p = '\0';
      }
   }

   name = getlogin();
   time(&t);
   ct = ctime(&t);

   setuid(0);

   log = fopen(LOG_FILE, "a");
   if (!log)
      printf("Couldn't open log file!\n");
   else {
      sprintf(msg, "SUSHI: %s %s %s\n", name, buf, ct);
      fputs(msg, log);
      fclose(log);
      system(buf);
   }
}
The makefile looks like this (and makes sure the SUID bit is set when running "make install"):

################################################
# Make rules
################################################

all: sushi

clean:
	rm -f *.o sushi

install:
	cp -p sushi /bin
	chown root /bin/sushi
	chmod a+rx /bin/sushi
	chmod u+s /bin/sushi

################################################
Now, if this file is compiled as user root, a program called /bin/sushi will exist; it will be owned by user root, and the SUID bit will be set:
# ls -als /bin/sushi
8 -rwsr-xr-x 1 root root 6215 Sep 9 09:21 /bin/sushi
The sushi program basically takes everything entered as a parameter on the command line, and runs it. So if the file is owned by user root, it will run the parameter as user root. For example, if you would want to open a Korn shell as a regular user, and get root access:
$ /bin/sushi ksh
# whoami
root
This is something that you want to avoid. Even vendors are known to build backdoors like these into their software. The find command shown at the beginning of this article will help you discover commands such as these. Note that the good thing about the sushi program shown above is that it will write an entry into log file /tmp/sushilog each time someone uses the command. To avoid users being able to run commands with the SUID bit set, you may want to add the "nosuid" option in /etc/filesystems for each file system:
/exports/install:
        dev      = "/exports/install"
        vfs      = nfs
        nodename = fileserver.company.com
        mount    = true
        options  = ro,bg,hard,intr,nodev,nosuid,sec=sys
        account  = false
Especially for (permanently) NFS mounted file systems, it is a VERY good idea to have this nosuid option set, preventing someone from creating a sushi-like program on an NFS server and running it as a regular user on the NFS client system to gain root access on the NFS client. If you want to mount an NFS share on a client temporarily, enable nosuid by running:
# mount -o nosuid server:/filesystem /mountpoint
NOTE: commands recorded in your history before setting this environment variable will show a question mark in the timestamp field. If you use the fc command, you will have to use the "-t" option to see the timestamp:
# fc -t
Portmir
A very nice command to use when you either want to show someone remotely how to do something on AIX, or to allow a non-root user to have root access, is portmir. First of all, you need 2 users logged into the system, you and someone else. Ask the other person to run the tty command in his/her telnet session and to tell you the result. For example:
user$ tty /dev/pts/1
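Then, as root, start mirroring that session; assuming the tty reported above, this would be:

# portmir -t /dev/pts/1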
(Of course, fill in the correct number for your system; it won't be /dev/pts/1 all the time everywhere!) Now every command on screen 1 is repeated on screen 2, and vice versa. You can both run commands on one screen. You can stop it by running:
# portmir -o
If you're the root user and the other person temporarily requires root access to do something (and you can't solve it by giving the other user sudo access, hint, hint!), then you can su - to root in the portmir session, allowing the other person to have root access, while you can see what he/she is doing. You may run into issues when you resize a screen, or if you use different types of terminals. Make sure you both have the same $TERM setting, e.g. xterm. If you resize the screen and the other doesn't, you may need to run the tset and/or resize commands.
You will have to enter a password to get into your HMC. To allow your root user direct access to the HMC without the need of logging in, you'll have to update the authorized_keys2 file in the .ssh subdirectory of the home directory of your HMC user. There's a problem: a regular user only gets a restricted shell on an HMC and therefore is unable to edit the authorized_keys2 file in subdirectory .ssh. In HMC version 3 it is possible to disable the restricted shell for users by editing file /opt/hsc/data/ssh/hmcsshrc. In HMC version 4 and up you no longer get root access (except, you may get it by contacting IBM), so you can no longer edit this file. But there's another way to accomplish it. Let's say your HMC user ID is hmcuser and you were able to log on to the HMC called hmcsystem using this ID and a password (as described above). First, get a valid authorized_keys2 file, that allows root at your current host access to the HMC. Place this file in /tmp. Then, use scp to copy the authorized_keys2 file to the HMC:
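Using the file and host names from this example, the scp command would look something like this:

# scp /tmp/authorized_keys2 hmcuser@hmcsystem:.ssh/authorized_keys2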
[Enter your hmcuser's password, when required]

Now, just test if it works:
# ssh hmcuser@hmcsystem date
You should now be able to access the system without entering a password. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
On each server type ssh someserver.example.com and make a connection with your regular password. This will create a .ssh dir in your home directory with the proper permissions. On your primary server where you want your secret keys to live (let's say serverA), type:
# ssh-keygen -t dsa
This will prompt you for a secret passphrase. If this is your primary identity key, use an empty passphrase (which is not secure, but the easiest to work with). If this works right, you will get two files called id_dsa and id_dsa.pub in your .ssh dir. Copy the id_dsa.pub file to the other host's .ssh dir with the name authorized_keys2:
# scp ~/.ssh/id_dsa.pub serverB:.ssh/authorized_keys2
Now serverB is ready to accept your ssh key. For a test, type:
# ssh serverB
This should let you in without typing a password or passphrase. Hooray! You can ssh and scp all you want and not have to type any password or passphrase. TOPICS: LINUX, SECURITY
</>hpiLO->
The next thing you need to do is type "VSP", hit ENTER and login to the server:
hpiLO-> VSP

Starting virtual serial port
Press 'ESC (' to return to the CLI Session

</>hpiLO-> Virtual Serial Port active: IO=0x02F8 INT=3

[ENTER]

Red Hat Enterprise Linux ES release 4 (Nahant Update 8)
To make this magic happen, we need to spawn a getty on /dev/ttyS1. You might see something like this in /etc/inittab:
mo1::off:/sbin/mgetty -x 0 -D -s38400 -a /dev/ttyS1
The mgetty will not work; it expects a modem. Comment it out (it is off anyway). Add this line:
ilo:2345:respawn:/sbin/agetty ttyS1 115200 vt100
Then reread the /etc/inittab and spawn any missing processes, like the new getty:
# kill -HUP 1
Now you should be able to ssh to the server's ILO IP address, login as ilo-admin, run VSP, and get a login prompt. TOPICS: LINUX, SECURITY
This will create the SSH tunnel, open up ports 23, 443, 17988 and 3389 through host "jumpserver" to IP address 10.250.21.38. Of course, adjust the name of the jumpserver and the IP address of the ILO to your configuration. Now open up an Internet Explorer on a Windows PC (which is capable of accessing the Linux client "desktop"), and point your browser to https://desktop. You should see the login screen of the ILO. When you're done working on the ILO, simply type CTRL-C in the system console on "desktop".
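For reference, a tunnel like the one described can be set up with SSH local port forwards; this is a sketch using the host name and ILO IP address from this example:

# ssh -L 23:10.250.21.38:23 -L 443:10.250.21.38:443 \
     -L 17988:10.250.21.38:17988 -L 3389:10.250.21.38:3389 jumpserver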
Likewise, you can do something similar to this, if you don't have a Linux client (like "desktop" in the example above) to work with, by using PuTTY on a Windows PC. In this case, configure PuTTY to set up a SSH tunnel to the "jumpserver" and forward the same ports. Then open up Internet Explorer, and point your browser to https://localhost, which should then open up the ILO login screen. TOPICS: MONITORING, POWERHA / HACMP, SECURITY
HACMP 5.4: How to change SNMP community name from default "public" and keep clstat working
HACMP 5.4 supports changing the default community name from "public" to something else. SNMP is used for clstatES communications. Using the "public" SNMP community name can be a security vulnerability, so changing it is advisable. First, find out what version of SNMP you are using:
# ls -l /usr/sbin/snmpd
lrwxrwxrwx 1 root system 9 Sep 08 2008 /usr/sbin/snmpd -> snmpdv3ne
(In this case, it is using version 3.) Make a copy of your configuration file. It is located in /etc:

/etc/snmpd.conf    <- Version 1
/etc/snmpdv3.conf  <- Version 3
Edit the file and replace every mention of public with your new community name. Make sure to use no more than 8 characters for the new community name. Change the subsystems and restart them:
# chssys -s snmpmibd -a "-c new"
# chssys -s hostmibd -a "-c new"
# chssys -s aixmibd -a "-c new"
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
If the command hangs, something is wrong. Check the changes you made.
If everything works fine, perform the same change on the other node and test again. Now you can test from one server to the other using the snmpinfo command above. If you need to back out, replace the configuration file with the original and restart the subsystems. Note that in this case we use double quotes with no space in between:
# chssys -s snmpmibd -a ""
# chssys -s hostmibd -a ""
# chssys -s aixmibd -a ""
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
Okay, now make the change to clinfoES and restart and both nodes:
# chssys -s clinfoES -a "-c new" # stopsrc -s clinfoES # startsrc -s clinfoES
Wait a few minutes and you should be able to use clstat again with the new community name. Disclaimer: If you have any other application other than clinfoES that uses snmpd with the default community name, you should make changes to it as well. Check with your application team or software vendor. TOPICS: SECURITY, X11
The program xeyes should open on your window. Now, how do you get around opening an X window if you have to go through a jumpserver first to get to the correct UNIX server, where you would like to start an X-based program? That's not too difficult either. After logging in on the UNIX jumpserver, following the procedure described above, issue the following command:
# ssh -X -Y -C otherunixhost
Of course, replace "otherunixhost" with the hostname of the UNIX server you'd like to connect to through your jump server. Then, again, to test, run "xeyes" or "xclock" to test. It should open on your PC. Now you have X11 forwarding from a UNIX server, to a jumpserver, and back to your PC, in fact double X11 forwarding. TOPICS: HMC, SECURITY
When you've run the command above (and have logged in to your jumpserver), then point the browser to https://jumpserver.domain.com. TOPICS: AIX, SECURITY
This example restricts the number of logins to three. Make sure the user can't modify his/her own .profile by restricting access rights.
Add these commands to your mksysb script, just before running the mksysb command. What this does is to run the mkvgdata command for each online volume group. This will generate output for a volume group in /tmp/vgdata. The resulting output is then tar'd and stored in the /sysadm folder or file system. This allows information regarding your volume groups, logical volumes, and file systems to be included in your mksysb image. To recreate the volume groups, logical volumes and file systems: Run:
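A sketch of what these commands could look like (mkvgdata writes its output to /tmp/vgdata; the /sysadm location is the one mentioned above):

for vg in `lsvg -o` ; do
   mkvgdata ${vg}
done
tar -cvf /sysadm/vgdata.tar /tmp/vgdata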
# tar -xvf /sysadm/vgdata.tar
Now edit /tmp/vgdata/{volume group name}/{volume group name}.data file and look for the line with "VG_SOURCE_DISK_LIST=". Change the line to have the hdisks, vpaths or hdiskpowers as needed.
Run:
# restvg -r -d /tmp/vgdata/{volume group name}/{volume group name}.data
Make sure to remove file systems with the rmfs command before running restvg, or it will not run correctly. Or, you can just run it once, run the exportvg command for the same volume group, and run the restvg command again. There is also a "-s" flag for restvg that lets you shrink the file system to its minimum size needed, but depending on when the vgdata was created, you could run out of space, when restoring the contents of the file system. Just something to keep in mind. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION, VIRTUALIZATION
The change will then be applied once the system is rebooted. However, it is possible to change the default value of the hcheck_interval attribute in the PdAt ODM class. As a result, you won't have to worry about its value anymore and newly discovered hdisks will automatically get the new default value, as illustrated in the example below:
# odmget -q 'attribute = hcheck_interval AND uniquetype = PCM/friend/vscsi' PdAt | \
    sed 's/deflt = \"0\"/deflt = \"60\"/' | \
    odmchange -o PdAt -q 'attribute = hcheck_interval AND uniquetype = PCM/friend/vscsi'
After you plug in the USB drive, run cfgmgr to discover the drive, or if you don't want to run the whole cfgmgr, run:
# /etc/methods/cfgusb -l usb0
Some devices may not be recognized by AIX, and may require you to run the lquerypv command:
# lquerypv -h /dev/usbms0
Here are the steps involved: First: remove the newly discovered disk (in the example below known as hdisk1 - we will configure this disk as hdisk2):
# rmdev -dl hdisk1
Note that running the command above may result in an error. However, if you run the following command afterwards, you will notice that the dummy disk device indeed has been created:
# lsdev -Cc disk | grep hdisk1
hdisk1 Defined SSA Logical Disk Drive
Also note that the dummy disk device will not show up if you run the lspv command. That is of no concern. Now run the cfgmgr command to discover the new disk. You'll notice that the new disk will now be discovered as hdisk2, because hdisk0 and hdisk1 are already in use.
# cfgmgr
# lsdev -Cc disk | grep hdisk2
Erasing disks
During a system decommission process, it is advisable to format or at least erase all drives. There are 2 ways of accomplishing that: If you have time: AIX allows disks to be erased via the Format media service aid in the AIX diagnostic package. To erase a hard disk, run the following command:
# diag -T format
This will start the Format media service aid in a menu driven interface. If prompted, choose your terminal. You will then be presented with a resource selection list. Choose the hdisk devices you want to erase from this list and commit your changes according to the instructions on the screen. Once you have committed your selection, choose Erase Disk from the menu. You will then be asked to confirm your selection. Choose Yes. You will be asked if you want to Read data from drive or Write patterns to drive. Choose Write patterns to drive. You will then have the
opportunity to modify the disk erasure options. After you specify the options you prefer, choose Commit Your Changes. The disk is now erased. Please note, that it can take a long time for this process to complete. If you want to do it quick-and-dirty: For each disk, use the dd command to overwrite the data on the disk. For example:
for disk in $(lspv | awk '{print $1}') ; do
   dd if=/dev/zero of=/dev/r${disk} bs=1024 count=10
   echo $disk wiped
done
This does the trick, as it reads zeroes from /dev/zero and writes 10 blocks of 1024 zero bytes to each disk. That overwrites the start of the disk, rendering the disk useless. TOPICS: AIX, LVM, STORAGE, SYSTEM ADMINISTRATION
The scalable VG implementation in AIX 5L Version 5.3 provides configuration flexibility with respect to the number of PVs and LVs that can be accommodated by a given instance of the new VG type. The configuration options allow any scalable VG to contain 32, 64, 128, 256, 512, 768, or 1024 disks and 256, 512, 1024, 2048, or 4096 LVs. You do not need to configure the maximum values of 1024 PVs and 4096 LVs at the time of VG creation to account for potential future growth. You can always increase the initial settings at a later date as required.

The System Management Interface Tool (SMIT) and the Web-based System Manager graphical user interface fully support the scalable VG. Existing SMIT panels, which are related to VG management tasks, have been changed, and many new panels have been added to account for the scalable VG type. For example, you can use the new SMIT fast path _mksvg to directly access the Add a Scalable VG SMIT menu. The user commands mkvg, chvg, and lsvg have been enhanced in support of the scalable VG type.

For more information: http://www.ibm.com/developerworks/aix/library/au-aix5l-lvm.html. TOPICS: AIX, ORACLE, SDD, STORAGE, SYSTEM ADMINISTRATION
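For example, creating a scalable VG and raising its LV limit later might look like this sketch (the VG and hdisk names are examples; check the mkvg and chvg man pages for the exact flags on your AIX level):

# mkvg -S -y datavg hdisk2 hdisk3
# chvg -v 4096 datavg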
root@node2 /root # lspv | grep vpath | grep -i none
vpath4          00f69a11a2f620c5                    None
vpath5          00f69a11a2f622c8                    None
As you can see, vpath6 on node 1 is the same disk as vpath4 on node 2. You can determine this by looking at the PVID. Check the major and minor numbers of each device:
root@node1 # cd /dev
root@node1 # lspv | grep vpath | grep None | awk '{print $1}' | xargs ls -als
0 brw-------  1 root system 47,  6 Apr 28 18:56 vpath6
0 brw-------  1 root system 47,  7 Apr 28 18:56 vpath7
0 brw-------  1 root system 47,  8 Apr 28 18:56 vpath8
root@node2 # cd /dev
root@node2 # lspv | grep vpath | grep None | awk '{print $1}' | xargs ls -als
0 brw-------  1 root system 47,  4 Apr 29 13:33 vpath4
0 brw-------  1 root system 47,  5 Apr 29 13:33 vpath5
0 brw-------  1 root system 47,  6 Apr 29 13:33 vpath6
0 brw-------  1 root system 47,  9 Apr 29 13:33 vpath9
Now, on each node, set up a consistent naming convention for the OCR and VOTE devices. For example, if you wish to set up 2 OCR and 3 VOTE devices: On server node1:
# mknod /dev/ocr_disk01 c 47 6
# mknod /dev/ocr_disk02 c 47 7
# mknod /dev/voting_disk01 c 47 8
# mknod /dev/voting_disk02 c 47 13
# mknod /dev/voting_disk03 c 47 14
On server node2:
# mknod /dev/ocr_disk01 c 47 4
# mknod /dev/ocr_disk02 c 47 5
# mknod /dev/voting_disk01 c 47 6
# mknod /dev/voting_disk02 c 47 9
# mknod /dev/voting_disk03 c 47 10
This results in a consistent naming convention for the OCR and VOTE devices on both nodes:
root@node1 # ls -als /dev/*_disk*
0 crw-r--r-- 1 root system 47,  6 May 13 07:18 /dev/ocr_disk01
0 crw-r--r-- 1 root system 47,  7 May 13 07:18 /dev/ocr_disk02
0 crw-r--r-- 1 root system 47,  8 May 13 07:18 /dev/voting_disk01
0 crw-r--r-- 1 root system 47, 13 May 13 07:18 /dev/voting_disk02
0 crw-r--r-- 1 root system 47, 14 May 13 07:18 /dev/voting_disk03

root@node2 # ls -als /dev/*_disk*
0 crw-r--r-- 1 root system 47,  4 May 13 07:20 /dev/ocr_disk01
0 crw-r--r-- 1 root system 47,  5 May 13 07:20 /dev/ocr_disk02
0 crw-r--r-- 1 root system 47,  6 May 13 07:21 /dev/voting_disk01
0 crw-r--r-- 1 root system 47,  9 May 13 07:21 /dev/voting_disk02
0 crw-r--r-- 1 root system 47, 10 May 13 07:21 /dev/voting_disk03
Using lvmstat
One of the best tools to look at LVM usage is with lvmstat. It can report the bytes read and written to logical volumes. Using that information, you can determine which logical volumes are used the most. Gathering LVM statistics is not enabled by default:
# lvmstat -v data2vg
0516-1309 lvmstat: Statistics collection is not enabled for this logical device.
        Use -e option to enable.
As you can see by the output here, it is not enabled, so you need to actually enable it for each volume group prior to running the tool using:
# lvmstat -v data2vg -e
The following command takes a snapshot of LVM information every second for 10 intervals:
# lvmstat -v data2vg 1 10
This view shows the most utilized logical volumes on your system since you started the data collection. This is very helpful when drilling down to the logical volume layer when tuning your systems.
# lvmstat -v data2vg
What are you looking at here?
iocnt:   Reports back the number of read and write requests.
Kb_read: Reports back the total data (in kilobytes) read during your measured interval.
Kb_wrtn: Reports back the amount of data (in kilobytes) written during your measured interval.
Kbps:    Reports back the amount of data transferred in kilobytes per second.
You can use the -d option for lvmstat to disable the collection of LVM statistics. TOPICS: AIX, BACKUP & RESTORE, LVM, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
It is always a good idea to spread the logical volumes on a disk over multiple disks. That way, the logical volume manager will spread the disk I/O over all the disks that are part of the logical volume, utilizing the queue_depth of all disks, greatly improving performance where disk I/O is concerned. Let's say you have a logical volume called prodlv of 128 LPs, which is sitting on one disk, vpath408. To see the allocation of the LPs of logical volume prodlv, run:
# lslv -m prodlv
Let's also assume that you have a large number of disks in the volume group in which prodlv is configured. Disk I/O usually works best if you have a large number of disks in a volume group. For example, if you need to have 500 GB in a volume group, it is usually a far better idea to assign 10 disks of 50 GB to the volume group, instead of only one disk of 500 GB. That gives you the possibility of spreading the I/O over 10 disks instead of only one. To spread the disk I/O of prodlv over 8 disks instead of just one disk, you can create an extra logical volume copy on these 8 disks, and then later on, when the logical volume is synchronized, remove the original logical volume copy (the one on the single disk vpath408). So, divide 128 LPs by 8, which gives you 16 LPs. You can assign 16 LPs of logical volume prodlv to each of the 8 disks, giving it a total of 128 LPs. First, check if the upper bound of the logical volume is set to at least 9, by running:
# lslv prodlv
The upper bound limit determines on how many disks a logical volume can be created. You'll need the 1 disk, vpath408, on which the logical volume is already located, plus the 8 other disks that you're creating a new copy on. Never create a copy on the same disk: if that single disk fails, both copies of your logical volume will fail as well. It is usually a good idea to set the upper bound of the logical volume a lot higher, for example to 32:
# chlv -u 32 prodlv
Next, determine that you actually have 8 disks with at least 16 free LPs each in the volume group. You can do this by running:
# lsvg -p prodvg | sort -nk4 | grep -v vpath408 | tail -8
vpath188  active  959  40   00..00..00..00..40
vpath163  active  959  42   00..00..00..00..42
vpath208  active  959  96   00..00..96..00..00
vpath205  active  959  192  102..00..00..90..00
vpath194  active  959  240  00..00..00..48..192
vpath24   active  959  243  00..00..00..51..192
vpath304  active  959  340  00..89..152..99..00
vpath161  active  959  413  14..00..82..125..192
Note how in the command above the original disk, vpath408, was excluded from the list. Each of the disks listed should have at least 1/8th of the size of the logical volume free before you can make a logical volume copy on it for prodlv. Now create the logical volume copy. The magical option you need is "-e x" for the logical volume commands: it spreads the logical volume over all available disks. If you want to make sure that the logical volume is spread over only 8 available disks, and not all the available disks in the volume group, specify the 8 disks explicitly:
# mklvcopy -e x prodlv 2 vpath188 vpath163 vpath208 \ vpath205 vpath194 vpath24 vpath304 vpath161
Now check again with "lslv -m prodlv" if the new copy is correctly created:
# lslv -m prodlv | awk '{print $5}' | grep vpath | sort -dfu | \
while read pv ; do
   result=`lspv -l $pv | grep prodlv`
   echo "$pv $result"
done
vpath24  prodlv 16 16 00..00..00..16..00 N/A
vpath304 prodlv 16 16 00..16..00..00..00 N/A
Now, what if you have to extend the logical volume prodlv later on with another 128 LPs, and you still want to maintain the spreading of the LPs over the 8 disks? Again, you can use the "-e x" option when running the logical volume commands:
# extendlv -e x prodlv 128 vpath188 vpath163 vpath208 \ vpath205 vpath194 vpath24 vpath304 vpath161
You can also use the "-e x" option with the mklv command to create a new logical volume from the start with the correct spreading over disks. TOPICS: LINUX, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
This will create a 20 megabyte RAM file system, mounted on /mnt/tmp. If you leave out the "-o size" option, by default half of the memory will be allocated. However, the memory will not be used as long as no data is written to the RAM file system. TOPICS: AIX, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
The system will assign the next available RAM disk. Since this is our first one, it will be assigned the name ramdisk0:
# ls -l /dev/ram*
brw-------    1 root     system
If there isn't sufficient available memory to create the RAM disk you have requested, the mkramdisk command will alert you. Free up some memory or create a smaller RAM disk. You can use Dynamic LPAR on the HMC or IVM to assign more memory to your partition. We could use the RAM disk /dev/ramdisk0 as a raw logical volume, but here we're going to create and mount a JFS2 file system. Here's how to create the file system using the RAM disk as its logical volume:
# mkfs -V jfs2 /dev/ramdisk0
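On AIX 6.1 you can then mount the new file system with logging disabled; a sketch (the mount point name /ramdisk0 is an example):

```shell
# Create a mount point and mount the ramdisk file system
# without a JFS2 log (log=NULL; works on AIX 6.1 and later):
mkdir /ramdisk0
mount -V jfs2 -o log=NULL /dev/ramdisk0 /ramdisk0
```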
Note: mounting a JFS2 file system with logging disabled (log=NULL) only works in AIX 6.1. On AIX 5.3, here are the steps to create the ramdisk:
# mkramdisk 4G
# mkfs -V jfs /dev/ramdisk0
# mkdir /ramdisk0
# mount -V jfs -o nointegrity /dev/ramdisk0 /ramdisk0
You should now be able to see the new file system using df and you can write to it as you would any other file system. When you're finished, unmount the file system and then remove the ramdisk using the rmramdisk command.
# rmramdisk ramdisk0
The fix is to uninstall/reinstall Powerpath, but you won't be able to until you remove the hdiskpower devices with this procedure:
1. # odmdelete -q name=hdiskpowerX -o CuDv
5. # rm /dev/powerpath0
6. You must remove the modified files installed by PowerPath and then reboot the server. You will then be able to uninstall PowerPath after the reboot via the "installp -u EMCpower" command. The files to be removed are as follows (do not be concerned if some of the removals do not work, as PowerPath may not be fully configured properly):
7. # rm ./etc/PowerPathExtensions
8. # rm ./etc/emcp_registration
9. # rm ./usr/lib/boot/protoext/disk.proto.ext.scsi.pseudo.power
10. # rm ./usr/lib/drivers/pnext
11. # rm ./usr/lib/drivers/powerdd
12. # rm ./usr/lib/drivers/powerdiskdd
13. # rm ./usr/lib/libpn.a
14. # rm ./usr/lib/methods/cfgpower
15. # rm ./usr/lib/methods/cfgpowerdisk
16. # rm ./usr/lib/methods/chgpowerdisk
17. # rm ./usr/lib/methods/power.cat
18. # rm ./usr/lib/methods/ucfgpower
19. # rm ./usr/lib/methods/ucfgpowerdisk
20. # rm ./usr/lib/nls/msg/en_US/power.cat
21. # rm ./usr/sbin/powercf
22. # rm ./usr/sbin/powerprotect
23. # rm ./usr/sbin/pprootdev
24. # rm ./usr/lib/drivers/cgext
25. # rm ./usr/lib/drivers/mpcext
26. # rm ./usr/lib/libcg.so
27. # rm ./usr/lib/libcong.so
28. # rm ./usr/lib/libemcp_mp_rtl.so
29. # rm ./usr/lib/drivers/mpext
30. # rm ./usr/lib/libmp.a
31. # rm ./usr/sbin/emcpreg
32. # rm ./usr/sbin/powermt
33. # rm ./usr/share/man/man1/emcpreg.1
34. # rm ./usr/share/man/man1/powermt.1
35. # rm ./usr/share/man/man1/powerprotect.1
36. Re-install PowerPath.
You can grow ext3 filesystems while online: the functionality has been included in resize2fs. To resize a logical volume, start by extending the volume:
# lvextend -L +2G /dev/systemvg/homelv
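The grown logical volume is then handed to resize2fs; a sketch, assuming the same logical volume as above and a kernel with online ext3 resize support:

```shell
# Grow the ext3 filesystem to fill the enlarged logical volume;
# with no size argument, resize2fs uses all space in the device.
resize2fs /dev/systemvg/homelv
```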
By omitting the size argument, resize2fs defaults to using the available space in the partition/lv. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
Using NFS
The Networked File System (NFS) is one of a category of filesystems known as distributed filesystems. It allows users to access files resident on remote systems without even knowing that a network is involved, and thus allows filesystems to be shared among computers. These remote systems could be located in the same room or could be miles away. In order to access such files, two things must happen. First, the remote system must make the files available to other systems on the network. Second, these files must be mounted on the local system to be able to access them. The mounting process makes the remote files appear as if they are resident on the local system. The system that makes its files available to others on the network is called a server, and the system that uses a remote file is called a client.

NFS Server

NFS consists of a number of components, including a mounting protocol, a file locking protocol, an export file and daemons (mountd, nfsd, biod, rpc.lockd, rpc.statd) that coordinate basic file services. Systems using NFS make their files available to other systems on the network by "exporting" their directories to the network. An NFS server exports its directories by putting the names of these directories in the /etc/exports file and executing the exportfs command. In its simplest form, /etc/exports consists of lines of the form:
pathname -option, option ...
Where pathname is the name of the file or directory to which network access is to be allowed; if pathname is a directory, then all of the files and directories below it within the same filesystem are also exported, but not any filesystems mounted within it. The next fields in the entry consist of various options that specify the type of access to be given and to whom. For example, a typical /etc/exports file may look like this:
/cyclop/users   -access=homer:bart,root=homer
/usr/share/man  -access=marge:maggie:lisa
/usr/mail
This export file permits the filesystem /cyclop/users to be mounted by homer and bart, and allows root access to it from homer. In addition, it lets /usr/share/man be mounted by marge, maggie and lisa. The filesystem /usr/mail can be mounted by any system on the network. Filesystems listed in the export file without a specific set of hosts are mountable by all machines, which can be a sizable security hole. When used with the -a option, the exportfs command reads the /etc/exports file and exports all the directories listed to the network. This is usually done at system startup time.
# exportfs -va
If the contents of /etc/exports change, you must tell mountd to reread it. This can be done by re-executing the exportfs command after the export file is changed. The exact attributes that can be specified in the /etc/exports file vary from system to system. The most common attributes are:
-anon : Specifies the UID that should be used for requests coming from an unknown user. Defaults to nobody.
-hostname : Allow hostname to mount the filesystem.
For example:
/cyclop/users   -rw=moe,anon=-1
/usr/inorganic  -ro
-access=list : Colon-separated list of hostnames and netgroups that can mount the filesystem.
-ro : Export read-only; no clients may write on the filesystem.
-rw=list : List enumerates the hosts allowed to mount for writing; all others must mount read-only.
-root=list : Lists hosts permitted to access the filesystem as root. Without this option, root access from a client is equivalent to access by the user nobody (usually UID -1).
This allows moe to mount /cyclop/users for reading and writing, and maps anonymous users (users from other hosts that do not exist on the local system, and the root user from any remote system) to the UID -1. This corresponds to the nobody account, and it tells NFS not to allow such users access to anything.

NFS Clients

After the files, directories and/or filesystems have been exported, an NFS client must explicitly mount them before it can use them. Mount requests are handled by the mountd daemon (sometimes called rpc.mountd). The server examines the mount request to be sure the client has proper authorization.
The following syntax is used for the mount command. Note that the name of the server is followed by a colon and the directory to be mounted:
# mount server1:/usr/src /src
Here, the directory structure /usr/src resident on the remote system server1 is mounted on the /src directory of the local system. When the remote filesystem is no longer needed, it is unmounted with the umount command:
# umount server1:/usr/src
The mount command can be used to establish temporary network mounts, but mounts that are part of a system's permanent configuration should either be listed in /etc/filesystems (for AIX) or handled by an automatic mounting service such as automount or amd.
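For AIX, a permanent NFS mount corresponds to a stanza in /etc/filesystems; a sketch for the server1:/usr/src example above (the mount options shown are illustrative, not mandatory):

```
/src:
        dev      = "/usr/src"
        vfs      = nfs
        nodename = server1
        mount    = true
        options  = bg,hard,intr
        account  = false
```

With such a stanza in place, a plain "mount /src" picks up all parameters from the file.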
NFS Commands

lsnfsexp : Displays the characteristics of directories that are exported with NFS.

# lsnfsexp
/software -ro
mknfsexp -d path -t ro : Exports a read-only directory to NFS clients and adds it to /etc/exports.
# mknfsexp -d /software -t ro
Exported /software
# lsnfsexp
/software -ro
rmnfsexp -d path : Unexports a directory from NFS clients and removes it from /etc/exports.
# rmnfsexp -d /software
lsnfsmnt : Displays the characteristics of NFS mountable file systems. showmount -e : List exported filesystems.
# showmount -e
export list for server:
/software (everyone)
server2:/sourcefiles server3:/datafiles
Start/Stop/Status NFS daemons In the following discussion, reference to daemon implies any one of the SRC-controlled
daemons (such as nfsd or biod). The NFS daemons can be automatically started at system (re)start by including the /etc/rc.nfs script in the /etc/inittab file. They can also be started manually by executing the following command:
# startsrc -s Daemon or startsrc -g nfs
Where the -s option will start the individual daemons and -g will start all of them. These daemons can be stopped one at a time or all at once by executing the following command:
# stopsrc -s Daemon or stopsrc -g nfs
You can get the current status of these daemons by executing the following commands:
# lssrc -s [Daemon] # lssrc -a
If the /etc/exports file does not exist, the nfsd and the rpc.mountd daemons will not start. You can get around this by creating an empty /etc/exports file. This will allow the nfsd and the rpc.mountd daemons to start, although no filesystems will be exported. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
Each row describes one disk. The first column shows its name, followed by the PVID and the volume group it belongs to. "None" in the last column indicates that the disk does not belong to any volume group. "active" in the last column indicates that the volume group is varied on. The existence of a PVID indicates that data may be present on the disk. It is possible that such a disk belongs to a volume group which is varied off. Executing lspv with a disk name generates information only about this device:
# lspv hdisk4
PHYSICAL VOLUME:   hdisk4                  VOLUME GROUP:  abc_vg
PV IDENTIFIER:     00c03c8a14fa936b        VG IDENTIFIER: 00c03b1a000
PV STATE:          active
STALE PARTITIONS:  0                       ALLOCATABLE:   yes
PP SIZE:           16 megabyte(s)
TOTAL PPs:         639 (10224 megabytes)
FREE PPs:          599 (9584 megabytes)
USED PPs:          40 (640 megabytes)
In the case of hdisk4, we are able to determine its size, the number of logical volumes (two), the number of physical partitions in need of synchronization (Stale Partitions) and the number of VGDAs. Executing lspv against a disk without volume group membership produces nothing useful:
# lspv hdisk2
0516-304: Unable to find device id hdisk2 in the Device configuration database
How do you establish the capacity of a disk that does not belong to a volume group? The next command provides this in megabytes:
# bootinfo -s hdisk2
10240
The same (and much more) information can be retrieved by executing lsattr -El hdisk#:
# lsattr -El hdisk0
PCM             PCM/scsiscsd      Path Control Module        False
algorithm       fail_over         Algorithm                  True
dist_err_pcnt   0                 Distributed Error %        True
dist_tw_width   50                Sample Time                True
hcheck_interval 0                 Health Check Interval      True
hcheck_mode     nonactive         Health Check Mode          True
max_transfer    0x40000           Maximum TRANSFER Size      True
pvid            00c609e0a5ec1460  Physical volume identifier False
queue_depth     3                 Queue DEPTH                False
reserve_policy  single_path       Reserve Policy             True
size_in_mb      73400             Size in Megabytes          False
unique_id       26080084C1AF0FHU  Unique identifier          False
The last command can be limited to show only the size if executed as shown:
# lsattr -El hdisk0 -a size_in_mb
A disk can get a PVID in one of two ways: by virtue of membership in a volume group (when running the extendvg or mkvg commands), or as the result of executing the chdev command. The lqueryvg command helps to establish whether there is data on the disk or not.
# lqueryvg -Atp hdisk2
0516-320 lqueryvg: hdisk2 is not assigned to a volume group.

Run against a disk that does belong to a volume group, lqueryvg displays output like:

Max LVs:        256
PP Size:        26
Free PPs:       1117
LV count:       0
PV count:       3
Total VGDAs:    3
Conc Allowed:   0
MAX PPs per PV: 1016
MAX PVs:        32
Quorum (disk):  1
Quorum (dd):    1
Auto Varyon ?:  1
Conc Autovaryo: 0
Varied on Conc: 0
Physical:       00c03b1a32e50767
                00c03b1a32ee4222
                00c03b1a9db2f183
Total PPs:      1117
LTG size:       128
HOT SPARE:      0
AUTO SYNC:      0
VG PERMISSION:  0
SNAPSHOT VG:    0
IS_PRIMARY VG:  0
PSNFSTPP:       4352
VARYON MODE:    ???????
VG Type:        0
Max PPs:        32512
It is easy to notice that a disk belongs to a volume group; logical volume names are the best proof of this. To display data stored on a disk you can use the lquerypv command. A PVID can be assigned to or removed from a disk that does not belong to a volume group, by executing the chdev command.
# chdev -l hdisk2 -a pv=clear
hdisk2 changed
# lspv | grep hdisk2
hdisk2          none                                None
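The counterpart, assigning a PVID to a disk, uses the same chdev command (a sketch):

```shell
# Assign a new PVID to the disk:
chdev -l hdisk2 -a pv=yes
```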
At times, it is required to restrict access to a disk or to its capacity. You can use the chpv command for this purpose. To prevent I/O access to a disk:
# chpv -v r hdisk2
To allow I/O:
# chpv -v a hdisk2
AIX was created years ago, when disks were very expensive. I/O optimization, the decision which part of the data will be read/written faster than other data, was determined by its position on the disk. Between I/Os, disk heads are parked in the middle. Accordingly, the fastest I/O takes place in the middle. With this in mind, a disk is divided into five bands called: outer edge, outer middle, center, inner middle and inner edge. This method of assigning physical partitions (logical volumes) as a function of a band on a disk is called the intra-physical policy. This policy, and the policy defining the spread of a logical volume over disks (the inter-physical allocation policy), gains importance while creating logical volumes. Disk topology, the range of physical partitions on each band, is visualized with the commands lsvg -p vg_name and lspv hdisk#. Note the last two lines of the lspv output:
FREE DISTRIBUTION:  128..88..127..128..128
USED DISTRIBUTION:  00..40..00..00..00
The row labeled FREE DISTRIBUTION shows the number of free PPs in each band. The row labeled USED DISTRIBUTION shows the number of used PPs in each band. As you can see, some bands of this disk have no data. Presently, this policy has lost its meaning, as even the slowest disks are much faster than their predecessors. In the case of RAID or SAN disks, this
policy has no meaning at all. For those who still use individual SCSI or SSA disks, it is good to remember that the data closest to the outer edge is read/written the slowest. To learn which logical volumes are located on a given disk, you can execute the command lspv -l hdisk#. The reverse relation is established by executing lslv -M lv_name. It is always a good idea to know which adapter and which bus any disk is attached to. Otherwise, if one of the disks breaks, how will you know which disk needs to be removed and replaced? AIX has many commands that can help you. It is customary to start from the adapter, identifying all adapters known to the kernel:
# lsdev -Cc adapter | grep -i scsi
scsi0 Available 1S-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Available 1S-09 Wide/Ultra-3 SCSI I/O Controller
scsi2 Available 1c-08 Wide/Fast-20 SCSI I/O Controller
The last command produced information about the SCSI adapters present during the last execution of the cfgmgr command. This output also allows you to establish in which drawer each adapter is located. The listing tells us that there are three SCSI adapters. The second column shows the device state (Available: ready to be used; Defined: device needs further configuration). The next column shows its location (drawer/bus). The last column contains a short description. Executing the lsdev command against a disk from rootvg produces:
# lsdev -Cc disk -l hdisk0
hdisk0 Available 1S-08-00-8,0 16 Bit LVD SCSI Disk Drive
From both outputs we can determine which SCSI adapter controls this disk: scsi0. Also, we see that the disk has SCSI ID 8,0. How do you determine the type, model, capacity, part number, etc.?
# lscfg -vl hdisk0
hdisk0  U0.1-P2/Z1-A8  16 Bit LVD SCSI Disk Drive (36400 MB)
Manufacturer................IBM
Machine Type and Model......IC35L036UCDY10-0
FRU Number..................00P3831
ROS Level and ID............53323847
Serial Number...............E3WP58EC
EC Level....................H32224
Part Number.................08K0293
Device Specific.(Z0)........000003029F00013A
Device Specific.(Z1)........07N4972
Device Specific.(Z2)........0068
Device Specific.(Z3)........04050
Device Specific.(Z4)........0001
Device Specific.(Z5)........22
Device Specific.(Z6)........
You can get more details by executing command: lsattr -El hdisk0. This article has been based on an article published on wmduszyk.com. TOPICS: AIX, EMC, POWERHA / HACMP, STORAGE, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
To resolve this, you will have to make sure that the SCSI reset disk method is configured in HACMP. For example, when using EMC storage: make sure emcpowerreset is present in /usr/lpp/EMC/Symmetrix/bin/emcpowerreset. Then add a new custom disk method:
Enter the SMIT fastpath for HACMP: smitty hacmp.
Select Extended Configuration.
Select Extended Resource Configuration.
Select HACMP Extended Resources Configuration.
Select Configure Custom Disk Methods.
Select Add Custom Disk Methods.
Change/Show Custom Disk Methods
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                              [Entry Fields]
* Disk Type (PdDvLn field from CuDv)          disk/pseudo/power
* New Disk Type                               [disk/pseudo/power]
* Method to identify ghost disks              [SCSI3]
* Method to determine if a reserve is held    [SCSI_TUR]
* Method to break reserve                     [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset]
  Break reserves in parallel                  true
* Method to make the disk available           [MKDEV]
There is a way to mount a share from a Windows system as a filesystem in AIX:
1. Install the CIFS software on the AIX server (this is part of AIX itself: bos.cifs_fs).
2. Create a folder on the Windows machine, e.g. D:\share.
3. Create a local user, e.g. "share" (user IDs from Active Directory can not be used): Settings -> Control Panel -> User Accounts -> Advanced tab -> Advanced button -> Select Users -> Right click in right window and select "New User" -> Enter user name, password twice, deselect "User must change password at next logon" and click on Create, Close and OK.
4. Make sure the folder on the D: drive (in this case "share") is shared, give the share a name (we'll use "share" again as name in this example) and give "full control" permissions to "Everyone".
5. Create a mountpoint on the AIX machine to mount the Windows share on, e.g. /mnt/share.
6. Type on the AIX server as user root:
# mount -v cifs -n hostname/share/password -o uid=201,fmode=750 /share /mnt/share
7. You're done!
JFS2 snapshots
JFS2 filesystems allow you to create file system snapshots. Creating a snapshot actually creates a new file system, with a copy of the metadata of the original file system (the snapped FS). The snapshot (like a photograph) remains unchanged, so it's possible to back up the snapshot, while the original data can be used (and changed!) by applications. When data on the original file system changes while a snapshot exists, the original data is copied to the snapshot to keep the snapshot in a consistent state. For these changes you'll need temporary space, so you need to create a snapshot of a specific size to allow updates while the snapshot exists. Usually 10% is enough. Database file systems are usually not a very good subject for snapshots, because all database files change constantly when the database is active, causing a lot of copying of data from the original to the snapshot file system.

In order to have a snapshot you have to:
Create and mount a JFS2 file system (source FS). You can find it in SMIT as "enhanced" file system.
Create a snapshot of a size big enough to hold the changes of the source FS by issuing smitty crsnap.

Once you have created this snapshot as a logical device or logical volume, there's a read-only copy of the data in the source FS. You have to mount this device in order to work with this data.
Mount your snapshot device by issuing smitty mntsnap. You have to provide a directory name over which AIX will mount the snapshot. Once mounted, this device will be read-only. Creating a snapshot of a JFS2 file system:
# snapshot -o snapfrom=$FILESYSTEM -o size=${SNAPSIZE}M
Where $FILESYSTEM is the mount point of your file system and $SNAPSIZE is the amount of megabytes to reserve for the snapshot. Check if a file system holds a snapshot:
# snapshot -q $FILESYSTEM
When the snapshot runs full, it is automatically deleted. Therefore, create it large enough to hold all changed data of the source FS. Mounting the snapshot: Create a directory:
# mkdir -p /snapshot$FILESYSTEM
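The actual mount (the command-line equivalent of smitty mntsnap) can be sketched as follows; /dev/fslv01 is an assumed name for the snapshot logical volume, which "snapshot -q $FILESYSTEM" will show you:

```shell
# Mount the snapshot device read-only on the directory just created
# (the device name is an example; look it up with snapshot -q first):
mount -v jfs2 -o snapshot /dev/fslv01 /snapshot$FILESYSTEM
```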
Now you can backup your data from the mountpoint you've just mounted. When you're finished with the snapshot: Unmount the snapshot filesystem:
# unmount /snapshot$FILESYSTEM
When you restore data from a snapshot, be aware that the backup of the snapshot is actually a different file system in your backup system, so you have to specify a restore destination to restore the data to. TOPICS: AIX, SDD, STORAGE, STORAGE AREA NETWORK
PVID trouble
To add a PVID to a disk, enter:
# chdev -l vpathxx -a pv=yes
# chpv -C vpathxx
This will create a file called file.iso. Make sure you have enough storage space. Transfer this file to a PC with a CD-writer in it. Burn this ISO file to CD using Easy CD Creator or Nero. The CD will be usable in any AIX CD-ROM drive. TOPICS: AIX, SSA, STORAGE
SSA batteries
To find the status of the batteries of an SSA adapter, enter as root:
# ssa_fw_status -a ssaX
This will reset the lifetime counter. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
If you get an error, ensure /etc/vfs contains this line (and retry the mount command after validating):
udfs 34 /sbin/helpers/udfmnthelp
Then use this as a regular filesystem. TOPICS: AIX, SSA, STORAGE, SYSTEM ADMINISTRATION
Renaming pdisks
If, for some reason, the pdisk and hdisk numbering of SSA disks is no longer sequential, there's a way to bring order into chaos. Usually, the pdisk and hdisk numbering order gets screwed up when you replace multiple disks together. Especially on HACMP clusters, a correct numbering of pdisks and hdisks on all nodes of the cluster comes in handy. Unmount all file systems on the specific disks, then varyoff the volume group:
# /usr/lib/methods/cfgssar -l ssar
If this doesn't help (it sometimes will), then renumber the disks manually: write down the pdisk names, hdisk names, location of the disks in the SSA drawer and the connection IDs of the disks. You can use lsdev -Cc pdisk to show all the pdisks with their drawer and location codes. Use lsdev -Cl pdiskX -F connwhere to show the connection ID of a pdisk. Then figure out how you want all disks numbered. Remove the pdisks and hdisks with the rmdev -dl command. Create the pdisks again:
# mkdev -p ssar -t scsd -c pdisk -s ssar -w [connection-ID] -l pdisk1
Test with:
# ssaxlate -l pdisk1
It should show the corresponding hdisk, e.g. hdisk3 (usually the hdisk number is 2 higher than the pdisk number if you use 2 SCSI disks in the rootvg). When you've done all disks this way, check with lsdev -Cc pdisk. If you're happy, then varyon the volume group again and mount all filesystems. TOPICS: AIX, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
This will create a file consisting of 2097152 blocks of 1024 bytes, which is 2GB. You can change the count value to anything you like. Be aware that, if you wish to create files larger than 2GB, your file system needs to be created as a "large file enabled file system", otherwise the upper file size limit is 2GB (under JFS; under JFS2 the upper limit is 64GB). Also check the ulimit values of the user-id you use to create the large file: set the file limit to -1, which is unlimited. Usually, the file limit is set by default to 2097151 in /etc/security/limits, which stands for 2097151 blocks of 512 bytes = 1GB. Another way to create a large file is:
# /usr/sbin/lmktemp ./test.large.file 2147483648
This will create a file of 2147483648 bytes (which is 1024 * 2097152 = 2GB). You can use this large file for adapter throughput testing purposes: Write large sequential I/O test:
# cd /BIG
# time /usr/sbin/lmktemp 2GBtestfile 2147483648
Divide 2048/#seconds for MB/sec write speed. Read large sequential I/O test:
# umount /BIG
Divide 2048/#seconds for MB/sec read speed. Tip: Run nmon (select a for adapter) in another window. You will see the throughput for each adapter. More information on JFS and JFS2 can be found here. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
This describes how to configure the EMC PowerPath registration keys. First, check the current configuration of PowerPath:
# powermt config
Warning: all licenses for storage systems support are missing or expired.
=========== EMC PowerPath Registration ===========
Do you have a new registration key or keys to enter?[n] y
Enter the registration keys(s) for your product(s), one per line,
pressing Enter after each key. After typing all keys, press Enter again.

Key (Enter if done): P6BV-4KDB-QET6-RF9A-QV9D-MN3V
1 key(s) successfully added.
Key successfully installed.
(Note: the license key used in this example is not valid). TOPICS: EMC, STORAGE, STORAGE AREA NETWORK
EMC Grab
EMC Grab is a utility that is run locally on each host and gathers storage-specific information (driver versions, storage-technical details, etc.). The EMC Grab report is created as a zip file, which can be used by EMC support. You can download the "Grab Utility" from the following location: ftp://ftp.emc.com/pub/emcgrab/Unix/ When you've downloaded EMCgrab and stored it in a temporary location on the server, like /tmp/emc, untar it using:
tar -xvf *tar
Then run:
/tmp/emc/emcgrab/emcgrab.sh
The script is interactive and finishes after a couple of minutes. TOPICS: EMC, STORAGE, STORAGE AREA NETWORK
Looking at the MPIO path configuration, here is what we have for the rootvg disk:
# lspath -l hdisk2 -H -F"name parent path_id connection status"
name parent path_id connection status
The fscsi1 driver instance is the second path (path_id 1). Remove the other 3 paths, keeping only the path corresponding to fscsi1:
# rmpath -l hdisk2 -p fscsi0 -d
# rmpath -l hdisk2 -p fscsi2 -d
# rmpath -l hdisk2 -p fscsi3 -d
# lspath -l hdisk2 -H -F"name parent path_id connection status"
Afterwards, do a savebase to update the boot lv hd5. Set the bootlist to hdisk2 and reboot the host. It will come up successfully, with no more hang at LED 554. When checking the status of the rootvg disk, a new hdisk10 has been configured with the correct ODM definitions, as shown below:
# lspv
hdisk10  0003027f7f7ca7e2  rootvg  active
# lsdev -Cc disk
hdisk2   Defined  00-09-01  MPIO Other FC SCSI Disk Drive
To summarize: it is recommended to set up ONLY ONE path when installing AIX to a SAN disk, then install the EMC ODM package, reboot the host, and only after that is complete, add the other paths. By doing that, we ensure that the fscsiX driver instance used for the boot process has the hdisk configured behind it. TOPICS: HARDWARE, SDD, STORAGE, STORAGE AREA NETWORK
(that is, if adapter "1" is failing, replace it with the correct adapter number). If the adapter is still in a "degraded" status, open a call with IBM. They will most likely require you to take a snap of the system and send the snap file to IBM for analysis, after which they will conclude whether the adapter needs to be replaced or not. Involve the SAN storage team if the adapter needs to be replaced: they will have to update the WWN of the failing adapter when it is replaced with a new one that has a new WWN. If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA adapter. Note the new WWN and supply that to the SAN storage team. Remove the adapter:
# datapath remove adapter 1
(replace the "1" with the correct adapter that is failing). Check if the vpaths now all have one less path:
# datapath query device | more
De-configure the adapter (this will also de-configure all the child devices, so you won't have to do this manually), by running: diag, choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes", and "KEEP definition in database" to "no". Hit ENTER.
Replace the adapter: Run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful, you won't see the adapter name here, but only "Unknown", because the device was unconfigured).
Have the IBM CE replace the adapter. Close any events on the failing adapter on the HMC. Validate that the notification LED is now off on the system; if not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and Disable the attention LED.
(replace this with the actual adapter name). And if required, update the adapter firmware microcode. Validate if the adapter is still functioning correctly by running:
(replace this with the correct adapter name). Add the paths to the device:
# addpaths
If you notice that there are "dead" paths, then these are the commands to run in order to set those paths back to "alive" again, of course AFTER ensuring that any SAN related issues are resolved. To have PowerPath scan all devices and mark any dead devices as alive, if it finds that a device is in fact capable of doing I/O commands, run:
# powermt restore
2.
Shutdown the application(s), unmount the file system(s), and varyoff all volume groups except for rootvg. Do not export the volume groups.
# varyoffvg <vg_name>
Check with lsvg -o that only rootvg is varied on. If you don't have PowerPath, skip all steps involving power device names. 3. For a CLARiiON configuration, if the Navisphere Agent is running, stop it:
# /etc/rc.agent stop
4.
5.
6.
7.
Delete all hdisk devices: For Symmetrix devices, use this command:
# lsdev -CtSYMM* -Fname | xargs -n1 rmdev -dl
8. 9.
Confirm with lsdev -Cc disk that there are no EMC hdisks or hdiskpowers. Remove all Fiber driver instances:
# rmdev -Rdl fscsiX
(X being the driver instance number, e.g. 0, 1, 2, etc.) 10. 11. Verify through lsdev -Cc driver that there are no more fiber driver instances (fscsi). Put the adapter instances into Defined state:
# rmdev -l fcsX
(X being adapter instance number, i.e. 0,1,2, etc.) 12. Create the hdisk entries for all EMC devices:
# emc_cfgmgr
or
# cfgmgr -vl fcsx
(x being each adapter instance which was rebuilt). Skip this part if no PowerPath. 13. Configure all EMC devices into PowerPath:
# powermt config
14.
# powermt display
# powermt display dev=all
# lsdev -Cc disk
# /etc/rc.agent start
# Make sure to enter a file system to scan
# as the first attribute to this script.

FILESYSTEM=$1
LSOF=/usr/sbin/lsof

# A for loop to get a list of all open inodes
# in the file system, using lsof.
for i in `$LSOF -Fi $FILESYSTEM | grep ^i | sed s/i//g` ; do
    # Use find to list the file name associated with the inode.
    if [ -n "`find $FILESYSTEM -inum $i`" ] ; then
        echo > /dev/null
    else
        # If the file name cannot be found, then it is a suspect,
        # so check the lsof output for this inode.
        echo Inode $i does not have an associated filename:
        $LSOF $FILESYSTEM | grep -e $i -e COMMAND
    fi
done
Vpath commands
Check the relation between vpaths and hdisks:
# lsvpcfg
Reservation bit
If you wish to get rid of the SCSI disk reservation bit on SCSI, SSA and VPATH devices, there are two ways of achieving this: Firstly, HACMP comes along with some binaries that do this job:
# /usr/es/sbin/cluster/utilities/cl_SCSIdiskreset /dev/vpathx
Secondly, there is a little (not official) IBM binary tool called "lquerypr". This command is part of the SDD driver fileset. It can also release the persistent reservation bit and clear all reservations. First check if you have any reservations on the vpath:
# lquerypr -vh /dev/vpathx
Clear it as follows:
# lquerypr -ch /dev/vpathx
If you'd like to see more information about lquerypr, simply run lquerypr without any options, and it will display extensive usage information. For SDD, you should be able to use the following command to clear the persistent reservation:
# lquerypr -V -v -c /dev/vpathXX
Emulex hbanyware
If you have Emulex HBAs and the hbanyware software installed, for example on Linux, then you can use the following commands to retrieve information about the HBAs. To run a GUI version:
# /usr/sbin/hbanyware/hbanyware
SAN introduction
SAN storage places the physical disk outside a computer system. It is now connected to a Storage Area Network (SAN). In a Storage Area Network, storage is offered to many systems, including AIX systems, via logical blocks of disk space (LUNs). In the case of an AIX system, every SAN disk is seen as a separate hdisk, with the advantage that the AIX system can easily be expanded with new SAN disks, avoiding having to buy and install new physical hard disks.
Other advantages of SAN: Disk storage is no longer limited to the space in the computer system itself or the amount of available disk slots.
After the initial investment in the SAN network and storage, the costs of storage per gigabyte are less than disk space within the computer systems. Using two different SAN networks (fabrics), you can avoid having disruptions in your storage, the same as mirroring your data on separate disks. The two SAN fabrics should not be connected to each other.
Using two separate, geographically dispersed storage systems (e.g. ESS), a disruption in a computer center will not cause your computer systems to go down. When you place two SAN network adapters (called Host Bus Adapters, or HBAs, on Fibre Channel) in every computer system, you can connect your AIX system to two different fabrics, thus increasing the availability of the storage. Also, you'll be able to load balance the disk storage over these two host bus adapters. You'll need multipath I/O software (e.g. SDD or PowerPath) for this to work.
By using 2 HBAs, a defect in a single HBA will not cause downtime. AIX systems are able to boot from SAN disks. TOPICS: LINUX, STORAGE, VMWARE
You need to have plenty of free disk space to do this operation, as your vmdk file will be copied by vmware-vdiskmanager. BTW, this command may take a while, depending on the size of your vmdk file. Now get the ISO image of the System Rescue CD-ROM and set the VMware session to boot off the ISO image. Then, run QTParted. You can do this by starting this CD-ROM with a framebuffer (press F2 at start) and then running run_qtparted as soon as Linux has started. Select the Windows drive partition with the right mouse button and choose resize. Set the new size and commit the change. Then exit from QTParted and from Linux (init 0). Remove the ISO image from the VMware session and restart VMware to start Windows normally. Windows will detect the disk change and force a disk check (chkdsk) to run. Once Windows has started, the new disk size is present.
Fast write needs cache memory on the SSA adapter. Check your amount of cache memory on the SSA adapter:
# lscfg -vl ssax
Where 'x' is the number of your SSA adapter. 128MB of SDRAM will suffice; having 128MB of SDRAM memory makes sure you can use the full 32MB of cache memory. To enable fast write, the disk must not be in use, so either the volume groups are varied offline, or the disk is taken out of the volume group. Use the following command to enable the fast write cache:
# smitty chgssardsk
System-wide separated shell history files for each user and session
Here's how you can set up your /etc/profile and /etc/environment in order to create a separate shell history file for each user and each login session. This is very useful when you need to know who exactly ran a specific command at a point in time. Put this in /etc/profile on all servers:
# HISTFILE
# execute only if interactive
if [ -t 0 -a "${SHELL}" != "/bin/bsh" ]
then
    d=`date "+%H%M.%m%d%y"`
    t=`tty | cut -c6-`
    u=$(ps -fp $(proctree $PPID | grep "\-ksh" | grep -v grep | \
        awk '{print $1}' | head -1) | tail -1 | awk '{print $1}')
    w=`who -ms | awk '{print $NF}' | sed "s/(//g" | sed "s/)//g"`
    y=`tty | cut -c6- | sed "s/\//-/g"`
    mkdir $HOME/.history.$LOGIN 2>/dev/null
    export HISTFILE=$HOME/.history.$LOGIN/.sh_history.$LOGIN.$u.$w.$y.$d
    find $HOME/.history.$LOGIN/.s* -type f -ctime +91 -exec rm {} \; \
        2>/dev/null

    H=`uname -n`
    mywhoami=`whoami`
    if [ ${mywhoami} = "root" ] ; then
        PS1='${USER}@(${H}) ${PWD##/*/} # '
    else
        PS1='${USER}@(${H}) ${PWD##/*/} $ '
    fi
fi

# Time out after 60 minutes
# Use readonly if you don't want users to be able to change it.
# readonly TMOUT=3600
TMOUT=3600
export TMOUT
This way, *every* user on the system will have a separate shell history in the .history directory of their home directory. Each shell history file name shows you which account was used to login, which account was switched to, on which tty this happened, and at what date and time this happened. Shell history files are also timestamped internally (you can run "fc -t" to show the shell history timestamped), and old shell history files are cleaned up after 3 months. Plus, user accounts will log out automatically after 60 minutes (3600 seconds) of inactivity. You can avoid running into a time-out by simply typing "read" or "\" followed by ENTER on the command line, or by adding "TMOUT=0" to a user's .profile, which essentially disables the time-out for that particular user. One issue that you may now run into is that, because a separate history file is created for each login session, it becomes difficult to run "fc -t": the fc command will only list the commands from the current session, and not those written to a different history file. To overcome this issue, set the HISTFILE variable to the file you want to run "fc -t" for:
# export HISTFILE=.sh_history.root.user.10.190.41.116.pts-4.1706.120210
Then, to list all the commands for this history file, make sure you start a new shell and run the "fc -t" command:
# ksh
# fc -t -10
This will list the last 10 commands for that history file. TOPICS: SYSTEM ADMINISTRATION, VIRTUAL I/O SERVER, VIRTUALIZATION
Then run the lssyscfg command to list the available LPARs and their IDs on this VIOS:
# lssyscfg -r lpar -F name,lpar_id
Alternatively you can log on to the IVM using a web browser and click on "View/Modify Partitions" which will also show LPAR names and their IDs. Use the ID of the LPAR you wish to access:
# mkvt -id [lparid]
This should open a console to the LPAR. If you receive a message "Virtual terminal is already connected", then the session is already in use. If you are sure no one else is using it, you can use the rmvt command to force the session to close.
# rmvt -id [lparid]
After that you can try the mkvt command again. When finished log off and type "~." (tilde dot) to end the session. Sometimes this will also close the session to the VIOS itself and you may need to logon to the VIOS again. TOPICS: AIX, SYSTEM ADMINISTRATION
If it returns all zeroes, everything is fine. Anything else is not good. The first digit (the so-called EPOW event) indicates the type of problem:

EPOW Event   Description
0            normal operation
1            non-critical cooling problem
2            non-critical power problem
3            severe power problem - halt system
4            severe problems - halt immediately
5            unhandled issue
7            unhandled issue
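As an illustration (not an official IBM tool), the digit-to-description mapping above can be wrapped in a small shell function to decode the first digit of the sensor value:

```shell
#!/bin/sh
# Decode the first digit (the EPOW event) of the sensor value,
# using the mapping from the table above.
epow_describe() {
    case "$1" in
        0) echo "normal operation" ;;
        1) echo "non-critical cooling problem" ;;
        2) echo "non-critical power problem" ;;
        3) echo "severe power problem - halt system" ;;
        4) echo "severe problems - halt immediately" ;;
        5|7) echo "unhandled issue" ;;
        *) echo "unknown EPOW event" ;;
    esac
}

epow_describe 3    # prints "severe power problem - halt system"
```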
Another way to determine if the system may have a power or cooling issue, is by looking at a crontab entry in the root user's crontab:
# crontab -l root | grep -i powerfail
0 00,12 * * * wall%rc.powerfail:2::WARNING!!! The system is now operating with a power problem. This message will be walled every 12 hours. Remove this crontab entry after the problem is resolved.
If a powerfail message is present in the crontab of user root, this may indicate that there is an issue to be looked into. Contact your IBM representative to check the system out. Afterwards, make sure to remove the powerfail entry from the root user's crontab. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
[S 06/11/13-16:52:02:637 lvmstat.c 468] lvmstat -v rootvg
[S 07/20/13-15:02:15:076 extendlv.sh 789] extendlv testlv 400
[S 07/20/13-15:02:33:199 chlv.sh 527] chlv -x 4096 testlv
[S 08/22/13-12:29:16:807 chlv.sh 527] chlv -e x testlv
[S 08/22/13-12:29:26:150 chlv.sh 527] chlv -e x fslv00
[S 08/22/13-12:29:46:009 chlv.sh 527] chlv -e x loglv00
[S 08/22/13-12:30:55:843 reorgvg.sh 590] reorgvg
But what if you wish to suspend a process that is not attached to a terminal, and is running in the background? This is where the kill command is useful. Using signal 17, you can suspend a process, and using signal 19 you can resume a process. This is how it works: First look up the process ID you wish to suspend:
# sleep 400 &
[1] 8913102
# ps -ef | grep sleep
    root  8913102 10092788   0 07:10:30  pts/1  0:00 sleep 400
    root                     0 07:10:34  pts/1  0:00 grep sleep
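A minimal sketch of the full suspend/resume cycle, using the symbolic signal names (which correspond to 17 and 19 on AIX):

```shell
#!/bin/sh
# Suspend and resume a background process.
# On AIX, kill -STOP sends signal 17 and kill -CONT sends signal 19;
# the symbolic names are portable across platforms.
sleep 400 &
pid=$!

kill -STOP "$pid"    # suspend: the process state changes to stopped
kill -CONT "$pid"    # resume: the process continues running
kill "$pid"          # clean up the example process
```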
19360
The $RANDOM Korn shell built-in can also be used to generate numbers within a certain range, for example, if you want to run the sleep command for a random number of seconds. To sleep between 1 and 600 seconds (up to 10 minutes):
# sleep $(print $((RANDOM%600+1)))
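The modulo arithmetic maps RANDOM's 0..32767 range onto 1..600; for example, a RANDOM value of 12345 yields 12345 % 600 + 1 = 346. A sketch:

```shell
#!/bin/sh
# Map a RANDOM value onto the range 1..600.
# RANDOM is a ksh/bash built-in; a fixed sample value is used as a
# fallback so the arithmetic can be demonstrated in any POSIX shell.
sample=${RANDOM:-12345}
n=$((sample % 600 + 1))
echo "$n"
```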
For example:
# echo vpm | kdb
...
VSD Thread State.
CPU  VP_STATE  SLEEP_STATE  PROD_TIME: SECS  NSECS  CEDE_LAT
  0        00
  1        00
  2        00
  3        00
  4        00
  5        02
  6        02
  7        02
"Access Denied" messages. The script will not change any default security settings, and it can easily be adjusted to run for several user accounts, or can be run from a crontab so user accounts stay enabled for your users. The script is a win-win situation for everyone: Your auditor is happy, because security settings are strict on your system; Your users are happy for being able to just login without any hassle; And the sys admin will be happy for not having to resolve login issues manually anymore. The script:
#!/usr/bin/ksh
if [ ! -z "${myid}" ] ; then
    # Unlock account
    printf "Unlocking account for ${user}..."
    chuser account_locked=false ${user}
    echo " Done."

    # Remove password history
    printf "Removing password history for ${user}..."
    d=`lssec -f /etc/security/user -s default -a histsize | cut -f2 -d=`
    chuser histsize=0 ${user}
    chuser histsize=${d} ${user}
    echo " Done."

    # Reset failed login count
    printf "Reset failed login count for ${user}..."
    chuser unsuccessful_login_count=0 ${user}
    echo " Done."

    # Reset expiration date
    printf "Reset expiration date for ${user}..."
    chuser expires=0 ${user}
    echo " Done."

    # Allow the user to login
    printf "Enable login for ${user}..."
    chuser login=true ${user}
    echo " Done."

    # Allow the user to login remotely
    printf "Enable remote login for ${user}..."
    chuser rlogin=true ${user}
    echo " Done."

    # Reset maxage
    printf "Reset the maxage for ${user}..."
    m=`lssec -f /etc/security/user -s default -a maxage | cut -f2 -d=`
    chuser maxage=${m} ${user}
    echo " Done."

    # Clear password change requirement
    printf "Clear password change requirement for ${user}..."
    pwdadm -c ${user}
    echo " Done."

    # Reset password last update
    printf "Reset the password last update for ${user}..."
    let sinceepoch=`perl -e 'printf(time)' | awk '{print $1}'`
    n=`lssec -f /etc/security/user -s default -a minage | cut -f2 -d=`
    let myminsecs="${n}*7*24*60*60"
    let myminsecs="${myminsecs}+1000"
    let newdate="${sinceepoch}-${myminsecs}"
    chsec -f /etc/security/passwd -s ${user} -a lastupdate=${newdate}
    echo " Done."
fi
}
unset user
unset myid
myid=`id ${user} 2>/dev/null`
if [ ! -z "${myid}" ] ; then
    echo "Fixing account ${user}..."
    fixit ${user}
    echo "Done."
else
    echo "User ${user} does not exist."
fi
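The lastupdate arithmetic in the script converts minage, which is expressed in weeks, into seconds, adds a small margin, and subtracts the result from the current epoch time. In isolation, assuming a minage of 1 week, it looks like this:

```shell
#!/bin/sh
# Compute a lastupdate value far enough in the past that the minage
# restriction (expressed in weeks) no longer blocks a password change,
# mirroring the arithmetic used in the script above.
minage_weeks=1
now=$(date +%s)    # seconds since the epoch (or: perl -e 'printf(time)')
minsecs=$((minage_weeks * 7 * 24 * 60 * 60 + 1000))
newdate=$((now - minsecs))
echo "$newdate"
```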
Then hit ENTER and select "Task Selection", followed by "Display Previous Diagnostic Results" and "Display Previous Results". The second option is to use diagrpt. Run:
# /usr/lpp/diagnostics/bin/diagrpt -s 010101
TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION, VIRTUAL I/O SERVER, VIRTUALIZATION
The first command (viosbr) will create a backup of the configuration information in /home/padmin/cfgbackups. It will also schedule the command to run every day, and keep up to 10 files in /home/padmin/cfgbackups. The second command is the mksysb equivalent for a Virtual I/O Server: backupios. This command will create the mksysb image in the /mksysb folder, and exclude any ISO repository in rootvg, and anything else excluded in /etc/exclude.rootvg.
Add these commands to your mksysb script, just before running the mksysb command. What this does is to run the mkvgdata command for each online volume group. This will generate output for a volume group in /tmp/vgdata. The resulting output is then tar'd and stored in the /sysadm folder or file system. This allows information regarding your volume groups, logical volumes, and file systems to be included in your mksysb image. To recreate the volume groups, logical volumes and file systems: Run:
# tar -xvf /sysadm/vgdata.tar
Now edit /tmp/vgdata/{volume group name}/{volume group name}.data file and look for the line with "VG_SOURCE_DISK_LIST=". Change the line to have the hdisks, vpaths or hdiskpowers as needed.
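That edit can also be done non-interactively with sed; a sketch using a hypothetical volume group named datavg and hypothetical target disks (a demo .data file is created here so the snippet is self-contained):

```shell
#!/bin/sh
# Rewrite the VG_SOURCE_DISK_LIST line in a vgdata .data file.
# The volume group name (datavg) and disk names are placeholders.
mkdir -p /tmp/vgdata/datavg
f=/tmp/vgdata/datavg/datavg.data
printf 'VG_NAME=datavg\nVG_SOURCE_DISK_LIST= hdisk3 hdisk4\n' > "$f"   # demo file

sed 's/^VG_SOURCE_DISK_LIST=.*/VG_SOURCE_DISK_LIST= hdiskpower0 hdiskpower1/' \
    "$f" > "$f.new" && mv "$f.new" "$f"

grep '^VG_SOURCE_DISK_LIST' "$f"
```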
Run:
# restvg -r -d /tmp/vgdata/{volume group name}/{volume group name}.data
Make sure to remove file systems with the rmfs command before running restvg, or it will not run correctly. Or, you can just run it once, run the exportvg command for the same volume group, and run the restvg command again. There is also a "-s" flag for restvg that lets you shrink the file system to its minimum size needed, but depending on when the vgdata was created, you could run out of space, when restoring the contents of the file system. Just something to keep in mind. TOPICS: AIX, SYSTEM ADMINISTRATION
What if you want to get the 7th line of a text file. For example, you could get the 7th line of the /etc/hosts file, by using the head and tail commands, like this:
# head -7 /etc/hosts | tail -1
# Licensed Materials - Property of IBM
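The same head/tail combination generalizes into a small helper function, sketched here with a temporary demo file:

```shell
#!/bin/sh
# Print line N of a file by combining head and tail,
# as in the /etc/hosts example above.
nthline() {
    head -n "$1" "$2" | tail -1
}

printf 'one\ntwo\nthree\n' > /tmp/nthline.demo
nthline 2 /tmp/nthline.demo    # prints "two"
rm /tmp/nthline.demo
```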
Next, log on through ssh to your HMC, and see what managed systems are out there:
> lssyscfg -r sys -F name
Server1-8233-E8B-SN066001R
Server2-8233-E8B-SN066002R
Server3-8233-E8B-SN066003R
It seems there are 3 managed systems in the example above. Now list the status of the LPARs on the source system, assuming you want to migrate from Server1-8233-E8B-SN066001R, moving an LPAR to Server2-8233-E8B-SN066002R:
> lslparmigr -r lpar -m Server1-8233-E8B-SN066001R
name=vios1,lpar_id=3,migration_state=Not Migrating
name=vios2,lpar_id=2,migration_state=Not Migrating
name=lpar1,lpar_id=1,migration_state=Not Migrating
The example above shows there are 2 VIO servers and 1 LPAR on server Server1-8233-E8B-SN066001R. Validate if it is possible to move lpar1 to Server2-8233-E8B-SN066002R:
> migrlpar -o v -t Server2-8233-E8B-SN066002R -m Server1-8233-E8B-SN066001R --id 1
> echo $?
0
The example above shows a validation (-o v) to the target server (-t) from the source server (-m) for the LPAR with ID 1, which we know from the lslparmigr command is our LPAR lpar1. If the command returns a zero, the validation has completed successfully. Now perform the actual migration:
> migrlpar -o m -t Server2-8233-E8B-SN066002R -m Server1-8233-E8B-SN066001R -p lpar1 &
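The validate and migrate steps shown above can be combined in a loop to handle several LPARs. A dry-run sketch that only prints the command pairs (the LPAR names are hypothetical; remove the echo to actually execute them on the HMC):

```shell
#!/bin/sh
# Print the validate + migrate command pairs for a list of LPARs.
# Server names are from the example above; LPAR names are placeholders.
SRC="Server1-8233-E8B-SN066001R"
DST="Server2-8233-E8B-SN066002R"

for lpar in lpar1 lpar2 ; do
    echo "migrlpar -o v -t $DST -m $SRC -p $lpar"
    echo "migrlpar -o m -t $DST -m $SRC -p $lpar"
done
```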
The migration will take a couple of minutes, possibly longer, depending on the amount of memory of the LPAR. To check the state:
> lssyscfg -r lpar -m Server1-8233-E8B-SN066001R -F name,state
Or to see the reference codes (which you can also see on the HMC gui):
> lsrefcode -r lpar -m Server2-8233-E8B-SN066002R
lpar_name=lpar1,lpar_id=1,time_stamp=06/26/2012 15:21:24,refcode=C20025FF,word2=00000000
lpar_name=vios1,lpar_id=2,time_stamp=06/26/2012 15:21:47,refcode=,word2=03400000,fru_call_out_loc_codes=
lpar_name=vios2,lpar_id=3,time_stamp=06/26/2012 15:21:33,refcode=,word2=03D00000,fru_call_out_loc_codes=
After a few minutes the lslparmigr command will indicate that the migration has been completed. And now that you know the commands, it's fairly easy to script the migration of multiple LPARs. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION, VIRTUALIZATION
The change will then be applied once the system is rebooted. However, it is possible to change the default value of the hcheck_interval attribute in the PdAt ODM class. As a result, you won't have to worry about its value anymore and newly discovered hdisks will automatically get the new default value, as illustrated in the example below:
# odmget -q 'attribute = hcheck_interval AND uniquetype = \ PCM/friend/vscsi' PdAt | sed 's/deflt = \"0\"/deflt = \"60\"/' \ | odmchange -o PdAt -q 'attribute = hcheck_interval AND \ uniquetype = PCM/friend/vscsi'
If these 2 don't match, set the boot list to the correct disk, as indicated by the lslv command above. For example, to set it to hdisk1, run:
# bootlist -m normal hdisk1
And then, make sure you can run the bosboot commands:
# bosboot -ad /dev/hdisk1
# bosboot -ad /dev/ipldevice
Note: exchange hdisk1 in the example above with the disk that was indicated by the lslv command. If the bosboot on the ipldevice fails, you have 2 options: Recover the system from a mksysb image, or recreate hd5. First, create a copy of your ODM:
# mount /dev/hd4 /mnt
# mount /dev/hd2 /mnt/usr
# mkdir /mnt/etc/objrepos/bak
# cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak
# cp /etc/objrepos/Cu* /mnt/etc/objrepos
# umount /dev/hd2
# umount /dev/hd4
# exit
If things still won't boot at this time, the only option you have left is to recover the system from a mksysb image. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
After you plug in the USB drive, run cfgmgr to discover the drive, or if you don't want to run the whole cfgmgr, run:
# /etc/methods/cfgusb -l usb0
Some devices may not be recognized by AIX, and may require you to run the lquerypv command:
# lquerypv -h /dev/usbms0
Parent process ID
It's very easy to determine the parent process ID of a process. For example, for the current Korn shell process, you could determine its parent process by looking at the process list:
# ps -ef | grep ksh | grep -v grep
    root  8061040 17891578   0 22:28:32  pts/0  0:00 -ksh
In the example above you can see that the parent process of the korn shell process with PID 8061040 is 17891578. The same answer can be retrieved by simply looking at the PPID variable:
# echo $PPID
17891578
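A quick sanity check that the two methods agree (ps -o ppid= is used here; exact column output may differ slightly between platforms):

```shell
#!/bin/sh
# Compare the PPID reported by ps for the current shell ($$)
# with the shell's own $PPID variable.
from_ps=$(ps -o ppid= -p $$ | tr -d ' ')
if [ "$from_ps" = "$PPID" ] ; then
    echo "PPID matches: $PPID"
fi
```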
Now that it is installed, you can easily generate a PDF file. For example, if you wish to generate a PDF file from /etc/motd, run the following command:
# /opt/freeware/bin/enscript -B -p - /etc/motd \ | /usr/bin/gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \ -sOutputFile=/tmp/motd.pdf -f -
The PDF file is written to /tmp/motd.pdf, and can be viewed with any PDF viewer. TOPICS: AIX, SYSTEM ADMINISTRATION
This will create a printer queue called lpnull and it will write any print jobs to /dev/null. TOPICS: AIX, HARDWARE, STORAGE, SYSTEM ADMINISTRATION
creating a dummy disk device. Here are the steps involved: First: remove the newly discovered disk (in the example below known as hdisk1 - we will configure this disk as hdisk2):
# rmdev -dl hdisk1
Note that running the command above may result in an error. However, if you run the following command afterwards, you will notice that the dummy disk device indeed has been created:
# lsdev -Cc disk | grep hdisk1
hdisk1 Defined SSA Logical Disk Drive
Also note that the dummy disk device will not show up if you run the lspv command. That is no concern. Now run the cfgmgr command to discover the new disk. You'll notice that the new disk will now be discovered as hdisk2, because hdisk0 and hdisk1 are already in use.
# cfgmgr
# lsdev -Cc disk | grep hdisk2
Erasing disks
During a system decommission process, it is advisable to format or at least erase all drives. There are 2 ways of accomplishing that: If you have time: AIX allows disks to be erased via the Format media service aid in the AIX diagnostic package. To erase a hard disk, run the following command:
# diag -T format
This will start the Format media service aid in a menu driven interface. If prompted, choose your terminal. You will then be presented with a resource selection list. Choose the hdisk devices you want to erase from this list and commit your changes according to the instructions on the screen. Once you have committed your selection, choose Erase Disk from the menu. You will then be asked to confirm your selection. Choose Yes. You will be asked if you want to Read data from
drive or Write patterns to drive. Choose Write patterns to drive. You will then have the opportunity to modify the disk erasure options. After you specify the options you prefer, choose Commit Your Changes. The disk is now erased. Please note, that it can take a long time for this process to complete. If you want to do it quick-and-dirty: For each disk, use the dd command to overwrite the data on the disk. For example:
for disk in $(lspv | awk '{print $1}') ; do
    dd if=/dev/zero of=/dev/r${disk} bs=1024 count=10
    echo $disk wiped
done
This does the trick, as it reads zeroes from /dev/zero and outputs 10 times 1024 zeroes to each disk. That overwrites anything on the start of the disk, rendering the disk useless. TOPICS: AIX, SYSTEM ADMINISTRATION
To determine what the child devices are, use the -p option of the lsdev command. From the man page of the lsdev command:
-p Parent
    Specifies the device logical name from the Customized Devices object
    class for the parent of devices to be displayed. The -p Parent flag
    can be used to show the child devices of the given Parent. The Parent
    argument to the -p flag may contain the same wildcard characters that
    can be used with the odmget command. This flag cannot be used with
    the -P flag.
For example:
# lsdev -p fcs3
fcnet3  Defined  07-01-01  Fibre Channel Network Protocol Device
To remove the device, and all child devices, use the -R option. From the man page for the rmdev command:
-R Unconfigures the device and its children. When used with the -d or -S flags, the
The command to remove adapter fcs3 and all child devices, will be:
# rmdev -Rdl fcs3
mkpasswd
An interesting open source project is Expect. It's a tool that can be used to automate interactive applications. The RPM for Expect can be downloaded from http://www.perzl.org/aix/index.php?n=Main.Expect, and the home page for Expect is http://www.nist.gov/el/msid/expect.cfm. A very interesting tool that is part of the Expect RPM is "mkpasswd". It is a little Tcl script that uses Expect to work with the passwd program to generate a random password and set it immediately. A somewhat adjusted version of "mkpasswd" can be downloaded here. The adjusted version of mkpasswd will generate a random password for a user, with a length of 8 characters (the default maximum password length for AIX), if you run for example:
# /usr/local/bin/mkpasswd username
sXRk1wd3
To see the interactive work performed by Expect for mkpasswd, use the -v option:
# /usr/local/bin/mkpasswd -v username
spawn /bin/passwd username
Changing password for "username"
username's New password:
Enter the new password again:
password for username is s8qh1qWZ
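If Expect is not available, the password generation step (though not the Expect-driven passwd interaction) can be approximated in plain shell. This sketch assumes /dev/urandom exists and uses only letters and digits:

```shell
#!/bin/sh
# Generate an 8-character random password from letters and digits.
# This only illustrates the generation step; mkpasswd additionally
# sets the password via the passwd program, which this sketch does not.
pw=$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | dd bs=1 count=8 2>/dev/null)
echo "$pw"
```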
By using mkpasswd, you'll never have to come up with a random password yourself again, and it will prevent Unix system admins from assigning new passwords to accounts that are easily guessable, such as "changeme" or "abc1234". Now, what if you would want to let "other" (non-root) users run this utility, and at the same time prevent them from resetting the password of user root? Let's say you want user pete to be able to reset other users' passwords. Add the following entries to the /etc/sudoers file by running visudo:
# visudo
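The entries themselves are not shown above; judging from the "sudo -l" output further below, they would look something like this (a reconstruction, verify against the sudoers documentation before using):

```
pete ALL = (ALL) NOPASSWD: /usr/local/bin/mkpasswd, !/usr/local/bin/mkpasswd root
```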
This will allow pete to run the /usr/local/bin/mkpasswd utility, which he can use to reset passwords. First, to check what he can run, use the "sudo -l" command:
# su - pete
$ sudo -l
User pete may run the following commands on this host:
    (ALL) NOPASSWD: /usr/local/bin/mkpasswd, !/usr/local/bin/mkpasswd root
Then, an attempt, using pete's account, to reset another user's password (which is successful):
$ sudo /usr/local/bin/mkpasswd mark
oe09'ySMj
When you copy the /etc/passwd and /etc/group files, make sure they contain at least a minimum set of essential user and group definitions. Listed specifically as users are the following: root, daemon, bin, sys, adm, uucp, guest, nobody, lpd
Listed specifically as groups are the following: system, staff, bin, sys, adm, uucp, mail, security, cron, printq, audit, ecs, nobody, usr If the bos.compat.links fileset is installed, you can copy the /etc/security/mkuser.defaults file over. If it is not installed, the file is located as mkuser.default in the /usr/lib/security directory. If you copy over mkuser.defaults, changes must be made to the stanzas. Replace group with pgrp, and program with shell. A proper stanza should look like the following:
user:
        pgrp = staff
        groups = staff
        shell = /usr/bin/ksh
        home = /home/$USER
The following files may also be copied over, as long as the AIX version on the new machine is the same: /etc/security/login.cfg /etc/security/user NOTE: If you decide to copy these two files, open the /etc/security/user file and make sure that variables such as tty, registry, auth1 and so forth are set properly for the new machine. Otherwise, do not copy these two files, and just add all the user stanzas to the newly created files on the new machine. Once the files are moved over, execute the following:
# usrck -t ALL
# pwdck -t ALL
# grpck -t ALL
This will clear up any discrepancies (such as uucp not having an entry in /etc/security/passwd). Ideally this should be run on the source system before copying over the files as well as after porting these files to the new system. NOTE: It is possible to find user ID conflicts when migrating users from older versions of AIX to newer versions. AIX has added new user IDs in different release cycles. These are reserved IDs and should not be deleted. If your old user IDs conflict with the newer AIX system user IDs, it is advised that you assign new user IDs to these older IDs. From: http://www-01.ibm.com/support/docview.wss?uid=isg3T1000231 TOPICS: AIX, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
The PCI bus device driver is saying that it can't satisfy the request right now. There was simply too much IO at that moment, and the adapter couldn't handle them all. When the FC adapter is configured, we tell the PCI bus driver how much resource to set aside for us, and it may have gone over the limit. It is therefore recommended to increase the max_xfer_size on the fibre channel devices. It depends on the type of fibre channel adapter, but usually the possible sizes are: 0x100000, 0x200000, 0x400000, 0x800000, 0x1000000 To view the current setting type the following command:
# lsattr -El fcsX -a max_xfer_size
Replace the X with the fibre channel adapter number. You should get an output similar to the following:
max_xfer_size 0x100000 Maximum Transfer Size True
The value can be changed as follows, after which the server needs to be rebooted:
# chdev -l fcsX -a max_xfer_size=0x1000000 -P
VLAN to be set up: PVID 4. This number is basically randomly chosen; it could have been 23 or 67 or whatever, as long as it is not yet in use. Proper documentation of your VIO setup and the defined networks is therefore important. Steps to set this up:
Log in to the HMC GUI as hscroot.
Change the default profile of server1, and add a new virtual Ethernet adapter. Set the port virtual Ethernet to 4 (PVID 4). Select "This adapter is required for virtual server activation": Configuration -> Manage Profiles -> Select "Default" -> Actions -> Edit -> Select "Virtual Adapters" tab -> Actions -> Create Virtual Adapter -> Ethernet adapter -> Set "Port Virtual Ethernet" to 4 -> Select "This adapter is required for virtual server activation." -> Click Ok -> Click Ok -> Click Close.
Do the same for server2.
Now do the same for both VIO clients, but this time use "Dynamic Logical Partitioning". This way, we don't have to restart the nodes (as we previously only updated the default profiles of both servers), and still get the virtual adapter.
Run cfgmgr on both nodes, and see that you now have an extra Ethernet adapter, in my case ent1.
Run "lscfg -vl ent1", and note the adapter ID (in my case C5) on both nodes. This should match the adapter IDs as seen on the HMC.
Now configure the IP address on this interface on both nodes.
Add the entries for server1priv and server2priv in /etc/hosts on both nodes.
Run a ping: ping server2priv (from server1) and vice versa. Done!
Steps to throw it away: On each node, deconfigure the en1 interface:
# ifconfig en1 detach
Remove the virtual adapter with ID 5 from the default profile in the HMC GUI for server1 and server2. DLPAR the adapter with ID 5 out of server1 and server2. Run cfgmgr on both nodes to confirm the adapter does not re-appear. Check with:
# lsdev -Cc adapter
There are a number of possible causes: clinfoES or snmpd subsystems are not active.
snmp is unresponsive. snmp is not configured correctly. Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information. Additional information for verifying the SNMP configuration on AIX 6 can be found in /usr/es/sbin/cluster/README5.5.0.UPDATE
To resolve this, first of all, go ahead and read the README that is referred to. You'll find that you have to enable an entry in /etc/snmpdv3.conf:
Commands clstat or cldump will not start if the internet MIB tree is not enabled in snmpdv3.conf file. This behavior is usually seen in AIX 6.1 onwards where this internet MIB entry was intentionally disabled as a security issue. This internet MIB entry is required to view/resolve risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub tree which is used by clstat or cldump functionality.
There are two ways to enable this MIB sub tree (risc6000clsmuxpd):
1) Enable the main internet MIB entry by adding this line in /etc/snmpdv3.conf file
2) Enable only the MIB sub tree for risc6000clsmuxpd without enabling the main MIB tree by adding this line in /etc/snmpdv3.conf file
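The configuration lines themselves are not quoted above. On AIX these are typically VACM_VIEW entries in /etc/snmpdv3.conf; the following is a sketch based on the standard snmpdv3.conf syntax, so verify it against the README on your own system:

```
# 1) Enable the entire internet MIB tree:
VACM_VIEW defaultView        internet                   - included -

# 2) Or enable only the risc6000clsmuxpd sub tree:
VACM_VIEW defaultView        1.3.6.1.4.1.2.3.1.2.1.5    - included -
```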
Note: After enabling the MIB entry above, the snmp daemon must be restarted with stopsrc and startsrc, as shown below:
After snmp is restarted leave the daemon running for about two minutes before attempting to start clstat or cldump.
Sometimes, even after doing this, clstat or cldump still don't work. The next thing may sound silly, but edit the /etc/snmpdv3.conf file, and take out the comments. Change this:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password # gated
To:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
Then, recycle the daemons on all cluster nodes. This can be done while the cluster is up and running:
# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s aixmibd
# stopsrc -s snmpd
# sleep 4
# chssys -s hostmibd -a "-c public"
# chssys -s aixmibd -a "-c public"
# chssys -s snmpmibd -a "-c public"
# sleep 4
# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s snmpmibd
# startsrc -s hostmibd
# sleep 120
# stopsrc -s clinfoES
# startsrc -s clinfoES
# sleep 120
Now, to verify that it works, run either clstat or cldump, or the following command:
# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
Still not working at this point? Then run an Extended Verification and Synchronization:
# smitty cm_ver_and_sync.select
After that, clstat, cldump and snmpinfo should work. TOPICS: AIX, SYSTEM ADMINISTRATION
Note: The interval is in seconds, 1800 for 30 minutes. This output does not give the actual file names to which the handles are open. It provides only the name of the file system (directory) in which they are contained. The lsof command indicates if the open file is associated with an open socket or a file. When it references a file, it identifies the file system and the inode, not the file name.
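One way to map a file system plus inode number back to a file name is to search that file system with "find -inum". A quick, self-contained sketch against a scratch directory (on a real system you would point find at the mount point that lsof reported):

```shell
# Create a scratch directory with one file in it:
mkdir -p /tmp/inodedemo
echo "data" > /tmp/inodedemo/target.txt

# Look up the file's inode number, then search the directory tree for it:
ino=$(ls -i /tmp/inodedemo/target.txt | awk '{print $1}')
find /tmp/inodedemo -inum "$ino"
```

This prints the full path of the file owning that inode, in this case /tmp/inodedemo/target.txt.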
Searching the file system for that inode number (for example with "find -inum") will show the actual file name. To increase the maximum number of open files allowed for a user, change or add the nofiles=XXXXX parameter in the /etc/security/limits file, or run:
# chuser nofiles=XXXXX user_id
This lists open files in the format: filesystem_device:inode. Use the same procedure as above to find the actual file name. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
The difference between the two connections is that dsh uses the FQDN, and the FQDN needs to be added to the known_hosts file for SSH. Therefore, you must first make an ssh connection to the host using the FQDN:
# ssh server.domain.com date
The authenticity of host server.domain.com can't be established.
RSA key fingerprint is 1b:b1:89:c0:63:d5:f1:f1:41:fa:38:14:d8:60:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added server.domain.com (RSA) to the list of known hosts.
Tue Sep 6 11:56:34 EDT 2011
Now try to use dsh again, and you'll see it will work:
# dsh -n server date
server.domain.com: Tue Sep 6 11:56:38 EDT 2011
Sometimes, you just need that one single file from a mksysb image backup. It's really not that difficult to accomplish this. First of all, go to the directory that contains the mksysb image file:
# cd /sysadm/iosbackup
In this example, we're using the mksysb image of a Virtual I/O Server, created using iosbackup. This is basically the same as a mksysb image from a regular AIX system. The image file for this mksysb backup is called vio1.mksysb. First, try to locate the file you're looking for; for example, if you're looking for file nimbck.ksh:
# restore -T -q -l -f vio1.mksysb | grep nimbck.ksh
New volume on vio1.mksysb:
Cluster size is 51200 bytes (100 blocks).
The volume number is 1.
The backup date is: Thu Jun  9 23:00:28 MST 2011
Files are backed up by name.
The user is padmin.
-rwxr-xr-x-  10 staff  1801 May 23 08:37 ./home/padmin/nimbck.ksh
Here you can see the original file was located in /home/padmin. Now recover that one single file:
# restore -x -q -f vio1.mksysb ./home/padmin/nimbck.ksh x ./home/padmin/nimbck.ksh
Note that it is important to add the dot before the file name that needs to be recovered; otherwise it won't work. Your file is now restored to ./home/padmin/nimbck.ksh, which is a path relative to the current directory you're in right now:
# cd ./home/padmin
# ls -als nimbck.ksh
4 -rwxr-xr-x 1 10 staff 1801 May 23 08:37 nimbck.ksh
This will create a copy of logical volume "lvname" to a file "lvname.dd" in file system /file/system. Make sure that wherever you write your output file to (in the example above to /file/system) has enough disk space available to hold a full copy of the logical volume. If the
logical volume is 100 GB, you'll need 100 GB of file system space for the copy. If you want to test how this works, you can create a logical volume with a file system on top of it, and create some files in that file system. Then unmount the file system, and use dd to copy the logical volume as described above. Then, throw away the file system using "rmfs -r", and after that has completed, recreate the logical volume and the file system. If you now mount the file system, you will see that it is empty. Unmount the file system, and use the following dd command to restore your backup copy:
# dd if=/file/system/lvname.dd of=/dev/lvname
Then, mount the file system again, and you will see that the contents of the file system (the files you've placed in it) are back. TOPICS: AIX, HARDWARE, SYSTEM ADMINISTRATION
Keep in mind that activating the LED of a particular device does not activate the LED of the system panel. You can achieve that if you omit the device parameter. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
Lsmksysb
There's a simple command to list information about a mksysb image, called lsmksysb:
# lsmksysb -lf mksysb.image
VOLUME GROUP:         rootvg
BACKUP DATE/TIME:     Mon Jun 6 04:00:06 MST 2011
UNAME INFO:           AIX testaix1 1 6 0008CB1A4C00
BACKUP OSLEVEL:       6.1.6.0
MAINTENANCE LEVEL:    6100-06
BACKUP SIZE (MB):     49920
SHRINK SIZE (MB):     17377
VG DATA ONLY:         no
rootvg:
LV NAME   TYPE      LPs   PPs   PVs   LV STATE       MOUNT POINT
hd5       boot      1     2     2     closed/syncd   N/A
hd6       paging    32    64    2     open/syncd     N/A
hd8       jfs2log   1     2     2     open/syncd     N/A
hd4       jfs2      8     16    2     open/syncd     /
hd2       jfs2      40    80    2     open/syncd     /usr
hd9var    jfs2      40    80    2     open/syncd     /var
The scalable VG implementation in AIX 5L Version 5.3 provides configuration flexibility with respect to the number of PVs and LVs that can be accommodated by a given instance of the new VG type. The configuration options allow any scalable VG to contain 32, 64, 128, 256, 512, 768, or 1024 disks and 256, 512, 1024, 2048, or 4096 LVs. You do not need to configure the maximum values of 1024 PVs and 4096 LVs at the time of VG creation to account for potential future growth. You can always increase the initial settings at a later date as
required. The System Management Interface Tool (SMIT) and the Web-based System Manager graphical user interface fully support the scalable VG. Existing SMIT panels, which are related to VG management tasks, have been changed and many new panels added to account for the scalable VG type. For example, you can use the new SMIT fast path _mksvg to directly access the Add a Scalable VG SMIT menu. The user commands mkvg, chvg, and lsvg have been enhanced in support of the scalable VG type. For more information: http://www.ibm.com/developerworks/aix/library/au-aix5l-lvm.html. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
To resolve this: clear the boot logical volumes from the disks:
# chpv -c hdisk2
# chpv -c hdisk3
Verify that the disks can no longer be used to boot from by running:
# ipl_varyon -i
This will set the correct boot logical volume, but the error will show up if you ever run the bootlist command again without the blv attribute. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
volume group. Especially with very large volume groups, this can be a problem. The solution, however, is easy: run the mirrorvg command with the -s option, to prevent it from running the sync. Then, when mirrorvg has completed, run the syncvg yourself with the -P option. For example, if you wish to mirror the rootvg from hdisk0 to hdisk1:
# mirrorvg -s rootvg hdisk1
Of course, make sure the new disk is included in the boot list for the rootvg:
# bootlist -m normal hdisk0 hdisk1
Now rootvg is mirrored, but not yet synced. Run "lsvg -l rootvg", and you'll see that the logical volumes are still marked stale. So run the syncvg command yourself. With the -P option you can specify the number of threads that should be started to perform the sync process. Usually, you can specify at least 2 to 3 times the number of cores in the system. Using the -P option has an extra feature: there will be no lock on the volume group, allowing you to run "lsvg rootvg" within another window, to check the status of the sync process.
# syncvg -P 4 -v rootvg
/dev/hd10opt -- 4194304 --
So file system /opt is located on logical volume hd10opt. Then run the getlvcb command:
# getlvcb -AT hd10opt
        AIX LVCB
        intrapolicy = c
        copies = 2
        interpolicy = m
        lvid = 00f69a1100004c000000012f9dca819a.9
        lvname = hd10opt
        label = /opt
        machine id = 69A114C00
        number lps = 8
        relocatable = y
        strict = y
        stripe width = 0
        stripe size in exponent = 0
        type = jfs2
        upperbound = 32
        fs = vfs=jfs2:log=/dev/hd8:vol=/opt:free=false:quota=no
        time created = Thu Apr 28 20:26:36 2011
You can clearly see the "time created" for this file system in the example above. TOPICS: SYSTEM ADMINISTRATION, VIRTUAL I/O SERVER
root@node2 /root # lspv | grep vpath | grep -i none
vpath4          00f69a11a2f620c5        None
vpath5          00f69a11a2f622c8        None
vpath6          00f69a11a2f624a7        None
vpath9          00f69a11a2f62f1f        None
vpath10         00f69a11a2f63212        None
As you can see, vpath6 on node 1 is the same disk as vpath4 on node 2. You can determine this by looking at the PVID. Check the major and minor numbers of each device:
root@node1 # cd /dev
root@node1 # lspv|grep vpath|grep None|awk '{print $1}'|xargs ls -als
0 brw-------   1 root     system       47,  6 Apr 28 18:56 vpath6
0 brw-------   1 root     system       47,  7 Apr 28 18:56 vpath7
0 brw-------   1 root     system       47,  8 Apr 28 18:56 vpath8
root@node2 # cd /dev
root@node2 # lspv|grep vpath|grep None|awk '{print $1}'|xargs ls -als
0 brw-------   1 root     system       47,  4 Apr 29 13:33 vpath4
0 brw-------   1 root     system       47,  5 Apr 29 13:33 vpath5
0 brw-------   1 root     system       47,  6 Apr 29 13:33 vpath6
0 brw-------   1 root     system       47,  9 Apr 29 13:33 vpath9
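If there are many devices, the mknod commands for the next step can be generated from the ls output. A small sketch using two sample lines (the device names and numbers mirror the node1 listing above; the awk one-liner is an illustration, not part of the original procedure):

```shell
# Sample "ls -als" lines, mimicking the node1 listing above:
cat > /tmp/vpaths.txt <<'EOF'
0 brw-------   1 root system 47,  6 Apr 28 18:56 vpath6
0 brw-------   1 root system 47,  7 Apr 28 18:56 vpath7
EOF

# Generate one mknod command per device, naming them ocr_disk01, ocr_disk02, ...
# ($6 is the major number with a trailing comma, $7 is the minor number):
awk '{ gsub(",", "", $6)
       printf "mknod /dev/ocr_disk%02d c %s %s\n", NR, $6, $7 }' /tmp/vpaths.txt
```

This prints "mknod /dev/ocr_disk01 c 47 6" and "mknod /dev/ocr_disk02 c 47 7", which you can review and then execute.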
Now, on each node, set up a consistent naming convention for the OCR and VOTE devices. For example, if you wish to set up 2 OCR and 3 VOTE devices: On server node1:
# mknod /dev/ocr_disk01 c 47 6
# mknod /dev/ocr_disk02 c 47 7
# mknod /dev/voting_disk01 c 47 8
# mknod /dev/voting_disk02 c 47 13
# mknod /dev/voting_disk03 c 47 14
On server node2:
# mknod /dev/ocr_disk01 c 47 4
# mknod /dev/ocr_disk02 c 47 5
This will result in a consistent naming convention for the OCR and VOTE devices on both nodes:
root@node1 # ls -als /dev/*_disk*
0 crw-r--r--  1 root system 47,  6 May 13 07:18 /dev/ocr_disk01
0 crw-r--r--  1 root system 47,  7 May 13 07:19 /dev/ocr_disk02
0 crw-r--r--  1 root system 47,  8 May 13 07:19 /dev/voting_disk01
root@node2 # ls -als /dev/*_disk*
0 crw-r--r--  1 root system 47,  4 May 13 07:20 /dev/ocr_disk01
0 crw-r--r--  1 root system 47,  5 May 13 07:20 /dev/ocr_disk02
0 crw-r--r--  1 root system 47,  6 May 13 07:21 /dev/voting_disk01
0 crw-r--r--  1 root system 47,  9 May 13 07:21 /dev/voting_disk02
This may be caused by the volume group being varied on at the other node. If it should not be varied on there, run:
# varyoffvg vg
And then retry the LVM command. If it continues to be a problem, then stop HACMP on both nodes, export the volume group and re-import the volume group on both nodes, and then restart the cluster. TOPICS: INSTALLATION, SYSTEM ADMINISTRATION
Then this may be caused by the fact that your system is still in MDC, or manufacturing default configuration mode. It can easily be resolved: Power down your frame. Power it back up to standby status. Then, when activating the default LPAR, choose "exit the MDC". TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
If the logical volume was created with, or has been modified to use, customized owner/group/mode values, the dev_uid/dev_gid/dev_perm fields will show the current uid/gid/perm values, for example:
# chlv -U user -G staff -P 777 testlv
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 14:39 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 14:39 /dev/testlv
# readvgda vpath51 | egrep "lvname|dev_|Logical"
lvname:    testlv (i=2)
dev_uid:   3878
dev_gid:   1
dev_perm:  511
When the volume group is exported, and re-imported, this information is lost:
# errpt
# exportvg testvg
# importvg -y testvg vpath51
testvg
# ls -als /dev/*testlv
0 crw-rw---- 1 root system 57, 3 Mar 10 15:11 /dev/rtestlv
0 brw-rw---- 1 root system 57, 3 Mar 10 15:11 /dev/testlv
To prevent this from happening, make sure to use the -R option, which will restore any customized settings:
# chlv -U user -G staff -P 777 testlv
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 15:11 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 15:11 /dev/testlv
# readvgda vpath51 | egrep "lvname|dev_|Logical"
# varyoffvg testvg
# exportvg testvg
# importvg -Ry testvg vpath51
testvg
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/testlv
Never use the chown/chmod/chgrp commands to change the same settings on the logical volume. It will work; however, the updates will not be written to the VGDA, and as soon as the volume group is exported and re-imported on the system, the updates will be gone:
# chlv -U root -G system -P 660 testlv
# ls -als /dev/*testlv
0 crw-rw---- 1 root system 57, 3 Mar 10 15:14 /dev/rtestlv
0 brw-rw---- 1 root system 57, 3 Mar 10 15:14 /dev/testlv
# chown user.staff /dev/testlv /dev/rtestlv
# chmod 777 /dev/testlv /dev/rtestlv
# ls -als /dev/*testlv
0 crwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/rtestlv
0 brwxrwxrwx 1 user staff 57, 3 Mar 10 15:14 /dev/testlv
# readvgda vpath51 | egrep "lvname|dev_|Logical"
lvname:    testlv (i=2)
dev_uid:   0
dev_gid:   0
dev_perm:  360
Notice above how the chlv command changed the owner to root, the group to system, and the permissions to 660. Even after the chown and chmod commands are run, and the changes are visible on the device files in /dev, the changes are not seen in the VGDA. This is confirmed when the volume group is exported and imported, even when using the -R option:
# varyoffvg testvg
# exportvg testvg
# importvg -Ry testvg vpath51
testvg
# ls -als /dev/*testlv
0 crw-rw---- 1 root system 57, 3 Mar 10 15:23 /dev/rtestlv
0 brw-rw---- 1 root system 57, 3 Mar 10 15:23 /dev/testlv
So, when you have customized user/group/mode settings for logical volumes, and you need to export and import the volume group, always make sure to use the -R option when running
importvg. Also, make sure never to use the chmod/chown/chgrp commands on logical volume block and character devices in /dev, but use the chlv command instead, to make sure the VGDA is updated accordingly. Note: A regular volume group does not store any customized owner/group/mode in the VGDA. It is only stored for Big or Scalable volume groups. In case you're using a regular volume group with customized owner/group/mode settings for logical volumes, you will have to use the chmod/chown/chgrp commands to update it, especially after exporting and reimporting the volume group. TOPICS: AIX, SYSTEM ADMINISTRATION
## Colors:
BLACK="\033[0;30m"
GRAY="\033[1;30m"
RED="\033[0;31m"
LRED="\033[1;31m"
GREEN="\033[0;32m"
LGREEN="\033[1;32m"
YELLOW="\033[0;33m"
LYELLOW="\033[1;33m"
BLUE="\033[0;34m"
LBLUE="\033[1;34m"
PURPLE="\033[0;35m"
PINK="\033[1;35m"
CYAN="\033[0;36m"
LCYAN="\033[1;36m"
LGRAY="\033[0;37m"
WHITE="\033[1;37m"
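The examples below also use UNDERLINE, NORM and CUR_LEFT, which are not defined in the snippet above. Plausible definitions, assumed here from the standard ANSI escape sequences, are:

```shell
# Assumed definitions (not in the original snippet): standard ANSI
# escape sequences for attributes and cursor movement.
NORM="\033[0m"         # reset all attributes back to normal
BOLD="\033[1m"         # bold on
UNDERLINE="\033[4m"    # underline on
CUR_LEFT="\033[1D"     # move the cursor one column to the left
```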
Just copy everything above and paste it into your shell or in a script. Then you can use the defined variables:
## Example - Red underlined echo "${RED}${UNDERLINE}This is a test!${NORM}"
## Create a rotating thingy
while true ; do
   printf "${CUR_LEFT}/"
   perl -e "use Time::HiRes qw(usleep); usleep(100000)"
   printf "${CUR_LEFT}-"
   perl -e "use Time::HiRes qw(usleep); usleep(100000)"
   printf "${CUR_LEFT}\\"
   perl -e "use Time::HiRes qw(usleep); usleep(100000)"
   printf "${CUR_LEFT}|"
   perl -e "use Time::HiRes qw(usleep); usleep(100000)"
done
Note that the perl command used above will cause a sleep of 0.1 seconds. Perl is used here, because the sleep command can't be used to sleep less than 1 second. TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
Compare_report
The compare_report command is a very useful utility to compare the software installed on two systems, for example for making sure the same software is installed on two nodes of a PowerHA cluster. First, create the necessary reports:
# ssh node2 "lslpp -Lc" > /tmp/node2
# lslpp -Lc > /tmp/node1
Next, generate the report. There are four interesting options: -l, -h, -m and -n:
-l   Generates a report of base system installed software that is at a lower level.
-h   Generates a report of base system installed software that is at a higher level.
-m   Generates a report of filesets not installed on the other system.
-n   Generates a report of filesets not installed on the base system.
For example:
# compare_report -b /tmp/node1 -o /tmp/node2 -l
#(baselower.rpt)
#Base System Installed Software that is at a lower level
#Fileset_Name:Base_Level:Other_Level
bos.msg.en_US.net.ipsec:6.1.3.0:6.1.4.0
bos.msg.en_US.net.tcp.client:6.1.1.1:6.1.4.0
bos.msg.en_US.rte:6.1.3.0:6.1.4.0
bos.msg.en_US.txt.tfs:6.1.1.0:6.1.4.0
xlsmp.msg.en_US.rte:1.8.0.1:1.8.0.3

# compare_report -b /tmp/node1 -o /tmp/node2 -h
#(basehigher.rpt)
#Base System Installed Software that is at a higher level
#Fileset_Name:Base_Level:Other_Level
idsldap.clt64bit62.rte:6.2.0.5:6.2.0.4
idsldap.clt_max_crypto64bit62.rte:6.2.0.5:6.2.0.4
idsldap.cltbase62.adt:6.2.0.5:6.2.0.4
idsldap.cltbase62.rte:6.2.0.5:6.2.0.4
idsldap.cltjava62.rte:6.2.0.5:6.2.0.4
idsldap.msg62.en_US:6.2.0.5:6.2.0.4
idsldap.srv64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srv_max_cryptobase64bit62.rte:6.2.0.5:6.2.0.4

# compare_report -b /tmp/node1 -o /tmp/node2 -m
#(baseonly.rpt)
#Filesets not installed on the Other System
#Fileset_Name:Base_Level
Java6.sdk:6.0.0.75
Java6.source:6.0.0.75
Java6_64.samples.demo:6.0.0.75
Java6_64.samples.jnlp:6.0.0.75
Java6_64.source:6.0.0.75
WSBAA70:7.0.0.0
WSIHS70:7.0.0.0

# compare_report -b /tmp/node1 -o /tmp/node2 -n
#(otheronly.rpt)
#Filesets not installed on the Base System
#Fileset_Name:Other_Level
xlC.sup.aix50.rte:9.0.0.1
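If compare_report is not available, a rough equivalent of the -m report can be produced with comm on sorted fileset lists. A self-contained sketch (the file contents below are fabricated for illustration):

```shell
# Two fabricated, pre-sorted fileset lists (Fileset_Name:Level), one per node:
cat > /tmp/node1.flist <<'EOF'
bos.rte:6.1.4.0
openssh.base.client:5.0.0.5302
EOF
cat > /tmp/node2.flist <<'EOF'
bos.rte:6.1.4.0
EOF

# Lines present on node1 but not on node2 (comm requires sorted input):
comm -23 /tmp/node1.flist /tmp/node2.flist
```

In this fabricated case, only openssh.base.client:5.0.0.5302 is reported as missing from node2.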
FIRMWARE_EVENT
If FIRMWARE_EVENT entries appear in the AIX error log without FRU or location code callout, these events are likely attributed to an AIX memory page deconfiguration event, which is the result of a single memory cell being marked as unusable by the system firmware. The actual error is and will continue to be handled by ECC; however, notification of the unusable bit is also passed up to AIX. AIX in turn migrates the data and deallocates the memory page associated with this event from its memory map. This process is an AIX RAS feature which became available in AIX 5.3 and provides extra memory resilience and is no cause for alarm. Since the failure represents a single bit, a hardware action is NOT warranted. To suppress logging, the following command will have to be entered and the partition will have to be rebooted to make the change effective:
# chdev -l sys0 -a log_pg_dealloc=false
More information about this function can be found in the "Highly Available POWER Servers for Business-Critical Applications" document, which is available at the following link: ftp://ftp.software.ibm.com/common/ssi/rep_wh/n/POW03003USEN/POW03003USEN.PDF (see pages 17-22 specifically). TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
Using iptrace
The iptrace command can be very useful to find out what network traffic flows to and from an AIX system. You can use any combination of these options, but you do not need to use them all:
-a                   Do NOT print out ARP packets.
-s [source IP]       Limit trace to source/client IP address, if known.
-d [destination IP]  Limit trace to destination IP, if known.
-b                   Capture bidirectional network traffic (send and receive packets).
-p [port]            Specify the port to be traced.
-i [interface]       Only trace for network traffic on a specific interface.
Example: Run iptrace on AIX interface en1 to capture port 80 traffic to file trace.out from a single client IP to a server IP:
# iptrace -a -i en1 -s clientip -b -d serverip -p 80 trace.out
This trace will capture both directions of the port 80 traffic on interface en1 between the clientip and serverip, and write it to the raw trace file trace.out. To stop the trace:
# ps -ef|grep iptrace
# kill
The ipreport command can be used to transform the trace file generated by iptrace to human readable format:
# ipreport trace.out > trace.report
on the system where /usr/sbin/updtvpkg is being run. It's just informational - nothing should be checking the level of AIX-rpm. AIX doesn't just automatically run /usr/sbin/updtvpkg every time that something gets installed or deinstalled because on some slower systems with lots of software installed, /usr/sbin/updtvpkg can take a LONG time. If you want to run the command manually:
# /usr/sbin/updtvpkg
If you get an error similar to "cannot read header at 20760 for lookup" when running updtvpkg, rebuild the RPM database:
# rpm --rebuilddb
Once you run updtvpkg, you can run rpm -qa to see your new AIX-rpm package. TOPICS: AIX, SYSTEM ADMINISTRATION
Now stop and start the SSH daemon again, and check whether ssh works:
# stopsrc -s sshd
# startsrc -s sshd
If this still doesn't allow users to use ssh and the same message is produced, or if devices /dev/random and/or /dev/urandom are missing:
# stopsrc -s sshd
# rm -rf /dev/random
# rm -rf /dev/urandom
# mknod /dev/random c 39 0
# mknod /dev/urandom c 39 1
# randomctl -l
# ls -ald /dev/random /dev/urandom
# startsrc -s sshd
Using lvmstat
One of the best tools to look at LVM usage is with lvmstat. It can report the bytes read and written to logical volumes. Using that information, you can determine which logical volumes are used the most. Gathering LVM statistics is not enabled by default:
# lvmstat -v data2vg
0516-1309 lvmstat: Statistics collection is not enabled for this logical device.
Use -e option to enable.
As you can see by the output here, it is not enabled, so you need to actually enable it for each volume group prior to running the tool using:
# lvmstat -v data2vg -e
The following command takes a snapshot of LVM information every second for 10 intervals:
# lvmstat -v data2vg 1 10
This view shows the most utilized logical volumes on your system since you started the data collection. This is very helpful when drilling down to the logical volume layer when tuning your systems.
# lvmstat -v data2vg
What are you looking at here?
iocnt:   Reports back the number of read and write requests.
Kb_read: Reports back the total data (kilobytes) from your measured interval that is read.
Kb_wrtn: Reports back the amount of data (kilobytes) from your measured interval that is written.
Kbps:    Reports back the amount of data transferred in kilobytes per second.
You can use the -d option for lvmstat to disable the collection of LVM statistics. TOPICS: AIX, BACKUP & RESTORE, LVM, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
impacting the performance of the application running on the server. If you suspect that this might be the case, first try to determine which disks are saturated on the server. Any disk that is in use more than 60% of the time should be considered. You can use commands such as iostat, sar -d, nmon and topas to determine which disks show high utilization. If they do, check which logical volumes are defined on that disk, for example on an IBM SAN disk:
# lspv -l vpath23
It is always a good idea to spread the logical volumes on a disk over multiple disks. That way, the logical volume manager will spread the disk I/O over all the disks that are part of the logical volume, utilizing the queue_depth of all disks, greatly improving performance where disk I/O is concerned. Let's say you have a logical volume called prodlv of 128 LPs, which is sitting on one disk, vpath408. To see the allocation of the LPs of logical volume prodlv, run:
# lslv -m prodlv
Let's also assume that you have a large number of disks in the volume group in which prodlv is configured. Disk I/O usually works best if you have a large number of disks in a volume group. For example, if you need to have 500 GB in a volume group, it is usually a far better idea to assign 10 disks of 50 GB to the volume group, instead of only one disk of 512 GB. That gives you the possibility of spreading the I/O over 10 disks instead of only one. To spread the disk I/O of prodlv over 8 disks instead of just one disk, you can create an extra logical volume copy on these 8 disks, and then later on, when the logical volume is synchronized, remove the original logical volume copy (the one on the single disk vpath408). So, divide 128 LPs by 8, which gives you 16 LPs. You can assign 16 LPs for logical volume prodlv on each of the 8 disks, giving it a total of 128 LPs. First, check if the upper bound of the logical volume is set to at least 9. Check this by running:
# lslv prodlv
The upper bound limit determines on how many disks a logical volume can be created. You'll need the 1 disk, vpath408, on which the logical volume is already located, plus the 8 other disks that you're creating a new copy on. Never ever create a copy on the same disk. If that single disk fails, both copies of your logical volume will fail as well. It is usually a good idea to set the upper bound of the logical volume a lot higher, for example to 32:
# chlv -u 32 prodlv
The next thing you need to determine is, that you actually have 8 disks with at least 16 free LPs in the volume group. You can do this by running:
# lsvg -p prodvg | sort -nk4 | grep -v vpath408 | tail -8
vpath188 active 959 40 00..00..00..00..40
Note how in the command above the original disk, vpath408, was excluded from the list. Any of the disks listed, using the command above, should have at least 1/8th of the size of the logical volume free, before you can make a logical volume copy on it for prodlv. Now create the logical volume copy. The magical option you need to use is "-e x" for the logical volume commands. That will spread the logical volume over all available disks. If you want to make sure that the logical volume is spread over only 8 available disks, and not all the available disks in a volume group, make sure you specify the 8 available disks:
# mklvcopy -e x prodlv 2 vpath188 vpath163 vpath208 \ vpath205 vpath194 vpath24 vpath304 vpath161
Now check again with "lslv -m prodlv" that the new copy is correctly created:
# lslv -m prodlv | awk '{print $5}' | grep vpath | sort -dfu | \
while read pv
do
  result=`lspv -l $pv | grep prodlv`
  echo "$pv $result"
done
vpath304 prodlv
Now, what if you have to extend the logical volume prodlv later on with another 128 LPs, and you still want to maintain the spreading of the LPs over the 8 disks? Again, you can use the "-e x" option when running the logical volume commands:
# extendlv -e x prodlv 128 vpath188 vpath163 vpath208 \
vpath205 vpath194 vpath24 vpath304 vpath161
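The same spreading can also be applied when creating a brand-new logical volume from scratch. A sketch (the logical volume name "newlv", the jfs2 type and the upper bound are illustrative values, not from the article):

```shell
# Create a new 128 LP logical volume, spread over the same 8 disks,
# using the maximum inter-disk allocation policy (-e x):
mklv -y newlv -t jfs2 -e x -u 32 prodvg 128 vpath188 vpath163 vpath208 \
  vpath205 vpath194 vpath24 vpath304 vpath161
```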
You can also use the "-e x" option with the mklv command to create a new logical volume from the start with the correct spreading over disks. TOPICS: AIX, PERFORMANCE, SYSTEM ADMINISTRATION
node=`hostname`
rm -f /tmp/cpu_all.tmp /tmp/zzzz.tmp /tmp/${node}_nmon_cpu.csv
for nmon_file in `ls /var/msgs/nmon/*nmon`
do
  datestamp=`echo ${nmon_file} | cut -f2 -d"_"`
  grep CPU_ALL, $nmon_file > /tmp/cpu_all.tmp
  grep ZZZZ $nmon_file > /tmp/zzzz.tmp
  grep -v "CPU Total " /tmp/cpu_all.tmp | sed "s/,/ /g" | \
  while read NAME TS USER SYS WAIT IDLE rest
  do
    timestamp=`grep ${TS} /tmp/zzzz.tmp | awk -F"," '{print $4" "$3}'`
    TOTAL=`echo "scale=1;${USER}+${SYS}" | bc`
    echo $timestamp,$USER,$SYS,$WAIT,$IDLE,$TOTAL >> \
      /tmp/${node}_nmon_cpu.csv
  done
  rm -f /tmp/cpu_all.tmp /tmp/zzzz.tmp
done
Note: the script assumes that you've stored the NMON output files in /var/msgs/nmon. Update the script to point to the folder you're using to store NMON files. TOPICS: AIX, SYSTEM ADMINISTRATION
Example:
# lsdev -Cc tape
rmt0 Available 3F-08-02 IBM 3580 Ultrium Tape Drive (FCP)
rmt1 Available 3F-08-02 IBM 3592 Tape Drive (FCP)
smc0 Available 3F-08-02 IBM 3576 Library Medium Changer (FCP)
In the list above: rmt1 is a standalone IBM 3592 tape drive; rmt0 is an LTO4 drive that is part of a tape library; smc0 is the medium changer (the robotic part) of that tape library. Now look at their major and minor numbers:
# ls -l /dev/rmt* /dev/smc*
crw-rw-rwT 1 root system 38,   0 Nov 13 17:40 /dev/rmt0
crw-rw-rwT 1 root system 38,   1 Nov 13 17:40 /dev/rmt0.1
crw-rw-rwT 1 root system 38, 128 Nov 13 17:40 /dev/rmt1
crw-rw-rwT 1 root system 38,  66 Nov 13 17:40 /dev/smc0
All of them use the IBM tape device driver (and so have the same major number, 38), but they are actually different entities (with minor numbers 0, 128 and 66 respectively). Also, compare rmt0 and rmt0.1: it's the same device, but with a different mode of operation. TOPICS: AIX, SYSTEM ADMINISTRATION
Or use lsattr:
# lsattr -El sys0 -a max_logname
max_logname 9 Maximum login name length at boot time True
To change the value, simply adjust the v_max_logname parameter (shown as max_logname in lsattr) using chdev to the maximum number of characters desired plus one to accommodate the terminating character. For example, if you want to have user names that are 128 characters long, you would adjust the v_max_logname parameter to 129:
# chdev -l sys0 -a max_logname=129
sys0 changed
Please note that this change will not go into effect until you have rebooted the operating system. Once the server has been rebooted, you can verify that the change has taken effect:
# getconf LOGIN_NAME_MAX
128
Keep in mind, however, that if your environment includes IBM RS/6000 servers running AIX versions prior to 5.3, or operating systems that cannot handle user names longer than eight characters, and you rely on NIS or other authentication measures, it would be wise to continue with eight-character user names. TOPICS: AIX, INSTALLATION, NIM, SYSTEM ADMINISTRATION
Nimadm
A very good article about migrating AIX from version 5.3 to 6.1 can be found on the following page of IBM developerWorks: http://www.ibm.com/developerworks/aix/library/au-migrate_nimadm/index.html?ca=drs For a smooth nimadm process, make sure that you clean up as much filesets of your server as possible (get rid of the things you no longer need). The more filesets that need to be migrated, the longer the process will take. Also make sure that openssl/openssh is up-to-date on the server to be migrated; this is likely to break when you have old versions installed. Very useful is also a gigabit Ethernet connection between the NIM server and the server to be upgraded, as the nimadm process copies over the client rootvg to the NIM server and back. The log file for a nimadm process can be found on the NIM server in /var/adm/ras/alt_mig. TOPICS: AIX, INSTALLATION, NIM, SYSTEM ADMINISTRATION
No output is shown. The fileset is not part of the SPOT. Check if the LPP Source has the file set:
# nim -o showres LPPaix61tl05sp03 | grep -i bos.alt
bos.alt_disk_install.boot_images   6.1.5.2   I   N usr
bos.alt_disk_install.rte           6.1.5.1   I   N usr,root
Install the first fileset (bos.alt_disk_install.boot_images) in the SPOT. The other fileset is a prerequisite of the first fileset and will be automatically installed as well.
# nim -o cust -a filesets=bos.alt_disk_install.boot_images -a lpp_source=LPPaix61tl05sp03 SPOTaix61tl05sp03
Note: Use the -F option to force a fileset into the SPOT, if needed (e.g. when the SPOT is in use for a client). Check if the SPOT now has the fileset installed:
# nim -o showres SPOTaix61tl05sp03 | grep -i bos.alt
bos.alt_disk_install.boot_images   6.1.5.2   C   F   Alternate Disk Installation
bos.alt_disk_install.rte           6.1.5.1   C   F   Alternate Disk Installation
In general, the file /etc/ftpusers lists accounts that are denied FTP access to a server. So, if this file exists, make sure the account is not listed in it. Here's an example of what you would set in ftpaccess.ctl if you wanted user ftp to be restricted to /home/ftp. The user will be able to change directory further down, but not outside this directory. Also, when user ftp logs in and runs pwd, it will show only "/" and not "/home/ftp".
# cat /etc/ftpaccess.ctl
useronly: ftp
If the user is required to write files to the server with specific access, for example, read and write access for user, group and others, then this can be accomplished by the user itself by running the FTP command:
ftp> site umask 111
200 UMASK set to 111 (was 027)
ftp> site umask
200 Current UMASK is 111
To further restrict the FTP account to a server, especially for accounts that are only used for FTP purposes, make sure to disable login and remote login for the account via smitty user. TOPICS: AIX, SYSTEM ADMINISTRATION
PS1
The following piece of code fits nicely in the /etc/profile file. It makes sure that PS1, the prompt, is set in such a way that you can see who is logged in, on what system, and what the current path is. At the same time, it also sets the window title the same way.
H=`uname -n`
if [ $(whoami) = "root" ] ; then
Note: to type the special characters, such as ^], you first type CTRL-V, and then CTRL-]. Likewise for ^G: type it as CTRL-V and then CTRL-G. Second note: the escape characters only work properly when setting the window title using PuTTY. If you or any of your users use Reflection to access the servers, the escape codes don't work. In that case, shorten it to:
if [ $(whoami) = "root" ] ; then
  PS1='${USER}@(${H}) ${PWD##/*/} # '
else
  PS1='${USER}@(${H}) ${PWD##/*/} $ '
fi
IP alias
To configure IP aliases on AIX: Use the ifconfig command to create an IP alias. To have the alias created when the system starts, add the ifconfig command to the /etc/rc.net script. The following example creates an alias on the en1 network interface. The alias must be defined on the same subnet as the network interface.
# ifconfig en1 alias 9.37.207.29 netmask 255.255.255.0 up
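To remove the alias again later, the delete keyword can be used (a sketch, matching the address added above):

```shell
# Remove the IP alias from interface en1:
ifconfig en1 delete 9.37.207.29
```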
This is caused when you have ESS driver filesets installed, but no ESS (type 2105) disks in use on the system. Check the type of disks by running:
# lsdev -Cc disk | grep 2105
If no type 2105 disks are found, you can uninstall any ESS driver filesets:
# installp -u ibm2105.rte ibmpfe.essutil.fibre.data ibmpfe.essutil.rte
recognized by AIX. AIX checks the TZ environment variable to determine if the environment variable follows the POSIX specification rules. If the TZ environment variable does not match the POSIX convention, AIX calls the ICU library to get the Olson time zone translation.

The use of the Olson database for time zone support within AIX provides significant advantages over the traditional POSIX rules. One of the biggest advantages is that the Olson database maintains a historical record of what the time zone rules were at given points in time, so that if the rules change in a particular location, dates and times can be interpreted correctly both in the present and past. A good example of this is the US state of Indiana, which only began using daylight saving time in the year 2006. Under the POSIX implementation, Indiana would have to set its time zone value to EST5EDT, which would format current dates correctly using daylight saving time, but would also format times from previous years as though they were on daylight saving time, which is incorrect.

Use of the ICU API set for time zones also allows for localized display names for time zones. For example, Central Daylight Saving Time would have an abbreviation of CDT for all locales under a POSIX implementation, but under ICU/Olson, it displays properly as HAC (Heure Avancée du Centre) in a French locale.

As in previous AIX releases, system administrators can rely on the Systems Management Interface Tool (SMIT) to configure the time zone by using system-defined values for the TZ environment variable. To accomplish this task, enter the main SMIT menu and select System Environments, Change / Show Date and Time to access the Change Time Zone Using System Defined Values menu. Alternatively, the SMIT fast path chtz_date will directly open the Change / Show Date and Time menu. Selecting the Change Time Zone Using System Defined Values option will prompt SMIT to open the Select COUNTRY or REGION menu.
SMIT uses the undocumented /usr/lib/nls/lstz -C command to produce the list of available countries and regions. Note that undocumented commands and features are not officially supported for customer use, are not covered by the AIX compatibility statement, and may be subject to change without notice. After you have chosen the country or region in the Select COUNTRY or REGION menu, a new selection menu will list all available time zones for the country or region in question.
The selected value of the first column is passed by SMIT to the chtz command, which in turn changes the TZ variable value in the /etc/environment system-level configuration file. As with previous AIX releases, time zone configuration changes always require a system reboot to become effective. SMIT uses the internal /usr/lib/nls/lstz -c command to produce the list of available time zones for a given country or region; the -c flag takes a country or region designation as its input parameter, and the /usr/lib/nls/lstz -C command provides the list of valid input parameters. The /usr/lib/nls/lstz command used without any flag provides a full list of all Olson time zones available on AIX. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
Sendmail tips
To find out if sendmail is running:
# ps -ef | grep sendmail
Or:
# refresh -s sendmail
Use the -v flag on the mail command for "verbose" output. This is especially useful if you can't deliver mail, but also don't get any errors. E.g.:
# cat /etc/motd | mailx -v -s "test" email@address.com
To get sendmail to work on a system without DNS, create and/or edit /etc/netsvc.conf. It should contain 1 line only:
hosts=local
If you see the following error in the error report when starting sendmail:
DETECTING MODULE 'srchevn.c'@line:'355' FAILING MODULE sendmail
Then verify that your /etc/mail/sendmail.cf file is correct, and/or try starting the sendmail daemon as follows (instead of just running "startsrc -s sendmail"):
# startsrc -s sendmail -a "-bd -q30m"
More tips can be found here: http://www.angelfire.com/il2/sgillen/sendmail.html TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION, VERITAS NETBACKUP
# mkdir bpfilter bphdb bpjava-msvc bpjava-usvc bpkeyutil
# mkdir bplist bpmount bpnbat bporaexp bporaexp64
# mkdir bporaimp bporaimp64 bprestore db_log dbclient
# mkdir symlogs tar user_ops
# chmod 777 *
Then, you have to change the default debug level in /usr/openv/netbackup/bp.conf, by adding:
VERBOSE = 2
By default, VERBOSE is set to one, which means there isn't any logging at all, so that is not helpful. You can go up to "VERBOSE = 5", but that may create very large log files, and this may fill up the file system. In any case, check how much disk space is available in /usr before enabling the logging of the Veritas NetBackup client. Backups through Veritas NetBackup are initiated through inetd:
# egrep "bpcd" /etc/services
bpcd    13782/tcp    # VERITAS NetBackup
bpcd    13782/udp    # VERITAS NetBackup
# grep bpcd /etc/inetd.conf
bpcd stream tcp nowait root /usr/openv/netbackup/bin/bpcd bpcd
Now all you have to do is wait for the NetBackup server (the one listed in /usr/openv/netbackup/bp.conf) to start the backup on the AIX client. After the backup has run, you should at least find a log file in the bpcd and bpbkar folders in /usr/openv/netbackup. TOPICS: AIX, HMC, SYSTEM ADMINISTRATION
the system error log just before HACMP stops a node due to an expired timer. To avoid this issue, system managers should configure the NTP daemon to increment time on the node slower until the system clock and the reference clock are in sync (this is called "slewing" the clock) instead of resetting the time in one large increment. The behavior is configured with the -x flag for the xntpd daemon. To check the current running configuration of xntpd for the -x flag:
# ps -aef | grep xntpd | grep -v grep
    root 409632 188534   0 11:46:45      -  0:00 /usr/sbin/xntpd
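If the -x flag is missing, one way to add it permanently (a sketch using the standard AIX SRC commands) is to change the subsystem definition and restart the daemon:

```shell
# Add the -x (slewing) flag to the xntpd subsystem arguments:
chssys -s xntpd -a "-x"
# Restart xntpd so the new flag takes effect:
stopsrc -s xntpd
startsrc -s xntpd
```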
This will create a 20 Megabyte sized RAM file system, mounted on /mnt/tmp. If you leave out the "-o size" option, by default half of the memory will be allocated. However, the memory will not actually be used as long as no data is written to the RAM file system. TOPICS: AIX, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
# mkramdisk 4G
The system will assign the next available RAM disk. Since this is our first one, it will be assigned the name ramdisk0:
# ls -l /dev/ram*
brw-------  1 root  system  46, 0 Sep 22 08:01 /dev/ramdisk0
If there isn't sufficient available memory to create the RAM disk you have requested, the mkramdisk command will alert you. Free up some memory or create a smaller RAM disk. You can use dynamic LPAR on the HMC or IVM to assign more memory to your partition. We could use the RAM disk /dev/ramdisk0 as a raw logical volume, but here we're going to create and mount a JFS2 file system. Here's how to create the file system using the RAM disk as its logical volume:
# mkfs -V jfs2 /dev/ramdisk0
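The file system can then be mounted; the mount point below is just an example:

```shell
# Create a mount point and mount the JFS2 file system with logging
# disabled (log=NULL):
mkdir /ramdisk0
mount -V jfs2 -o log=NULL /dev/ramdisk0 /ramdisk0
```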
Note: mounting a JFS2 file system with logging disabled (log=NULL) only works in AIX 6.1. On AIX 5.3, here are the steps to create the ramdisk:
# mkramdisk 4G
# mkfs -V jfs /dev/ramdisk0
# mkdir /ramdisk0
# mount -V jfs -o nointegrity /dev/ramdisk0 /ramdisk0
You should now be able to see the new file system using df and you can write to it as you would any other file system. When you're finished, unmount the file system and then remove the ramdisk using the rmramdisk command.
# rmramdisk ramdisk0
After setting CORE_NAMING, you can disable this feature by setting the variable to the NULL value. For example, if you are using the Korn shell, do the following:
export CORE_NAMING=
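Conversely, to enable the feature, set the variable to any non-empty value; for example, in the Korn shell:

```shell
# Enable unique core file naming for this shell session
# ("true" is just a conventional non-empty value):
export CORE_NAMING=true
```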
After setting CORE_NAMING, all new core files will be stored in files of the format core.pid.ddhhmmss, where:

pid: Process ID
dd: Day of the month
hh: Hours
mm: Minutes
ss: Seconds

In the following example, two core files are generated by a process identified by PID 30480 at different times:
# ls -l core*
-rw-r--r--  1 user group  8179 Jan 28 2010  core.30480.28232347
-rw-r--r--  1 user group  8179 Jan 28 2010  core.30482.28232349
The time stamp used is in GMT and your time zone will not be used. Also check out the lscore and the chcore commands, which can also be used to list and set core naming. These commands can also be set to define a core location, and to turn core compression on. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
Actually, this command not only removes the password history, but also changes the setting of histsize for the account to zero, meaning, that a user is never checked again on re-using old passwords. After running the command above, you may want to set it back again to the default value:
# grep -p ^default /etc/security/user | grep histsize
        histsize = 20
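For an individual user, the histsize attribute can be set back with chuser (the username below is a placeholder):

```shell
# Restore password history checking to 20 remembered passwords for user jdoe:
chuser histsize=20 jdoe
```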
Sudosh
Sudosh is designed specifically to be used in conjunction with sudo, or by itself as a login shell. Sudosh allows the execution of a root or user shell with logging: every command the user types within the root shell is logged, as well as the output. This is different from "sudo -s" or "sudo /bin/sh", because when you use one of these instead of sudosh to start a new shell, the new shell does not log the commands typed in it to syslog; only the fact that a new shell started is logged. If the newly started shell supports command-line history, you can still find the commands called in the shell in a file such as .sh_history, but if you use a shell such as csh, which does not support command-line logging, you are out of luck. Sudosh fills this gap: no matter what shell you use, all of the command lines are logged to syslog (including vi keystrokes). In fact, sudosh uses the script command to log all keystrokes and output. Setting up sudosh is fairly easy. For a Linux system, first download the RPM of sudosh, for example from rpm.pbone.net. Then install it on your Linux server:
# rpm -ihv sudosh-1.8.2-1.2.el4.rf.i386.rpm
Preparing...                ########################################### [100%]
   1:sudosh                 ########################################### [100%]
Then, go to the /etc directory and open up /etc/sudosh.conf. Here you can adjust the default shell that is started, and the location of the log files. By default, the log directory is /var/log/sudosh. Make sure this directory exists on your server, or change it to another existing directory in the sudosh.conf file. This command will set the correct authorizations on the log directory:
# sudosh -i
[info]: chmod 0733 directory /var/log/sudosh
Then, if you want to assign a user sudosh access, edit the /etc/sudoers file by running visudo, and add the following line:
username ALL=PASSWD:/usr/bin/sudosh
Now, the user can login, and run the following command to gain root access:
$ sudo sudosh
Password:
# whoami
root
Now, as a sys admin, you can view the log files created in /var/log/sudosh, but it is much cooler to use the sudosh-replay command to replay (like a VCR) the actual session, as run by the user with sudosh access. First, run sudosh-replay without any parameters, to get a list of sessions that took place using sudosh:
# sudosh-replay
Date       Duration  From  To    ID
====       ========  ====  ==    ==
09/16/2010 6s        root  root  root-root-1284653707-GCw26NSq
Usage: sudosh-replay ID [MULTIPLIER] [MAXWAIT]
See 'sudosh-replay -h' for more help.
Example: sudosh-replay root-root-1284653707-GCw26NSq 1 2
Now, you can actually replay the session, by (for example) running:
# sudosh-replay root-root-1284653707-GCw26NSq 1 5
The first parameter is the session ID, and the second parameter is the multiplier. Use a higher value for the multiplier to speed up the replay, while "1" is the actual speed. The third parameter is the max-wait. Where there might have been wait times in the actual session, this parameter limits the wait to a maximum of max-wait seconds; in the example above, 5 seconds. For AIX, you can find the necessary RPM here. It is slightly different, because it installs in /opt/freeware/bin, and the sudosh.conf is also located in this directory. Both Linux and AIX of course require sudo to be installed, before you can install and use sudosh. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
SUID
Always watch out for files with the SUID bit set, especially if these are files that are not on the AIX system by default. Before any vendor or application team installs additional software on the AIX system, it may be worthwhile to run the following command, to discover any files with the SUID bit set:
# find / \( -perm -2000 -o -perm -4000 \) -type f -ls
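A minimal demonstration of the before/after comparison idea, run here against a scratch directory (on a real system you would run the find against / as shown above; all paths below are illustrative):

```shell
# Demonstration on a scratch directory; on a real system use / instead.
mkdir -p /tmp/suiddemo
touch /tmp/suiddemo/app
chmod 4755 /tmp/suiddemo/app        # a pre-existing SUID file
find /tmp/suiddemo \( -perm -2000 -o -perm -4000 \) -type f | sort > /tmp/suid.before

# ... vendor installation happens here, adding a new SUID binary ...
touch /tmp/suiddemo/newtool
chmod 4755 /tmp/suiddemo/newtool

find /tmp/suiddemo \( -perm -2000 -o -perm -4000 \) -type f | sort > /tmp/suid.after
# Lines prefixed with ">" are newly appeared SUID/SGID files
# (diff exits nonzero when differences are found, hence "|| true"):
diff /tmp/suid.before /tmp/suid.after || true
```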
Save the output of this command for later reference. Once the vendor or application team is done installing their application and/or database software, run the same command again, to discover whether any newly created files exist, especially files that are owned by user root and have the SUID bit set. The SUID bit allows other users to run the command as if they were root. The SUID bit can only be set on binary executables on AIX (starting with release 3.2.5 of AIX). Other operating systems, such as Fedora, may allow scripts to run with the SUID bit set. On AIX you can set the SUID bit on a script, but AIX simply ignores it and runs the script as the user who started the script, not as the account that owns the script, because honoring it would be a huge security hole. However, it is still very easy to write a C program that does the trick. The following example is a program called "sushi". The source code of the program, sushi.c, looks like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define LOG_FILE "/tmp/sushilog"

int main(int argc, char **argv)
{
    char buf[1024], *p = buf;
    int i;
    time_t t;
    FILE *log;
    char msg[BUFSIZ], *ct, *name;

    /* Concatenate all arguments into one command line. */
    *p = '\0';
    for (i = 1; i < argc; ++i) {
        strcpy(p, argv[i]);
        p += strlen(argv[i]);
        if (i < argc - 1) {
            *p = ' ';
            ++p;
            *p = '\0';
        }
    }

    name = getlogin();
    t = time(NULL);
    ct = ctime(&t);

    /* Become root; works because the binary is SUID root. */
    setuid(0);

    log = fopen(LOG_FILE, "a");
    if (!log)
        printf("Couldn't open log file!\n");
    else {
        sprintf(msg, "SUSHI: %s %s %s\n", name, buf, ct);
        fputs(msg, log);
        fclose(log);
        system(buf);
    }
    return 0;
}
The makefile looks like this (and makes sure the SUID bit is set when running "make install"):
################################################
# Make rules
################################################

all:	sushi

install:
	cp -p sushi /bin
	chown root /bin/sushi
	chmod a+rx /bin/sushi
	chmod u+s /bin/sushi

################################################
Now, if this program is compiled and installed as user root, a program called /bin/sushi will exist; it will be owned by user root, and the SUID bit will be set:
# ls -als /bin/sushi
8 -rwsr-xr-x 1 root root 6215 Sep 9 09:21 /bin/sushi
The sushi program basically takes everything entered as a parameter on the command line, and runs it. So if the file is owned by user root, it will run the parameter as user root. For example, if you would want to open a Korn shell as a regular user, and get root access:
$ /bin/sushi ksh
# whoami
root
This is something that you want to avoid. Even vendors are known to build backdoors like these into their software. The find command shown at the beginning of this article will help you discover commands like these. Note that the one good thing about the sushi program shown above is that it writes an entry to log file /tmp/sushilog each time someone uses the command.
To avoid users being able to run commands with the SUID set, you may want to add the "nosuid" option in /etc/filesystems for each file system:
/exports/install:
        dev      = "/exports/install"
        vfs      = nfs
        nodename = fileserver.company.com
        mount    = true
        options  = ro,bg,hard,intr,nodev,nosuid,sec=sys
        account  = false
Especially for (permanently) NFS-mounted file systems, it is a VERY good idea to set this nosuid option: it prevents someone from creating a sushi-like program on an NFS server and running it as a regular user on the NFS client system to gain root access on the NFS client. If you want to mount an NFS share on a client temporarily, enable nosuid by running:
# mount -o nosuid server:/filesystem /mountpoint
Truss
To get more information on what a specific process is doing, you can use the truss command. That may be very useful, for example, when a process appears to be hanging. For example, if you want to know what the "recover" process is doing, first look up the PID of this process:
# ps -ef | grep -i recover | grep -v grep
    root 348468 373010   0 17:30:25  pts/1  0:00 recover -f -a /etc
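With the PID known, you can attach truss to the running process (348468 is the PID from the example above); interrupt it with CTRL-C when done:

```shell
# Trace the system calls of the running process:
truss -p 348468
```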
This way, you can see the process is actually sleeping. TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
2-Port 10/100/1000 Base-TX PCI-X Adapter:
        Network Address.............001125C5E831
        ROM Level.(alterable).......DV0210
        Hardware Location Code......U788C.001.AAC1535-P1-T2
PLATFORM SPECIFIC
Name:  ethernet
  Node:  ethernet@1,1
  Device Type:  network
  Physical Location: U788C.001.AAC1535-P1-T2
This ent1 device is an 'Internal Port'. If we check ent2 on the same box:
# lscfg -pvl ent2
ent2  U788C.001.AAC1535-P1-C13-T1  2-Port 10/100/1000 Base-TX PCI-X Adapter
2-Port 10/100/1000 Base-TX PCI-X Adapter:
        Part Number.................03N5298
        FRU Number..................03N5298
        EC Level....................H138454
        Brand.......................H0
        Manufacture ID..............YL1021
        Network Address.............001A64A8D516
        ROM Level.(alterable).......DV0210
        Hardware Location Code......U788C.001.AAC1535-P1-C13-T1
PLATFORM SPECIFIC
Name:  ethernet
  Node:  ethernet@1
  Device Type:  network
  Physical Location: U788C.001.AAC1535-P1-C13-T1
This is a device on a PCI I/O card. For a physical address like U788C.001.AAC1535-P1-C13-T1:

U788C.001.AAC1535 - This part identifies the system unit/drawer. If your system is made up of several drawers, look on the front and match the ID to this section of the address. Now go round the back of the server.
P1 - This is the PCI bus number. You may only have one.
C13 - Card slot C13. The slots are numbered on the back of the server.
T1 - This is port 1 of the 2 that are on the card.
Your internal ports won't have the Card Slot numbers, just the T number, representing the port. This should be marked on the back of your server. E.g.: U788C.001.AAC1535-P1-T2 means unit U788C.001.AAC1535, PCI bus P1, port T2 and you should be able to see T2 printed on the back of the server. TOPICS: AIX, INSTALLATION, SYSTEM ADMINISTRATION
install_all_updates
A useful command to update software on your AIX server is install_all_updates. It is similar to running smitty update_all, but it works from the command line. The only thing you need to provide is the directory name, for example:
# install_all_updates -d .
This installs all the software updates from the current directory. Of course, you will have to make sure the current directory actually contains software. Don't worry about generating a Table Of Contents (.toc) file in this directory, because install_all_updates generates one for you. By default, install_all_updates will apply the filesets; use -c to commit any software. Also, by default, it will expand any file systems (use -x to prevent this behavior). It will install any requisites by default (use -n to prevent). You can use -p to run a preview, and you can use -s to skip the recommended maintenance or technology level verification at the end of the install_all_updates output. You may have to use the -Y option to agree to all license agreements. To install all available updates from the cdrom, agree to all license agreements, and skip the recommended maintenance or technology level verification, run:
# install_all_updates -d /cdrom -Y -s
TOPICS: AIX, EMC, INSTALLATION, POWERHA / HACMP, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
In case you're using EMC SAN storage, make sure you configure your disks correctly as hdiskpower devices. Or, if you're using a mksysb image, you may want to follow this procedure: EMC ODM cleanup. Steps:
Enter a cluster name and select the nodes you're going to use. It is vital here to have the hostnames and IP addresses correctly entered in the /etc/hosts file of both nodes. Create an IP service label:
# smitty hacmp
  Initialization and Standard Configuration
    Configure Resources to Make Highly Available
      Configure Service IP Labels/Addresses
        Add a Service IP Label/Address
Enter an IP Label/Address (press F4 to select one), and enter a Network name (again, press F4 to select one). Set up a resource group:
# smitty hacmp
  Initialization and Standard Configuration
    Configure HACMP Resource Groups
      Add a Resource Group
Enter the name of the resource group. It's a good habit to make sure that a resource group name ends with "rg", so you can recognize it as a resource group. Also, select the participating nodes. For the "Fallback Policy", it is a good idea to change it to "Never Fallback". This way, when the primary node in the cluster comes up, and the resource group is up-and-running on the secondary node, you won't see a failover occur from the secondary to the primary node. Note: the order of the nodes is determined by the order you select the nodes here. If you put in "node01 node02" here, then "node01" is the primary node. If you want to have this any other way, now is a good time to correctly enter the order of node priority. Add the Service IP/Label to the resource group:
# smitty hacmp
  Initialization and Standard Configuration
    Configure HACMP Resource Groups
      Change/Show Resources for a Resource Group (standard)
Select the resource group you've created earlier, and add the Service IP/Label.
Run a verification/synchronization:
# smitty hacmp
  Extended Configuration
    Extended Verification and Synchronization
Just hit [ENTER] here. Resolve any issues that may come up from this synchronization attempt. Repeat this process until the verification/synchronization process returns "Ok". It's a good idea here to select to "Automatically correct errors". Start the HACMP cluster:
# smitty hacmp
  System Management (C-SPOC)
    Manage HACMP Services
      Start Cluster Services
Select both nodes to start. Make sure to also start the Cluster Information Daemon. Check the status of the cluster:
# clstat -o
# cldump
Wait until the cluster is stable and both nodes are up. Basically, the cluster is now up-and-running. However, during the Verification & Synchronization step, it will complain about not having a non-IP network. The next part describes setting up a disk heartbeat network, which allows the nodes of the HACMP cluster to exchange disk heartbeat packets over a SAN disk. We're assuming here that you're using EMC storage. The process on other types of SAN storage is more or less similar, except for some differences, e.g. SAN disks are called "hdiskpower" devices on EMC storage, and "vpath" devices on IBM SAN storage. First, look at the available SAN disk devices on your nodes, and select a small disk that won't be used to store any data, but only for the purpose of doing the disk heartbeat. It is a good habit to request your SAN storage admin to zone a small LUN to both nodes of the HACMP cluster as a disk heartbeating device. Make a note of the PVID of this disk device; for example, if you choose to use device hdiskpower4:
# lspv | grep hdiskpower4
hdiskpower4     000a807f6b9cc8e5     None
So, we're going to set up the disk heartbeat network on device hdiskpower4, with PVID 000a807f6b9cc8e5. Create a concurrent volume group:
# smitty hacmp
  System Management (C-SPOC)
    HACMP Concurrent Logical Volume Management
      Concurrent Volume Groups
        Create a Concurrent Volume Group
Select both nodes to create the concurrent volume group on by pressing F7 for each node. Then select the correct PVID. Give the new volume group a name, for example "hbvg". Set up the disk heartbeat network:
# smitty hacmp
  Extended Configuration
    Extended Topology Configuration
      Configure HACMP Networks
        Add a Network to the HACMP Cluster
Select "diskhb" and accept the default Network Name. Run a discovery:
# smitty hacmp
  Extended Configuration
    Discover HACMP-related Information from Configured Nodes
Select the disk device on both nodes by selecting the same disk on each node by pressing F7. Run a Verification & Synchronization again, as described earlier above. Then check with clstat and/or cldump again, to check if the disk heartbeat network comes online. TOPICS: AIX, POWERHA / HACMP, SYSTEM ADMINISTRATION
The next thing you will want to check on the NFS server is whether the node names of your HACMP cluster nodes are correctly added to the /etc/exports file. If they are, run:
# exportfs -va
The last, and trickiest, item to check is whether a service IP label is defined as an IP alias on the same adapter as your node's hostname, e.g.:
# netstat -nr
Routing tables
Destination      Gateway        Flags  Refs  Use     If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default          10.251.14.1    UG        4  180100  en1
10.251.14.0      10.251.14.50   UHSb      0  0       en1
10.251.14.50     127.0.0.1      UGHS      3  791253  lo0  -
The example above shows you that the default gateway is defined on the en1 interface. The next command shows you where your Service IP label lives:
# netstat -i
Name  Mtu   Network  Address    Ipkts    Ierrs  Opkts  ...
en1   1500  link#2   ...        940024   ...
en1   1500  ...      hostname   940024   ...
en1   1500  ...      serviceip  940024   ...
lo0   ...   ...      ...        1914185  ...
As you can see, the Service IP label (in the example above called "serviceip") is defined on en1. In that case, for NFS to work, you also want to add the "serviceip" to the /etc/exports file on the NFS server and re-run "exportfs -va". And you should also make sure that hostname "serviceip" resolves to an IP address correctly (and of course the IP address resolves to the correct hostname) on both the NFS server and the client. TOPICS: AIX, SYSTEM ADMINISTRATION
Note: csum can't handle files larger than 2 GB. TOPICS: AIX, PERFORMANCE, SYSTEM ADMINISTRATION
You can download the tool here: http://www.ibm.com/developerworks/aix/library/au-nmon_analyser/ TOPICS: AIX, BACKUP & RESTORE, NIM, SYSTEM ADMINISTRATION
# Remove unwanted entries from the inittab
rmitab hacmp 2>/dev/null
rmitab tsmsched 2>/dev/null
rmitab tsm 2>/dev/null
rmitab clinit 2>/dev/null
rmitab pst_clinit 2>/dev/null
rmitab qdaemon 2>/dev/null
rmitab sddsrv 2>/dev/null
rmitab nimclient 2>/dev/null
rmitab nimsh 2>/dev/null
rmitab naviagent 2>/dev/null
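Note that rmitab only deletes the entry; it can be put back later with mkitab. For reference, the stock qdaemon entry removed above typically looks like this (entries may differ slightly per AIX release):

```
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
```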
# copy inetd.conf
cp /etc/inetd.conf /etc/inetd.conf.org
# take out unwanted items
grep -v bgssd /etc/inetd.conf.org > /etc/inetd.conf
The next thing you need to do is configure this script as a 'script' resource in NIM. Run:
# smitty nim_mkres
Select 'script' and complete the form afterwards. For example, if you called it 'UnConfig_Script':
# lsnim -l UnConfig_Script
UnConfig_Script:
   class      = resources
   type       = script
   comments   =
   Rstate     = ready for use
   prev_state = unavailable for use
   location   = /export/nim/cust_scripts/custom.ksh
Then, when you are ready to perform the actual mksysb recovery using "smitty nim_bosinst", you can add this script resource on the following line:
Customization SCRIPT to run after installation [UnConfig_Script]
Using the image_data resource to restore a mksysb without preserving mirrors using NIM
Specify the 'image_data' resource when running the 'bosinst' operation. From the command line on the NIM master:
Select the client to install. Select 'mksysb' as the type of install. Select a SPOT at the same level as the mksysb you are installing. Select an lpp_source at the same level as the mksysb you are installing. NOTE: It is recommended to use an lpp_source at the same AIX Technology Level. If you use an lpp_source at a higher level than the mksysb, the system will be updated to the level of the lpp_source during installation. This only updates Technology Levels; it will not migrate the system to a higher AIX version. For example, if you're using an AIX 5300-08 mksysb, you cannot use an AIX 6.1 lpp_source, but if you allocate a 5300-09 lpp_source, the target system will be updated to 5300-09.
Install the Base Operating System on Standalone Clients
Type or select values in entry fields. Press Enter AFTER making all desired changes.
  IMAGE_DATA to use during installation       [server1_image_data]
Creating an image_data resource without preserving mirrors for use with NIM
Transfer the /image.data file to the NIM master and store it in the location you desire. It is a good idea to place the file, or any NIM resource for that matter, in a descriptive manner, for example: /export/nim/image_data. This ensures you can easily identify your "image_data" NIM resource file locations, should you need multiple "image_data" resources. Make sure your image.data file names are descriptive as well. A common way to name the file is after your client name, for example: server1_image_data.
Run the nim command, or use smitty and the fast path 'nim_mkres' to define the file that you have edited using the steps above: From command line on the NIM master:
# nim -o define -t image_data -a server=master -a location=/export/nim/image_data/server1_image_data -a comments="image.data file with broken mirror for server1" server1_image_data
NOTE: "server1_image_data" is the name given to the 'image_data' resource. Using smit on the NIM master:
# smit nim_mkres
Select 'image_data' as the Resource Type. Then complete the following screen:
Define a Resource
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                        [Entry Fields]
* Resource Name                         [server1_image_data]
* Resource Type                         image_data
* Server of Resource                    [master]
* Location of Resource                  [/export/nim/image_data/server1_image_data]
  Comments                              []
Run the following command to make sure the 'image_data' resource was created:
# lsnim -t image_data
Run the following command to get information about the 'image_data' resource:
# lsnim -l server1_image_data
server1_image_data:
   class      = resources
   type       = image_data
   Rstate     = ready for use
   prev_state = unavailable for use
   location   = /export/nim/image_data/server1_image_data
Edit the image.data file to break the mirror, by running the following command:
# vi /image.data
What you are looking for are the "lv_data" stanzas. There will be one for every logical volume associated with rootvg. The following is an example of an lv_data stanza from an image.data file of a mirrored rootvg. The lines that need changing are LV_SOURCE_DISK_LIST, COPIES and PP:
lv_data:
    VOLUME_GROUP= rootvg
    LV_SOURCE_DISK_LIST= hdisk0 hdisk1
    LV_IDENTIFIER= 00cead4a00004c0000000117b1e92c90.2
    LOGICAL_VOLUME= hd6
    VG_STAT= active/complete
    TYPE= paging
    MAX_LPS= 512
    COPIES= 2
    LPs= 124
    STALE_PPs= 0
    INTER_POLICY= minimum
    INTRA_POLICY= middle
    MOUNT_POINT=
    MIRROR_WRITE_CONSISTENCY= off
    LV_SEPARATE_PV= yes
    PERMISSION= read/write
    LV_STATE= opened/syncd
    WRITE_VERIFY= off
    PP_SIZE= 128
    SCHED_POLICY= parallel
    PP= 248
    BB_POLICY= non-relocatable
    RELOCATABLE= yes
    UPPER_BOUND= 32
    LABEL=
    MAPFILE= /tmp/vgdata/rootvg/hd6.map
    LV_MIN_LPS= 124
Note: There are two disks in the 'LV_SOURCE_DISK_LIST', the 'COPIES' value reflects two copies, and the 'PP' value is double the 'LPs' value. The following is an example of the same lv_data stanza after manually breaking the mirror. Edit each 'lv_data' stanza in the image.data file in the same way to break the mirrors.
lv_data:
    VOLUME_GROUP= rootvg
    LV_SOURCE_DISK_LIST= hdisk0
    LV_IDENTIFIER= 00cead4a00004c0000000117b1e92c90.2
    LOGICAL_VOLUME= hd6
    VG_STAT= active/complete
    TYPE= paging
    MAX_LPS= 512
    COPIES= 1
    LPs= 124
    STALE_PPs= 0
    INTER_POLICY= minimum
    INTRA_POLICY= middle
    MOUNT_POINT=
    MIRROR_WRITE_CONSISTENCY= off
    LV_SEPARATE_PV= yes
    PERMISSION= read/write
    LV_STATE= opened/syncd
    WRITE_VERIFY= off
    PP_SIZE= 128
    SCHED_POLICY= parallel
    PP= 124
    BB_POLICY= non-relocatable
    RELOCATABLE= yes
    UPPER_BOUND= 32
    LABEL=
    MAPFILE= /tmp/vgdata/rootvg/hd6.map
    LV_MIN_LPS= 124
    STRIPE_WIDTH=
    STRIPE_SIZE=
Note: The 'LV_SOURCE_DISK_LIST' has been reduced to one disk, the 'COPIES' value has been changed to reflect one copy, and the 'PP' value has been changed so that it is equal to the 'LPs' value. Save the edited image.data file. At this point you can use the edited image.data file to create a new mksysb to file, tape, or DVD:
To file or tape: place the edited image.data file in the / (root) directory and rerun your mksysb command without the "-i" flag. If running the backup through SMIT, make sure you set the option "Generate new /image.data file?" to 'no' (by default it is set to 'yes').
To DVD: use the -i flag and specify the location of the edited image.data file. If running through SMIT, specify the edited image.data file location in the "User supplied image.data file" field.
Within NIM, you would create an 'image_data' resource for use with NIM to restore a mksysb without preserving mirrors.
Note: If you don't want to edit the image.data file manually, here's a script that updates it to a single disk for you, assuming your image.data file is called /image.data:
COPIESFLAG=0
cat /image.data | while read LINE ; do
  if [ "${LINE}" = "COPIES= 2" ] ; then
    COPIESFLAG=1
    echo "COPIES= 1"
  else
    if [ ${COPIESFLAG} -eq 1 ] ; then
      PP=`echo ${LINE} | awk '{print $1}'`
      if [ "${PP}" = "PP=" ] ; then
        PPNUM=`echo ${LINE} | awk '{print $2}'`
        ((PPNUMNEW=$PPNUM/2))
        echo "PP= ${PPNUMNEW}"
        COPIESFLAG=0
      else
        echo "${LINE}"
      fi
    else
      echo "${LINE}"
    fi
  fi
done > /image.data.new
# Note: the LV_SOURCE_DISK_LIST lines still need to be reduced to one disk by hand.
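The same edit can also be done with a compact awk filter. This is a sketch under the same assumptions as the script above (a mirrored stanza contains a literal "COPIES= 2" line, with the "PP=" line following it); the hypothetical halve_copies helper does not touch LV_SOURCE_DISK_LIST, which still has to be reduced to one disk by hand:

```shell
# halve_copies: read an image.data stream on stdin, write the
# single-copy version on stdout (hypothetical helper name).
halve_copies() {
    awk '
    /^COPIES= 2$/  { print "COPIES= 1"; flag = 1; next }    # one copy left
    flag && /^PP=/ { print "PP= " $2 / 2; flag = 0; next }  # PP equals LPs
                   { print }
    '
}
# Usage: halve_copies < /image.data > /image.data.new
```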
If you want to list the files in a mksysb image first, you can run the following command:
# restore -Tqvf [/location/of/mksysb/file]
Check to make sure the block size of the tape drive has been changed:
# tctl -f /dev/rmt0 status
block_size     512    BLOCK size (0=variable length)
compress       yes    Use data COMPRESSION
density_set_1         DENSITY setting #1
density_set_2         DENSITY setting #2
extfm                 Use EXTENDED file marks
mode                  Use DEVICE BUFFERS during writes
ret                   RETENSION on tape change or reset
Change to the /tmp directory (or a directory where you would like to store the /image.data file from the mksysb image) and restore the /image.data file from the tape:
# cd /tmp
# restore -s2 -xqvf /dev/rmt0.1 ./image.data
The fix is to uninstall/reinstall Powerpath, but you won't be able to until you remove the hdiskpower devices with this procedure:
# odmdelete -q name=hdiskpowerX -o CuDv   (repeat for each hdiskpower device)
You must remove the modified files installed by PowerPath and then reboot the server. You will then be able to uninstall PowerPath after the reboot via the "installp -u EMCpower" command. The files to be removed are as follows (do not be concerned if some of the removals fail, as PowerPath may not be fully configured):
# rm ./etc/PowerPathExtensions
# rm ./etc/emcp_registration
# rm ./usr/lib/boot/protoext/disk.proto.ext.scsi.pseudo.power
# rm ./usr/lib/drivers/pnext
# rm ./usr/lib/drivers/powerdd
# rm ./usr/lib/drivers/powerdiskdd
# rm ./usr/lib/libpn.a
# rm ./usr/lib/methods/cfgpower
# rm ./usr/lib/methods/cfgpowerdisk
# rm ./usr/lib/methods/chgpowerdisk
# rm ./usr/lib/methods/power.cat
# rm ./usr/lib/methods/ucfgpower
# rm ./usr/lib/methods/ucfgpowerdisk
# rm ./usr/lib/nls/msg/en_US/power.cat
# rm ./usr/sbin/powercf
# rm ./usr/sbin/powerprotect
# rm ./usr/sbin/pprootdev
# rm ./usr/lib/drivers/cgext
# rm ./usr/lib/drivers/mpcext
# rm ./usr/lib/libcg.so
# rm ./usr/lib/libcong.so
# rm ./usr/lib/libemcp_mp_rtl.so
# rm ./usr/lib/drivers/mpext
# rm ./usr/lib/libmp.a
# rm ./usr/sbin/emcpreg
# rm ./usr/sbin/powermt
# rm ./usr/share/man/man1/emcpreg.1
# rm ./usr/share/man/man1/powermt.1
# rm ./usr/share/man/man1/powerprotect.1
Re-install Powerpath.
You then would need to make sure that all the local adapter IP addresses are entered in /etc/hosts. After that is complete, for every adapter on the system you would apply:
# host <adapter IP address>
This will ensure a host command generates the same output (the hostname) with and without /etc/netsvc.conf. That way, you'll know you can continue to do certain things while troubleshooting a DNS problem.
the LPAR or read the contents into your NIM server. Mount the .ISO image: On AIX 6.1 or AIX 7, use the loopmount command: http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds3/loopmount.htm. On AIX 5.3, use the mklv-dd-mount trick: https://www.ibm.com/developerworks/wikis/display/WikiPtype/AIXV53MntISO. Details on the new service: you have to prove you are entitled via customer number, machine serial numbers or SWMA. The Entitled Software Support Download User Guide can be downloaded here: ftp://public.dhe.ibm.com/systems/support/planning/essreg/I1128814.pdf. Then you can download the AIX media, Expansion Packs, Linux Toolbox and more. Start at: www.ibm.com/eserver/ess. TOPICS: AIX, PERFORMANCE, SYSTEM ADMINISTRATION
(memory statistics output, garbled in extraction; it listed per-page-size figures for the "s" 4 KB and "m" 64 KB pools, among which: pers 0, other 163295, pgsp 12885)
In this example, the memory-virtual value is 2983249, and the memory-size value is 5079040. Note that the actual memory-inuse is nearly the same as the memory-size value. This is simply AIX caching as much as possible in its memory; hence, the memory-free value is typically very low. Now, to determine the actual memory consumption, divide memory-virtual by memory-size:
# echo "scale=2; 2983249/5079040" | bc
.58
Thus, the actual memory consumption is 58% of the memory (5079040 blocks of 4 KB = 19840 MB). The free memory is thus: (100% - 58%) * 19840 MB = 8332 MB. Try to keep the value of memory consumption less than 90%. Above that, you will generally start seeing paging activity using the vmstat command. By that time, it is a good idea to lower the load on the system or to get more memory in your system. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
Using NFS
The Networked File System (NFS) is one of a category of filesystems known as distributed filesystems. It allows users to access files resident on remote systems without even knowing that a network is involved and thus allows filesystems to be shared among computers. These remote systems could be located in the same room or could be miles away. In order to access such files, two things must happen. First, the remote system must make the files available to other systems on the network. Second, these files must be mounted on the local system to be able to access them. The mounting process makes the remote files appear as if they are resident on the local system. The system that makes its files available to others on the network is called a server, and the system that uses a remote file is called a client.
NFS Server
NFS consists of a number of components including a mounting protocol, a file locking protocol, an export file and daemons (mountd, nfsd, biod, rpc.lockd, rpc.stad) that coordinate basic file services. Systems using NFS make the files available to other systems on the network by "exporting" their directories to the network. An NFS server exports its directories by putting the names of these directories in the /etc/exports file and executing the exportfs command. In its simplest form, /etc/exports consists of lines of the form:
pathname -option, option ...
Where pathname is the name of the file or directory to which network access is to be allowed; if pathname is a directory, then all of the files and directories below it within the same filesystem are also exported, but not any filesystems mounted within it. The next fields in the entry consist of various options that specify the type of access to be given and to whom. For example, a typical /etc/exports file may look like this:
/cyclop/users   -access=homer:bart,root=homer
/usr/share/man  -access=marge:maggie:lisa
/usr/mail
This export file permits the filesystem /cyclop/users to be mounted by homer and bart, and allows root access to it from homer. In addition, it lets /usr/share/man be mounted by marge, maggie and lisa. The filesystem /usr/mail can be mounted by any system on the network. Filesystems listed in the export file without a specific set of hosts are mountable by all machines. This can be a sizable security hole. When used with the -a option, the exportfs command reads the /etc/exports file and exports all the directories listed to the network. This is usually done at system startup time.
# exportfs -va
If the contents of /etc/exports change, you must tell mountd to reread it. This can be done by re-executing the exportfs command after the export file is changed. The exact attributes that can be specified in the /etc/exports file vary from system to system. The most common attributes are:
-anon : Specifies the UID that should be used for requests coming from an unknown user. Defaults to nobody.
-hostname : Allow hostname to mount the filesystem.
For example:
/cyclop/users -rw=moe,anon=-1 /usr/inorganic -ro
-access=list : Colon-separated list of hostnames and netgroups that can mount the filesystem.
-ro : Export read-only; no clients may write on the filesystem.
-rw=list : List enumerates the hosts allowed to mount for writing; all others must mount read-only.
-root=list : Lists hosts permitted to access the filesystem as root. Without this option, root access from a client is equivalent to access by the user nobody (usually UID -1).
This allows moe to mount /cyclop/users for reading and writing, and maps anonymous users (users from other hosts that do not exist on the local system and the root user from any remote system) to the UID -1. This corresponds to the nobody account, and it tells NFS not to allow such users access to anything. NFS Clients After the files, directories and/or filesystems have been exported, an NFS client must explicitly mount them before it can use them. It is handled by the mountd daemon (sometimes called rpc.mountd). The server examines the mount request to be sure the client has proper authorization.
The following syntax is used for the mount command. Note that the name of the server is followed by a colon and the directory to be mounted:
# mount server1:/usr/src /src
Here, the directory structure /usr/src resident on the remote system server1 is mounted on the /src directory on the local system. When the remote filesystem is no longer needed, it is unmounted with the umount:
# umount server1:/usr/src
The mount command can be used to establish temporary network mounts, but mounts that are part of a system's permanent configuration should be either listed in /etc/filesystems (for AIX) or handled by an automatic mounting service such as automount or amd.
NFS Commands
The following commands are useful when working with NFS.
# lsnfsexp
/software -ro
lsnfsexp : Displays the characteristics of directories that are exported with NFS.
mknfsexp -d path -t ro : Exports a read-only directory to NFS clients and add it to /etc/exports.
# mknfsexp -d /software -t ro
Exported /software
# lsnfsexp
/software -ro
rmnfsexp -d path : Unexports a directory from NFS clients and remove it from /etc/exports.
# rmnfsexp -d /software
lsnfsmnt : Displays the characteristics of NFS mountable file systems.
showmount -e : Lists exported filesystems.
# showmount -e
export list for server:
/software (everyone)
server2:/sourcefiles
server3:/datafiles
Start/Stop/Status NFS daemons
In the following discussion, reference to daemon implies any one of the SRC-controlled daemons (such as nfsd or biod). The NFS daemons can be automatically started at system (re)start by including the /etc/rc.nfs script in the /etc/inittab file. They can also be started manually by executing the following command:
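For reference, a typical inittab entry for this, as created on AIX (the exact entry may differ per release), looks like:

```
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
```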
# startsrc -s Daemon or startsrc -g nfs
Where the -s option will start the individual daemons and -g will start all of them. These daemons can be stopped one at a time or all at once by executing the following command:
# stopsrc -s Daemon or stopsrc -g nfs
You can get the current status of these daemons by executing the following commands:
# lssrc -s [Daemon] # lssrc -a
If the /etc/exports file does not exist, the nfsd and the rpc.mountd daemons will not start. You can get around this by creating an empty /etc/exports file. This will allow the nfsd and the rpc.mountd daemons to start, although no filesystems will be exported. TOPICS: AIX, SYSTEM ADMINISTRATION
read was from disk, the second was from memory. The first read took 10.0 seconds. The second read took 1.3 seconds: a 7.4x improvement. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
Each row describes one disk. The first column shows its name, followed by the PVID and the volume group it belongs to. "None" in the last column indicates that the disk does not belong to any volume group. "Active" in the last column indicates that the volume group is varied on. The existence of a PVID indicates the possibility of data being present on the disk. It is possible that such a disk belongs to a volume group which is varied off. Executing lspv with a disk name generates information only about that device:
# lspv hdisk4
PHYSICAL VOLUME:  hdisk4                  VOLUME GROUP:  abc_vg
PV IDENTIFIER:    00c03c8a14fa936b        VG IDENTIFIER: 00c03b1a000
PV STATE:         active
STALE PARTITIONS: 0                       ALLOCATABLE:   yes
PP SIZE:          16 megabyte(s)
TOTAL PPs:        639 (10224 megabytes)
FREE PPs:         599 (9584 megabytes)
USED PPs:         40 (640 megabytes)
In the case of hdisk4, we are able to determine its size, the number of logical volumes (two), the number of physical partitions in need of synchronization (Stale Partitions) and the number of VGDAs. Executing lspv against a disk without volume group membership does nothing useful:
# lspv hdisk2
0516-304: Unable to find device id hdisk2 in the Device configuration database
How do you establish the capacity of a disk that does not belong to a volume group? The next command provides this in megabytes:
# bootinfo -s hdisk2
10240
The same (and much more) information can be retrieved by executing lsattr -El hdisk#:
# lsattr -El hdisk0
PCM             PCM/scsiscsd      Path Control Module    False
algorithm       fail_over         Algorithm              True
dist_err_pcnt   0                 Distributed Error %    True
dist_tw_width   50                Sample Time            True
hcheck_interval 0                 Health Check Interval  True
hcheck_mode     nonactive         Health Check Mode      True
max_transfer    0x40000           Maximum TRANSFER Size  True
pvid            00c609e0a5ec1460  Volume identifier      False
queue_depth     3                 Queue DEPTH            False
reserve_policy  single_path       Reserve Policy         True
size_in_mb      73400             Size in Megabytes      False
unique_id       26080084C1AF0FHU  Unique identifier      False
The last command can be limited to show only the size if executed as shown:
# lsattr -El hdisk0 -a size_in_mb
size_in_mb 73400 Size in Megabytes False
A disk can get a PVID in one of two ways: by virtue of membership in a volume group (when running the extendvg or mkvg commands) or as the result of executing the chdev command. The command lqueryvg helps to establish whether there is data on the disk or not.
# lqueryvg -Atp hdisk2
0516-320 lqueryvg: hdisk2 is not assigned to a volume group.
Max LVs:        256
PP Size:        26
Free PPs:       1117
LV count:       0
PV count:       3
Total VGDAs:    3
Conc Allowed:   0
MAX PPs per PV: 1016
MAX PVs:        32
Quorum (disk):  1
Total PPs:
LTG size:
HOT SPARE:
AUTO SYNC:
VG PERMISSION:
SNAPSHOT VG:
IS_PRIMARY VG:
PSNFSTPP:
VARYON MODE:
VG Type:
Max PPs:
When a disk does belong to a volume group, it is easy to notice: logical volume names in the lqueryvg output are the best proof of this. To display data stored on a disk you can use the command lquerypv. A PVID can be assigned to or removed from a disk, if it does not belong to a volume group, by executing the chdev command.
# chdev -l hdisk2 -a pv=clear
hdisk2 changed
# lspv | grep hdisk2
hdisk2          none            None
At times, it is required to restrict access to a disk or to its capacity. You can use the command chpv for this purpose. To prevent I/O access to a disk:
# chpv -v r hdisk2
To allow I/O:
# chpv -v a hdisk2
AIX was created years ago, when disks were very expensive. I/O optimization, the decision which part of the data will be read/written faster than other data, was determined by its position on the disk. Between I/Os, disk heads are parked in the middle; accordingly, the fastest I/O takes place in the middle. With this in mind, a disk is divided into five bands called: outer-edge, outer-middle, center, inner-middle and inner-edge. This method of assigning physical partitions (logical volumes) as a function of a band on a disk is called the intra-physical policy. This policy, and the policy defining the spread of a logical volume across disks (the inter-physical allocation policy), gains importance while creating logical volumes. Disk topology, the range of physical partitions in each band, is visualized with the commands lsvg -p vg_name and lspv hdisk#. Note the last two lines of the lspv output:
FREE DISTRIBUTION: 128..88..127..128..128
USED DISTRIBUTION: 00..40..00..00..00
The row labeled FREE DISTRIBUTION shows the number of free PPs in each band. The row labeled USED DISTRIBUTION shows the number of used PPs in each band. As you can see, some bands of this disk have no data. Presently, this policy has lost its meaning, as even the slowest disks are much faster than their predecessors. In the case of RAID or SAN disks, this policy has no meaning at all. For those who still use individual SCSI or SSA disks, it is good to remember that the data closer to the outer edge is read/written the slowest. To learn what logical volumes are located on a given disk, you can execute the command lspv -l hdisk#. The reversed relation is established by executing lslv -M lv_name. It is always a good idea to know what adapter and what bus any disk is attached to. Otherwise, if one of the disks breaks, how will you know which disk needs to be removed and replaced? AIX has many commands that can help you. It is customary to start from the adapter, to identify all adapters known to the kernel:
# lsdev -Cc adapter | grep -i scsi
scsi0 Available 1S-08 Wide/Ultra-3 SCSI I/O Controller
scsi1 Available 1S-09 Wide/Ultra-3 SCSI I/O Controller
scsi2 Available 1c-08 Wide/Fast-20 SCSI I/O Controller
The last command produced information about the SCSI adapters present during the last execution of the cfgmgr command. This output also allows you to establish in which drawer each adapter is located. The listing tells us that there are three SCSI adapters. The second column shows the device state (Available: ready to be used; Defined: device needs further configuration). The next column shows its location (drawer/bus). The last column contains a short description. Executing the lsdev command against a disk from rootvg produces:
# lsdev -Cc disk -l hdisk0 hdisk0 Available 1S-08-00-8,0 16 Bit LVD SCSI Disk Drive
From both outputs we can determine which SCSI adapter controls this disk: scsi0. Also, we see that the disk has SCSI ID 8,0. How do you determine the type/model/capacity/part number, etc.?
# lscfg -vl hdisk0 hdisk0 U0.1-P2/Z1-A8 16 Bit LVD SCSI Disk Drive (36400 MB)
Manufacturer................IBM Machine Type and Model......IC35L036UCDY10-0 FRU Number..................00P3831 ROS Level and ID............53323847 Serial Number...............E3WP58EC EC Level....................H32224 Part Number.................08K0293 Device Specific.(Z0)........000003029F00013A Device Specific.(Z1)........07N4972 Device Specific.(Z2)........0068 Device Specific.(Z3)........04050 Device Specific.(Z4)........0001 Device Specific.(Z5)........22 Device Specific.(Z6)........
You can get more details by executing command: lsattr -El hdisk0. This article has been based on an article published on wmduszyk.com. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
NOTE: before setting this environment variable, the previous commands in your history will have a question mark in the timestamp field. If you use the fc command, you will have to use the "-t" option to see the timestamp:
# fc -t
TOPICS: AIX, EMC, POWERHA / HACMP, STORAGE, STORAGE AREA NETWORK, SYSTEM ADMINISTRATION
To resolve this, you will have to make sure that the SCSI reset disk method is configured in HACMP. For example, when using EMC storage: Make sure emcpowerreset is present in /usr/lpp/EMC/Symmetrix/bin/emcpowerreset. Then add a new custom disk method:
Enter the SMIT fastpath for HACMP: smitty hacmp
Select Extended Configuration
Select Extended Resource Configuration
Select HACMP Extended Resources Configuration
Select Configure Custom Disk Methods
Select Add Custom Disk Methods
Change/Show Custom Disk Methods
Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                            [Entry Fields]
* Disk Type (PdDvLn field from CuDv)        disk/pseudo/power
* New Disk Type                             [disk/pseudo/power]
* Method to identify ghost disks            [SCSI3]
* Method to determine if a reserve is held  [SCSI_TUR]
* Method to break a reserve                 [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset]
  Break reserves in parallel                true
* Method to make the disk available         [MKDEV]
This will start it in the background and it will keep on running even if you log out.
Option three: Run it with an ampersand: command & This will run it in the background, but the process will be killed if you log out. You can avoid the process being killed by running: nohup command &.
Option four: Schedule it one time in the crontab.
With all options, make sure you redirect any output and errors to a file, like:
# command > command.out 2>&1
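A quick way to convince yourself that "2>&1" really captures both streams in the same file (a throwaway sketch; command.out is just an example file name):

```shell
# One line to stdout, one to stderr; both end up in command.out:
{ echo "to stdout"; echo "to stderr" >&2; } > command.out 2>&1
cat command.out
```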
is lost as soon as the file is modified in any way. To get this information, use the istat command, for example for the /etc/rc.tcpip file:
# ls -li /etc/rc.tcpip 8247 -rwxrwxr-- 1 root system 6607 Jan 06 06:25 /etc/rc.tcpip
The same type of information can be found using the fsdb command. Start the fsdb command with the file system where the file is located; in the example below, the root file system. Then type the number of the inode, followed by "i":
# fsdb /
File System:                  /
File System Size:             2097152   (512 byte blocks)
Disk Map Size:                20        (4K blocks)
Inode Map Size:               38        (4K blocks)
Fragment Size:                4096      (bytes)
Allocation Group Size:        2048      (fragments)
Inodes per Allocation Group:  4096
Total Inodes:                 524288
Total Fragments:              262144
8247i
i#:  8247  md: f---rwxrwxr--  ln: 1  uid: 0  gid: 0
szh: 0  szl: 6607
a0: 0x1203  a4: 0x00
at: Tue May 04 14:00:37 2010 mt: Wed Jan 06 06:25:49 2010 ct: Wed Jan 06 06:25:49 2010
# pagesize -a
4096
65536
16777216
17179869184
# pagesize -af
4K
64K
16M
16G
To learn more about the multiple page size support for AIX, please read the related whitepaper here.
df -I
The attribute "-I" (capital "i") for the df command can help you to show the actual used space within file systems, instead of giving you percentages with the regular df command:
# df -g
Filesystem    GB blocks   Free  %Used  Iused  %Iused  Mounted on
/dev/hd4           1.00   0.76    25%   5255      2%  /
/dev/hd2           4.00   1.20    70%  55403      6%  /usr
/dev/hd9var        1.00   0.74    27%   5324      3%  /var
/dev/hd3           1.00   0.54    46%    325      1%  /tmp
/dev/hd1           1.00   0.97     4%   1334      1%  /home
/proc                 -      -      -      -      -   /proc
/dev/hd10opt       0.50   0.31    39%   4162      4%  /opt
# df -gI
Filesystem    GB blocks  Used   Free  %Used  Mounted on
/dev/hd4           1.00  0.24   0.76    25%  /
/dev/hd2           4.00  2.80   1.20    70%  /usr
/dev/hd9var        1.00  0.26   0.74    27%  /var
/dev/hd3           1.00  0.46   0.54    46%  /tmp
/dev/hd1           1.00  0.03   0.97     4%  /home
/proc                 -     -      -      -  /proc
/dev/hd10opt       0.50  0.19   0.31    39%  /opt
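On a df without the -I flag you can still derive the Used column yourself: it is simply "GB blocks" minus "Free". For example, for the /dev/hd4 row:

```shell
# Used = total - free, for /dev/hd4 (1.00 GB total, 0.76 GB free):
echo "1.00 0.76" | awk '{ printf "%.2f\n", $1 - $2 }'
```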
On older AIX versions, or other UNIX operating systems, you may want to use the following command to get the same answer:
# perl -MPOSIX -le 'print time'
Getting this UNIX timestamp can be very useful when doing calculations with time stamps. If you need to convert a UNIX timestamp back to something readable:
now=`perl -MPOSIX -le 'print time'`
# 3 months ago =
# 30 days * 3 months * 24 hours * 60 minutes * 60 seconds =
# 7776000 seconds.
let threemonthsago="${now}-7776000"
perl -MPOSIX -le "print scalar(localtime($threemonthsago))"
Here's a simple command to convert a hexadecimal number to decimal. For example if you wish to convert hexadecimal "FF" to decimal:
# echo "ibase=16; FF" | bc
255
Sdiff
A very useful command to compare 2 files is sdiff. Let's say you want to compare the lslpp output from 2 different hosts; then sdiff -s shows the differences between the two files next to each other:
# sdiff -s /tmp/a /tmp/b
gskta.rte              7.0.3.27  |  gskta.rte              7.0.3.17
lum.base.cli           5.1.2.0   |  lum.base.cli           5.1.0.0
lum.base.gui           5.1.2.0   |  lum.base.gui           5.1.0.0
lum.msg.en_US.base.cli 5.1.2.0   |  lum.msg.en_US.base.cli 5.1.0.0
lum.msg.en_US.base.gui 5.1.2.0   |  lum.msg.en_US.base.gui 5.1.0.0
rsct.basic.sp          2.4.10.0  <
rsct.compat.basic.sp   2.4.10.0  <
rsct.compat.clients.sp 2.4.10.0  <
rsct.opt.fence.blade   2.4.10.0  <
rsct.opt.fence.hmc     2.4.10.0  <
bos.clvm.enh           5.3.8.3   |  bos.clvm.enh           5.3.0.50
                                 >  bos.loc.com.utf        5.3.9.0
                                 >  bos.loc.utf.EN_US      5.3.0.0
Machine Id: 00GB214D4C00
Node Id: blahblah
Class: O
Type: INFO
Resource Name: RMCdaemon
Probable Causes The current default log file has been renamed and a new log file created.
Failure Causes The current log file has become too large.
This error report entry refers to a file that was created, called /var/ct/IW/log/mc/default. Actually, when the file reaches 256 Kb, a new one is created, and the old one is renamed to default.last. The following messages can be found in this file:
2610-217 Received 193 unrecognized messages in the last 10.183333 minutes. Service is rmc.
This message more or less means: "2610-217 Received [count of unrecognized messages] unrecognized messages in the last [time] minutes. Service is [service_name]. Explanation: The RMC daemon has received the specified number of unrecognized messages within the specified time interval. These messages were received on the UDP port, indicated by the specified service name, used for communication among RMC daemons. The most likely cause of this error is that this port number is being used by another application. User Response: Validate that the port number configured for use by the Resource Monitoring and Control daemon is only being used by the RMC daemon." Check if something else is using the port of the RMC daemon:
# grep RMC /etc/services
rmc             657/tcp         # RMC
rmc             657/udp         # RMC
# lsof -i :657
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
rmcd    1384574 root    3u                  0t0  UDP  *:rmc
rmcd    1384574 root   14u                  0t0  TCP  *:rmc (LISTEN)
# netstat -Aan | grep 657
f1000600022fd398 tcp    0    0   *.657   *.*   LISTEN
f10006000635f200 udp    0    0   *.657   *.*
No, it is actually the RMC daemon that is using this port, so this is fine. Start an IP trace to find out who's transmitting to this port:
# iptrace -a -d host1 -p 657 /tmp/trace.out
# ps -ef | grep iptrace
    root 2040018 iptrace -a -d lawtest2 -p 657 /tmp/trace.out
# kill 2040018
iptrace: unload success!
# ipreport -n /tmp/trace.out > /tmp/trace.fmt
The IP trace report only shows messages from the RMC daemon of the HMC:
Packet Number 3
====( 458 bytes received on interface en4 )==== 12:12:34.927422418
ETHERNET packet : [14:5e:81:60:9d -> 14:5e:db:29:9a]  type 800 (IP)
IP header breakdown:
        < SRC = 10.231.21.55 > (hmc)
        < DST = 10.231.21.54 > (host1)
        ip_v=4, ip_hl=20, ip_tos=0, ip_len=444, ip_id=0, ip_off=0 DF
        ip_ttl=64, ip_sum=f8ce, ip_p = 17 (UDP)
UDP header breakdown:
        [ udp length = 424 | udp checksum = 6420 ]
00000000     0b005001 f0fff0ff e81fd7bf 01000100     |..P.............|
[remainder of the hex dump omitted]
Monitor the /var/ct/3410054220/log/mc/default file on the LPAR and make sure you see NEW 2610-217 errors logged after starting the trace; you may need to wait up to 10 minutes (one 2610-217 entry is logged every 10 minutes). To monitor the default file, do:
# tail -f /var/ct/3410054220/log/mc/default
When analyzing the data you may find several nodeid's in the packets. On the HMC side, you can run /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc to find out if, say, 22758085eb959fec is managed by the HMC. You will need root access on the HMC to run this command; you can get a temporary password from IBM to use with the pesh command as the hscpe user to get this root access. This command will list the managed systems known to the HMC and their nodeid's. Then, on the actual LPARs, run /usr/sbin/rsct/bin/lsnodeid to determine the nodeid of each LPAR. If you find any discrepancies between the HMC's listing of nodeid's and the nodeid's found on the LPARs, then that is what's causing the errpt message to appear about the change of the log file. To solve this, you have to recreate the RMC daemon databases on both the HMC and on the LPARs that have this issue. On the HMC side run:
# /usr/sbin/rsct/bin/rmcctrl -z
# /usr/sbin/rsct/bin/rmcctrl -A
# /usr/sbin/rsct/bin/rmcctrl -p
Repeat this for every LPAR connected to the HMC. After that, you can run on the HMC again:
# /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc
# /usr/sbin/rsct/bin/lsrsrc IBM.ManagedNode Hostname UniversalId
After that, all you have to do is check on the LPARs whether any new messages are still being logged at 10-minute intervals:
# ls -als /var/ct/IW/log/mc/default
So far, so good. What if you have plenty of systems and you want to automate this? Here's a script to do just that. The script first runs wget to collect the latest catalog.mic file from the IBM website. Then it distributes this catalog file to all the hosts you want to check. Then, it runs invscout on all these hosts, and collects the hostname.mup files. It concatenates all these files into 1 large file and does an HTTP POST through curl to upload the file to the IBM website and have a report generated from it. So, what do you need?
- An AIX jump server that allows you to access the other hosts as user root through SSH, so you should have set up your SSH keys for user root.
- This jump server must have access to the Internet.
- wget and curl installed. Get them from the Linux Toolbox.
- Your servers should be AIX 5 or higher. It doesn't really work with AIX 4.
- Optional: a web server, like Apache 2, would be nice, so you can drop the resulting HTML file on your website every day.
- An entry in the root crontab to run this script every day.
- A list of servers you want to check.
Here's the script:
#!/bin/ksh
# script: generate_survey.ksh

# NOTE: the variable definitions were missing from the original
# listing; the values below are typical defaults - adjust them
# to your environment.
TEMP=/tmp/survey.$$                          # temporary working directory
MIC=/var/adm/invscout/microcode/catalog.mic  # invscout catalog file
INV=/var/adm/invscout                        # directory holding the *.mup files
SERVERS=/usr/local/etc/servers.list          # list of servers to check
APA=/usr/local/apache2/htdocs                # Apache document root
FROM=root                                    # mail sender
TO=admin@example.com                         # mail recipient (placeholder)
SUBJ="Microcode survey report"               # mail subject
# user check
USER=`whoami`
if [ "$USER" != "root" ] ; then
        echo "Only root can run this script."
        exit 1
fi
# create a temporary directory
rm -rf $TEMP 2>/dev/null
mkdir $TEMP 2>/dev/null
cd $TEMP
# get the latest catalog.mic file from IBM
# you need to have wget installed and accessible in $PATH
# you can download it from:
# www-03.ibm.com/systems/power/software/aix/linux/toolbox/download.html
wget techsupport.services.ibm.com/server/mdownload/catalog.mic
# You could also use curl here, e.g.:
# curl techsupport.services.ibm.com/server/mdownload/catalog.mic -LO
# move the catalog.mic file to this server's invscout directory
mv $TEMP/catalog.mic $MIC
# remove any old mup files
echo "Remove any old mup files from hosts."
for server in `cat $SERVERS` ; do
        echo "${server}"
        ssh $server "rm -f $INV/*.mup"
done
# distribute this file to all other hosts
for server in `cat $SERVERS` ; do
        echo "${server}"
        scp -p $MIC $server:$MIC
done
# run invscout on all these hosts
# this will create a hostname.mup file
for server in `cat $SERVERS` ; do
        echo "${server}"
        ssh $server invscout
done
# collect the hostname.mup files
for server in `cat $SERVERS` ; do
        echo "${server}"
        scp -p $server:$INV/*.mup $TEMP
done
# concatenate all hostname.mup files to one file cat ${TEMP}/*mup > ${TEMP}/muppet.$$
# upload the remaining file to IBM.
# you need to have curl installed for this
# you can download it from:
# www-03.ibm.com/systems/power/software/aix/linux/toolbox/download.html
# you can install it like this:
# rpm -ihv curl-7.9.3-2.aix4.3.ppc.rpm curl-devel-7.9.3-2.aix4.3.ppc.rpm
# more info on using curl can be found on: # http://curl.haxx.se/docs/httpscripting.html # more info on uploading survey files can be found on: # www14.software.ibm.com/webapp/set2/mds/fetch?pop=progUpload.html
# Sometimes, the IBM website will respond with an
# "Expectation Failed" error message. Loop the curl command until
# the upload succeeds.
# NOTE: the curl upload command itself was missing from this
# listing; the form field and URL below are assumptions - see the
# progUpload page mentioned above for the exact syntax. The
# -H "Expect:" header suppresses curl's "Expect: 100-continue"
# behavior, which is what usually triggers "Expectation Failed".
stop="false"
while [ "${stop}" = "false" ] ; do
        curl -H "Expect:" --form mdsData=@${TEMP}/muppet.$$ \
                www14.software.ibm.com/webapp/set2/mds/mds \
                -o ${TEMP}/survey.html && stop="true"
        sleep 10
done
# now it is very useful to have an apache2 webserver running
# so you can access the survey file
mv $TEMP/survey.html $APA
# tip: put in the crontab daily like this: # 45 9 * * * /usr/local/sbin/generate_survey.ksh 1>/dev/null 2>&1
# mail the output
# need to make sure this is sent in html format
# (note the empty line after the last header, so sendmail -t
# treats the appended file as the message body)
cat - ${APA}/survey.html <<HERE | sendmail -oi -t
From: ${FROM}
To: ${TO}
Subject: ${SUBJ}
Mime-Version: 1.0
Content-type: text/html
Content-transfer-encoding: 8bit

HERE
Proceed to do an upgrade, change the bootlist, exit the shell. Server will boot with new TL over the p6. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
stream
f1df487f8
If you have lsof installed, you can get the same result with the lsof command:
# lsof -i :[PORT]
Example:
# lsof -i :5710
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
oracle  2638066 oracle  18u                      TCP  host:5710
SCP Stalls
When you encounter an issue where ssh through a firewall works perfectly, but scp of large files (for example mksysb images) stalls, then there's a solution to this problem: add "-l 8192" to the scp command. The reason for scp to stall is that scp greedily grabs as much bandwidth of the network as possible when it transfers files; any delay caused by the network switch or the firewall can easily stall the TCP connection. Adding the option "-l 8192" limits the scp session bandwidth to 8192 Kbit/second, which seems to work safely and fast enough (up to 1 MB/second):
# scp -l 8192 SOURCE DESTINATION
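The arithmetic behind picking 8192: the -l limit is expressed in Kbit/s, and 8 Kbit = 1 KB, so:

```shell
# 8192 Kbit/s divided by 8 bits-per-byte = 1024 KB/s = 1 MB/s
echo "$((8192 / 8)) KB/s, i.e. about 1 MB/s"
```

Scale the figure up or down in the same way if your network tolerates a different rate.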
Setting up for Kernel boot trace: When the debugger screen appears, set enter_dbg to the value we want to use:
************* Welcome to KDB *************
Call gimmeabreak...
Static breakpoint:
.gimmeabreak+000000     tweq    r8,r8   r8=0000000A
.gimmeabreak+000004     blr
<.kdb_init+0002C0>      r3=0
KDB(0)> mw enter_dbg
enter_dbg+000000:  00000000  =  42
xmdbg+000000:      00000000  =  .
KDB(0)> g
Now, detailed boot output will be displayed on the console. If your system completes booting, you will want to turn enter_dbg off:
************* Welcome to KDB *************
Call gimmeabreak...
Static breakpoint:
.gimmeabreak+000000     tweq    r8,r8   r8=0000000A
.gimmeabreak+000004     blr
<.kdb_init+0002C0>      r3=0
KDB(0)> mw enter_dbg
enter_dbg+000000:  00000042  =  0
xmdbg+000000:      00000000  =  .
KDB(0)> g
back to the old rootvg disk, like nothing ever happened. First, make sure every logical volume in the rootvg has a name that consists of 11 characters or less (if not, the alt_disk_copy command will fail). To create a copy on hdisk1, type:
# alt_disk_copy -d hdisk1
If you now restart your system from hdisk1, you will notice, that the original rootvg has been renamed to old_rootvg. To delete this volume group (in case you're satisfied with the new rootvg), type:
# alt_rootvg_op -X old_rootvg
A very good article about alternate disk installs can be found on developerWorks. If you wish to copy a mirrored rootvg to two other disks, make sure to use quotes around the target disks, e.g. if you wish to create a copy on disks hdisk4 and hdisk5, run:
# alt_disk_copy -d "hdisk4 hdisk5"
Installation history
A very easy way to see what was installed recently on your system:
# lslpp -h
# smitty hostname
Change the name of the node (which changes what the uname command reports) by choosing one of the following. Command line method:
# uname -S [newhostname]
Change the /etc/hosts file to reflect the new hostname. Change DNS name server, if applicable. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
7. You're done!
# echo "Running:\t`bootinfo -K` bits mode"
Running:        32 bits mode
# ls -l /unix /usr/lib/boot/unix
lrwxrwxrwx 1 root system 21 Aug 15 2006 /unix -> /usr/lib/boot/unix_mp
lrwxrwxrwx 1 root system 21 Aug 15 2006 /usr/lib/boot/unix -> /usr/lib/boot/unix_mp
To switch from 32-bit mode to 64-bit mode run the following commands, in the given order:
# ln -sf /usr/lib/boot/unix_64 /unix
# ln -sf /usr/lib/boot/unix_64 /usr/lib/boot/unix
# bosboot -ad /dev/ipldevice
# shutdown -Fr
To switch from 64-bit mode to 32-bit mode run the following commands, in the given order:
# ln -sf /usr/lib/boot/unix_mp /unix
# ln -sf /usr/lib/boot/unix_mp /usr/lib/boot/unix
# bosboot -ad /dev/ipldevice
# shutdown -Fr
Bootinfo vs Getconf
The command /usr/sbin/bootinfo has traditionally been used to find out information regarding system boot devices, kernel versions, and disk sizes. This command has been deprecated in favor of the command /usr/bin/getconf. The bootinfo man page has been removed, and the command is only used in AIX by the booting and software installation utilities. It should not be used in customer-created shell scripts or run by hand. The getconf command will report much of the same information that bootinfo does. What was the device the system was last booted from?
# getconf BOOT_DEVICE hdisk0
In what bit mode is the kernel running?

# getconf KERNEL_BITMODE
64
For example, you can add this line to /etc/profile, and have the hostname of the PuTTY title set automatically. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
2. Check that <directory>/installp/ppc contains all install images.
3. If not already, remove <directory>/usr/sys/inst.images. This directory also might contain all installation images.
4. Create a link <directory>/usr/sys/inst.images pointing to <directory>/installp/ppc.
5. Find all .toc files in the directory structure and, if necessary, change all vol%# entries to vol%1 (there should be at least 2 .toc files that need these updates). You have to change vol%2 to vol%1, vol%3 to vol%1, etcetera, up till vol%8.
6.
7.
Now you've created an ISO image that you can burn to a DVD. Some specific information on burning this ISO image to a DVD-RAM on AIX: burning a DVD-RAM is somewhat more difficult than burning a CD. First, it depends on whether you have a slim-line DVD-RAM drive in a Power5 system or a regular (not slim-line) DVD-RAM drive in a Power4 system. Use DLPAR to move the required SCSI controller to an LPAR, in order to be able to use the DVD-RAM drive. After the DLPAR action of the required SCSI controller is complete, execute cfgmgr. After the configuration manager has run, you will end up with either 1 or 2 DVD drives (depending on the actual drives in the hardware frame):
# lsdev -Cc cdrom cd0 Available 3F-09-00-0,0 SCSI DVD-RAM Drive cd1 Available 3F-09-00-5,0 16 Bit LVD SCSI DVD-ROM Drive
As you can see, the first is the DVD-RAM, the second is a DVD-ROM. Therefore, we will use the first one (in this example). Place a DVD-RAM single-sided 4.7 GB Type II disc (part number 19P0862) in the drive. DO NOT USE ANY OTHER TYPE OF DVD-RAM DISCS. OTHER TYPES OF DISCS ARE NOT SUPPORTED BY IBM. In case you have a POWER4 system: be sure to keep the DVD-RAM in its case in order to burn the DVD. DVD-RAM drives in Power4 systems will NOT burn if you remove the DVD-RAM from its case. Also, be sure to have the latest firmware level on the DVD-RAM drive (see the website http://www14.software.ibm.com/webapp/set2/firmware for the correct level of the firmware for your drive). Without this firmware level these DVD-RAM drives are unable to burn Type II DVD-RAM discs. Using lscfg -vl cd0 you can check the firmware level:
# lscfg -vl cd0 cd0 U1.9-P2-I1/Z2-A0 SCSI DVD-RAM Drive (4700 MB)
Manufacturer................IBM
Machine Type and Model......DVRM00203
ROS Level and ID............A132
Device Specific.(Z0)........058002028F000010
Part Number.................04N5272
EC Level....................F74471
FRU Number..................04N5967
The firmware level of this DVD-RAM drive is "A132". This level is too low to burn Type II discs. Check the website for the latest level. The description on this webpage on how to install the DVD-RAM firmware was found to be inaccurate. Install the firmware as follows: download the firmware file and place it in /tmp on the server. You will get a filename with an "rpm" extension. Run:
# rpm -ihv --ignoreos <filename>
Example:
# rpm -ihv --ignoreos /tmp/ibm-scsi-dvdram.dvrm00203-A151.rpm
ibm-scsi-dvdram.dvrm00203   #############################
(Beware of the double dash before "ignoreos"!!). This command will place the microcode in /etc/microcode. Run:
# diag -d cd0 -c -T "download -s /etc/microcode -f"
This will install the firmware. Use the correct DVD-RAM drive (in this case cd0) to install the firmware!!
# diag -d cd0 -c -T "download -s /etc/microcode -f" Installation of the microcode has completed successfully. The current microcode for cd0 is IBM-DVRM00203.A151. Please run diagnostics on the device to ensure that it is functioning properly.
Burning a DVD-RAM can take a long time. Variable burn times from 1 to 7 hours were seen!!! A DVD-RAM made in a slim-line DVD drive on a Power5 system can be read in a regular DVD drive on a Power4 system, if the latest firmware is installed on the DVD drive.
On a Linux system you can use a tool like K3B to write the ISO image to a regular DVD+R disc. TOPICS: HMC, SYSTEM ADMINISTRATION
(q to quit): 3
----------------------------------------------------------
Here's where you may get stuck. Vtmenu allows you to select a frame, but won't show any partition to start a virtual terminal window on. That seems obvious, because there aren't any partitions available (fullSystemPartition only). The solution is to run: mkvterm -m 10ZZZZZ-ZZZZ. This opens the virtual terminal window all right. When you're done, you can log out by using "~.". And if someone else is using the virtual terminal window, and you wish to close it, run rmvterm -m 10ZZZZZ-ZZZZ. In case you're wondering how to figure out the managed machine name to use with the
mkvterm and rmvterm commands, simply run vtmenu first. It shows you a list of managed machines controlled by this HMC. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
Portmir
A very nice command to use when you either want to show someone remotely how to do something on AIX, or to allow a non-root user to have root access, is portmir. First of all, you need 2 users logged into the system, you and someone else. Ask the other person to run the tty command in his/her telnet session and to tell you the result. For example:
user$ tty
/dev/pts/1
(Of course, fill in the correct number of your system; it won't be /dev/pts/1 all the time everywhere!) Then, from your own session, start mirroring that port:

# portmir -t /dev/pts/1

Now every command on screen 1 is repeated on screen 2, and vice versa. You can both run commands on 1 screen. You can stop it by running:
# portmir -o
If you're the root user and the other person temporarily requires root access to do something (and you can't solve it by giving the other user sudo access, hint, hint!), then you can su - to root in the portmir session, allowing the other person to have root access, while you can see what he/she is doing. You may run into issues when you resize a screen, or if you use different types of terminals. Make sure you both have the same $TERM setting, i.e. xterm. If you resize the screen, and the other doesn't, you may need to run the tset and/or resize commands. TOPICS: AIX, BACKUP & RESTORE, STORAGE, SYSTEM ADMINISTRATION
JFS2 snapshots
JFS2 filesystems allow you to create file system snapshots. Creating a snapshot is actually creating a new file system, with a copy of the metadata of the original file system (the snapped FS). The snapshot (like a photograph) remains unchanged, so it's possible to back up the snapshot, while the original data can be used (and changed!) by applications. When data on the original file system changes while a snapshot exists, the original data is copied to the snapshot to keep the snapshot in a consistent state. For these changes you'll need temporary space, thus you need to create a snapshot of a specific size to allow updates while the snapshot exists. Usually 10% is enough. Database file systems are usually not a very good candidate for creating snapshots, because all database files change constantly when
the database is active, causing a lot of copying of data from the original to the snapshot file system. In order to have a snapshot you have to:
- Create and mount a JFS2 file system (the source FS). You can find it in SMIT as an "enhanced" file system.
- Create a snapshot of a size big enough to hold the changes of the source FS by issuing smitty crsnap. Once you have created this snapshot as a logical device or logical volume, there's a read-only copy of the data in the source FS.
- Mount this device in order to work with the data, by issuing smitty mntsnap. You have to provide a directory name over which AIX will mount the snapshot. Once mounted, this device will be read-only.
Creating a snapshot of a JFS2 file system:
# snapshot -o snapfrom=$FILESYSTEM -o size=${SNAPSIZE}M
Where $FILESYSTEM is the mount point of your file system and $SNAPSIZE is the amount of megabytes to reserve for the snapshot. Check if a file system holds a snapshot:
# snapshot -q $FILESYSTEM
When the snapshot runs full, it is automatically deleted. Therefore, create it large enough to hold all changed data of the source FS. Mounting the snapshot: Create a directory:
# mkdir -p /snapshot$FILESYSTEM
Mount the snapshot device over it; the snapshot logical volume is shown by snapshot -q:

# mount -v jfs2 -o snapshot /dev/[snapshot_lv] /snapshot$FILESYSTEM

Now you can backup your data from the mountpoint you've just mounted. When you're finished with the snapshot: Unmount the snapshot filesystem:
# unmount /snapshot$FILESYSTEM
When you restore data from a snapshot, be aware that the backup of the snapshot is actually a different file system in your backup system, so you have to specify a restore destination to restore the data to. TOPICS: AIX, LVM, SYSTEM ADMINISTRATION
Cec Monitor
To monitor all lpars within 1 frame, use:
# topas -C
The yes command will continuously echo "yes" to /dev/null. This is a single-threaded process, so it will put load on a single processor. If you wish to put load on multiple processors, why not run yes a couple of times? TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION
If you get this error from crontab, then you should add an extra backslash before the backslash-semicolon. Use the find command like this:
0 2 * * * find /tmp -mtime +5 -type f -exec rm {} \\;
Determining microcodes
A very useful command to list microcodes is lsmcode:
# lsmcode -c
This should give you a number well over 1 billion. How many seconds is 1 week? (7 days, 24 hours a day, 60 minutes an hour, 60 seconds a minute):
# let week=7*24*60*60
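The same arithmetic, written out, also reproduces the 7776000-second figure used for the three-month window earlier in this document:

```shell
# seconds in one week, and in three 30-day months
week=$((7 * 24 * 60 * 60))            # 604800
threemonths=$((3 * 30 * 24 * 60 * 60))  # 7776000
echo "week=$week threemonths=$threemonths"
```

Subtracting such a value from the current UNIX timestamp gives the cut-off point for date calculations like the one shown above.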
You should get something like: Sat Sep 17 13:50:26 2005 TOPICS: AIX, SYSTEM ADMINISTRATION
Printing to a file
To create a printer queue that dumps its contents to /dev/null:
# /usr/lib/lpd/pio/etc/piomkpq -A 'file' -p 'generic' -d '/dev/null' -D asc -q 'qnull'
This command will create a queue named "qnull", which dumps its output to /dev/null. To print to a file, do exactly the same, except change /dev/null to the complete path of the file you'd like to print to (and pick a queue name such as "qfile"). Make sure the file you're printing to exists and has the proper access rights. Now you can print to this file queue:
# lpr -Pqfile /etc/motd
and the contents of your print will be written to a file. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
This will create a file called file.iso. Make sure you have enough storage space. Transfer this file to a PC with a CD-writer in it. Burn this ISO file to CD using Easy CD Creator or Nero.
The CD will be usable in any AIX CD-ROM drive. TOPICS: AIX, LINUX, SYSTEM ADMINISTRATION
You might also have run into the problem that, when FTP'ing CD software on a Windows PC to a remote AIX system, files with lowercase names suddenly change to uppercase file names. This is how to copy the complete contents of a CD on a Red Hat Linux system to a remote AIX system as a tar file: Login as root on the Linux system. Mount the CD-ROM:
# mount /mnt/cdrom
# cd /mnt/cdrom
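The actual copy step fell outside this excerpt, but the idea is to stream the tree through tar, which preserves the original (lowercase) file names that FTP from Windows can mangle. A sketch, with both ends local for illustration; for a remote AIX target you would replace the receiving subshell with something like ssh user@aixhost "cd /target/dir && tar xf -" (user, host and path being placeholders):

```shell
# demo directories instead of /mnt/cdrom and the remote target
SRC=/tmp/tar_case_demo_src ; DST=/tmp/tar_case_demo_dst
rm -rf "$SRC" "$DST" ; mkdir -p "$SRC" "$DST"
touch "$SRC/MixedCase.README" "$SRC/lowercase.txt"
# stream the tree as a tar archive into the receiving directory
( cd "$SRC" && tar cf - . ) | ( cd "$DST" && tar xf - )
ls "$DST"    # both names keep their original case
```

Because the data travels as a tar stream, no intermediate file is needed and name case survives the transfer.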
Important note: make sure you can write with your user-ID in the target folder on the target system. Otherwise your tar file might end up in the home directory of the user-ID used. TOPICS: AIX, SECURITY, SYSTEM ADMINISTRATION
This will prompt you for a secret passphrase. If this is your primary identity key, use an empty passphrase (which is not secure, but the easiest to work with). If this works right, you will get two files called id_dsa and id_dsa.pub in your .ssh dir. Copy the id_dsa.pub file to the other host's .ssh dir with the name authorized_keys2:
# scp ~/.ssh/id_dsa.pub serverB:.ssh/authorized_keys2
Now serverB is ready to accept your ssh key. For a test, type:
# ssh serverB
This should let you in without typing a password or passphrase. Hooray! You can ssh and scp all you want and not have to type any password or passphrase.
Fast IPL
Using FAST IPL only works on some RS6000 systems, like SP's or J/G/R30/40's. To configure FAST IPL:
# mpcfg -cf 11 1
If you can only use a terminal to configure the Fast IPL: Put the key into service mode, press [ENTER] on the keyboard. Then type: sbb. Using the menu you can configure Fast IPL. Then reboot and switch the key back to Normal. TOPICS: AIX, SYSTEM ADMINISTRATION
This shows you the last 100 entries. TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION
DVD-RAM Backup
You can use a DVD-RAM to create a system backup. To do so, enter:
# smitty mkdvd
This works in AIX 5.2 and above. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
If you get an error, ensure /etc/vfs contains this line (and retry the mount command after validating):
udfs 34 /sbin/helpers/udfmnthelp
You can also use prtconf. TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION
/dev/ipldevice gone?
Sometimes, when you create an mksysb, you receive an error like this one:
/dev/ipldevice not found
Device /dev/ipldevice is a hard link to the disk your system booted from. Mksysb tries to determine the size of the boot logical volume with the bosboot -qad /dev/ipldevice command. Via lslv -m hd5 you can see from which disk the system was booted (or via bootlist -m normal -o).
For example:
ln /dev/rhdisk0 /dev/ipldevice
Note: Use "rhdisk" and not "hdisk". Another way to solve this problem: reboot your system and the /dev/ipldevice will be created automatically for you (Your users may prefer the first solution...). TOPICS: AIX, SSA, STORAGE, SYSTEM ADMINISTRATION
Renaming pdisks
If, for some reason, the pdisk and hdisk numbering of SSA disks is not sequential anymore, then there's a way to bring order in to chaos. Usually, the pdisk and hdisk numbering order are screwed up when you replace multiple disks together. Especially on HACMP clusters, a correct numbering of pdisks and hdisks on all nodes of the cluster, comes in handy. Unmount all file systems on the specific disks, then varyoff the volume group:
# /usr/lib/methods/cfgssar -l ssar
If this doesn't help (it sometimes will), then renumber the disks manually: Write down the pdisk names, hdisk names, location of the disks in the SSA drawer and the connection ID's of the disks. You can use lsdev -Cc pdisk to show you all the pdisks and the drawer and location codes. Use lsdev -Cl pdiskX -F connwhere to show the connection ID of a pdisk. Then, figure out how you want all disks numbered. Remove the pdisks and hdisks with the rmdev -dl command. Create the pdisks again:
# mkdev -p ssar -t scsd -c pdisk -s ssar -w [connection-ID] -l pdisk1
Test with:
# ssaxlate -l pdisk1
and check if it shows hdisk3 (usually the hdisk number is 2 higher than the pdisk number if you use 2 SCSI disks in the rootvg). If you've done all disks this way, check with lsdev -Cc pdisk. If you're happy, then varyon the volume group again and mount all filesystems. TOPICS: AIX, ODM, SYSTEM ADMINISTRATION
You can automatically forward all error report entries to your email. This next part describes how to do that. Create a file like this:
# cat /tmp/mailgeorge
errnotify:
        en_name="mailgeorge"
        en_persistenceflg=1
        en_method="errpt -a -l $1|mail -s \"errpt: $9\" george@email.com"

Add this to the ODM:

# odmadd /tmp/mailgeorge

Then log a test entry in the error log:

# errlogger "Testing the errnotify method"

The resulting entry looks like this:
Date/Time:        Tue Oct
Sequence Number:  585
Machine Id:       0004D6EC4C00
Node Id:          hostname
Class:            O
Type:             TEMP
Resource Name:    OPERATOR
Clear the error log again (because we logged a fake test-entry in the error report):
# errclear 0
Watch your email. You should receive the same error report entry in your email. By the way, you can delete this from the ODM like this:
# odmdelete -q 'en_name=mailgeorge' -o errnotify
PerfPMR
When you suspect a performance problem, PerfPMR can be run. This is a tool generally used by IBM support personnel to resolve performance related issues. The download site for this tool is: ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
will show you whether tcp_pmtu_discover and udp_pmtu_discover are enabled (1) or disabled (0). Disable them with:
# no -p -o tcp_pmtu_discover=0 # no -p -o udp_pmtu_discover=0
If these are disabled, you shouldn't see any ICMP messages any more. When one system tries to optimize its transmissions by discovering the path MTU, a pmtu entry is created in a Path MTU (PMTU) table. You can display this table using the pmtu display command. To avoid the accumulation of pmtu entries, unused pmtu entries will expire and be deleted when the pmtu_expire time (no -o pmtu_expire) is exceeded; by default after 10 minutes. TOPICS: AIX, SYSTEM ADMINISTRATION
Without compression, sysdumpdev -e will estimate the system dump size. To turn compression on:
# sysdumpdev -C
This will reduce the required (estimated) dump size by a factor of 5 to 7. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
This will transfer a file of 32K * 1024 = 32 MB. The transfer information will be shown by FTP. TOPICS: AIX, PERFORMANCE, STORAGE, SYSTEM ADMINISTRATION
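The dd command that the following explanation refers to fell outside this excerpt; reconstructed here as a sketch from the block counts given, together with a scaled-down variant you can actually run quickly:

```shell
# full-size command implied by the text below
# (2097152 blocks of 1024 bytes = 2 GB):
#   dd if=/dev/zero of=./test.large.file bs=1024 count=2097152
# scaled-down variant (1024 blocks of 1024 bytes = 1 MB):
dd if=/dev/zero of=./test.large.file bs=1024 count=1024 2>/dev/null
wc -c < ./test.large.file    # byte count of the resulting file
```

Raise count back to 2097152 to reproduce the 2 GB file discussed below.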
This will create a file consisting of 2097152 blocks of 1024 bytes, which is 2 GB. You can change the count value to anything you like. Be aware that if you wish to create files larger than 2 GB, your file system needs to be created as a "large file enabled file system"; otherwise the upper file size limit is 2 GB (under JFS; under JFS2 the upper limit is 64 GB). Also check the ulimit values of the user-id you use to create the large file: set the file limit to -1, which is unlimited. Usually, the file limit is set by default to 2097151 in /etc/security/limits, which stands for 2097151 blocks of 512 bytes = 1 GB. Another way to create a large file is:
# /usr/sbin/lmktemp ./test.large.file 2147483648
This will create a file of 2147483648 bytes (which is 1024 * 2097152 = 2GB). You can use this large file for adapter throughput testing purposes: Write large sequential I/O test:
# cd /BIG # time /usr/sbin/lmktemp 2GBtestfile 2147483648
Divide 2048/#seconds for MB/sec write speed. Read large sequential I/O test:
# umount /BIG
# mount /BIG
# time cat /BIG/2GBtestfile > /dev/null

(The umount/mount clears the file system cache, so the subsequent read comes from disk.)
Divide 2048/#seconds for MB/sec read speed. Tip: Run nmon (select a for adapter) in another window. You will see the throughput for each adapter. More information on JFS and JFS2 can be found here. TOPICS: AIX, NETWORKING, SYSTEM ADMINISTRATION
This command will permanently bring down the en0 interface (permanently means after reboot). TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION
In each lv_data stanza of this file, change the values of the COPIES= line by one-half (i.e. copies = 2, change to copies = 1). Also change the PPs to match the LPs as well. Create a new mksysb, utilizing the /image.data file:
# mksysb /dev/rmt0
(Do not use smit and do not run with the -i flag, both will generate a new image.data file). Use this mksysb to restore your system on another box without mirroring.
In the output directory an index.html and several gif files will be created. By accessing index.html in a web browser, you can view the graphs. Tip: use nweb as your web server. TOPICS: AIX, SYSTEM ADMINISTRATION
The -xdev flag is used to only search within the same file system, instead of traversing the full directory tree. The amount specified (1024) is in blocks of 512 bytes. Adjust this value for the size of files you're looking for. TOPICS: AIX, SYSTEM ADMINISTRATION
when using iptrace or tcpdump, then this is probably caused by a kernel extension already loaded. To resolve this, run:
# iptrace -u
After this, the kernel extension is removed and iptrace or tcpdump will work again. TOPICS: AIX, SYSTEM ADMINISTRATION
E.g.
# ls | xargs grep -i "error"
Another way is to use who to check out your current users and their terminals. Kill all processes related to a specific terminal:
# fuser -k /dev/pts/[#]
Yet another method: Su to the user-id you wish to kill all processes of and enter:
# su - [user-id] -c "kill -9 -1"
Svmon shows you the amount of memory in 4KB blocks. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION
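Converting svmon's 4 KB pages into megabytes is simple arithmetic; the page count below is a made-up example:

```shell
# svmon reports memory in 4 KB pages, so 262144 pages equals:
pages=262144
mb=$(( pages * 4 / 1024 ))
echo "${pages} pages = ${mb} MB"
```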
Automatic FTP
How to do an automatic FTP from within a script: -n prevents automatic login and -v puts ftp in verbose mode. $asciorbin_Type should be set to either ascii or binary. Afterwards, grep for $PHRASE (a particular return code) in $LOGFILE to determine success or failure.
ftp -nv $REMOTE_MACHINE > $LOGFILE <<EOF
user $USER_NAME $USER_PASSWORD
$asciorbin_Type
cd $REMOTE_PATH
Where script.sh is the script you've written. Modify your /etc/services file and add:
check 4321/tcp
You may change the portnumber to anything you like, as long as it's not in use. Now, you may run:
# telnet [system] 4321
And your script will be magically run and it's output displayed on your screen. If the output of the script isn't displayed on your screen very long, just put a sleep command at the end of your script. TOPICS: AIX, SYSTEM ADMINISTRATION
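For completeness: besides the /etc/services entry, inetd itself needs to know which program to start for the new service. An /etc/inetd.conf entry along the following lines would do it (the script path here is an assumption; adjust it to wherever your script actually lives):

```
check  stream  tcp  nowait  root  /usr/local/bin/script.sh  script.sh
```

Afterwards, refresh inetd with "refresh -s inetd" so it picks up the new service.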
Defunct processes
Defunct processes are commonly known as "zombies". You can't "kill" a zombie as it is already dead. Zombies are created when a process (typically a child process) terminates, either abnormally or normally, and its spawning process (typically a parent process) does not "wait" for it (or has yet to "wait" for it) to return an exit status.
It should be noted that zombies DO NOT consume any system resources (except a process slot in the process table). They remain until the parent waits for them, until the parent itself terminates (at which point init inherits and reaps them), or until the server is rebooted. Zombies commonly occur in programs that were (incompletely) ported from old BSD systems to modern SysV systems, because the semantics of signals and/or waiting differ between these two OS families. See: http://www.hyperdictionary.com/dictionary/zombie+process TOPICS: AIX, SYSTEM ADMINISTRATION
All PCI devices still in use can't be removed. The one not in use is the PCI device on which the DVD-ROM drive was configured. You have to remove the DVD-ROM device before you can do a DLPAR remove operation on that PCI device. Now do your DLPAR remove operation. TOPICS: AIX, SYSTEM ADMINISTRATION
For example:
# uuencode /etc/motd motd.b64 | mail -v -s "Message of the day" email@hostname.com
The .b64 extension gets recognized by Winzip. When you receive your email in Outlook, you will have an attachment, which can be opened by Winzip (or any other unzip tool). You can combine this into a one-liner:
# ( echo "This is the body";uuencode /etc/motd motd.b64 ) | mail -s "This is the subject" email@hostname.com
If you want to attach tar or gzip images to an e-mail, you can also simply use those extensions, as they are also properly recognized by Winzip:
# uuencode file.tar file.tar | mailx -s "subject" email@hostname.com
# uuencode file.tar.gz file.tar.gz | mailx -s "subject" email@hostname.com
FTP umask
A way to change the default 027 umask of ftp is to change the entry in /etc/inetd.conf for ftpd:
ftp stream tcp6 nowait root /usr/sbin/ftpd -l -u 117
This will create files with umask 117 (mode 660). Using the -l option will make sure the FTP sessions are logged to the syslogd. If you want to see these FTP messages in the syslogd output, then you should add to /etc/syslog.conf:
daemon.info [filename]
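The effect of a 117 umask on newly created files can be verified on any system; the temporary file here is just an example:

```shell
# Files are created with mode 666 minus the masked bits:
# 666 with umask 117 leaves rw-rw---- (660).
tmpf=$(mktemp -u)
( umask 117; touch "$tmpf" )
ls -l "$tmpf"
rm -f "$tmpf"
```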
Control-M
When exchanging text files between Windows and AIX systems, you often run into ^M (CTRL-M) characters at the end of each line in a text file. To remove these ugly characters:
tr -d '^M' < [inputfile] > [outputfile]
To type the ^M character on the command line: press CTRL-V and then CTRL-M. Another way: download this zip archive: controlm.zip (1KB). This zip archive includes 2 files, unix2dos and dos2unix, which you can run on AIX. To convert a Windows file to a Unix file:
# dos2unix [filename]
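If neither dos2unix nor the zip archive is at hand, tr with the portable \r escape does the same job as typing a literal ^M; the file names below are examples:

```shell
# Build a sample DOS-style file (CR+LF line endings) and strip the CRs.
printf 'one\r\ntwo\r\n' > dosfile.txt
tr -d '\r' < dosfile.txt > unixfile.txt
# The Unix copy is two bytes shorter: one CR removed per line.
wc -c dosfile.txt unixfile.txt
rm -f dosfile.txt unixfile.txt
```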
Bootinfo
To find out if your machine has a 64 or 32 bit architecture:
# bootinfo -y
unix_mp: 32 bits, unix_64: 64 bits To find out from which disk your system last booted:
# bootinfo -b
The preceding numbers are hexadecimal and must be converted to decimal values. In this example, hexadecimal 000A 0003 equals decimal 10 and 3 (printf '%d' 0x000A will do the conversion for you). Determine which device corresponds with these Device Major/Minor Numbers:
# ls -al /dev | grep "10, 3"
If the output from the preceding command reveals that the log device the needs to be enlarged is /dev/hd8 (the default JFS log device for rootvg), then special actions are needed. See further on. Increase the size of /dev/hd8:
extendlv hd8 1
If the jfslog device is /dev/hd8, then boot the machine into Service Mode and access the root volume group and start a shell. If the jfslog is a user created jfslog, then unmount all filesystems that use the jfslog in question (use mountto show the jfslog used for each filesystem).
For example:
logform /dev/hd8
If the jfslog is a user created jfslog, then mount all filesystems again after the logform has completed. TOPICS: AIX, SYSTEM ADMINISTRATION
This will clear the file, free up the disk space and the process logging to the file will just continue logging as nothing ever happened. TOPICS: AIX, SYSTEM ADMINISTRATION
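What makes this work is truncating the file in place rather than removing it, so the inode stays the same and any process holding the file open keeps logging to it; a quick sketch with an example file:

```shell
# Fill an example log file, then truncate it in place.
logf=$(mktemp)
echo "old log data" > "$logf"
: > "$logf"          # same inode, zero bytes; writers keep logging
wc -c < "$logf"
rm -f "$logf"
```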
2. Make sure the commands in the crontab actually exist. An entry in a crontab with a command that does not exist will generate an email message
from the cron daemon to the user, informing the user about this issue. This is something that may occur on HACMP clusters where crontab files are synchronized on all HACMP nodes. They need to be synchronized on all the nodes, just in case a resource group fails over to a standby node. However, the required file systems containing the commands may not be available on all the nodes at all times. To get around that, test if the command exists first:
0 * * * * [ -x /path/to/command ] && /path/to/command > /path/to/logfile 2>&1
3. Clean up the email messages regularly. The last way of dealing with this is to add another cron entry to a user's crontab that cleans out the mailbox every night. For example, the next command deletes all but the last 1000 messages from a user's mailbox:
0 * * * * echo d1-$(let num="$(echo f|mail|tail -1|awk '{print $2}')-1000";echo $num)|mail >/dev/null
4. Forward the email to the user. Very effective: create a .forward file in the user's home directory, to forward all email messages to the user. If the user starts receiving many, many emails, he/she will surely do something about it when it gets annoying. TOPICS: LVM, POWERHA / HACMP, SYSTEM ADMINISTRATION
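The arithmetic buried in the one-liner of step 3 simply computes the range of messages to delete; with a hypothetical mailbox of 4321 messages:

```shell
# Keep the newest 1000 messages: delete numbers 1 through (total - 1000).
total=4321
num=$(( total - 1000 ))
echo "d1-$num"
```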
/test:1:809:jfs2:y:
host02 # lsattr -Z: -l testlv -a label -a copies -a size -a type -a strictness -Fvalue /test:1:806:jfs2:y:
Well, there you have it. One host reports testlv having a size of 806 LPs, the other says it's 809. Not good. You will run into this when you've used the extendlv and chfs commands to increase the size of a shared file system. You should have used the smitty menu. The good thing is, HACMP will sync the VGDA's if you do some kind of logical volume operation through the smitty hacmp menu. So, either increase the size of a shared logical volume through the smitty menu with just one LP (and of course, also increase the size of the corresponding file system); Or, you can create an additional shared logical volume through smitty of just one LP, and then remove it again afterwards. When you've done that, simply re-run the verification/synchronization, and you'll notice that the warning message is gone. Make sure you run the lsattr command again on your shared logical volumes on all the nodes in your cluster to confirm. TOPICS: LINUX, SYSTEM ADMINISTRATION
(Replace "smtp.server.com" with the actual SMTP server hostname of your environment). Check if the DNS configuration is correct in /etc/resolv.conf and make sure you can resolve the hostname and reversely resolve its IP address:
# nslookup hostname # nslookup ipaddress
(use the IP address returned by the first DNS lookup on the hostname to reversely lookup the hostname by the IP address). Make a copy of sendmail.mc and sendmail.cf in /etc/mail. Edit sendmail.mc (add in the name of your SMTP server):
define(`confTRUSTED_USER', `root')dnl
define(`SMART_HOST', `esmtp:smtp.server.com')dnl
MASQUERADE_AS(`hostname.com')dnl
FEATURE(masquerade_envelope)dnl
FEATURE(masquerade_entire_domain)dnl
Then run:
# make -C /etc/mail
Edit sendmail.cf by modifying the "C{E}" line in sendmail.cf. Take any user listed on that line including root off that line, so mail sent from root gets masqueraded as well. Towards the bottom of sendmail.cf file, there is a section for Ruleset 94. Make sure that after "R$+" there is ONE tab (no space, or multiple spaces/tabs):
SMasqEnv=94
R$+		$@ $>MasqHdr $1
Clean out /var/spool/clientmqueue and /var/spool/mqueue (there may be lots of OLD emails there, we may not want to send these anymore). Then restart sendmail:
# service sendmail restart
(or "service sendmail start" if it isn't running yet; check the status with: "service sendmail status"). Make sure that sendmail is started at system restart:
# chkconfig sendmail on # chkconfig --list sendmail
Open a "tail -f /var/log/maillog" so you can watch any syslog activity for mail (of course there should be a "mail.*" entry in /etc/syslog.conf directing output to /var/log/maillog for this to work).
(and check that the email message is actually accepted for delivery in the verbose output). Wait for the mail to arrive in your mailbox. TOPICS: LINUX, SYSTEM ADMINISTRATION
Check if the current timezone setting is correct by simply running the date command. Set the time and date correct:
# ntpdate 10.250.9.11
Check the time synchronization (it may take some time for the client to synchronize with its time server):
# ntpq -p
Then it means that you have bootpd enabled on your server. There's nothing wrong with that. In fact, a NIM server for example requires you to have this enabled. However, these messages on the console can be annoying. There are systems on your network that are sending bootp requests (broadcasts). Your system is listening to these requests and trying to answer them. It looks in the bootptab configuration (file /etc/bootptab) to see if their MAC addresses are defined. When they aren't, you get these messages. To solve this, either disable the bootpd daemon, or change the syslog configuration. If you don't need the bootpd daemon, then edit the /etc/inetd.conf file and comment out the entry for bootps. Then run:
# refresh -s inetd
If you do have a requirement for bootpd, then update the /etc/syslog.conf file and look for the entry that starts with daemon.notice:
#daemon.notice /dev/console
daemon.notice /nsr/logs/messages
By commenting the daemon.notice entry to /dev/console, and instead adding an entry that logs to a file, you can avoid seeing these messages on the console. Now all you have to do is refresh the syslogd daemon:
# refresh -s syslogd
And you notice that there are "dead" paths, then these are the commands to run in order to set these paths back to "alive" again, of course, AFTER ensuring that any SAN related issues are resolved. To have PowerPath scan all devices and mark any dead devices as alive, if it finds that a device is in fact capable of doing I/O commands, run:
# powermt restore
# Make sure to enter a file system to scan
# as the first attribute to this script.
FILESYSTEM=$1
LSOF=/usr/sbin/lsof

# A for loop to get a list of all open inodes
# in the file system using lsof.
for i in `$LSOF -Fi $FILESYSTEM | grep ^i | sed s/i//g` ; do
   # Use find to list the associated inode filename.
   if [ `find $FILESYSTEM -inum $i` ] ; then
      echo > /dev/null
   else
      # If a filename cannot be found, then this inode is a
      # suspect; check the lsof output for this inode.
      echo Inode $i does not have an associated filename:
      $LSOF $FILESYSTEM | grep -e $i -e COMMAND
   fi
done
administrator, you should be involved in the project, to be able to help design, install, configure and document the system.
Don't combine different information systems on one server
Sharing one operating system image between different information systems usually leads to problems, as these information systems conflict in tuning parameters, backup windows, downtime slots, user information, peak usage, etcetera. Systems that do different tasks should be separated from each other to avoid dependencies. A grouping of information systems can be made by use, for example by putting databases together on one server. Or you can group them by product, for example by putting a single product and its database on one box. But NEVER, EVER put software from different vendors on 1 system!
Naming conventions should be easy
When choosing names for your hosts, printers, users etcetera, keep a few simple rules in mind: Choose names that are easy to remember and are not too long (8 characters max). Choose names NOT related to any department or other part of your organisation. Departments keep changing over time; by not naming your systems after departments, you will save yourself lots of time. Choose names NOT related to any location. Locations also change frequently, when assets are moved around. Never EVER reuse a name. Choose new names for new assets or users. When migrating from one host to another, don't use the same hostname; choose a new one. Rather, use DNS aliases when you wish to keep a hostname. Don't configure hostname aliases on a network interface to keep using the old hostname. And for users and groups, always use a new UID and GID. When a system has more than 1 network interface, choose hostnames related to each other: service address: rembrandt; boot address: rembrandt_boot; standby address: rembrandt_standby; etcetera. Choose a standard naming convention and don't change it. Never use a hostname twice, even when the hosts are in separate networks.
Never enough disk space
Never give your users too much disk space at once. Users will always find a way to fill up all disk space. Give them small amounts at a time. Encourage your users to clean up their directories before requesting new disk space. This will save you time, disk space, backup throughput and money. Temporary space (like /tmp) is TEMPORARY. Make sure your users know that. Clean out temporary space every night and be ruthless about it! Applications can NEVER use temporary space in /tmp; applications should use separate file systems for storing temporary files. Put your static application data in a file system SEPARATED from the changing data of an application. Usually every application should use at least 2 file systems: 1 for the application binaries and 1 for the data (and also log files). This will make sure your application file system never runs full.
Versioning
Keep the least number of versions of applications or operating system levels on your systems. The fewer the versions, the less you have to manage and the easier it becomes. Try to standardize on a small number of versions. Only use supported versions of applications and operating systems. Check regularly which versions are supported. Upgrade in time, but not too fast! Usually an N-1 best practice should be used (always stay one version behind the released levels from vendors). Don't try to use the newest versions, as these usually suffer from all kinds of defects yet to be discovered. This applies to application software, OS levels, service packs, firmware levels, etcetera.
Know what you're doing!
If you don't know what you're doing EXACTLY, just don't do it. Get educated. Get time from your boss for training. Being away from your work will make sure you won't be disturbed during your studies. Switch your mobile off and tell everybody you're not available. Take at least 2 courses a year with at least 40 hours of study each. An employer not wishing to pay for studies is not a good employer. IT is a fast-changing arena, so you have to keep up. Take courses related to your work. Don't bother taking courses where you learn something you'll never use. Before going on a course, read about the subject. Learning is a lot easier if you already know something about it. Make sure you have all the prerequisites when taking a course. After a course, actually use your newly gained knowledge. Also, get certified. Good for your career, but also good for the understanding of a subject. Certification requires you to actually use a certain product thoroughly for an extended period of time, and also requires you to read books or get training on the subject. Don't do good-luck certifications. Do your learning. Doing a certification 3 times on a single day, just to get certified, won't give you the needed knowledge.
At least 2 certifications a year! Get a test system and try out your knowledge. Don't use production systems for testing. Also, make sure the test system isn't on the same network as your production systems. Write down what you're doing. It is always easier to look something up again instead of guessing what you did. Create procedures and stick to them. Keep procedures short!
Keep it Simple
We have difficulty understanding systems as they become more complex. Complexity leads to more errors and greater maintenance effort. We want systems to be more understandable, more maintainable, more flexible, and less error-prone. System design is not a haphazard process. There are many factors to consider in any design effort. All design should be as simple as possible, but no simpler. This facilitates having a more easily understood, and easily maintained, system. This is not to say that features, even internal features, should be discarded in the name of simplicity. Indeed, the more elegant designs are usually the simpler ones. Simple also does not mean "quick and dirty." In fact, it often takes a lot of thought and work over multiple iterations to simplify. The result: systems that are more maintainable, understandable, and less error-prone.
Manage the information system as a whole
Don't just administer the operating system and its supporting hardware. The OS is always used to provide some kind of basis for an information system. You need to know the complete picture of the information system itself (its parts, the interfaces, etcetera) to understand the role of the OS in it. System administration is a lot easier when you know what the information system is used for. Therefore, manage a system from the user's point of view: how will it affect the users if you change anything on the OS or on the underlying hardware level?
Backup, backup and backup!
Before doing anything on your system, make ABSOLUTELY sure you have a full, working backup of your system! Check this over and over again. A system should be backed up once every day. Determine what you should do when a backup fails. Determine how you should restore your system. Document your backup and restore procedures. And of course, test them at regular intervals by restoring a backup on a separate system.
Last but not least
Did you know that companies spend roughly 70 to 90 percent of their complete IT budgets on maintaining their systems? Knowing this, it's a huge responsibility to maintain the systems in the best possible manner!
Then run the lssyscfg command to list the available LPARs and their IDs on this VIOS:
# lssyscfg -r lpar -F name,lpar_id
Alternatively you can log on to the IVM using a web browser and click on "View/Modify Partitions" which will also show LPAR names and their IDs. Use the ID of the LPAR you wish to access:
# mkvt -id [lparid]
This should open a console to the LPAR. If you receive a message "Virtual terminal is already connected", then the session is already in use. If you are sure no one else is using it, you can use the rmvt command to force the session to close.
# rmvt -id [lparid]
After that you can try the mkvt command again. When finished, log off and type "~." (tilde dot) to end the session. Sometimes this will also close the session to the VIOS itself, and you may need to log on to the VIOS again. TOPICS: AIX, BACKUP & RESTORE, SYSTEM ADMINISTRATION, VIRTUAL I/O SERVER, VIRTUALIZATION
The first command (viosbr) will create a backup of the configuration information to /home/padmin/cfgbackups. It will also schedule the command to run every day, and keep up to 10 files in /home/padmin/cfgbackups. The second command is the mksysb equivalent for a Virtual I/O Server: backupios. This command will create the mksysb image in the /mksysb folder, and exclude any ISO repository in rootvg, and anything else excluded in /etc/exclude.rootvg.
Next, log on through ssh to your HMC, and see what managed systems are out there:
> lssyscfg -r sys -F name Server1-8233-E8B-SN066001R Server2-8233-E8B-SN066002R Server3-8233-E8B-SN066003R
It seems there are 3 managed systems in the example above. Now list the status of the LPARs on the source system, assuming you want to migrate from Server1-8233-E8B-SN066001R, moving an LPAR to Server2-8233-E8B-SN066002R:
> lslparmigr -r lpar -m Server1-8233-E8B-SN066001R name=vios1,lpar_id=3,migration_state=Not Migrating name=vios2,lpar_id=2,migration_state=Not Migrating name=lpar1,lpar_id=1,migration_state=Not Migrating
The example above shows there are 2 VIO servers and 1 LPAR on server Server1-8233-E8B-SN066001R. Validate if it is possible to move lpar1 to Server2-8233-E8B-SN066002R:
> migrlpar -o v -t Server2-8233-E8B-SN066002R -m Server1-8233-E8B-SN066001R --id 1 > echo $? 0
The example above shows a validation (-o v) to the target server (-t) from the source server (-m) for the LPAR with ID 1, which we know from the lslparmigr command is our LPAR lpar1. If the command returns zero, the validation has completed successfully. Now perform the actual migration:
> migrlpar -o m -t Server2-8233-E8B-SN066002R -m Server1-8233-E8B-SN066001R -p lpar1 &
This will take a couple of minutes; the migration may take longer, depending on the amount of memory of the LPAR. To check the state:
> lssyscfg -r lpar -m Server1-8233-E8B-SN066001R -F name,state
Or to see the reference codes (which you can also see on the HMC gui):
> lsrefcode -r lpar -m Server2-8233-E8B-SN066002R lpar_name=lpar1,lpar_id=1,time_stamp=06/26/2012 15:21:24, refcode=C20025FF,word2=00000000 lpar_name=vios1,lpar_id=2,time_stamp=06/26/2012 15:21:47, refcode=,word2=03400000,fru_call_out_loc_codes= lpar_name=vios2,lpar_id=3,time_stamp=06/26/2012 15:21:33, refcode=,word2=03D00000,fru_call_out_loc_codes=
After a few minutes the lslparmigr command will indicate that the migration has been completed. And now that you know the commands, it's fairly easy to script the migration of multiple LPARs. TOPICS: AIX, STORAGE, SYSTEM ADMINISTRATION, VIRTUALIZATION
The change will then be applied once the system is rebooted. However, it is possible to change the default value of the hcheck_interval attribute in the PdAt ODM class. As a result, you won't have to worry about its value anymore and newly discovered hdisks will automatically get the new default value, as illustrated in the example below:
# odmget -q 'attribute = hcheck_interval AND uniquetype = \ PCM/friend/vscsi' PdAt | sed 's/deflt = \"0\"/deflt = \"60\"/' \
VLAN to be set up: PVID 4. This number is basically randomly chosen; it could have been 23 or 67 or whatever, as long as it is not yet in use. Proper documentation of your VIO setup and the defined networks, is therefore important. Steps to set this up: Log in to HMC GUI as hscroot. Change the default profile of server1, and add a new virtual Ethernet adapter. Set the port virtual Ethernet to 4 (PVID 4). Select "This adapter is required for virtual server activation". Configuration -> Manage Profiles -> Select "Default" -> Actions -> Edit -> Select "Virtual Adapters" tab -> Actions -> Create Virtual Adapter -> Ethernet adapter ->
Set "Port Virtual Ethernet" to 4 -> Select "This adapter is required for virtual server activation." -> Click Ok -> Click Ok -> Click Close. Do the same for server2. Now do the same for both VIO clients, but this time do "Dynamic Logical Partitioning". This way, we don't have to restart the nodes (as we previously have only updated the default profiles of both servers), and still get the virtual adapter. Run cfgmgr on both nodes, and see that you now have an extra Ethernet adapter, in my case ent1. Run "lscfg -vl ent1", and note the adapter ID (in my case C5) on both nodes. This should match the adapter IDs as seen on the HMC. Now configure the IP address on this interface on both nodes. Add the entries for server1priv and server2priv in /etc/hosts on both nodes. Run a ping: ping server2priv (from server1) and vice versa. Done! Steps to throw it away: On each node: deconfigure the en1 interface:
# ifconfig en1 detach
Remove the virtual adapter with ID 5 from the default profile in the HMC GUI for server1 and server2. DLPAR the adapter with ID 5 out of server1 and server2. Run cfgmgr on both nodes to confirm the adapter does not re-appear. Check with:
# lsdev -Cc adapter
version here. This article describes the differences between system and application WPARs, the various commands available, such as mkwpar, lswpar, startwpar and clogin. It also describes how to create and manage file systems and users, and it discusses the WPAR manager. It ends with an excellent list of references for further reading. TOPICS: LOGICAL PARTITIONING, VIRTUAL I/O SERVER, VIRTUALIZATION
Introduction to VIO
Prior to the introduction of POWER5 systems, it was only possible to create as many separate logical partitions (LPARs) on an IBM system as there were physical processors. Given that the largest IBM eServer pSeries POWER4 server, the p690, had 32 processors, 32 partitions were the most anyone could create. A customer could order a system with enough physical disks and network adapter cards, so that each LPAR would have enough disks to contain operating systems and enough network cards to allow users to communicate with each partition. The Advanced POWER Virtualization feature of POWER5 platforms makes it possible to allocate fractions of a physical CPU to a POWER5 LPAR. Using virtual CPUs and virtual I/O, a user can create many more LPARs on a p5 system than there are CPUs or I/O slots. The Advanced POWER Virtualization feature accounts for this by allowing users to create shared network adapters and virtual SCSI disks. Customers can use these virtual resources to provide disk space and network adapters for each LPAR they create on their POWER5 system.
There are three components of the Advanced POWER Virtualization feature: Micro-Partitioning, shared Ethernet adapters, and virtual SCSI. In addition, AIX 5L Version 5.3 allows users to define virtual Ethernet adapters permitting inter-LPAR communication.
Micro-Partitioning
An element of the IBM POWER Virtualization feature called Micro-Partitioning can divide a single processor into many different processors. In POWER4 systems, each physical
processor is dedicated to an LPAR. This concept of dedicated processors is still present in POWER5 systems, but so is the concept of shared processors. A POWER5 system administrator can use the Hardware Management Console (HMC) to place processors in a shared processor pool. Using the HMC, the administrator can assign fractions of a CPU to individual partitions. If one LPAR is defined to use processors in the shared processor pool, when those CPUs are idle, the POWER Hypervisor makes them available to other partitions. This ensures that these processing resources are not wasted. Also, the ability to assign fractions of a CPU to a partition means it is possible to partition POWER5 servers into many different partitions. Allocation of physical processor and memory resources on POWER5 systems is managed by a system firmware component called the POWER Hypervisor.
Virtual Networking
Virtual networking on POWER5 hardware consists of two main capabilities. One capability is provided by a software IEEE 802.1q (VLAN) switch that is implemented in the Hypervisor on POWER5 hardware. Users can use the HMC to add Virtual Ethernet adapters to their partition definitions. Once these are added and the partitions booted, the new adapters can be configured just like real physical adapters, and the partitions can communicate with each other without having to connect cables between the LPARs. Users can separate traffic from different VLANs by assigning different VLAN IDs to each virtual Ethernet adapter. Each AIX 5.3 partition can support up to 256 Virtual Ethernet adapters. In addition, a part of the Advanced POWER virtualization virtual networking feature allows users to share physical adapters between logical partitions. These shared adapters, called Shared Ethernet Adapters (SEAs), are managed by a Virtual I/O Server partition which maps physical adapters under its control to virtual adapters.
It is possible to map many physical Ethernet adapters to a single virtual Ethernet adapter, thereby eliminating a single physical adapter as a point of failure in the architecture. There are a few things users of virtual networking need to consider before implementing it. First, virtual networking ultimately uses more CPU cycles on the POWER5 machine than when physical adapters are assigned to a partition. Users should consider assigning a physical adapter directly to a partition when heavy network traffic is predicted over a certain adapter. Secondly, users may want to take advantage of larger MTU sizes that virtual Ethernet allows, if they know that their applications will benefit from the reduced fragmentation and better performance that larger MTU sizes offer. The MTU size limit for SEA is smaller than Virtual Ethernet adapters, so users will have to carefully choose an MTU size so that packets are sent to external networks with minimum fragmentation.
Virtual SCSI
The Advanced POWER Virtualization feature called virtual SCSI allows access to physical disk devices which are assigned to the Virtual I/O Server (VIOS). The system administrator uses VIOS logical volume manager commands to assign disks to volume groups. The administrator creates logical volumes in the Virtual I/O Server volume groups. Either these logical volumes or the physical disks themselves may ultimately appear as physical disks (hdisks) to the Virtual I/O Server's client partitions, once they are associated with virtual SCSI host adapters. While the Virtual I/O Server software is packaged as an additional software bundle that a user purchases separately from the AIX 5.3 distribution, the virtual I/O client software is a part of the AIX 5.3 base installation media, so an administrator does not need to install any additional filesets on a Virtual SCSI client partition. TOPICS: HMC, LOGICAL PARTITIONING, VIRTUALIZATION