Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in many
jurisdictions worldwide:
AIX 5L™ AIX 6™ AIX®
DS8000® FlashCopy® HACMP™
Initiate® PartnerWorld® POWER Hypervisor™
Power Systems™ Power® PowerVM®
POWER6® POWER7® Redbooks®
RS/6000® Tivoli®
Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other product and service names might be trademarks of IBM or other companies.
Course description
Power Systems for AIX III: Advanced Administration and Problem
Determination
Duration: 5 days
Purpose
This course provides advanced AIX system administrator skills with a focus
on availability and problem determination. It provides detailed knowledge of
the ODM database where AIX maintains so much configuration information.
It shows how to monitor for and deal with AIX problems. There is special
focus on dealing with Logical Volume Manager problems, including
procedures for replacing disks. Several techniques for minimizing the system
maintenance window are covered. While the course includes some AIX 7.1
enhancements, most of the material is applicable to prior releases of AIX.
Audience
This course is an advanced course for AIX system administrators, system
support, and contract support individuals with at least six months of
experience in AIX.
Prerequisites
You should have basic AIX System Administration skills. These skills include:
• Use of the Hardware Management Console (HMC) to activate a logical
partition to run AIX and to access the AIX system console
• Install an AIX operating system from an already configured NIM server
• Implementation of AIX backup and recovery
• Manage additional software and base operating system updates
• Familiarity with management tools such as SMIT
• Understand how to manage file systems, logical volumes, and volume
groups
• Mastery of the UNIX user interface, which includes use of the vi editor,
command execution, input and output redirection, and the use of utilities
such as grep
These skills can be developed through experience or by formal training. The
recommended training course to obtain these prerequisite skills is:
• Power Systems for AIX II: AIX Implementation and Administration AN12
or AX12 and their prerequisites
If the student has AIX system administration skills, but is not familiar with the
LPAR environment, those skills can be obtained by attending the following
course:
• AN11 or AX11 Power Systems Administration I: LPAR Configuration
Objectives
On completion of this course, students should be able to:
• Perform system problem determination and reporting procedures that
include analyzing error logs, creating memory dumps of the system, and
providing needed data to the AIX Support personnel
• Examine and manipulate Object Data Manager databases
• Identify and resolve conflicts between the Logical Volume Manager
(LVM) disk structures and the Object Data Manager (ODM)
• Complete a basic configuration of Network Installation Manager to
provide network boot support for either system installation or booting to
maintenance mode
• Identify various types of boot and disk failures and perform the matching
recovery procedures
• Implement advanced methods such as alternate disk installation,
multibos, and JFS2 snapshots to use a smaller maintenance window
Contents
• Advanced AIX administration overview
• The Object Data Manager
• Error monitoring
• Network Installation Manager basics
• System initialization: Accessing a boot image
• System initialization: rc.boot and inittab
• LVM metadata and related problems
• Disk management procedures
• Install and cloning techniques
• Advanced backup techniques
• Diagnostics
• The AIX system dump facility
Agenda
Day 1
Welcome
Unit 1: Advanced AIX administration overview
Exercise 1: Problem diagnostic information
Unit 2: The Object Data Manager
Exercise 2: The Object Data Manager
(optional) Exercise 2: Object Data Manager, Part 3
Unit 3: Error monitoring
Day 2
Exercise 3: Error monitoring
Unit 4: Network Installation Manager basics
Exercise 4: Basic Network Installation Manager configuration
Unit 5: System initialization: Accessing a boot image
Exercise 5: System initialization: Accessing a boot image
Day 3
Unit 6: System initialization: rc.boot and inittab
Exercise 6: System initialization: rc.boot and inittab
Unit 7: LVM metadata and related problems
Exercise 7: LVM metadata and related problems
(optional) Exercise 7: LVM metadata and related problems, Part 6
Unit 8: Disk management procedures, Topic 1
Exercise 8: Disk management procedures, Part 1
Day 4
Unit 8: Disk management procedures, Topic 2
Exercise 8: Disk management procedures, Parts 2 and 3
Unit 9: Install and cloning techniques, Topic 1
Exercise 9: Install and cloning techniques, Part 1
Unit 9: Install and cloning techniques, Topic 2
Exercise 9: Install and cloning techniques, Part 2
Unit 10: Advanced backup techniques, Topic 1
Exercise 10: Advanced backup techniques, Part 1
(Optional) Exercise 10: Advanced backup techniques, Part 2
Unit 10: Advanced backup techniques, Topic 2
Exercise 10: Advanced backup techniques, Parts 3 and 4
Day 5
Unit 10: Advanced backup techniques, Topic 3
Unit 11: Diagnostics
Exercise 11: Diagnostics
Unit 12: The AIX system dump facility
Exercise 12: The AIX system dump facility
Wrap up / Evaluations
Unit 1. Advanced AIX administration overview
References
SG24-7910 IBM AIX Version 7.1 Differences Guide (Redbooks)
SG24-7559 IBM AIX Version 6.1 Differences Guide (Redbooks)
SG24-5496 Problem Solving and Troubleshooting in AIX 5L (Redbooks)
© Copyright IBM Corp. 2009, 2015 Unit 1. Advanced AIX administration overview 1-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook
Unit objectives
IBM Power Systems
Notes:
Application outages
• Functional or performance
• Avoid unplanned outages with best practices
– Change control
– Data security
– Capacity planning
– High availability design
• Avoid planned outages
– Fail over to a backup server
– Relocate application (LPAR or WPAR mobility)
• Use maintenance windows
– Application that is stopped versus slow activity
– Plan enough time for back-out or recovery
– Minimize time that is needed
• Effective problem determination and recovery
Notes:
Introduction
Providing system availability is a major responsibility of any system administrator. An outage
can be from a functional problem (such as an application or system crash) or a server
performance problem (business is seriously impacted due to poor response times or late jobs).
There are many approaches to dealing with this issue.
Unplanned outages
When most of you think of availability, you think of unplanned outages. Regular hardware and
software maintenance can often avoid these outages. Designing the computing facility to have
redundant components (power, network adapters, network switches, storage, and more) can
make the overall system resilient to the failure of individual components. Performance problems
are often the result of failing to do proper capacity planning, resulting in not enough resources
(memory, processors, network bandwidth, or disk I/O bandwidth) to handle the increased
workload. If there is no change control to manage what work is placed on a system, capacity
planning is even more challenging. Furthermore, uncontrolled changes to a system result in
uncontrolled exposure to possible outages created by those changes, and thus unplanned
outages. Computer viruses and other malicious attacks by computer hackers can also reduce
system availability (in addition to the exposure of losing proprietary information). Good data
security policies are essential.
Even when implementing good policies in these areas, some unplanned outages might still
happen. In these situations, the system administrator needs to have a plan for minimizing the
impact and recovering as quickly as possible. One common approach is to have another
system that can take over the work of the failed system. High Availability Cluster
Multi-Processing (HACMP) provides a system for either concurrent processing by multiple
systems, or an automated fallover to a backup system, thus minimizing the impact of a server
failure. Such server redundancy can be designed to work within a single facility or be divided
between different geographical locations. Obviously, rapid notification of a problem, effective
and prompt diagnosis of the cause, and being able to quickly implement an effective solution all
contribute to a shorter mean time to recovery.
Planned outages
By using change control, the risk that is associated with certain categories of potential
unplanned outages can be managed. The impact of any unexpected problem (resulting from the
change) can be minimized by implementing the changes during planned windows of time. In
addition, there are certain types of changes for which an outage is unavoidable.
Some facilities implement multiple types of maintenance windows. One type would be frequent
short maintenance windows for any administrative work that competes with applications for
resources (performance impact) or has a small chance of causing a functional disruption.
Another type would be a less frequent window in which any reboot of the system or any major
change to the level of the operating system or major subsystems, such as database software,
would be allowed.
Sometimes, the amount of time in a maintenance window is relatively small and the work must
be carefully planned. You also need to allow time to recover if anything goes wrong due to the
maintenance. Pre-staging any needed resources helps expedite the work. Any
approach that can speed recovery after a problem occurs is also useful.
For systems that need to be up 24 hours a day, seven days a week, and every day in the year
(24x7x365), even a short outage cannot be tolerated. In those situations, a method to
non-disruptively move the applications to another system can be invaluable. If an HACMP
cluster solution is already in place to handle unplanned outages, then it can be used to
perform a manual fallover of the services to another system while maintenance is being done. Other
solutions are to use Live Partition Mobility or Live Application Mobility.
• System backups
– Minimizing rootvg size
– Snapshot techniques for user file systems
Notes:
There are two advantages. First, no change is made to the active rootvg, so the update can
be done at any time. Then, when a major maintenance window arrives, the system just needs to
be rebooted to make the update take effect. The second advantage is the ease of recovery. If
there are problems with the new level of code, you only need to reboot back to the earlier code level
rather than recover from a mksysb or reject the entire update. The downside is that the system
might need to be rebooted to make the update take effect.
Two techniques can be used. One technique, called multibos, creates an alternate
set of logical volumes that are copies of the rootvg BOS logical volumes. The other technique
creates an alternate volume group that is a clone of the rootvg. In each case, you apply
the maintenance to the copy and then later reboot to make it effective.
System backups
Another common maintenance activity is backing up the system. You need to quiesce the
application activity long enough to be sure that there are no inconsistencies in the backup,
unless you have an application that uses fuzzy backups. The term fuzzy backup refers to a
backup in which the application was making changes during the backup. For a specific
transaction, multiple data changes are made. Some of these transaction-related changes are
made before the data is backed up, while other changes are made after the data is
backed up. Thus the backup has one piece of data that reflects the transaction and another
piece of data that does not reflect the transaction. The two pieces of data are inconsistent, and
such a backup is referred to as fuzzy.
For the rootvg itself, the size of the rootvg should be minimized. It should contain what is
needed for the OS. All user data and other non-essential files should be backed up and restored
separately. An example would be the standard location of a software repository:
/usr/sys/inst.images. The software repository can be large and yet this common path is
in the /usr file system, which is in the rootvg. Placing the software repository in a separate file
system with its own recovery plan can help reduce backup and recovery time. Another common
example is the /home file system. If users have large amounts of data stored in /home,
then overmounting /home with a separate file system can speed up working with the rootvg. There are
other file systems, such as /tmp, whose contents might be eliminated from the system
backup. The trick is that these files need to be excluded from the backup during the mksysb
execution (either by not being mounted or by being identified in /etc/exclude.rootvg) and then separately
recovered from their own backup. Other user data is in separate user volume groups.
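The exclusion step can be previewed before a backup is taken. The following is a minimal sketch, assuming that the entries in /etc/exclude.rootvg are grep-style patterns anchored at ./ and that the file is honored when mksysb is run with the -e flag. The filter_excluded function, the sample patterns, and the sample paths are hypothetical and only illustrate which paths such patterns would drop.

```shell
#!/bin/sh
# Sketch only: preview which paths a set of exclude patterns would drop.
# On AIX the real list lives in /etc/exclude.rootvg and is used by
# "mksysb -e". The patterns and paths below are made-up samples.

# filter_excluded PATTERN_FILE
# Reads paths on stdin and prints those that match none of the
# grep patterns in PATTERN_FILE.
filter_excluded() {
    grep -v -f "$1"
}

patterns=$(mktemp)
cat > "$patterns" <<'EOF'
^\./tmp/
^\./usr/sys/inst\.images/
EOF

# Paths are written relative to / with a leading "./".
printf '%s\n' \
    ./etc/hosts \
    ./tmp/scratch.dat \
    ./usr/sys/inst.images/bos.rte \
    ./home/user/notes |
    filter_excluded "$patterns"

rm -f "$patterns"
```

Only the paths that survive the filter would be part of the system backup. The excluded data still needs its own backup if it must be recoverable.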
With the emphasis on separate backups for non-BOS data, there comes a need to minimize
how long the applications need to be quiesced and still have data consistency. One technique
that AIX provides is JFS2 snapshots, with which the application needs to be quiesced only briefly to get a
consistent picture of the data at a single point in time. Then, you can either use that snapshot of
the data as its own backup, or base an actual backup upon that snapshot (to have off-site
storage of the backup). There are other facilities for doing snapshot captures of data. Some are
part of the storage subsystems and some are part of total storage solutions such as Tivoli
Storage Manager. The focus here is on the facility that is provided with AIX: JFS2 snapshot.
• If an AIX bug:
– Collect problem information
– Open problem report with AIX Support
– Provide snap with information
Notes:
System maintenance
Sometimes code works well under normal testing or production circumstances, but can have
some poor logic that is discovered when faced with an unanticipated situation. Alternatively, it
might be some non-central aspect of the code that is not noticed normally. However, if the
number of facilities that use this code is large enough, then it is probable that one of the facilities
will detect and report the problem soon after release of the new code level. The fix for the code
defect usually comes out in the next released fix pack. Many facilities might not be affected or
concerned about the code defect problem for months until the circumstances arise in which it
represents a problem. By installing newer service packs, a facility can benefit from the
experience of others and avoid known problems.
It is possible that a new fix pack introduces new problems, while solving many old problems.
This course covers some techniques to use in applying fix packs.
Problem determination
If you find yourself impacted by what you believe to be a product defect, you need to obtain
prompt resolution. There is no substitute for the experience of being able to recognize a
situation and remember the details of how you dealt with it the last time a similar problem
occurred. However, many problems are most effectively solved by following a developed
problem determination methodology. This course covers a basic problem determination
methodology.
Problem reporting
When you find yourself impacted by what you believe to be a product defect, you need to
contact AIX Support. Before contacting AIX Support, you should write up a description of the
problem and the surrounding circumstances. When you open a new Problem Management
Report (PMR) with AIX Support, you are expected to provide them with information to assist
them in determining the cause of the problem. The snap command is a common tool to help
collect a large amount of information about the environment. The course materials cover
procedures to report problems.
Notes:
- Volume groups (names, just a bunch of disks (JBOD) or redundant array of independent
disks (RAID))
- Logical volumes (mirrored or not, which volume group, type)
- File systems (which volume group, what applications)
- Memory (size) and paging spaces (how many, location)
1. Identify the problem
2. Talk to users to define the problem
3. Collect system data
4. Resolve the problem
Notes:
Suggested questions
- What is the problem?
- What is the system doing (or not doing)?
- How did you first notice the problem?
- When did it happen?
- Have any changes been made recently?
Keep them talking until the picture is clear. Ask many questions to be able to get the entire
history of the problem.
SMIT logs
If SMIT was used, extra logs might provide further information. The SMIT log files are normally
contained in the home directory of the root user. One is named smit.log, by default.
• Progress codes
– Checkpoint during a process such as boot, shutdown, or dump
• Obtained from:
– Front panel of system enclosure
– HMC or IVM (for logically partitioned systems)
– Operator console message or diagnostics (diag utility)
Notes:
Introduction
AIX provides progress and error indicators (display codes) during the boot process. These
display codes can be useful in resolving startup problems. Depending on the hardware platform,
the codes are displayed on the console and the operator panel.
Operator panel
For non-LPAR systems, the operator panel is an LED display on the front panel. Beginning with
the early POWER4 models, the Power Systems can be divided into multiple Logical Partitions
(LPARs). In this case, a system-wide LED display still exists on the front panel. However, the
operator panel for each LPAR is displayed on the screen of the Hardware Management Console
(HMC). The HMC is a separate system that is required when running multiple LPARs.
Regardless of where they are displayed, they are sometimes referred to as LED Display Codes.
http://ibm.com/support/knowledgecenter
Notes:
Documentation
Note
All information on websites and their design is based on what is available at the time of this
course revision. Website URLs and the design of the related web pages often change.
Notes:
If you believe that your problem is the result of a system defect, you can call AIX Support to
request assistance. Before you call 1-800-IBM-SERV, it is a good idea to have certain
information ready. They want to verify your name against a list of names that are associated
with your customer number, and validate that your customer number has support for the product
in question. They also need to know some details about the hardware and software
environment in which the problem is occurring. The details might include your MTMS (machine
type, model, serial), your AIX OS level, and the level of any other relevant software. You need to
explain your problem, providing as much detail as possible, especially any error messages or
codes.
The Level 1 Support personnel need to identify the priority of your problem.
- Severity level 1 (critical) indicates that the function does not work, your business is severely
impacted, there is no work-around, and that there needs to be an immediate solution. For
severity level 1, you are expected to be available 24x7 until the problem is resolved.
- Severity level 2 (significant impact) indicates that the function is usable but is limited in a
way that your business is severely impacted.
- Severity level 3 (some impact) indicates that the program is usable with less significant
features (not critical to operations) unavailable.
- Severity level 4 (minimal impact) indicates that the problem causes little impact on
operations, or a reasonable circumvention to the problem was implemented.
Level 1 Support assigns you a PMR number (a PMR and branch number combination) for
tracking purposes. In the future, each time you call about this problem, you should have the
PMR and branch numbers at hand.
When the basic information is collected, you are passed to Level 2 Support for the product area
for which you are having a problem. They work with you in investigating the nature and cause of
your problem. They search the support database to see whether it is a known problem that is
either already being worked on or has a solution that is already developed. In many cases, they
request that you update to a specific technology level (TL) and service pack (SP) that already
includes the fix.
If they do not have a fix, you might be asked to update your system and determine whether the
problem still exists. If the problem still exists, they now have a known software environment to
work with. They often ask for a complete set of information from your system to be collected and
uploaded to their server to support their investigation. The basic tool for collecting your system
information is the snap command.
# snap -a
Notes:
Next, you should place any additional testcase data that you feel might be helpful in resolving
the problem into either the /tmp/ibmsupt/other or /tmp/ibmsupt/testcase directory.
This additional information is then included (together with the information gathered directly by
snap) into the compressed pax file that is created in the next step in this command sequence.
As shown, the -c flag of the snap command should then be used to create a compressed pax
file that contains all files that are contained in the /tmp/ibmsupt directory. The output file is
/tmp/ibmsupt/snap.pax.Z.
Next, the /tmp/ibmsupt/snap.pax.Z output file should be renamed by using the mv
command to indicate the PMR number, branch number, and country number that is associated
with the data in the file. For example, if the PMR number is 12345, the branch number is 567,
and the country number is 890, the file should be renamed 12345.b567.c890.snap.pax.Z.
(The country code for the United States is: 000).
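The renaming step can be sketched as a small shell function. This is an illustration only: snap_upload_name is a hypothetical helper, not part of snap, and the commented command sequence assumes that the renamed file stays in /tmp/ibmsupt.

```shell
#!/bin/sh
# snap_upload_name PMR BRANCH COUNTRY
# Prints the upload file name in the form described above:
#   <PMR>.b<branch>.c<country>.snap.pax.Z
snap_upload_name() {
    printf '%s.b%s.c%s.snap.pax.Z\n' "$1" "$2" "$3"
}

# Values taken from the example in the text.
new_name=$(snap_upload_name 12345 567 890)
echo "$new_name"

# On an AIX system the sequence would then be (not run here):
#   snap -a     # gather system information under /tmp/ibmsupt
#   snap -c     # create /tmp/ibmsupt/snap.pax.Z
#   mv /tmp/ibmsupt/snap.pax.Z "/tmp/ibmsupt/$new_name"
```

Building the name in one place avoids typing errors in the PMR, branch, and country numbers, which AIX Support uses to match the uploaded file to the problem record.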
# ftp testcase.software.ibm.com
User: anonymous
Password: <your email address>
ftp> cd /toibm/aix
ftp> bin
ftp> put PMR#.b<branch#>.c<country#>.snap.pax.Z
ftp> quit
Notes:
Fix bundles
It is useful to collect many accumulated PTFs together and test them together. Then, they can
be used as a base line for a new cycle of enhancements and corrections. By testing them
together, it is often possible to find unexpected interactions between them.
There are two types of AIX fix bundles.
- One type of fix bundle is a Technology Level (TL) update (formerly known as Maintenance
Level or ML). A TL is a major fix bundle that not only includes many fixes for code problems,
but also includes minor functional enhancements. You can identify the current AIX
technology level by running the oslevel -r command.
- Another type of bundling is a Service Pack (SP). A Service Pack is released more frequently
than a Technology Level (between TL releases) and usually contains only the needed fixes.
You can identify the current AIX technology level and service pack by running the
oslevel -s command.
For the oslevel command to reflect a new TL or SP, all related fileset fixes must be installed.
If a single fileset update in the fix bundle is not installed, the TL or SP level is not changed.
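As an illustration, the level string can be split into its parts in a script. This is a minimal sketch, assuming that oslevel -s reports four dash-separated fields (base release, technology level, service pack, and a build identifier). The parse_oslevel function and the sample value are hypothetical.

```shell
#!/bin/sh
# parse_oslevel LEVEL
# Splits a string in the style of "oslevel -s" output
# (release-TL-SP-build) into labeled fields.
parse_oslevel() {
    old_ifs=$IFS
    IFS=-
    # Unquoted on purpose: split the argument on "-" into $1..$4.
    set -- $1
    IFS=$old_ifs
    printf 'release=%s tl=%s sp=%s build=%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical sample value. On AIX: parse_oslevel "$(oslevel -s)"
parse_oslevel 7100-03-04-1441
```

A script can compare the tl and sp fields against the level that AIX Support requests before and after an update is applied.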
Interim fixes
On rare occasions, a customer has an urgent situation that needs fixes for a problem so quickly
that they cannot wait for the formal PTF to be released. In those situations, a developer might
place one or more individual file replacements on an FTP server and allow the system
administrator to download and install them. Originally, it would involve manually copying the
new files over the old files. But this created problems, especially in identifying the state of a
system that later experienced other (possibly related) problems or in backing out the changes.
Today, there is a better methodology for managing these interim fixes, which uses the emgr
command (the interim fix manager). Security alerts often provide interim fixes for the identified security exposure.
Depending upon your own risk analysis, you might immediately use the interim fix, or wait for
the next service pack (which will include these security fixes).
The syntax and use of the emgr command were covered in the prerequisite course.
Relevant documentation
Notes:
IBM Redbooks
Redbooks can be viewed, downloaded, or ordered from the IBM Redbooks website:
http://www.redbooks.ibm.com
Checkpoint
Unit summary
Unit 2. The Object Data Manager
References
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 General Programming Concepts: Writing
and Debugging Programs
Online AIX Version 7.1 Technical Reference: Kernel and
Subsystems
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives
[Figure: the ODM at the center, surrounded by the data it manages: devices, software, System Resource Controller, and SMIT menus and panels]
ODM components
Notes:
Current focus
This unit concentrates on ODM classes that are used to store device information and software
product data. This section focuses on ODM classes that store device information.
[Figure: device configuration data. Predefined databases: PdDv, PdCn, PdAt. Configuration manager (cfgmgr), driven by Config_Rules. Customized databases: CuDvDr, CuVPD.]
Configuration manager
[Figure: cfgmgr, driven by Config_Rules, uses the customized object classes (CuDv, CuAt, CuDep, CuDvDr, CuVPD) and the device methods (Define, Configure, Change, Unconfigure, Undefine), and loads or unloads the device driver.]
Notes:
Introduction
Originally, the three parts of the ODM were designed to support diskless, dataless, and other
workstations. The ODM object classes are held in three repositories. Each of these repositories
is described in the material that follows.
/etc/objrepos
The purpose of this location is to hold information that is expected to vary from machine to
machine. It contains the part of the product that cannot be shared among machines. Each client
must have its own copy. Most of the software that requires a separate copy for each
machine is associated with the configuration of the machine or product.
One example is the customized device information. For example, the location of a device or the
overrides to the default attributes can be expected to vary.
This repository contains the customized devices object classes and the four object classes that
are used by the Software Vital Product Database (SWVPD) for the / (root) part of the installable
software product. The root part of the software contains files that must be installed on the target
system. For example, any configuration files that are used by the programs would be in the root
part.
To access information in the other directories, this directory contains symbolic links to the
predefined devices object classes. The links are needed because the ODMDIR variable points to
only /etc/objrepos.
/usr/lib/objrepos
This repository contains the predefined devices object classes, SMIT menu object classes, and
the four object classes that are used by the SWVPD for the /usr part of the installable software
product. The object classes in this repository can be shared across the network by /usr clients,
dataless and diskless workstations. Software that is installed in the /usr part can be shared
among several machines with compatible hardware architectures.
/usr/share/lib/objrepos
Contains the four object classes that are used by the SWVPD for the /usr/share part of the
installable software product. The /usr/share part of a software product contains files that are not
hardware-dependent. They can be shared among several machines, even if the machines have
a different hardware architecture. An example is terminfo files that describe terminal
capabilities. As terminfo is used on many UNIX systems, terminfo files are part of the
/usr/share part of a system product.
lslpp options
The lslpp command can list the software that is recorded in the ODM. When run with the -l
(lowercase L) flag, it lists each of the locations (/, /usr/lib, /usr/share/lib) where it finds the fileset.
If you are not concerned with these distinctions, the repetition can be distracting. Alternatively,
you can run lslpp -L, which reports each fileset one time without distinguishing between the
root, usr, and share portions.
# cfgmgr
PdDv:
type = "14106902"
class = "adapter"
subclass = "pci"
prefix = "ent"
...
DvDr = "pci/goentdd"
Define = "/usr/lib/methods/define_rspc"
Configure = "/usr/lib/methods/cfggoent"
...
uniquetype = "adapter/pci/14106902"
CuDv:
name = "ent1"
status = 1
chgstatus = 2
ddins = "pci/goentdd"
location = "02-08"
parent = "pci2"
connwhere = "8"
PdDvLn = "adapter/pci/14106902"
PdAt:
uniquetype = "adapter/pci/14106902"
attribute = "jumbo_frames"
deflt = "no"
values = "yes,no"
...
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
type = "R"
...
Visual: File system information? User/security information? Queues and queue devices?
Figure 2-11. Let's review: Device configuration and the ODM
Visual: numbered blanks (1, 2, 3) to fill in, with the AIX kernel and applications shown.
Notes:
Instructions
Answer the following questions by writing them on the picture in the visual. If you are unsure
about a question, leave it out.
1) Which command configures devices in an AIX system? Note: It is not an ODM
command.
2) Which ODM class contains all devices that your system supports?
3) Which ODM class contains all devices that are configured in your system?
4) Which programs are loaded into the AIX kernel to control access to the devices?
5) If you have a configured tape drive rmt1, which special file do applications access to
work with this device?
ODM commands
Descriptors: odmshow
Notes:
Introduction
Different commands are available for working with each of the ODM components: object
classes, descriptors, and objects.
# vi file
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = "512"    (change deflt to 512)
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
# odmadd file
Figure 2-13. Changing attribute values with odmadd and odmdelete
Notes:
Possible queries
As with any database, you can create queries for records that match certain criteria. The tests
are on the values of the descriptors of the objects. A number of tests can be done:
= Equal
!= Not equal
> Greater
>= Greater than or equal to
< Less than
<= Less than or equal to
like Similar to; finds patterns in character string data
For example, to search for records where the value of the lpp_name descriptor begins with
bosext1., use the criterion lpp_name like bosext1.*
Tests can be linked together by using normal Boolean operations, as shown in the following
example:
uniquetype=tape/scsi/scsd and attribute=block_size
In addition to the * wildcard, a ? can be used as a wildcard character.
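The criteria described above might be assembled and passed to odmget as in the following sketch. The helper name odm_criteria is hypothetical and exists only for illustration. The odmget call is skipped unless the script detects that it is running on AIX.

```shell
#!/bin/sh
# odm_criteria (hypothetical helper): join descriptor tests with "and".
odm_criteria() {
    crit=""
    for test_expr in "$@"; do
        if [ -z "$crit" ]; then
            crit="$test_expr"
        else
            crit="$crit and $test_expr"
        fi
    done
    printf '%s\n' "$crit"
}

crit=$(odm_criteria "uniquetype=tape/scsi/scsd" "attribute=block_size")
echo "criteria: $crit"

# Query the predefined attributes only on AIX, where odmget is available.
if [ "$(uname -s)" = "AIX" ]; then
    odmget -q "$crit" PdAt
fi
```

Keeping the criteria in one variable makes it easy to reuse the same selection with other ODM commands that accept the -q flag.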
• The odmchange command modifies all objects that satisfy the search criteria.
1. Create a file with the object to change.
2. Edit the file and change the attribute value.
3. Use odmchange to update the existing object with the edited values.
• Syntax:
odmchange -o ObjectClass [ -q criteria] input_file
# vi file
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = "512"    (change deflt to 512)
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
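The three steps might be scripted as in the following sketch, which is an illustration under stated assumptions rather than a recommended procedure. The helper set_deflt and the file names under /tmp are hypothetical. The ODM is touched only when the script runs on AIX, and changes like this are best tried on a test system first.

```shell
#!/bin/sh
# set_deflt (hypothetical helper): rewrite the deflt line of a stanza read
# from standard input, leaving all other descriptors untouched.
set_deflt() {
    sed "s|^\([[:space:]]*deflt = \).*|\1\"$1\"|"
}

CRIT="uniquetype=tape/scsi/scsd and attribute=block_size"

# The ODM itself is changed only on AIX.
if [ "$(uname -s)" = "AIX" ]; then
    odmget -q "$CRIT" PdAt > /tmp/pdat.orig           # 1. export the object
    set_deflt 512 < /tmp/pdat.orig > /tmp/pdat.new    # 2. edit the value
    odmchange -o PdAt -q "$CRIT" /tmp/pdat.new        # 3. update the object
fi
```

The exported file /tmp/pdat.orig doubles as a backup of the original object.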
product:
lpp_name = "bos.rte.printers"
comp_id = "5765-G6200"
update = 0
cp_flag = 2359571
fesn = "0000"
name = "bos"
state = 5
ver = 7
rel = 1
mod = 0
fix = 0
ptf = ""
media = 0
sceded_by = ""
fixinfo = ""
prereq = "*coreq bos.rte 7.1.0.0"
description = ""
supersedes = ""
lpp:
name = "bos.rte.printers"
size = 0
state = 5
cp_flag = 2359571
group = ""
magic_letter = "I"
ver = 7
rel = 1
mod = 0
fix = 0
description = "Front End Printer Support"
lpp_id = 38
inventory:
lpp_id = 38
private = 0
file_type = 0
format = 1
loc0 = "/etc/qconfig"
loc1 = ""
loc2 = ""
size = 0
checksum = 0
...
history:
lpp_id = 38
event = 2
ver = 7
rel = 1
mod = 0
fix = 0
ptf = ""
corr_svn = ""
cp_mod = ""
cp_fix = ""
login_name = "root"
state = 1
time = 1310159341
comment = ""
Notes:
SWVPD classes
The Software Vital Product Data is stored in the following ODM classes:
lpp The lpp object class contains information about the installed software
products, including the current software product state and
description.
inventory The inventory object class contains information about the files that
are associated with a software product.
product The product object class contains product information about the
installation and updates of software products and their prerequisites.
history The history object class contains historical information about the
installation and updates of software products.
Software states
Notes:
Introduction
The AIX software vital product database uses software states that describe the status of an
installation or update package.
Predefined devices
PdDv:
type = "scsd"
class = "tape"
subclass = "scsi"
prefix = "rmt"
...
base = 0
...
detectable = 1
...
led = 0
setno = 54
msgno = 0
catalog = "devices.cat"
DvDr = "tape"
Define = "/etc/methods/define"
Configure = "/etc/methods/cfgsctape"
Change = "/etc/methods/chggen"
Unconfigure = "/etc/methods/ucfgdevice"
Undefine = "/etc/methods/undefine"
Start = ""
Stop = ""
...
uniquetype = "tape/scsi/scsd"
Notes:
type
Type specifies the product name or model number, for example, scsd.
class
Specifies the functional class name. A functional class is a group of device instances that share
a high-level function. For example, tape is a functional class name that represents all tape
devices.
subclass
Device classes are grouped into subclasses. The subclass scsi specifies all tape devices that
can be attached to a SCSI interface.
prefix
Prefix specifies the Assigned Prefix in the customized database, which is used to derive the
device instance name and /dev name. For example, rmt is the prefix name that is assigned to
tape devices. Names of tape devices would then look like rmt0, rmt1, or rmt2.
base
This descriptor specifies whether a device is a base device or not. A base device is any device
that forms part of a minimal base system. During system boot, a minimal base system is
configured to allow access to the root volume group (rootvg) and hence to the root file system.
This minimal base system can include, for example, a SCSI hard disk. The device that is shown
on the visual is not a base device.
This flag is also used by the bosboot and savebase commands, which are introduced later in
this course.
detectable
Detectable specifies whether the device instance is detectable or undetectable by cfgmgr
when it is powered on and attached to the system. A value of 1 means that the device is
detectable, and a value of 0 that it is not (for example, a printer or tty).
led
Led indicates the value that is displayed on the LEDs when the configure method runs. The
value that is stored is decimal, but the value that is shown on the LEDs is hexadecimal (for
example, a stored value of 2418 is displayed as 972).
setno, msgno
Each device has a specific description (for example, SCSI Tape Drive) that is shown when the
lsdev command lists the device. The setno and msgno descriptors are
used to look up the description in a message catalog.
catalog
Catalog identifies the filename of the National Language Support (NLS) catalog. The LANG
variable on a system controls the catalog file to use to show a message. For example, if LANG is
set to en_US, the catalog file /usr/lib/nls/msg/en_US/devices.cat is used. If LANG is
de_DE, catalog /usr/lib/nls/msg/de_DE/devices.cat is used.
DvDr
DvDr identifies the name of the device driver that is associated with the device (for example,
tape). Usually, device drivers are stored in directory /usr/lib/drivers. Device drivers are
loaded into the AIX kernel when a device is made available.
Define
Define names the define method that is associated with the device type. This program is called
when a device is brought into the defined state.
Configure
Configure names the configure method that is associated with the device type. This program is
called when a device is brought into the available state.
Change
Change names the change method that is associated with the device type. This program is
called when a device attribute is changed through the chdev command.
Unconfigure
Unconfigure names the unconfigure method that is associated with the device type. This
program is called when a device is unconfigured by rmdev -l.
Undefine
Undefine names the undefine method that is associated with the device type. This program is
called when a device is undefined by rmdev -l -d.
Start, stop
Few devices support a stopped state (only logical devices). A stopped state means that the
device driver is loaded, but no application can access the device. These two attributes name the
methods to start or stop a device.
uniquetype
uniquetype is a key that other object classes reference. Objects use this descriptor as a pointer
back to the device description in PdDv. The key is a concatenation of the class, subclass, and
type values.
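Two of the descriptors above involve simple derivations that can be checked from any POSIX shell. The sketch below uses hypothetical helper names (make_uniquetype, led_hex). It assumes only what the notes state: uniquetype joins class, subclass, and type, and led is stored in decimal but displayed in hexadecimal.

```shell
#!/bin/sh
# uniquetype is the class, subclass, and type joined with "/".
make_uniquetype() {
    printf '%s/%s/%s\n' "$1" "$2" "$3"
}

# The led descriptor is stored in decimal; the LEDs show it in hexadecimal.
led_hex() {
    printf '%x\n' "$1"
}

make_uniquetype tape scsi scsd    # prints tape/scsi/scsd
led_hex 2418                      # prints 972
```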
Predefined attributes
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = ""
values = "0-2147483648,1"
...
PdAt:
uniquetype = "disk/scsi/osdisk"
attribute = "pvid"
deflt = "none"
values = ""
...
PdAt:
uniquetype = "tty/rs232/tty"
attribute = "term"
deflt = "dumb"
values = ""
...
Notes:
uniquetype
This descriptor is used as a pointer back to the device defined in the PdDv object class.
attribute
Attribute identifies the name of the attribute. This is the name that can be passed to the
mkdev or chdev command. For example, to change the terminal type of tty0 from the default
dumb to ibm3151, you can run the following command:
# chdev -l tty0 -a term=ibm3151
deflt
deflt identifies the default value for an attribute. Nondefault values are stored in CuAt.
values
Values identifies the possible values that can be associated with the attribute name. For
example, allowed values for the block_size attribute range from 0 to 2147483648, with an
increment of 1.
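To make the range notation concrete, the following sketch checks a candidate value against a specification such as "0-2147483648,1". The helper in_range is hypothetical. It assumes the "low-high,increment" form described above with non-negative bounds, and it does not handle list-style values such as "yes,no". On a real system, chdev performs this validation itself.

```shell
#!/bin/sh
# in_range (hypothetical helper): succeed if value $2 is allowed by a
# range specification $1 of the form "low-high,increment".
in_range() {
    awk -v spec="$1" -v val="$2" 'BEGIN {
        if (val !~ /^[0-9]+$/) exit 1    # only non-negative integers
        split(spec, part, ",")           # part[1] = "low-high", part[2] = increment
        split(part[1], bound, "-")       # bound[1] = low, bound[2] = high
        low = bound[1] + 0; high = bound[2] + 0; step = part[2] + 0
        if (step <= 0) step = 1          # assumption: a missing increment means 1
        v = val + 0
        ok = (v >= low && v <= high && (v - low) % step == 0)
        exit(ok ? 0 : 1)
    }'
}

if in_range "0-2147483648,1" 512; then
    echo "512 is an allowed block_size"
fi
```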
Customized devices
CuDv:
name = "ent1"
status = 1
chgstatus = 2
ddins = "pci/goentdd"
location = "02-08"
parent = "pci2"
connwhere = "8"
PdDvLn = "adapter/pci/14106902"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 2
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
PdDvLn = "disk/scsi/scsd"
Notes:
status
Status identifies the status of the device instance. Possible values are:
- status = 0 - Defined
- status = 1 - Available
- status = 2 - Stopped
chgstatus
This flag tells whether the device instance was altered since the last system boot. The
diagnostics facility uses this flag to validate system configuration. The flag can take these
values:
- chgstatus = 0 - New device
- chgstatus = 1 - Don't care
- chgstatus = 2 - Same
- chgstatus = 3 - Device is missing
ddins
This descriptor typically contains the same value as the Device Driver Name descriptor in the
predefined devices (PdDv) object class. It specifies the name of the device driver that is loaded
into the AIX kernel.
location
Identifies the AIX location code of a device. The location code is a path from the system unit through
the adapter to the device. When a hardware problem occurs, technical support uses the location code
to identify the failing device.
parent
Identifies the logical name of the parent device. For example, the parent device of hdisk2 is
scsi1.
connwhere
Identifies the specific location on the parent device where the device is connected. For
example, the device hdisk2 uses the SCSI address 8,0.
PdDvLn
Provides a link to the device instance's predefined information through the uniquetype
descriptor in the PdDv object class.
Customized attributes
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
...
CuAt:
name = "hdisk2"
attribute = "pvid"
value = "00c35ba0816eafe50000000000000000"
...
Notes:
Examples on visual
The sample CuAt entries on the visual show two attributes that have customized values. The
attribute jumbo_frames was changed to yes. The attribute pvid shows the physical volume
identifier that was assigned to disk hdisk2.
PdCn:
uniquetype = "adapter/pci/sym875"
connkey = "scsi"
connwhere = "1,0"
PdCn:
uniquetype = "adapter/pci/sym875"
connkey = "scsi"
connwhere = "2,0"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "1"
value3 = "hdisk2"
CuDep:
name = "rootvg"
dependency = "hd6"
CuDep:
name = "datavg"
dependency = "lv01"
CuVPD:
name = "hdisk2"
vpd_type = 0
vpd = "*MFIBM *TM\n\
HUS151473VL3800 *F03N5280
*RL53343341*SN009DAFDF*ECH17923D
*P26K5531 *Z0\n\
000004029F00013A*ZVMPSS43A
*Z20068*Z307220"
Notes:
PdCn
The predefined connection (PdCn) object class contains connection information for adapters
(sometimes called intermediate devices). This object class also includes predefined dependency
information. For each connection location, there are one or more objects that describe the
subclasses of devices that can be connected.
The sample PdCn objects on the visual indicate where the devices that belong to the SCSI
subclass can be attached.
CuDep
The customized dependency (CuDep) object class describes device instances that depend on
other device instances. This object class describes only the dependence links between logical
devices and physical devices, and the dependence links between logical devices.
Physical dependencies of one device on another device are recorded in the customized devices
(CuDv) object class.
The sample CuDep objects on the visual show the dependencies between logical volumes and
the volume groups they belong to.
CuDvDr
The customized device driver (CuDvDr) object class is used to create the entries in the /dev
directory. Applications use these special files to access a device driver that is part of
the AIX kernel. The descriptor value1 holds the major number, which is a unique key for a
device driver. The descriptor value2 holds the minor number, which identifies a device
instance or a certain operating mode of a device driver.
The sample CuDvDr objects on the visual reflect the device driver for disk drives hdisk2 and
hdisk3. The major number 36 specifies the driver in the kernel. In the example, the minor
numbers 0 and 1 specify two different instances of disk drives, both using the same device driver.
For other devices, the minor number can represent different modes in which the device can be
used. For example, for a tape drive, operating mode 0 specifies rewind on close and
operating mode 1 specifies no rewind on close.
CuVPD
The customized vital product data (CuVPD) object class contains vital product data
(manufacturer of device, engineering level, part number, and so forth) that is useful for technical
support. When an error occurs with a specific device, the vital product data is shown in the error
log.
Notes:
Most of the time the information in the ODM device database is accessed and managed by
using high-level commands. Understanding the object classes and their roles helps when using
these commands.
The lsdev command has options that control which ODM object class you list.
To see the objects in the predefined device (PdDv) object class, use the -P flag. If you want to
control the output, you can optionally qualify the command with any combination of the three
key descriptors: class, subclass, and type.
To see objects in the customized device (CuDv) object class, use the -C flag. To control the
output, you can either specify a particular device (by using its logical device name) or you can
use any combination of the PdDv object class key descriptors.
Here is an example of specifying a particular device:
# lsdev -l hdisk0
The most common PdDv descriptor qualification is the class. Thus, it is common to enter
commands such as:
# lsdev -Cc disk
# lsdev -Cc adapter
The lsattr command also has options that control which ODM object classes it uses.
To see the default attribute values, which are stored in the predefined attributes (PdAt) object
class, use the -D flag. You must uniquely identify the object by either:
• Specifying the class, subclass, and type for the object
• Specifying the logical device name of a customized device that is related to the PdAt
object
To see the effective attribute values, use the -E flag. The effective attributes are either the
attributes in the Customized Attributes (CuAt) object class for the specified device, or the
default attribute value from the related PdAt object. The CuAt object class has entries only for
attributes that are different from their default values in PdAt. You must specify a particular
device by providing the logical device name of that device.
When using the chdev command to modify an attribute value, the command logic does not let
you enter unacceptable values. It knows what is allowed by examining the values descriptor for
the attribute in the PdAt object class. If you get an exception message when you attempt to set
an attribute value, it is useful to know what is acceptable. The lsattr command displays this
information when using the -R (range) flag. The -R option requires that the attribute name is
specified in addition to the logical name of the device for which you are attempting to modify
that attribute.
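The following sketch collects the lsdev and lsattr invocations discussed in these notes. The run wrapper and the show_attr_info function are hypothetical. By default the wrapper only prints each command, so the sketch can be read on any system. Setting DRY_RUN=0 on an AIX system runs the commands. The device ent1 and attribute jumbo_frames are taken from the earlier visuals and might not exist on a given machine.

```shell
#!/bin/sh
# run (hypothetical wrapper): print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Show default, effective, and permitted values for one device attribute.
show_attr_info() {
    dev=$1
    attr=$2
    run lsattr -D -l "$dev"               # defaults from PdAt
    run lsattr -E -l "$dev"               # effective values (CuAt, else PdAt)
    run lsattr -R -l "$dev" -a "$attr"    # permitted values for the attribute
}

run lsdev -P -c tape      # predefined devices (PdDv) of class tape
run lsdev -C -c disk      # customized devices (CuDv) of class disk
show_attr_info ent1 jumbo_frames
```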
Checkpoint (1 of 2)
1. True or False: The CuAt ODM object class contains an entry for
each attribute for each supported device.
2. True or False: The DvDr attribute in the PdDv ODM object class
identifies the program that is loaded into the kernel when the device
is made available.
Checkpoint (2 of 2)
6. True or False: An available device has its device driver loaded into the
kernel and a device file created in /dev (if applicable).
Unit summary
Notes:
The ODM is made from object classes, which are broken into individual objects and descriptors.
AIX offers a command-line interface to work with the ODM files.
The device information is held in the customized and the predefined databases (Cu*, Pd*).
Unit 3. Error monitoring
References
Online AIX Version 7.1 General Programming Concepts: Writing
and Debugging Programs (Chapter 5. Error-Logging
Overview)
Online AIX Version 7.1 Command Reference volumes 1-6
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives
Visual: In user space: an application that calls errlog(), errlogger, the error daemon (/usr/lib/errdemon), errstop, errclear, the error log (/var/adm/ras/errlog), the errnotify class in the ODM with its error notify method (for example, e-mail), and errpt with formatted output to the console, SMIT, and diagnostics. In the kernel: a kernel module that calls errsave(), and /dev/error, where the time stamp is added.
Notes:
Detection of an error
The error logging process begins when an operating system module detects an error. The
segment of code that detects errors then sends error information to either the errsave() kernel
service or the errlog() application subroutine, where the information is then written to the
/dev/error special file. This process then adds a time stamp to the collected data. The
errdemon daemon constantly checks the /dev/error file for new entries, and when new data
is written, the daemon conducts a series of operations.
When you use the errpt command (from the command line or SMIT), the error log is formatted
according to the error templates in the error record template repository and presented in a report. Most
entries in the error log are attributable to hardware and software problems, but informational
messages can also be logged, for example, by a system administrator who uses the
errlogger command.
# smit errpt
Generate an Error Report
Notes:
Overview
The SMIT fastpath smit errpt takes you to the screen used to generate an error report. Any user
can use this screen. As shown on the visual, the screen includes a number of fields that can be
used for report specifications.
Type of report
Summary, intermediate, and detailed reports are available. Detailed reports give
comprehensive information. Intermediate reports display most of the error information.
Summary reports contain concise descriptions of errors.
Error types
Valid error types include:
- PEND: The loss of availability of a device or component is imminent.
- PERF: The performance of the device or component has degraded to below an acceptable
level.
- TEMP: Recovered from condition after several attempts.
- PERM: Unable to recover from error condition. Error types with this value are usually the
most severe errors and imply that you have a hardware or software defect. Error types other
than PERM usually do not indicate a defect, but they are recorded so that the diagnostic
programs can analyze them later.
- UNKN: Severity of the error cannot be determined.
- INFO: The error type is used to record informational entries.
Error labels
An error label is the mnemonic name that is used for an error ID.
Error IDs
An error ID is a 32-bit hexadecimal code that is used to identify a particular failure.
Resource classes
Identifies the device class for hardware errors (for example, disk).
Resource types
Indicates the device type for hardware errors (for example, 355 MB).
Resource names
Provides the common device name (for example, hdisk0).
• Summary report
– # errpt
• Intermediate report
– # errpt -A
• Detailed report
– # errpt -a
• Summary report of all hardware errors
– # errpt -d H
• Detailed report of all software errors
– # errpt -a -d S
• Concurrent error logging ("Real-time" error logging)
– # errpt -c > /dev/console
Notes:
The -d option
The -d option (flag) can be used to limit the report to a particular class of errors. Two examples
illustrating use of this flag are shown on the visual:
- The command errpt -d H specifies a summary report of all hardware (-d H) errors.
- The command errpt -a -d S specifies a detailed report (-a) of all software (-d S) errors.
The -c option
If you want to display the error entries concurrently, that is, at the time they are logged, you
must run errpt -c. In the example on the visual, the output is directed to the system console.
The -D flag
Duplicate errors can be consolidated by using errpt -D. When used with the -a option,
errpt -D reports only the number of duplicate errors and the time stamp for the first and last
occurrence of the identical error.
The -P flag
Shows only errors that are duplicates of the previous error. The -P flag applies only to duplicate
errors generated by the error log device driver.
Additional information
The errpt command has many options. Refer to your AIX Commands Reference (or the man
page for errpt) for a complete description.
# errpt
Notes:
LABEL: LVM_SA_PVMISS
IDENTIFIER: F7DDA124
Description
PHYSICAL VOLUME DECLARED MISSING
Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE
Detail Data
MAJOR/MINOR DEVICE NUMBER
8000 0011 0000 0001
SENSE DATA
00C3 5BA0 0000 4C00 0000 0115 7F54 BF78 00C3 5BA0 7FCF 6B93 0000 0000 0000 0000
Error Label     Error Type   Recommendations
DISK_ERR1       P            Failure of physical volume media
                             Action: Replace device as soon as possible
DISK_ERR2,      P            Device does not respond
DISK_ERR3                    Action: Check power supply
DISK_ERR4       T            Error that is caused by bad block or occurrence
                             of a recovered error
                             Rule of thumb: If disk produces more than one
                             DISK_ERR4 per week, replace the disk
SCSI_ERR*       P            SCSI communication problem
(SCSI_ERR10)                 Action: Check cable, SCSI addresses,
                             terminator
Error types: P = Permanent, T = Temporary
Notes:
DISK_ERR5 errors
An infrequent error is DISK_ERR5. It is the catch-all (that is, the problem does not match any of
the other DISK_ERRx symptoms). You need to investigate further by running the diagnostic
programs that can detect and produce more information about the problem.
Error Label        Class and Type   Recommendations
LVM_BBEPOOL,       S,P              No more bad block relocation
LVM_BBERELMAX,                      Action: Replace disk as soon as
LVM_HWFAIL                          possible
LVM_SA_STALEPP     S,P              Stale physical partition
                                    Action: Check disk, synchronize data
                                    (syncvg)
LVM_SA_QUORCLOSE   H,P              Quorum lost, volume group closing
                                    Action: Check disk, consider working
                                    without quorum
Notes:
LVM_SA_STALEPP
Stale physical partition
Action: Check disk, synchronize data (syncvg).
LVM_SA_QUORCLOSE
Quorum lost, volume group closing
Action: Check disk, consider working without quorum.
# smit errdemon
Change / Show Characteristics of the Error Log
# smit errclear
Clean the Error Log
Error notification
ODM-based: /etc/objrepos/errnotify
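#!/usr/bin/ksh
errpt > /tmp/errlog.1
while true
do
    sleep 60 # Let's sleep one minute
    errpt > /tmp/errlog.2
    cmp -s /tmp/errlog.1 /tmp/errlog.2 && continue
    print "Operator: check the error log"
    cp /tmp/errlog.2 /tmp/errlog.1 # New baseline for the next comparison
done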
Notes:
Example on visual
The procedure on the visual shows an easy but effective way of implementing error notification.
- The first errpt command generates a file /tmp/errlog.1.
- The construct while true implements an infinite loop that never ends.
- In the loop, the first action is to sleep 1 minute.
- The second errpt command generates a second file /tmp/errlog.2.
- The two files are compared by using the command cmp -s (silent compare that means no
output is reported). If the files are not different, it jumps back to the beginning of the loop
(continue), and the process sleeps again.
- If there is a difference, a new error entry was posted to the error log. In this case, the operator
is informed that a new entry is in the error log. Instead of print, you might use the mail
command to inform another person.
errnotify:
en_pid = 0
en_name = "sample"
en_persistenceflg = 1
en_label = ""
en_crcid = 0
en_class = "H"
en_type = "PERM"
en_alertflg = ""
en_resource = ""
en_rtype = ""
en_rclass = "disk"
en_method = "errpt -a -l $1 | mail -s DiskError root"
Notes:
Example on visual
The example on the visual shows an object that creates a mail message to root whenever a disk
error is posted to the log.
List of descriptors
Here is a list of all descriptors for the errnotify object class:
en_alertflg Identifies whether the error is alertable. This descriptor is provided for
use by alert agents with network management applications. The
values are TRUE (alertable) or FALSE (not alertable).
en_class Identifies the class of error log entries to match. Valid values are H
(hardware errors), S (software errors), O (operator messages), and U
(undetermined).
en_crcid Specifies the error identifier that is associated with a particular error.
en_dup Identifies whether the kernel identified the error as a duplicate. TRUE
indicates that it is a duplicate error.
en_err64 Identifies the environment of the error. TRUE indicates that the error is
from a 64-bit environment.
en_label Specifies the label that is associated with a particular error identifier as
defined in the output of errpt -t (show templates).
en_method Specifies a user-programmable action, such as a shell script or a
command string to be run when an error matching the selection criteria
of this Error Notification object is logged. The error notification daemon
uses the sh -c command to run the notify method.
The following keywords are passed to the method as arguments:
$1 Sequence number from the error log entry
$2 Error ID from the error log entry
$3 Class from the error log entry
$4 Type from the error log entry
$5 Alert flags from the error log entry
$6 Resource name from the error log entry
$7 Resource type from the error log entry
$8 Resource class from the error log entry
$9 Error label from the error log entry
en_name Uniquely identifies the object
en_persistenceflg Designates whether the Error Notification object should be removed
when the system is restarted. 0 means removed at boot time; 1 means
persists through boot.
en_pid Specifies a process ID for use in identifying the Error Notification
object. Objects that have a PID specified should have the
en_persistenceflg descriptor set to 0.
en_rclass Identifies the class of the failing resource. For hardware errors, the
resource class is the device class (see PdDv). Not used for software
errors.
en_resource Identifies the name of the failing resource. For hardware errors, the
resource name is the device name. Not used for software errors.
en_rtype Identifies the type of the failing resource. For hardware errors, the
resource type is the device type (see PdDv). Not used for software
errors.
en_symptom Enables notification of an error that accompanies a symptom string
when set to TRUE.
en_type Identifies the severity of error log entries to match. Valid values are:
INFO: Informational
PEND: Impending loss of availability
PERM: Permanent
PERF: Unacceptable performance degradation
TEMP: Temporary
UNKN: Unknown
TRUE: Matches alertable errors
FALSE: Matches non-alertable errors
0: Removes the Error Notification object at system restart
non-zero: Retains the Error Notification object at system restart
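As a rough illustration of how the selection descriptors work together, the following Python sketch models the matching step. It assumes that an empty (or zero) descriptor places no restriction on the match, which is how the sample object on the visual reads. The function name `matches` and the dictionary layout are hypothetical and are not part of any AIX interface.

```python
# Hypothetical model of errnotify selection; not an AIX API.
SELECTORS = ("en_label", "en_crcid", "en_class", "en_type",
             "en_alertflg", "en_resource", "en_rtype", "en_rclass")


def matches(notify_obj, log_entry):
    """Return True when every non-empty selection descriptor equals the log entry's value."""
    for name in SELECTORS:
        wanted = notify_obj.get(name, "")
        # Assumption: "" or 0 means "do not filter on this descriptor".
        if wanted in ("", 0):
            continue
        if log_entry.get(name) != wanted:
            return False
    return True


# The "sample" object from the visual: permanent hardware errors on disks.
sample = {"en_name": "sample", "en_persistenceflg": 1, "en_label": "",
          "en_crcid": 0, "en_class": "H", "en_type": "PERM",
          "en_alertflg": "", "en_resource": "", "en_rtype": "",
          "en_rclass": "disk"}

# Illustrative log entries (values invented for the example).
disk_error = {"en_class": "H", "en_type": "PERM", "en_rclass": "disk"}
tape_error = {"en_class": "H", "en_type": "PERM", "en_rclass": "tape"}

print(matches(sample, disk_error))
print(matches(sample, tape_error))
```

In this model only the disk entry would trigger the notify method, because en_rclass is the one descriptor on which the two entries differ.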
syslogd daemon
/etc/syslog.conf:
daemon.debug /tmp/syslog.debug
/tmp/syslog.debug:
# stopsrc -s inetd
# startsrc -s inetd -a "-d"    (provide debug information)
Notes:
Function of syslogd
The syslogd daemon logs system messages from different software components (kernel,
daemon processes, system applications).
/etc/syslog.conf:
auth.debug /dev/console    All security messages to the system console
Notes:
Examples on visual
The visual shows some examples of syslogd configuration entries that might be placed in
/etc/syslog.conf:
- The following line specifies that all security messages are directed to the system console:
auth.debug /dev/console
- The following line specifies that all mail messages are collected in the file
/tmp/mail.debug:
mail.debug /tmp/mail.debug
- The following line specifies that all messages produced from daemon processes are
collected in the file /tmp/daemon.debug:
daemon.debug /tmp/daemon.debug
- The following line specifies that all messages, except messages from the mail subsystem,
are sent to the syslogd daemon on the host server:
*.debug; mail.none @server
If this example and the preceding example appear in the same /etc/syslog.conf file,
messages sent to /tmp/daemon.debug are also sent to the host server.
Facilities
Use the following system facility names in the selector field:
kern Kernel
user User level
mail Mail subsystem
daemon System daemons
auth Security or authorization
syslog syslogd messages
lpr Line-printer subsystem
news News subsystem
uucp uucp subsystem
* All facilities
Priority levels
Use the following levels in the selector field. Messages of the specified level and all levels
above it are sent as directed.
emerg Specifies emergency messages. These messages are not distributed to all users.
alert Specifies important messages such as serious hardware errors. These messages
are distributed to all users.
crit Specifies critical messages, not classified as errors, such as improper login
attempts. These messages are sent to the system console.
err Specifies messages that represent error conditions.
warning Specifies messages for abnormal, but recoverable conditions.
notice Specifies important informational messages.
info Specifies information messages that are useful in analyzing the system.
debug Specifies debugging messages. If you are interested in all messages of a certain
facility, use this level.
none Excludes the selected facility.
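The interaction between facilities, priority levels, and `none` can be sketched in a few lines of Python. This is a simplified model, not the syslogd implementation: it assumes that the parts of a selector are separated by semicolons and that a later part overrides an earlier one for the same facility. The function name `selected` is hypothetical.

```python
# Priority levels from most severe to least severe, in the order listed above.
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def selected(selector, facility, level):
    """Decide whether a message (facility, level) is selected by a selector field."""
    chosen = False
    for part in selector.split(";"):
        fac, _, prio = part.strip().partition(".")
        if fac not in ("*", facility):
            continue  # this part does not apply to the message's facility
        if prio == "none":
            chosen = False  # explicit exclusion of the facility
        else:
            # A message matches at the named level and at every more severe level.
            chosen = LEVELS.index(level) <= LEVELS.index(prio)
    return chosen


print(selected("daemon.debug", "daemon", "err"))
print(selected("*.debug; mail.none", "mail", "info"))
print(selected("*.debug; mail.none", "kern", "info"))
```

Under these assumptions, the first and third calls select the message and the second does not, which mirrors the `*.debug; mail.none` example in the notes.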
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "logger Error Log: `errpt -l $1 | grep -v TIMESTAMP`"
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "logger Error Log: $(errpt -l $1 | grep -v TIMESTAMP)"
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "errpt -l $1 | tail -1 | logger -t errpt -p daemon.notice"
Notes:
Command substitution
You need to use command substitution (or pipes) before calling the logger command. The first
two examples on the visual illustrate the two ways to do command substitution in a Korn shell
environment:
- Using the `UNIX-command` syntax (with backquotes) - shown in the first example on the
visual
- Using the newer $(UNIX command) syntax - shown in the second example on the visual
• System hangs:
– High priority process
– Other
• What does shdaemon do?
– Monitors the system's ability to run processes
– Takes specified action if threshold is crossed
• Actions:
– Logs error in the error log
– Displays a warning message on the console
– Launches recovery login on a console
– Launches a command
– Automatically reboots the system
Notes:
Actions
If lower priority processes are not being scheduled, shdaemon performs the specified action.
Each action can be individually enabled and has its own configurable priority and timeout
values. There are five actions available:
- Log error in the error log
- Display a warning message on a console
- Start a recovery login on a console
- Start a command
- Automatically reboot the system
Configuring shdaemon
# shconf -E -l prio
sh_pp disable Enable Process Priority Problem
Notes:
Introduction
shdaemon configuration information is stored as attributes in the SWservAt ODM object class.
Configuration changes take effect immediately and survive across reboots.
Use shconf (or smit shd) to configure or display the current configuration of shdaemon.
The values that are shown in the visual are the default values.
Enabling shdaemon
At least two parameters must be modified to enable shdaemon:
- Enable priority monitoring (sh_pp)
- Enable one or more actions (pp_errlog, pp_warning, and so forth)
Action attributes
Each action has its own attributes, which set the priority and timeout thresholds and define the
action to be taken. The timeout attribute unit of measure is in minutes.
Example
By changing the shconf attributes, you can enable, disable, and modify the behavior of the
facility. For example, shdaemon is enabled to monitor process priority (sh_pp=enable), and
the following actions are enabled:
- Enable shdaemon to monitor process priority:
# shconf -l prio -a sh_pp=enable
- Log an error in the error log:
# shconf -l prio -a pp_errlog=enable
Every 2 minutes (pp_eto=2), shdaemon checks to see whether any process ran with a
process priority number greater than 60 (pp_eprio=60). If not, shdaemon logs an error to
the error log.
- Display a warning message on a console:
# shconf -l prio -a pp_warning=enable (default value)
Every 2 minutes (pp_wto=2), shdaemon checks to see whether any process ran with a
process priority number greater than 60 (pp_wprio=60). If not, shdaemon sends a warning
message to the console specified by pp_wterm.
- Run a command:
# shconf -l prio -a pp_cmd=enable -a pp_cto=5
Every 5 minutes (pp_cto=5), shdaemon checks to see whether any process ran with a process
priority number greater than 60 (pp_cprio=60). If not, shdaemon runs the command that is
specified by pp_cpath (in this case, /home/unhang).
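The check that each action performs can be modelled as follows. This is only a sketch of the rule described in the notes (no process with a priority number above the threshold ran within the timeout window); the function name `hang_detected` and the `run_log` structure are hypothetical and do not reflect how shdaemon is implemented.

```python
def hang_detected(run_log, now, timeout_min, prio_threshold):
    """Return True when the action for this threshold should fire.

    run_log is a list of (minute, priority) pairs, one for each time a
    process was scheduled. The action fires when no process with a priority
    number greater than prio_threshold ran in the last timeout_min minutes.
    """
    window_start = now - timeout_min
    for minute, priority in run_log:
        if window_start < minute <= now and priority > prio_threshold:
            return False  # a lower-priority process still got CPU time
    return True


# Illustrative data: a process at priority 75 ran at minute 2; afterwards
# only a high-priority process (priority number 20) was scheduled.
run_log = [(1, 40), (2, 75), (6, 20)]

print(hang_detected(run_log, now=3, timeout_min=2, prio_threshold=60))
print(hang_detected(run_log, now=7, timeout_min=2, prio_threshold=60))
print(hang_detected(run_log, now=6, timeout_min=5, prio_threshold=60))
```

With the 2-minute timeout, the check at minute 7 fires because nothing above priority 60 ran in that window. With the 5-minute timeout used for pp_cto, the run at minute 2 is still inside the window at minute 6, so the command would not be started yet.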
Unit 4. Network Installation Manager basics
References
Online AIX Version 7.1 Installation and migration
SG24-7296 NIM from A to Z in AIX 5L (Redbooks)
http://www.redbooks.ibm.com
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
© Copyright IBM Corp. 2009, 2015 Unit 4. Network Installation Manager basics 4-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook
NIM overview
at each system
• Distribute installation load
• Support for push or pull installations
• NIM administrative tools
– Command line interface
– SMIT
Diagram labels: PUSH installation (initiated by master), PULL installation (requested by client), NIM server, Client, Client and NIM server
Notes:
Purpose of NIM
NIM provides centralized AIX software administration for multiple machines over the network.
NIM supports full AIX operating system installation, installing or updating individual packages,
and doing software maintenance.
Advantages
NIM provides several advantages:
- Provides one central point for AIX software administration for all the NIM clients
- Eliminates the need to walk a CDROM/DVD or tape to each system and the need for a tape
drive or CDROM/DVD drive at every system
- Installations can be initiated from the master machine (push) or from the client (pull)
- The installation load can be distributed. The NIM master machine can be configured as the
server for all the filesets to be installed. However, you can also configure one or more client
machines to act as servers to distribute the load if you have many clients.
Machine roles
• Master
– File sets:
• bos.sysmgt.nim.master
• bos.sysmgt.nim.client
• bos.sysmgt.nim.spot
• Stores NIM database
– NIM administration
– Can initiate push installations to NIM clients
– AIX version >= all other NIM machines
• Client
– File sets:
• bos.sysmgt.nim.client
– Can initiate pull installations from a server
• Server
– Any machine, master, or client
– Serves NIM resources to clients, thus requires adequate disk space and
throughput
Notes:
The three basic roles that a machine can have in the NIM environment are master, client, and
resource server. There can be only one master machine in a NIM environment. All other
machines are clients. Any machine, master, or client, can be a resource server.
NIM software
All machines in the NIM environment must install bos.sysmgt.nim.client. The master
machine must also install bos.sysmgt.nim.master and bos.sysmgt.nim.spot.
Master
The NIM master manages all other machines that participate in the NIM environment. The NIM
database is stored on the NIM master. The NIM master is fundamental for all of the operations
in the NIM environment and must be set up and operational before performing any NIM
operations. The master can initiate a software installation to a client, which is called a push
installation.
Also, the NIM master is the only machine that is given the permissions and ability to run NIM
operations on other machines within the NIM environment. The master uses rsh or nimsh to
run commands remotely on clients, which allows it to install to a number of
clients with one NIM operation.
The master requires the filesets of bos.sysmgt.nim.master, bos.sysmgt.nim.client,
and bos.sysmgt.nim.spot. It is also required to have its AIX operating system software at a
level that is equal to or higher than any of the clients that it is serving.
Client
All other machines in a NIM environment are clients. Clients can request a software installation
from a server machine (pull installation). The client requires the fileset of
bos.sysmgt.nim.client.
Server
The master can configure any machine, the master, or a client, as a server for a particular
software resource. Most often, the master is also the server. However, if your environment has
many nodes or consists of a complex network environment, you might want to configure some
nodes to act as servers to improve installation performance.
Servers must have adequate disk space for the resources they are providing. They also need
network connections to the client machines they serve and sufficient bandwidth to respond to
the expected volume.
1 Load boot image: Boot image is on removable media
3 Configure devices: Using programs on removable media
4 Install system files: Backup archive is on removable media
Figure 4-4. Boot process for AIX installation: Tape or CD/DVD AN153.0
Notes:
To understand how NIM works, you need to understand what happens when AIX is installed on
a system.
Start boot script and configure devices that are needed for installation
The kernel initializes and eventually runs the boot script (rc.boot), which configures devices
that are needed for the installation such as keyboards, displays, and disks.
Configuring devices
To keep the boot image small, not all of the software needed to configure devices is included in
the boot image. These additional files are contained in a small /usr directory tree that is called
a Shared Product Object Tree or SPOT. The boot script mounts the /usr directory tree on
/SPOT in the memory file system. The SPOT is mounted directly from the CDROM/DVD.
Note: Since tape devices do not support file system operations, the SPOT files are included in
the boot image in the case of booting from a tape drive.
Installation script
After the devices are configured, rc.boot starts the BOS installation program (bi_main), and
installs AIX from the installation images on the tape or CD/DVD.
1 Load boot image: Boot image from NIM server
Figure 4-5. Boot process for AIX installation with NIM (1 of 2) AN153.0
Notes:
Using NIM to boot over the network is essentially the same as booting from CD or tape, except
that the boot file (SPOT file) and installation images come from the server system over the
network.
3 Configure devices: Using programs on NIM server
4 Install system files: Backup archive is on NIM server
Figure 4-6. Boot process for AIX installation with NIM (2 of 2) AN153.0
Notes:
Start the boot script and configure devices that are needed for installation
When booting over the network, the SPOT is mounted from the NIM server with the Network
File System (NFS).
NIM objects
• Object classes
– Networks
– Machines
– Resources
• Group objects
– mac_group
– res_group
Notes:
NIM is made up of various components, called objects. There are three classes of objects:
machines, networks, and resources.
All information about the NIM environment is stored in Object Data Manager (ODM) databases
on the NIM master system.
Network objects
Network objects are objects in the NIM database that represent information about each local
area network (LAN) that is part of the NIM environment. These objects and some of their
attributes reflect the physical characteristics of the network. NIM network objects are not used
to perform management tasks in the overall network environment; they are only used to
represent the physical network topology of the NIM environment. In other words, if something
changes in the physical network environment, you must also remember to change it in the NIM
database.
The types of networks that are supported by NIM are: Token-Ring, Ethernet, ATM, FDDI, HFI,
and generic. These network types are represented as network objects in the NIM environment.
Machine objects
Machines in the NIM environment are managed by NIM.
Resource objects
All operations on clients in the NIM environment require one or more NIM resources. NIM
resource objects represent the files, directories, and devices that are used to support each type
of NIM operation. Some resources are AIX filesets (or devices that contain filesets) that can be
installed on a client machine. Other resources are scripts or configuration files that are used in
the installation process.
The location and other attributes for these resources are stored as resource objects in the NIM
database.
Group objects
NIM supports two types of group objects:
- mac_group
A machine group is a group of machine objects. You can use a machine group to simplify
performing a NIM operation on multiple machines.
- res_group
A resource group is a group of resource objects. If you have a set of resources that you
typically want to use at the same time, you can create a resource group to simplify allocating
those resources.
# lsnim -l ent0
ent0:
class = networks
type = ent
Nstate = ready for use
prev_state = information is missing from this object's definition
net_addr = 10.31.192.0
snm = 255.255.240.0
routing1 = default 10.31.192.1
Notes:
The lsnim command is used to list various types of NIM information. You have the opportunity
to experiment with lsnim in the exercise.
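If you want to post-process lsnim output in a script, the stanza format shown on the visual is easy to parse. The following Python sketch assumes the output has an object name followed by a colon on the first line and `attribute = value` pairs on the remaining lines, as in the example above. The function name `parse_lsnim` is hypothetical.

```python
def parse_lsnim(text):
    """Split one 'lsnim -l' stanza into the object name and a dict of attributes."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    name = lines[0].strip().rstrip(":")
    attrs = {}
    for ln in lines[1:]:
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = ln.partition("=")
        attrs[key.strip()] = value.strip()
    return name, attrs


# Output copied from the visual.
output = """ent0:
class = networks
type = ent
Nstate = ready for use
prev_state = information is missing from this object's definition
net_addr = 10.31.192.0
snm = 255.255.240.0
routing1 = default 10.31.192.1
"""

name, attrs = parse_lsnim(output)
print(name, attrs["net_addr"], attrs["snm"])
```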
NIM configuration
• Configure master
– Install master NIM file sets
– Run nimconfig
• Define resources
– Create real resource with full path
– Create resource object to represent
• Define networks
– How do clients on networks access the master?
• Define clients
– Able to relate network address of the client with object name
• Allocate resources to clients
– Different operations need different resources
• NIM operations on clients
– Setting up for operation
– Initiating operation
Notes:
Installing NIM
The NIM filesets that need to be installed on a machine that is designated to act as NIM master
are:
- bos.sysmgt.nim.client
- bos.sysmgt.nim.master
- bos.sysmgt.nim.spot
Configure master
Configuring the master machine consists of installing the master filesets and running
nimconfig. You must specify the primary network interface and a NIM network name for the
network that is attached to the primary interface. Several optional attributes can be specified.
nimconfig creates the NIM database and the /etc/niminfo configuration file. It also starts
the NIM daemon (nimesis) and creates an entry in /etc/inittab so that nimesis is started on
every boot of the master machine.
Allocate resources
After the resource and machine objects are defined, you need to decide what operation you
want to perform on your client machine. Different resources are needed for each operation.
Next, you need to allocate the resources to your client. The allocation identifies which resource
objects are used to implement the client operation. There are two ways to allocate a resource:
- Use the nim -o allocate operation (or SMIT) to relate the resource to the machine
- Use SMIT, which prompts for the resources to allocate as part of the machine operation
definition
Resource objects
• Object types
– boot Represents the network boot image resource
– nim_script Directory for customization scripts that are created by NIM
– spot Shared Product Object Tree - equivalent to /usr file system
– lpp_source Source device for software product images
– bosinst_data Configuration file that is used during base system installation
– image_data Configuration file that is used during base system installation
– mksysb A mksysb image
– script A user created script that is executed on a client to perform
customization
– resolv_conf Configuration file for name server information
– ... (additional resource types)
• Attributes
– location Directory path
– server Machine which serves this resource
– Rstate, prev_state Status attributes
– ... (additional attributes)
Notes:
Resources are the files and directories that NIM uses to install software on the clients.
Resource types
Resource types identify the different types of files that are used by NIM. For example:
- An lpp_source resource is a directory that contains the product images to be installed
- A spot resource contains the files that are used during the boot operation
- A script resource is a user definable script that can be used to customize a newly
installed client
- A mksysb resource is a backup image that can be used to install a client
• lpp_source
– Directory containing software product images
– Supports NIM install operations (bos_inst and cust)
– Also used for creation of SPOT resource
• Defining an lpp_source:
# nim -o define -t lpp_source
-a server=<machine>
-a location=<directory>
[ optional attributes ]
<lppsource_name>
• # smit nim_mkres
Diagram labels: lppsource; aix71-03-01, aix71-03-03; bos filesets
Notes:
lpp_source
When a resource of this type is defined, it represents a directory in which software product
images are stored. lpp_source resources are used to support NIM installation operations. An
lpp_source can also be used as the source for the creation of a SPOT.
When you perform a NIM installation operation and allocate an lpp_source resource to the
client, NIM NFS mounts the lpp_source directory on the client. Then, it invokes the
installp command on the client to install from the directory. When installp finishes, NIM
automatically unmounts the resource.
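When many resources must be defined, it can help to generate the `nim -o define` command lines from a script. The Python sketch below only assembles the argument list in the form shown on the visual; it does not run the command. The helper name `nim_define` and the location path are hypothetical, and the example assumes the resource is served by the master.

```python
import shlex


def nim_define(res_type, name, **attrs):
    """Build the argument list for 'nim -o define -t <type> -a key=value ... <name>'."""
    argv = ["nim", "-o", "define", "-t", res_type]
    for key, value in attrs.items():
        argv += ["-a", f"{key}={value}"]
    argv.append(name)  # the object name always comes last
    return argv


# Hypothetical example: the directory path is invented for illustration.
cmd = nim_define("lpp_source", "aix71-03-01",
                 server="master",
                 location="/export/lpp_source/aix71-03-01")
print(shlex.join(cmd))
```

The same helper covers the other resource types in this unit, because they differ only in the `-t` value and in the attributes that are passed.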
simages attribute
This attribute is used to indicate that an lpp_source resource contains the set of installable
images to which NIM requires access to perform its basic functions. This basic set of images is
referred to as support images or simages. NIM automatically manages the use of this attribute
as part of the management of an lpp_source.
NIM adds this attribute to the definition of an lpp_source when the lpp_source contains the required
simages, and NIM removes this attribute from the object's definition if a required image
becomes unavailable.
Some NIM operations require access to an lpp_source that has this attribute as part of its
definition, so having this attribute can be important. Perform the check operation on the
lpp_source to have NIM verify whether the simages requirement is fulfilled. If it is,
NIM adds this attribute to the lpp_source definition.
• SPOT
– /usr directory tree that is used during network boot
• Defining a SPOT
# nim -o define -t spot
-a server=<machine>
-a location=<directory>
-a source=<lpp_source_name>
[ optional attributes ]
<SPOT_name>
• # smit nim_mkres
Diagram labels: lppsource; SPOT; spot71-03-01, spot71-03-03; usr; bin, include, lib, etc
Notes:
Components
• A /usr file system
A Shared Product Object Tree (SPOT) is a directory that contains AIX code that is equivalent in
content to the code that is in the /usr file system. The NIM SPOT creation process restores
files from AIX filesets into the SPOT directory.
The SPOT is NFS-mounted on a booting client to provide necessary device support for the boot
process.
• Boot image
As part of the creation of a SPOT resource, NIM also creates network boot images. The
network boot images are constructed in /tftpboot on the same machine in which the SPOT
is created. The boot images are constructed with code from the newly created SPOT. The boot
images are also sometimes called SPOT files. During a network boot, the client obtains its
network configuration and boot file name with the BOOTP protocol, and the boot image file is then transferred with TFTP.
Since one SPOT can potentially support several types of machines, several boot image files
can be created. The naming convention identifies each boot image as:
<spot_name>.<Platform>.<Kernel>.<Network>, where:
- <Platform> identifies which architecture this boot image supports: chrp, rspc, and so forth
- <Kernel> specifies whether this boot image contains a multi-processor (mp), 64-bit (64) or
uni-processor (up) kernel.
- <Network> identifies the network type: ent, tok, and so forth
Today, the combinations you are most likely to work with are chrp.mp.ent and chrp.64.ent.
During a network boot, the boot image is transferred over the network and loaded into the
client’s memory.
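The naming convention can be expressed as a pair of small helpers. This Python sketch assumes only the four-part `<spot_name>.<Platform>.<Kernel>.<Network>` form described above; the function names are hypothetical. Splitting from the right keeps the helper working even if a SPOT name itself contains a period.

```python
def boot_image_name(spot, platform, kernel, network):
    """Compose a boot image file name from its four components."""
    return ".".join([spot, platform, kernel, network])


def split_boot_image_name(filename):
    """Split a boot image file name into its components, working from the right."""
    spot, platform, kernel, network = filename.rsplit(".", 3)
    return {"spot": spot, "platform": platform,
            "kernel": kernel, "network": network}


name = boot_image_name("spot71-03-01", "chrp", "64", "ent")
print(name)
print(split_boot_image_name(name))
```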
• /tftpboot
It is good practice to make /tftpboot a separate file system. As a separate file system, it
removes the risk of filling the root file system. If you are supporting multiple AIX versions on
multiple machine types or multiple network types, this directory can get large.
Optional attributes
There can be a number of optional attributes, including:
- installp_flags=<flags>
NIM calls installp to create the SPOT. By default, NIM uses the -agX flags when calling
installp. You can use installp_flags to specify the options you require.
- auto_expand={yes|no}
Indicates that file systems should be automatically expanded if more space is needed.
• mksysb
– Identifies a mksysb system backup image file
– Used for bos_inst operations
• Defining a mksysb
# nim -o define -t mksysb
-a server=<machine>
-a location=<mksysb_path>
[ optional attributes ]
<mksysb_name>
• # smit nim_mkres
Notes:
mksysb
A mksysb resource represents a system backup image file that is created by using the mksysb
command. A mksysb resource can be used as the source of the BOS runtime files when a
bos_inst is performed.
If the system backup image exists, enter the file name of the image. If you are creating the
system backup image as part of this operation, enter the name of the file that you want to
create.
There are a number of optional attributes, including:
- mk_image={yes|no}
If the backup file exists, specify no (the default). If you want nim to create a new backup file,
specify yes.
- source=<machine_name>
If you want nim to create a backup image for you, specify the NIM name of the machine you
want to back up.
- mksysb_flags=<value>
You can use this attribute to specify optional flags for the mksysb command, if needed.
Network objects
• Object types
– ent Ethernet network
– fddi FDDI network
– tok Token ring network
– atm ATM network (no network boot capability)
– hfi Host fabric interface network
– generic Generic network (no network boot capability)
• Attributes
– net_addr Network address for a network
– snm Subnetmask for a network
– routing<X> Routing information for a network
– Nstate, prev_state Status attributes
– ... (additional attributes)
Notes:
To perform certain NIM operations, the NIM master must be able to supply information
necessary to configure client network interfaces. The NIM master must also be able to verify
that client machines can access all the resources that are provided by the NIM server. To avoid
the extra work of repeatedly specifying network information for each individual client, NIM
network objects are used to represent the networks in a NIM environment.
Network types
NIM supports the network types that are shown in the visual, plus a generic type. Network boot
support is provided for Ethernet, Token-Ring, FDDI and HFI. Network boot operations are not
supported on ATM or generic networks. NIM supports both standard Ethernet and IEEE 802.3
Ethernet networks.
Network attributes
Network attributes include the network address, subnet mask, routes, and status. The Nstate
attribute indicates whether the object definition of the network is complete. NIM requires that all
networks be able to communicate with the NIM master, either with the master directly
connected to them or by having a NIM route to a network to which the master connects.
Routing
NIM routing information represents standard TCP/IP routing information for the networks that
are part of a NIM environment. This information defines the gateways that are used to establish
communication between the master machine and the clients.
The routing<X> attribute defines a route and includes:
- A destination (default or a NIM network name)
- A gateway address
If needed, multiple routes can be created and are numbered routing1, routing2, and so forth.
More attributes
There are a number of other attributes for each network object. lsnim is probably the easiest
way to get information about NIM attributes.
Machine objects
• Object types
– master
– standalone
– diskless
– dataless
• Attributes
– platform Architecture
– netboot_kernel up or mp
– if<X> Network interface information
– serves Resource served by this machine
– Cstate, prev_state, Mstate Status attributes
– ... (additional attributes)
Notes:
NIM supports four types of machines: the master and three types of clients (standalone,
diskless, and dataless).
Master
The master machine is defined by installing the master fileset, and then performing some quick
configuration. There can be only one master in the NIM environment. After a machine is defined
as the master, it can participate in NIM operations.
Standalone clients
Standalone clients have local disk resources. They are installed from the NIM server, but after
installation, they boot and operate from their local disks.
Diskless clients
Diskless clients have no disks of their own. They run entirely by using resources from the NIM
server.
Dataless clients
Dataless machines use a local disk only for paging space and the /tmp and /home file
systems. All of the other storage is provided over the network by the NIM server.
Machine attributes
Each machine object belongs to one of the four machine object types. Additionally, machine
objects store other attributes about the machine. The visual shows a few of them:
- The platform attribute describes the machine architecture (chrp, rspc, and so forth).
- netboot_kernel indicates which type of kernel is required, uni-processor (up),
multi-processor (mp), or 64-bit kernel (64).
- if<X> is used to provide information about a machine’s network interfaces. If there are
multiple interfaces, they are numbered: if1, if2, and so forth. This attribute includes the
NIM network this interface connects to, the host name, the MAC address, and the network
type.
- The serves attribute identifies resources served by this machine. If the machine serves
several resources, there is a serves attribute for each resource.
- Cstate indicates the NIM operation that is being performed on a machine or that no NIM
operations are currently being performed.
- prev_state shows the previous Cstate.
- Mstate shows the execution state for a machine.
Note
NIM attempts to keep the value of the Mstate attribute synchronized with the machine's
execution state, but NIM does not guarantee its accuracy. To have NIM attempt to determine the
machine's current execution state, perform the check operation on the machine.
More attributes
There are a number of other attributes for each machine object. lsnim is probably the easiest
way to get information about NIM attributes.
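As a minimal sketch of how lsnim can be used to inspect a machine object, the following helper prints a few queries for one client. The function name machine_queries is hypothetical and lpar1 is the client name used in this unit's examples. The script only prints the commands; run them on the NIM master.

```sh
#!/bin/sh
# Print (not run) some queries for one NIM machine object.
machine_queries() {
    client=$1
    echo "lsnim -c machines"         # list every object in the machines class
    echo "lsnim -l $client"          # long listing: all attributes of one machine
    echo "lsnim -a Cstate $client"   # show only the Cstate attribute
    echo "lsnim -a Mstate $client"   # show only the Mstate attribute
    echo "nim -o check $client"      # ask NIM to re-check the execution state
}

machine_queries lpar1
```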
• Examples:
# nim -o define -t standalone -a if1="network1 lpar1 0 ent0" \
-a cable_type1="N/A" -a connect=nimsh \
-a platform=chrp -a netboot_kernel=mp lpar1
# smit nim
Perform NIM Administrative Tasks
Manage Machines
Define a Machine
<provide hostname of client>
Notes:
Follow these steps to add a client and its network information with SMIT:
1. On the NIM master, add a standalone client to the NIM environment by using SMIT
(nim_mkmac is the fast path).
2. Specify the host name of the client.
The client host name is the name that the IP address of this machine's installation
adapter resolves to. By default, this name also becomes the host name of the client when
the client is installed. If you use DNS, enter the fully qualified host name here, for
example, lpar1.my.company.com.
3. The next SMIT screen that is displayed depends on whether NIM already has
information about the client's network. Supply the values for the required fields or accept
the defaults. Use the help information and the LIST option to help you specify the correct
values to add the client machine.
Define a Machine
* NIM Machine Name [lpar1]
* Machine Type [standalone] +
* Hardware Platform Type [chrp] +
Kernel to use for Network Boot [mp] +
Communication Protocol used by client [nimsh] +
Primary Network Install Interface
* Cable Type N/A +
Network Speed Setting [] +
Network Duplex Setting [] +
* NIM Network network1
* Host Name lpar1
Network Adapter Hardware Address [0]
Network Adapter Logical Device Name [ent0]
IPL ROM Emulation Device [] +/
CPU Id []
Machine Group [] +
Comments []
Notes:
Machine type
The standalone machine type is the only type that is used now.
Kernel type
If a client machine is running the 64-bit kernel, choose mp or 64. If the client is running the
32-bit kernel, choose either the up or the mp kernel. To determine which kernel a client is
running, run the ls -l /usr/lib/boot/unix command and note whether it is linked to the 64,
up, or mp kernel in that same directory. You can also run the getconf -a command to
determine whether the machine can run an mp kernel: an MP_CAPABLE setting of 1 means yes.
On older releases, run the bootinfo -z command instead; a setting of 1 again means yes.
Starting with version 6.1, AIX uses only a 64-bit kernel.
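The symbolic link check described above can be sketched as a small script. It assumes that the kernel files in /usr/lib/boot are named unix_64, unix_mp, and unix_up; the function names are hypothetical. On a system without that link, the script prints unknown.

```sh
#!/bin/sh
# Map the target of the /usr/lib/boot/unix symbolic link to a
# netboot_kernel value (64, mp, or up).
kernel_type_from_link() {
    case "$1" in
        *unix_64) echo 64 ;;
        *unix_mp) echo mp ;;
        *unix_up) echo up ;;
        *)        echo unknown ;;
    esac
}

# The link target is the text after "->" in the "ls -l" output.
link_target() {
    ls -l "$1" 2>/dev/null | sed -n 's/.* -> //p'
}

kernel_type_from_link "$(link_target /usr/lib/boot/unix)"
```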
Communication protocol
Either the less secure remote shell protocol (rsh) or the newer NIM service handler protocol
(nimsh) can be used. nimsh is available in AIX 5.3 and later versions of AIX.
Cable type
Most configurations today are set to N/A (not applicable), because modern adapters either
autosense the connection type or support only a single type (such as twisted pair or fiber). To
check, run the lsattr -El entX command and see whether a cable_type attribute is listed. If it
is not, then setting N/A should work. If the adapter uses twisted-pair cable, then setting
it to tp should work.
Network speed/duplex
These settings are only used when performing a push boot operation on the client. If not set, the
current SMS speed/duplex settings for your installation adapter are used.
NIM network
The NIM network is the network to which the client is assigned.
CPU_ID
The CPU_ID is the machine ID that the uname -m command returns on the client. NIM uses it
to uniquely identify this client in the future. You do not have to set the CPU_ID; NIM
configures it.
Machine group
You can assign a client to a machine group.
Command line
The equivalent NIM command for the operation is:
# nim -o define -t standalone -a if1="network1 lpar1 0 ent0" \
-a cable_type1="N/A" -a connect=nimsh \
-a platform=chrp -a netboot_kernel=mp lpar1
For more information, use the lsnim -q define -t standalone command or the nim
man page.
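When several clients share the same settings, the define command can be generated per host name. The following is a sketch only: it assumes that every client is on the NIM network network1, installs through ent0, and uses nimsh, as in the example above. The function name define_cmd and the second client lpar2 are hypothetical. The script prints the commands rather than running them.

```sh
#!/bin/sh
# Print a "nim -o define" command for one standalone client.
define_cmd() {
    host=$1
    cmd="nim -o define -t standalone -a if1=\"network1 $host 0 ent0\""
    cmd="$cmd -a cable_type1=\"N/A\" -a connect=nimsh"
    cmd="$cmd -a platform=chrp -a netboot_kernel=mp $host"
    printf '%s\n' "$cmd"
}

# Review the printed commands, then run them on the NIM master.
for h in lpar1 lpar2; do
    define_cmd "$h"
done
```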
NIM operations
• Operations on clients
– bos_inst
• rte
• mksysb
– cust
– maint
– diag
– maint_boot
• Procedure
– Allocate resources to clients (for intended operation)
– Perform operation
– Deallocate resources
• Other NIM object operations
– define, change, remove, allocate, deallocate, maint,
lslpp, lppchk, check, and so forth
Notes:
Operations on clients
NIM supports several different types of operations to install and manage software on NIM
clients. In addition, there are operations to manage the NIM objects themselves.
The main client operations are:
- bos_inst
Installs AIX on a client.
- cust and maint
Updates and maintains AIX software.
- diag
Prepares resources for a client to be network-booted into diagnostics mode.
- maint_boot
Boots a client to maintenance mode over the network.
bos_inst
A bos_inst operation is used to perform a Basic Operating System (BOS) installation on a
client. There are two types of bos_inst operations: rte and mksysb.
bos_inst customization
The NIM installation process allows you to run a customization script after AIX is installed on the
system. To run a script, allocate a script resource to the client before performing the
bos_inst. That script can be used to perform such customization as setting passwords,
changing network addresses, and so forth.
cust
This NIM operation performs software customization on a running NIM client. You can use the
cust operation to:
- Update existing software
- Install more software
- Run a customization script
maint
This NIM operation performs software maintenance operations on clients, such as committing
applied software and removing software.
diag
This NIM operation enables the client to boot to diagnostics over the network.
maint_boot
This operation enables the client to boot to maintenance mode over the network.
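The allocate, perform, deallocate procedure from the visual can be sketched for a cust operation as follows. The resource name lpp_aix71, the fileset name, and the function name cust_procedure are illustrative assumptions. Depending on the operation, NIM might already release resources when the operation completes, so treat the final step as a cleanup that is not always needed. The script only prints the commands.

```sh
#!/bin/sh
# Print the three-step NIM procedure for installing filesets on a client:
# allocate the resource, perform the operation, deallocate the resource.
cust_procedure() {
    client=$1
    lpp=$2
    filesets=$3
    echo "nim -o allocate -a lpp_source=$lpp $client"
    echo "nim -o cust -a filesets=\"$filesets\" $client"
    echo "nim -o deallocate -a lpp_source=$lpp $client"
}

cust_procedure lpar1 lpp_aix71 "bos.net.tcp.client"
```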
bos_inst operation
• Command line
# nim -o bos_inst
-a lpp_source=<lpp_res_name>
-a spot=<SPOT_name>
-a source={rte|mksysb}
-a mksysb=<mksysb_name>
-a boot_client={yes|no}
[optional attributes]
<client_name>
• # smit nim_bosinst
Notes:
bos_inst
Configuring NIM to perform a bos_inst can be done from the command line or through SMIT.
There are two steps: allocating resources to the client and enabling the bos_inst. It is also
possible to combine these steps into one command:
# nim -o bos_inst -a lpp_source=<lpp_res_name> -a spot=<spot_name> \
[additional resources] [-a source={rte|mksysb}] [additional attributes] \
<client_name>
If you use SMIT to enable a bos_inst, SMIT opens a series of windows to prompt you for the
required information and then displays a window where you can set more optional attributes.
Required information
The required information for a bos_inst operation is:
- <client_name>
The last argument specifies the NIM object that you want to operate on. In this case, the
NIM object is the target client machine that you want to install.
- spot=<spot_name>
Specifies the SPOT resource that you want to use.
- lpp_source=<lpp_res_name>
Specifies the lpp_source resource that you want to use for the installation. In
AIX 5.3 and later, this attribute is not required for a mksysb installation.
Optional information
Optional attributes include:
- source={rte|mksysb}
mksysb=<mksysb_name>
If you do not specify the source attribute, nim performs an rte bos_inst. If you set
source=mksysb, then you must also use the mksysb attribute to specify the name of the mksysb
resource you want to use.
Note
In most cases, you must still include an lpp_source resource, even if you are doing a mksysb
installation. If a mksysb is created that includes all devices, you do not need to specify an
lpp_source.
- boot_client={yes|no}
When set to yes, the master attempts to reboot the client machine automatically for
reinstallation. For this option to succeed, the client must be running and must either be
initialized as a NIM client or grant rhosts permissions to the master. If set to no, the master
is only configured to support the network boot; the actual boot must be initiated later.
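The relationship between the source and mksysb attributes can be sketched with a helper that assembles the command line. The function name bos_inst_cmd and the resource names spot71, lpp71, and lpar1_mksysb are hypothetical. The sketch always passes an lpp_source, following the note above, and sets boot_client=no so that the boot is started separately. It prints the command rather than running it.

```sh
#!/bin/sh
# Build a "nim -o bos_inst" command line.
# Usage: bos_inst_cmd <client> <spot> <lpp_source> [<mksysb_name>]
# With a fourth argument the installation source is mksysb, otherwise rte.
bos_inst_cmd() {
    client=$1
    spot=$2
    lpp=$3
    mksysb=$4
    cmd="nim -o bos_inst -a spot=$spot -a lpp_source=$lpp"
    if [ -n "$mksysb" ]; then
        # source=mksysb requires the mksysb attribute as well
        cmd="$cmd -a source=mksysb -a mksysb=$mksysb"
    else
        cmd="$cmd -a source=rte"
    fi
    echo "$cmd -a boot_client=no $client"
}

bos_inst_cmd lpar1 spot71 lpp71
bos_inst_cmd lpar1 spot71 lpp71 lpar1_mksysb
```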
• Documentation
– NIM from A to Z in AIX 5L
(http://www.redbooks.ibm.com/ )
– AIX Version 7.1 Installation and migration guide
• EZ NIM
– nim_master_setup
– nim_client_setup
Notes:
Classes
You should also consider the following class.
- AN220 - AIX Network Installation Management (NIM)
(IBM Learning Services training course:
http://www.ibm.com/services/learning/index.html)
Checkpoint
Exercise: Basic Network Installation Manager
configuration
Unit summary
Unit 5. System initialization: Accessing a boot
image
References
Online AIX Version 7.1 Operating system and device management
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives
Possible failures
(Visual: the firmware searches the boot devices in order: (1) CDROM/DVD, (2) Disk, (3) Network. The boot controller loads the boot logical volume (hd5) from hdisk0 into RAM.)
Notes:
Introduction
This visual shows how the boot logical volume is found during the AIX boot process. Machines
use one or more bootlists to identify a boot device. The bootlist is part of the firmware.
Bootstrap code
Power Systems can manage several different operating systems. The hardware is not bound to
the software. System firmware reads the boot list to locate the boot device.
The Open Firmware load method loads the AIX boot image. It reads the boot image as a
whole from the boot device. Then, the SOFTROS code (aixmon_chrp) processes the loaded
boot image: it uncompresses the image and relocates it to a different region of memory.
Open Firmware locates the boot image by using the partition table entries (PTEs) on the boot
disk. The PTEs describe the location and size of the boot image on the disk.
Figure 5-4. Boot disk and the boot logical volume
(Visual: the boot disk holds the boot record, the VGDA, the boot logical volume, and the rest of the root disk (hd2, hd4, hd9var, and so forth). The boot logical volume holds SOFTROS (aixmon_chrp), the compressed kernel, the compressed RAM file system, and the base ODM.)
Notes:
Boot record
The boot block is not used to decide how to load the boot image. Instead, the firmware uses the
partition table entries (PTEs) and the ELF header of the boot image to decide how to load the
image into memory.
The boot logical volume contains the following components:
- SOFTROS
- Compressed kernel
- Compressed RAM file system
- Base ODM
SOFTROS
The SOFTROS program, aixmon_chrp, processes the loaded boot image and uncompresses
the compressed kernel and compressed RAM file system. It then relocates the boot image to a
different region.
Kernel
The kernel initializes itself and then runs /etc/init in the RAM file system. The RAM file
system version of init is a specialized version (/usr/lib/boot/ssh on the boot disk root file
system) and is used in phases 1 and 2 of the AIX initialization process.
Note: The kernel that is loaded from the boot logical volume is never replaced during the boot
process; the same kernel is used in multiuser mode. If you need a new kernel, you must
re-create the boot logical volume with the new kernel.
Base ODM
The boot logical volume contains a reduced copy of the ODM. During the boot process, many
devices are configured before hd4 is available. For these devices, the corresponding ODM files
must be stored in the boot logical volume.
• Normal bootlist
# bootlist -m normal hdisk0 hdisk1
# bootlist -m normal -o
hdisk0 blv=hd5 pathid=0
hdisk1 blv=hd5 pathid=0
Notes:
Introduction
You can use the bootlist command or diag from the command line to change or display the
bootlists. You can also use the System Management Services (SMS) programs. SMS is
covered later in this unit.
bootlist command
The bootlist command is the easiest way to change the bootlist. The first example shows
how to change the bootlist for a normal boot. In this example, the system can be booted from
either hdisk0 or hdisk1. To query the bootlist, you can use the bootlist -o option.
The blv=hd5 part of a bootlist entry identifies which boot logical volume to use on the
listed disk.
The second example shows how to display the customizable service bootlist.
With the bootlist command, you can also specify the IP parameters to use when you specify a
network adapter. For example:
# bootlist -m service ent0 gateway=192.168.1.1 bserver=192.168.10.3 \
client=192.168.1.57
With the service bootlist set in this way, you can boot to maintenance or diagnostic mode from
a NIM server without having to use SMS to specify the network adapter as the boot device.
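Before relying on a bootlist, it can help to confirm that the intended device is really in it. The following sketch parses the output format of bootlist -o shown on the visual. The function name in_bootlist is hypothetical, and the sample text stands in for live output so that the script can be tried anywhere.

```sh
#!/bin/sh
# Read "bootlist -m <mode> -o" output on stdin and succeed if the
# given device name is the first field of any entry.
in_bootlist() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

# Sample output copied from the visual. On a live system use:
#   bootlist -m normal -o | in_bootlist hdisk1
sample='hdisk0 blv=hd5 pathid=0
hdisk1 blv=hd5 pathid=0'

if printf '%s\n' "$sample" | in_bootlist hdisk1; then
    echo "hdisk1 is in the bootlist"
else
    echo "hdisk1 is missing from the bootlist"
fi
```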
Types of bootlists
The normal bootlist is used during a normal boot.
The default bootlist (hardcoded in the firmware) is used when numeric 5 is pressed during the
boot sequence.
Most machines, in addition to the default bootlist and the customized normal bootlist, allow for a
customized service bootlist. The service bootlist is set by using mode service with the
bootlist command. The service bootlist is used when the numeric 6 key is pressed during
boot.
Here is a list that summarizes the boot modes and the manual keys that are associated with
them:
• Numeric 1: Start an SMS (System Management Services) mode boot.
• Numeric 5: Start a service mode boot that uses the default service bootlist.
The default service bootlist is:
cd0
hdisk0 blv=hd5
ent0
• Numeric 6: Start a service mode boot that uses the customized service bootlist.
The keys and bootlists can vary between system models. Refer to the documentation for your
specific model at: http://ibm.com/support/knowledgecenter. Look for your model under Power Systems.
• The pathid argument can be repeated for multiple paths in the required
order:
# bootlist -m normal hdisk0 blv=hd5 pathid=0 pathid=1
or
# bootlist -m normal hdisk0 blv=hd5 pathid=0,1
• The bootlist command now shows the pathid with the device:
# bootlist -m normal -o
hdisk0 blv=hd5 pathid=0
hdisk0 blv=hd5 pathid=1
Notes:
The pathid argument gives you the ability to operate at the path level. In the past, you had
to selectively delete and reconfigure device paths to generate bootlists on systems with MPIO
disks. The operation can now be done with a single command.
The pathid argument also helps when a bootlist becomes too long. When the bootlist specifies
disks without any pathid restriction, each path takes an entry in the bootlist, and the bootlist
has a limited capacity. Exceeding that capacity can leave no room for an entry for a different
disk. Restricting the entries with pathid avoids this type of problem.
It is important to remember that the bootlist command preserves the order in which you list
the paths. If you want the system to try paths 1, 0, and 2 in that order, use the pathid=1,0,2
argument.
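The ordering rule can be illustrated with a helper that joins path IDs in exactly the order given. The function name bootlist_paths is hypothetical, and the sketch assumes that the boot logical volume is hd5. It prints the command rather than running it.

```sh
#!/bin/sh
# Build a bootlist command that restricts one disk to an ordered list of paths.
# Usage: bootlist_paths <mode> <disk> <pathid> [<pathid> ...]
bootlist_paths() {
    mode=$1
    disk=$2
    shift 2
    ids=$(echo "$*" | tr ' ' ',')   # "1 0 2" becomes "1,0,2"; order is kept
    echo "bootlist -m $mode $disk blv=hd5 pathid=$ids"
}

bootlist_paths normal hdisk0 1 0 2
```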
Notes:
Booting to SMS
If you cannot boot AIX because the bootlist needs correcting, then you need to use the System
Management Services (SMS) to modify the bootlist. The SMS programs are integrated into the
hardware (they are in NVRAM).
The visual shows how to start the System Management Services. During system boot, shortly
before the firmware looks for a boot image, it discovers some basic hardware on the system.
Then, the LED usually displays a value of E1F1. As the devices are discovered, either a text
name or graphic icon for the resource displays on the screen. The second device that is
discovered is usually the keyboard. When the keyboard is discovered, a unique double beep
tone is usually sounded. After the keyboard is discovered, the system is ready to accept input
that overrides the default behavior of conducting a normal boot. But after the last icon or name
is displayed, the system starts to use the bootlist to find the boot image and it is too late to
change it. One of the keyboard actions that you can take during this brief period is to press the
numeric 1 key to request that the system boot to SMS.
With option 1, you select a specific device to boot from right now. With option 2, you can modify
the customized bootlists. Option 3 is a toggle that has the system stop at this Multiboot menu
every time it boots, or continue with the normal boot sequence.
The focus here is the second option, used to modify the customized bootlist. The Configure
Bootlist Device Order panel lists:
1. Select 1st Boot Device
2. Select 2nd Boot Device
3. Select 3rd Boot Device
4. Select 4th Boot Device
5. Select 5th Boot Device
6. Display Current Setting
7. Restore Default Setting
You can either list or modify the bootlist. To modify it, select the position in the bootlist that
you want to change. SMS then lists the possible device types, which you use to obtain a list of
devices to select from:
1. Diskette
2. Tape
3. CD/DVD
4. IDE
5. Hard Drive
6. Network
7. None
8. List All Devices
Select the device type. If there are not many bootable devices, it is sometimes easier to use the
List All Devices option.
Finally, you would select a specific device to place in that position of the bootlist, as illustrated
on the next visual.
It is important to understand that when SMS is used to modify the bootlist, both the normal
bootlist and the service bootlist are modified. If you want them to be different, you need to
customize them later when you have a command prompt (such as in multiuser mode).
Select Device
Device Current Device
Number Position Name
1. - IBM 10/100/1000 Base-TX PCI-X Adapter
( loc=U789D.001.DQDWAYT-P1-C5-T1 )
2. - SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
( loc=U789D.001.DQDWAYT-P3-D1 )
3. 1 SATA CD-ROM
( loc=U789D.001.DQDWAYT-P1-T3-L8-L0 )
4. None
===> 2
Select Task
SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
( loc=U789D.001.DQDWAYT-P3-D1 )
1. Information
2. Set Boot Sequence: Configure as 1st Boot Device
Notes:
Boot alternatives
The system boots from the first bootable device that it finds in the designated bootlist.
Whenever the effective boot device is bootable media, such as a mksysb tape/CD/DVD or
installation media, the system boots to the Installation and Maintenance menu.
If the booting device is a network adapter, the mode of boot depends on the configuration of the
NIM server that services the network boot request. If the NIM server is configured to support an
AIX installation or a mksysb recovery, then the system boots to the Installation and
Maintenance menu. If the NIM server is configured to serve out a maintenance image, then the
system boots to a Maintenance menu (a submenu of Installation and Maintenance). If the NIM
server is configured to serve out a diagnostic image, then the system boots to diagnostic mode.
There are other ways to boot to a diagnostic utility. If the booting device is a CD/DVD drive with
a diagnostic CD/DVD in it, the system boots into that diagnostic utility. If a service mode boot is
requested and the booting device is a hard disk with a boot logical volume, then the system
boots into the diagnostic utilities.
The system can be signaled which bootlist to use during the boot process. The default is to use
the normal bootlist and boot in normal mode. The choice can be changed during a window of
opportunity between when the system discovers the keyboard and when it commits to the
default boot mode. The signal can be generated from the system console (HMC virtual terminal)
or from a service processor attached workstation (such as an HMC).
The keyboard signal that is used can vary from firmware to firmware. The most common signals
are a numeric 5 to indicate that the firmware should use the default service bootlist and a
numeric 6 to indicate that the firmware should use the customizable service bootlist. Either of
these keyboard signals results in a service mode boot, which can cause a boot to diagnostic
mode when booting off a boot logical volume on your hard disk.
With an HMC, you can specify which signal to send as part of the LPAR activation. Even if you
forget to override the default boot mode (usually normal to multiuser), you can still use the
virtual console keyboard to override the action after the keyboard is discovered.
(Visual: to reach maintenance mode, boot the system from a BOS CD/DVD, a tape, or a network device (NIM). On the HMC, the advanced activation options can select the default bootlist.)
Notes:
Introduction
The visual shows an overview of how to access a system that will not boot normally. The
maintenance mode can be started from an AIX CD/DVD, an AIX bootable tape (like a mksysb),
or a network device that can access a NIM master. The devices that contain the boot media
must be listed in the bootlist that is used for the boot.
deallocate it from that other LPAR. Use a dynamic LPAR operation on the HMC to allocate
that slot.
- If using the default bootlist, the sequence is fixed and the CD/DVD drive is the first practical
device.
- If you are not using SMS for this boot and are using a tape drive or a network adapter as
your boot device, then you need to use one of the customizable bootlists. In this situation, it
is usually the service bootlist.
Verify your bootlist, but do not forget that some machines do not have a service bootlist.
Check that your boot device is part of the bootlist:
# bootlist -m service -o
- If you want to boot from your internal tape device, you need to change the bootlist because
the tape device by default is not part of the bootlist. For example:
# bootlist -m service rmt0 hdisk0
- Whichever bootlist you are using, insert the boot media (either tape or CD/DVD) into the
drive.
- Power on the system (or activate the LPAR). The system begins booting from the
installation media. After several minutes, c31 is displayed in the LED/LCD panel (or as the
reference code on the HMC display). c31 means that the software is prompting on the
console for input (normally to select the console device and then select the language). For
an LPAR, you need to have the virtual console started to interact with the prompts.
- Normally, you are prompted to select the console device and then select the language. After
making these selections, you see the Installation and Maintenance menu.
For partitioned systems with an HMC, you would normally use the HMC to access SMS and
then select the bootable device, which would bypass the use of a bootlist.
You can also use a NIM server to boot to maintenance. You would need to place your system’s
network adapter in your customized service bootlist before any other bootable devices. Or, use
SMS to specifically request boot over that adapter (the latter option is most common). Here is
an example of setting the service boot list:
# bootlist -m service ent0 gateway=192.168.1.1 \
bserver=192.168.10.3 client=192.168.1.57
You would also need to set up the NIM server to provide a boot image for doing a maintenance
boot. For example, at the NIM server:
# nim -o maint_boot -a spot=<spotname> <client machine object name>
Maintenance
Notes:
First steps
When booting in maintenance mode, you first must identify the system console to use, for
example, your virtual console (vty), graphic console (lft), or serial attached console (a tty that is
attached to the S1 port).
After selecting the console, the Installation and Maintenance menu is shown:
1 Start Install Now with Default Settings
2 Change/Show Installation Settings and Install
3 Start Maintenance Mode for System Recovery
4 Configure Network Disks (iSCSI)
5 Select Storage Adapters
In a network boot that uses NIM, the console goes straight to the maintenance menu.
From this point, access the rootvg to run any system recovery steps that might be necessary.
Type the number for a volume group to display the logical volume information
and press Enter.
Choice: 1
-----------------------------------------------------------------------------
Volume Group ID 00c35ba000004c00000001153ce1c4b0 includes the following
logical volumes:
Notes:
Access this volume group and start a shell before mounting file systems
When you choose this selection, the rootvg is activated, but the file systems that belong to the
rootvg are not mounted.
A typical scenario where this selection is chosen is when a corrupted file system needs repair by
the fsck command. Repairing a corrupted file system is only possible if the file system is not
mounted.
Another scenario might be a corrupted hd8 transaction log. Any changes that take place in the
superblock or i-nodes are stored in the log logical volume. When these changes are written to
disk, the corresponding transaction logs are removed from the log logical volume.
The logform command reinitializes a corrupted transaction log, which is possible only when
no file systems are mounted. After initializing the log device, you need to do a file system repair
for all file systems that use this transaction log. You must explicitly specify the file system type
(JFS or JFS2):
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd2
# fsck -y -V jfs2 /dev/hd3
# fsck -y -V jfs2 /dev/hd4
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# exit
Keep in mind that the US keyboard layout is used. You can enable command recall and editing
by using the commands set -o emacs or set -o vi.
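The repair sequence shown above can be generated with a loop, which makes it easier to adapt the list of logical volumes. This sketch assumes that every file system is JFS2 and uses /dev/hd8 as its log, as in the example; the function name repair_cmds is hypothetical. It prints the commands so that they can be reviewed before they are typed in the maintenance shell.

```sh
#!/bin/sh
# Print the log re-initialization followed by one fsck per rootvg
# logical volume that holds a file system.
repair_cmds() {
    echo "logform -V jfs2 /dev/hd8"
    for lv in hd1 hd2 hd3 hd4 hd9var hd10opt; do
        echo "fsck -y -V jfs2 /dev/$lv"
    done
}

repair_cmds
```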
Maintenance
Notes:
Maintenance mode
If the boot logical volume is corrupted (for example, bad blocks on a disk might cause a
corrupted BLV), the machine will not boot.
To fix this situation, you must boot your machine in maintenance mode from a CD/DVD or tape.
If NIM is set up for a machine, you can also boot the machine from a NIM master in
maintenance mode. NIM is actually a common way to do special boots in a logical partition
environment.
volume with the bosboot command. You need to specify the corresponding disk device, for
example hdisk0:
# bosboot -ad /dev/hdisk0
# sync
# sync
# reboot
The sync commands flush any file data in memory cache to disk. While you would normally use
a shutdown command, in maintenance mode it is appropriate to use the reboot command.
The bosboot command requires that the boot logical volume (hd5) exists and is valid. The boot
logical volume might be deleted by mistake or the LVCB of the boot logical volume might be
damaged. If you need to re-create the BLV from scratch, the following steps should be followed:
1. Boot your machine in maintenance mode from CD/DVD or tape (press numeric 5), or press
numeric 1 to access the System Management Services (SMS) and select the boot device.
2. Remove the old hd5 logical volume, if it exists.
# rmlv hd5
3. Clear the boot record at the beginning of the disk.
# chpv -c hdisk0
4. Create an hd5 logical volume. It must be one physical partition in size, reside in rootvg,
and use the outer edge as its intrapolicy. Specify boot as the logical volume type.
# mklv -y hd5 -t boot -a e rootvg 1
5. Run the bosboot command as described on the visual.
# bosboot -ad /dev/hdisk0
6. Check the current bootlist.
# bootlist -m normal -o
7. Write data immediately to disk.
# sync
# sync
8. Reboot the system.
# reboot
By using the internal command ipl_varyon -i, you can check the state of the boot record.
Checkpoint (1 of 2)
1. True or False: You must have AIX loaded on your system to use the
System Management Services programs.
2. Your AIX system is powered off. AIX is installed on hdisk1 but the
bootlist is set to boot from hdisk0. How can you fix the problem and
make the machine boot from hdisk1?
Checkpoint (2 of 2)
5. What command is used to build a new boot image and write it to the
boot logical volume?
7. True or False: During the AIX boot process, the AIX kernel is loaded
from the root file system.
Exercise: System initialization: Accessing a boot image
Unit summary
Unit 6. System initialization: rc.boot and inittab
References
Online AIX Version 7.1 Operating system and device management
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives
Notes:
There are many reasons for boot failures. The hardware might be damaged, or user errors
might prevent the operating system from completing the boot process.
A good knowledge of the AIX boot process is a prerequisite for all AIX system administrators.
Visual: Boot sequence. Restore RAM file system from boot image (/ with etc, dev, mnt, usr); rc.boot 2: activate rootvg; start "real" init process (from rootvg); rc.boot 3: configure remaining devices; /etc/inittab.
Notes:
Boot sequence
The visual shows the boot sequence after loading the AIX kernel from the boot image.
The AIX kernel gets control and executes the following steps:
1. The kernel restores a RAM file system into memory by using information that is provided
in the boot image. At this stage, the rootvg is not available, so the kernel needs to work
with commands provided in the RAM file system. You can consider this RAM file system
as a small AIX operating system.
2. The kernel starts the init process that was provided in the RAM file system (not from
the root file system). This init process runs a boot script rc.boot.
3. rc.boot controls the boot process. In the first phase (init calls it as rc.boot 1),
the base devices are configured. In the second phase (rc.boot 2), the
rootvg is activated (or varied on).
4. After activating the rootvg at the end of rc.boot 2, the kernel overmounts the RAM
file system with the file systems from rootvg. The init from the root file system (hd4)
replaces the init from the boot image.
5. This init processes the /etc/inittab file. From this file, rc.boot is called a third
time (rc.boot 3) and all remaining devices are configured.
rc.boot 1
Visual: rc.boot 1. Process 1 (init) runs rc.boot 1 while the rootvg is not active. restbase copies the ODM from the boot image into the RAM file system, and cfgmgr -f is run. LED values shown on the visual: F05, c06, 548, 510.
Notes:
configuration of base devices into the system, so that the rootvg can be activated in the
next rc.boot phase.
3. Base devices are all devices that are necessary to access the rootvg. If the rootvg is
stored on hdisk0, all devices from the system board to the disk itself must be
configured to be able to access the rootvg.
4. At the end of rc.boot 1, the system determines the last boot device (used to establish
the /dev/ipldevice link) by calling bootinfo -b. The LED shows 511 (DEV CFG 1
END), followed by 553 (PHASE 1 COMPLETE).
rc.boot 2 (1 of 2)
Visual: rc.boot 2 (1 of 2). Commands shown, run from the RAM file system: fsck -f /dev/hd9var, mount /var, copycore, umount /var. Failure LED values shown: 551, 517, 518.
Notes:
3. Afterward, /dev/hd4 is mounted directly onto the root (/) in the RAM file system. If the
mount fails, for example due to a corrupted JFS log, the LED 557 (ROOT MNT FAILED)
is shown and the boot process stops.
4. Next, /dev/hd2 is checked and mounted (again with option -f, it is checked only if the
file system wasn't unmounted cleanly). If the mount fails, LED 518 (/USR MOUNT
FAILED) is displayed and the boot stops.
5. Then, the /var file system is checked and mounted. This check is necessary at this
stage because the copycore command checks if a memory dump occurred. If a
memory dump exists in a paging space device, it is copied from the memory dump
device, /dev/hd6, to the copy directory that is by default the directory /var/adm/ras.
/var is unmounted afterward. If the /var mount fails, LED 518 (/VAR MOUNT FAILED)
is displayed and the boot stops.
6. The primary paging space /dev/hd6 is made available.
Note
This syntax works only during the boot process. If you boot from the CD/DVD into maintenance
mode and need to mount the root file system manually, you need to mount it over another directory,
such as /mnt. Otherwise, you are unable to access the RAMFS files.
rc.boot 2 (2 of 2)
Visual: rc.boot 2 (2 of 2). Steps shown: mount /var, copy boot messages to alog, kernel removes RAMFS. Directories shown under / in the RAM file system: dev, etc, mnt, usr, var (with the ODM).
Notes:
Final stage
At this stage, the AIX kernel removes the RAM file system (returns the memory to the free
memory pool) and starts the init process from the root (/) file system in rootvg.
rc.boot 3 (1 of 2)
Visual: rc.boot 3 (1 of 2). Process 1 (init) reads /etc/inittab and runs /sbin/rc.boot 3; from here on, you work with rootvg. Steps shown: fsck -f /dev/hd3, mount /tmp, savebase (ODM to hd5). LED values shown: 553, 517, 518.
Notes:
4. The configuration manager reads the ODM class Config_Rules and runs all methods for
either phase=2 (normal boot) or phase=3 (service boot). All remaining devices that are
not base devices are configured in this step.
5. cfgcon configures the console. The numbers c31, c32, c33, or c34 are displayed
depending on the type of console:
- c31: Console not yet configured. Provides instruction to select a console.
- c32: Console is a lft (graphic display) terminal.
- c33: Console is a tty.
- c34: Console is a file on the disk.
If CDE is specified in /etc/inittab, the CDE is started and you get a graphical boot
on the console.
6. To synchronize the ODM in the boot logical volume with the ODM from the root (/) file
system, savebase is called.
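The console values listed above, together with a few other LED values named in this unit, can be kept in a small lookup table. The following Python sketch is only an illustration: LED_CODES and describe_led are hypothetical names, and the table holds only values quoted in these notes, not a complete list.

```python
# Illustrative lookup of LED values quoted in these notes (not a complete list).
LED_CODES = {
    "511": "DEV CFG 1 END",
    "518": "/usr or /var mount failed",
    "557": "ROOT MNT FAILED",
    "c31": "Console not yet configured; select a console",
    "c32": "Console is an lft (graphic display) terminal",
    "c33": "Console is a tty",
    "c34": "Console is a file on the disk",
}


def describe_led(code):
    """Return the description for an LED value, or a fallback for unknown values."""
    return LED_CODES.get(code.lower(), "not covered in this sketch")


if __name__ == "__main__":
    for code in ("c33", "557", "999"):
        print(code, "->", describe_led(code))
```

A table like this is convenient when you note down LED values during a failing boot and want to map them back to the rc.boot step that displayed them.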
rc.boot 3 (2 of 2)
Visual: rc.boot 3 (2 of 2). Steps shown: savebase (ODM in /etc/objrepos to ODM in hd5), syncd 60, errdemon, turn off LEDs, rm /etc/nologin, check for chgstatus=3 in CuDv (message: A device that was previously detected could not be found. Run "diag -a".), and the final message: System initialization is completed.
Notes:
4. If devices exist that are flagged as missing in CuDv (chgstatus=3), a message is
displayed on the console. For example, this message is displayed if external devices are
not powered on during system boot.
5. The last message, System initialization completed, is written to the console.
rc.boot 3 is finished. The init process runs the next command in /etc/inittab.
rc.boot summary
Command    Executed from                 Phase (Config_Rules)     Primary actions
rc.boot 1  RAM file system (/dev/ram0)   1                        restbase; cfgmgr -f
rc.boot 2  RAM file system (/dev/ram0)                            ipl_varyon; mount /, /usr, /var file systems; mergedev; copy ODM files
rc.boot 3  rootvg                        2=normal or 3=service    mount /tmp; cfgmgr -p2 or cfgmgr -p3; savebase
Notes:
Summary
During rc.boot 1, all base devices are configured. This configuration is done by cfgmgr
-f, which runs all phase 1 methods from Config_Rules.
During rc.boot 2, the rootvg is varied on. All /dev files and the customized ODM files from
the RAM file system are merged to disk.
During rc.boot 3, cfgmgr -p configures all remaining devices. The configuration manager
reads the Config_Rules class and runs the corresponding methods. To synchronize the ODMs,
savebase is called, which writes the ODM from the disk back to the boot logical volume.
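The relationship between the rc.boot phase, the boot type, and the cfgmgr call can be sketched in a few lines of Python. This is a simplified model of the summary above, not the rc.boot script itself; the function name cfgmgr_call and the strings "normal" and "service" are assumptions made for the illustration.

```python
# Simplified model of which cfgmgr call each rc.boot phase makes.
# cfgmgr_call is an illustrative name, not part of AIX.
def cfgmgr_call(rc_boot_phase, boot_type="normal"):
    """Return the cfgmgr invocation for an rc.boot phase, or None if none is made."""
    if rc_boot_phase == 1:
        # Base devices: all phase 1 methods from Config_Rules.
        return "cfgmgr -f"
    if rc_boot_phase == 2:
        # rc.boot 2 varies on rootvg and mounts file systems; no cfgmgr call.
        return None
    if rc_boot_phase == 3:
        if boot_type == "normal":
            return "cfgmgr -p2"
        if boot_type == "service":
            return "cfgmgr -p3"
        raise ValueError("boot_type must be 'normal' or 'service'")
    raise ValueError("rc_boot_phase must be 1, 2, or 3")


if __name__ == "__main__":
    for phase in (1, 2, 3):
        print("rc.boot", phase, "->", cfgmgr_call(phase))
```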
For JFS2:
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd2
# fsck -y -V jfs2 /dev/hd3
# fsck -y -V jfs2 /dev/hd4
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# fsck -y -V jfs2 /dev/hd11admin
# exit
The logform command initializes a new JFS transaction log and can result in loss of data
because JFS transactions can be destroyed. Your machine will boot after the JFS log is
repaired.
JFS log corruption typically happens when the system crashes or is taken down in a hard
manner by the administrator.
The JFS log recovery that is described does not ensure that disk updates in process are
completed. Determining what was processed and what needs reprocessing is the responsibility
of the applications by using their transaction logs and any checkpoint processing that was
completed.
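Because every file system that uses the log must be checked after logform, it can help to generate the command list instead of typing it. The following Python sketch only builds the command strings shown above and does not run anything; the function name jfs_log_repair_commands is hypothetical, and the device list is the rootvg example from these notes.

```python
# Build (but do not run) the logform and fsck command lines for a journal log.
# jfs_log_repair_commands is an illustrative helper name.
def jfs_log_repair_commands(log_device, fs_devices, vfs="jfs2"):
    """Return the command strings: one logform, then one fsck per file system."""
    if vfs not in ("jfs", "jfs2"):
        raise ValueError("vfs must be 'jfs' or 'jfs2'")
    commands = ["logform -V {} {}".format(vfs, log_device)]
    commands += ["fsck -y -V {} {}".format(vfs, dev) for dev in fs_devices]
    return commands


# Example device list taken from the JFS2 commands in these notes.
ROOTVG_FILE_SYSTEMS = [
    "/dev/hd1", "/dev/hd2", "/dev/hd3", "/dev/hd4",
    "/dev/hd9var", "/dev/hd10opt", "/dev/hd11admin",
]

if __name__ == "__main__":
    for command in jfs_log_repair_commands("/dev/hd8", ROOTVG_FILE_SYSTEMS):
        print("#", command)
```

Printing the commands first gives you a chance to review them, which matters here because logform discards the existing log contents.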
Visual: rc.boot 1 activity with five numbered boxes, (1) to (5), to fill in.
Notes:
Instructions
Using the following questions, put the solutions into the visual.
1. What calls rc.boot 1? Is it:
• /etc/init from hd4
• /etc/init from the RAMFS in the boot image
2. Which command copies the ODM files from the boot image into the RAM file system?
3. Which command triggers the execution of all phase 1 methods in Config_Rules?
4. Which ODM files contain the devices that are configured in rc.boot 1?
• ODM files in hd4
• ODM files in RAM file system
5. How can you determine the last boot device?
Visual: rc.boot 2 activity with eight numbered boxes, (1) to (8), to fill in. LED 557 is marked on the visual.
Notes:
Instructions
Order the following eight expressions in the correct sequence.
- Turn on paging
- Merge RAM /dev files.
- Copy boot messages to alog
- Activate rootvg
- Mount /var; copy memory dump; unmount /var
- Mount /dev/hd4 onto / in RAMFS
- Copy RAM ODM files
- Finally, answer the following question. Put the answer in box 8:
Your system stops booting with an LED 557. Which command failed?
sy____ ___
/sbin/rc.boot 3 err_______
rm _________
s_______ ________&
Missing devices ?
_________=3
________ -p2
in ______ ?
________ -p3
Execute next line in
Start Console: ______ _____________
Start CDE: _________
Notes:
Instructions
Complete the missing information in the picture.
Your instructor reviews the activity with you.
Configuration manager
Visual: Configuration manager. cfgmgr uses Config_Rules together with the predefined object classes (PdDv, PdAt, PdCn) and the customized object classes (CuDv, CuAt, CuDep, CuDvDr, CuVPD). Methods shown: Define, Configure, Change, Unconfigure, Undefine. Device driver: load, unload.
Notes:
Automatic configuration
The configuration manager automatically detects many devices. For this configuration to occur,
device entries must exist in the predefined device object classes. The configuration manager
uses the methods from PdDv to manage the device state, for example, to bring a device into the
defined or available state.
Define method
When a device is defined through its define method, the information from the predefined
database for that type of device is used to create the information that describes the
device-specific instance. This device-specific information is then stored in the customized
database.
Configuration order
The configuration process requires that a device is defined or configured before a device
attached to it can be defined or configured. At system boot time, the configuration manager
configures the system in a hierarchical fashion. First the system board is configured,
then the buses, then the adapters that are attached, and finally the devices that are
connected to the adapters. The configuration manager then configures any pseudodevices
(volume groups, logical volumes, and so forth) that need to be configured.
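The parent-before-child rule can be illustrated with a short Python sketch that walks a device tree from the system board downward. The device names are sample values, the function name configuration_order is hypothetical, and the order among siblings is an assumption of this sketch; only the rule that a parent is handled before its children comes from the text.

```python
from collections import deque

# Sample device tree: each device maps to the child devices it reports.
DEVICE_TREE = {
    "sys0": ["bus0"],
    "bus0": ["bus1", "scsi0"],
    "bus1": ["fda0", "ppa0", "sa0", "sioka0", "kbd0"],
}


def configuration_order(children, root):
    """Return devices in an order where every parent precedes its children."""
    order = []
    queue = deque([root])
    while queue:
        device = queue.popleft()
        order.append(device)                      # "configure" this device
        queue.extend(children.get(device, []))    # then schedule its children
    return order


if __name__ == "__main__":
    print(" -> ".join(configuration_order(DEVICE_TREE, "sys0")))
```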
phase  seq  boot_mask  method                     Run by
1      10   0          /etc/methods/defsys        cfgmgr -f
1      12   0          /usr/lib/methods/deflvm    cfgmgr -f
2      10   0          /etc/methods/defsys        cfgmgr -p2 (normal boot)
2      12   0          /usr/lib/methods/deflvm    cfgmgr -p2 (normal boot)
2      19   0          /etc/methods/ptynode       cfgmgr -p2 (normal boot)
2      20   0          /etc/methods/startlft      cfgmgr -p2 (normal boot)
3      10   0          /etc/methods/defsys        cfgmgr -p3 (service boot)
3      12   0          /usr/lib/methods/deflvm    cfgmgr -p3 (service boot)
3      19   0          /etc/methods/ptynode       cfgmgr -p3 (service boot)
3      20   0          /etc/methods/startlft      cfgmgr -p3 (service boot)
3      25   0          /etc/methods/starttty      cfgmgr -p3 (service boot)
Notes:
Introduction
The Config_Rules ODM object class is used by cfgmgr during the boot process. The phase
attribute determines when the respective method is called.
Phase 1
All methods with phase=1 are run when cfgmgr -f is called. The first method that is started
is /etc/methods/defsys, which is responsible for the configuration of all base devices. The
second method /usr/lib/methods/deflvm loads the logical volume device driver (LVDD)
into the AIX kernel.
If you have devices that must be configured in rc.boot 1, that is, before the rootvg is
active, you need to place phase 1 configuration methods into Config_Rules. A bosboot is
required afterward.
Phase 2
All methods with phase=2 are run when cfgmgr -p2 is called. This action takes place in the
third rc.boot phase for a normal boot. The seq attribute controls the sequence of the
execution: The lower the value, the higher the priority.
Phase 3
All methods with phase=3 are run when cfgmgr -p3 is called. This action takes place in the
third rc.boot phase for a service boot.
Sequence number
Each configuration method has an associated sequence number. When running the methods
for a particular phase, cfgmgr sorts the methods based on the sequence number. The methods
are then started, one by one, starting with the smallest sequence number. Methods with a
sequence number of zero are started last after the methods with nonzero sequence numbers.
Boot mask
Each configuration method has an associated boot mask:
- If the boot_mask is zero, the rule applies to all types of boot.
- If the boot_mask is nonzero, the rule applies only to the boot type specified. For example,
a rule with boot_mask = DISK_BOOT is used for boots from disk, whereas a rule with
boot_mask = NETWORK_BOOT applies when booting through the network.
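The sequence-number and boot-mask rules described above can be sketched in Python. This is a model for illustration only, not the cfgmgr implementation: the numeric values of DISK_BOOT and NETWORK_BOOT are placeholders (the real values are not given in these notes), treating boot_mask as a bit mask is an assumption, and the two /hypothetical/ methods are invented to exercise the rules.

```python
# Placeholder bit values for illustration; not taken from AIX header files.
DISK_BOOT = 0x1
NETWORK_BOOT = 0x2

# (phase, seq, boot_mask, method); the /hypothetical/ rows are invented.
RULES = [
    (1, 10, 0, "/etc/methods/defsys"),
    (1, 12, 0, "/usr/lib/methods/deflvm"),
    (2, 20, 0, "/etc/methods/startlft"),
    (2, 10, 0, "/etc/methods/defsys"),
    (2, 19, 0, "/etc/methods/ptynode"),
    (2, 12, 0, "/usr/lib/methods/deflvm"),
    (2, 0, 0, "/hypothetical/run_last"),
    (2, 5, NETWORK_BOOT, "/hypothetical/network_only"),
]


def select_methods(rules, phase, boot_type):
    """Return the methods for a phase in the order they would be started."""
    applicable = [
        rule for rule in rules
        if rule[0] == phase and (rule[2] == 0 or rule[2] & boot_type)
    ]
    # Nonzero sequence numbers ascending; sequence number 0 sorts last.
    applicable.sort(key=lambda rule: (rule[1] == 0, rule[1]))
    return [rule[3] for rule in applicable]


if __name__ == "__main__":
    for method in select_methods(RULES, 2, DISK_BOOT):
        print(method)
```

The sort key is a pair: the first element pushes rules with sequence number 0 to the end, and the second orders the remaining rules from the smallest sequence number upward.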
# alog -t boot -o
-------------------------------------------------------
attempting to configure device 'sys0'
invoking /usr/lib/methods/cfgsys_rspc -l sys0
return code = 0
******* stdout *******
bus0
******* no stderr *****
-------------------------------------------------------
attempting to configure device 'bus0'
invoking /usr/lib/methods/cfgbus_pci bus0
return code = 0
******** stdout *******
bus1, scsi0
****** no stderr ******
-------------------------------------------------------
attempting to configure device 'bus1'
invoking /usr/lib/methods/cfgbus_isa bus1
return code = 0
******** stdout ******
fda0, ppa0, sa0, sioka0, kbd0
****** no stderr *****
Figure 6-15. cfgmgr output in the boot log using alog AN153.0
Notes:
If you have boot problems, it is always a good idea to check the boot alog file for potential boot
error messages. All output from cfgmgr is shown in the boot log with other information that is
produced in the rc.boot script.
The default boot log file size is 128 KB. If you want to increase the size of the boot log, for
example to 256 KB, run the following command:
# print "Resizing boot log" | alog -C -t boot -s 262144
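The size in the command above is given in bytes, so 256 KB becomes 256 x 1024 = 262144. A one-line Python helper (kb_to_bytes is a hypothetical name) makes the conversion explicit for other sizes:

```python
def kb_to_bytes(kilobytes):
    """Convert a size in KB (1 KB = 1024 bytes) to bytes."""
    return kilobytes * 1024


if __name__ == "__main__":
    print(kb_to_bytes(256))
```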
/etc/inittab file
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console #
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunab
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # ru
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
mkcifs_fs:2:wait:/etc/mkcifs_fs > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
cons:0123456789:respawn:/usr/sbin/getty /dev/console
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability
Notes:
Purpose of /etc/inittab
The /etc/inittab file supplies information for the init process. Note how the rc.boot script
is run out of the inittab file to configure all remaining devices in the boot process.
Modifying /etc/inittab
Do not use an editor to change the /etc/inittab file. A single small mistake in /etc/inittab
can prevent your machine from booting. Instead, use the commands mkitab, chitab, and rmitab
to modify /etc/inittab. The advantage of these commands is that they help keep the
/etc/inittab file free of corruption. If your machine stops booting with LED 553, this code
indicates a bad /etc/inittab file in most cases.
Viewing /etc/inittab
The lsitab command can be used to view the /etc/inittab file. For example:
# lsitab dt
dt:2:wait:/etc/rc.dt
If you issue lsitab -a, the complete /etc/inittab file is shown.
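Each /etc/inittab record has four colon-separated fields: identifier, run levels, action, and command. The following Python sketch splits a record into these fields; it is a reading aid under that assumption, not a replacement for lsitab, and the names InittabEntry and parse_inittab_line are hypothetical.

```python
from typing import NamedTuple


class InittabEntry(NamedTuple):
    identifier: str
    runlevels: str
    action: str
    command: str


def parse_inittab_line(line):
    """Split one inittab record into its four colon-separated fields."""
    # maxsplit=3 keeps any further colons inside the command field.
    fields = line.rstrip("\n").split(":", 3)
    if len(fields) != 4:
        raise ValueError("expected identifier:runlevels:action:command")
    return InittabEntry(*fields)


if __name__ == "__main__":
    entry = parse_inittab_line("dt:2:wait:/etc/rc.dt")
    print(entry.identifier, entry.runlevels, entry.action, entry.command)
```

A record that does not split into four fields raises an error here, which mirrors the kind of mistake that the mkitab, chitab, and rmitab commands are meant to prevent.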
LED values       Possible cause                      Action
551, 555, 557    File system or log corrupted        Rebuild journal log and fsck the file systems.
                 rootvg locked (only if 551)         Unlock rootvg (chvg -u rootvg).
552, 554, 556    File system superblock corrupted    Rebuild journal log and fsck the file systems,
                 Reduced ODM corrupted               or recover superblock from secondary.
                                                     If that fails, recover from mksysb.
523 - 534        ODM files missing                   ODM files are missing or inaccessible.
                                                     Restore missing files from a system backup.
Notes:
Introduction
The visual shows some common boot errors that might happen during the AIX software boot
process.
Bootlist wrong?
If the bootlist is wrong, the system cannot boot. This problem is easy to fix. Boot in SMS and
select the correct boot device. Keep in mind that only hard disks with boot records are shown as
selectable boot devices.
Superblock corrupted?
Another thing that you can try is to check the superblocks of your rootvg file systems. If you boot
in maintenance mode and you get error messages like Not an AIX file system or Not a
recognized file system type, the cause is probably a corrupted superblock in the file system.
Each file system has two superblocks. Running fsck should automatically recover the primary
superblock by copying from the backup superblock. The following steps are provided in case
you need to do this recovery manually.
For JFS, the primary superblock is in logical block 1 and a copy is in logical block 31. To
manually copy the superblock from block 31 to block 1 for the root file system (in this example),
run the following command:
# dd count=1 bs=4k skip=31 seek=1 if=/dev/hd4 of=/dev/hd4
For JFS2, the locations are different. To manually recover the primary superblock from the
backup superblock for the root file system (in this example), run the following command:
# dd count=1 bs=4k skip=15 seek=8 if=/dev/hd4 of=/dev/hd4
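In both dd commands, bs=4k sets a 4096-byte block, skip selects the source block, and seek selects the destination block. The Python sketch below models that copy on an in-memory buffer so that the offsets can be checked safely; it does not touch any device, and the names block_offset and copy_block are hypothetical.

```python
BLOCK_SIZE = 4096  # bs=4k


def block_offset(block_number, block_size=BLOCK_SIZE):
    """Byte offset at which a given block starts."""
    return block_number * block_size


def copy_block(image, src_block, dst_block, block_size=BLOCK_SIZE):
    """Copy one block inside a bytearray, like dd count=1 skip=src seek=dst."""
    src = block_offset(src_block, block_size)
    dst = block_offset(dst_block, block_size)
    if max(src, dst) + block_size > len(image):
        raise ValueError("block lies outside the image")
    image[dst:dst + block_size] = image[src:src + block_size]


if __name__ == "__main__":
    # JFS example from the text: backup in block 31, primary in block 1.
    print("JFS:  read at byte", block_offset(31), "write at byte", block_offset(1))
    # JFS2 example from the text: skip=15, seek=8.
    print("JFS2: read at byte", block_offset(15), "write at byte", block_offset(8))
```

Working through the offsets on a scratch buffer first is a cheap way to confirm the skip and seek values before running dd against a real logical volume.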
rootvg locked?
Many LVM commands place a lock into the ODM to prevent other commands from working at
the same time. If a lock remains in the ODM due to a crash of a command, it might lead to a
hanging system.
To unlock the rootvg, boot in maintenance mode and access the rootvg with file systems. Run
the following command to unlock the rootvg:
# chvg -u rootvg
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3
rc:2:wait:/etc/rc
fbcheck:2:wait:/usr/sbin/fbcheck
srcmstr:2:respawn:/usr/sbin/srcmstr
cron:2:respawn:/usr/sbin/cron
rctcpip:2:wait:/etc/rc.tcpip
rcnfs:2:wait:/etc/rc.nfs
qdaemon:2:wait:/usr/bin/startsrc -sqdaemon
dt:2:wait:/etc/rc.dt
tty0:2:off:/usr/sbin/getty /dev/tty1
myid:2:once:/usr/local/bin/errlog.check
Notes:
Instructions
Answer the following questions as they relate to the /etc/inittab file shown in the visual:
1. Which process does the init process start only one time?
The init process does not wait for the initialization of this process.
4. Which line determines that multiuser mode is the initial run level of the system?
11. Which line takes care of varying on the volume groups, activating paging spaces, and
mounting file systems that are activated during boot?
Checkpoint (1 of 2)
2. Your system stops booting with LED 557. In which rc.boot phase
does the system stop?
4. Which ODM file is used by the cfgmgr during boot to configure the
devices in the correct sequence?
Checkpoint (2 of 2)
5. What is the likely cause if your system stops booting with LED 553?
Unit summary
Notes:
Highlights
- After the boot image is loaded into RAM, the rc.boot script is run three times to configure
the system.
- During rc.boot 1, the devices needed to vary on the rootvg are configured.
- During rc.boot 2, the rootvg is varied on.
- In rc.boot 3, the remaining devices are configured.
- The init process initiates processes that are defined in the /etc/inittab file.
Unit 7. LVM metadata and related problems
References
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 Operating system and device management
SG24-5422-00 AIX Logical Volume Manager from A to Z: Introduction and
Concepts (Redbooks)
SG24-5433-00 AIX Logical Volume Manager from A to Z: Troubleshooting
and Commands (Redbooks)
GG24-4484-00 AIX Storage Management (Redbooks)
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives
Visual: LVM terms shown: physical partitions, logical partitions, physical volumes, logical volume, volume group.
Notes:
Introduction
This visual and the associated student notes provide a review of basic LVM terms.
For scalable volume groups, the maximum number of physical partitions is no longer defined on
a per disk basis but applies to the entire volume group. The scalable volume group can hold up
to 2097152 (2048 K) physical partitions.
LVM identifiers
# uname -m
00C35BA04C00
Notes:
Use of identifiers
The LVM uses identifiers for disks, volume groups, and logical volumes. As volume groups can
be exported and imported between systems, these identifiers must be unique worldwide.
AIX generated identifiers are based on the CPU ID of the creating host and a time stamp.
Notes:
the mklv command to request that the LVCB is not stored at the beginning of the logical
volume, but instead as part of the VGDA.
LVCB-related considerations
For normal volume groups, the LVCB is in the first block of the user data within the logical
volume. Big volume groups keep more LVCB information in the VGDA. The LVCB structure on
the first logical volume user block and the LVCB structure within the VGDA are similar but not
identical. If a big volume group was created with the -T O option of the mkvg command, no
LVCB occupies the first block of the logical volume. With scalable volume groups, logical
volume control information is no longer stored on the first user block of any logical volume.
Therefore, no precautions need to be taken when using raw logical volumes because there is
no longer a need to preserve the information that is held by the first 512 bytes of the logical
device.
• AIX files
– /etc/vg/vgVGID Handle to the VGDA copy in memory
– /dev/hdiskX Special file for a disk
– /dev/VGname Special file for administrative access to a
volume group
– /dev/LVname Special file for a logical volume
– /etc/filesystems Used by the mount command to associate
logical volume name, file system log, and
mount point
Notes:
Overview
The LVM metadata that is maintained in the ODM database has a large overlap with the
information maintained in the VGDA and LVCB control blocks on disk. Yet, there is information
in the control blocks (such as the mapping of logical partitions) that is not kept in the ODM.
There is also information (such as device drivers and logical names) that is not kept in the
control blocks. Each metadata location plays a special role. There are mechanisms to ensure
that the information does not conflict.
Visual: To export a volume group. System moon with disk hdisk9.
Notes:
The scenario
The exportvg and importvg commands can be used to fix ODM problems. These
commands also provide a way to transfer data between different AIX systems. This visual
provides an example of how to export a volume group.
The disk, hdisk9, is connected to the system moon. This disk belongs to the myvg volume
group. This volume group needs to be transferred to another system.
2. When all logical volumes are closed, use the varyoffvg command to vary off the
volume group.
3. Finally, export the volume group with the exportvg command. After this point, the
complete volume group (including all file systems and logical volumes) is removed from
the ODM.
After exporting the volume group, the disks in the volume group can be transferred to
another system.
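The export steps can be sketched as a dry-run helper that only prints the commands in order, so the sequence can be reviewed before anything is run. The function name print_export_steps is hypothetical. Closing the logical volumes by unmounting their file systems is an assumption, because the first step of the visual is not reproduced in these notes.

```shell
# Dry-run sketch: print, but do not run, the commands that export a volume group.
# print_export_steps is a hypothetical helper name, not an AIX command.
# usage: print_export_steps VGname [mount_point ...]
print_export_steps() {
    vg=$1
    shift
    # Assumption: the logical volumes are closed by unmounting their file systems.
    for mp in "$@"; do
        echo "umount $mp"
    done
    # Steps 2 and 3 from the notes: vary off, then export.
    echo "varyoffvg $vg"
    echo "exportvg $vg"
}
```

For example, print_export_steps myvg /home/michael lists one umount line followed by the varyoffvg and exportvg lines for myvg.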
[Visual: system mars; labels: hdisk3, myvg, hdisk2, datavg]
importvg can also accept the PVID in place of the hdisk name
# umount /home/michael
# mount -o log=/dev/loglv01 /dev/lv24 /home/michael
/home/michael_moon:
        dev = /dev/lv24
        vfs = jfs
        log = /dev/loglv01
        mount = false
        options = rw
        account = false
hdisk3 (myvg):
        /dev/lv23: /home/peter
        /dev/lv24: /home/michael
        /dev/loglv01: log device
# mount /home/michael
# mount /home/michael_moon
Notes:
- account specifies whether the accounting system processes the file system. A value of
false indicates no accounting.
Before mounting the file system /home/michael_moon, the corresponding mount point must
be created.
Notes:
Introduction
The table in the visual shows the contents of the VGDA. The individual items that are listed are
discussed in the paragraphs that follow.
Time stamps
The time stamps are used to check whether a VGDA is valid. If the system crashes while
changing the VGDA, the time stamps differ. The VGDA is marked invalid the next time the
volume group is varied on. The most current intact VGDA is used to overwrite the other VGDAs
in the volume group.
VGDA example
IBM Power Systems
# lqueryvg -p hdisk1 -At
Max LVs: 256
PP Size: 20 1: ____________
5: ____________
Logical:
00c35ba000004c00000001157fcf6bdf.1 lv00 1
00c35ba000004c00000001157fcf6bdf.2 lv01 1
00c35ba000004c00000001157fcf6bdf.3 lv02 1
Physical: 00c35ba07fcf6b93 2 0
6: ____________ 7: ____________
PV RESTRICTION 0
Infinite Retry: 2
Varyon State: 0
Disk Block Size 512
Notes:
The logical volume control block (LVCB) and the getlvcb command
The LVCB stores attributes of a logical volume. The getlvcb command queries an LVCB.
In the example, the logical volume hd2 has the following characteristics:
- intrapolicy, which specifies what strategy should be used for choosing physical
partitions on a physical volume. The five general strategies are edge (sometimes called
outer-edge), inner-edge, middle (sometimes called outer-middle), inner-middle, and center
(c = Center).
- copies (1 = No mirroring)
- interpolicy, which specifies the number of physical volumes to extend across (m =
Minimum).
- lvid
- lvname - Logical volume name (hd2)
- number lps - Number of logical partitions (102)
- Can the partitions be reorganized? (relocatable = y)
- Each mirror copy on a separate disk (strict = y)
- Number of disks that are involved in striping (stripe width)
- Stripe size (stripe size in exponent)
- Logical volume type (type = jfs)
- JFS file system information (fs=)
- Creation and last update time (time created, time modified)
Figure 7-16. How LVM interacts with the ODM and the VGDA
[Visual labels: high-level commands (mkvg, extendvg, mklv, crfs, chfs, rmlv, reducevg, ...); match IDs by name (ODM); change, using low-level commands (VGDA and LVCB); update (importvg, exportvg); /etc/filesystems]
Notes:
High-level commands
Most of the LVM commands that are used when working with volume groups, physical volumes,
or logical volumes are high-level commands. These high-level commands (like mkvg,
extendvg, mklv, and others that are listed on the visual) are implemented as executable code
or shell scripts and use names to reference a certain LVM object. The ODM is consulted to
match a name, for example, rootvg or hdisk0, to an identifier.
CuDv:
name = "hdisk0"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = ""
parent = "vscsi0"
connwhere = "810000000000"
PdDvLn = "disk/vscsi/vdisk"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 0
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
PdDvLn = "disk/scsi/scsd"
Notes:
Key attributes
Remember the most important attributes:
- status = 1 means that the disk is available
- chgstatus = 2 means that the status did not change since last reboot
- location specifies the location code of the device
- parent specifies the parent device
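As a memory aid, the numeric status and chgstatus values can be decoded with a small helper. This is a sketch: the function names are hypothetical, and the values other than status = 1 and chgstatus = 2 are taken from the general AIX ODM documentation rather than from this unit, so verify them against your AIX level.

```shell
# Hypothetical helpers that translate CuDv numeric fields into words.
# usage: cudv_status_name N
cudv_status_name() {
    case $1 in
        0) echo "defined" ;;
        1) echo "available" ;;
        2) echo "stopped" ;;
        *) echo "unknown" ;;
    esac
}

# usage: cudv_chgstatus_name N
cudv_chgstatus_name() {
    case $1 in
        0) echo "new" ;;
        1) echo "don't care" ;;
        2) echo "same" ;;       # status did not change since the last reboot
        3) echo "missing" ;;
        *) echo "unknown" ;;
    esac
}
```

The input values can be read from the ODM with a query such as odmget -q "name=hdisk0" CuDv.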
CuAt:
name = "hdisk1"
attribute = "unique_id"
value = "3321360050768019102C0F000000000006E2A04214503IBMfcp"
type = "R"
generic = "D"
rep = "nl"
nls_index = 79
CuAt:
name = "hdisk1"
attribute = "pvid"
value = "00f606036452e56a0000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 2
CuDv:
name = "hdisk1"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = "31-T1-01"
parent = "fscsi0"
connwhere = "W_0"
PdDvLn = "disk/fcp/mpioosdisk"
# lscfg -l hdisk1
hdisk1 U8233.E8B.100603P-V16-C31-T1-W500507680140581E-L1000000000000
MPIO IBM 2145 FC Disk
Notes:
For Fibre Channel accessed LUNs, the location field identifies the parent FC adapter; the
connwhere field has a placeholder value of W_0, which indicates that the disk identity is
stored in the ww_name attribute of the disk.
The physical location code consists of the location code of the parent adapter, followed by the
ww_name and the LUN ID (obtained from the lun_id attribute of the disk).
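The composition of the physical location code can be sketched as follows. The helper name fc_physloc is hypothetical, and the sketch assumes that the attribute values arrive with a leading 0x and in lowercase, which is then normalized to the form that lscfg displays in the example.

```shell
# Hypothetical helper: build a disk's physical location code from the parent
# adapter's location code, the disk's ww_name, and the disk's lun_id.
# usage: fc_physloc ADAPTER_LOCATION WW_NAME LUN_ID
fc_physloc() {
    adapter=$1
    wwn=${2#0x}     # drop a leading 0x, if present
    lun=${3#0x}
    # Show the hexadecimal digits in uppercase, as in the lscfg example.
    wwn=$(printf '%s' "$wwn" | tr 'a-f' 'A-F')
    lun=$(printf '%s' "$lun" | tr 'a-f' 'A-F')
    printf '%s-W%s-L%s\n' "$adapter" "$wwn" "$lun"
}
```

With the values from the example, fc_physloc U8233.E8B.100603P-V16-C31-T1 0x500507680140581e 0x1000000000000 reproduces the location code that lscfg -l hdisk1 reports.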
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"
# ls -l /dev/hdisk[03]
brw------- 1 root system 17, 0 Jul 16 15:13 /dev/hdisk0
brw------- 1 root system 36, 0 Jul 16 04:21 /dev/hdisk3
Notes:
Special files
Applications or system programs use the special files to access a certain device. For example,
the visual shows special files that are used to access hdisk0 (/dev/hdisk0) and hdisk3
(/dev/hdisk3).
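In the sample output, value1 and value2 of the CuDvDr object match the major and minor numbers of the special file, and value3 matches the device name. A minimal sketch for extracting the numbers from ls -l output, so that they can be compared with the ODM object, is shown here. The helper name devno_from_ls is hypothetical, and the sketch assumes the column layout of the ls -l lines shown on the visual.

```shell
# Hypothetical helper: read "ls -l" lines for device special files on standard
# input and print "name major minor" for each block or character device.
devno_from_ls() {
    awk '$1 ~ /^[bc]/ {
        major = $5
        sub(/,$/, "", major)    # "36," becomes "36"
        print $NF, major, $6
    }'
}
```

A typical use would be ls -l /dev/hdisk3 | devno_from_ls, with the result compared against the value1 and value2 fields of the matching CuDvDr object.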
Notes:
VGID
One of the most important pieces of information about a volume group is the VGID. As shown
on the visual, this information is stored in CuAt.
CuAt:
name = "rootvg"
attribute = "timestamp"
value = "470a1bc9243ed693"
type = "R"
generic = "DU"
rep = "s"
nls_index = 0
CuAt:
name = "rootvg"
attribute = "pv"
value = "00c35ba07b2e24f00000000000000000"
type = "R"
generic = ""
rep = "sl"
nls_index = 0
Notes:
Length of PVID
Remember that the PVID is a field of 32 hexadecimal digits, where the last 16 digits are set to zeros.
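When comparing a 16-digit PVID with the 32-digit value stored in the ODM, the padding can be added or removed with a small sketch like the following. The function names are hypothetical.

```shell
# Hypothetical helpers for converting between the 16-digit PVID and the
# 32-digit form that is stored in the ODM (16 trailing zeros).
# usage: pvid_to_odm SHORT_PVID
pvid_to_odm() {
    printf '%s0000000000000000\n' "$1"
}

# usage: pvid_from_odm ODM_PVID
pvid_from_odm() {
    # Remove exactly one trailing run of 16 zeros, if present.
    printf '%s\n' "${1%0000000000000000}"
}
```

The padded form is the one to use in an ODM query against the pvid or pv attributes in CuAt.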
# ls -l /dev/hd2
brw-rw---- 1 root system 10, 5 08 Jul 16 23:21 /dev/hd2
Notes:
CuDvDr logical volume objects
Each logical volume has an object in CuDvDr that is used to create the special file entry for that
logical volume in /dev. As an example, the sample output on the visual shows the CuDvDr
object for hd2 and the corresponding /dev/hd2 (major number 10, minor number 5) special file
entry in the /dev directory.
Notes:
Causes of problems
The signal handlers that are used by high-level LVM commands do not work with a kill -9, a
system shutdown, or a system crash. You might end up in a situation where the VGDA is updated, but
the change was not stored in the ODM.
Problems might also occur because of the improper use of low-level commands or hardware
changes.
Another common problem is ODM corruption caused by LVM operations that run while the root file
system (which contains /etc/objrepos) is full. Always check the root file system free space
before attempting LVM recovery operations.
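The free space check can be sketched as follows. The function names and the idea of a minimum threshold are hypothetical, and the sketch assumes that df accepts the -P (portable format) and -k (KB units) flags together, with the available space in the fourth column.

```shell
# Hypothetical helper: read "df -Pk" output on standard input and print the
# Available column (in KB) of the last data line.
avail_kb_from_df() {
    awk 'NR > 1 { avail = $4 } END { print avail }'
}

# Hypothetical helper: succeed when the root file system has at least MIN_KB free.
# usage: root_has_space MIN_KB
root_has_space() {
    free=$(df -Pk / | avail_kb_from_df)
    [ "$free" -ge "$1" ]
}
```

A recovery script could call root_has_space with a site-specific minimum before it runs any command that updates the ODM.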
# varyoffvg homevg
Notes:
You need to specify only one intact physical volume of the volume group that you import.
The importvg command reads the VGDA and LVCB on that disk and creates new ODM
objects.
This procedure does not allow the data to be used while repairing the corruption, even if the file
systems are mounted and are accessible despite the problem. The logical volumes must be
closed to vary the volume group offline.
If the ODM problem is in the rootvg, try using the rvgrecover procedure:
PV=hdisk0
VG=rootvg
cp /etc/objrepos/CuAt /etc/objrepos/CuAt.$$
cp /etc/objrepos/CuDep /etc/objrepos/CuDep.$$
cp /etc/objrepos/CuDv /etc/objrepos/CuDv.$$
cp /etc/objrepos/CuDvDr /etc/objrepos/CuDvDr.$$
Notes:
Problems in rootvg
For ODM problems in rootvg, finding a solution is more difficult because rootvg cannot be varied
off or exported. However, it might be possible to fix the problem by using one of the techniques
that are described next.
• synclvodm <vgname>
– Synchronizes the VGDA, LVCB, ODM, and special device files
– Volume group must be active
– First run the redefinevg command if ODM does not have the
minimum required information about the volume group
Notes:
Overview
There are situations where you are unable to run the exportvg or importvg commands
because they depend on finding a minimal level of information in the ODM. Even if these
high-level LVM commands can be run, they require that the volume group is offline, which would
be disruptive. In these situations, it is useful to know some intermediate level LVM commands.
These commands are primarily intended to be used by high-level LVM commands, but they
can be useful in solving tough problems.
resynchronization to occur. If logical volume names are specified, only the information that is
related to those logical volumes is updated.
The synclvodm command, by itself, can do a fairly complete job of resynchronizing the ODM
with the LVM data areas on the disk. It will also synchronize the information between the LVM
data areas. As such, it can worsen a situation where only one disk in the volume group contains
corrupted data areas. The command can be restricted to synchronizing only specific logical
volumes. Otherwise, it synchronizes all logical volumes. The synclvodm command depends
upon a minimal amount of information in the ODM; most importantly, the ODM needs to know
the volume group name plus the physical volume and logical volume memberships.
Unit summary
Notes:
The LVM information is held in a number of different places on the disk, including the ODM and
the VGDA.
ODM-related problems might be solved by:
• exportvg and importvg (non-rootvg volume groups)
• rvgrecover (rootvg)
• LVM intermediate commands
• Manually fixing by using ODM commands.
Unit 8. Disk management procedures
Mirroring
Notes:
Role of VGSA
The information about the mirrored partitions is stored in the VGSA, which is contained on each
disk. In the example that is shown on the visual, logical partition 5 points to physical partition 5
on hdisk0, physical partition 8 on hdisk1, and physical partition 9 on hdisk2.
Stale partitions
[Visual: a mirrored logical volume on hdisk0 and hdisk1]
Mirroring rootvg
[Visual: hd1 on hdisk0 and on hdisk1]
Notes:
Check that the disk is bootable, since it holds a boot logical volume.
# bootinfo -B hdisk1
Any returned value other than 1 indicates that the disk is not bootable.
2. If the disk is not part of the rootvg, add the new disk to the volume group (for example,
hdisk1):
# extendvg [ -f ] rootvg hdisk1
3. Use the mirrorvg command to mirror all of the logical volumes in the rootvg to the new
disk. The mirrorvg command, by default, disables quorum and mirrors the existing
logical volumes in the specified volume group. Changes to the volume group quorum
attribute are effective immediately without having to vary off and then vary on the volume
group. By default, it also synchronizes the copies, although you can suppress
synchronization by using the -s flag. You should use the exact mapping option (-m) to
ensure that the mirror copy of the boot logical volume (hd5) is allocated contiguous
physical partitions. To mirror rootvg, use the command:
# mirrorvg -m rootvg hdisk1
Restrictions:
- You cannot use the mirrorvg command on a snapshot volume group
- You cannot use the mirrorvg command on a volume group that has an active
firmware assisted dump logical volume
- You cannot use the mirrorvg command if ALL of the following conditions exist:
• The target system is a logical partition (LPAR).
• A copy of the boot logical volume (by default, hd5) is on the failed physical
volume.
• The replacement physical volume's adapter was dynamically configured into the
LPAR since the last cold start.
An alternative to running mirrorvg is to separately run the component tasks:
- If you use one mirror disk, be sure that a quorum is not required for vary on:
# chvg -Qn rootvg
- Add the mirrors for all rootvg logical volumes:
# mklvcopy hd1 2 hdisk1
# mklvcopy hd2 2 hdisk1
# mklvcopy hd3 2 hdisk1
# mklvcopy hd4 2 hdisk1
# mklvcopy hd5 2 hdisk1
# mklvcopy hd6 2 hdisk1
# mklvcopy hd8 2 hdisk1
# mklvcopy hd9var 2 hdisk1
# mklvcopy hd10opt 2 hdisk1
# mklvcopy hd11admin 2 hdisk1
- If you have other logical volumes in your rootvg, be sure to create copies for them as
well.
- Now, synchronize the new copies that you created:
# syncvg -v rootvg
4. To be able to boot from the different disks, run bosboot:
# bosboot -a
As hd5 is mirrored, there is no need to do it for each disk.
5. Update the bootlist. In case of a disk failure, you must be able to boot from different disks.
# bootlist -m normal hdisk1 hdisk0
# bootlist -m service hdisk1 hdisk0
6. Check that the system boots from the first boot disk.
# bootinfo -b
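The component tasks that are listed as an alternative to mirrorvg can be sketched as a dry-run helper that prints one mklvcopy command per logical volume. The function name print_rootvg_mirror_steps is hypothetical, and the list of logical volumes is supplied by the caller, because a rootvg can contain logical volumes beyond the default set.

```shell
# Dry-run sketch: print, but do not run, the commands that mirror rootvg
# logical volumes one by one. Hypothetical helper, not an AIX command.
# usage: print_rootvg_mirror_steps TARGET_DISK LVname [LVname ...]
print_rootvg_mirror_steps() {
    disk=$1
    shift
    # Quorum must not be required for vary on when one mirror disk is used.
    echo "chvg -Qn rootvg"
    # One second copy per logical volume on the target disk.
    for lv in "$@"; do
        echo "mklvcopy $lv 2 $disk"
    done
    # Synchronize the new copies.
    echo "syncvg -v rootvg"
}
```

The printed commands still need the bosboot and bootlist steps from the procedure before the new disk can be used for booting.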
[Visual: volume group datavg with hdisk1 and hdisk2]
Notes:
Introduction
What happens if quorum checking is enabled for a volume group and a quorum is not available?
Consider the following example (illustrated on the visual and discussed in the following
paragraphs): In a two-disk volume group datavg, the disk hdisk1 is not available due to a
hardware defect. hdisk1 is the disk that contains the two VGDAs; that means the volume group
does not have a quorum of VGDAs.
# varyonvg -f datavg
Failure accessing hdisk1.
Set PV STATE to removed.
Volume group datavg is varied on.
Quorum checking on
With quorum checking on, you always need > 50% of the VGDAs available (except to vary on
rootvg).
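The quorum rule is a strict majority, which a short sketch makes explicit. The function name has_quorum is hypothetical, and the example assumes the usual two-disk layout with three VGDAs in total (two on one disk, one on the other).

```shell
# Hypothetical helper: succeed when more than 50% of the VGDAs are available.
# usage: has_quorum TOTAL_VGDAS AVAILABLE_VGDAS
has_quorum() {
    # available / total > 1/2 is the same as 2 * available > total.
    [ $(( $2 * 2 )) -gt "$1" ]
}
```

With three VGDAs, losing the disk that holds two of them leaves one of three available, so has_quorum 3 1 fails, while has_quorum 3 2 succeeds. Exactly half is not enough: has_quorum 2 1 also fails.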
[Visual: physical volume states (missing, removed) and the transitions between them: varyonvg -f VGName; hardware repair; hardware repair followed by varyonvg VGName; chpv -v a hdiskX]
Notes:
Introduction
This page introduces physical volume states (not device states). Physical volume states can be
displayed with lsvg -p VGName.
Active state
If a disk can be accessed when a volume group is varied on with the command, varyonvg, it
gets a physical volume state of active.
Missing state
If a disk cannot be accessed during a varyonvg, but quorum is available, the failing disk gets a
physical volume state missing. If the disk can be repaired, for example, after a power failure,
you must run a varyonvg VGName to bring the disk into the active state again. Any stale
partitions are synchronized.
Removed state
If a disk cannot be accessed during a varyonvg and the quorum of disks is not available, you
can run the command, varyonvg -f VGName, and force the volume group online.
The failing disk gets a physical volume state of removed, and it is not used for quorum checks
any longer.
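The three physical volume states can be summarized in a small decision sketch. The function name is hypothetical, and the last branch (a normal varyonvg fails when quorum is lost and the vary on is not forced) is an assumption that is implied by, but not spelled out in, the text above.

```shell
# Hypothetical helper: print the physical volume state that a disk gets
# during a vary on. Each argument is "yes" or "no".
# usage: pv_state_after_varyon DISK_ACCESSIBLE QUORUM_AVAILABLE FORCED
pv_state_after_varyon() {
    if [ "$1" = yes ]; then
        echo "active"
    elif [ "$2" = yes ]; then
        echo "missing"          # disk not accessible, quorum still available
    elif [ "$3" = yes ]; then
        echo "removed"          # no quorum, forced with varyonvg -f
    else
        echo "varyon fails"     # assumption: no quorum and not forced
    fi
}
```

The actual states can be checked with lsvg -p VGName after the volume group is varied on.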
[Visual: flowchart. Disk mirrored? Yes: Procedure 1. No: Disk still working? Yes: Procedure 2. No: Volume group lost? No: Procedure 3. Yes: Procedure 4 or Procedure 5]
Notes:
Flowchart
Before starting the disk replacement, always follow the flowchart that is shown in the visual.
This flowchart helps you whenever you must replace a disk.
1. If the disk that must be replaced is mirrored onto another disk, follow procedure 1
2. If a disk is not mirrored, but still works, follow procedure 2
3. If you are sure that a disk failed and you are not able to repair the disk:
- If the volume group can be varied on (normal or forced), use procedure 3
- If the volume group is lost after the disk failure, that is, the volume group cannot
be varied on (either normal or forced):
• If the volume group is rootvg, follow procedure 4
• If the volume group is not rootvg, follow procedure 5
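The flowchart can be written down as a short decision sketch. The function name choose_procedure is hypothetical, and the yes or no answers are supplied by the administrator; the sketch does not inspect the system.

```shell
# Hypothetical helper: print the number of the disk replacement procedure.
# Each argument is "yes" or "no".
# usage: choose_procedure MIRRORED STILL_WORKING VG_LOST IS_ROOTVG
choose_procedure() {
    if [ "$1" = yes ]; then
        echo 1      # disk is mirrored
    elif [ "$2" = yes ]; then
        echo 2      # not mirrored, but still working
    elif [ "$3" = no ]; then
        echo 3      # failed disk, volume group can still be varied on
    elif [ "$4" = yes ]; then
        echo 4      # volume group lost, and it is rootvg
    else
        echo 5      # volume group lost, not rootvg
    fi
}
```

For example, a failed, unmirrored disk in a lost non-rootvg volume group gives choose_procedure no no yes no, which prints 5.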
Notes:
Disk state
This procedure requires that the disk state of the failed disk is either missing or removed. Use
the command, lspv hdiskX, to check the state of your physical volume. If the disk is still in
the active state, you cannot remove any copies or logical volumes from the failing disk. In this
case, one way to bring the disk into a removed or missing state is to run the reducevg -d
command or to do a varyoffvg and a varyonvg on the volume group by rebooting the
system.
3. Run replacepv:
# replacepv hdiskX hdiskY
Notes:
The replacepv command greatly simplifies the procedure.
1. Provide a replacement disk. It can be an unused disk, already known to AIX. Otherwise,
you need to provide a new disk. There are many ways to provide a disk that is new to
AIX:
- Directly allocate a PCI storage adapter to the LPAR. If the adapter does not have an
available disk, one needs to be provided through a hot add (if a local disk) or by zoning
a LUN (if it is a Fibre Channel adapter).
- Use PowerVM to provision a virtual SCSI disk.
2. Discover the new disk by running the cfgmgr command.
3. Run the replacepv command to allocate physical partitions on the replacement disk for the
problem disk. Effectively the new disk replaces the failing disk in the mirroring
configuration. In the example, hdiskX is the failing disk.
4. Remove the failing disk.
Procedure 1 (3 of 4): Disk mirrored without replacepv
Notes:
The goal of each disk replacement is to remove all logical volumes from a disk.
1. Remove all logical volume copies from the disk. Use either the SMIT fastpath smit
unmirrorvg or the unmirrorvg command as shown in the visual. These commands
unmirror each logical volume that is mirrored on the disk.
If you have more unmirrored logical volumes on the disk, you must either move them to
another disk (migratepv), or remove them if the disk cannot be accessed (rmlv).
2. If the disk is empty, remove the disk from the volume group. Use SMIT fastpath smit
reducevg or the reducevg command.
3. After the disk is removed from the volume group, you can remove it from the ODM. Use
the rmdev command as shown in the visual.
4. Use a hot-swap procedure to replace the failed or failing disk. (In older machines, disk
replacement would effectively require the system to be shut down for the procedure).
Run cfgmgr to discover and configure the new disk.
5. Add the new disk to the volume group. Use either the SMIT fastpath
smit extendvg or the extendvg command.
6. Finally, create new copies for each logical volume on the new disk. Use either the SMIT
fastpath smit mirrorvg or the mirrorvg command. If synchronization was
suppressed during mirroring, then remember to eventually synchronize the volume
group (or each logical volume), with the syncvg command.
3. Before running the next step, it is necessary to distinguish between the rootvg and a
non-rootvg volume group.
- If the disk that is replaced is in rootvg, execute the steps that are shown on the next
visual Procedure 2 (2 of 2): Special Steps for rootvg.
- If the disk that is replaced is not in rootvg, use the migratepv command:
# migratepv hdisk_old hdisk_new
This command moves all logical volumes from one disk to another. You can do the
migratepv during normal system activity. The command migratepv requires that
the disks are in the same volume group.
4. If the old disk was migrated, remove it from the volume group. Use either the SMIT
fastpath smit reducevg or the reducevg command.
5. If you need to remove the disk from the system, remove it from the ODM with the rmdev
command as shown. Finally, remove the physical disk from the system.
[Visual: rootvg with old disk hdiskX and new disk hdiskY]
1. Connect new disk to system
2. Add new disk to volume group
3. Disk contains hd5?
# migratepv -l hd5 hdiskX hdiskY
# bosboot -ad /dev/hdiskY
# chpv -c hdiskX
# bootlist -m normal hdiskY
Migrate old disk to new disk:
# migratepv hdiskX hdiskY
Notes:
from the old disk, clear the old boot record by using the chpv -c command. Then,
change your bootlist:
# migratepv -l hd5 hdiskX hdiskY
# bosboot -ad /dev/hdiskY
# chpv -c hdiskX
# bootlist -m normal hdiskY
If the disk contains the primary dump device, you must deactivate the dump before
migrating the corresponding logical volume:
# sysdumpdev -p /dev/sysdumpnull
- Migrate the complete old disk to the new one:
# migratepv hdiskX hdiskY
If you deactivated the primary dump device, activate it again after the migration:
# sysdumpdev -p /dev/hdX
4. After the disk is migrated, remove it from the rootvg volume group.
# reducevg rootvg hdiskX
5. If the disk must be removed from the system, remove it from the ODM (use the rmdev
command), shut down your AIX, and remove the disk from the system afterward.
# rmdev -l hdiskX -d
[Visual: rootvg (hdiskX, hdiskY; contains OS logical volumes), datavg (hdiskZ), mksysb]
3. Restore from a mksysb image
4. Import each volume group into the new ODM (importvg) if needed
Notes:
Procedure steps
Follow these steps:
1. Replace the bad disk
2. Boot your system in maintenance mode
3. Restore your system from a mksysb
If any rootvg file systems were not mounted when the mksysb was made, those file
systems are not included on the backup image. You need to create and restore those file
systems as a separate step.
4. Import any user volume groups after restoring the mksysb. For example:
# importvg -y datavg hdisk9
Only one disk from the volume group (hdisk9 in the example) needs to be selected.
Export and import of volume groups is discussed in more detail in the next topic.
1. Export the volume group from the system:
# exportvg vg_name
Notes:
Procedure steps
Follow these steps:
1. To fix this problem, export the volume group from the system. Use the command
exportvg as shown. During the export of the volume group, all ODM objects that are
related to the volume group are deleted.
2. Check your /etc/filesystems. There should be no references to logical volumes or file
systems from the exported volume group.
3. Remove the bad disk from the ODM (use rmdev as shown). Shut down your system and
remove the physical disk from the system.
4. Connect the new drive and boot the system. The cfgmgr command configures the new disk.
5. If you have a volume group backup available (created by the savevg command), you
can restore the complete volume group with the restvg command (or the SMIT
fastpath smit restvg). All logical volumes and file systems are recovered.
If you have more than one disk that should be used during restvg, you must specify
these disks:
# restvg -f /dev/rmt0 hdiskY hdiskZ
The savevg and restvg commands will be discussed in a future unit.
6. If you have no volume group backup available, you must re-create everything that was
part of the volume group.
Re-create:
- The volume group with mkvg or smit mkvg
- All logical volumes with mklv or smit mklv
- All file systems with crfs or smit crfs
7. Finally, restore the lost data from backups, for example with the restore command or
any other tool you use to restore data in your environment.
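The steps above can be collected into a reviewable command plan. This is a sketch under stated assumptions: the function name is hypothetical, the rmdev flags (-l name -d) are the commonly used form and are assumed here because the visual with the exact command is not reproduced in these notes, and the restvg path assumes that a savevg backup exists. The commands are only printed, never executed.

```shell
#!/bin/sh
# Print the command sequence for replacing a failed non-rootvg disk.
# $1 = volume group, $2 = failed disk, $3 = backup device, remaining = target disks.
# Dry run only: the commands are echoed, never executed.
plan_nonrootvg_recovery() {
    vg=$1; bad=$2; dev=$3
    shift 3
    echo "exportvg $vg"
    echo "rmdev -l $bad -d"
    echo "# shut down, replace the disk, boot, then:"
    echo "cfgmgr"
    echo "restvg -f $dev $*"
}

plan_nonrootvg_recovery datavg hdiskX /dev/rmt0 hdiskY hdiskZ
```

If no savevg image is available, the last line would instead be replaced by the manual mkvg, mklv, and crfs steps, followed by a data restore.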
# lsvg -p datavg
unable to find device id ...734...
ODM failure in device configuration database
ODM problem in rootvg?
- No: Export and import the volume group
- Yes: rvgrecover
Notes:
ODM failure
After an incorrect disk replacement, you might detect ODM failures. For example, when running
the command lsvg -p datavg, a typical error message might be:
unable to find device id 00837734 in device configuration database
In this case, a device might not be found in the ODM.
Notes:
The problem
A frequent error occurs when the administrator removes a disk from the ODM (by running
rmdev) and then physically removes the disk from the system, without first running the reducevg
command to remove volume group references to that disk (in the VGDA and in the ODM).
The VGDA stores information about all physical volumes of the volume group. ODM disk
references include the physical volume attributes for the volume group.
Throughout this course, the physical volume ID (PVID) is abbreviated in the visuals for
simplicity. The physical volume ID is really 32 characters.
The result of this mistake is that the volume group cannot be varied on. If you try to use
reducevg after the fact, it fails, since the command requires that the volume group is active.
Notes:
The fix
Before fixing the problem, be sure that you have the PVID for the removed disk.
The problem can be fixed by running the reducevg command, but the volume group needs to
be active. The varyonvg command does not work if the volume group has a PVID value that
cannot be resolved to a disk.
You might use the odmdelete command to remove the bad PVID attribute object, but this
action is not as simple as it sounds and a mistake might make matters worse. An easier way to
clean up the bad ODM reference is to export the volume group and then import the volume
group by using the VGDA on the remaining disk.
After the volume group is active, you can then use the reducevg command to properly remove
the bad PVID reference from the VGDA. Instead of specifying the disk name, the PVID of the
removed disk is specified. If you did not earlier record the PVID, then you need to obtain it from
the VGDA itself.
To obtain the PVID of the removed disk from the VGDA, use the command:
# lqueryvg -p hdisk4 -At (Use any disk from the volume group.)
You need to compare this output with the lsvg -p datavg output to identify which PVID is for
the missing disk.
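The comparison and the cleanup sequence can be sketched as follows. This is an illustration only: both function names are hypothetical, the PVIDs are shortened placeholders, and the PVID lists are passed in as plain space-separated strings because the exact output layout of lqueryvg is not shown in these notes. The cleanup commands are printed for review rather than executed.

```shell
#!/bin/sh
# Print every PVID that appears in the VGDA list ($1) but not in the list of
# PVIDs for disks the system still knows ($2). Both lists are space-separated.
missing_pvids() {
    vgda_pvids=$1
    known_pvids=$2
    for pvid in $vgda_pvids; do
        case " $known_pvids " in
            *" $pvid "*) ;;            # still known to the system
            *) echo "$pvid" ;;         # only in the VGDA: the removed disk
        esac
    done
}

# Print the export, import, reducevg sequence described in the notes.
plan_stale_pvid_cleanup() {
    vg=$1; good_disk=$2; missing_pvid=$3
    echo "exportvg $vg"
    echo "importvg -y $vg $good_disk"
    echo "reducevg $vg $missing_pvid"
}

# Hypothetical, shortened PVIDs for illustration only.
gone=$(missing_pvids "0001 0002 0003" "0001 0003")
plan_stale_pvid_cleanup datavg hdisk4 "$gone"
```

Note that reducevg is given the PVID, not a disk name, because the disk no longer exists in the ODM.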
Checkpoint
IBM Power Systems
Notes:
Unit summary
IBM Power Systems
Notes:
Different procedures are available that can be used to fix disk problems under any
circumstance:
Procedure 1: Mirrored disk
Procedure 2: Disk still working (rootvg specials)
Procedure 3: Total disk failure
Procedure 4: Total rootvg failure
Procedure 5: Total non-rootvg failure
The exportvg and importvg commands can be used to easily transfer volume groups
between systems.
Unit 9. Install and cloning techniques
Reference
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 Operating system and device management
Online AIX Version 7.1 Installation and migration
SG24-7910 IBM AIX Version 7.1 Differences Guide (Redbooks)
SC23-6742 AIX Version 7.1 Understanding the Diagnostic Subsystem
for AIX
http://www.ibm.com/developerworks/aix/library/au-alt_disk_copy
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives
IBM Power Systems
Notes:
Topic 1 objectives
IBM Power Systems
Notes:
# smit alt_install
Notes:
Filesets
An alternate disk installation uses the following filesets:
- bos.alt_disk_install.boot_images must be installed for alternate disk mksysb
installations
- bos.alt_disk_install.rte must be installed for rootvg cloning and alternate disk
mksysb installations
Figure: hdisk0 holds rootvg (AIX 6.1); an AIX 7.1 mksysb is installed on hdisk1.
Notes:
Introduction
An alternate mksysb installation involves installing a mksysb image that was created from
another system onto an alternate disk of the target system.
Example
In the example, an AIX 7.1 mksysb tape image is installed on an alternate disk, hdisk1, by
running the following command:
# alt_disk_mksysb -m /dev/rmt0 -d hdisk1
The system now contains two rootvgs on different disks. In the example, one rootvg contains AIX
6.1 (hdisk0) and the other contains AIX 7.1 (hdisk1).
alt_disk_mksysb options
The alt_disk_mksysb command has the following options:
-m device
-d target-disks
-B (Do not change the bootlist.)
-i image.data
-s script
-R resolv.conf
-p platform
-L mksysb_level
-n (Remain a NIM client.)
-P phase
-c console
-r (Reboot after installation.)
-k (Keep mksysb device customization.)
-y (Import non-rootvg volume groups.)
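To show how a few of these options combine, here is a small sketch that assembles an alt_disk_mksysb command line. The function name and its yes/no arguments are hypothetical; only the -m, -d, -B, and -r flags from the list above are used, and the command is printed rather than run.

```shell
#!/bin/sh
# Build an alt_disk_mksysb command line from a few choices.
# $1 = mksysb device or image, $2 = target disk,
# $3 = "yes" to leave the bootlist unchanged (-B),
# $4 = "yes" to reboot after installation (-r).
build_alt_disk_mksysb() {
    device=$1; disk=$2; keep_bootlist=${3:-no}; reboot=${4:-no}
    cmd="alt_disk_mksysb -m $device -d $disk"
    if [ "$keep_bootlist" = yes ]; then
        cmd="$cmd -B"
    fi
    if [ "$reboot" = yes ]; then
        cmd="$cmd -r"
    fi
    echo "$cmd"
}

build_alt_disk_mksysb /dev/rmt0 hdisk1          # default: bootlist is changed
build_alt_disk_mksysb /dev/rmt0 hdisk1 yes no   # keep the current bootlist
```

Keeping the bootlist unchanged is a cautious default when the alternate rootvg should be inspected before the system is ever booted from it.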
# smit alt_mksysb
[Entry Fields]
* Target Disk(s) to install [hdisk1] +
* Device or image name [/dev/rmt0] +
Phase to execute all +
image.data file [] /
Customization script [] /
Set bootlist to boot from this disk
on next reboot? yes +
Reboot when complete? no +
Verbose output? no +
Debug output? no +
resolv.conf file [] /
Notes:
Figure: rootvg on hdisk0 (AIX 7.1 TL01) is cloned to hdisk1, where the clone is updated to AIX 7.1 TL03.
Notes:
Example
In the example, the command alt_disk_copy -b update_all -l /dev/cd0 -d hdisk1 clones the
rootvg on hdisk0 to the alternate disk hdisk1. Additionally, a new technology level is
applied to the cloned version of AIX.
# smit alt_clone
Clone the rootvg to an Alternate Disk
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Target Disk(s) to install [hdisk1] +
Phase to execute all +
image.data file [] /
Exclude list [] /
Bundle to install [update_all] +
-OR-
Fileset(s) to install []
Fix bundle to install []
-OR-
Fixes to install []
Directory or Device with images [/dev/cd0]
(required if filesets, bundles or fixes used)
installp Flags
COMMIT software updates? yes +
SAVE replaced files? no +
AUTOMATICALLY install requisite software? yes +
EXTEND file systems if space needed? yes +
OVERWRITE same or newer versions? no +
VERIFY install and check file sizes? no +
ACCEPT new license agreements? no +
Customization script [] /
Set bootlist to boot from this disk
on next reboot? yes +
Reboot when complete? no +
Verbose output? no +
Debug output? no +
Notes:
Original hdisk0
• rootvg (AIX 7.1 TL01)
Notes:
Clone
NIM server NIM client:
lpar1
hdisk1
• rootvg
AIX AIX 7.1
• (AIX 7.1)
Notes:
What is nimadm?
The nimadm command (Network Install Manager Alternate Disk Migration) creates a copy of
rootvg to a free disk (or disks) and simultaneously migrates it to a new version or release level
of AIX. The nimadm command uses NIM resources to perform this function.
Advantages of nimadm
There are several advantages to using the nimadm command over a conventional migration:
- Reduced downtime. The migration is done while the system is up and functioning normally.
There is no requirement to boot from installation media, and most of the processing occurs on
the NIM master.
- The nimadm command facilitates quick recovery in the event of migration failure. Since the
nimadm command uses alt_disk_install to create a copy of rootvg, all changes are
done to the copy (altinst_rootvg). In the event of serious migration installation failure, the
failed migration is cleaned up and there is no need for the administrator to take further
action. In the event of a problem with the new (migrated) level of AIX, the system can be
quickly returned to the pre-migration operating system by booting from the original disk.
- The nimadm command allows a high degree of flexibility and customization in the migration
process. This process is done with the use of optional NIM customization resources:
image_data, bosinst_data, exclude_files, pre-migration script,
installp_bundle, and post-migration script.
Details of using NIM to do an alternate disk migration are not covered in this course.
Topic 2 objectives
IBM Power Systems
Notes:
multibos overview
IBM Power Systems
Notes:
Overview
The main purpose of using multibos is to have the type of alternate BOS (base operating
system) capabilities that are available with the alternate disk technology, without having to use
another disk. The operating system filesets do not occupy enough space to justify allocating
another entire disk for that purpose. With multibos, you can have the two BOS versions on
the same disk.
This task is accomplished by creating copies of the base operating system logical volumes
that are affected by an OS update (the active BOS), with different logical volume names and
mount paths. These copies reside in the same rootvg.
Another advantage to multibos is that it does not need as much space as the cloning
operation, since it does not need to clone all the logical volumes in the rootvg.
After you create the alternate BOS, changes, such as applying maintenance, can be made to
these copies, without changing the AIX version in the active BOS. In addition to applying
maintenance, you can access and make configuration changes to the standby BOS through two
techniques: mounting the standby BOS and starting an interactive shell (chroot) for the
standby BOS.
When you would like to test the standby BOS, you reboot from the standby copy of the boot logical
volume (BLV). If there is a problem with the changes that were made, configure the bootlist to
use the original BLV; a reboot then returns you to the original version of the BOS.
Active BOS
/
BLV jfslog (hd4)
(hd5) (hd8)
Standby BOS
home opt usr var tmp bos_inst (if mounted)
(hd1) (hd10opt) (hd2) (hd9var) (hd3) (bos_hd4)
BLV jfslog
(bos_hd5) (bos_hd8)
• multibos -s -X
• Special logical volumes and file systems are created for the
standby OS
– bos_<lvname>
– /bos_inst/<mount point>
Notes:
4) If you specify the update_all function, with the -a flag, it is done by using the
install_all_updates utility. If you specify the -p preview flag, then
install_all_updates does a preview operation. Note: It is possible to do one,
two, or all three of the installation options during a single customization operation.
5) The standby boot image is created and written to the standby BLV by using the AIX
bosboot command. You can block this step with the -N flag. You should use the -N
flag if you are an experienced administrator and have a good understanding of the
AIX boot process.
6) Upon exit, if standby BOS file systems were mounted in step 1, they are unmounted.
Alternate boot
The bootlist command supports multiple BLVs. As an example, to boot from disk hdisk0 and
BLV bos_hd5, you would enter the command:
# bootlist -m normal hdisk0 blv=bos_hd5
After the system is rebooted from the standby BOS, the standby BOS logical volumes are
mounted over the usual BOS mount points, such as /, /usr, and /var. The set of BOS
objects that are currently booted (the BLV, logical volumes, and file systems) is considered
the active BOS, regardless of logical volume names. The previously active BOS becomes the
standby BOS in the existing boot environment.
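Switching between the two BOS instances comes down to naming the BLV in the bootlist. The sketch below uses a hypothetical helper that only prints the bootlist command in the form shown above; the BLV names hd5 and bos_hd5 are the ones used in this unit.

```shell
#!/bin/sh
# Print the bootlist command that selects a given BLV on a given disk.
# Dry run only: the command is echoed, not executed.
bootlist_cmd() {
    disk=$1; blv=$2
    echo "bootlist -m normal $disk blv=$blv"
}

bootlist_cmd hdisk0 bos_hd5   # boot the BOS instance behind bos_hd5 next
bootlist_cmd hdisk0 hd5       # return to the BOS instance behind hd5
```

Because the booted instance is always the active BOS regardless of names, the same two commands work for switching in either direction.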
Some administrators have been unable to switch the BLV. When they tried to set the bootlist to the
standby BLV, they received the following error:
0514-226 bootlist: Invalid attribute value for blv
This error is an indication that either the BLV is corrupted or the ODM entry for it is corrupted. A
suggested solution is to rebuild the standby BLV. This solution requires a special bosboot flag:
# bosboot -sd /dev/ipldevice -M standby -l bos_hd5
Checkpoint (1 of 2)
IBM Power Systems
Notes:
Checkpoint (2 of 2)
IBM Power Systems
Notes:
Unit summary
IBM Power Systems
Notes:
Alternate disk installation techniques are available:
- Installing a mksysb onto an alternate disk
- Cloning the current rootvg onto an alternate disk
An alternate BOS can be created and maintenance can be applied to it.
Unit 10. Advanced backup techniques
Reference
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 Operating system and device management
Online AIX Version 7.1 Installation and migration
SG24-7910 IBM AIX Version 7.1 Differences Guide (Redbooks)
SG24-7559 IBM AIX Version 6.1 Differences Guide (Redbooks)
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives
IBM Power Systems
Notes:
Step                  Data state
Start of transaction  X0, Y0
Write X1              X1, Y0
Backup runs           copies X1, Y0
Write Y1              X1, Y1
Notes:
Backing up data while a file system is active can lead to data consistency problems. The
backup utility is sequentially copying files while applications might still be updating those
contents. For a collection of related updates, the backup utility can copy one piece of
data after it is updated, but copy the other related data before it is updated. The result can be a
backup where two pieces of data are not consistent.
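The race can be reproduced with two shell variables standing in for the related pieces of data. This is a toy simulation for illustration only; the names X and Y follow the visual, and no real backup utility is involved.

```shell
#!/bin/sh
# Simulate an application that updates X and then Y as one transaction,
# while a backup copies both values between the two writes.
X=X0; Y=Y0              # consistent starting state
X=X1                    # application writes the first half of the transaction
backup_X=$X             # backup copies X after its update ...
backup_Y=$Y             # ... and Y before its update
Y=Y1                    # application completes the transaction
echo "live:   $X, $Y"
echo "backup: $backup_X, $backup_Y"
```

The live data ends in a consistent state (X1, Y1), but the backup holds X1 with Y0, a combination that never existed as a committed state.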
Some applications, especially database engines, record the progress of related updates in a
transaction log. During the application recovery process, the log identifies transactions where
not all related updates were confirmed. The recovery process then backs out the transaction,
backing out any updates that were recorded during the previous backup.
If an application does not have this type of recovery logic, then use of the inconsistent backup
can result in serious problems. In that situation, you need to have a way to ensure that the
backup has consistency.
Notes:
Traditionally, the best way to ensure that the data is consistent is to stop the application and
unmount the file system, followed by running a backup by inode. This procedure ensures that
there are no updates during the backup and that all of the file system’s data is flushed to disk. If a
backup takes a long time, having the application down for a long period can be unacceptable.
Some applications can be quiesced. In this state, either new transactions are not accepted or
they are only processed in user space without writing the updates to the file system. Either way,
the backup of the mounted file system can proceed without any file system activity from the
quiesced application. Again, if the backup takes a long time, being quiesced for a long period
might still be unacceptable.
The solution is to use the quiesced state to quickly capture the state of the file system. On-going
updates do not affect the actual file system. A method for capturing the file system state might
run for a few seconds. Such a short time for being in a quiesced state is often acceptable.
Topic 1 objectives
IBM Power Systems
Notes:
jfslog
# lsvg -l newvg
newvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv00 jfslog 1 3 3 open/syncd N/A
lv03 jfs 1 3 3 open/syncd /fs1
Notes:
Requirements
By splitting a mirror, you can back up the copy of the mirror that is not changing while the other
mirrors remain online.
To use this technique, it is best to have three copies of your data. You need to stop one of the
copies but the other one or two copies continue to provide redundancy for the online portion of
the logical volume.
You are also required to mirror the journal log for the file system.
The output from lsvg -l indicates that the logical volume and the log are both mirrored.
# lsvg -l newvg
newvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv00 jfslog 1 3 3 open/syncd N/A
lv03 jfs 1 3 3 open/syncd /fs1
/backup
File system
/fs1
jfslog
Notes:
Example
# lsvg -l newvg
newvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv00 jfslog 1 3 3 open/syncd N/A
lv03 jfs 1 3 3 open/stale /fs1
lv03copy00 jfs 0 0 0 open/syncd /backup
The /fs1 file system still contains three physical partitions, but the mirror is now stale. The
stale copy is now accessible by the newly created read-only file system /backup. That file
system is on a newly created logical volume, lv03copy00. This logical volume is not
synchronized and is considered stale. Also, it does not indicate any logical partitions (since the
logical partitions really belong to lv03).
You can look at the content and interact with the /backup file system just like any other
read-only file system.
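The full split, backup, and rejoin cycle can be sketched as a command plan. Treat this as an assumption-laden illustration: the function name is hypothetical, and the chfs splitcopy form is the documented AIX way to split a JFS mirror copy but the visual with the exact command is not reproduced in these notes, so verify the attributes against the chfs documentation. The unmount and rmfs steps match the rejoin visual. Nothing is executed.

```shell
#!/bin/sh
# Print the commands for a JFS split-mirror backup.
# $1 = mirrored file system, $2 = mount point for the split copy,
# $3 = number of the mirror copy to split off.
plan_jfs_split_backup() {
    fs=$1; snap_mnt=$2; copy=$3
    echo "chfs -a splitcopy=$snap_mnt -a copy=$copy $fs"
    echo "# back up $snap_mnt with the backup tool of your choice"
    echo "unmount $snap_mnt"
    echo "rmfs $snap_mnt"
}

plan_jfs_split_backup /fs1 /backup 3
```

Removing the temporary file system with rmfs returns the split copy to the mirror, after which the stale partitions are resynchronized.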
Copy 1 Copy 2
Copy 3
syncvg
jfslog
# unmount /backup
# rmfs /backup
Notes:
• All logical volumes must be mirrored on disks that contain only those
mirrors
• The split copy becomes a new volume group, called a snapshot volume
group, with its own VGname
• New logical volumes and mount points are created in the snapshot
volume group
Notes:
How it works
Snapshot support for a mirrored volume group is provided to split a mirrored copy of a fully
mirrored volume group into a snapshot volume group.
Ensure that there are no stale copies in the original volume group. The splitvg command
fails if the only remaining non-stale copy is on the disk to be split, unless you use
the force (-f) option.
When the volume group is split, the original volume group does not use the disks that are now
part of the snapshot volume group.
The splitvg command uses the recreatevg command to implement the split. This method
is a different technique from the JFS split mirror. It creates a new volume group with new file
system and logical volume names.
Notes:
Both volume groups track changes in physical partitions within the volume group. When the
snapshot volume group is rejoined with the original volume group, the synchronization needs to
occur on only the subset of physical partitions that were touched during the split period. This
method is much faster and has less performance impact than resynchronizing all physical
partitions, as is needed with the JFS split copy function.
Physical partition changes in both volume groups are tracked. A write to a physical partition in
the original volume group causes the corresponding physical partition in the snapshot volume
group to be marked stale. A write to a physical partition in the snapshot volume group causes
that physical partition to be marked stale.
To rejoin the volume groups, use the joinvg command. The stale physical partitions are
included in the original mirroring and the stale copies are automatically resynchronized.
The user sees the same data in the rejoined volume group as was in the original volume group
before the volume group is rejoined. In other words, the third copy shows the data changes that
occurred in the original volume group during the period it was split off.
Notes:
The splitvg command creates a separate, point-in-time snapshot volume group. The splitvg
command fails if any of the disks to be split are not active within the original volume group.
This volume group can be used to do the backup or other operations. In the example, the
backup command backs up one of the renamed file systems by inode (unmounted). You can
also mount the file system and backup by name instead.
Later, the joinvg command is used to rejoin the snapshot volume group to the original volume group.
In the event of a system crash or loss of quorum while running this command, the joinvg
command must be run to rejoin the disks back to the original volume group.
You must have root authority to run these commands.
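A split, backup, and rejoin cycle might be planned as in the following sketch. The function name and the snapshot volume group name are hypothetical; the splitvg -y (snapshot volume group name) and -c (mirror copy to split) flags are used as documented for AIX, and the backup step is left as a comment because the renamed file systems depend on the system. The commands are printed only.

```shell
#!/bin/sh
# Print the commands for a snapshot volume group backup.
# $1 = original volume group, $2 = name for the snapshot volume group,
# $3 = mirror copy to split off (1, 2, or 3).
plan_splitvg_backup() {
    vg=$1; snapvg=$2; copy=$3
    echo "splitvg -y $snapvg -c $copy $vg"
    echo "# back up the file systems in $snapvg"
    echo "joinvg $vg"
}

plan_splitvg_backup datavg snapvg 3
```

The joinvg step is run against the original volume group, which is also how the disks are recovered after a crash during the split period.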
Topic 2 objectives
IBM Power Systems
Notes:
JFS2 snapshot (1 of 2)
IBM Power Systems
Notes:
JFS2 snapshot
A point-in-time image for a JFS2 file system is called a snapshot. The file system that is the
source of this point-in-time image is referred to as the snapped file system or snappedFS.
The snapshot view of the data remains static and retains the same security permissions that the
original snappedFS had when the snapshot was made. Also, a JFS2 snapshot can be created
without unmounting or quiescing the file system (though it is advisable for some
applications to briefly quiesce during the snapshot). A snapshot can be used to access files or
directories as they existed when the snapshot was taken.
The snapshot can also be used to create a backup of the file system as it was at the point in time
that the snapshot was taken.
JFS2 snapshot (2 of 2)
IBM Power Systems
Notes:
its own unique mount point. A file system can use either internal or external snapshots; it cannot
mix the different types.
snappedFS
inode1 inode2
snapshot
inode1 inode2
Notes:
# smit jfs2
...
List Snapshots for an Enhanced Journaled File System
Create Snapshot for an Enhanced Journaled File System
Mount Snapshot for an Enhanced Journaled File System
Remove Snapshot for an Enhanced Journaled File System
Unmount Snapshot for an Enhanced Journaled File System
Change Snapshot for an Enhanced Journaled File System
Rollback an Enhanced Journaled File System to a Snapshot
Notes:
The various JFS2 snapshot operations can be done from SMIT. The SMIT JFS2 menu includes
many items that relate to JFS2 snapshots.
An example with only the menu items for snapshot is shown in the visual.
Creating an external snapshot for a JFS2 file system that is not mounted
The mount option, -o snapto=/snapshotLV, can be used to create an external snapshot for a JFS2 file
system that is not currently mounted:
# mount -o snapto=/snapshotLV snappedFS MountPoint
If the snapto value starts with a slash, then it is assumed to be a special device file for an
existing logical volume where the snapshot should be created. For example:
# mount -o snapto=/dev/mysnaplv /dev/fslv00 /home/myfs
This command mounts the file system that is contained on the /dev/fslv00 to the mount
point of /home/myfs and then proceeds to create a snapshot for the /home/myfs file system
in the logical volume /dev/mysnaplv.
Creating an internal snapshot for a JFS2 file system that is not mounted
The mount option, -o snapto=snapshotname, can be used to create an internal snapshot for a JFS2 file
system that is not currently mounted:
# mount -o snapto=snapshotname snappedFS MountPoint
If the snapto value starts with a slash, then it is assumed to be a special device file for an
existing logical volume where the snapshot should be created. If the snapto value does not
start with a slash, then it is assumed to be the name of an internal snapshot to be created.
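The leading-slash rule is easy to express in code. The helper below is a hypothetical illustration that classifies a snapto value the way the paragraph above describes; it does not call mount.

```shell
#!/bin/sh
# Classify a snapto value: a leading slash means a logical volume device
# (external snapshot); anything else is taken as an internal snapshot name.
snapto_kind() {
    case $1 in
        /*) echo external ;;
        *)  echo internal ;;
    esac
}

snapto_kind /dev/mysnaplv
snapto_kind mysnap
```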
Listing snapshots
IBM Power Systems
# snapshot -q /home/myfs
Notes:
The snapshot -q option can be used to display the snapshots that are related to the specified
file system.
If the file system uses internal snapshots, then the report provides the snapshot names and
creation times. The * indicates the current snapshot.
# snapshot -q /home/myfs2
If the file system uses external snapshots, then the report provides, for each snapshot, the
logical volume special device file, the snapshot size, how much space is free in the snapshot,
and the creation time.
# snapshot -q /home/myfs
Notes:
Rollback
The rollback command is an interface to revert a JFS2 file system to a point-in-time
snapshot. The snappedFS parameter must be unmounted before the rollback command is
run and remains inaccessible during the command. Any snapshots that are taken after the
specified snapshot (snapshotObject for external or snapshotName for internal) are removed.
The associated logical volumes are also removed for external snapshots.
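A rollback can be planned with a small helper that picks the external or internal form. The function name is hypothetical and the commands are only printed. The two rollback forms (snappedFS followed by the snapshot device for external, -n snapshotName followed by snappedFS for internal) follow the AIX command syntax as I understand it, so check them against the rollback documentation before use.

```shell
#!/bin/sh
# Print the commands to roll a JFS2 file system back to a snapshot.
# $1 = snapped file system, $2 = snapshot device (external) or name (internal).
plan_rollback() {
    fs=$1; snap=$2
    echo "unmount $fs"                        # snappedFS must be unmounted first
    case $snap in
        /*) echo "rollback $fs $snap" ;;      # external: snapshot logical volume
        *)  echo "rollback -n $snap $fs" ;;   # internal: snapshot name
    esac
}

plan_rollback /home/myfs /dev/mysnaplv
plan_rollback /home/myfs mysnap
```

Because later snapshots are removed by the rollback, it is worth listing them with snapshot -q before running the real commands.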
As with any file copying, be careful about changing the nature of the file (for example,
ownership, permission, and sparseness). Using the backup and restore utilities to implement
a copy of files is often a safer technique.
For example:
# backsnap -m /mntsnapshot -s size=16M -i -f /dev/rmt0 /home/myfs
Notes:
/mntsnapshot. The remaining arguments are passed to the backup command. In this
example, the files and directories in the snapshot are backed up by name (-i) to /dev/rmt0.
For example:
# backsnap -n mysnap -s size=16M -i -f /dev/rmt0 /home/myfs
Notes:
/mntsnapshot. The remaining arguments are passed to the backup command. In this
example, the files and directories in the snapshot are backed up by name (-i) to /dev/rmt0.
• External snapshot:
– The snapshot report identifies the size and amount of free space
– If the snapshot needs more space:
# snapshot -o size=+1 snapshotLV
• Internal snapshot:
– Shares logical volume with the snappedFS
# df -m snappedFS
– If snappedFS is out of space, try to free up space – possibly delete old
snapshots
# snapshot -d -n snapshot_name snappedFS
Notes:
It is useful to be able to identify situations where a snapshot is growing large. If a snapshot runs
out of space, then all snapshots are invalidated and become unusable. If dealing with an
internal snapshot, the snapshots can contribute to the entire file system running out of space.
To monitor an external snapshot, use the query option of the snapshot command. An
alternative would be to mount the snapshot and use the df command, but that is more
complicated.
If an external snapshot needs more room, you can dynamically increase the size of the
snapshot logical volume by using the size option of the snapshot command.
For an internal snapshot, there is no mechanism for identifying the space usage of the
snapshots. Instead, you monitor the size of the snappedFS.
When a file system is running out of space, one way to free space is to delete old snapshots.
Keeping many generations of snapshots can be useful, but it can also be expensive in terms of
space usage.
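The monitoring idea for an internal snapshot could be scripted roughly as follows. This is a minimal sketch, not a supported tool: the helper names pct_used and over_threshold, the 90 percent threshold, and the assumption that %Used is the fourth column of the df -m output are all illustrative.

```shell
#!/bin/sh
# Sketch only: helper names, threshold, and df column layout are assumptions.

# Print the %Used value (without the % sign) from `df -m` output on stdin.
# Assumes %Used is the 4th column of the 2nd line.
pct_used() {
    awk 'NR == 2 { sub(/%/, "", $4); print $4 }'
}

# Succeed when usage ($1) is at or above the threshold ($2).
over_threshold() {
    [ "$1" -ge "$2" ]
}

# Possible use on a live system (hypothetical file system and threshold):
#   used=$(df -m /home/myfs | pct_used)
#   if over_threshold "$used" 90; then
#       snapshot -q /home/myfs    # review the snapshots before deleting any
#   fi
```

Keeping the parsing in a small function makes it easy to adjust if the df layout on a given system differs from what is assumed here.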
Topic 3 objectives
IBM Power Systems
Notes:
Notes:
Use of copy services that are provided by SAN-attached storage subsystems is fairly common
and sometimes referred to as SAN Copy. These copy services make a point-in-time exact copy
of the contents of a LUN as seen by the storage subsystem controller. Not only can they provide
a point in time copy of a LUN, but this activity does not depend on any host system resources.
However, problems can result from capturing only the data as it exists in the storage
subsystem at that moment.
Normally, when an application writes data, it receives confirmation of the write when AIX caches
the data in memory. Later, various AIX mechanisms flush that data to disk storage. When a SAN
Copy is initiated, the transaction-related updates can either be in AIX kernel memory or in the
storage subsystem. The SAN Copy might have inconsistent data, even if the application was
quiesced before taking the snapshot.
To avoid this problem, you need to ensure that none of the related data updates are cached in
AIX memory at the time of the SAN Copy. Unmounting the file system is generally not an
acceptable solution given the disruption to the application.
Notes:
AIX provides a JFS2 file system freeze capability. It stops processing new file system I/O
requests and then flushes out all memory cached file system data to the physical volume.
After the application is quiesced and the file system is frozen, a SAN Copy captures
consistent data.
After the SAN Copy completes, you can then thaw the file system and resume application
processing.
This procedure is only needed when the application allows AIX to cache writes and to decide
when to flush the cached data. There are two situations where the freeze mode is not needed.
- The application processes the file by using Direct I/O (DIO). With DIO, writes are
synchronous and go directly to storage without any caching in kernel memory. Concurrent
I/O always uses DIO.
- The application calls the synchronous fsync() system call for its output files, forcing AIX
to flush all cached data for that file and returning to the application when that is completed.
The chfs freeze attribute requires a value that specifies a timeout period. If the file system is
not explicitly thawed (again using the chfs command) within that timeout period, the file system
is automatically thawed. This attribute is intended to avoid permanent file system freezes, and
the timeout should be set to a period that is much longer than you expect the SAN Copy to
take.
The sync command is run immediately before the freeze request because for large amounts
of cached data, the sync command is much more efficient in finding and flushing that data than
the freeze function. Then, the freeze function needs to handle only data that was cached
immediately after the flush, which should be a small amount of data.
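The sync, freeze, copy, and thaw sequence can be sketched as a small script. This is an illustration under stated assumptions: start_san_copy and myfs_lun are hypothetical placeholders for whatever the storage subsystem provides, the 300 second timeout is arbitrary, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# freeze_copy_thaw FS TIMEOUT_SECONDS COPY_COMMAND [ARGS...]
freeze_copy_thaw() {
    fs=$1 timeout=$2
    shift 2
    run sync                                    # flush most cached data first
    run chfs -a freeze="$timeout" "$fs" || return 1
    run "$@"                                    # storage-specific copy trigger
    rc=$?
    run chfs -a freeze=off "$fs"                # thaw explicitly, do not wait for the timeout
    return $rc
}

# start_san_copy and myfs_lun are hypothetical names.
freeze_copy_thaw /home/myfs 300 start_san_copy myfs_lun
```

The thaw runs even when the copy command fails, so the file system is not left frozen until the timeout expires.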
Consistency groups
Notes:
While the previously discussed techniques can ensure the consistency of a point-in-time copy of
a single LUN, when multiple LUNs are interrelated you are faced with new issues. Normally,
each LUN would be SAN Copied separately and each would be at a different point-in-time. But
since they are at different points-in-time, between them, they can have inconsistency of related
data.
When the storage subsystem defines LUNs as belonging to a common consistency group, the
entire consistency group is copied at the same point-in-time. This procedure ensures data
consistency.
Of special concern is the relationship between a file system and its journal log. If the file
system and its log are on different LUNs and you do not ensure consistency, then you essentially
have metadata corruption that can make that file system and log combination unusable.
You can also have a problem if multiple file systems share a log and some of the file systems
are not included in the consistency group. Later access of the log can then be incompatible with
the state of those other file systems. Thus, file systems that are copied with SAN Copy should
either each have their own external journal log or use JFS2 in-line journal logs.
If the LUN is one of many physical volumes in an entire volume group that is being backed up,
all of the LUNs in the volume group should be included in the same consistency group.
Notes:
SAN Copy creates exact duplicates of the physical volumes, rather than a backup image to be
restored. For an AIX system to access the disk, it needs to be discovered (zoned to that host
and detected, by way of cfgmgr, by that host) and then imported into the ODM.
If it is to act as the rootvg of that system, it must be designated as the boot device before
booting that host.
User volume groups can be accessed to either directly recover contents from the copy, or to
enable a backup utility to create a backup of the copied volume group. In either case, the PVID
on the disk (or disks) should be changed to avoid issues of duplicate PVIDs.
If accessing the entire volume group from a system that is different from the original system,
use the importvg command on any disk in the consistency group for the volume group. Then,
vary online, run a file system check, and mount the file systems of interest. To avoid possible
future PVID conflicts, you should consider changing the PVID on the disks after importvg is
completed. Changing the PVID can be accomplished by using the chdev command as follows:
# chdev -l hdisk# -a pv=clear
# chdev -l hdisk# -a pv=yes
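For access from a different system, the import steps could look like the following sketch. The names copyvg, hdisk5, and /home/myfs are hypothetical, and the script only prints the commands unless DRY_RUN is set to 0. The -n flag of importvg is used here so that the vary-on appears as its own step.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# import_copy VG DISK FS: access a copied volume group from a different host.
import_copy() {
    vg=$1 disk=$2 fs=$3
    run cfgmgr                                    # detect the newly zoned disks
    run importvg -n -y "$vg" "$disk" || return 1  # -n: import without varying on
    run varyonvg "$vg" || return 1
    run fsck -p "$fs" || return 1                 # check before mounting
    run mount "$fs"
}

# copyvg, hdisk5, and /home/myfs are hypothetical names.
import_copy copyvg hdisk5 /home/myfs
```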
When accessing from the same system (it is assumed that the original volume group still exists)
or accessing a subset of the physical volumes in the volume group, use the recreatevg
command. Then, do a file system check and a mount of the file systems that are of interest. The
recreatevg command has special abilities to selectively restore only the logical volumes that
are on the specified disks. The recreatevg command can automatically change the PVIDs.
Notes:
The recreatevg command is specially designed to handle the import of volume group copies
to the same system from which they were copied.
One way in which it differs from just using importvg, is the creation of a new VGID and new
PVIDs. Another major difference is that you can specify prefixes to be used when creating new
file system names and logical volume names, which avoids conflicts with the original names.
As seen in the visual, the -L option is used to create a prefix to the file system name, which
becomes a common parent directory to all of the file system mount points. The -Y option is
used to create a prefix for the logical volume names.
It is important that you specify all disks that belong to the volume group, as arguments to the
command, when trying to access the entire volume group.
You can have the recreatevg command use only the specified disks and logical volumes that
are on those disks.
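A recreatevg call with both prefixes might be wrapped as shown below. This is a sketch: the volume group name copyvg, the prefixes /copy and cp, and the disk names are hypothetical, and the script only prints the command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# recreate_copy VG FS_PREFIX LV_PREFIX DISK [DISK...]
recreate_copy() {
    vg=$1 fs_prefix=$2 lv_prefix=$3
    shift 3
    if [ $# -lt 1 ]; then
        echo "need at least one disk" >&2
        return 2
    fi
    # -L prefixes the file system mount points, -Y prefixes the logical volume names
    run recreatevg -y "$vg" -L "$fs_prefix" -Y "$lv_prefix" "$@"
}

# All names below are hypothetical; list every disk of the copied volume group.
recreate_copy copyvg /copy cp hdisk5 hdisk6
```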
Checkpoint
Notes:
Notes:
Unit summary
Notes:
Unit 11. Diagnostics
References
Online AIX Version 7.1 Understanding the Diagnostic Subsystem
for AIX
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives
Notes:
Diagnostics
[Visual labels: NIM Master, CD/DVD, bos.diag, Diagnostics]
Notes:
Introduction
The lifetime of hardware is limited. Broken hardware leads to hardware errors in the error log, to
systems that will not boot, or to strange system behavior.
The diagnostic package helps you to analyze your system and discover hardware that is
broken. Additionally, the diagnostic package provides information to service representatives that
allows fast error analysis.
[Visual labels: physical adapter, Virtual I/O Server, clients, VSCSI server virtual adapter,
Hypervisor, VSCSI protocol, physical storage, hdisk]
Notes:
Diagnostics are done on physical devices. It is fairly common to have logical partitions that see
only virtual devices: virtual Ethernet, virtual SCSI, virtual Fibre Channel. The diag utilities do
not diagnose virtual devices.
In a virtualized environment, the physical devices are allocated to the virtual I/O servers (VIOS).
If a client LPAR cannot access a device, the administrator needs to identify the VIOS providing
access and run the diagnostics at the VIOS.
The VIOS command-line interface (CLI) equivalent of the AIX diag command is the diagmenu
command. The alternative is to create a root AIX subshell with the oem_setup_env command
and run the AIX command in that shell.
diag
Notes:
# diag
FUNCTION SELECTION 801002
Diagnostic Routines
This selection will test the machine hardware. Wrap plugs and
other advanced functions will not be used.
Advanced Diagnostics Routines
This selection will test the machine hardware. Wrap plugs and
other advanced functions will be used.
Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.)
This selection will list the tasks supported by these procedures.
Once a task is selected, a resource menu may be presented showing
all resources supported by the task.
Resource Selection
This selection will list the resources in the system that are supported
by these procedures. Once a resource is selected, a task menu will
be presented showing all tasks that can be run on the resource(s).
Notes:
- If the selected task does not support resource selection, then the task is started.
If the Resource Selection menu is selected, then the following happens:
- The Diagnostic Controller displays a list of resources available on the system.
- After a resource is selected, a Task Selection menu will appear containing the commonly
supported tasks for each selected resource. After selection of a task, the task is started.
# diag
FUNCTION SELECTION 801002
Diagnostic Routines
This selection will test the machine hardware. Wrap plugs and
other advanced functions will not be used.
...
System Verification
This selection will test the system, but will not analyze the error
log. Use this option to verify that the machine is functioning
correctly after completing a repair or an upgrade.
Problem Determination
This selection tests the system and analyzes the error log
if one is available. Use this option when a problem is
suspected on the machine.
Notes:
From the list below, select any number of resources by moving the
cursor to the resource and pressing 'Enter'.
To cancel the selection, press 'Enter' again.
To list the supported tasks for the resource highlighted, press 'List'.
All Resources
This selection will select all the resources currently displayed.
sysplanar0 System Planar
U7311.D20.107F67B-
sisscsia0 P1-C04 PCI-XDDR Dual Channel Ultra320 SCSI
Adapter
+ hdisk2 P1-C04-T2-L8-L0 16 Bit LVD SCSI Disk Drive (73400 MB)
hdisk3 P1-C04-T2-L9-L0 16 Bit LVD SCSI Disk Drive (73400 MB)
ses0 P1-C04-T2-L15-L0 SCSI Enclosure Services Device
L2cache0 L2 Cache
...
Notes:
No trouble was found. However, the resource was not tested because
the device driver indicated that the resource was in use.
Notes:
Diagnostic modes (1 of 3)
Concurrent mode:
• Execute diag during normal system operation
• Limited testing of components
# diag
Maintenance mode:
• Execute diag during single-user mode
• Extended testing of components
# shutdown -m
Password:
# diag
Notes:
Diagnostic modes
Three different diagnostic modes are available:
- Concurrent mode
- Maintenance (single-user) mode
- Service (standalone) mode (covered on the next visual).
Concurrent mode
Concurrent mode provides a way to run online diagnostics on some of the system resources
while the system is running normal system activity. Certain devices can be tested, for example,
a tape device that is not in use, but the number of resources that can be tested is limited.
Devices that are in use cannot be tested.
Diagnostic modes (2 of 3)
Notes:
Standalone mode
But what do you do if your system does not boot or if you must test a system without AIX
installed on the system? In this case, you must use the standalone mode.
Standalone mode offers the greatest flexibility. You can test systems that do not boot or that
have no operating system installed (the latter requires a diagnostic CD/DVD).
3. Boot your AIX system. If in manufacturing default configuration, you can power on the
server from the operator panel. If in a partitioned system, you would use the HMC to
start the LPAR.
4. If starting a partition with the HMC, you would specify a boot mode of Diagnostic with
Default Bootlist. If using the manufacturing default configuration with an attached
console, see the paragraph on using the console keyboard to control the boot mode.
Either method boots the machine in service mode.
5. If the CD/DVD drive has a diagnostic CD/DVD mounted, the diagnostic program boots
from that device. If there is nothing in the CD/DVD drive, then it will boot off the hard
disk, running the diagnostic program on that disk.
6. Now, you can run one of the diagnostic routines.
Diagnostic modes (3 of 3)
HMC
diag is started
automatically
Notes:
# diag
FUNCTION SELECTION 801002
Notes:
Other tasks
The diag command offers a wide range of other hardware-related tasks. All these
tasks can be found after starting the diag main menu and selecting Task Selection.
The tasks that are offered are hardware (or resource) related. For example, if your system has a
service processor, there is a service processor maintenance task, which you do not find on
machines without a service processor. On some systems, you find tasks to maintain RAID and
SSA storage systems.
Diagnostic log
# /usr/lpp/diagnostics/bin/diagrpt -r
ID DATE/TIME T RESOURCE_NAME DESCRIPTION
DC00 Mon Oct 08 16:13:06 I diag Diagnostic Session was started
DAE0 Mon Oct 08 16:10:38 N hdisk2 The device could not be tested
DC00 Mon Oct 08 16:10:13 I diag Diagnostic Session was started
DA00 Mon Oct 08 16:05:11 N sysplanar0 No Trouble Found
DA00 Mon Oct 08 16:05:05 N sisscsia0 No Trouble Found
DC00 Mon Oct 08 16:04:46 I diag Diagnostic Session was started
# /usr/lpp/diagnostics/bin/diagrpt -a
IDENTIFIER: DC00
Date/Time: Mon Oct 08 16:13:06
Sequence Number: 15
Event type: Informational Message
Resource Name: diag
Diag Session: 327726
Description: Diagnostic Session was started.
----------------------------------------------------------------------------
IDENTIFIER: DAE0
Date/Time: Mon Oct 08 16:10:38
Sequence Number: 14
Event type: Error Condition
Resource Name: hdisk2
Resource Description: 16 Bit LVD SCSI Disk Drive
Location: U7311.D20.107F67B-P1-C04-T2-L8-L0
Notes:
Diagnostic log
When diagnostics are run in online or single user mode, the information is stored into a
diagnostic log. The binary file is called /var/adm/ras/diag_log. The command,
/usr/lpp/diagnostics/bin/diagrpt, is used to read the content of this file.
Report fields
The ID column identifies the event that was logged. In the example in the visual, DC00 and DA00
are shown. DC00 indicates that the diagnostics session was started, and DA00 indicates No
Trouble Found (NTF).
The T column indicates the type of entry in the log. I is for informational messages. N is for No
Trouble Found. S shows the Service Request Number (SRN) for the error that was found. E is
for an Error Condition.
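To get a quick overview of a long diagnostic log, the short report could be summarized by entry type. The following is a sketch that assumes the column layout shown in the visual (ID, weekday, month, day, time, T, resource name, description); the function name count_by_type is illustrative.

```shell
#!/bin/sh
# Sketch only: assumes the `diagrpt -r` column layout shown in the visual.

# Read short-report lines on stdin and print "TYPE COUNT" per entry type.
# The T column is the 6th whitespace-separated field; the header line is skipped.
count_by_type() {
    awk '$1 != "ID" && NF >= 7 { n[$6]++ } END { for (t in n) print t, n[t] }' | sort
}

# Possible use on a live system:
#   /usr/lpp/diagnostics/bin/diagrpt -r | count_by_type
```

Any S or E lines in the summary indicate entries that are worth reading in full with the -a report.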
Checkpoint
Notes:
Exercise: Diagnostics
Notes:
Unit summary
Notes:
Unit 12. The AIX system dump facility
References
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 Kernel Extensions and Device Support
Programming Concepts (Chapter 16. Debug Facilities)
Online AIX Version 7.1 Operating system and device management
(section on System Startup)
Note: References listed as online are available through the IBM Knowledge
Center at the following address: http://ibm.com/support/knowledgecenter.
Unit objectives
Notes:
System dumps
Notes:
Types of dumps
• Traditional:
– AIX generates dump before halt
• Firmware assisted (fw-assist):
– POWER firmware generates dump in parallel with AIX halt process
– Defaults to same scope of memory as traditional
– Can request a full system dump
• Live dump facility:
– Selective dump of registered components without need for a system
restart
– Can be initiated by software or by operator
– Controlled by livedumpstart and dumpctrl
– Written to a file system rather than a dump device
Notes:
Overview
In addition to the traditional dump function, AIX 6 introduced two new types of dumps, firmware
assisted dumps and the live dump facility.
Traditional dumps
Traditionally, AIX alone handled system dump generation and the only way to get a dump was
to halt the system either due to a crash or through operator request. In a logical partition, AIX
dumps only the memory that is allocated to that partition.
to this is that the operating system can start its reboot while the firmware handles the dumping
of the memory contents.
In its default mode, it captures the same scope of memory as the traditional dump, but it can be
configured for a full memory dump.
If for some reason (such as memory restrictions), a configured or requested firmware assisted
dump is not possible, then the traditional dump facility is started.
[Visual labels: selective, partial data copy from running applications, kernel extension code
and data, secondary dump device]
Notes:
The second step is to inform the other processors on the system about the dump so that a
consistent snapshot of memory can be written to the dump image.
The dump routine then disables error logging, which allows the most recent error that was
recorded in NVRAM to be preserved instead of being overwritten. The most recent error is likely
more indicative of why the dump routine started.
The routine then arranges for a value to be shown on the operator panel that indicates whether
the dump was initiated by the system or started manually by the system administrator.
After these initial steps are taken, the dump routine then proceeds with the main task, which is
processing the master dump table to determine which areas of memory should be written to the
dump image.
If a failure occurs while writing data to the primary dump device, and a secondary dump device
is defined, then the dump routine fails over to the secondary dump device. The dump restarts
processing of the master dump table in an attempt to write a complete dump image to the
secondary device. When a failover to the secondary device occurs, the dump sequence uses a
serial dump algorithm instead of the parallel algorithm.
When the dump routine finishes (successfully or not), it returns to the calling function, which in
most cases automatically reboots the operating system (the default setting).
devices and adapters. A logical volume on the same physical disk as the primary would be
worthless. It would be better to increase the size of the primary.
• Considerations:
– If using paging space:
• Use only /dev/hd6 for primary dump device
• Secondary device can be any paging space in rootvg
– If using logical volumes:
• Primary dump device must be in rootvg
• Secondary device can be in any volume group
– Mirrored paging space can be used
– Dump to DVD-RAM or tape does not span multiple volumes
Notes:
drive. That is, the dump routine expects to find writable media in the dump device, so there are
problems if no media is in the drive, or the media is read-only.
Notes:
Dump sequence
The dump sequence for a firmware assisted dump is different from the dump sequence of a
traditional dump.
When the dump routine is started, as with the traditional dump sequence, it starts ignoring all
interrupts, stops the other CPUs in the AIX instance, and displays a value on the virtual operator
panel.
When the operating system is first booted and the firmware assisted dump mechanism is
configured, an assumption is made that a full memory dump is used.
If a selective memory dump is configured, the dump routine processes the master dump table
(MDT) to determine the memory blocks being used for the data to include in the dump. This
information is used to update the table in physical memory that initially assumed that a full
memory dump would take place.
The dump routine then calls the POWER Hypervisor (PHYP) to freeze the partition memory
and reboot. PHYP copies data to a reserved area of memory to indicate to the AIX bootloader
(SoftROS) that a firmware assisted dump is in progress.
Notes:
Introduction
Many aspects of dump device configuration and monitoring are accomplished by using the
sysdumpdev command. The command is used to do the tasks that are listed on the visual.
The system dump configuration settings are stored in the SWservAt ODM object class.
# sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         traditional
Notes:
Overview
The -l flag of the sysdumpdev command lists the current dump configuration. The command
lists the current primary and secondary dump device settings. These settings are used until the
system reboots, or the sysdumpdev command is started again to change the configuration of
the primary or secondary devices.
The copy directory and forced copy flag settings are only relevant if a paging space
device is used as one of the dump devices. These settings are covered a little later in this unit.
The last line of output that is shown on the visual, indicating the type of dump, is displayed
when running AIX 6 or newer.
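When a script needs a single setting from this listing, the value can be picked out by its label. The following sketch assumes the label starts the line and the value is the last field, as in the visual; the function name dump_setting is illustrative.

```shell
#!/bin/sh
# Sketch only: assumes each `sysdumpdev -l` line is "<label> <value>".

# dump_setting LABEL: read the listing on stdin, print the value for LABEL.
dump_setting() {
    awk -v label="$1" 'index($0, label) == 1 { print $NF }'
}

# Possible use on a live system:
#   sysdumpdev -l | dump_setting "type of dump"
#   sysdumpdev -l | dump_setting "primary"
```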
# sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         fw-assisted
full memory dump     disallow
Notes:
Notes:
Dump type
The dump type is selected by using the -t flag of the sysdumpdev command. Allowable values
are traditional and fw-assisted. When changing from traditional to fw-assisted,
a reboot is required for the change to take place. The reboot is required because the firmware
assisted dump facility must reserve an area of memory for use in communication between
PHYP and SoftROS during system reboot when a firmware assisted dump is in progress.
Changing from fw-assisted to traditional does not require a reboot, as the existing
reserved memory can be released.
The allow value specifies that the full memory system dump mode is allowed but is done only
when the operating system cannot properly handle the dump request.
The require value specifies that the full memory system dump mode is allowed and is always
done.
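Changing the dump type and the full memory mode could be sketched as follows. The -t flag and its values come from the notes above; the use of -f to set the full memory mode is an assumption based on the sysdumpdev command reference and should be checked on the target system. The script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# set_dump_type CURRENT WANTED: change the type only when it differs.
set_dump_type() {
    current=$1 wanted=$2
    if [ "$current" = "$wanted" ]; then
        return 0
    fi
    run sysdumpdev -t "$wanted" || return 1
    if [ "$wanted" = fw-assisted ]; then
        # Switching to fw-assisted takes effect only after a reboot.
        echo "reboot required before fw-assisted dumps are active"
    fi
}

set_dump_type traditional fw-assisted
run sysdumpdev -f require      # -f is assumed to select the full memory mode
```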
Notes:
the -s flag allows the secondary object to be changed. Using the -P flag in addition to either -p
or -s changes the tprimary or tsecondary object in addition to the primary or secondary ODM
object. This value changes the device that the system uses at reboot.
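The difference between a change that lasts until the next reboot and one that is also used at reboot can be sketched with a small wrapper. The function name set_dump_device and the device names are illustrative, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# set_dump_device ROLE DEVICE [permanent]
# ROLE is primary or secondary; without "permanent" the change is temporary.
set_dump_device() {
    role=$1 device=$2 scope=${3:-temporary}
    case $role in
        primary)   flag=-p ;;
        secondary) flag=-s ;;
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac
    if [ "$scope" = permanent ]; then
        run sysdumpdev -P "$flag" "$device"   # also used after the next reboot
    else
        run sysdumpdev "$flag" "$device"      # current setting only
    fi
}

set_dump_device primary /dev/lg_dumplv permanent
set_dump_device secondary /dev/sysdumpnull
```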
Notes:
volume is not in the rootvg volume group, it cannot be configured as the permanent primary
dump device. It is used as the primary dump device from when it is configured until the system
crashes, reboots, or the sysdumpdev command is run again to change the primary dump
device setting.
Notes:
Copy directory
The copy directory location is only relevant when a paging space device is used as a dump
device.
When the system reboots after a crash, if the dump was written to a paging space device it must
be copied somewhere before the paging space can be activated. The copy directory is the
location where the dump is copied. By default, it is set to /var/adm/ras.
When a dump in a paging space device is detected at system boot time, only the root volume
group is active and no file systems are mounted. The /sbin/rc.boot script does an explicit
mount of the /var file system before running the copycore command to copy the dump from
the paging space into the copy directory. After the dump is copied, the /var file system is
unmounted before the script continues.
If you change the location of the copy directory, it must be to another location that is contained
within the rootvg volume group. rootvg is the only active volume group at the stage of the boot
sequence when the copy must be done.
• If the copy of the dump image fails and the forced copy
flag value is FALSE:
– The boot sequence continues (In other words, the dump is ignored)
– Dump image will likely be corrupted when paging space is activated
• If the copy of the dump image fails and the forced copy
flag value is TRUE:
– A menu is displayed on the console that allows you to copy the dump
to removable media, or to ignore the dump
– AIX 5.3 and above support tape and DVD-RAM devices
– The boot sequence waits until you interact with the menu
• The output of sysdumpdev -L indicates whether either of
these dump copy failure situations has occurred
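Setting the copy directory together with the forced copy behavior could be sketched as follows. It relies on the sysdumpdev -D flag (menu on copy failure, forced copy flag TRUE) and -d flag (ignore the dump on copy failure, forced copy flag FALSE); the function name set_copy_directory is illustrative, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# set_copy_directory DIR ON_FAILURE
# ON_FAILURE is "menu" (forced copy flag TRUE) or "ignore" (FALSE).
set_copy_directory() {
    dir=$1 on_failure=$2
    case $on_failure in
        menu)   run sysdumpdev -D "$dir" ;;
        ignore) run sysdumpdev -d "$dir" ;;
        *) echo "on_failure must be menu or ignore" >&2; return 2 ;;
    esac
}

set_copy_directory /var/adm/ras menu
run sysdumpdev -L      # review the most recent dump and any copy failure
```

The directory must be in rootvg, because that is the only volume group active when the copy is done.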
Notes:
The system dump is 117215744 bytes and will be copied from /dev/hd6
to media inserted into the device from the list below.
88 Help ?
99 Exit -- Warning, the dump will be lost!
>>> Choice[1]:
Notes:
Notes:
How a dump can be manually initiated with the always allow dump flag
The value of the always allow dump flag affects how a system dump can be manually
initiated.
For systems with AIX 6.1 or newer, the flag controls whether a special key sequence entered on
a native console keyboard initiates a system dump. A native console keyboard is either a USB
keyboard that is used with a graphics adapter in an LFT console configuration, or a keyboard
that is configured as part of a physical terminal device (such as a VT220 or IBM 3153) attached
to a physical serial port. On a system with a Hardware Management Console (HMC), the
integrated serial ports are disabled. The key sequence is not recognized when generated on the
virtual console that is provided by the HMC.
Notes:
The -s flag of sysdumpstart is used to specify a dump to the secondary dump device.
The -t flag of sysdumpstart is used to change the default type from fw_assist to traditional.
The -f flag of sysdumpstart is used to change the scope of the dump (interacts with the
configuration set up with sysdumpdev):
- disallow - Do not allow a full memory dump
- require - Require a full memory dump
S1
login: #dump#>1
Add a TTY
...
REMOTE Reboot ENABLE: dump
REMOTE Reboot STRING: #dump#
...
Notes:
reboot_enable
The value of this attribute (referred to as REMOTE Reboot ENABLE in SMIT) indicates whether
this port is enabled to reboot the machine by the remote reboot_string, and if so, whether to
take a system dump before rebooting:
- no - Indicates that remote reboot is disabled
- reboot - Indicates that remote reboot is enabled
- dump - Indicates that remote reboot is enabled, and, before rebooting, a system dump is
taken on the primary dump device
reboot_string
This attribute (referred to as REMOTE Reboot STRING in SMIT) specifies the remote
reboot_string that the serial port scans for when the remote reboot feature is enabled.
When the remote reboot feature is enabled, and the reboot_string is received on the port, a
'>' character is transmitted, and the system is ready to reboot. If a '1' character is received, the
system is rebooted (and a system dump might be started, depending on the value of the
reboot_enable attribute); any character other than '1' aborts the reboot process. The
reboot_string has a maximum length of 16 characters and must not contain a space, colon,
equal sign, null, new line, or Ctrl-\ character.
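The limits on reboot_string described above can be checked before the attribute is set. The following is a minimal sketch in POSIX shell; the function name valid_reboot_string is hypothetical and is not part of AIX. Rejecting an empty string is an assumption that the notes do not state, and the null character is not tested because a shell variable cannot hold one.

```shell
#!/bin/sh
# Hypothetical helper (not an AIX command): checks a candidate value for the
# tty reboot_string attribute against the limits described in the notes.
valid_reboot_string() {
    s=$1
    # Assumption: an empty string is treated as invalid.
    [ -n "$s" ] || return 1
    # Maximum length is 16 characters.
    [ "${#s}" -le 16 ] || return 1
    # Must not contain a space, colon, equal sign, new line, or Ctrl-\ (octal 034).
    nl=$(printf '\nx'); nl=${nl%x}
    fs=$(printf '\034')
    case $s in
        *" "*|*:*|*=*|*"$nl"*|*"$fs"*) return 1 ;;
    esac
    return 0
}

if valid_reboot_string '#dump#'; then
    echo "#dump# is acceptable"
fi
```

The string #dump# from the visual passes the check; a string such as "a:b" or one longer than 16 characters does not.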
# smit dump
System Dump
Move cursor to desired item and press Enter
Show Current Dump Devices
Show Information About the Previous System Dump
Show Estimated Dump Size
Change the Type of Dump
Change the Full Memory Dump Mode
Change the Primary Dump Device
Change the Secondary Dump Device
Change the Directory to which Dump is Copied on Boot
Start a Dump to the Primary Dump Device
Start a Traditional System Dump to the Secondary Dump Device
Copy a System Dump from a Dump Device to a File
Always ALLOW System Dump
Check Dump Resources Utility
Change/Show Global System Dump Properties
Change/Show Dump Attributes for a Component
Change Dump Attributes for multiple Components
Notes:
The menu items that show or change the dump information use the sysdumpdev command.
Notes:
If you use an HMC to manage the LPAR, you can use the HMC GUI (or the chsysstate
command) to trigger a dump of the operating system.
In the GUI, select the LPAR and then, from the tasks menu, select Operations >
Restart. The resulting window is shown in the visual. Clicking the Dump button selects an
operation that signals a reset to the operating system in order to initiate a dump.
# sysdumpdev -L
0453-039
Dump problems
If no dump image is obtained, you should check the error log for information. There might still be
error log entries that are related to the crash. When the dump fails to complete successfully,
there might be a partial dump created. The partial dump might or might not be useful, since it
depends on what is present in the dump image, and what is missing. A partial dump is indicated
when the Size field is greater than zero in the sysdumpdev -L output, and the dump status
value is not zero.
Notes:
contains the value 2. In this case, the savecore command would copy the dump image and
name it vmcore.2.BZ, and then update the bounds file to contain the value 3.
When the savecore command copies the dump image, it marks the dump on the dump device as
copied. If the savecore command is run on the dump device again, it fails with an error
message that the dump is no longer valid. You can override this check by using the -f flag
of savecore.
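The naming scheme based on the bounds file can be illustrated with a short shell sketch. This is not how savecore is implemented; it only mimics the bookkeeping in a scratch directory. The function name next_dump_name is hypothetical, and starting from 0 when no bounds file exists is an assumption.

```shell
#!/bin/sh
# Illustration only: mimics the bounds-file bookkeeping described in the notes.
# next_dump_name is a hypothetical helper, not part of AIX.
next_dump_name() {
    dir=$1
    n=0                                   # assumption: start at 0 if no bounds file exists
    [ -f "$dir/bounds" ] && n=$(cat "$dir/bounds")
    echo "vmcore.$n.BZ"                   # name the copied dump image would receive
    echo $((n + 1)) > "$dir/bounds"       # record the number for the next dump
}

copydir=$(mktemp -d /tmp/bounds.XXXXXX)   # scratch directory standing in for /var/adm/ras
echo 2 > "$copydir/bounds"
next_dump_name "$copydir"
cat "$copydir/bounds"
```

With the bounds file set to 2, the sketch prints vmcore.2.BZ and leaves 3 in the bounds file, which matches the sequence described in the notes.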
# smit chgsys
...
Maximum number of PROCESSES allowed per user [128] +#
Maximum number of pages in block I/O BUFFER CACHE [20] +#
...
...
Enable full CORE dump false +
OR
/unix (Kernel)
/var/adm/ras/vmcore.x (Dump file)
# uncompress /var/adm/ras/vmcore.x.Z
OR
# dmpuncompress /var/adm/ras/vmcore.x.BZ
# kdb /var/adm/ras/vmcore.x /unix
> status
> stat
(further subcommands for analyzing)
> quit
Notes:
To use kdb, the vmcore file must be uncompressed. After a crash, it is typically named
vmcore.x.Z, which indicates that it is in a compressed format. As illustrated on the visual, use
the uncompress command before using kdb.
To analyze a dump file, you would first uncompress the compressed dump. If the dump file has
a .Z suffix, then you would use the uncompress command. Starting in AIX 6.1, the dump file
ends in a .BZ suffix and you must use the dmpuncompress command to process this file. If
you want to leave the original compressed file intact (rather than replacing it with the
uncompressed file), then use the -p option of the dmpuncompress command.
# uncompress /var/adm/ras/vmcore.x.Z
or
# dmpuncompress /var/adm/ras/vmcore.x.BZ
When the dump is uncompressed, you would analyze it with the kdb command.
# kdb /var/adm/ras/vmcore.x /unix
Useful subcommands
Examining a system dump requires an in-depth knowledge of the AIX kernel. However, two
subcommands that might be useful to you are:
- The subcommand status displays the processes/threads that were active on the CPUs
when the crash occurred
- The subcommand stat shows the machine status when the dump occurred
To exit the kdb debug program, type quit at the > prompt.
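The choice between uncompress and dmpuncompress depends only on the suffix of the dump file. A minimal sketch of that decision follows; dump_prepare_cmd is a hypothetical helper name, and the function only prints the command that would be run, so it can be tried safely on any system.

```shell
#!/bin/sh
# Hypothetical helper: prints the command line that would prepare a dump
# file for kdb, based on the file suffix as described in the notes.
# It only prints the command; it does not run it.
dump_prepare_cmd() {
    case $1 in
        *.BZ) echo "dmpuncompress $1" ;;    # dump format used starting in AIX 6.1
        *.Z)  echo "uncompress $1" ;;       # older compressed dumps
        *)    echo "# $1 is not compressed; no action needed" ;;
    esac
}

dump_prepare_cmd /var/adm/ras/vmcore.0.BZ
dump_prepare_cmd /var/adm/ras/vmcore.1.Z
```

After the printed command has been run on the real system, the uncompressed file is passed to kdb together with /unix.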
Notes:
Overview
The snap command collects system configuration information for problem determination
purposes.
By default, the command creates the /tmp/ibmsupt directory, and other subdirectories under
/tmp/ibmsupt depending on the data that is collected. The -d flag can be used to change the
default directory that is used to collect the data. The snap command has control options that
determine what the command does, and data collection options that determine the type of
system information that is collected.
By default, the command checks the file system that is used for data collection to ensure that
enough free space is available. It gathers the system configuration information based on the
flags that are specified, and then optionally either copies the results to media or creates a
compressed pax archive file.
The intent of the snap command is to serve as a single interface for every step that is required
to package information for transmission to a support group. While some of the options might
seem simple, they do help to prevent simple mistakes from occurring. They also help those
administrators who are not familiar with basic UNIX commands like pax and compress. This
helps maximize the chances that the snap images sent to IBM for analysis are
correct and contain valid data.
Control flags
Flag Description
-c Create a compressed pax archive file
required, and collects the data immediately. Using this flag can cut the time that is taken to run
the snap command in half, but the risk is that you can run out of space on the file system without
warning.
snap examples
• Example 1:
snap -a -c -d /some/directory
• Example 2:
snap -Dkg -o /dev/rmt0
• Recommendation is to use -a for initial data collection
– Minimum of -Dkg when collecting information about a system dump
• Other considerations:
– Most data collection options append information to files in the
/tmp/ibmsupt directory structure
– Depending on the AIX version, some options might not be available
– /tmp/ibmsupt/other and /tmp/ibmsupt/testcase can be
used to supply additional information
Notes:
Examples
In the first example, snap -a gathers all the system information, uses the directory
/some/directory to store the information (-d /some/directory), and creates a
compressed pax archive of the information that is collected (the -c flag).
In the second example, snap captures the dump (-D flag), kernel (-k flag) and general
information (-g flag), and writes the information as a pax archive to the device /dev/rmt0.
The data is collected in /tmp/ibmsupt before being written to the tape.
You should always use the -a data collection option with snap when gathering problem
determination information on a system for the first time. One thing to watch out for with the
snap command is that many of the functions append data to existing files. If the snap
command is run multiple times without cleaning the temporary directory, some of the collected
data files have multiple sets of information, but collected at different times. Always make sure
that you are looking at the most recent set of data in a file. The most recent set of data is at the
end of the file.
The subdirectories other and testcase can be used to supply test case data and programs as
part of the snap package. Run snap -a to collect system data, then place files into the other or
testcase directories, then run snap -c or snap -o device to create the package.
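The packaging sequence described above (collect, add test case files, then package) can be sketched as a dry run. The run helper, the package_with_testcase function, and the example file name are hypothetical; the snap flags -a and -c are the ones named in the notes. By default the script only prints the commands, so it is safe to try on a system that is not running AIX.

```shell
#!/bin/sh
# Dry-run sketch of the packaging sequence described in the notes.
# The run helper and the test case file name are hypothetical; by default the
# script only prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

package_with_testcase() {
    run snap -a                                  # collect all system data
    run cp "$1" /tmp/ibmsupt/testcase/           # add your own test case file
    run snap -c                                  # create the compressed pax archive
}

package_with_testcase /home/user/repro.sh
```

Setting DRY_RUN=0 on a real AIX system would execute the three steps in order; whether that is appropriate depends on the free space in /tmp, as discussed earlier in the notes.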
Checkpoint
Unit summary
Notes:
When a dump occurs, kernel and system data are copied to the primary dump device.
By default, the system has a primary dump device (/dev/hd6) and a secondary device
(/dev/sysdumpnull).
During reboot, the dump is copied to the copy directory (/var/adm/ras).
A system dump should be retrieved from the system by using the snap command.
The Support Center uses the kdb debugger to examine the dump.
Appendix A. Checkpoint solutions
Checkpoint solutions (1 of 2)
1. True or False: The CuAt ODM object class contains an entry for
each attribute for each supported device.
The answer is false. It is the PdAt ODM object class. The CuAt
object class only contains attributes that are different from the
default value.
2. True or False: The DvDr attribute in the PdDv ODM object class
identifies the program that is loaded into the kernel when the device
is made available.
The answer is true.
Checkpoint solutions (1 of 2)
1. True or False: You must have AIX loaded on your system to use the System
Management Services programs.
The answer is false: SMS is part of the built-in firmware.
2. Your AIX system is powered off. AIX is installed on hdisk1 but the bootlist is set to
boot from hdisk0. How can you fix the problem and make the machine boot from
hdisk1?
The answer is you need to boot the SMS programs and set the new boot list to
include hdisk1.
3. Your machine is booted and is at the # prompt. What is the command that
displays the normal bootlist?
The answer is # bootlist -m normal -o.
4. Your machine is booted and is at the # prompt. How might you change the
normal bootlist?
The answer is # bootlist -m normal device1 device2.
Checkpoint solutions (2 of 2)
5. What command is used to build a new boot image and write it to the
boot logical volume?
The answer is bosboot -ad /dev/hdiskx.
7. True or False: During the AIX boot process, the AIX kernel is loaded
from the root file system.
The answer is false: the AIX kernel is loaded from hd5.
Checkpoint solutions (2 of 2)
5. What is the likely cause if your system stops booting with LED 553?
The answer is there is a problem with processing /etc/inittab.
Checkpoint solutions
2. This volume group consists of two disks that are completely mirrored.
Because of the disk failure you are not able to vary on datavg. How do
you recover from this situation?
The answer is forced varyon: varyonvg -f datavg. Use
procedure 1 for mirrored disks.
3. After disk replacement, you find that a disk has been removed from the
system but not from the volume group. How do you fix this problem?
The answer is repair the ODM, for example through exportvg and
importvg. Execute reducevg using the PVID instead of disk name.
Checkpoint solutions (1 of 2)
3. Why should you not use exportvg with an alternate disk volume
group?
The answer is this removes rootvg related entries from
/etc/filesystems.
Checkpoint solutions
2. True or False: The creation of a JFS split copy marks all of the split
mirror copies as stale.
The answer is true.
3. True or False: After the creation of a JFS split mirror copy, the
administrator needs to mount the new file system to be able to access
the split copy.
The answer is false.
Appendix B. Command summary
Directories
mkdir Make directory
cd Change the directory. The default is $HOME directory.
rmdir Remove a directory (beware of files that start with “.”).
rm Remove file; -r option removes directory and all files and
subdirectories recursively.
pwd Print working directory: shows name of current directory
ls List files
-a (all)
-l (long)
-d (directory information)
-r (reverse alphabetic order)
-t (time changed)
-C (multi-column format)
-R (recursively)
-F (places / after each directory name and * after each exec file)
Files: Basic
cat List files contents (concatenate). cat can open a new file with
redirection, for example, cat > newfile. Use <Ctrl>d to end
input.
chmod Change the permission mode for files or directories.
• chmod =+- files or directories
• (r,w,x = permissions and u, g, o, a = who)
• Can use + or - to grant or revoke specific permissions
• Can also use numerics, 4 = read, 2 = write, 1 = execute
• Can combine them, first - user, next - group, last - other
• For example, chmod 746 file1 is user = rwx, group = r, other
= rw
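The numeric form of chmod can be worked through with a small shell sketch that converts a three-digit mode into rwx notation. The function names digit_to_rwx and mode_to_rwx are hypothetical helpers written for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers: translate a three-digit numeric mode into the
# rwx notation, using 4 = read, 2 = write, 1 = execute.
digit_to_rwx() {
    d=$1
    r=-; w=-; x=-
    [ $((d & 4)) -ne 0 ] && r=r
    [ $((d & 2)) -ne 0 ] && w=w
    [ $((d & 1)) -ne 0 ] && x=x
    echo "$r$w$x"
}

mode_to_rwx() {
    m=$1
    u=${m%??}            # first digit: user
    o=${m#??}            # last digit: other
    g=${m#?}; g=${g%?}   # middle digit: group
    echo "$(digit_to_rwx "$u")$(digit_to_rwx "$g")$(digit_to_rwx "$o")"
}

mode_to_rwx 746
```

For mode 746 this prints rwxr--rw-, which is the same permission set that ls -l would show for file1 after chmod 746 file1.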
Files: Advanced
awk Programmable text editor / report writer
banner Display banner (can redirect to another terminal nn with
> /dev/ttynn)
cal Calendar (cal month year)
Editors
ed Line editor
vi Screen editor
INed LPP editor
emacs Screen editor +
Metacharacters
* Any number of characters (0 or more)
? Any single character
[abc] [ ] any character from the list
[a-c] [ ] match any character from the list range
! Not any of the following characters (for example, [!abc])
; Command terminator that is used to string commands on a single line
& Runs the preceding command in background mode
# Comment character
\ Removes special meaning (no interpretation) of the following
character
' Removes special meaning (no interpretation) of characters in
quotation marks
" Interprets only $, backquote, and \ characters between the quotation
marks
` Used to set variable to results of a command.
For example, now=`date` sets the value of now to current results of
the date command
$ Preceding variable name indicates the value of the variable
Variables
= Set a variable (for example, d="day" sets the value of d to "day").
Can also set the variable to the results of a command by the `
character. For example, now=`date` sets the value of now to the
current result of the date command.
HOME Home directory
PATH Path to be checked
SHELL Shell to be used
TERM Terminal being used
PS1 Primary prompt characters, usually $ or #
PS2 Secondary prompt characters, usually >
$? Return code of the last command run
set Displays current local variable settings
export Exports variable so the child process can inherit the variable
env Displays inherited variables
echo Echo a message (for example, echo HI or echo $d).
Can turn off carriage returns with \c at the end of the message.
Can print a blank line with \n at the end of the message.
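The variable handling summarized above can be tried with a few lines of shell. This is a small sketch; the variable names d, now, and greeting are examples chosen for illustration.

```shell
#!/bin/sh
# Setting variables, command substitution, quoting, and export.
d="day"                       # no spaces around =
echo "d is $d"

now=`date`                    # backquotes capture the output of a command
echo "now is set: ${now:+yes}"

greeting='cost is $5'         # single quotes: $ is not interpreted
echo "$greeting"

export d                      # child processes can now inherit d
sh -c 'echo "child sees d=$d"'

# $? holds the return code of the last command that was run.
false || echo "return code of false: $?"
```

Without the export line, the child shell started by sh -c would see an empty value for d.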
Transmitting
mail Send and receive mail. With user ID sends mail to user ID. Without
user ID, displays your mail. When processing your mail, at the ?
prompt for each mail item, you can:
• d - delete
• s - append
• q - quit
• enter - skip
• m - forward
mailx Upgrade of mail
uucp Copy file to other UNIX systems (UNIX to UNIX copy)
uuto/uupick Send and retrieve files to public directory
uux Run on remote system (UNIX to UNIX execute)
System administration
df Display file system usage
installp Install program
kill (pid) Stop batch process with ID or (PID) (find by using ps);
kill -9 PID absolutely kill process
mount Associate logical volume to a directory;
For example, mount device directory
ps -ef Shows process status (ps -ef)
umount Disassociate file system from directory
smit System management interface tool
Miscellaneous
banner Displays banner
date Displays current date and time
newgrp Change active groups
nice Assigns lower priority to following command (for example,
nice ps -f)
passwd Modifies current password
sleep n Sleep for n seconds
System files
/etc/group List of groups
/etc/motd Message of the day, which is displayed at login
/etc/passwd List of users and signon information. Password that is shown as !,
can prevent password checking by editing to remove !
/etc/profile System-wide user profile that is executed at login, can override
variables by resetting in the user's .profile file
/etc/security Directory not accessible to normal users
/etc/security/environ User environment settings
/etc/security/group Group attributes
/etc/security/limits User limits
/etc/security/login.cfg Login settings
/etc/security/passwd User passwords
/etc/security/user User attributes, password restrictions
Variables
var=string Set variable to equal string. (NO SPACES). Spaces must be enclosed
by double quotation marks. Special characters in string must be
enclosed by single quotation marks to prevent substitution. Piping (|),
redirection (<, >, >>), and & symbols are not interpreted.
$var Gives value of var in a compound
echo Displays value of var, for example, echo $var
HOME = Home directory of user
MAIL = Mail file name
PS1 = Primary prompt characters, usually "$" or "#"
PS2 = Secondary prompt characters, usually ">"
Commands
# Comment designator
&& Logical-and. Run command after && only if the command that precedes &&
succeeds (return code = 0)
|| Logical-or. Run command after || only if the command that precedes
|| fails (return code <> 0)
exit n Used to pass return code n from shell script, passed as variable
$? to parent shell
expr Arithmetic expressions
Syntax: "expr expression1 operator expression2"
Operators: + - \* (multiply) / (divide) % (remainder)
for loop for n (or: for variable in $*); for example:
do
command
done
if-then-else if test expression
then command
elif test expression
then command
else command
fi
read Read from standard input
shift Shifts arguments 1-9 one position to the left and decrements number
of arguments
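The shell constructs in this list can be seen together in one short script. This is a sketch for practice; the function name check_sign and the sample values are arbitrary.

```shell
#!/bin/sh
# Short demonstrations of the constructs listed above.

# && runs the second command only if the first succeeds; || only if it fails.
true  && echo "first command succeeded"
false || echo "first command failed"

# expr arithmetic; * must be escaped so the shell does not expand it.
product=`expr 6 \* 7`
remainder=`expr 17 % 5`
echo "6 * 7 = $product, 17 % 5 = $remainder"

# for loop over a list of words.
for n in one two three
do
    echo "item: $n"
done

# if-then-else with test.
check_sign() {
    if test "$1" -gt 0
    then echo positive
    elif test "$1" -lt 0
    then echo negative
    else echo zero
    fi
}
check_sign 5
check_sign 0

# shift moves the positional parameters one place to the left.
set -- a b c
shift
echo "after shift: $1 (count $#)"
```

Running the script with sh -x prints each step as it executes, which is the debugging aid mentioned under Miscellaneous.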
Miscellaneous
sh Run shell script in the sh shell
-x (execute step-by-step, used for debugging shell scripts)
vi Editor
Entering vi
vi file Edits the file named file
vi file file2 Edit files consecutively (through :n)
.exrc File that contains the vi profile
wm=nn Sets wrap margin to nn. Can enter a file other than at first line by
adding + (last line), +n (line n), or +/pattern (first occurrence of
pattern).
vi -r Lists saved files
vi -r file Recover file that is named file from crash
:n Next file in stack
:set all Show all options
:set nu Display line numbers (off when set nonu)
Units of measure
h, l Character left, character right
k or <Ctrl>p Move cursor to character above cursor
j or <Ctrl>n Move cursor to character below cursor
w, b Word right, word left
^, $ Beginning, end of current line
<CR> or + Beginning of next line
- Beginning of previous line
G Last line of buffer
Cursor movements
Can precede cursor movement commands (including cursor arrow) with number of times to repeat,
for example, 9--> moves right 9 characters.
0 Move to first character in line
$ Move to last character in line
^ Move to first nonblank character in line
fx Move right to character x
Fx Move left to character x
Adding text
a Add text after the cursor (end with <esc>)
Deleting text
<Ctrl>w Undo entry of current word
@ Delete the insert on this line
x Delete current character
dw Delete to end of current word (observe punctuation)
dW Delete to end of current word (ignore punctuation)
dd Delete current line
D Erase to end of line (same as d$)
d) Delete current sentence
d} Delete current paragraph
dG Delete current line through end of buffer
d^ Delete to the beginning of line
u Undo last change command
U Restore current line to original state before modification
Replacing text
ra Replace current character with a
R Replace all characters that are written over until <esc> is entered
s Delete current character and append text until <esc>
s/s1/s2 Replace s1 with s2 (in the same line only)
S Delete all characters in the line and append text
cc Replace all characters in the line (same as S)
ncx Delete n text objects of type x (w, b = words, ) = sentences, } =
paragraphs, $ = end-of-line, ^ = beginning of line) and enter append
mode
C Replace all characters from cursor to end of line
Moving text
p Paste last text that is deleted after cursor (xp will transpose 2
characters)
P Paste last text that is deleted before cursor
nYx Yank n text objects of type x (w, b = words, ) = sentences, } =
paragraphs, $ = end of line); no "x" indicates lines. Can then paste
them with p command. Yank does not delete the original.
"ayy Can use named registers for moving, copying, cut/paste with "ayy for
register a (use registers a-z), can then paste them with "ap command.
Miscellaneous
. Repeat last command
J Join current line with next line
Appendix C. AIX dump code and progress codes
This appendix is an extract from the AIX 4.3 Messages Guide and Reference.
0c0 - 0cc
0c0 A user-requested dump completed successfully.
0c1 An I/O error occurred during the dump.
0c2 A user-requested dump is in progress. Wait at least 1 minute for the dump to
complete.
0c4 The dump ran out of space. Partial dump is available.
0c5 The dump failed due to an internal failure. A partial dump might exist.
0c7 Progress indicator. Remote dump is in progress.
0c8 The dump device is disabled. No dump device configured.
0c9 A system-initiated dump started. Wait at least 1 minute for the dump to
complete.
0cc (AIX 4.2.1 and later) An error occurred writing to the primary dump device. It
switched over to the secondary.
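For quick reference at the command line, the dump codes in this range can be put into a small lookup function. This is a sketch only; dump_code_meaning is a hypothetical helper, and its messages are shortened from the entries above.

```shell
#!/bin/sh
# Hypothetical lookup helper: maps a dump LED code to a short description
# taken from the 0c0 - 0cc table.
dump_code_meaning() {
    case $1 in
        0c0) echo "User-requested dump completed successfully" ;;
        0c1) echo "I/O error occurred during the dump" ;;
        0c2) echo "User-requested dump in progress" ;;
        0c4) echo "Dump ran out of space; partial dump available" ;;
        0c5) echo "Dump failed due to an internal failure" ;;
        0c7) echo "Remote dump in progress" ;;
        0c8) echo "Dump device disabled; no dump device configured" ;;
        0c9) echo "System-initiated dump started" ;;
        0cc) echo "Error writing to primary dump device; switched to secondary" ;;
        *)   echo "Not a dump code listed in this table"; return 1 ;;
    esac
}

dump_code_meaning 0c4
```

The function returns a nonzero status for any code outside the 0c0 - 0cc range, so it can be combined with && and || in a script.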
100 - 195
100 Progress indicator. BIST completed successfully.
101 Progress indicator. Initial BIST started following system reset.
102 Progress indicator. BIST started following power-on reset.
103 BIST could not determine the system model number.
104 BIST could not find the common on-chip processor bus address.
105 BIST could not read from the on-chip sequencer EPROM.
106 BIST detected a module failure.
111 On-chip sequencer stopped. BIST detected a module error.
112 Checkstop occurred during BIST and checkstop results could not be logged
out.
113 The BIST checkstop count equals 3 which means three unsuccessful system
restarts. System halts.
120 Progress indicator. BIST started CRC check on the EPROM.
121 BIST detected a bad CRC on the on-chip sequencer EPROM.
122 Progress indicator. BIST started a CRC check on the EPROM.
201 Checkstop occurred during system restart. If a 299 LED was shown before,
re-create the boot logical volume (bosboot).
202 Unexpected machine check interrupt, system halts
203 Unexpected data storage interrupt, system halts
204 Unexpected instruction storage interrupt, system halts
205 Unexpected external interrupt, system halts
206 Unexpected alignment interrupt, system halts
207 Unexpected program interrupt, system halts
208 Machine check due to an L2 uncorrectable ECC, system halts
209 Reserved, system halts
210 Unexpected SVC 1000 interrupt, system halts
211 IPL ROM CRC miscompare occurred during system restart, system halts
212 POST found processor to be bad, system halts
213 POST failed. No good memory could be detected, the system halts.
214 An I/O planar failure was detected. The power status register, the time-of-day
clock, or NVRAM on the I/O planar failed. The system halts
215 Progress indicator. The level of voltage that is supplied to the system is too
low to continue a system restart.
216 Progress indicator. The IPL ROM code is being uncompressed into memory
for execution.
217 Progress indicator. The system encountered the end of the boot devices list.
The system continues to loop through the boot devices list.
218 Progress indicator. POST is testing for 1MB of good memory.
219 Progress indicator. POST bit map is being generated.
21c L2 cache was not detected as part of systems configuration (when LED
persists for 2 seconds).
220 Progress indicator. IPL control block is being initialized.
221 An NVRAM CRC miscompare occurred while loading the operating system
with the key mode switch in Normal position. System halts.
222 Progress indicator. Attempting a Normal-mode system restart from the
standard I/O planar-attached devices. System tries to restart.
223 Progress indicator. Attempting a Normal-mode system restart from the
SCSI-attached devices that are specified in the NVRAM list.
224 Progress indicator. Attempting a Normal-mode system restart from the 9333
High-Performance Disk Drive Subsystem.
225 Progress indicator. Attempting a Normal-mode system restart from the
bus-attached internal disk.
260 Progress indicator. Menus are being displayed on the local display or terminal
that is connected to your system. The system waits for input from the
terminal.
261 No supported local system display adapter was found. The system waits for a
response from an asynchronous terminal on serial port 1.
262 No local system keyboard was found.
263 Progress indicator. Attempting a Normal-mode system restart from the Family
2 Feature ROM specified in the NVRAM boot devices list.
269 Progress indicator. Cannot boot system, end of bootlist reached.
270 Progress indicator. Ethernet/FDX 10 Mbps MC adapter POST is running.
271 Progress indicator. Mouse and mouse port POST are running.
272 Progress indicator. Tablet port POST is running.
276 Progress indicator. A 10/100 Mbps Ethernet MC adapter POST is running.
277 Progress indicator. Auto Token Ring LAN streamer MC 32 adapter POST is
running.
278 Progress indicator. Video ROM scan POST is running.
279 Progress indicator. FDDI POST is running
280 Progress indicator. 3Com Ethernet POST is running.
281 Progress indicator. Keyboard POST is running.
282 Progress indicator. Parallel port POST is running.
283 Progress indicator. Serial port POST is running.
284 Progress indicator. POWER Gt1 graphics adapter POST is running.
285 Progress indicator. POWER Gt3 graphics adapter POST is running.
286 Progress indicator. Token Ring adapter POST is running.
287 Progress indicator. Ethernet adapter POST is running.
288 Progress indicator. Adapter slot cards are being queried.
289 Progress indicator. POWER Gt0 graphics adapter POST is running.
290 Progress indicator. I/O planar test started.
291 Progress indicator. Standard I/O planar POST is running.
292 Progress indicator. SCSI POST is running.
293 Progress indicator. Bus-attached internal disk POST is running.
294 Progress indicator. TCW SIMM in slot J is bad.
295 Progress indicator. Color Graphics Display POST is running.
296 Progress indicator. Family 2 Feature ROM POST is running.
297 Progress indicator. System model number could not be determined. System
halts.
298 Progress indicator. Attempting a warm system restart.
299 Progress indicator. IPL ROM passed control to loaded code.
2e6 Progress indicator. A PCI Ultra/Wide differential SCSI adapter is being
configured.
2e7 An undetermined PCI SCSI adapter is being configured.
553 The /etc/inittab file was incorrectly modified or is damaged. Phase 1 boot is
completed and the init command started.
554 The IPL device could not be opened or a read failed (hardware not configured
or missing).
555 The fsck -fp /dev/hd4 command on the root file system failed with a nonzero
return code.
556 LVM subroutine error from ipl_varyon.
557 The root file system could not be mounted. The problem is usually due to bad
information on the log logical volume (/dev/hd8) or to a damaged boot logical
volume (hd5).
558 Not enough memory is available to continue system restart.
559 Less than 2 MB of good memory is left for loading the AIX kernel. System
halts.
560 Unsupported monitor is attached to the display adapter.
561 Progress indicator. The TMSSA device is being identified or configured.
565 Configuring the MWAVE subsystem.
566 Progress indicator. Configuring Namkan twinaxx common card.
567 Progress indicator. Configuring High-Performance Parallel Interface (HIPPI)
device driver (fpdev).
568 Progress indicator. Configuring High-Performance Parallel Interface (HIPPI)
device driver (fphip).
569 Progress indicator. FCS SCSI protocol device is being configured.
570 Progress indicator. A SCSI protocol device is being configured.
571 HIPPI common functions driver is being configured.
572 HIPPI IPI-3 master mode driver is being configured.
573 HIPPI IPI-3 slave mode driver is being configured.
574 HIPPI IPI-3 user-level interface is being configured.
575 A 9570 disk-array driver is being configured.
576 Generic async device driver is being configured.
577 Generic SCSI device driver is being configured.
578 Generic common device driver is being configured.
579 Device driver is being configured for a generic device.
580 Progress indicator. A HIPPI-LE interface (IP) layer is being configured.
581 Progress indicator. TCP/IP is being configured. The configuration method for
TCP/IP is being run.
582 Progress indicator. Token ring data link control (DLC) is being configured.
583 Progress indicator. Ethernet data link control (DLC) is being configured.
584 Progress indicator. IEEE Ethernet (802.3) data link control (DLC) is being
configured.
585 Progress indicator. SDLC data link control (DLC) is being configured.
586 Progress indicator. X.25 data link control (DLC) is being configured.
587 Progress indicator. Netbios is being configured.
588 Progress indicator. Bisync read-write (BSCRW) is being configured.
589 Progress indicator. SCSI target mode device is being configured.
590 Progress indicator. Diskless remote paging device is being configured.
591 Progress indicator. Logical Volume Manager device driver is being
configured.
592 Progress indicator. An HFT device is being configured.
593 Progress indicator. SNA device driver is being configured.
594 Progress indicator. Asynchronous I/O is being defined or configured.
595 Progress indicator. X.31 pseudo device is being configured.
596 Progress indicator. SNA DLC/LAPE pseudo device is being configured.
597 Progress indicator. Outboard communication server (OCS) is being
configured.
598 Progress indicator. OCS hosts are being configured during system reboot.
599 Progress indicator. FDDI data link control (DLC) is being configured.
5c0 Progress indicator. Streams-based hardware driver is being configured.
5c1 Progress indicator. Streams-based X.25 protocol stack is being configured.
5c2 Progress indicator. Streams-based X.25 COMIO emulator driver is being
configured.
5c3 Progress indicator. Streams-based X.25 TCP/IP interface driver is being
configured.
5c4 Progress indicator. FCS adapter device driver is being configured.
5c5 Progress indicator. SCB network device driver for FCS is being configured.
5c6 Progress indicator. AIX SNA channel is being configured.
c00 - c99
c00 AIX Install/Maintenance loaded successfully.
c01 Insert the AIX Install/Maintenance diskette.
c02 Diskettes inserted out of sequence.
c03 Wrong diskette inserted.
c55 Could not remove the specified logical volume in a preservation installation.
c56 Running user-defined customization.
c57 Failure to restore BOS.
c58 Displaying message to turn the key.
c59 Could not copy either device special files, device ODM, or volume group
information from RAM to disk.
c61 Failed to create the boot image.
c70 Problem mounting the diagnostic CD in stand-alone mode.
c99 Progress indicator. The diagnostic programs completed.
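When working through console notes or log excerpts, the codes in this appendix can be treated as a simple lookup table. The following is a minimal sketch in Python, not an IBM-supplied tool: the names PROGRESS_CODES and describe_code are hypothetical, the dictionary holds only a few entries copied from the tables above, and the handling of a leading zero (for displays that show four characters, such as 0557) is an assumption.

```python
# Illustrative excerpt only: a few descriptions copied from this appendix.
# PROGRESS_CODES and describe_code are hypothetical names, not part of AIX.
PROGRESS_CODES = {
    "553": "The /etc/inittab file was incorrectly modified or is damaged.",
    "554": "The IPL device could not be opened or a read failed.",
    "555": "The fsck -fp /dev/hd4 command on the root file system failed.",
    "556": "LVM subroutine error from ipl_varyon.",
    "557": "The root file system could not be mounted.",
    "581": "Progress indicator. TCP/IP is being configured.",
    "c00": "AIX Install/Maintenance loaded successfully.",
    "c99": "Progress indicator. The diagnostic programs completed.",
}


def describe_code(raw):
    """Return the description for a display code, or None if it is not listed."""
    code = raw.strip().lower()
    # Assumption: some displays show four characters with a leading zero.
    if len(code) == 4 and code.startswith("0"):
        code = code[1:]
    return PROGRESS_CODES.get(code)


if __name__ == "__main__":
    for value in ("557", "0C99", "123"):
        print(value, "->", describe_code(value))
```

A code that is not in the dictionary returns None rather than raising an error, so the same helper can be run over a list of observed codes and the unknown ones looked up by hand in the full tables.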