SAN boot

KiWaon Kim @ IBMKR

Thursday, October 22, 2009 1


Why is SAN boot needed?

• Storage devices are now very reliable. They provide higher availability through built-in redundancy and hot-sparing features.

• In today's virtualized environments, SAN boot is required for key features such as “Partition Mobility”. The rootvg can easily be moved to other LPARs on a different CEC.

• Fast recovery time during a system-wide outage

• Fast deployment of new systems

• Various storage-based backup solutions can be used (FlashCopy, etc.)


AIX boot environments

• Boot from internal disk (common way of booting)

• Boot from a directly attached disk drive, CD/DVD-ROM, tape, ...

• Boot from a SAN disk through an HBA (Host Bus Adapter). Using multiple paths to the boot disk is also supported.

• Boot from a disk provided by the Virtual I/O Server through a virtual SCSI adapter (dual VIOSes can provide the same boot disk).

• Boot from a SAN disk through a virtual Fibre Channel adapter provided by the Virtual I/O Server using the N-Port ID Virtualization (NPIV) feature.


Normal AIX boot procedure vs. SAN boot

The SAN boot procedure differs from the normal boot procedure only in the phase where the boot device is accessed. Once the boot device has been accessed successfully, the following steps are exactly the same.


bootlist
• The bootlist is the sequence of boot devices used by Open Firmware. It is stored in the device tree residing in NVRAM. The bootlist can be set from AIX or from the SMS menu.

• The bootlist can be examined with the bootlist command. The device tree can also be examined in snap data (the devtree.out file in the general directory).

# bootlist -m normal -ov


'ibm,max-boot-devices' = 0x5
NVRAM variable: (boot-device=/
pci@800000020000015/pci@2,2/fibre-channel@1/disk@5005076300c1a096,5708000000000000:2 /
pci@800000020000015/pci@2,2/fibre-channel@1/disk@5005076300cda096,5708000000000000:2 /
pci@800000020000015/pci@2,2/fibre-channel@1/disk@5005076300c9a096,5708000000000000:2 /
pci@800000020000017/pci@2,2/fibre-channel@1/disk@5005076300c1a096,5708000000000000:2 /
pci@800000020000017/pci@2,2/fibre-channel@1/disk@5005076300cda096,5708000000000000:2)
Path name:...

in devtree.out,
boot-device
2f706369 40383030 30303030 32303030 [/pci@80000002000]
30303135 2f706369 40322c32 2f666962 [0015/pci@2,2/fib]
72652d63 68616e6e 656c4031 2f646973 [re-channel@1/dis]
6b403530 30353037 36333030 63316130 [k@5005076300c1a0]
39362c35 37303830 30303030 30303030 [96,5708000000000]
3030303a 32202f70 63694038 30303030 [000:2 /pci@80000]
30303230 30303030 31352f70 63694032 [0020000015/pci@2]
2c322f66 69627265 2d636861 6e6e656c [,2/fibre-channel]
40312f64 69736b40 35303035 30373633 [@1/disk@50050763]
30306364 61303936 2c353730 38303030 [00cda096,5708000]
30303030 30303030 303a3220 2f706369 [000000000:2 /pci]
40383030 30303030 32303030 30303135 [@800000020000015]
2f706369 40322c32 2f666962 72652d63 [/pci@2,2/fibre-c]
68616e6e 656c4031 2f646973 6b403530 [hannel@1/disk@50]
30353037 36333030 63396130 39362c35 [05076300c9a096,5]
37303830 30303030 30303030 3030303a [708000000000000:]
32202f70 63694038 30303030 30303230 [2 /pci@800000020]
30303030 31372f70 63694032 2c322f66 [000017/pci@2,2/f]
69627265 2d636861 6e6e656c 40312f64 [ibre-channel@1/d]
....
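
The hex words in devtree.out are simply the ASCII bytes of the boot-device string, so the bootlist entries can be recovered from the dump itself. The following is a minimal Python sketch, assuming each dump line consists of hex words followed by a bracketed ASCII column as shown above; the function name decode_devtree_property is hypothetical and not part of any AIX tool.

```python
def decode_devtree_property(dump_text):
    """Rebuild a string property from devtree.out-style hex dump lines.

    Each dump line is assumed to hold hex words followed by an ASCII
    rendering in square brackets; only the hex words are decoded.
    """
    raw = bytearray()
    for line in dump_text.splitlines():
        if "[" not in line:
            continue                      # skip the property name and "...." lines
        hex_part = line.split("[", 1)[0]  # drop the bracketed ASCII column
        for word in hex_part.split():
            raw.extend(bytes.fromhex(word))
    # The property is NUL-terminated; bootlist entries are separated by spaces.
    return raw.rstrip(b"\x00").decode("ascii").split()


if __name__ == "__main__":
    sample = """boot-device
2f706369 40383030 30303030 32303030 [/pci@80000002000]
30303135 2f706369 40322c32 2f666962 [0015/pci@2,2/fib]
72652d63 68616e6e 656c4031 2f646973 [re-channel@1/dis]
"""
    for entry in decode_devtree_property(sample):
        print(entry)
```

Decoding the hex words rather than copying the bracketed column avoids problems when the ASCII column has been truncated or reflowed.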


ioinfo utility (in open firmware)
• Starting with POWER6, a new utility, ioinfo, can be used for debugging at the very first stage of system boot.

• With ioinfo, we can check the boot devices and run several I/O tests on a specific device.

0 > ioinfo

!!! IOINFO: FOR IBM INTERNAL USE ONLY !!!


This tool gives you information about SCSI,IDE,SATA,SAS,and USB devices attached to the
system

Select a tool from the following

1. SCSIINFO
2. IDEINFO
3. SATAINFO
4. SASINFO
5. USBINFO
6. FCINFO <====
7. VSCSIINFO

q - quit/exit
==> 6

FCINFO Main Menu


Select a FC Node from the following list:
# Location Code Pathname
---------------------------------------------------------------
1. U9117.MMA.65EBF8C-V32-C5-T1 /vdevice/vfc-client@30000005
2. U9117.MMA.65EBF8C-V32-C6-T1 /vdevice/vfc-client@30000006

q - Quit/Exit

==> 2


ioinfo utility (in open firmware)
FC Node Menu
FC Node String: /vdevice/vfc-client@30000006
FC Node WorldWidePortName: c05076001ab6003a
-----------------------------------------------------------------
1. List Attached FC Devices
2. Select a FC Device
3. Enable/Disable FC Adapter Debug flags

q - Quit/Exit

==> 2

1. 50060e801530f310,0 - 10240 MB Disk drive (bootable)


2. 50060e801530f310,1000000000000 - 35840 MB Disk drive

Select a FC Device : 1

FC Device Menu
FC Target Address ==> 50060e801530f310 FC Lun Address ==> 0
FC Device String: /vdevice/vfc-client@30000006/disk@50060e801530f310,0:0
FC Device: 10240 MB Disk drive (bootable)
----------------------------------------------------------------------

1. Display Inquiry Data


2. Spin up Drive
3. Spin down Drive
4. Continuous random Reads ( hit any key to stop )
5. Enable/Disable FC Device Debug flags
98. Boot from this Device

q - Quit/Exit

==> 1


ioinfo utility (in open firmware)
INQUIRY DATA FOR : TARGET ==> 50060e801530f310 LUN ==> 0 - 10240 MB Disk drive (bootable)

000002f4cd00: 00 00 03 32 cf 00 00 02 48 49 54 41 43 48 49 20 :...2....HITACHI :
000002f4cd10: 4f 50 45 4e 2d 56 20 20 20 20 20 20 20 20 20 20 :OPEN-V :
000002f4cd20: 36 30 30 34 35 30 20 31 33 30 46 33 33 30 33 33 :600450 130F33033:
000002f4cd30: 20 32 41 20 01 01 01 01 00 00 00 00 00 00 00 00 : 2A ............:
000002f4cd40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :................:
000002f4cd50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :................:
000002f4cd60: 05 01 05 70 30 30 ff 00 c0 50 76 00 1a b6 00 3a :...p00...Pv....::
000002f4cd70: c0 50 76 00 1a b6 00 3a 00 00 00 0f 00 00 00 00 :.Pv....:........:
000002f4cd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 :................:
000002f4cd90: 01 01 01 01 00 00 00 00 01 01 01 01 01 01 01 01 :................:
000002f4cda0: 01 01 01 01 01 01 01 01 55 55 55 55 55 55 55 55 :........UUUUUUUU:
000002f4cdb0: 55 55 55 55 00 00 00 00 ff ff ff ff 00 00 00 00 :UUUU............:
000002f4cdc0: 00 00 00 03 00 00 00 01 00 00 00 01 00 01 99 40 :...............@:
000002f4cdd0: 00 00 71 a3 00 00 00 00 00 00 00 00 00 00 00 00 :..q.............:
000002f4cde0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :................:
000002f4cdf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :...............:
Hit a key to continue...

FC Device Menu
FC Target Address ==> 50060e801530f310 FC Lun Address ==> 0
FC Device String: /vdevice/vfc-client@30000006/disk@50060e801530f310,0:0
FC Device: 10240 MB Disk drive (bootable)
----------------------------------------------------------------------

1. Display Inquiry Data


2. Spin up Drive
3. Spin down Drive
4. Continuous random Reads ( hit any key to stop )
5. Enable/Disable FC Device Debug flags
98. Boot from this Device

q - Quit/Exit

==> 98
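
The inquiry data shown by option 1 follows the standard SCSI INQUIRY layout, so the vendor, product and revision strings can be read from fixed offsets (bytes 8-15, 16-31 and 32-35). The sketch below is a minimal Python illustration that assumes dump lines of the form "address: hex bytes :ascii:" as printed by ioinfo above; the helper names parse_hex_dump and decode_inquiry are hypothetical.

```python
import re

# One dump line looks like: "<address>: hh hh ... hh :<ascii>:"
DUMP_LINE = re.compile(r"^\s*[0-9a-fA-F]+:((?:\s[0-9a-fA-F]{2})+)\s")


def parse_hex_dump(dump_text):
    """Collect the data bytes from the hex column of each dump line."""
    data = bytearray()
    for line in dump_text.splitlines():
        match = DUMP_LINE.match(line)
        if match:
            data.extend(int(tok, 16) for tok in match.group(1).split())
    return bytes(data)


def decode_inquiry(data):
    """Decode the fixed fields of standard SCSI INQUIRY data."""
    if len(data) < 36:
        raise ValueError("standard INQUIRY data is at least 36 bytes long")
    return {
        "device_type": data[0] & 0x1F,       # 0 = direct-access block device
        "version": data[2],
        "additional_length": data[4],
        "vendor": data[8:16].decode("ascii").strip(),
        "product": data[16:32].decode("ascii").strip(),
        "revision": data[32:36].decode("ascii").strip(),
    }


if __name__ == "__main__":
    dump = """
000002f4cd00: 00 00 03 32 cf 00 00 02 48 49 54 41 43 48 49 20 :...2....HITACHI :
000002f4cd10: 4f 50 45 4e 2d 56 20 20 20 20 20 20 20 20 20 20 :OPEN-V :
000002f4cd20: 36 30 30 34 35 30 20 31 33 30 46 33 33 30 33 33 :600450 130F33033:
"""
    print(decode_inquiry(parse_hex_dump(dump)))
```

Being able to read sensible vendor and product strings confirms that the firmware can send SCSI commands to the LUN over this path, which is useful before choosing "Boot from this Device".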


ioinfo utility (in open firmware)
-------------------------------------------------------------------------------
Welcome to AIX.
boot image timestamp: 06:26 10/01
The current time and date: 09:46:20 10/01/2009
processor count: 1; memory size: 8192MB; kernel size: 23463042
boot device: /vdevice/vfc-client@30000006/disk@50060e801530f310,0
-------------------------------------------------------------------------------


Boot from SAN without Virtual I/O Server
[Diagram: AIX LPAR (hdisk0, MPIO boot disk) -> fscsi0 / fscsi1 -> SAN Switch #1 / SAN Switch #2 -> hdiskx (Disk)]

hdisk attribute : reserve_policy=no_reserve
fscsi attribute : dyntrk=yes, fc_err_recov=fast_fail

• In this configuration, AIX has two paths to the boot disk. (Depending on the zoning configuration, AIX can have four paths to the boot disk, but defining more than 2 paths brings little benefit.)

• Depending on the disk vendor, various kinds of multipath S/W are used (SDD, SDDPCM, PowerPath, HDLM, DMP, ...). Each multipath S/W has its own steps for configuring a boot disk in a SAN environment.

• The SCSI-2 reservation issue arises when the AIX default PCM is used. The default value of reserve_policy is “single_path”, so the first path that opens the boot disk reserves it. Because this SCSI-2 reservation on the boot disk is not cleared by a reboot, the boot procedure may stop with LED 554 (not always). To get around this, you have to clear the SCSI-2 reservation, but AIX doesn't provide a utility for clearing SCSI-2 reservations.

• Generally speaking, even if you have multiple paths to the boot disk, use only one path during AIX installation. After breaking the SCSI-2 reservation using relbootrsv or an equivalent, connect the other cables used by the other paths.

Usually the disk subsystem itself has a method to clear the SCSI-2 reservation. Please contact the appropriate person for the storage device if the SCSI-2 reservation can't be cleared. Sometimes unassigning and reassigning the LUN will break the SCSI-2 reservation.

• It is recommended to change the fc_err_recov attribute of the fscsi device from delayed_fail to fast_fail; this is the general guideline for multipath environments. If you have only a single path to the boot disk, delayed_fail is the recommended value. With fast_fail, path failover is done faster (in about 15 seconds).

• It is recommended to change the dyntrk attribute of the fscsi device from no to yes. With the dynamic tracking feature, you can move the target device to another FC switch port without reconfiguration.
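
The attribute recommendations above can be turned into chdev commands. The sketch below is a minimal Python helper that only builds the command strings and does not run them. It assumes the attribute values from this slide and device names such as hdisk0 and fscsi0. The -P flag (record the change in the ODM and apply it at the next restart) is included on the assumption that the boot disk and its adapters are in use and cannot be changed online. The helper name chdev_command is hypothetical.

```python
# Recommended values taken from the slide, keyed by device name prefix.
RECOMMENDED = {
    "hdisk": {"reserve_policy": "no_reserve"},
    "fscsi": {"dyntrk": "yes", "fc_err_recov": "fast_fail"},
}


def chdev_command(device, deferred=True):
    """Build (not run) a chdev command applying the recommended attributes."""
    prefix = device.rstrip("0123456789")
    attrs = RECOMMENDED.get(prefix)
    if attrs is None:
        raise ValueError("no recommendation for device %r" % device)
    parts = ["chdev", "-l", device]
    for name, value in sorted(attrs.items()):
        parts += ["-a", "%s=%s" % (name, value)]
    if deferred:
        parts.append("-P")  # assumption: device is busy, apply at next restart
    return " ".join(parts)


if __name__ == "__main__":
    for dev in ("hdisk0", "fscsi0", "fscsi1"):
        print(chdev_command(dev))
```

After a restart, the resulting values can be checked with lsattr -El against the same device names.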


Boot from SAN with the Virtual I/O Servers
[Diagram: AIX LPAR (hdisk0, MPIO boot disk) -> two vSCSI client adapters (path_priority = 1 and path_priority = 2) -> vSCSI server adapters on VIOS1 and VIOS2 (hdiskx, fcs0 and fcs1 on each) -> SAN Switch #1 / SAN Switch #2 -> hdiskx (Disk)]

hdisk devices in client :
algorithm = fail_over
reserve_policy = no_reserve
hcheck_mode = nonactive
hcheck_interval = 60+

vscsi devices in client :
vscsi_path_to = 30

fscsi devices in VIOS :
dyntrk = yes
fc_err_recov = fast_fail

hdisk devices in VIOS :
algorithm = round_robin
reserve_policy = no_reserve
hcheck_mode = nonactive
hcheck_interval = 60+


Boot from SAN with the Virtual I/O Servers
- In Virtual I/O Server
• Each VIOS must be able to access the boot device, so reserve_policy should be changed to no_reserve prior to the client AIX installation. Depending on the disk vendor, various kinds of multipath S/W or PCM can be used; in that case, configure the disk and paths according to the vendor's guide. If the AIX default PCM and MPIO are used, the default value is single_path, which uses a SCSI-2 reservation, so it should be changed to “no_reserve” before the client AIX installation.

• If AIX is installed while the hdisk on the VIOS has the single_path attribute, a SCSI-2 reservation is set. In this case, you have to break it with relbootrsv or an equivalent, so always change the reserve policy before installing AIX. Also keep in mind that relbootrsv can only be run against the rootvg name. If only a specific disk needs to be cleared, you can do a forced open against that disk with a small application that uses openx().

• Dynamic tracking is beneficial when the Fibre Channel adapter of a host is connected to a SAN switch. Without this feature, reconfiguration of each LUN is required once the scsi_id (N_Port ID) of the disk HBA changes.

• Use fast_fail for the fc_err_recov attribute; this minimizes the time needed to detect a state change of the target device.

• Changing the algorithm attribute of the disk to “round_robin” on each VIOS is recommended, to spread out the I/O traffic. (There is a defect in the round_robin feature. Please apply IZ47220 (IZ52365) or the appropriate fix for your AIX level before using this attribute.)

• If hcheck_mode is nonactive (the default value), the health check command is sent down the paths that are not handling I/O at that time. By default, the health check feature is disabled, but once hcheck_interval is changed to a non-zero value, it is enabled. This value should


Boot from SAN with the Virtual I/O Servers
- In Client

• The default value of reserve_policy for the client hdisk is “no_reserve”. If reserve_policy was successfully changed to “no_reserve” on the VIOSes, there will be no SCSI-2 reservation issue.

• For the VIO client, fail_over is the default algorithm and is recommended. To distribute I/O requests across the 2 Virtual I/O Servers, path priority should be managed appropriately. With several VIO clients, you can divide the I/O requests between the 2 VIO servers by adjusting the path priority of each client. You can change the path priority like this:

chpath -l hdisk0 -p vscsi1 -a priority=2

• If the health check is enabled by changing hcheck_interval from 0 to another value (20 seconds is a good start), a health check command is sent to the paths that are not handling I/O at that time. If the health check is not turned on, a failed path will not become available until it is manually enabled. With the health check feature, a failed path can be re-enabled dynamically once it has recovered. Inactive paths (those with a lower priority) are also checked, so an unexpected failed takeover (when all active paths go down and the inactive one turns out to be unusable) can be avoided.
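
One simple way to divide the clients between the two VIO servers is to alternate the preferred path from client to client. The sketch below is a minimal Python illustration that only builds chpath command strings in the form shown above. It assumes that each client sees its boot disk as hdisk0 through vscsi0 and vscsi1, and that priority 1 is the preferred path under the fail_over algorithm. The function name path_priority_commands and the even/odd scheme are hypothetical choices, not something prescribed by the slides.

```python
def path_priority_commands(client_index, disk="hdisk0",
                           adapters=("vscsi0", "vscsi1")):
    """Return chpath commands that make one adapter the preferred path.

    Even-numbered clients prefer the first adapter and odd-numbered
    clients the second, so the clients are split across both VIOSes.
    """
    preferred = client_index % len(adapters)
    commands = []
    for position, adapter in enumerate(adapters):
        priority = 1 if position == preferred else 2
        commands.append(
            "chpath -l %s -p %s -a priority=%d" % (disk, adapter, priority)
        )
    return commands


if __name__ == "__main__":
    for index in range(4):
        print("client", index)
        for command in path_priority_commands(index):
            print("   ", command)
```

In practice the split would follow the actual I/O load of each client rather than a simple even/odd rule.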


Boot from SAN with N-Port ID Virtualization
[Diagram: AIX LPAR (hdisk0, MPIO boot disk; fcs0, fcs1) -> vfchost1 on VIOS1 and vfchost1 on VIOS2 (fcs0 on each) -> SAN Switch #1 / SAN Switch #2 -> hdiskx (Disk)]

hdisk devices in client :
algorithm = fail_over
reserve_policy = no_reserve
hcheck_mode = nonactive
hcheck_interval = 60+

fscsi devices in client :
dyntrk = yes (default)
fc_err_recov = fast_fail (default)

fscsi devices in VIOS :
not very important, because the client AIX will use its own fscsi driver.


Boot from SAN with N-Port ID Virtualization
- In Virtual I/O Server

• With NPIV, the client has its own fscsi layer; as a result, we don't have to consider the attributes of the fscsi device driver on the VIOS side.

• For more information regarding NPIV itself, please refer to the following document:

http://ausgsa.ibm.com/projects/o/oneteam/public/Itrans/ItransProjectsCompleted.html (“NPIV_Introduction and problem determination hints.ppt”, written by Bertram Begau from IBM Germany)

• With NPIV, the boot process itself is very similar to that of an AIX system with 2 physical Fibre Channel adapters. SCSI-2 reservation must be considered during AIX installation.

• However, because client LPARs still use the physical Fibre Channel adapters residing on the VIOSes, error logs / traces / dumps from the VIOSes will be needed to debug problems.


Boot from SAN with N-Port ID Virtualization
- In Client

• With NPIV, the client AIX has its own scsi_id through the physical Fibre Channel adapters residing on the VIOSes. From the switch's perspective, each virtual Fibre Channel adapter is regarded as a separate Fibre Channel port.

• The considerations for SAN boot using NPIV are almost the same as when using two or more physical Fibre Channel adapters. During AIX installation, only one path to the boot disk should be used, to avoid the SCSI-2 reservation issue.

• NPIV is quite a new technology, so care should be taken regarding the S/W and H/W prerequisites:

✓ POWER6-based H/W
✓ 8Gb PCI Express Dual Port FC Adapter
✓ VIOS version 2.1.0.10-FP-20.1 or later
✓ HMC release 7.3.4.0 with MH01152 or later
✓ Minimum client level : AIX 6.1 TL02 SP02, AIX 5.3 TL09 SP02
✓ Firmware level : EM340_039
✓ SAN Switch support is required
✓ SDD 1.7.2.0 + PTF 1.7.2.2
✓ SDDPCM 2.2.0.0 + PTF 2.2.0.6, 2.4.0.0 + PTF 2.4.0.1


Questions when SAN boot fails...

• Can the firmware detect the boot device?

• Is the boot device accessible from another OS?

• Whatʼs the LED code?

• How many paths are defined for the boot device?

• Is this a fresh install of AIX or migration from other systems using mksysb or alt_disk_install?

• Is VIOS involved?

• Are there any error log entries in the VIOS when the VIO client fails to boot? (including NPIV)

• Does the bootlist contain all paths appropriately? (check with the bootlist -m normal -ov command)


Other issues regarding SAN boot

• If, for any reason, the rootvg of AIX can't be accessed during normal operation, the system may hang for a very long time (over 10 minutes) or forever.

• Unfortunately, the dump procedure fails in many cases. (However, a dump should always be initiated after a significant amount of time.)

• If VIOS is used, kernel traces from the VIOS will be needed. In many cases, a system dump of the VIOS is also required to verify the problem.

• If the AIX image is moved to another LPAR, the Open Firmware of the new H/W doesn't know the bootlist of that AIX image yet. In this case, an SMS-mode boot is required to put valid bootlist information into NVRAM. (When LPM (Live Partition Mobility) is used, you don't have to do this.)


Case #1 : Boot failed with only one VIOS
- Environments

• 2 VIOSes are used. 20+ client partitions use NPIV for their storage access, and the rootvgs of the clients are served through these NPIV virtual Fibre Channel adapters. Hitachi disk is used for the rootvgs.

• Each client has 2 virtual Fibre Channel adapters, and 2 paths are configured to the rootvg.

- Symptoms

• If the first VIOS goes down, the client partitions still work properly using the second VIOS. Access to the rootvg has no problem at this point.

• A client partition can boot successfully with only the first VIOS.

• However, a client partition can't boot with only the second VIOS. The customer reported that they could see the virtual Fibre Channel adapter provided by the second VIOS in the SMS menu, but they couldn't see any disks behind that adapter.


Case #1 : Boot failed with only one VIOS
- Analysis
* bootlist

boot-device
2f766465 76696365 2f766663 2d636c69 [/vdevice/vfc-cli]
656e7440 33303030 30303035 2f646973 [ent@30000005/dis]
6b403530 30363065 38303135 33306633 [k@50060e801530f3]
30343a32 202f7664 65766963 652f7666 [04:2 /vdevice/vf]
632d636c 69656e74 40333030 30303030 [c-client@3000000]
362f6469 736b4035 30303630 65383031 [6/disk@50060e801]
35333066 3331303a 3200 [530f310:2.......]

* The bootlist has 2 paths:

/vdevice/vfc-client@30000005/disk@50060e801530f304:2
/vdevice/vfc-client@30000006/disk@50060e801530f310:2

* When this LUN is assigned to another AIX partition, we can get the output of lquerypv -h /dev/hdiskx, so this is not a SCSI-2 reservation issue.

* In terms of configuration, we can't find any problems.
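
To check quickly which adapter and which target port each bootlist entry uses, the entries can be split into their parts. The sketch below is a minimal Python illustration, assuming entries of the form "adapter path/disk@WWPN[,LUN][:partition]" as seen in the bootlist outputs in this deck; the names parse_boot_device and summarize_bootlist are hypothetical.

```python
import re

# <adapter path>/disk@<target WWPN>[,<LUN>][:<partition>]
BOOT_ENTRY = re.compile(
    r"^(?P<adapter>/.+)/disk@(?P<wwpn>[0-9a-f]{16})"
    r"(?:,(?P<lun>[0-9a-f]+))?(?::(?P<partition>\d+))?$"
)


def parse_boot_device(entry):
    """Split one boot-device entry into adapter path, WWPN, LUN, partition."""
    match = BOOT_ENTRY.match(entry.strip())
    if match is None:
        raise ValueError("unrecognized boot-device entry: %r" % entry)
    return match.groupdict()


def summarize_bootlist(boot_device_value):
    """Map each adapter path to the target WWPNs it is expected to reach."""
    by_adapter = {}
    for entry in boot_device_value.split():
        fields = parse_boot_device(entry)
        by_adapter.setdefault(fields["adapter"], []).append(fields["wwpn"])
    return by_adapter


if __name__ == "__main__":
    value = (
        "/vdevice/vfc-client@30000005/disk@50060e801530f304:2 "
        "/vdevice/vfc-client@30000006/disk@50060e801530f310:2"
    )
    for adapter, wwpns in summarize_bootlist(value).items():
        print(adapter, "->", ", ".join(wwpns))
```

For this case the summary shows one target port per virtual adapter, which matches the statement that the bootlist itself is configured correctly.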


Case #1 : Boot failed with only one VIOS
- Analysis
* ioinfo was used for debugging.

1. 50060e801530f310,0 - 10240 MB Disk drive (bootable)


2. 50060e801530f310,1000000000000 - 35840 MB Disk drive

* Interestingly, we can see the boot device with the ioinfo utility!

INQUIRY DATA FOR : TARGET ==> 50060e801530f310 LUN ==> 0 - 10240 MB Disk drive (bootable)
000002f4cd00: 00 00 03 32 cf 00 00 02 48 49 54 41 43 48 49 20 :...2....HITACHI :
000002f4cd10: 4f 50 45 4e 2d 56 20 20 20 20 20 20 20 20 20 20 :OPEN-V :
000002f4cd20: 36 30 30 34 35 30 20 31 33 30 46 33 33 30 33 33 :600450 130F33033:
000002f4cd30: 20 32 41 20 01 01 01 01 00 00 00 00 00 00 00 00 : 2A ............:
000002f4cd40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :................:
000002f4cd50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :................:

* We can read data from that disk.

* When the customer tried to boot from within the ioinfo utility, the AIX welcome banner was shown, but the boot failed with LED 554.
* LED 554 means “unknown boot device” or “can't access the boot device”. Because the boot failed with LED 554, Open Firmware could successfully load the boot image from the disk, but AIX couldn't continue the boot process with that boot device.


Case #1 : Boot failed with only one VIOS
- Analysis
*VIOS ERRPT entries

Oct 1 04:44:50 vfchost15 T VFC_HOST module: npiv_port_sciolst rc: 4F location: 2523


Oct 1 04:44:50 vfchost15 T VFC_HOST module: npiv_port_sciolst rc: 4F location: 2523
Oct 1 04:44:50 vfchost15 T VFC_HOST module: npiv_port_sciolst rc: 4F location: 2523
Oct 1 04:44:50 vfchost15 T VFC_HOST module: npiv_port_sciolst rc: 4F location: 2523
Oct 1 04:44:50 vfchost15 T VFC_HOST module: npiv_port_sciolst rc: 4F location: 2523
Oct 1 04:44:50 vfchost15 T VFC_HOST module: npiv_port_sciolst rc: 4F location: 2523
Oct 1 04:44:50 vfchost15 T VFC_HOST module: npiv_port_sciolst rc: 4F location: 2523
Oct 1 04:44:50 vfchost15 T VFC_HOST module: npiv_port_sciolst rc: 4F location: 2523

* At that time, a lot of vfchost errors were logged.

Detail Data
ADDITIONAL INFORMATION
module: npiv_port_sciolst rc: 000000000000004F location: 00002523
data: 1 9 29 0 CC080

#define FCPH_ELS_RJT_UNABLE 0x92900 /* Issue LS_RJT with the reason code */
                                    /* of "Unable to Perform Command" and */
                                    /* reason explanation of */
                                    /* "Insufficient Resources to support */
                                    /* Login" */

* The rc is 0x4F (ECONNREFUSED), and the symptom is “Insufficient Resources to support Login”.
* After this test, the problem disappeared, so we now suspect that the SAN has some problems.
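
The value 0x92900 in the detail data can be read byte by byte. The sketch below is a minimal Python illustration, assuming the usual LS_RJT word layout (reserved byte, reason code, reason explanation, vendor-specific byte). The lookup tables contain only the values named in this case, and the function name decode_ls_rjt is hypothetical.

```python
# Only the codes that appear in this case; other values are not listed here.
LS_RJT_REASONS = {0x09: "Unable to Perform Command"}
LS_RJT_EXPLANATIONS = {0x29: "Insufficient Resources to support Login"}
AIX_ERRNO_NAMES = {0x4F: "ECONNREFUSED"}


def decode_ls_rjt(word):
    """Split an LS_RJT word into reason, explanation and vendor bytes.

    Assumed layout: 0x00RREEVV (RR = reason code, EE = reason
    explanation, VV = vendor-specific byte).
    """
    reason = (word >> 16) & 0xFF
    explanation = (word >> 8) & 0xFF
    vendor = word & 0xFF
    return {
        "reason_code": reason,
        "reason": LS_RJT_REASONS.get(reason, "not in this table"),
        "explanation_code": explanation,
        "explanation": LS_RJT_EXPLANATIONS.get(explanation, "not in this table"),
        "vendor": vendor,
    }


if __name__ == "__main__":
    print(decode_ls_rjt(0x92900))
    print("rc 0x4F =", AIX_ERRNO_NAMES[0x4F])
```

Read this way, the "9" and "29" in the data line "1 9 29 0 CC080" line up with the reason code and the reason explanation of the reject.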


Case #2 : Boot failed with LED 554 or 557
- Environments

• This system has 2 NPIV paths and 1 physical fcs path to the boot disk (rootvg) on NetApp storage.

- Symptoms

• After AIX was installed, the system crashed during an I/O test.

• After that, boot fails with LED 554 or 557.


Case #2 : Boot failed with LED 554 or 557
- Debug boot log
+ read rc
+ 0< /tmp/rc
+ [ 0 -ne 0 ]
+ echo rc.boot: executing "mount /"
+ 1>> /tmp/boot_log
+ mount -f /
+ 2>& 1
exec(/usr/sbin/mount,-f,/){172132,159840}
exec(/sbin/helpers/jfs2/mount,-V,jfs2,-o,rw,log=/dev/hd8,/dev/hd4,/){176228,172132}
+ tee -a /../tmp/boot_log
exec(/usr/bin/tee,-a,/../tmp/boot_log){122956,114746}
Replaying log for /dev/hd4.
exec(/usr/bin/sh,-c,/sbin/helpers/jfs2/logredo /dev/hd8 > /dev/null){118882,176228}
exec(/sbin/helpers/jfs2/logredo,/dev/hd8){118882,176228}
mount: /dev/hd4 on /: Unformatted or incompatible media
+ print 1
+ 1> /../tmp/rc
+ read rc
+ 0< /../tmp/rc
+ [ 1 -ne 0 ]
+ loopled 0x557 ROOT MNT FAILED

• The system hung with LED 0x557.
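
In a long debug boot log, the interesting part is the loopled line and the last program output before it. The sketch below is a minimal Python illustration, assuming the log format shown above, where shell trace lines start with "+" and exec traces start with "exec("; the function name find_boot_halt is hypothetical.

```python
import re

# Trace line written by rc.boot when it stops on an LED code.
LOOPLED = re.compile(r"^\+\s*loopled\s+(0x[0-9a-fA-F]+)\s+(.*)$")


def find_boot_halt(log_text, context=2):
    """Return the LED code, its text, and the last output lines before it."""
    lines = [line.strip() for line in log_text.splitlines()]
    for index, line in enumerate(lines):
        match = LOOPLED.match(line)
        if match:
            # Keep only real program output, not shell or exec trace lines.
            output = [
                text for text in lines[:index]
                if text and not text.startswith(("+", "exec("))
            ]
            return {
                "led": match.group(1),
                "reason": match.group(2),
                "last_output": output[-context:],
            }
    return None


if __name__ == "__main__":
    log = """+ mount -f /
exec(/usr/sbin/mount,-f,/){172132,159840}
Replaying log for /dev/hd4.
mount: /dev/hd4 on /: Unformatted or incompatible media
+ print 1
+ loopled 0x557 ROOT MNT FAILED
"""
    print(find_boot_halt(log))
```

For the log above, this points directly at the failed mount of /dev/hd4 as the step preceding LED 0x557.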


Case #2 : Boot failed with LED 554 or 557
- errpt entry
erec_rec.resource_name .. hdisk6
00010600 00000000 00000000 00000000 ................
00000000 00000000 00000118 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000000 ................
00000000 00000000 00000000 00000600 ................
00000000 00000000 00000000 00000083 ................
00000000 003D001A .....=..

• Reservation conflict on hdisk6. hdisk6 is the boot disk.


Case #2 : Boot failed with LED 554 or 557
- ioinfo utility
from fcs0:
.
25. 500a098587e934b3,0 - 15360 MB Disk drive
26. 500a098587e934b3,1000000000000 - 15360 MB Disk drive
27. 500a098587e934b3,2000000000000 - 30720 MB Disk drive (bootable)
28. 500a098587e934b3,3000000000000 - 15360 MB Disk drive
.
from fcs1:
.
25. 500a098587e934b3,0 - 15360 MB Disk drive
26. 500a098587e934b3,1000000000000 - 15360 MB Disk drive
27. 500a098587e934b3,2000000000000 - ??? MB Disk drive
28. 500a098587e934b3,3000000000000 - 15360 MB Disk drive
.
from fcs3, which is the physical FC adapter:
.
25. 500a098587e934b3,0 - 15360 MB Disk drive
26. 500a098587e934b3,1000000000000 - 15360 MB Disk drive
27. 500a098587e934b3,2000000000000 - ??? MB Disk drive
28. 500a098587e934b3,3000000000000 - 15360 MB Disk drive

• So, the path through fcs0 has set a SCSI-2 reservation on this disk, and the boot disk can't be accessed through the other 2 paths. The client will therefore boot successfully only with a single path (fcs0).

• To use only one path, it helps to change the zoning or disk assignment configuration, or to pull the cables of the other two adapters (fcs1 and fcs3 in the listing above). Remember that using only one path during boot is the best way to get rid of the SCSI-2 reservation issue.
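
The comparison of the three listings can be automated: a LUN whose size is shown as "???" through some adapters but as a number through another is a candidate for a reservation held by the adapter that can read it. The sketch below is a minimal Python illustration that assumes the ioinfo device list format shown above; the function names parse_listing and unreadable_luns are hypothetical.

```python
import re

# "27. 500a098587e934b3,2000000000000 - 30720 MB Disk drive (bootable)"
DEVICE_LINE = re.compile(
    r"^\s*\d+\.\s+(?P<wwpn>[0-9a-f]{16}),(?P<lun>[0-9a-f]+)"
    r"\s+-\s+(?P<size>\S+)\s+MB"
)


def parse_listing(text):
    """Map (target WWPN, LUN) to the size string shown in one listing."""
    sizes = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            key = (match.group("wwpn"), match.group("lun"))
            sizes[key] = match.group("size")
    return sizes


def unreadable_luns(listings):
    """Return {(wwpn, lun): [adapters]} for LUNs whose size is not numeric."""
    result = {}
    for adapter, text in listings.items():
        for key, size in parse_listing(text).items():
            if not size.isdigit():
                result.setdefault(key, []).append(adapter)
    return result


if __name__ == "__main__":
    listings = {
        "fcs0": "27. 500a098587e934b3,2000000000000 - 30720 MB Disk drive (bootable)",
        "fcs1": "27. 500a098587e934b3,2000000000000 - ??? MB Disk drive",
        "fcs3": "27. 500a098587e934b3,2000000000000 - ??? MB Disk drive",
    }
    for (wwpn, lun), adapters in unreadable_luns(listings).items():
        print(wwpn, lun, "unreadable through", ", ".join(adapters))
```

An unreadable size alone does not prove a reservation conflict; here it agrees with the reservation conflict logged against hdisk6.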


Case #3 : Boot failed using mksysb image
- Environments

• The customer took an mksysb image and tried to boot from that image in an LPAR on a different CEC.

- Symptoms

• The boot failed with LED 996.

• If the Fibre Channel adapter (not used for the boot disk) is removed, it boots successfully.

• If they do a fresh install of AIX, it boots successfully with the Fibre Channel adapters.


Case #3 : Boot failed using mksysb image
- Analysis
*cfgmgr log after maintenance mode boot

# cfgmgr -vi /mnt/5307/installp/ppc/ >/tmp/cfgmgr.txt


Method error (/usr/lib/methods/cfgefc -l fcs0 ):
0514-040 Error initializing a device into the kernel.
Method error (/usr/lib/methods/cfgvioent -l ent1 ):
0514-040 Error initializing a device into the kernel.

* verbose output

Completed method for: fcs0, Elapsed time = 0


Return code = 40
*** no stdout ****
*** no stderr ****
----------------
Time: 0 LEDS: 0x25b2 for ent1
Number of running methods: 2
----------------
Completed method for: ent1, Elapsed time = 0
Return code = 40
*** no stdout ****
*** no stderr ****

* fcs0 and ent1 were not configured; both methods returned code 40 (E_CFGINIT).

* E_CFGINIT was returned because sysconfig for those adapters failed with return code -1.
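
When the verbose cfgmgr log is long, the failing devices can be picked out by pairing each "Completed method for" line with the "Return code" line that follows it. The sketch below is a minimal Python illustration that assumes the log format shown above; the function name failed_methods is hypothetical.

```python
import re

COMPLETED = re.compile(r"Completed method for:\s*(\S+?),")
RETURN_CODE = re.compile(r"Return code\s*=\s*(\d+)")


def failed_methods(log_text):
    """Return {device: return code} for methods that ended with a non-zero code."""
    failures = {}
    current = None
    for line in log_text.splitlines():
        match = COMPLETED.search(line)
        if match:
            current = match.group(1)   # remember which device just finished
            continue
        match = RETURN_CODE.search(line)
        if match and current is not None:
            code = int(match.group(1))
            if code != 0:
                failures[current] = code
            current = None
    return failures


if __name__ == "__main__":
    log = """Completed method for: fcs0, Elapsed time = 0
Return code = 40
*** no stdout ****
*** no stderr ****
----------------
Completed method for: ent1, Elapsed time = 0
Return code = 40
"""
    print(failed_methods(log))
```
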


Case #3 : Boot failed using mksysb image
- Analysis
* error report during booting

---------------------------------------------------------------------------
LABEL:
LGPG_FREED
IDENTIFIER:
C4C3339D
Date/Time: Mon Oct 12 17:36:48 2009
Sequence Number: 4168
Machine Id: 00C46DC24C00
Node Id: phls6840
Class: S
Type: INFO
Resource Name: SYSVMM
Description
ONE OR MORE LARGE PAGES HAS BEEN CONVERTED INTO PAGEABLE PAGES
Probable Causes
System at or near pinned memory limit.
Recommended Actions
Tune maxpin percentage or lgpg_regions.
Detail Data
Number of large pages attempted to free:
1
Number of large pages actually freed:
1
---------------------------------------------------------------------------
* 1 large page (16MB page) was freed due to a pinned memory shortage.
* We now suspect that the memory size differs between the two LPARs and that they are using 16MB pages (always pinned). Due to the pinned memory shortage at boot time, some devices including fcs0 were not configured and the boot hung. (More analysis is required.)


APPENDIX : debug boot procedure

• When a system cannot boot even though Open Firmware can detect the boot device, the debug boot procedure is needed to find out the step at which the boot fails.

• This procedure is based on the LPAR environment.

• Please refer to the following website for debug boot.

http://www-01.ibm.com/support/docview.wss?uid=isg3T1000251

1. Log in to the HMC using ssh (after allowing ssh login).
2. Prepare screen logging with the “script” command, for example:
script -f debugboot.log <-- This log will be stored on the HMC
3. Open a vterm to the LPAR that has the problem.
mkvterm -m <managed_system> --id <lpar_id>
4. Boot to the Open Firmware prompt (from the HMC menu). When the ok prompt is given:
ok> boot -s trap
KDB(0)> mw enter_dbg
enter_dbg+000000: 00000000 = 42
enter_dbg+000004: 00000000 = . (symbol dot)
KDB(0)> g
4-1. In newer versions (AIX 5.3 ML03 or later, AIX 5.2 ML07 or later), you can do this more easily:
ok> boot -s verbose
5. After capturing the screen log containing the error symptom, you can quit the virtual terminal session by typing the following sequence in the vterm:
#~. (tilde and period)


References

• Web materials
- AIX higher availability using SAN services, on IBM developerWorks
- http://www.ibmsystemsmag.com

• Technical Documents
- Multipathing on AIX Version 2.1 by James Lee
- Understanding AIX boot process by Uma Sankar Atluri
- NPIV introduction and problem determination hints by Bertram Begau

• IBM Publications and Redbooks
- PowerVM Virtualization on IBM System p: Introduction and Configuration (SG24-7940)
- Multipath Subsystem Device Driver User's Guide (GC52-1309)
