This document goes over local boot issues, errors and resolutions.

The following are some of the most common local boot errors encountered and
possible workarounds.

Local boot

1) Time out waiting for arp/rarp:

Note: If this is an Enterprise Server, make sure that the key switch on the
front panel is not in the diagnostic position (turned to the right). The
switch should be in the vertical position, "straight up and down".

If this error occurs immediately after the boot command is issued (before
the OS release is displayed), check diag-switch?. At the ok prompt type
"printenv diag-switch?" and see whether it is set to true. (See the note
above regarding Enterprise Systems.) When diag-switch? is true, the system
attempts to boot from the diag-device rather than the boot-device. By
default, diag-device is set to net.

I have seen cases where diag-switch? is set to false and the system, after
a reset, still attempts to boot from the diag-device. In this case it may be
necessary to set diag-device to the root disk (i.e. "disk"). (This may
indicate a stuck bit.)

Another possibility is that the first boot-device is not being seen, so the
system falls through to the second boot-device, which by default is "net".
Probe for that disk; the probe command depends on the system type (for
example, probe-scsi or probe-ide).

If arp/rarp timeouts appear after the kernel starts (i.e. the OS release is
displayed on screen), something in the startup sequence is attempting to go
over the network. This can sometimes be debugged by adding "set -x" to the
/etc run control script (e.g. rcS, rc2, rc3) for the suspected run level,
or by booting the root drive with "boot -v" from the ok prompt.
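If Solaris can still be reached (for example from a single-user cdrom boot), the same NVRAM settings can be read with eeprom(1M), which prints each variable as name=value. The helper below, boot_source_check, is a hypothetical sketch that reads that output on standard input and reports whether the next boot will go to the diag-device or may fall back to net. It assumes the name=value format and a POSIX shell (for example /usr/xpg4/bin/sh on Solaris).

```shell
#!/bin/sh
# boot_source_check (hypothetical helper): read "name=value" lines as
# printed by eeprom(1M) on stdin and report where the next boot comes from.
# Returns 1 if diag-switch? is true, 0 otherwise.
boot_source_check() {
  diag_switch=false
  diag_device=
  boot_device=
  while IFS='=' read name value; do
    case "$name" in
      'diag-switch?') diag_switch=$value ;;
      diag-device)    diag_device=$value ;;
      boot-device)    boot_device=$value ;;
    esac
  done
  if [ "$diag_switch" = true ]; then
    echo "diag-switch? is true: booting from diag-device ($diag_device)"
    return 1
  fi
  # boot-device is a space-separated list; "net" later in the list is
  # where the system ends up if the disk before it is not seen.
  case " $boot_device " in
    *' net '*) echo "boot-device ($boot_device) falls back to net if the disk is not seen" ;;
    *)         echo "boot-device: $boot_device" ;;
  esac
  return 0
}
```

For example, `eeprom | boot_source_check`. The matching changes from the UNIX prompt would be `eeprom 'diag-switch?=false'` or `eeprom diag-device=disk`; at the ok prompt, use setenv instead.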
=============================================================================
2) Can't open kernel/unix:
This can mean that the system cannot open kernel/unix because the OS
revision is wrong for that system (during a cdrom boot), or that the kernel
is corrupt and cannot be opened. Finally, I have seen boot-file set to
kernel/unix but with an extra character, which makes the system unbootable
even from cdrom. The solution is to reset boot-file to its default with
"set-default boot-file" at the ok prompt.
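A stray character in boot-file is easy to miss in printenv output, so a byte dump helps. check_boot_file below is a hypothetical helper: it accepts only an empty value, kernel/unix, or kernel/sparcv9/unix (an assumed list; adjust it for your systems) and prints anything else through od -c so hidden characters become visible.

```shell
#!/bin/sh
# check_boot_file VALUE (hypothetical helper): succeed if VALUE is one of
# the expected boot-file settings; otherwise dump its bytes and fail.
# The accepted list below is an assumption, not an exhaustive rule.
check_boot_file() {
  case "$1" in
    ''|kernel/unix|kernel/sparcv9/unix)
      echo "boot-file OK: '$1'"
      return 0 ;;
  esac
  echo "unexpected boot-file value; byte dump follows:"
  printf '%s' "$1" | od -c   # shows trailing spaces and control characters
  return 1
}
```

Pass it the value as printed by `printenv boot-file` or `eeprom boot-file` (without the `boot-file=` prefix).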
=============================================================================
3) Can't mount /usr:
Possible causes include /usr not being mountable because of file system
corruption or a bad superblock, or an /etc/vfstab entry that is missing or
points to the wrong block device or mount point. Also see Symptom Resolution
<Solution: 216132>, which addresses a /dev link issue.
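The vfstab part of this check can be scripted once the root slice is mounted (for example on /a after "boot cdrom -s"). check_vfstab_entry is a hypothetical helper; it assumes the standard seven-column vfstab layout and that device paths live under /dev/dsk and /dev/rdsk (or /dev/md for DiskSuite metadevices). It needs a POSIX shell such as /usr/xpg4/bin/sh, since the Solaris Bourne /bin/sh runs a redirected while loop in a subshell.

```shell
#!/bin/sh
# check_vfstab_entry VFSTAB MOUNTPOINT (hypothetical helper): print the
# vfstab entry for MOUNTPOINT and warn if its device columns look wrong.
# Returns 1 if there is no entry for MOUNTPOINT.
check_vfstab_entry() {
  vfstab=$1
  mnt=$2
  found=no
  # Columns: device to mount, device to fsck, mount point, FS type,
  # fsck pass, mount at boot, mount options.
  while read special fsckdev mountp fstype pass atboot opts; do
    case "$special" in '#'*|'') continue ;; esac   # skip comments/blanks
    [ "$mountp" = "$mnt" ] || continue
    found=yes
    echo "$mnt: block=$special raw=$fsckdev type=$fstype"
    case "$special" in
      /dev/dsk/*|/dev/md/dsk/*) ;;
      *) echo "warning: block device $special is not under /dev/dsk" ;;
    esac
    case "$fsckdev" in
      /dev/rdsk/*|/dev/md/rdsk/*) ;;
      *) echo "warning: raw device $fsckdev is not under /dev/rdsk" ;;
    esac
  done < "$vfstab"
  if [ "$found" = no ]; then
    echo "no entry for $mnt in $vfstab"
    return 1
  fi
  return 0
}
```

For example, `check_vfstab_entry /a/etc/vfstab /usr`.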
=============================================================================
4) Cannot create /var/adm/utmp or /var/adm/utmpx:
First check (by booting from cdrom with the -s option) that /var can be
mounted. (fsck may have to be run against /var if it is a separate
partition.) This error almost always points to a corrupted kernel caused by
a kernel patch that was added in multi-user mode. Some SRDBs recommend
booting single user from cdrom (i.e. "boot cdrom -s" from the ok prompt);
however, I have never seen this procedure work. What we have had to resort
to is:
(1) Restore the system disk from backup (if available).
(2) Run an upgrade from the same OS release and revision, for example from
    Solaris[TM] 2.6 5/98 to Solaris 2.6 5/98. An upgrade is not a guaranteed
    fix, but it has been done successfully. After this procedure it may be
    necessary to repatch the system to the most current patch cluster
    revision and/or individually needed patches.
(3) As a last resort, back up all data files and reinstall.
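The /var check at the start of this item can be sketched the same way. check_var_mount is hypothetical: it assumes the root slice is already mounted on /a, that the /var line in /a/etc/vfstab is correct, and a POSIX shell. Setting RUN=echo prints the fsck and mount commands instead of running them, which is a safe first step.

```shell
#!/bin/sh
# check_var_mount VFSTAB (hypothetical helper): if VFSTAB lists a separate
# /var, fsck its raw device and mount it on /a/var.
# With RUN=echo the commands are only printed.
check_var_mount() {
  while read special fsckdev mountp fstype pass atboot opts; do
    case "$special" in '#'*|'') continue ;; esac   # skip comments/blanks
    [ "$mountp" = /var ] || continue
    $RUN fsck -F "$fstype" -y "$fsckdev" || return 1
    $RUN mount -F "$fstype" "$special" /a/var
    return $?
  done < "$1"
  echo "no separate /var entry in $1 (/var is on the root slice)"
  return 0
}
```

For example, `RUN=echo check_var_mount /a/etc/vfstab` to review the commands, then run it again without RUN.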
=============================================================================
5) Spawning too rapidly:
This indicates corruption and can normally be corrected by booting single
user from cdrom and running fsck on the raw logical device for the root
disk.

example: fsck /dev/rdsk/c0t0d0s0

It may be necessary to run this command repeatedly until it runs cleanly.
Then halt the system and attempt a boot from the root disk.
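Running fsck "until it runs cleanly" can be put in a loop. In this hypothetical sketch a pass counts as clean when it exits 0 and does not print "FILE SYSTEM WAS MODIFIED"; that message text is an assumption based on UFS fsck output, so confirm it on your release. The FSCK variable exists only so the loop can be tried with a stand-in command. A POSIX shell is assumed.

```shell
#!/bin/sh
# fsck_until_clean RAWDEV [MAXPASSES] (hypothetical helper)
# Re-runs fsck -y until a pass exits 0 without reporting that the file
# system was modified, or until MAXPASSES (default 5) passes have run.
# FSCK can be set to a stand-in command for a dry run; default is fsck.
fsck_until_clean() {
  dev=$1
  max=${2:-5}
  n=1
  while [ "$n" -le "$max" ]; do
    echo "fsck pass $n on $dev"
    out=$(${FSCK:-fsck} -y "$dev" 2>&1)
    status=$?
    printf '%s\n' "$out"
    case "$out" in
      *"FILE SYSTEM WAS MODIFIED"*) ;;   # repairs were made: run again
      *) [ "$status" -eq 0 ] && return 0 ;;
    esac
    n=$((n + 1))
  done
  echo "not clean after $max passes" >&2
  return 1
}
```

For example, `fsck_until_clean /dev/rdsk/c0t0d0s0`, then halt and boot from the root disk as described above.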
=============================================================================
6) File just loaded does not appear to be executable:
This can be caused by a missing or corrupt boot block. If the boot alias or
boot path is not correct, or if the root drive is not known, it may be
necessary to boot single user from cdrom and run format to look at the
partition table and determine the boot partition. If the boot alias is
correct, it may be necessary to install a boot block. First mount the root
drive to verify that root (and its files and directories) is all there. If
so, install a boot block with the following procedure:

ok boot cdrom -s
# cd /
# fsck /dev/rdsk/c0t3d0s0
# mount /dev/dsk/c0t3d0s0 /a

*** Solaris releases 2.x to 2.4 ***

# cp /ufsboot /a/ufsboot
# cd /
# umount /a
# /usr/sbin/installboot /usr/lib/fs/ufs/bootblk /dev/rdsk/c0t3d0s0
# halt

*** Solaris 2.5 and later releases ***

# cp /platform/`uname -i`/ufsboot /a/platform/`uname -i`/ufsboot
# cd /
# umount /a
# /usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
/dev/rdsk/c0t3d0s0
# halt
=============================================================================
7) Can't find boot program:
This is similar to the error above, but it indicates a missing boot
program. It can be caused by missing root files or directories, or by an
incorrect boot alias/path.
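For this error and the previous one, a quick look at the mounted root slice shows whether the boot program and the main directories are present. check_root_tree is a hypothetical helper; the list of directories it checks is an assumption for Solaris 2.5 and later, and the platform name is whatever uname -i reports on the booted system.

```shell
#!/bin/sh
# check_root_tree ROOT PLATFORM (hypothetical helper): after mounting the
# root slice on ROOT (e.g. /a from "boot cdrom -s"), report missing
# top-level directories and a missing ufsboot. Returns 1 if anything is missing.
check_root_tree() {
  root=$1
  plat=$2
  missing=0
  # Directories assumed to be present on a Solaris 2.5+ root slice.
  for d in etc sbin kernel platform dev devices; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d"
      missing=1
    fi
  done
  # Boot program: per-platform on 2.5 and later, /ufsboot on 2.4 and earlier.
  if [ -f "$root/platform/$plat/ufsboot" ]; then
    echo "boot program: $root/platform/$plat/ufsboot"
  elif [ -f "$root/ufsboot" ]; then
    echo "boot program: $root/ufsboot"
  else
    echo "missing boot program (ufsboot) under $root"
    missing=1
  fi
  return $missing
}
```

For example, `check_root_tree /a \`uname -i\``.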
=============================================================================
8) Fast data access mmu miss (or CPU panics):
This is a memory management error. It can be caused by bad memory or by
corrupt software. There is a known issue with Sun Enterprise[TM] systems
with the 400MHz CPU with 8MB cache, as described below:

When trying to install Solaris 2.5.1 HW 11/97 or 2.6 HW 5/98 on an Ex000
server with a 400MHz/8MB CPU module, booting from cdrom or a network install
server produces a Fast Data Access MMU Miss error or a panic with
"mutex_enter: bad mutex".

Solution:
NOTE: This procedure requires downloading and applying patches, so the
system must have a network connection.

1. Verify the OpenBoot[TM] PROM (OBP) version by typing ".version" at the
   OK> prompt, or use the /usr/sbin/prtconf -V command at the UNIX prompt.
   If needed, upgrade to at least flash PROM version 3.2.21 using patch
   103346-22 or greater.

2. OK> setenv auto-boot? false

3. OK> reset (usually not needed with 2.6)

4. OK> limit-ecache-size

5. OK> boot cdrom (at least 2.5.1 HW 11/97 or 2.6 HW 3/98)

6. Install the OS, but do not allow it to reboot automatically!

7. # init 0

8. OK> reset (usually not needed with 2.6)

9. OK> limit-ecache-size

10. OK> boot

11. Make sure you have a network connection, FTP to sunsolve.sun.com, and
    get the latest kernel patch (minimum levels to support the 400MHz/8MB
    cache are listed):

    Solaris 2.5.1 --> 103640-27 and prtdiag patch 104595-08
    Solaris 2.6   --> 105181-14

12. Patches should be installed in single-user mode, so bring the system
    down with "init S".

13. Reboot.
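Steps 1 and 11 compare version and patch numbers by eye. The hypothetical helpers below do the comparison numerically, which matters because 3.2.9 is older than 3.2.21 even though it sorts later as text. They assume that prtconf -V prints a line beginning with "OBP" followed by the version, and that showrev -p prints one "Patch: NNNNNN-RR ..." line per installed patch; both formats are assumptions to verify on the system. A POSIX shell is assumed.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeed if dotted version A >= B,
# comparing each dot-separated field as a number.
version_ge() {
  a=$1; b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}; y=${b%%.*}
    x=${x:-0}; y=${y:-0}          # a missing field counts as 0
    [ "$x" -gt "$y" ] && return 0
    [ "$x" -lt "$y" ] && return 1
    case "$a" in *.*) a=${a#*.} ;; *) a= ;; esac
    case "$b" in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0                        # all fields equal
}

# obp_ok TEXT (hypothetical helper): TEXT is prtconf -V output, assumed to
# start with "OBP <version>"; succeed if the version is at least 3.2.21.
obp_ok() {
  set -- $1
  [ "$1" = OBP ] || return 2
  version_ge "$2" 3.2.21
}

# patch_ok ID MINREV (hypothetical helper): read showrev -p output on
# stdin; succeed if patch ID is installed at revision MINREV or later.
patch_ok() {
  while read tag pr rest; do
    [ "$tag" = Patch: ] || continue
    id=${pr%-*}; rev=${pr#*-}
    if [ "$id" = "$1" ] && [ "$rev" -ge "$2" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `obp_ok "\`/usr/sbin/prtconf -V\`"` before step 2, and `showrev -p | patch_ok 105181 14` on Solaris 2.6 after step 11.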
