Emergency Maintenance
Issue: 01
Date: 2009-04-15
Huawei Technologies Co., Ltd. provides customers with comprehensive technical support and service. For any assistance, please contact our local office or company headquarters.
Website: Email:
Copyright Huawei Technologies Co., Ltd. 2009. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.
Notice
The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but the statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.
Contents
About This Document
1 Overview of Emergency Maintenance
1.1 Definition of Emergency Maintenance
1.2 Definition of Emergencies
1.3 Initiation of Emergency Maintenance
1.4 Guidelines for Emergency Maintenance
1.5 Flow for Emergency Maintenance
1.5.1 Notifying Huawei of the Emergency
1.5.2 Locating the Fault
1.5.3 Collecting Fault Information
1.5.4 Rectifying the Fault
1.5.5 Obtaining Help
1.5.6 Checking the Handling Result
1.5.7 Recording Information About Emergency Maintenance
1.6 Emergency Maintenance Precautions
1.7 Technical Support
4.1 Overview
4.2 Collection of Basic Fault Information
4.3 Collection of Device Fault Information
Issue 01 (2009-04-15)
Figures
Figure 1-1 Flowchart of emergency maintenance
Figure 1-2 Flowchart for identifying the type of a fault
Figure 2-1 Flowchart for handling device faults
Figure 2-2 Flowchart for handling the failed login to a system through the console interface
Figure 2-3 Flowchart for handling the failed system start
Figure 2-4 Flowchart for handling the abnormality of the board status
Figure 2-5 Flowchart for handling the abnormality of the interface status
Figure 3-1 Flowchart for handling service faults
Figure 3-2 Flowchart for handling the failure to forward IP unicast packets
Figure 3-3 Flowchart for handling the failure to forward IP multicast packets
Figure 3-4 Flowchart for handling the failure to forward MPLS VPN packets
Tables
Table 1-1 Methods of identifying the fault type
Table 2-1 Collection of information about the failure to log in to a system through the console interface
Table 2-2 Collection of information about the failure to start a system
Table 2-3 Collection of information about the abnormality of the board status
Table 2-4 Collection of information about the abnormality of the interface status
Table 3-1 Collection of information about the failure to forward IP unicast packets
Table 3-2 Collection of information about the failure to forward IP multicast packets
Table 3-3 Collection of information about the failure to forward MPLS VPN packets
Table 4-1 Collection of basic fault information
Table 4-2 Collection of device fault information
Table 6-1 Notice of emergency maintenance
Related Versions
The following table lists the product versions related to this document.
Product Name | Version
S9300 | V100R001
Intended Audience
This document is intended for:
- Policy planning engineers
- Installation and commissioning engineers
- NM configuration engineers
- Technical support engineers
Organization
This document is organized as follows.
Chapter | Description
1 Overview of Emergency Maintenance | Provides the definitions, causes, principle, flowcharts, and precautions of emergency maintenance.
2 Emergency Maintenance for Device Faults | Describes the emergency maintenance for device faults, focusing on fault clearance and service recovery and not fault rectification.
3 Emergency Maintenance for Service Faults | Describes the emergency maintenance for service faults, focusing on fault clearance and service recovery rather than fault rectification.
4 Guide to Fault Information Collection | Describes how to collect and back up fault information on time after an emergency fault occurs.
5 Guide to System Reboot | Describes how to restart the device manually when services are interrupted because of a device fault and the device cannot restart automatically.
 | Describes the tables that you need to fill in when performing emergency maintenance.
 | Describes how to upgrade software through BIOS when the host software program fails to start.
Conventions
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol | Description
DANGER | Indicates a hazard with a high level of risk, which if not avoided, will result in death or serious injury.
WARNING | Indicates a hazard with a medium or low level of risk, which if not avoided, could result in minor or moderate injury.
CAUTION | Indicates a potentially hazardous situation, which if not avoided, could result in equipment damage, data loss, performance degradation, or unexpected results.
TIP | Indicates a tip that may help you solve a problem or save time.
NOTE | Provides additional information to emphasize or supplement important points of the main text.
General Conventions
The general conventions that may be found in this document are defined as follows.
Convention | Description
Times New Roman | Normal paragraphs are in Times New Roman.
Boldface | Names of files, directories, folders, and users are in boldface. For example, log in as user root.
Italic | Book titles are in italics.
Courier New | Examples of information displayed on the screen are in Courier New.
Command Conventions
The command conventions that may be found in this document are defined as follows.
Boldface
    The keywords of a command line are in boldface.
Italic
    Command arguments are in italics.
[ ]
    Items (keywords or arguments) in brackets [ ] are optional.
{ x | y | ... }
    Optional items are grouped in braces and separated by vertical bars. One item is selected.
[ x | y | ... ]
    Optional items are grouped in brackets and separated by vertical bars. One item is selected or no item is selected.
{ x | y | ... }*
    Optional items are grouped in braces and separated by vertical bars. A minimum of one item or a maximum of all items can be selected.
[ x | y | ... ]*
    Optional items are grouped in brackets and separated by vertical bars. Several items or no item can be selected.
&<1-n>
    The parameter before the & sign can be repeated 1 to n times.
#
    A line starting with the # sign is a comment.
GUI Conventions
The GUI conventions that may be found in this document are defined as follows.
Convention | Description
Boldface | Buttons, menus, parameters, tabs, window, and dialog titles are in boldface. For example, click OK.
> | Multi-level menus are in boldface and separated by the ">" signs. For example, choose File > Create > Folder.
Keyboard Operations
The keyboard operations that may be found in this document are defined as follows.
Issue 01 (2009-04-15) Huawei Proprietary and Confidential Copyright Huawei Technologies Co., Ltd. 3
Description
- Press the key. For example, press Enter and press Tab.
- Press the keys concurrently. For example, pressing Ctrl+Alt+A means the three keys should be pressed concurrently.
- Press the keys in turn. For example, pressing Alt, A means the two keys should be pressed in turn.
Mouse Operations
The mouse operations that may be found in this document are defined as follows.
Action | Description
Click | Select and release the primary mouse button without moving the pointer.
Double-click | Press the primary mouse button twice continuously and quickly without moving the pointer.
Drag | Press and hold the primary mouse button and move the pointer to a certain position.
Update History
Updates between document issues are cumulative. Therefore, the latest document issue contains all updates made in previous issues.
- Abnormal system: All services are interrupted.
- Abnormal Switch Routing Unit (SRU) or Main Control Unit (MCU): All services are interrupted.
- Abnormal service card: Some services are interrupted.
- Abnormal service module: Some services are interrupted.
- Abnormal network: Network services are interrupted.
Generally, alarms and logs about an abnormality are generated before an emergency arises. You can determine whether an emergency has occurred by checking the alarms and logs or from a customer complaint.
NOTE
The roadmap of emergency maintenance described in this chapter applies to emergencies. For common troubleshooting, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting.
- Complaints of customers
  A customer complaint is a main cause for emergency maintenance. When a fault reported by a customer or the Customer Service Center meets the conditions in Definition of Emergencies, initiate the emergency maintenance.
- Alarm indication
  When you check the alarms output by the Network Management System (NMS) or displayed on the terminal, initiate the emergency maintenance if the alarms may lead to a wide range of service failures.
- Natural disaster
  When a natural disaster such as an earthquake, a fire, or a flood happens, devices must be powered off temporarily to protect them from damage, so the emergency maintenance must be initiated. Power on the devices again after the disaster.
To keep a device running stably and minimize the probability of emergencies, refer to the Quidway S9300 Terabit Routing Switch Routine Maintenance. The core function of emergency maintenance is to recover system operation and service provisioning as soon as possible. Follow these guidelines:
- Prepare plans for coping with various emergencies according to the emergency maintenance manual. Managers and maintenance personnel must be familiar with the plans and well trained.
- Emergency maintenance training is mandatory for maintenance personnel. You must learn the basic methods of identifying emergent faults and how to handle them.
- When an emergency occurs, keep calm and check whether the hardware devices and the routing are working normally. Then check whether the emergency is caused by an S9300. If it is, handle the fault according to the prepared schemes or the procedures in this manual.
- The CF card contains important data. When an emergency occurs, do not format the CF card before consulting Huawei engineers.
- Contact the Customer Service Center or the local office of Huawei early for technical support during troubleshooting.
- After handling an emergent fault, collect the alarm information related to the fault and send the fault handling report, device alarm files, and log files to Huawei for analysis. This helps Huawei improve the after-sales service.
You must maintain detailed records of operations and results for further reference by Huawei engineers during troubleshooting so that they can handle a fault quickly. When a fault persists, contact Huawei Customer Service Center. For contact information, see Technical Support.
The main purpose of emergency maintenance is to recover a system as soon as possible. Figure 1-1 shows the flowchart of emergency maintenance.
Figure 1-1 Flowchart of emergency maintenance
Start -> Notify Huawei of the emergency -> Locate the fault -> Collect fault information -> Rectify the fault -> Service recovered?
No: Obtain help
Yes: Check the handling result -> Record information about emergency maintenance -> End
1.5.1 Notifying Huawei of the Emergency
1.5.2 Locating the Fault
1.5.3 Collecting Fault Information
1.5.4 Rectifying the Fault
1.5.5 Obtaining Help
1.5.6 Checking the Handling Result
1.5.7 Recording Information About Emergency Maintenance
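The flow in Figure 1-1 can be sketched as a short routine. This is only an illustration in Python, not part of any Huawei tool: run_emergency_maintenance, the rectify and obtain_help callbacks, and the max_help_rounds limit are hypothetical names introduced here, and the figure itself does not state how many times help may be requested.

```python
from typing import Callable, List


def run_emergency_maintenance(
    rectify: Callable[[], bool],
    obtain_help: Callable[[], bool],
    max_help_rounds: int = 3,  # assumption: the figure gives no limit
) -> List[str]:
    """Return the ordered steps taken, following Figure 1-1.

    rectify and obtain_help are hypothetical callbacks that return True
    when services have recovered.
    """
    steps = [
        "Notify Huawei of the emergency",
        "Locate the fault",
        "Collect fault information",
        "Rectify the fault",
    ]
    recovered = rectify()
    rounds = 0
    # "Service recovered? No" branch: obtain help until services recover.
    while not recovered and rounds < max_help_rounds:
        steps.append("Obtain help")
        recovered = obtain_help()
        rounds += 1
    if recovered:
        steps.append("Check the handling result")
    # The maintenance record is written in every case.
    steps.append("Record information about emergency maintenance")
    return steps
```

The record step is appended unconditionally because the manual asks for detailed records even when a fault persists.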
Even if you can complete emergency maintenance independently with the guidance of this manual, notify Huawei of the emergency. Huawei technical personnel then keep a record of the fault to improve after-sales services.
- Abnormal system: All services are interrupted.
- Abnormal Switch Routing Unit (SRU) or Main Control Unit (MCU): All services are interrupted.
- Abnormal service card: Some services are interrupted.
- Abnormal service module: Some services are interrupted.
- Abnormal network: Network services are interrupted.
- Specific time when the fault occurs
- Detailed description of the fault
- Software version of the S9300
- Measures taken after the fault and the results
- Severity level of the problem and expected time of system recovery
- Indicator status of the boards, power modules, and fans
- Device alarms
- Device logs
- Device configuration
- Device debugging information if the debugging is enabled
Figure 1-2 Flowchart for identifying the type of a fault
Can log in through the console interface? Yes -> System starts normally? Yes -> Board status is normal? Yes -> Interface status is normal? Yes -> A service fault occurs. Each check also has a No branch.
Table 1-1 Methods of identifying the fault type

Item: Login through the console interface
Identifying Method: Connect the COM port of the PC or terminal to the console interface of the S9300 with a standard RS-232 configuration cable and set relevant parameters correctly on the terminal. For details, refer to the Quidway S9300 Terabit Routing Switch Configuration Guide - Basic Configurations. Check that a terminal displays normally, for example, <Quidway> is available on the terminal.

Item: System startup
Identifying Method: Check whether the system starts normally. If the command prompt such as <Quidway> is displayed, it means that the system starts normally.

Item: Board status
Identifying Method: Run the display device command on the terminal to check whether the status of all boards is Normal. In the case of a local fault, check the status of the service board connected to the user who reports the fault. For example:
    <Quidway> display device
    S9312's Device status:
    Slot Sub Type Online Power Register Alarm Primary
    - - - - - - - - - - - - -
    9 LPU Present
    13 SRU Present Master

Item: Interface status
Identifying Method: Run the display interface command on the terminal to check whether the status of the interface connected to the user who reports the fault is Up and whether more packets are transmitted and received on the interface during a specified period. For example:
    <Quidway> display interface GigabitEthernet 1/0/12
    GigabitEthernet1/0/12 current state : UP
    Description:HUAWEI, Quidway Series, GigabitEthernet1/0/12 Interface
    Switch Port,PVID : 1,The Maximum Frame Length is 1526
    IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 0018-2000-0083
    Speed : 1000, Loopback: NONE
    Duplex: FULL, Negotiation: ENABLE
    Mdi : NORMAL
    Last 300 seconds input rate 0 bits/sec, 0 packets/sec
    Last 300 seconds output rate 616 bits/sec, 0 packets/sec
    Input: 0 packets, 0 bytes
      Unicast: 0, NUnicast: 0
      Discard: 0, Error : 0
      Jumbo : 0
    Output: 191636 packets, 18992248 bytes
      Unicast: 12, NUnicast: 191624
      Discard: 19, Error : 0
      Jumbo : 0
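The interface check in Table 1-1 reads two things from the display interface output: the current state and the packet counters. The following Python sketch shows one way to extract them from captured text. It is an illustration only: parse_interface and SAMPLE are hypothetical names, SAMPLE is an excerpt of the example output above, and output from other software versions may be laid out differently.

```python
import re

# Excerpt of the example output in Table 1-1.
SAMPLE = """\
GigabitEthernet1/0/12 current state : UP
Last 300 seconds input rate 0 bits/sec, 0 packets/sec
Last 300 seconds output rate 616 bits/sec, 0 packets/sec
Input: 0 packets, 0 bytes
Output: 191636 packets, 18992248 bytes
"""


def parse_interface(text):
    """Extract the state and total packet counters from captured output."""
    state = re.search(r"current state\s*:\s*(\S+)", text)
    # Anchor at line start so the "input rate" lines are not matched.
    inp = re.search(r"^\s*Input:\s*(\d+) packets", text, re.MULTILINE)
    out = re.search(r"^\s*Output:\s*(\d+) packets", text, re.MULTILINE)
    return {
        "state": state.group(1) if state else None,
        "input_packets": int(inp.group(1)) if inp else None,
        "output_packets": int(out.group(1)) if out else None,
    }


result = parse_interface(SAMPLE)
```

Running the command twice and comparing the two parsed results shows whether more packets were transmitted and received during the interval.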
After you identify the fault type, see 2 Emergency Maintenance for Device Faults and 3 Emergency Maintenance for Service Faults to proceed with emergency maintenance.
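The four checks of Figure 1-2 and Table 1-1 form a simple decision chain, which can be sketched in Python as follows. The names Checks and identify_fault_type are hypothetical. Mapping each failed check to a device fault is an assumption taken from the four cases that 2.1 Overview lists as device faults.

```python
from dataclasses import dataclass


@dataclass
class Checks:
    """Results of the four checks, performed in the order of Figure 1-2."""
    console_login_ok: bool
    system_started: bool
    boards_normal: bool
    interface_normal: bool


def identify_fault_type(c: Checks) -> str:
    # The first failed check decides the fault type; later checks are skipped.
    if not c.console_login_ok:
        return "device fault: failed login through the console interface"
    if not c.system_started:
        return "device fault: failed system start"
    if not c.boards_normal:
        return "device fault: abnormal board status"
    if not c.interface_normal:
        return "device fault: abnormal interface status"
    # All four checks passed, so the problem is a service fault.
    return "service fault"
```

For example, identify_fault_type(Checks(True, True, False, True)) reports an abnormal board status, which is handled in chapter 2.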
- Time of emergency maintenance
- Version information
- Fault symptom
- Handling procedure and result
For the format of an information record table, refer to Appendix A Emergency Maintenance Record Table. You need to record the output information during emergency maintenance by using the Capture Text function of the HyperTerminal or the related functions of other Telnet terminals.
Static Electricity
Wear an ESD wrist strap before operating a board or the backplane, and follow these rules:
- If the board to be replaced is an active SRU/MCU, perform an active/standby switchover and then remove the board. A standby SRU/MCU can be removed without an active/standby switchover. When the board to be replaced is a standby SRU/MCU, an LPU, or a CMU, run the power off slot slot-id command to power off the board, and then remove the board.
- Always keep a board in its antistatic bag until you install it.
- Always place a removed board in an antistatic bag.
Laser/LED
When you maintain a device with an optical module or optical interface, follow these rules:
- Do not look straight into the optical fiber from which the light beam shoots out when you install and maintain the optical fiber.
- Do not look straight into the connector of the optical fiber from which the light beam shoots out when you replace the pluggable optical module.
- Only the qualified personnel who have attended training can operate the optical module and optical fiber.
CAUTION
When you install and maintain the optical fiber, keep the connector of the optical fiber clean, unfolded, and straight.
- For contact information about local offices, log in to http://support.huawei.com.
- To make it easy to contact technical support personnel, prepare a phone directory and post it at the maintenance site. The phone directory can contain contact information about the superior maintenance personnel, Huawei engineers, transmission office maintenance personnel, and remote office maintenance personnel. At least two contact methods must be provided for each person.
The maintenance personnel must keep a detailed record of the emergency maintenance procedures, notify Huawei of the type of the board to be replaced, and apply for a spare one according to the warranty terms, so that the fault can be removed sooner. The fax can use the format of the Notice of Emergency Maintenance. For details, refer to Appendix A Emergency Maintenance Record Table.
2.1 Overview
This section describes the definition and types of device faults. A device fault refers to a hardware failure of a device. To rectify a device fault, you must reset, repair, or replace the relevant hardware. While a device is running, you can determine that a device fault has occurred and initiate the emergency maintenance in any of the following cases:
- You fail to log in to the system through the console interface.
- You fail to start the system.
- The board status is abnormal.
- The interface status is abnormal.
Figure 2-1 Flowchart for handling device faults
Can log in through the console interface? Yes -> System starts normally? Yes -> Board status is normal? Yes -> Interface status is normal? Yes -> Proceed to the flow for handling service faults.
Board status is normal? No -> Handle the abnormality of the board status.
Interface status is normal? No -> Handle the abnormality of the interface status.
- RUN, ALM, and ACT indicators of the MCU/SRU
- RUN, ALM, and FAULT indicators of the power modules
- Status indicators of the fans
Handling Flowchart
Figure 2-2 Flowchart for handling the failed login to a system through the console interface
CAUTION
All the following steps can be performed only when the user services are already interrupted. If the user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.
Procedure
Step 1 Check and modify the parameters of the COM port. Check whether the parameters of the COM port are identical to those of the console interface on the S9300. If they are not, modify the parameters of the COM port. By default, the console interface of the S9300 uses a baud rate of 9600 bps, 8 data bits, 1 stop bit, no parity check, and no flow control.
NOTE
If the parameters of the console interface have been modified, use the modified values on the COM port.
Step 2 Check and replace the cable. If the parameters of the COM port are correct, check whether the cable is in good condition. You can replace the cable with a new one and check whether you can log in normally.
Step 3 Check and repair the power supply system. If the indicators of all the boards are off and all the fans fail to work (which you can determine by listening for fan rotation), or the ALM indicator of the power module is on, the power supply system of the device is possibly faulty and needs repair. The power supply system consists of the following:
- Power supply system of the equipment room, chassis, or cabinet
- Power module
- Power supply system of the backplane
Check that the power module is switched on. When there are multiple power modules, ensure that at least one works normally. Check whether the ALM indicator of the power module is on. If it is on, the power module is faulty, and you can replace the power module to solve the problem. If the preceding checks find no problem but the power supply system still fails to work, see Technical Support for Huawei technical support.
Step 4 Exchange and replace the MCU/SRU. After you confirm that the parameters of the COM port are correctly set, the cable is in good condition, and the power supply system works normally, the MCU/SRU is possibly faulty. When there are active and standby MCUs/SRUs, you can connect the configuration cable to the standby MCU/SRU; when there is only one MCU/SRU, you can replace it with a spare one.
Step 5 Reset the system. After you perform the preceding steps, you can reset the system if the fault persists. To reset the system, switch off the power module and switch it on again after three minutes.
Step 6 Seek technical support. To seek Huawei technical support, see Technical Support.
----End
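Step 1 of the procedure above compares the COM port parameters with those of the console interface. A minimal Python sketch of that comparison follows. The dictionary keys and the function name mismatched_settings are hypothetical; only the default values (9600 bps, 8 data bits, 1 stop bit, no parity check, no flow control) come from this manual.

```python
# Default console parameters of the S9300 as stated in Step 1.
# The key names are illustrative, not taken from any terminal program.
S9300_CONSOLE_DEFAULTS = {
    "baud_rate": 9600,
    "data_bits": 8,
    "stop_bits": 1,
    "parity": "none",
    "flow_control": "none",
}


def mismatched_settings(com_port, console=S9300_CONSOLE_DEFAULTS):
    """Return {name: (com_value, console_value)} for each differing parameter."""
    return {
        name: (com_port.get(name), expected)
        for name, expected in console.items()
        if com_port.get(name) != expected
    }


# Example: a terminal left at 115200 bps with hardware flow control.
terminal = dict(S9300_CONSOLE_DEFAULTS, baud_rate=115200, flow_control="rts/cts")
diff = mismatched_settings(terminal)
```

If the console interface parameters have been changed from the defaults, pass the changed values as the console argument.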
- The terminal displays a message indicating that initialization fails.
- The terminal stops at the file decompression state for a long period.
- The system restarts continuously.
Handling Flowchart
Figure 2-3 Flowchart for handling the failed system start
CAUTION
All the following steps can be performed only when the user services are already interrupted. If the user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.
Procedure
Step 1 Remove and insert the CF card. If the "CF Card Init.....FAIL!" message is displayed, the CF card may be loosely seated. You can try the following operations to solve the problem:
1. Remove the MCU/SRU.
2. Remove the CF card, and then insert it.
3. Re-insert the MCU/SRU.
Step 2 Replace the CF card. If the fault cannot be rectified after the CF card is re-inserted, you need to replace the CF card.
Step 3 Replace the MCU/SRU. When either the system prompts "Initializing module IPC_VP_CHANNEL.................FAIL!", or the memory self-test still fails after you perform Steps 1 and 2, the MCU/SRU is possibly faulty. You can try to replace the MCU/SRU. When there is only one MCU/SRU, you can replace it with a spare one.
Step 4 Upload the startup file through BIOS again. When the system stops at the phase of file decompression or continuously restarts, the startup file is possibly incorrect or damaged. You can try to upload the startup file through BIOS. Uploading the startup file through BIOS is complicated. Contact Huawei engineers and perform the uploading with their guidance. For the procedures, see System Upgrading Through BIOS.
Step 5 Seek technical support. To seek Huawei technical support, see Technical Support.
----End
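The procedure above ties each startup symptom to a starting step. The Python sketch below encodes that mapping for captured console text. It is an illustration under stated assumptions: suggest_startup_step and its flags are hypothetical names, the two messages are the ones quoted in Steps 1 and 3, and the number of dots in a real message may vary, which is why the patterns accept any number of them.

```python
import re

# Messages quoted in Step 1 and Step 3; the dot count is not fixed.
CF_INIT_FAIL = re.compile(r"CF Card Init\.*\s*FAIL")
IPC_INIT_FAIL = re.compile(r"Initializing module IPC_VP_CHANNEL\.*\s*FAIL")


def suggest_startup_step(console_text, stuck_decompressing=False, restarting=False):
    """Return the step of the procedure to start from for a given symptom."""
    if CF_INIT_FAIL.search(console_text):
        return "Step 1: remove and insert the CF card"
    if IPC_INIT_FAIL.search(console_text):
        return "Step 3: replace the MCU/SRU"
    if stuck_decompressing or restarting:
        return "Step 4: upload the startup file through BIOS"
    return "Step 5: seek technical support"
```

The function only picks a starting point. If the fault persists, continue with the following steps of the procedure, for example Step 2 after Step 1.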
- When you run the display device command to view information about a board, the board status is Abnormal.
- When you run the display device command to view information about a board, the board status is Unregistered.
- The RUN/ALM indicator of a board blinks at a frequency of 2 Hz or the red indicator is on.
- A board continuously restarts.
Table 2-3 Collection of information about the abnormality of the board status
No. | Collecting Item | Collecting Method
1 | Indicator status of a board | Check whether the indicator of a board is off, is on, blinks at a frequency of 2 Hz, or blinks at a frequency of 1 Hz.
2 | Detailed information about a board | Check detailed information about a board by using the display device slot-id command.
Handling Flowchart
Figure 2-4 Flowchart for handling the abnormality of the board status
CAUTION
All the following steps can be performed only when the user services are already interrupted. If the user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.
Procedure
Step 1 Reset the board. Handling the abnormality of the board status is complicated. In an emergency, it is recommended to solve the problem by resetting or replacing the board. For other maintenance measures, such as fault location, contact Huawei engineers. You can reset a board by using the reset slot command, pressing the RESET button on the panel, or removing and reinserting the board.
Step 2 Replace the board. If resetting the board fails to solve the problem, you can try to replace the board with a spare one.
Step 3 Cut over the services on the board and seek technical support. If the fault persists after you perform Steps 1 and 2, you can cut over the services on the faulty board to a board that is running normally or in an idle slot. For the cutover operations, contact Huawei engineers or perform the cutover according to the cutover scheme of the customer. In addition, provide fault information to the local office for technical support.
----End
- When you run the display interface command to view the status of an interface, the interface status is DOWN.
- When you run the display interface command to view the status of an interface, the number of the sent and received packets on the interface remains the same.
- The indicator status of an interface is abnormal. For example, the LINK indicator of the interface is off.
No. 4. Collecting Method: Collect brief information about all interfaces by using the display interface brief command.
Handling Flowchart
Figure 2-5 Flowchart for handling the abnormality of the interface status
Decision nodes: Status of interface indicator normal? Interface status is Up? Fault rectified? Packets are transmitted and received normally?
Action nodes: Detect the link; Perform a local loopback test; Check and modify the configuration of the data link layer or the upper layer protocol; Cut over the services on the board and seek technical support.
CAUTION
All the following steps can be performed only when the user services are already interrupted. If the user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.
Procedure
Step 1 Start the interface. When you find that an interface is shut down through the shutdown command by checking the configuration, you can run the undo shutdown command in the interface view to start it. Step 2 Detect the link. Before detecting a link, check whether the LINK indicator of the interface is on. If so, it indicates that the physical link is Up and you can detect the link as follows: 1. 2. Check that the interface parameters at both ends of the link are identical, such as the duplex mode and rate. When the interfaces are optical ones, check whether the receiving and sending optical powers at both ends are normal by using the optical power meter. When you find that either end only sends or receives data, the optical module is possibly faulty or the optical fiber possibly fails to match the optical module. Then you can try to replace the optical module or the optical fiber.
DANGER
Do not look directly into an optical fiber or optical interface that is emitting light when you check the receiving and sending optical powers. You must use the optical power meter to measure the optical power.
When the LINK indicator of the interface is off, you can check the link as follows:
1. Perform a physical loopback test on the device. That is, connect the faulty interface to another interface that is in the normal state with an optical fiber or cable in good condition.
2. If the LINK indicator is then on, the interface runs normally. Check whether the optical fiber or the cable is damaged and whether the trunk link runs normally. In this case, the neighboring office is required to cooperate.
3. If the LINK indicator is still off, the interface hardware is faulty. When a pluggable optical module is used, you can replace the optical module; otherwise, you can cut over the services from the faulty interface to another interface that runs normally.
Step 3 Perform a local loopback test.
When the interface status is Up but the numbers of sent and received packets on the interface remain unchanged for a long period, the interface is neither receiving nor sending packets. Run the loopback local command on the interface to perform a local loopback test, and use the ping command to test data sending and receiving while you watch for changes in the numbers of sent and received packets.
After the local loopback test is complete, run the undo loopback command to disable the local loopback immediately.
Step 4 Check and modify the configurations of the data link layer or the upper layer protocols.
If the interface still fails to send and receive packets in the local loopback test, check the configuration of the data link layer or the upper layer protocols. For example, check that the configurations of the Point-to-Point Protocol (PPP) or the High-level Data Link Control (HDLC) protocol at both ends are identical and that the routing protocols run normally.
Step 5 Reset the interface.
If the fault persists after you perform the preceding steps, reset the interface. To reset an interface, run the shutdown command and then the undo shutdown command.
Step 6 Contact Huawei technical support personnel.
To seek Huawei technical support, see 1.7 Technical Support.
----End
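The condition that triggers the local loopback test in Step 3 (the interface is Up, yet its packet counters do not move) can be sketched as a comparison of two counter readings. This is only an illustrative Python sketch: `CounterSnapshot` and `counters_stalled` are hypothetical names, and the values are assumed to be copied by hand from two runs of the display interface command taken some time apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """Hypothetical record of one manual reading of an interface's counters."""
    packets_received: int
    packets_sent: int


def counters_stalled(earlier: CounterSnapshot, later: CounterSnapshot) -> bool:
    """Return True when neither counter changed between the two readings.

    An Up interface with stalled counters is the case in which the
    procedure suggests running a local loopback test.
    """
    return (later.packets_received == earlier.packets_received
            and later.packets_sent == earlier.packets_sent)
```

For example, two identical readings yield True, whereas a reading in which only the received counter increased yields False. How long to wait between readings is a judgment call that the procedure leaves open.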
3.1 Overview
This section describes the definition and types of service faults. A service fault refers to the partial or global service congestion due to a software or network fault. You can handle a service fault by modifying service configuration, resetting service modules, or restoring network connections.
NOTE
Generally, a hardware fault may result in service interruption. For the handling of a device fault, see Emergency Maintenance for Device Faults.
This chapter describes the emergency maintenance for service faults, focusing on fault clearance and prompt service recovery rather than fault rectification. To locate, handle, and rectify common service faults, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting. For the S9300, the emergent service faults that commonly occur fall into the following types:
- Failure to forward IP unicast packets
- Failure to forward IP multicast packets
- Failure to forward MPLS VPN packets
[Flowchart residue. Decision nodes, in order: "Fault involves users on certain interface?", "Fault involves users of certain type?", "Fault involves single users?". Each node has Yes and No branches, and the flow finishes at End.]
NOTE
For a fault that affects a single user, you do not need to initiate the emergency maintenance. For the common handling flowchart of a fault, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting.
No. | Collecting Item | Collecting Method
17 | Information about the OSPF routing table | display ospf routing
18 | Running status and configuration of RIP processes | display rip
19 | Information about all the activated routes of the RIP database | display rip database
20 | Information about the interface enabled with RIP | display rip interface
21 | Information about RIP neighbors | display rip neighbor
NOTE FIB = Forwarding Information Base; ARP = Address Resolution Protocol; BGP = Border Gateway Protocol; IS-IS = Intermediate System to Intermediate System; LSDB = Link State Database; OSPF = Open Shortest Path First; RIP = Routing Information Protocol
Handling Flowchart
Figure 3-2 Flowchart for handling the failure to forward IP unicast packets
CAUTION
All the following steps can be performed only when the user services are already interrupted. If the user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.
Procedure
Step 1 Check and recover the uplink.
When some unicast packets fail to be forwarded, check whether the S9300 can receive upstream packets. Run the display interface command to view whether the number of received packets on the device changes. If the device cannot receive any upstream packets, perform the following:
1. Check whether the status of the upstream interface on the S9300 is normal. For details, see Abnormality of the Interface Status.
2. If the status of the upstream interface is normal, ping the peer interface of the upstream interface. When the ping is successful, you can assume that a fault occurs on the upstream device. To recover the system, contact the site office where the upstream device resides.
3. When the ping fails, detect the link connecting the interface on the S9300 to the upstream device. For example, check that the cable is correctly positioned, that the optical module and the optical power are normal, that the relay agent is normal, and that the IP address is correct.
4. If the fault persists after you perform the preceding steps, contact Huawei for technical support. To seek technical support, see Technical Support.
Step 2 Check and recover the downlink.
When the S9300 can receive incoming packets but cannot send packets, check the connection and communication between the S9300 and the downstream device as follows:
1. Check whether the status of the downstream interface on the S9300 is normal. For details, see Abnormality of the Interface Status.
2. If the status of the downstream interface is normal, ping the peer interface of the downstream interface. When the ping is successful, you can assume that a fault occurs on the downstream device. To recover the system, contact the site office where the downstream device resides.
3. When the ping fails, detect the link connecting the interface on the S9300 to the downstream device. For example, check that the cable is correctly positioned, that the optical module and the optical power are normal, that the relay agent is normal, and that the IP address is correct.
4. When the link is in good condition, the communication between the S9300 and the downstream device is possibly abnormal. Check the configuration, such as routing, according to the following step.
Step 3 Check and restore the routing entries.
If the S9300 fails to communicate with its downstream device, the routing entries are possibly incorrect. Check and restore the routing entries as follows:
1. Check whether a route to the downstream device exists in the routing table of the S9300.
2. If the route does not exist, add a static route, and then check whether the ARP entries on the downstream device can be learned. When the ARP entries on the downstream device cannot be learned, you can add static ARP entries.
3. If there is still no route to the downstream device in the routing table of the S9300, the routing table is possibly oversized. Try deleting unnecessary routing entries and updating the routing table. Then check whether the S9300 learns the route.
4. If a route to the downstream device exists, check this routing entry for correctness, such as the routing protocol, subnet mask, preference, and hop count. Because the troubleshooting of IP routing is complicated, it is not covered here. For details, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting - IP Routing.
5. If the fault persists after you perform the preceding steps, reset the relevant routing protocol. For example, reset all IS-IS connections through the reset isis all command.
6. If resetting the relevant routing protocol is ineffective, proceed to the following step.
Step 4 Check and restore FIB entries.
If the communication fails when the routing entries are normal, the FIB entries are possibly incorrect. Run the display fib [ verbose ] command to check the FIB entries for correctness. If the FIB entries are incorrect, update the FIB entries and deliver them again.
Step 5 Reset the system.
To solve a software problem, resetting the system is the last and most effective solution. If other users are not affected, you can reset the system to solve the problem. Before resetting the system with the reboot command, save the current configuration with the save command. If the fault affects only a small range, you can run the schedule reboot command to reset the system during off-peak hours, such as the early morning.
NOTE
If the system can be restarted through a software program, do not reset the system.
Step 6 Seek technical support.
To seek Huawei technical support, see Technical Support.
----End
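The uplink and downlink checks in Steps 1 and 2 follow the same decision pattern: interface status first, then a ping to the peer interface. A minimal Python sketch of that pattern follows. The function name `link_check_action` and its returned strings are hypothetical; the two Boolean inputs are assumed to be the operator's own observations, not values read from the device by this code.

```python
def link_check_action(interface_status_normal: bool, peer_ping_ok: bool) -> str:
    """Map the two observations from Steps 1 and 2 to the next action.

    interface_status_normal: result of checking the interface status.
    peer_ping_ok: result of pinging the peer interface; it is consulted
    only when the interface status is normal.
    """
    if not interface_status_normal:
        # Handled in "Abnormality of the Interface Status".
        return "handle the abnormal interface status"
    if peer_ping_ok:
        # A successful ping points to a fault on the peer device.
        return "contact the site office of the peer device"
    # A failed ping points to the link: cable, optical module, IP address.
    return "detect the link to the peer device"
```

For instance, `link_check_action(True, False)` returns the link-detection action, which corresponds to item 3 in both steps.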
- A multicast distribution tree (MDT) cannot be set up. No multicast routing entry exists on the S9300 directly connected to the multicast source.
- Clients fail to receive multicast data, which may be due to the incorrect configuration of the Internet Group Management Protocol (IGMP).
- The Protocol Independent Multicast (PIM) routing table has no (S, G) entry. The multicast data can reach intermediate S9300s but not the last hop S9300. Although an interface on an intermediate S9300 receives the multicast data, no corresponding (S, G) entry is created in the PIM routing table.
- The static Rendezvous Point (RP) fails to communicate with the dynamic RP.
- Mosaics are displayed in the multicast video image on clients.
- The multicast video programs displayed are asynchronous on the clients connected to different S9300s, but the program is played fluently, without mosaics.
- Before using the debugging command to collect debugging information, run the terminal debugging command to enable the debugging display on a terminal, and then run the terminal monitor command to enable the display on the terminal. For ease of fault location, it is recommended to collect long-term debugging information.
- After you collect debugging information, run the undo debugging all command to disable all the debugging immediately.
Table 3-2 Collection of information about the failure to forward IP multicast packets
No. | Collecting Item | Collecting Method
1 | All routes learned on the S9300 | display ip routing-table
2 | PIM routing table on the S9300 | display pim routing-table
3 | Information about the unicast routes used by PIM | display pim claimed-route
4 | Multicast routing table on the S9300 | display multicast routing-table
5 | Multicast forwarding table on the S9300 | display multicast forwarding-table
6 | All PIM neighbors of the S9300 | display pim neighbor
7 | All the interfaces enabled with PIM on the S9300 | display pim interface
8 | BSR information learned by the S9300 when PIM-SM is enabled | display pim bsr-info
9 | RP information learned by the S9300 when PIM-SM is enabled | display pim rp-info
10 | Whether the group that wants to receive multicast data can be mapped to the RP when the S9300 runs PIM-SM | display pim rp-info group-address
11 | Information about the RPF neighbors and interfaces of the RPF from the S9300 to the multicast source |
12 | Information about IGMP groups |
13 | Information about IGMP interfaces |
14 | Information about the IGMP routing table |
No. | Collecting Method
15 | After you collect information by using the debugging pim all command, disable the debugging immediately.
16 | After you collect information by using the debugging igmp all command, disable the debugging immediately.
NOTE PIM-SM = Protocol Independent Multicast-Sparse Mode; RPF = Reverse Path Forwarding
Handling Flowchart
Figure 3-3 Flowchart for handling the failure to forward IP multicast packets
[Flowchart residue. Recoverable nodes: "RP about group G on all devices is identical?", "Multicast routing entries are correct?", repeated "Fault rectified?" checks, and "Seek technical support" before End.]
CAUTION
All the following steps can be performed only when the user services are already interrupted. If the user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.
Procedure
Step 1 Check and restore the IGMP configuration.
When clients fail to receive multicast data, check the IGMP configuration on the S9300 connecting the clients for correctness as follows:
1. Check whether multicast is enabled on the S9300. That is, check whether the multicast routing-enable command is run. If the command is not run, enable multicast in the system view and ensure that IGMP is enabled on all interfaces. Then check whether the clients succeed in receiving multicast data.
2. If the clients still fail to receive multicast data, check whether the interface status is normal. Run the display igmp interface interface-name command to view whether information about the specified interface is displayed. If no information is displayed, see Abnormality of the Interface Status to handle it; if the interface status is normal, check whether the clients succeed in receiving multicast data.
3. If the clients still fail to receive multicast data, check whether access control lists (ACLs) are configured on the interface to prevent group G from joining the multicast group. Run the display current-configuration interface interface-name command to check whether the IGMP group policy is configured. If so, modify the ACL configuration to permit IGMP group G to join the multicast group. Then check whether the clients succeed in receiving multicast data.
4. If the clients still fail to receive multicast data, check whether the interface resides on the same network as the hosts. If the interface resides on a different network, modify the IP address of the interface, and then check whether the clients succeed in receiving multicast data.
5. If the fault persists after you perform the preceding checks, run the reset igmp group command to delete the IGMP group, and then add it again to the multicast group.
6. If deleting the IGMP group is not effective, proceed to the following step.
Step 2 Check and modify the Time-to-Live (TTL) value of the packets sent by the multicast source.
Check the TTL value of the (S, G) packets sent by the multicast source S. If this value is too small, increase it so that the packets can reach the hosts.
Step 3 Check and modify the RP configuration.
If the fault persists after you perform the preceding steps, check the RP configuration for correctness. First, ensure that PIM is enabled on all the devices in the PIM domain. There are two cases:
When an RP is specified statically in the network, perform the following:
1. Check whether the same static-rp command is run on all the devices. If it is not, run the same static-rp command on all the devices, and then check the receiving of multicast data. When ACLs are configured, ensure that the ACL configurations are also the same. Then check whether the clients succeed in receiving multicast packets.
2. Check whether ACLs are configured to prevent the static RP from serving group G. If so, modify the ACL configuration to remove the restriction. Then check whether the clients succeed in receiving multicast packets.
When a dynamic BSR-RP is configured in the network, perform the following:
1. Check whether the BSR is correctly configured by running the display pim bsr-info command on the BSR. If the BSR is not configured, re-configure the BSR.
2. Run the display pim rp-info command on the BSR to check whether the BSR learns RP information. If the BSR fails to learn RP information, check that the RP is correctly configured, that a route between the BSR and the RP exists, and that the BSR and the RP can ping each other. If the route is faulty, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting - Multicast.
3. Run the display current-configuration command on both the BSR and the RP to check whether the crp-policy commands are run to prohibit group G. If so, modify the ACL configuration.
4. If performing this step is not effective, proceed to the following step.
Step 4 Check and restore multicast routing entries.
If the fault persists after you perform the preceding steps, routing entries are possibly faulty. Perform the following:
1. Check whether the multicast routing entries from the RP to the clients, from the multicast source to the RP, and from the multicast source to the clients are correct. For details, refer to the VRP Troubleshooting - IP Multicast.
2. If the fault persists after you troubleshoot the multicast routing entries, reset the corresponding multicast and unicast routing protocols. For example, reset all IS-IS connections through the reset isis all command.
3. If resetting the relevant routing protocols is ineffective, proceed to the following step.
Step 5 Reset the system.
To solve a software problem, resetting the system is the last and most effective solution. If other users are not affected, you can reset the system to solve the problem. Before resetting the system, save the current configuration with the save command, and then run the reboot command to reset the system. If the fault affects only a small range, you can run the schedule reboot command to reset the system during off-peak hours, such as the early morning.
NOTE
If the system can be restarted through a software program, do not reset the system.
Step 6 Seek technical support.
To seek Huawei technical support, see Technical Support.
----End
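The check in Step 3, that every device in the PIM domain uses the same RP for group G, can be sketched as a comparison over values noted from each device. The following Python sketch is illustrative only: the function names and the input mapping (device name to the RP address seen for group G, or None when no RP is shown) are assumptions, and the values are expected to be transcribed by hand from display pim rp-info output.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_devices_by_rp(
    rp_by_device: Dict[str, Optional[str]],
) -> Dict[Optional[str], List[str]]:
    """Group device names by the RP address each device reports for group G."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for device, rp in sorted(rp_by_device.items()):
        groups[rp].append(device)
    return dict(groups)


def rp_is_identical(rp_by_device: Dict[str, Optional[str]]) -> bool:
    """Return True when every device reports the same, non-empty RP."""
    groups = group_devices_by_rp(rp_by_device)
    return len(groups) == 1 and None not in groups
```

When `rp_is_identical` returns False, the grouping shows which devices disagree, which narrows down where the static-rp or BSR configuration needs to be reviewed.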
- Before using the debugging command to collect debugging information, run the terminal debugging command to enable the debugging display on a terminal, and then run the terminal monitor command to enable the display on the terminal. For ease of fault location, it is recommended to collect long-term debugging information.
- After you collect debugging information, run the undo debugging all command to disable all the debugging immediately.
Table 3-3 Collection of information about the failure to forward MPLS VPN packets
No. | Collecting Item | Collecting Method
1 | Information about LDP and LSRs | display mpls ldp all
2 | Information about all the interfaces enabled with LDP | display mpls ldp interface
3 | Information about the peer | display mpls ldp peer
4 | Information about LDP sessions | display mpls ldp session
5 | Information about LDP LSPs | display mpls ldp lsp
6 | Information about LDP and LSRs of the specified VPN instance | display mpls ldp vpn-instance vpn-instance-name
7 | The values of labels allocated by BGP, LDP LSP or RSVP | display mpls lsp
8 | Information about the VPN instance | display ip vpn-instance verbose
9 | Information about the routing table of the specified VPN instance | display ip routing-table vpn-instance vpn-instance-name
10 | All LDP debugging information | After you collect information by using the debugging mpls ldp all command, disable the debugging immediately.
NOTE LDP = Label Distribution Protocol; LSR = Label Switching Router; LSP = Label Switched Path
Handling Flowchart
NOTE
First, check whether all or only some MPLS VPN services are interrupted on the network. If only some MPLS VPN services are interrupted, the cause possibly lies in an incorrect maximum transmission unit (MTU) setting on a certain device on the network. The protocol stack or application of some servers does not minimize packet fragments, so a packet in VPN forwarding can exceed the default MTU of 1500 once MPLS labels, each four bytes long, are added to it. Therefore, the P that forwards MPLS packets must be set with an MTU greater than 1500 plus the label length. This section describes only the handling flowchart for the failure to forward all MPLS VPN packets.
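The MTU arithmetic in this note can be worked through in a few lines. The Python sketch below is illustrative: the function names are hypothetical, and the label count of two used in the usage remark is an assumption for the example rather than a value stated in this document.

```python
MPLS_LABEL_BYTES = 4   # each MPLS label is four bytes long (from the note)
DEFAULT_MTU = 1500     # default MTU quoted in the note


def labelled_packet_length(payload_length: int, label_count: int) -> int:
    """Length of a packet after label_count MPLS labels are added to it."""
    return payload_length + MPLS_LABEL_BYTES * label_count


def mtu_is_sufficient(interface_mtu: int, label_count: int,
                      payload_length: int = DEFAULT_MTU) -> bool:
    """Return True when the labelled packet fits within interface_mtu."""
    return interface_mtu >= labelled_packet_length(payload_length, label_count)
```

Under the assumed stack of two labels, a 1500-byte packet grows to 1508 bytes, so an interface left at the default MTU of 1500 cannot carry it without fragmentation, whereas an MTU of 1508 or more can.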
Figure 3-4 Flowchart for handling the failure to forward MPLS VPN packets
[Flowchart residue. Recoverable nodes: "LDP sessions are set up?" (No: "Restore LDP"), "LSPs are set up?" (No: "Restore the IGP configuration on the public network"), "Reset the system", repeated "Fault rectified?" checks, and "Seek technical support" before End.]
CAUTION
All the following steps can be performed only when the user services are already interrupted. If the user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.
Procedure
Step 1 Check and restore LDP.
When MPLS VPN services are interrupted on a network, check whether the LDP sessions between Provider Edges (PEs) are set up. If the LDP sessions are not set up, perform the following:
1. Run the display mpls ldp command to check whether the LSR IDs of different PEs conflict. On a network, an LSR ID, like a router ID, must be globally unique.
2. If the LSR IDs conflict, change the LSR IDs so that each of them is unique. Then check whether the LDP sessions can be set up.
3. If the LDP sessions cannot be set up, run the display mpls ldp peer command to check the IP address of the peer. Run the ping -a source-ip command to check whether the peer address is reachable.
4. If the peer cannot be pinged, run the display ip routing-table command to check whether the route destined for the peer is reachable. Then, run the display fib command to check whether the forwarding entry exists in the FIB table of the local end. If neither the route nor the corresponding forwarding entry exists, check the link layer and network layer.
5. If packets cannot be forwarded after the LDP sessions are set up, proceed to the following step.
Step 2 Check and restore LSPs.
Run the display mpls ldp lsp command to check whether LSPs can be set up. If the LSPs are not set up, perform the following:
1. Check how LDP is configured to set up LSPs. By default, labels are assigned only to the route to a local loopback interface to set up an LSP. When labels need to be assigned to the routes to all local interfaces, not only the loopback interface, the lsp trigger all command must be run for LDP.
2. Check that the label mapping message is received from the source device of the route. Then check whether the outbound interface and next hop of the route are those in the label mapping message. If the outbound interface and next hop are different, check the Interior Gateway Protocol (IGP) configuration for correctness or reset the IGP.
3. If MPLS VPN packets still fail to be forwarded after the successful setup of LDP LSPs, proceed to the following step.
Step 3 Check and restore the configuration of VPN instances.
On a network, the parameters of VPN instances on all the devices correlate with each other. To control the correlation between routes effectively, you can direct route flooding by setting different Route-Distinguishers.
1. Check whether the Route-Distinguisher of each VPN instance is unique and whether the VPN target configuration meets the requirements of network planning. If the Route-Distinguisher or VPN target does not meet requirements, re-configure it.
2. Check whether interfaces join VPN instances. If the interfaces are not in the VPN instance, re-bind the interfaces to the relevant VPN instances. Note that all IP-related configurations on an interface are removed when the interface is bound to a VPN instance. Therefore, you need to perform IP-related configuration again.
Step 4 Check and restore VPN routes.
Run the display bgp vpnv4 all routing-table command on a PE to check whether the routes to other PEs or Customer Edges (CEs) are correct. If the routes are incorrect, perform the following:
1. Check whether the Multiprotocol Border Gateway Protocol (MBGP) neighbor relationships between the PEs are set up. If the neighbor relationships are not set up, check whether the IGP advertises the routes of a loopback interface to the peer. If the IGP does not advertise routes to the peer, modify the IGP configuration.
2. Check whether the address family is created for each VPN instance in the BGP view and whether the routes are imported to the BGP routing table according to the routing protocol between PEs and CEs.
3. Check whether the MBGP sessions between PEs use the loopback interfaces for protocol connections. If the MBGP sessions do not use the loopback interfaces, cancel the configuration and re-configure them.
4. If static routes are configured between PEs and CEs, check whether the next hop of each static route is directly connected. The next hop of a static route cannot be iterated. If the next hop is not directly connected, delete the static routes and re-configure them.
5. If the fault persists after you perform the preceding checks, reset the relevant IGP and BGP. For example, reset all IS-IS connections through the reset isis all command and BGP connections through the reset bgp command. If resetting the routing protocols is ineffective, proceed to the following step.
Step 5 Reset the system.
To solve a software problem, resetting the system is the last and most effective solution. If other users are not affected, you can reset the system to solve the problem. Before resetting the system, save the existing configuration with the save command, and then run the reboot command to reset the system. If the fault affects only a small range, you can run the schedule reboot command to reset the system during off-peak hours, such as the early morning.
NOTE
If the system can be reset through a software program, do not power off the device.
Step 6 Seek technical support.
To seek Huawei technical support, see Technical Support.
----End
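Two checks in this procedure are uniqueness checks: LSR IDs must not conflict between PEs (Step 1), and the Route-Distinguisher of each VPN instance must be unique (Step 3). A minimal Python sketch of such a check follows. The function name `find_conflicts` and the input mapping are hypothetical; the identifiers are assumed to be noted by hand from the relevant display commands on each device.

```python
from collections import defaultdict
from typing import Dict, List


def find_conflicts(id_by_owner: Dict[str, str]) -> Dict[str, List[str]]:
    """Return each identifier that is used by more than one owner.

    id_by_owner maps an owner (for example a PE name or a VPN instance
    name) to its identifier (for example an LSR ID or a
    Route-Distinguisher). The result maps each duplicated identifier to
    the sorted list of owners that share it.
    """
    owners_by_id: Dict[str, List[str]] = defaultdict(list)
    for owner, identifier in sorted(id_by_owner.items()):
        owners_by_id[identifier].append(owner)
    return {identifier: owners
            for identifier, owners in owners_by_id.items()
            if len(owners) > 1}
```

An empty result means that no two owners share an identifier. A non-empty result lists the devices or VPN instances whose identifiers need to be changed.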
4.1 Overview
This section describes the collection and classification of fault information. After an emergency occurs, collect and back up fault information promptly for reference. In addition, provide the fault information to Huawei engineers for fault location and rectification. When a fault occurs, collect the following information:
- Software version
NOTE
When you collect fault information through command lines, you can copy the information displayed on the console, whether it is connected through the COM port or a Telnet terminal, and then save it to a txt file as a record.
Table 4-2 Collection of device fault information
No. | Collecting Item | Collecting Method
1 | Device information | Run the display device command.
2 | Temperature | Run the display temperature command.
3 | CPU usage | Run the display cpu-usage command.
4 | Routing table information | Run the display ip routing-table command.
5 | Logs | Run the display logbuffer command.
6 | Traps | Run the display trapbuffer command.
7 | Configuration | Run the display current-configuration command.
8 | Diagnostic information about the device | Run the display diagnostic-information command.
9 | Interface information | Run the display interface command.
10 | Network connectivity information | Run the ping command to collect information about the network connectivity and record the results.
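Keeping the txt record complete is easier with a checklist. The Python sketch below is an illustration under stated assumptions: it does not connect to the device, the helper names are hypothetical, and the output text is assumed to be pasted in by the operator from the console. Only the nine display commands from Table 4-2 are listed, because the ping command needs a destination that depends on the site.

```python
from typing import Iterable, List, Tuple

# Display commands listed in Table 4-2 (ping is omitted on purpose).
COLLECTION_COMMANDS: Tuple[str, ...] = (
    "display device",
    "display temperature",
    "display cpu-usage",
    "display ip routing-table",
    "display logbuffer",
    "display trapbuffer",
    "display current-configuration",
    "display diagnostic-information",
    "display interface",
)


def missing_commands(collected: Iterable[str]) -> List[str]:
    """Return the Table 4-2 commands whose output has not been collected."""
    seen = set(collected)
    return [command for command in COLLECTION_COMMANDS if command not in seen]


def format_record(entries: Iterable[Tuple[str, str]]) -> str:
    """Join (command, pasted output) pairs into text for a txt record."""
    parts = []
    for command, output in entries:
        parts.append(f"===== {command} =====\n{output.rstrip()}\n")
    return "\n".join(parts)
```

The text returned by `format_record` can be written to a file with the usual `open(path, "w")` call, and `missing_commands` shows what is still outstanding before the record is sent to Huawei engineers.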
5
About This Chapter
This chapter describes the preparation, guide, and verification of system reboot, and describes how to handle the faults.
5.1 Overview
This section describes the applicable environment and precautions for system reboot.
5.2 Preparation for System Reboot
This section describes the preparations before rebooting a system.
5.3 Guide to System Reboot
This section describes how to reboot a system.
5.4 Verification of System Reboot
This section describes how to verify system reboot.
5.5 Handling of a System Reboot Failure
This section describes how to handle the failure to reboot a system.
5.1 Overview
This section describes the applicable environment and precautions for system reboot.
CAUTION
Do not reboot an S9300 arbitrarily. If a reboot is necessary, follow the guidelines and precautions described in Overview of Emergency Maintenance, or restart the system under the guidance of Huawei engineers.
During the S9300 reboot, all services on the device are interrupted, except in dual-system hot backup networking. The services are resumed after the S9300 is rebooted successfully.
An S9300 automatically reboots when an excessively severe fault occurs on it. After the automatic reboot, the system begins to run normally, so you do not need to reboot the system manually. Rebooting an S9300 manually is applicable only to an emergency or an exception. For example, if an S9300 fails to reboot automatically when services on it are interrupted and other measures are ineffective, you can reboot it manually.
CAUTION
Do not remove the LPU or a flexible plug-in card of the SRU in service. The boards of other types are hot pluggable.
The S9300 can be manually rebooted in any of the following ways:
- Running command lines
- Pressing the RESET button on the MCU/SRU
- Switching the system off, and then on
It is not recommended to reboot an S9300 remotely; otherwise, the reboot may fail and services are interrupted for a long period.
5.3.1 Running Command Lines
5.3.2 Pressing the RESET Button on the MCU/SRU
5.3.3 Switching Off and Switching On the System
5.3.4 Operating Through the NMS
reboot Command
Enter the reboot command in the user view and press Y after the display. Then the system reboots. The operation example is as follows:
<Quidway> reboot
Info:The system is now comparing the configuration, please wait.
Info:Save current configuration?[Y/N]:y
Now saving the current configuration to the device
Info:The current configuration was saved to the masterboard device successfully.
System will reboot! Continue?[Y/N]:
NOTE
After you run the reboot command, the displayed information may vary with the system version.
By running the schedule reboot delay command, you can enable the scheduled reboot function and set the wait delay. You can set the wait delay in two formats: "hour:minute" and "absolute minutes". The total cannot exceed 30 x 24 x 60 minutes.
By running the schedule reboot at command, you can enable the scheduled reboot function and specify the reboot date and time. Note that the specified date cannot be more than 30 days later than the current date. If the schedule reboot at command specifies the date parameter (yyyy/mm/dd) and the date is in the future, the S9300 reboots at the specified time with an error of no more than one minute. If no specific date is set, the following situations occur:
- If the set time is later than the current time, the S9300 reboots at this time that day.
- If the set time is before the current time, the S9300 reboots at this time the next day.
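The rule for a schedule reboot at time given without a date can be sketched in a few lines of Python. This is an illustration of the rule as described here, not device code: `resolve_reboot_time` is a hypothetical name, and treating a set time exactly equal to the current time as "the next day" is an assumption, because the text does not cover that case.

```python
from datetime import datetime, time, timedelta


def resolve_reboot_time(now: datetime, at: time) -> datetime:
    """Return the date and time at which a date-less reboot would occur.

    A set time later than the current time falls on the same day; a set
    time before the current time falls on the next day. Equality is
    treated as "next day" (assumption, not stated in the text).
    """
    candidate = datetime.combine(now.date(), at)
    if candidate > now:
        return candidate
    return candidate + timedelta(days=1)
```

For example, with a current time of 10:00 on 2009-04-15, a set time of 22:00 resolves to the same day, and a set time of 03:00 resolves to 2009-04-16.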
After the schedule reboot delay or schedule reboot at command is run, the system prompts you to confirm the reboot. Enter Y or y, and the configuration takes effect. If the related configuration exists, the latest configuration overrides the previous one.
NOTE
If you adjust the system time through the clock command after running the schedule reboot delay or schedule reboot at command, the previous configuration through the schedule reboot delay or schedule reboot at command becomes invalid.
You can run the undo schedule reboot command to remove the configuration through the schedule reboot delay or schedule reboot at command. You can run the display schedule reboot command to view the configuration through the schedule reboot delay or schedule reboot at command.
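The two wait-delay formats accepted by schedule reboot delay, "hour:minute" and "absolute minutes", together with the 30 x 24 x 60 minute limit, can be sketched as a small parser. The Python below is a hypothetical helper for checking a value before it is typed, not a reproduction of how the device parses its input. It assumes that the minute part of "hour:minute" is not restricted to 0 through 59 and that negative values are invalid.

```python
MAX_DELAY_MINUTES = 30 * 24 * 60  # upper limit quoted for the wait delay


def parse_delay(text: str) -> int:
    """Convert a wait delay in either format to a total number of minutes.

    Raises ValueError when the text is not numeric or when the total is
    negative or exceeds MAX_DELAY_MINUTES.
    """
    text = text.strip()
    if ":" in text:
        hours_text, minutes_text = text.split(":", 1)
        total = int(hours_text) * 60 + int(minutes_text)
    else:
        total = int(text)
    if not 0 <= total <= MAX_DELAY_MINUTES:
        raise ValueError(f"delay of {total} minutes is out of range")
    return total
```

For instance, "1:30" and "90" both resolve to 90 minutes, and "720:00" resolves to 43200 minutes, which is exactly the stated limit.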
CAUTION
It is recommended to reboot an S9300 in this mode only when a critical fault occurs in the power supply system of the equipment room and the S9300 is therefore powered off. In this case, switch the S9300 off, and then switch it on again when the power supply system returns to normal.
The S9300 chassis can hold two AC power modules or two DC power modules; AC and DC power modules cannot be mixed. It is recommended to install an active power module and a standby power module in the chassis, which work in 1+1 load balancing mode. The switch of the power module is located on the front panel of the power module. Turn the switch to OFF to turn off the power; turn the switch to ON to turn on the power. When an S9300 uses two power modules for load balancing, you need to switch off both power modules to turn off the power and switch on both power modules to turn on the power.
Procedure
Step 1 In the topology navigation tree or the topology view, select the S9300 to be operated and right-click it.
Step 2 Choose Device Management > Reboot Device on the shortcut menu.
Step 3 Click Yes in the displayed dialog box.
----End
Postrequisite
When the S9300 is rebooted, its node icon becomes unavailable. After the reboot is successful, the node icon becomes green. For detailed operations, refer to the NMS Online Help.
CAUTION
After an S9300 is rebooted, check that the configuration data is recovered correctly and completely, because a failed recovery of configuration data can interrupt services. If some configuration data is lost, add it manually and save it.
5.4.1 Displaying Information About System Reboot
5.4.2 Checking the Software Version and Configuration File
Uncompressing Data from Rom to RAM ........................................Done
Initializing Flash Module .................................................Done
...
Starting...
Starting at 0x6c00000...
****************************************************
*                                                  *
*              S9300 Bootrom, Ver 003              *
*                                                  *
****************************************************
Copyright(C) 2008-2026 by HUAWEI TECHNOLOGIES CO., LTD.
Creation date: Dec 26 2008, 16:29:56
Board Type      : LE02SRUA
CPU type        : Cavium Octeon
CPU L2 Cache    : 128KB
CPU Clock Speed : 700MHz
BUS Clock Speed : 133MHz
Memory Type     : DDR2 SDRAM
Memory Size     : 512MB
Memory Speed    : 667MHz
...
Recover configuration...OK!
Press ENTER to get started.
The preceding display shows that the S9312 has rebooted successfully. Press Enter to enter the user view.
LPU 9 : uptime is 0 week, 3 days, 21 hours, 54 minutes
StartupTime 2009/02/01 16:58:32
SDRAM Memory Size : 128 M bytes
Flash Memory Size : 8 M bytes
LPU version information :
1. PCB Version : LE02G24C VER.A
2. MAB Version : 0
3. Board Type : G24CA
4. CPLD1 Version : 8120410
5. BootROM Version : 3
6. BootLoad Version : 3
The preceding information displays the Versatile Routing Platform (VRP) version, host version, and patch version. Check that the version numbers are the same as those before the system reboot.
The preceding information displays the name of the current startup file, the name of the current configuration file, and the patch package loaded at startup.
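Comparing version numbers before and after a reboot by eye is error-prone, so a saved copy of the display output can be compared with a small script. The following Python sketch assumes that the numbered "N. Name : Value" lines shown in the LPU version information have been captured as text before and after the reboot; the function names are hypothetical and the parsing covers only that line format.

```python
import re
from typing import Dict, Optional, Tuple


def parse_version_fields(text: str) -> Dict[str, str]:
    """Collect numbered 'N. Name : Value' lines into a name -> value mapping."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*\d+\.\s*(.+?)\s*:\s*(.*\S)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def changed_fields(before: str, after: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every field whose value differs between the two captures."""
    old = parse_version_fields(before)
    new = parse_version_fields(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

An empty result means that every parsed version field is unchanged; any entry in the result names a field to investigate, together with its old and new values.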
Fault description and handling procedure (in detail):
Approved by: Signature:
Filled by Huawei engineers
Handling method: Guide in call / Remote maintenance / On-site support
7
Context
CAUTION
The process of upgrading the system through the Basic Input/Output System (BIOS) is complicated, so this method is not recommended. Use the BIOS only when the host program of the S9300 cannot be started. The BIOS can work only as an FTP client. The operation terminal must be connected to the S9300 through the COM port and communicate with the S9300 through HyperTerminal.
NOTE
The following uses the S9312 upgrade procedure as an example. The upgrade procedure for the S9303 and S9306 is the same as that for the S9312.
Procedure
Step 1 Run the FTP server on the configuration terminal or PC and specify the path of the system files. Create an FTP user named 9300 and set the password to 9300.
Step 2 Reboot the S9312. When the S9312 is powered on, the PC or terminal used to set up the configuration environment displays the following:
input 'm' to Select Debug Console:
Boardname ..................................................................SRU
Start L2 Cache Test ? ('t' is test):skip
BIOS Creation Date ....................................... Feb 2 2009 14:48:10
Bootbus init.................................................................OK
DDR DRAM init................................................................OK
Memory Data Bus Walk '0' Test .............................................Pass
Memory Data Bus Walk '1' Test .............................................Pass
Memory Address Bus Walk '0' Test ..........................................Pass
Memory Address Bus Walk '1' Test ..........................................Pass
Start Memory Unit Test ? ('t' or 'T' is test):skip
Copying Uncompressed Data from Rom to Ram .................................Done
Uncompressing Data from Rom to RAM ........................................Done
Initializing Flash Module .................................................Done
Starting...
At this point, the S9312 is starting the basic BootROM. It then starts the extended BootROM menu. If a fault occurs during detection or for other reasons, the system displays the basic BootROM menu. You can also press Ctrl+A within two seconds to enter the basic BootROM menu. Otherwise, the system automatically starts the extended BootROM menu. The basic BootROM menu is used to upgrade the basic BootROM and the extended BootROM. For details, see the following description.
Update BIOS menu(ver004)
Creation date: Feb 2 2009 14:48:04
1. Update base BIOS through serial interface
2. Update extend BIOS through serial interface
3. Modify serial interface parameter
4. Boot extend BOIS system
5. Reboot
To upgrade the BootROM, change the baud rate, and then download the files. After the upgrade, restore the connection rate to the default 9600 bit/s; otherwise, information may not be displayed when you start or restart the system. After you select 4, the system copies the extended BootROM to the SDRAM, and then decompresses and starts the extended BootROM. After startup, the system displays the extended BootROM menu.
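The reason for changing the baud rate before a serial download can be shown with simple arithmetic. The Python sketch below assumes 8N1 framing (10 bits on the wire per data byte) and ignores transfer-protocol overhead, so the figures are lower bounds. The 115200 bit/s rate and the 512 KiB file size in the usage note are illustrative assumptions, not values taken from this manual.

```python
def transfer_seconds(size_bytes: int, baud_rate: int, bits_per_byte: int = 10) -> float:
    """Estimate the minimum time to send size_bytes over a serial line.

    bits_per_byte defaults to 10, which assumes 8N1 framing
    (1 start bit + 8 data bits + 1 stop bit). Protocol overhead is ignored.
    """
    if baud_rate <= 0:
        raise ValueError("baud_rate must be positive")
    return size_bytes * bits_per_byte / baud_rate
```

Under these assumptions, a hypothetical 512 KiB image (524,288 bytes) needs roughly nine minutes at 9600 bit/s, because transfer_seconds(524288, 9600) is about 546 seconds, and well under a minute at 115200 bit/s.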
The initial password is 7800, which can be changed. If three wrong passwords are entered consecutively, the system restarts.
boot device : eth1
processor number : 0
host name : host
file name : s9300.cc # Name of the software program to be loaded
inet on ethernet (e) : 192.168.1.1:ffffff00
inet on backplane (b): 192.168.1.1
host inet (h) : 192.168.1.2 # IP address of the FTP server
gateway inet (g) :
user (u) : 9300 # FTP user name
ftp password (pw) (blank = use rsh): 9300 # FTP login password
flags (f) : 0x0
target name (tn) : octeon
startup script (s) :
other (o) :
The preceding information shows that the name of the software program to be loaded is s9300.cc, the IP address of the FTP server is 192.168.1.2, the FTP user name is 9300, and the password is 9300. Modify the preceding parameters according to the actual situation. The other parameters do not need to be modified.
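When you modify these parameters, it helps to confirm that the FTP server address falls inside the subnet given by the "inet on ethernet (e)" value, because the gateway field is left blank in the example. The following Python sketch decodes the address:hexmask notation shown above (ffffff00 corresponds to 255.255.255.0) with the standard ipaddress module. The function names are hypothetical, and the check is only a pre-flight sanity test run on the PC, not something the BootROM itself provides.

```python
import ipaddress


def parse_inet(value: str) -> ipaddress.IPv4Interface:
    """Parse an 'address:hexmask' value such as '192.168.1.1:ffffff00'."""
    address, _, hex_mask = value.partition(":")
    if not hex_mask:
        raise ValueError("expected the form address:hexmask")
    # Convert the hexadecimal mask into dotted-decimal notation.
    netmask = ipaddress.IPv4Address(int(hex_mask, 16))
    return ipaddress.IPv4Interface(f"{address}/{netmask}")


def server_on_same_subnet(inet_on_ethernet: str, host_inet: str) -> bool:
    """Return True if the FTP server is reachable without a gateway."""
    interface = parse_inet(inet_on_ethernet)
    return ipaddress.IPv4Address(host_inet) in interface.network
```

With the example values, server_on_same_subnet("192.168.1.1:ffffff00", "192.168.1.2") returns True, which is consistent with the empty gateway inet (g) field.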
Loading...................................Done!
Please type a new file name for saving it.
Press return key to save it named "s9300.cc".
Check disk space
Writing file..............................................................................
...........................................Done
Enter your choice(1-8):
Modify flash description area
Please select booting device.
Press ENTER directly for no change or input your choice.
1: Flash, 2: CF Card
Current booting device: 2, your choice: 2 # Start from the CF card
Current booting File Name: cfcard:/s9300.cc,
Press ENTER directly for no change.
Or, please input the file name (e.g. s9300.cc): new upgrade program
The expected booting file: cfcard:/s9300.cc.
Are you sure? Yes or No(Y/N)y
^s9300.cc