7600 HW Troubleshooting

GOLD Tests and ES+ Crashes
EDCS-694119 CA Training
2008 Cisco Systems, Inc. All rights reserved.
Cisco Confidential
Generic Online Diagnostics

What Is GOLD?
GOLD defines a common framework for diagnostic operations across Cisco platforms running Cisco IOS Software.
Goal: check the health of hardware components and verify proper operation of the system data plane and control plane at run time and boot time.
Provides a common CLI and scheduling for field diagnostics, including:
  Bootup tests (includes online insertion)
  Health-monitoring tests (background, non-disruptive)
  On-demand tests (disruptive and non-disruptive)
  User-scheduled tests (disruptive and non-disruptive)

How Does GOLD Work?
Diagnostic packet-switching tests verify that the system is operating correctly:
  Are the supervisor control plane and forwarding plane functioning properly?
  Is the standby supervisor ready to take over?
  Are line cards forwarding packets properly?
  Are all ports working?
  Is the backplane connection working?
Other types of diagnostic tests, including memory and error correlation tests, are also available.
[Figure labels: Active Supervisor (CPU, Forwarding Engine), Standby Supervisor, Fabric, Line Cards (Forwarding Engine)]

What Types of Failure Does GOLD Detect?
Diagnostic capabilities are built into the hardware.
Depending on the hardware, GOLD can catch:
  Port failure
  Bent backplane connector
  Bad fabric connection
  Malfunctioning forwarding engines
  Stuck control plane
  Bad memory

Diagnostic Operation
Boot-Up Diagnostics
  Run during system bootup, line card OIR, or supervisor switchover
  Make sure faulty hardware is taken out of service
  Switch(config)# diagnostic bootup level complete
Runtime Diagnostics
Health-Monitoring
  Non-disruptive tests that run in the background
  Serve as an HA trigger
  Switch(config)# diagnostic monitor module 5 test 2
  Switch(config)# diagnostic monitor interval module 5 test 2 00:00:15
On-Demand
  All diagnostic tests can be run on demand for troubleshooting purposes. They can also be used as a pre-deployment tool.
  Switch# diagnostic start module 4 test 8
  Module 4: Running test(s) 8 may disrupt normal system operation
  Do you want to continue? [no]: y
  Switch# diagnostic stop module 4
Scheduled
  Schedule diagnostic tests for verification and troubleshooting purposes
  Switch(config)# diagnostic schedule module 4 test 1 port 3 on Jan 3 2005 23:32
  Switch(config)# diagnostic schedule module 4 test 2 daily 14:45

Catalyst GOLD Operation Example
Switch# show diagnostic content mod 5
Module 5: Supervisor Engine 720 (Active)
<snip>
                                                     Testing Interval
 ID  Test Name                          Attributes   (day hh:mm:ss.ms)
==== ================================== ============ =================
  1) TestScratchRegister -------------> ***N****A*** 000 00:00:30.00
  2) TestSPRPInbandPing --------------> ***N****A*** 000 00:00:15.00
  3) TestTransceiverIntegrity --------> **PD****I*** not configured
  4) TestActiveToStandbyLoopback -----> M*PDS***I*** not configured
  5) TestLoopback --------------------> M*PD****I*** not configured
  6) TestNewIndexLearn ---------------> M**N****I*** not configured
  7) TestDontConditionalLearn --------> M**N****I*** not configured
  8) TestBadBpduTrap -----------------> M**D****I*** not configured
  9) TestMatchCapture ----------------> M**D****I*** not configured
 10) TestProtocolMatchChannel --------> M**D****I*** not configured
 11) TestFibDevices ------------------> M**N****I*** not configured
 12) TestIPv4FibShortcut -------------> M**N****I*** not configured
 13) TestL3Capture2 ------------------> M**N****I*** not configured
 14) TestIPv6FibShortcut -------------> M**N****I*** not configured
 15) TestMPLSFibShortcut -------------> M**N****I*** not configured
 16) TestNATFibShortcut --------------> M**N****I*** not configured
 17) TestAclPermit -------------------> M**N****I*** not configured
 18) TestAclDeny ---------------------> M**N****A*** 000 00:00:05.00
 19) TestQoSTcam ---------------------> M**D****I*** not configured
<snip>
Diagnostics test suite attributes:
  M/C/* - Minimal bootup level test / Complete bootup level test / NA
  B/*   - Basic ondemand test / NA
  P/V/* - Per port test / Per device test / NA
  D/N/* - Disruptive test / Non-disruptive test / NA
  S/*   - Only applicable to standby unit / NA
  X/*   - Not a health monitoring test / NA
  F/*   - Fixed monitoring interval test / NA
  E/*   - Always enabled monitoring test / NA
  A/I   - Monitoring is active / Monitoring is inactive
  R/*   - Power-down line cards and need reset supervisor / NA
  K/*   - Require resetting the line card after the test has completed / NA
  T/*   - Shut down all ports and need reset supervisor / NA
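The attribute column can be decoded mechanically. The following is a minimal Python sketch, assuming that each of the 12 characters corresponds to one legend entry in the order the legend is listed (an assumption that matches the sample rows but is not stated on the slide). The names `LEGEND` and `decode_attributes` are hypothetical helpers, not part of any Cisco tool.

```python
# Positional legend transcribed from the "Diagnostics test suite attributes"
# list. Assumption: character N of the attribute string maps to legend entry N.
# A '*' in any position means "not applicable".
LEGEND = [
    {"M": "Minimal bootup level test", "C": "Complete bootup level test"},
    {"B": "Basic ondemand test"},
    {"P": "Per port test", "V": "Per device test"},
    {"D": "Disruptive test", "N": "Non-disruptive test"},
    {"S": "Only applicable to standby unit"},
    {"X": "Not a health monitoring test"},
    {"F": "Fixed monitoring interval test"},
    {"E": "Always enabled monitoring test"},
    {"A": "Monitoring is active", "I": "Monitoring is inactive"},
    {"R": "Power-down line cards and need reset supervisor"},
    {"K": "Require resetting the line card after the test has completed"},
    {"T": "Shut down all ports and need reset supervisor"},
]


def decode_attributes(attrs):
    """Return the meaning of every non-'*' flag in a 12-character attribute string."""
    if len(attrs) != len(LEGEND):
        raise ValueError("expected %d characters, got %d" % (len(LEGEND), len(attrs)))
    meanings = []
    for position, (flag, table) in enumerate(zip(attrs, LEGEND), start=1):
        if flag == "*":
            continue  # not applicable for this test
        if flag not in table:
            raise ValueError("unexpected flag %r at position %d" % (flag, position))
        meanings.append(table[flag])
    return meanings


if __name__ == "__main__":
    # Attribute string of TestActiveToStandbyLoopback from the sample output.
    for meaning in decode_attributes("M*PDS***I***"):
        print(meaning)
```

Decoding the string before starting an on-demand test is a quick way to see whether the test is disruptive (D) or safe to run in production (N).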

Switch# show diagnostic result mod 7
Current bootup diagnostic level: complete
Module 7: CEF720 24 port 1000mb SFP
Overall Diagnostic Result for Module 7 : MINOR ERROR
Diagnostic level at card bootup: complete
Test results: (. = Pass, F = Fail, U = Untested)
1) TestTransceiverIntegrity:
   Port  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
   ----------------------------------------------------------------------------
         U  U  .  U  .  .  U  U  .  .  U  U  .  .  U  U  U  U  U  U  U  U  U  U
2) TestLoopback:
   Port  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
   ----------------------------------------------------------------------------
         .
3) TestScratchRegister -------------> .
4) TestSynchedFabChannel -----------> .
<snip>
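A per-port result row is easier to act on once it is summarized. The sketch below is illustrative only: it assumes the row is the whitespace-separated symbol string shown for TestTransceiverIntegrity and that the first symbol belongs to port 1. The helper names are hypothetical.

```python
from collections import Counter

# Result symbols as defined in the "Test results" legend of the output.
SYMBOLS = {".": "pass", "F": "fail", "U": "untested"}


def summarize_port_results(row):
    """Count pass / fail / untested symbols in one per-port result row."""
    counts = Counter(SYMBOLS[symbol] for symbol in row.split())
    return {name: counts.get(name, 0) for name in SYMBOLS.values()}


def ports_with(row, symbol, first_port=1):
    """List the port numbers whose result equals `symbol`.

    Assumption: the first symbol in the row belongs to `first_port`.
    """
    return [first_port + i for i, s in enumerate(row.split()) if s == symbol]


if __name__ == "__main__":
    row = "U U . U . . U U . . U U . . U U U U U U U U U U"
    print(summarize_port_results(row))
    print("ports that passed:", ports_with(row, "."))
```

An untested port in TestTransceiverIntegrity usually just means there was nothing to test on that port, so the count of F symbols is the figure to watch.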

r1# show diagnostic description module 5 test ?
  <1-33>  Test ID Number
 ID  Test Name                     [On-Demand Test Attributes]
---  ------------------------------------------
  1  TestScratchRegister           [***N****]
  2  TestSPRPInbandPing            [***N****]
  3  TestTransceiverIntegrity      [**PD****]
  4  TestActiveToStandbyLoopback   [M*PDS***]
  5  TestLoopback                  [M*PD****]
  6  TestNewIndexLearn             [M**N****]
<snip>
r1# show diagnostic description module 5 test 2
TestSPRPInbandPing :
  By default, this test is enabled as health-monitoring test.
  The SP-RP Inband test catches most of the runtime software driver
  and hardware issues on supervisors. This is done by using diagnostic
  packet tests exercising the layer 2 forwarding engine, the L3-4
  forwarding engine, and the replication engine along the path from
  the Switch Processor to the Route Processor.
  Packets are sent at an interval of 15 seconds and 10 consecutive
  failures of the SP-RP Inband test result in failover to the
  redundant supervisor (default).
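The default numbers in the description above imply how long a persistent fault can go unnoticed before failover. The following Python sketch models the "N consecutive failures" rule in a simplified way; it is not how IOS implements the test, and the class and function names are hypothetical.

```python
def max_seconds_to_failover(interval_s, consecutive_failures):
    """Upper bound on the time from fault onset to failover.

    Simplified model: one probe every `interval_s` seconds, and failover is
    triggered by the Nth consecutive failed probe.
    """
    return interval_s * consecutive_failures


class ConsecutiveFailureMonitor:
    """Track probe results and report when the failure threshold is reached."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def record(self, passed):
        """Record one probe result; return True when the threshold is hit."""
        if passed:
            self.failures = 0  # any success resets the consecutive count
        else:
            self.failures += 1
        return self.failures >= self.threshold


if __name__ == "__main__":
    # Defaults quoted in the test description: 15 s interval, 10 failures.
    print(max_seconds_to_failover(15, 10), "seconds")
```

With the defaults, a stuck SP-RP path triggers failover within roughly two and a half minutes, while a single successful probe in between resets the count.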

Recommendations
Bootup diagnostics:
  Set level to complete
On-demand diagnostics:
  Use as a pre-deployment tool: run complete diagnostics before putting hardware into a production environment
  Use as a troubleshooting tool when suspecting hardware failure
Scheduled diagnostics:
  Schedule key diagnostic tests periodically
  Schedule all non-disruptive tests periodically
Health-monitoring diagnostics:
  Key tests run by default
  Enable additional non-disruptive tests for specific functionalities enabled in your network: IPv6, MPLS, NAT
Reference:
http://www.cisco.com/c/en/us/td/docs/routers/7600/ios/15S/configuration/guide/7600_15_0s_book/diagtest.html
(or search for "cisco 7600 configuring online diagnostics"; first link)
VCCP
The Issue
Cisco has been working with individual customers on an issue related to memory components manufactured by a single supplier between 2005 and 2010.
The affected memory component is the DRAM. On most platforms, it is therefore only necessary to replace the DIMM, not the entire linecard/SUP.
In some cases, you might be required to replace the entire linecard/SUP. This can be confirmed by the TAC engineer.
The Field Notices for the individual products and the related error messages can be accessed via www.cisco.com/go/memory
Symptoms
This issue does not affect boards while they are in operation. The board failure might occur after one or more of the following actions:
  Reload
  Software upgrade
  Power cycle
One of these symptoms might be observed in the syslog on 7600 platform-based devices:
*May 16 02:59:54.575: %PM_SCP-SP-1-LCP_FW_ERR: System resetting module 1 to recover from error: Linecard received system exception
*May 16 02:59:54.575: %OIR-SP-3-PWRCYCLE: Card in module 1, is being power-cycled Off (Module Reset due to exception or user request)
Alternatively, the card might crash repeatedly with this error reported in the syslog:
%EARL-DFC<n>-2-PATCH_INVOCATION_LIMIT: 10 Recovery patch invocations in the last 30 secs have been attempted. Max limit reached
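When scanning large log collections for these symptoms, it helps to split each message into its fields. The sketch below is a minimal Python example, assuming the common `%FACILITY[-SOURCE]-SEVERITY-MNEMONIC: text` layout seen in the messages in this deck; the regular expression and the function name `parse_syslog` are illustrative, not an official parser.

```python
import re

# Assumed layout: %FACILITY[-SOURCE]-SEVERITY-MNEMONIC: text
# SOURCE is the optional originator tag, such as SP or DFC1 in the samples.
SYSLOG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-"
    r"(?:(?P<source>[A-Z]+[0-9]*)-)?"
    r"(?P<severity>[0-7])-"
    r"(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)


def parse_syslog(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = SYSLOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    return fields


if __name__ == "__main__":
    sample = ("*May 16 02:59:54.575: %PM_SCP-SP-1-LCP_FW_ERR: System resetting "
              "module 1 to recover from error: Linecard received system exception")
    print(parse_syslog(sample))
```

Filtering on the mnemonic (for example `LCP_FW_ERR` or `PATCH_INVOCATION_LIMIT`) is then a simple dictionary lookup rather than a free-text search.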
When Should You Be Cautious?
This issue does not affect recent products that are less than 5 years old, or older products that are more than 10 years old. It affects only a few products whose memory components were manufactured by a single vendor between 2005 and 2010.
Fix on Failure Replacement Guidelines
Request a Return Material Authorization (RMA) for the product through normal service support channels.
Reference:
www.cisco.com/go/memory
ES+ Product Family
  ES+ Series 4-Port 10GE Line Cards
  ES+ Series 40-Port GE Line Cards
  ES+ Series 2-Port 10GE Line Cards
  ES+ Series 20-Port GE Line Cards
ES+
What is it?
  The 7600-ES+ is an aggregate 40G linecard series targeting the Metro Ethernet market on the 7600 platform.
What are the various flavors?
  7600-ES+20G3C
  7600-ES+20G3CXL
  7600-ES+40G3C
  7600-ES+40G3CXL
  7600-ES+4TG3C
  7600-ES+4TG3CXL
  7600-ES+2TG3C
  7600-ES+2TG3CXL
ES+ Overview: Flavors & Terminology
Excalibur (ES+)
1G               10G              Ginsu [10G-OTN]   Combo (1G & 10G)
7600-ES+20G3C    7600-ES+2TG3C    7600-ES+ITU-2TG   7600-ES+20C3C
7600-ES+20G3CXL  7600-ES+2TG3CXL  7600-ES+ITU-4TG   7600-ES+20C3CXL
7600-ES+40G3C    7600-ES+4TG3C                      7600-ES+40C3C
7600-ES+40G3CXL  7600-ES+4TG3CXL                    7600-ES+40C3CXL
ES+
Each ES+ board consists of one Baseboard, one Link Daughter card, and one EARL Daughter card.
The Baseboard has no flavors.
Link Card flavors
  a. 4 ports of 10 Gigabit Ethernet (XFP form factor)
  b. 40 ports of 1 Gigabit Ethernet (SFP form factor)
  c. 2 ports of 10 Gigabit Ethernet (XFP form factor)
  d. 20 ports of 1 Gigabit Ethernet (SFP form factor)
  e. 2 ports of 10 OTN Gigabit Ethernet (SFP form factor)
  f. 4 ports of 10 OTN Gigabit Ethernet (SFP form factor)
  g. 2 ports of 10 Gigabit Ethernet (XFP form factor) and 20 ports of 1 Gigabit Ethernet (SFP form factor)
  h. 1 port of 10 Gigabit Ethernet (XFP form factor) and 10 ports of 1 Gigabit Ethernet (SFP form factor)
EARL Card flavors
  a. 3C (Lite)
  b. 3CXL
ES+ Troubleshooting
Getting Started
  The linecard console is needed for debugging many problems.
  Connect to the ES+ console using the attach <slot> CLI command.
  Get a good snapshot from the RP console using show hw-module slot <slot> tech-support.
  To display the versions of the various devices on the ES+, use the command show platform hardware version.
Troubleshooting card state - fault isolation
  Check the RP and ES+ console logs; there is almost always a message that tells you why the card did not come up.
  Check the card LED status.
  Use show module to check the current status.
  Watch out for FPD: some devices may not be updated to the latest version required by the IOS image.
  Incompatible FPD version.
  Not enough power.
ES+ Modules - Hardware and Software Requirements
Hardware requirements
  Supported by all the Cisco 7600 series routers: 7604, 7606, 7609, 7613 (not in slot 1-8), 7606-S, and 7609-S.
  7600-ES+xx is supported by all SUP720 models except PFC3A.
  7600-ES+xx is supported with RSP720.
  7600-ES+xx is not supported by SUP2 or SUP32.
Software requirements
  Supported from version 12.2(33)SRD of the native IOS image.
  CatOS and Hybrid images are not supported.
show module                          To check the status of the linecard
show power                           To check the available power
show hw-module [all|slot <slot>] fpd To check the FPD version
show log                             Logs will always give some info
attach <slot>                        To check the ES+ console messages
Troubleshooting interface state - common problems
  Incorrect optics
  Optics not matched on both ends
  Unsupported optics
routerdfc12#sh platform hardware transceiver ?
  brief      Brief device information
  config     Device configuration
  counters   Device statistics
  errors     Device error information
  registers  Device register contents
  status     Device status
Transceiver Verification
Router#show module 8
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  8     4 7600 ES+                               7600-ES+4TG3CXL    XXXABCDXXX
Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  8 001f.9e13.76e0 to 001f.9e13.76ef   0.303  12.2(33r)SRD 12.2(2008102 Ok
Mod Sub-Module                  Model              Serial      Hw      Status
--- --------------------------- ------------------ ----------- ------- -------
  8 7600 ES+ DFC XL             7600-ES+3CXL       XXXABCDXXX  0.200   Ok
  8 7600 ES+ 4x10GE XFP         7600-ES+4TG        XXXABCDXXX  0.250   Ok
Mod  Online Diag Status
---- -------------------
  8  Pass
Router#show interfaces status module 8
Port   Name  Status      Vlan    Duplex  Speed  Type
Te8/1        connected   routed  full    10G    10Gbase-LR
Te8/2        disabled    1       full    10G    DWDM-51.72
Te8/3        notconnect  1       full    10G    No Connector
Te8/4        disabled    1       full    10G    No Connector
Transceiver Verification
Router#show idprom interface te8/1
IDPROM for transceiver TenGigabitEthernet8/1:
  Description                             = XFP optics (type 6)
  Transceiver Type:                       = OC192 + 10GBASE-L (97)
  Product Identifier (PID)                = XFP-10GLR-OC192SR
  Vendor Revision                         = 05
  Serial Number (SN)                      = ONT11021087
  Vendor Name                             = CISCO-OPNEXT
  Vendor OUI (IEEE company ID)            = 00.0B.40 (2880)
  CLEI code                               = WMOTBEVAAB
  Cisco part number                       = 10-1989-02
  Device State                            = Enabled.
  Date code (yy/mm/dd)                    = 07/01/09
  Connector type                          = LC.
  Encoding                                = 64B/66B
                                            SONET Scrambled
                                            NRZ
  Minimum bit rate                        = 9900 Mbits/s
  Maximum bit rate                        = 10500 Mbits/s
  Power dissipation class                 = Pwr Level 2 (2.5 W max pwr dissipation).
  cdr function                            = supported.
  Tx Reference clock                      = not Required.
  Max link length for SMF fiber           = 10 km
  Max link length for EBW 50/125um fiber  = not supported
  Max link length for 0/125um fiber       = not supported
  Max link length for 62.5/125um fiber    = not supported
  Max link length for copper              = not supported
  Tx device technology                    = 1550 nm DFB
  Wavelength control technology           = no wavelength control
  Transceiver cooling technology          = un-cooled transmitter device
  Detector type                           = PIN detector
  Transmitter tuning                      = transmitter not tunable
  Supported CDR rates                     = 9.95 Gb/s
                                            10.3 Gb/s
                                            10.5 Gb/s
  Nominal laser wavelength                = 1310 nm
  Wavelength tolerance w.r.t. nominal     = (+/-) 20 nm ...
Router#show interfaces te8/1 transceiver
Transceiver monitoring is disabled for all interfaces.
ITU Channel not available (Wavelength not available),
Transceiver is internally calibrated.
If device is externally calibrated, only calibrated values are printed.
++ : high alarm, + : high warning, - : low warning, -- : low alarm.
NA or N/A: not applicable, Tx: transmit, Rx: receive.
mA: milliamperes, dBm: decibels (milliwatts).
                                            Optical   Optical
        Temperature  Voltage  Current       Tx Power  Rx Power
Port    (Celsius)    (Volts)  (mA)          (dBm)     (dBm)
------  ----------   ------   -------       --------  --------
Te8/1   35.0         0.00     51.5 --       3.1       3.5
ES+ LC IOS Crash
In the event of an ES+ linecard crash with errors such as:
*Mar 15 15:14:40.875 IST: %PM_SCP-SP-1-LCP_FW_ERR: System resetting module 7 to recover from error: x40g_crashinfo_init: Linecard received system exception
*Mar 15 15:14:40.875 IST: %OIR-SP-3-PWRCYCLE: Card in module 7, is being power-cycled Off (Module Reset due to exception or user request)
*Apr 6 11:16:55.122: %NP_DEV-DFC3-2-WATCHDOG: Watchdog detected on NP 2
Obtain the crashinfo file named in a message like the one below:
*Writing crashinfo to bootdisk:crashinfo_20110315151433-IST
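To match a crashinfo file to the corresponding syslog entries, the timestamp embedded in the file name can be extracted. This Python sketch assumes the name follows a `crashinfo_YYYYMMDDHHMMSS-ZONE` layout, which is inferred from the sample above (20110315151433 lines up with the Mar 15 15:14 IST log entries) rather than taken from documentation. The function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed file name layout: crashinfo_YYYYMMDDHHMMSS[-ZONE]
CRASHINFO_RE = re.compile(r"crashinfo_(\d{14})(?:-(\w+))?$")


def crashinfo_timestamp(path):
    """Return (datetime, zone) parsed from a crashinfo path, or None if no match."""
    match = CRASHINFO_RE.search(path)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return stamp, match.group(2)


if __name__ == "__main__":
    when, zone = crashinfo_timestamp("bootdisk:crashinfo_20110315151433-IST")
    print(when.isoformat(), zone)
```

Sorting several crashinfo files by this timestamp makes it easier to tell a one-off crash from a repeating one.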
Keepalive failure reset with/without crashinfo
The ES+ card is reloaded because it stops responding to keepalives, a.k.a. "silent reload of ES+ cards".
Symptom:
*Sep 12 17:15:54.985: %OIR-SP-3-PWRCYCLE: Card in module 1, is being power-cycled off (Module not responding to Keep Alive polling)
*Sep 12 17:15:55.013: %C7600_PWR-SP-4-DISABLED: power to module in slot 1 set off (Module not responding to Keep Alive polling)
Upgrade to an IOS release with the fixes for CSCts25729 and CSCtr74953 (12.2(33)SRE5).
These two software enhancements ensure that, under conditions similar to those that previously led to a reset without crashinfo, a crashinfo or a mini-crashlog is created (and the linecard is gracefully restarted as before).
Module failed SCP download during bootup
Problem:
  The message "Module failed SCP download" is observed in the RP logs.
Root Cause:
  The message means that the boot process of the linecard did not complete. This can have a variety of causes and can be seen at different points during boot. If the ES+ card is continuously crashing during boot, the root cause could be FN63553.
Troubleshooting:
  It is recommended to collect the following outputs from the SP:
  remote command switch show oir debug all
  remote command switch show oir state-machine scp_dnld all
  remote command switch show oir state-machine oir
Watchdog Reset
Problem:
  The ES+ line card crashes during the execution of "show platform hardware config-pld" or "show platform hardware version". Both commands are included in the line card "show hw-module slot X tech-support".
Root Cause:
  In both cases the crash happens when an attempt is made to read the PLD register on the ES+ line card. The read may time out, which triggers the watchdog to restart the line card.
Known DDTS: CSCtw77894, CSCti78408, CSCtz30983
Next Action: Please contact TAC with the crashinfo in order to confirm.
Failed to Bootup in PUL Ph6
Problem: The ES+ line card crashes during bootup, and remains down, with an error like this:
%PM_SCP-SP-1-LCP_FW_ERR: System resetting module 8 to recover from error: x40g_cardmgr_event_process: Failed in PUL Ph6 NP: 3
Root Cause: All cards received for FA with this symptom have been root-caused to a failure inside the Network Processor (NP) chip.
Troubleshooting:
  Replace the failing line card.
  Ask the customer to keep the failing line card until a decision is made on whether FA is required.
Linecard received system exception
Problem: The ES+ line card crashes due to a system exception.
Symptom:
%PM_SCP-SP-1-LCP_FW_ERR: System resetting module 1 to recover from error: Linecard received system exception
%OIR-SP-3-PWRCYCLE: Card in module 1, is being power-cycled Off (Module Reset due to exception or user request)
Root Cause:
  Multiple root causes are possible. If the ES+ card is continuously crashing during boot, the root cause could be FN63553.
Crash due to EARL PATCH_INVOCATION_LIMIT
Problem: The ES+ line card crashes because the EARL patch had to be applied too many times.
Sample symptom:
%EARL-DFC1-2-PATCH_INVOCATION_LIMIT: 10 Recovery patch invocations in the last 30 secs have been attempted. Max limit reached
Root Cause: Multiple root causes are possible. Also, the issue is not limited to ES+ linecards. When EARL detects certain types of errors, it activates a 'patch'. This is effectively a restart of the ASICs connected to EARL. If the limit on the number of consecutive patches is reached, a line card crash is triggered.
Next action:
  Please collect the crashinfo and the output of:
  remote command {switch|module 1} show platform software earl reset {history|data}
  OIR the card. A software reset of the line card does not help; the card really has to be removed and re-inserted.
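The "10 invocations in the last 30 secs" condition is a sliding-window limit. The Python sketch below illustrates that idea only; it is a simplified model with hypothetical names, and it makes no claim about how the EARL driver actually counts patch invocations.

```python
from collections import deque


class PatchLimitTracker:
    """Sliding-window counter: flag when `limit` events fall within `window_s`."""

    def __init__(self, limit=10, window_s=30.0):
        # Defaults mirror the numbers quoted in the sample syslog message.
        self.limit = limit
        self.window_s = window_s
        self._events = deque()

    def record(self, timestamp_s):
        """Record one patch invocation; return True if the limit is reached."""
        self._events.append(timestamp_s)
        # Drop events that are older than the window relative to this one.
        while timestamp_s - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) >= self.limit


if __name__ == "__main__":
    tracker = PatchLimitTracker()
    # Ten invocations 3 seconds apart all fall inside one 30-second window.
    hits = [tracker.record(3.0 * i) for i in range(10)]
    print("limit reached on invocation:", hits.index(True) + 1)
```

The same invocations spread 5 seconds apart would never reach the limit, which is why occasional recovery patches do not crash the card while a burst does.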
ECC Single Bit Errors
%ECC-DFC8-3-SBE_LIMIT: Single bit error detected and corrected
%ECC-DFC8-3-SYNDROME_SBE_LIMIT: 8-bit Syndrome for the detected Single-bit error: 0x0
%PM_SCP-SP-1-LCP_FW_ERR: System resetting module 8 to recover from error: Linecard received system exception
%NP_DEV-DFC3-6-ECC_SINGLE: Recovered from a single-bit ECC error detected on NP 0, Mem 10, SubMem 0x1,SingleErr 1, DoubleErr 0 Count 63 Total 2
%ECC-DFC1-3-SBE_HARD: Single bit *hard* error detected at 0x082C7FC0
Next Action:
  On a single occurrence, no action needs to be taken. Please DO NOT RMA the card.
  Two or more occurrences of SINGLE_Bit_ECC_Error require an upgrade to one of the releases mentioned below.
  Recommended releases to take care of the problem are 12.2(33)SRD6 or later, 12.2(33)SRE3 or later, or 15.0(1)S or later.
  If the error is seen multiple times even after the upgrade, it is treated as a hard failure, and the card should be RMA'd and flagged for FA.
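The next-action rules for single-bit ECC errors can be written down as a small decision function. This is a sketch of the guidance above, not an official tool: `occurrences` is assumed to be the number of single-bit ECC events seen on the currently running release, and `on_fixed_release` is assumed to be True when the router already runs one of the recommended releases. Both names are hypothetical.

```python
def sbe_next_action(occurrences, on_fixed_release):
    """Map single-bit ECC error history to the recommended next action."""
    if occurrences <= 0:
        return "none"
    if occurrences == 1:
        # A single corrected error needs no action.
        return "no action, do not RMA"
    if not on_fixed_release:
        # Two or more occurrences on a release without the fixes.
        return "upgrade to a recommended release"
    # Multiple occurrences even after the upgrade: treated as a hard failure.
    return "RMA and flag for FA"


if __name__ == "__main__":
    for history in [(1, False), (3, False), (3, True)]:
        print(history, "->", sbe_next_action(*history))
```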
ECC Double Bit Errors
%NP_DEV-DFC5-3-ECC_DOUBLE: Double-bit ECC error detected on NP 1, Mem AB, SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1
AB refers to values 16 - 19
Root Cause
  Double-bit ECC errors on the ES+ family of cards on NP Mem 16 to NP Mem 19 may be caused by sub-optimal HW programming; these were taken care of through several software fixes.
  One or more occurrences of DOUBLE_Bit_ECC_Error require an upgrade to one of the releases mentioned below.
  Recommended releases to take care of the problem are 12.2(33)SRD6 or later, 12.2(33)SRE3 or later, or 15.0(1)S or later.
  If the double-bit error is for "Mem 17", there is a newer fix committed via CSCtn95122: SRE5, 15.0(1)S4 or later.
  If the error is seen multiple times even after the upgrade, it is treated as a hard failure, and the card should be RMA'd and flagged for FA.
Parity Errors
Soft parity errors: These errors occur when an energy level within the chip (for example, a one or a zero) changes, most often due to radiation. When referenced by the CPU, such errors cause the system to crash. In case of a soft parity error, there is no need to swap the board or any of the components.
Hard parity errors: These errors occur when there is a chip or board failure that corrupts data. In this case, you need to re-seat or replace the affected component, which usually involves a memory chip swap or a board swap.
Next Action: At the first occurrence it is not possible to distinguish between a soft and a hard parity error. From experience, most parity errors are soft and can usually be dismissed. Keep the system under monitoring. If the error recurs within a very short time interval, it could be a hard parity error, in which case the module should be replaced.
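The soft-versus-hard distinction comes down to whether the error recurs quickly. The sketch below expresses that rule in Python under one loud assumption: the source does not say what counts as a "very short time interval", so the threshold is a required argument that the operator must choose. The function name is hypothetical.

```python
def classify_parity_errors(timestamps_s, short_interval_s):
    """Suggest an action from the times (in seconds) of observed parity errors.

    `short_interval_s` is an operator-chosen threshold; the slide gives no value.
    """
    ordered = sorted(timestamps_s)
    if len(ordered) < 2:
        # A first occurrence cannot be classified as soft or hard.
        return "monitor"
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if min(gaps) <= short_interval_s:
        return "suspect hard parity error, replace module"
    return "monitor"


if __name__ == "__main__":
    one_hour = 3600  # example threshold only, chosen for illustration
    print(classify_parity_errors([0, 600], one_hour))
```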
TestMacNotification Diag Test Error
Problem Symptom:
The following messages may be seen on the console:
*Mar 10 10:25:53.562: %FABRIC_INTF_ASIC-DFC9-4-FABRICCRCERRS: Fabric ASIC 0: 322 Fabric CRC error events in 100ms period
*Mar 10 10:26:31.071: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 9 TestMacNotification consecutive failure count:5
Troubleshooting and Recommendation:
  Check whether there are any Fabric CRC errors ("FABRICCRCERRS") in the logs.
  If there are Fabric CRC errors followed by a TestMacNotification test failure, the issue could be caused by CSCto55567; upgrade to release 12.2(33)SRE5 or higher.
  If the TestMacNotification test fails without Fabric CRC errors, contact TAC; this requires complete data path debugging.
Related DDTS: CSCto55567
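The check described above, whether Fabric CRC errors precede the TestMacNotification failure, can be automated over a saved log. The following Python sketch is illustrative: it relies on simple substring matching of the mnemonics shown in the sample messages, and the function name and return strings are hypothetical.

```python
def triage_test_mac_notification(log_lines):
    """Return a hint based on whether FABRICCRCERRS precedes the test failure."""
    crc_errors_seen = False
    for line in log_lines:
        if "FABRICCRCERRS" in line:
            crc_errors_seen = True
        elif "HM_TEST_FAIL" in line and "TestMacNotification" in line:
            if crc_errors_seen:
                return "possible CSCto55567"
            return "contact TAC"
    return None  # no TestMacNotification failure in this log


if __name__ == "__main__":
    log = [
        "*Mar 10 10:25:53.562: %FABRIC_INTF_ASIC-DFC9-4-FABRICCRCERRS: "
        "Fabric ASIC 0: 322 Fabric CRC error events in 100ms period",
        "*Mar 10 10:26:31.071: %CONST_DIAG-SP-3-HM_TEST_FAIL: "
        "Module 9 TestMacNotification consecutive failure count:5",
    ]
    print(triage_test_mac_notification(log))
```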
EOBC Jam or Freeze Error: Router crash after inserting ES+
Sup720:
%EOBC-SP-0-EOBC_JAM_FATAL: Primary supervisor in slot 5 is jamming the EOBC channel. It has been disabled. Supervisor will return to ROMMON
RSP720:
TSEC-SP-3-RESTART: Interface EOBC0/0 Restarted Due to TX Freeze Error
TSEC-SP-2-EXCEPTION: Fatal Error, Interface EOBC0/0 not transmitting
Root Cause: CSCtu50337
This issue may only occur if the RSP/SUP is running HW revision 5.0/5.1/5.2 and the ... should be NON-S type.
If you encounter this issue, please contact TAC, referring to this DDTS.
Register Read Errors
Problem Symptom:
The following messages may be seen on the console or in the syslog:
DFC7: ERROR! number: 0x80003902, NPprmReg_Read_NP_3c: register 6 is not supported for NP-3c2.
DFC7: ERROR! number: 0x80003902, Failed to read register Id: 6.
DFC7: ERROR! number: 0x80003902, Failed to read register Id: 6.
DFC1: NPcfgStatistics_ReadStatMsg_NP_3c return error
DFC1: ERROR! number: 0x800000FF, Unexpected error, Counter type is not long neither double.
Next action:
  These messages can safely be ignored.
Related DDTS: CSCsy88170
DBUS Header Error
Problem Description
The following error message is seen on the console:
%EARL_L2_ASIC-DFC1-4-DBUS_HDR_ERR: EARL L2 ASIC #0: Dbus Hdr. Error occurred. Ctrl1 0x930D0EBD
Root Cause
  Whenever EARL receives a bad packet, or a garbage signal that is treated as a packet, it triggers a DBUS_HDR_ERR interrupt.
Troubleshooting and Recommendation
  If it happens once, randomly and at very low frequency, and is not reproducible, it can be ignored.
  If it happens randomly and at high frequency, and is not reproducible, it could be a hardware failure. RMA is recommended.
  If it can be reproduced consistently, please collect the packet dumps (see the steps below).
DBUS Header Error Contd..
To display the number of DBUS_HDR_ERR errors (on the LC):
show platform hardware superman fwdstats | i DBus Header
DBus Header Checksum errors = 0x0000000000000000 (0)
Collecting the offending packets:
  ELAM can be used to collect the offending packet. This way we can verify whether the packet is always the same or different.
  If the offending packet is always the same, a bad end device is a possibility.
  If the offending packet keeps changing, a hardware / software issue is a possibility.
  PLEASE get TAC's help to capture the offending packet using the ELAM tool.
Related DDTS: CSCtg31984
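To track whether the DBus header error counter is increasing between two collections, the counter line can be parsed into an integer. The Python sketch below assumes the `= 0x<hex> (<decimal>)` layout shown in the sample output; the function names are hypothetical, and the triage helper simply restates the three cases from the recommendation in code.

```python
import re

# Assumed counter layout: "<name> = 0x<hex> (<decimal>)"
COUNTER_RE = re.compile(r"=\s*0x([0-9A-Fa-f]+)\s*\((\d+)\)")


def parse_counter(line):
    """Return the counter value from a fwdstats line, or None if not found."""
    match = COUNTER_RE.search(line)
    if match is None:
        return None
    value = int(match.group(1), 16)
    # The hex and decimal renderings should agree; treat a mismatch as garbled.
    if value != int(match.group(2)):
        raise ValueError("hex and decimal counter values disagree")
    return value


def dbus_hdr_err_action(reproducible, high_frequency):
    """Restate the troubleshooting guidance as a decision."""
    if reproducible:
        return "collect packet dumps with TAC (ELAM)"
    if high_frequency:
        return "possible hardware failure, RMA recommended"
    return "can be ignored"


if __name__ == "__main__":
    print(parse_counter("DBus Header Checksum errors = 0x0000000000000000 (0)"))
```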
Fabric Errors
1. Fabric Sync Failure
   %C6KPWR-SP-4-DISABLED: power to module in slot 4 set off (Fabric channel errors)
2. Fabric CRC Errors
   %FABRIC_INTF_ASIC-DFC2-4-FABRICCRCERRS: Fabric ASIC 0: 5 Fabric CRC error events in 100ms period
3. Repeated Fabric Sync
   %FABRIC_INTF_ASIC-DFC10-5-FABRICSYNC_REQ
4. Fabric Channel Counter Errors
   Error counters incrementing on a specified fabric channel.
   show fabric errors
Fabric Errors Contd..
Root Cause
  This is usually a transient issue on the fabric channels, potentially due to improper insertion of any LCs on the backplane.
Troubleshooting and Recommendations
1. Collect the outputs below from the RP console:
   show fabric channel-counters
   show fabric errors
   show fabric drops
2. Collect the outputs below from the LC console:
   show logg
   show platform hardware ssa {brief|counter|error|fabricmon} {history|registers}
Possible actions that can be tried:
1. Re-seat the module.
2. Try a switchover of the supervisor module.
TCAM Write Inconsistency Errors
Problem Description
"MLSCEF-DFC4-2-FIB_TCAM_WRITE_INCONSISTENCY: FIB TCAM Mismatch for value: Index: 154 Expected: Entry: 0xB1-0x0005000B-0x40000000 Hardware: Entry: 0x00-0x00000000-0x00000000"
Root Cause
  The above message indicates that there is an issue with a TCAM write.
  Try an OIR/re-seat of the linecard and check whether the problem is solved.
  If the linecard is crashing consistently with this error, collect the "show tech" output and contact TAC for further assistance.
NPU Cluster Error
Problem Description
%NP_DEV-DFC7-3-PERR: Non-recoverable Parity error detected on NP 0, cause 39 count 1 uqParityMask 0x2000000, uqSRAMLine 0x90, bRecov 1, bRewr 1 Total 1
Root Cause
  This problem generally happens due to corruption of the internal NPU SRAM memory.
  RMA the card and raise it for EFA.
ES+ Combo Card Power Denied Issue
Problem Description
  A card is powered down due to power denied when ES+ combo cards are in the same chassis.
  ES+ Combo variants have wrong power values programmed in the idprom, causing them to be allocated more power than specified in the power calculator.
  This might lead to insufficient power, causing a module to power down with "FRU-power denied".
  9 76-ES+XC-20G3CXL 405.72 9.66 - - on off (FRU-power denied)
Contd..
The right power values for ES+XC are listed below:
                    Pwr-Requested
Card Type           Watts    A @42V
76-ES+XC-20G3C      309.12    7.36
76-ES+XC-20G3CXL    337.26    8.03
76-ES+XC-40G3C      399       9.5
76-ES+XC-40G3CXL    427.14   10.17
Values higher than the above will proportionally increase "total available power", which would cause other modules to power down with insufficient power.
Troubleshooting and Recommendations
1. Using "show power", confirm whether the power values for ES+XC differ from the table above.
2. Check whether any of the modules in the same chassis fail to power up with the error "power denied", using "show power" or "show module".
On the 7600/ES+ platform, the recommended releases to take care of the problem are 12.2(33)SRE4 or later, or 15.0(1)S4 or later.
Known DDTS: CSCtn41667
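The table values are consistent with a simple watts-to-amps conversion at 42 V, which also makes it easy to spot a card that requests too much power. The Python sketch below uses only the figures quoted above; the dictionary and function names are hypothetical helpers.

```python
NOMINAL_VOLTAGE = 42.0  # the table lists current "@42V"

# Correct power requests for the ES+XC combo cards, from the table above.
EXPECTED_WATTS = {
    "76-ES+XC-20G3C": 309.12,
    "76-ES+XC-20G3CXL": 337.26,
    "76-ES+XC-40G3C": 399.0,
    "76-ES+XC-40G3CXL": 427.14,
}


def amps_at_42v(watts):
    """Convert a power request in watts to amperes at 42 V (2 decimals)."""
    return round(watts / NOMINAL_VOLTAGE, 2)


def excess_watts(model, requested_watts):
    """Return how many watts a card requests beyond the expected value."""
    return round(requested_watts - EXPECTED_WATTS[model], 2)


if __name__ == "__main__":
    # Values from the sample "show power" line: 405.72 W requested.
    model, requested = "76-ES+XC-20G3CXL", 405.72
    print("requested amps:", amps_at_42v(requested))
    print("excess watts:", excess_watts(model, requested))
```

A non-zero excess for a combo card indicates the mis-programmed idprom values described in this section.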
Questions ?
