
Cisco Nexus 7000 Series NX-OS Troubleshooting Guide

This article provides guidance for troubleshooting issues that may appear when using the Cisco Nexus 7000 Series. It introduces tools and methodologies to recognize a problem, determine its cause, and find possible solutions. However, this documentation helps only with basic troubleshooting. We encourage users to review the Cisco Live presentation for detailed Nexus 7000 troubleshooting. Sections of this presentation cover both platform-independent and platform-specific step-by-step troubleshooting for the most common issues. Access to this presentation is free. Follow these instructions to access the presentation:

1. Visit https://www.ciscolivevirtual.com/
2. Register for free.
3. Click the "Cisco Live Virtual" link.
4. Click the "Sessions" tab at the top, and select "2011 Sessions Catalog".
5. In the search box, type "BRKCRS-3144" and submit the search.
6. Select the session. You can either view the session or download the PDF.

Contents
1 Audience and Generating a PDF of This Guide
2 Organization

Audience and Generating a PDF of This Guide


This article is for experienced network administrators who configure and maintain NX-OS devices.

Cisco Nexus 7000 Series NX-OS Troubleshooting Guide -- Book PDF

Organization
This article is organized into the following sections:

- Troubleshooting Overview
- Troubleshooting Installs, Upgrades, and Reboots
- Troubleshooting Licensing
- Troubleshooting VDCs
- Troubleshooting CFS
- Troubleshooting Ports
- Troubleshooting vPCs
- Troubleshooting VLANs
- Troubleshooting STP
- Troubleshooting Routing
- Troubleshooting Unicast Traffic
- Troubleshooting Multicast Traffic
- Troubleshooting WCCP
- Troubleshooting Memory
- Troubleshooting Packet Flow Issues
- Before Contacting Technical Support
- Troubleshooting Tools and Methodology

This article introduces the basic concepts, methodology, and general troubleshooting guidelines for problems that may occur when configuring and using Cisco NX-OS.



Contents
1 Overview of the Troubleshooting Process
  1.1 Gathering Information
  1.2 Verifying Ports
  1.3 Verifying Layer 2 Connectivity
  1.4 Verifying Layer 3 Connectivity
2 Overview of Symptoms
3 System Messages
  3.1 System Message Text
  3.2 syslog Server Implementation
4 Troubleshooting with Logs
5 Troubleshooting Modules
6 Viewing NVRAM logs
7 Contacting Customer Support
8 See Also
9 Further Reading
10 External Links

Overview of the Troubleshooting Process


To troubleshoot your network, follow these general steps:

1. Maintain a consistent Cisco NX-OS release across all your devices.
2. See the Cisco NX-OS release notes for your Cisco NX-OS release for the latest features, limitations, and caveats.
3. Enable system message logging. See System Messages.
4. Troubleshoot any new configuration changes after implementing the change.
5. Gather information that defines the specific symptoms. See Gathering Information.
6. Verify the physical connectivity between your device and end devices. See Verifying Ports.
7. Verify the Layer 2 connectivity. See Verifying Layer 2 Connectivity.
8. Verify the end-to-end connectivity and the routing configuration. See Verifying Layer 3 Connectivity.
9. After you have determined that your troubleshooting attempts have not resolved the problem, contact Cisco TAC or your technical support representative.

Note: View the Cisco Nexus 7000 instructional videos for an overview of Cisco NX-OS.

Gathering Information
This section describes the tools that are commonly used to troubleshoot problems within your network. Specific troubleshooting articles may include additional tools and commands specific to the symptoms and possible problems covered in that article.

Note: You should have an accurate topology of your network to isolate problem areas. Contact your network architect for this information.

Use the following commands to gather general information on your device:

- show module
- show version
- show running-config
- show logging log
- show interface brief
- show vlan
- show spanning-tree
- show {ip | ipv6} routing
- show processes | include ER
- show accounting log

Verifying Ports
Answer the following questions to verify that your ports are connected correctly and are operational:

- Are you using the correct media (copper, optical, fiber type)?
- Is the media broken or damaged?
- Is the port LED green on the module?
- Is the interface in the correct VDC? Use the show vdc membership command to check which VDC the interface is a member of. You must log into the device with the network-admin role to use this command.
- Is the interface operational? Use the show interface brief command. The status should be up.

See Troubleshooting Ports for more troubleshooting tips for ports.

Verifying Layer 2 Connectivity


Use the following commands to verify Layer 2 connectivity:

- Use the show vlan all-ports command to verify that all the necessary interfaces are in the same VLAN. The status should be active for the VLAN.
- Use the show port-channel compatibility-parameters command to verify that all the ports in a port channel are configured the same for the speed, the duplex, and the trunk mode.
- Use the show running-config spanning-tree command to verify that the Spanning Tree Protocol (STP) is configured the same on all devices in the network.
- Use the show processes | include ER command to verify that no essential Layer 2 processes are in the error state.
- Use the show spanning-tree blockedports command to display the ports that are blocked by STP.
- Use the show mac address-table dynamic vlan command to determine if learning or aging is occurring at each node.

See Troubleshooting VLANs and Troubleshooting STP for more information on troubleshooting Layer 2 issues.

Verifying Layer 3 Connectivity


Answer the following questions to verify Layer 3 connectivity:

- Have you configured a default gateway?
- Have you configured the same dynamic routing protocol parameters throughout your routing domain or configured static routes?
- Are any IP access lists, filters, or route maps blocking route updates?

Use the following commands to verify your routing configuration:

- show arp
- show ip routing
- show platform forwarding

See Ping and Traceroute to verify Layer 3 connectivity. See Troubleshooting Routing for more information on troubleshooting Layer 3 issues.

Overview of Symptoms
This article uses a symptom-based troubleshooting approach that allows you to diagnose and resolve your Cisco NX-OS problems by comparing the symptoms that you observed in your network with the symptoms listed in each chapter.

By comparing the symptoms in this publication to the symptoms that you observe in your own network, you should be able to diagnose and correct software configuration issues and inoperable hardware components so that the problems are resolved with minimal disruption to the network. Those problems and corrective actions include the following:

- Identify key Cisco NX-OS troubleshooting tools.
- Obtain and analyze protocol traces using SPAN and RSPAN or Ethanalyzer on the CLI.
- Identify or rule out physical port issues.
- Identify or rule out switch module issues.
- Diagnose and correct Layer 2 issues.
- Diagnose and correct Layer 3 issues.
- Recover from switch upgrade failures.
- Obtain core dumps and other diagnostic data for use by Cisco TAC or your customer support representative.


System Messages
The system software sends syslog (system) messages to the console (and, optionally, to a logging server on another device). Not all messages indicate a problem with your device. Some messages are purely informational, while others might help diagnose problems with links, internal hardware, or the device software.

System Message Text


Message text is a text string that describes the condition. This portion of the message might contain detailed information about the event, including terminal port numbers, network addresses, or addresses that correspond to locations in the system memory address space. Because the information in these variable fields changes from message to message, it is represented here by short strings enclosed in square brackets ([ ]). A decimal number, for example, is represented as [dec].

PORT-3-IF_UNSUPPORTED_TRANSCEIVER: Transceiver for interface [chars] is not supported.

Use this string to find the matching system message in the NX-OS System Messages Reference. Each system message is followed by an explanation and recommended action. The action may be as simple as "No action is required." It may involve a fix or a recommendation to contact technical support, as shown in the following example:

Error Message: PORT-3-IF_UNSUPPORTED_TRANSCEIVER: Transceiver for interface [chars] is not supported.
Explanation: Transceiver (SFP) is not from an authorized vendor.
Recommended Action: Enter the show interface transceiver CLI command or similar DCNM command to determine the transceiver being used. Contact your customer support representative for a list of authorized transceiver vendors.
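When searching the System Messages Reference from a script, it can help to split a message into its parts first. The following Python sketch assumes the common FACILITY-SEVERITY-MNEMONIC: text layout that the sample message follows; the names MESSAGE_RE, SystemMessage, and parse_system_message are hypothetical helpers and are not part of Cisco NX-OS.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: FACILITY-SEVERITY-MNEMONIC: text
# A leading "%" is optional because sample messages appear with and without it.
MESSAGE_RE = re.compile(
    r"%?(?P<facility>[A-Z][A-Z0-9_]*)-(?P<severity>[0-7])-"
    r"(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)


class SystemMessage(NamedTuple):
    facility: str   # for example PORT
    severity: int   # 0 (most severe) through 7 (least severe)
    mnemonic: str   # for example IF_UNSUPPORTED_TRANSCEIVER
    text: str       # free-form message text


def parse_system_message(line: str) -> Optional[SystemMessage]:
    """Return the parsed message, or None if the line has no message header."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    return SystemMessage(
        match["facility"],
        int(match["severity"]),
        match["mnemonic"],
        match["text"].strip(),
    )


if __name__ == "__main__":
    sample = ("PORT-3-IF_UNSUPPORTED_TRANSCEIVER: "
              "Transceiver for interface [chars] is not supported.")
    print(parse_system_message(sample))
```

The facility and mnemonic together form the string to look up in the reference; the severity digit indicates how urgent the message is.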

syslog Server Implementation


The syslog facility allows the Cisco NX-OS device to send a copy of the message log to a host for more permanent storage. This feature allows you to examine the logs over a long period of time or when the Cisco NX-OS device is not accessible.

This example shows how to configure a Cisco NX-OS device to use the syslog facility on a Solaris platform. Although a Solaris host is used, the syslog configuration on all UNIX and Linux systems is very similar.

syslog uses the facility and the message severity to determine how to handle a message on the syslog server (the Solaris system in this example). The syslog server can handle different message severities differently, for example by logging them to different files or e-mailing them to a particular user. Specifying a severity level on the syslog server means that the configured action applies to all messages of that level and of greater severity (lower number).
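The "that level and greater severity (lower number)" rule can be illustrated with a short Python sketch. It uses the standard syslog numbering (severities 0 through 7, facilities local0 through local7 as codes 16 through 23); the function names are hypothetical, and a real syslog daemon supports more selector syntax than this sketch handles.

```python
# Standard syslog severity numbers: a lower number means a more severe message.
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

# Standard syslog facility codes for local0 through local7.
FACILITIES = {f"local{i}": 16 + i for i in range(8)}


def priority(facility: str, severity: str) -> int:
    """Compute the syslog priority value: facility code * 8 + severity."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


def selector_matches(selector: str, facility: str, severity: str) -> bool:
    """Check a simple 'facility.severity' selector against one message.

    The selector matches messages of the named severity and of any
    greater severity (lower number) for the same facility.
    """
    sel_facility, sel_severity = selector.split(".")
    return (sel_facility == facility
            and SEVERITIES[severity] <= SEVERITIES[sel_severity])


if __name__ == "__main__":
    print(selector_matches("local1.notice", "local1", "err"))   # more severe
    print(selector_matches("local1.notice", "local1", "info"))  # less severe
    print(priority("local1", "notice"))
```

With a local1.notice selector, an err message (level 3) is acted upon, while an info message (level 6) is not.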

Note: You should configure the syslog server so that the Cisco NX-OS messages are logged to a different file from the standard syslog file so that they cannot be confused with other non-Cisco syslog messages. Do not locate the logfile on the / file system. You do not want log messages to fill up the / file system.

This example uses the following values:

- syslog client: switch1
- syslog server: 172.22.36.211 (Solaris)
- syslog facility: local1
- syslog severity: notifications (level 5, the default)
- File to log Cisco NX-OS messages to: /var/adm/nxos_logs

To configure the syslog feature on Cisco NX-OS, follow these steps:

1. switch1# config terminal
2. switch1(config)# logging server 172.22.36.211 5 facility local1

Use the show logging server command to verify the syslog configuration.

switch1# show logging server
Logging server:                 enabled
{172.22.36.211}
        server severity:        notifications
        server facility:        local1
        server VRF:             management

To configure a syslog server, follow these steps:

1. Modify /etc/syslog.conf to handle local1 messages. For Solaris, you must allow at least one tab between the facility.severity and the action (/var/adm/nxos_logs).
   local1.notice	/var/adm/nxos_logs
2. Create the log file.
   touch /var/adm/nxos_logs
3. Restart the syslog process.
   /etc/init.d/syslog stop
   /etc/init.d/syslog start
   syslog service starting.
4. Verify that the syslog process has started.
   ps -ef |grep syslogd
   root 23508 1 0 11:01:41 ? 0:00 /usr/sbin/syslogd
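A missing tab in /etc/syslog.conf is easy to overlook because tabs and spaces look alike in most editors. The following Python sketch is a deliberately conservative check of one entry against the Solaris rule described in step 1; check_syslog_conf_entry is a hypothetical helper, and the check assumes that a space next to the separator is a mistake.

```python
def check_syslog_conf_entry(line: str) -> bool:
    """Return True if selector and action are separated by tabs only.

    Conservative sketch: any space between the selector
    (facility.severity) and the action is treated as an error.
    """
    selector, separator, rest = line.rstrip("\n").partition("\t")
    if not separator:
        return False            # no tab at all
    action = rest.lstrip("\t")  # allow more than one tab
    if not selector or not action:
        return False
    if " " in selector or action.startswith(" "):
        return False            # a space sits next to the separator
    return "." in selector      # selector must look like facility.severity


if __name__ == "__main__":
    print(check_syslog_conf_entry("local1.notice\t/var/adm/nxos_logs"))
    print(check_syslog_conf_entry("local1.notice /var/adm/nxos_logs"))
```

The first call uses a tab and passes; the second uses a space and is rejected.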

Test the syslog server by creating an event in Cisco NX-OS. In this case, port e1/2 was shut down and reenabled, and the following was listed on the syslog server. The IP address of the switch is listed in brackets.

tail -f /var/adm/nxos_logs

Sep 17 11:07:41 [172.22.36.142.2.2] : 2004 Sep 17 11:17:29 pacific: PORT-5-IF_DOWN_INITIALIZING: %$VLAN 1%$ Interf

Sep 17 11:07:49 [172.22.36.142.2.2] : 2004 Sep 17 11:17:36 pacific: %PORT-5-IF_UP: %$VLAN 1%$ Interface e 1/2 is u

Sep 17 11:07:51 [172.22.36.142.2.2] : 2004 Sep 17 11:17:39 pacific: %VSHD-5-VSHD_SYSLOG_CONFIG_I: Configuring cons

Troubleshooting with Logs


Cisco NX-OS generates many types of system messages on the device and sends them to a syslog server. You can view these messages to determine what events may have led up to the current problem condition that you are facing.

Use the following commands to access and view logs in Cisco NX-OS:

switch# show logging ?
  console       Show console logging configuration
  info          Show logging configuration
  internal      syslog syslog internal information
  ip            IP configuration
  last          Show last few lines of logfile
  level         Show facility logging configuration
  logfile       Show contents of logfile
  loopback      Show logging loopback configuration
  module        Show module logging configuration
  monitor       Show monitor logging configuration
  nvram         Show NVRAM log
  onboard       show logging onboard
  pending       server address pending configuration
  pending-diff  server address pending configuration diff
  server        Show server logging configuration
  session       Show logging session status
  status        Show logging status
  timestamp     Show logging timestamp configuration

This example shows the output of the show logging server command:

switch# show logging server
Logging server:                 enabled
{172.28.254.254}
        server severity:        notifications
        server facility:        local7
        server VRF:             management

Troubleshooting Modules
You can directly connect to a module console port to troubleshoot module bootup issues. Use the attach console module command to connect to the module console port.

Viewing NVRAM logs


System messages that are priority 0, 1, or 2 are logged into NVRAM on the supervisor module. After a switch reboots, you can display these syslog messages in NVRAM by using the show logging nvram command.

switch# show logging nvram
2008 Sep 10 15:51:58 switch %$ VDC-1 %$ %SYSMGR-2-NON_VOLATILE_DB_FULL: System non-volatile storage usage is unexpectedly high at 99%.
2008 Sep 10 15:52:13 switch %$ VDC-1 %$ %PLATFORM-2-PFM_SYSTEM_RESET: Manual system restart from Command Line Interface
2008 Sep 10 15:57:49 switch %$ VDC-1 %$ %KERN-2-SYSTEM_MSG: Starting kernel... kernel
2008 Sep 10 15:58:00 switch %$ VDC-1 %$ %CARDCLIENT-2-REG: Sent
2008 Sep 10 15:58:01 switch %$ VDC-1 %$ %USER-1-SYSTEM_MSG: R2D2: P1 SUP NO GMTL FOR P1 SUP - r2d2
2008 Sep 10 15:58:01 switch %$ VDC-1 %$ %USER-1-SYSTEM_MSG: R2D2: P1 SUP NO GMTL FOR P1 SUP - r2d2
2008 Sep 10 15:58:05 switch %$ VDC-1 %$ %USER-1-SYSTEM_MSG: R2D2: P1 SUP: Reset Tx/Rx during QOS INIT - r2d2
2008 Sep 10 15:58:16 switch %$ VDC-1 %$ %USER-2-SYSTEM_MSG: can't dlsym ssnmgr_is_session_command: please link this binary with ssnmgr.so! - svi
2008 Sep 10 15:58:16 switch %$ VDC-1 %$ %CARDCLIENT-2-SSE: LC_READY sent
2008 Sep 10 15:58:17 switch %$ VDC-1 %$ snmpd: load_mib_module :Error, while loading the mib module /isan/lib/libpmsnmp_common.so (/isan/lib/libpmsnmp_common.so: undefined symbol: sme_mib_get_if_info)
2008 Sep 10 15:58:17 switch %$ VDC-1 %$ %CARDCLIENT-2-SSE: MOD:6 SUP ONLINE
2008 Sep 10 15:58:17 switch %$ VDC-1 %$ %VDC_MGR-2-VDC_LIC_WARN: Service using grace period will be shutdown in 9 day(s)
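When this output is saved to a file for offline review, a small parser can separate the fields and confirm which entries fall in the priority 0 to 2 range. The Python sketch below assumes the line layout seen in the sample output (timestamp, host name, VDC tag, then a %FACILITY-SEVERITY-MNEMONIC header); parse_nvram_line and nvram_eligible are hypothetical names, and lines without that header, such as the snmpd entry, are returned as None.

```python
import re
from typing import Dict, Optional, Union

# Assumed layout, taken from the sample output:
# YYYY Mon DD HH:MM:SS host %$ VDC-n %$ %FACILITY-SEV-MNEMONIC: text
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) %\$ (?P<vdc>VDC-\d+) %\$ "
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): ?"
    r"(?P<text>.*)$"
)

Record = Dict[str, Union[str, int]]


def parse_nvram_line(line: str) -> Optional[Record]:
    """Split one log line into fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record: Record = dict(match.groupdict())
    record["severity"] = int(match["severity"])
    return record


def nvram_eligible(record: Record) -> bool:
    """True for priority 0, 1, or 2, the range that is logged into NVRAM."""
    return int(record["severity"]) <= 2


if __name__ == "__main__":
    sample = ("2008 Sep 10 15:52:13 switch %$ VDC-1 %$ "
              "%PLATFORM-2-PFM_SYSTEM_RESET: Manual system restart "
              "from Command Line Interface")
    print(parse_nvram_line(sample))
```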

Contacting Customer Support


If you are unable to solve a problem after using the troubleshooting suggestions in these articles, contact a customer service representative for assistance and further instructions. Before you call, have the following information ready to help your service provider assist you as quickly as possible:

- Date that you received the switch
- Chassis serial number (located on a label on the right side of the rear panel of the chassis)
- Type of software and release number
- Maintenance agreement or warranty information
- Brief description of the problem
- Brief explanation of the steps that you have already taken to isolate and resolve the problem

For more information on steps to take before calling Technical Support, see Before Contacting Technical Support.

See Also
Before Contacting Technical Support

Further Reading
The following links contain further information on this topic from Cisco.com: Cisco NX-OS System Messages Reference


External Links
External links contain content developed by external authors. Cisco does not review this content for accuracy. Nexus: Hands on with NX-OS, Part#1

This article describes how to identify and resolve problems that might occur when upgrading or restarting.

Contents
1 Information About Upgrades and Reboots
2 Upgrades and Reboot Checklist
3 Verifying Software Upgrades
4 Verifying a Nondisruptive Upgrade
  4.1 Using ROM Monitor Mode
5 Troubleshooting Software Upgrades and Downgrades
  5.1 Software Upgrade Ends with Error
  5.2 Upgrading Cisco NX-OS Software
6 Troubleshooting Software System Reboots
  6.1 Power-On or Switch Reboot Hangs
  6.2 Corrupted Bootflash Recovery
  6.3 Recovery from the loader> Prompt on Supervisor Modules
  6.4 Recovery from the loader> Prompt
  6.5 Recovery from the switch(boot)# Prompt
  6.6 Recovery for Systems with Dual Supervisor Modules
    6.6.1 Recovering One Supervisor Module With Corrupted Bootflash
    6.6.2 Recovering Both Supervisor Modules with Corrupted Bootflash
  6.7 System or Process Resets
  6.8 Recoverable System Restarts
  6.9 Unrecoverable System Restarts
  6.10 Standby Supervisor Fails to Boot
  6.11 Recovering the Administrator Password
7 See Also
8 Further Reading
9 External Links

Information About Upgrades and Reboots


Cisco NX-OS consists of two images: the kickstart image and the system image. To bring up the system, both images should have the same image version.

Upgrades and reboots are ongoing network maintenance activities. When performing these operations in production environments, you should try to minimize the risk of disrupting the network and know how to recover quickly when something does go wrong.

Note: This publication uses the term upgrade to refer to both Cisco NX-OS upgrades and downgrades.
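A quick pre-check before copying images is to compare the version strings embedded in the two file names. The Python sketch below assumes the naming pattern used by the example images in this guide (n7000-s1-kickstart-mz.4.0.bin and n7000-s1-mz.4.0.bin); image_info and images_match are hypothetical helpers, and the file name is only a hint, so the versions that the device itself reports remain authoritative.

```python
import re
from typing import NamedTuple, Optional

# Assumed naming pattern: <platform>-[kickstart-]mz.<version>.bin
IMAGE_RE = re.compile(
    r"^(?P<platform>.+?)-(?P<kickstart>kickstart-)?mz\.(?P<version>.+)\.bin$"
)


class ImageInfo(NamedTuple):
    platform: str
    kind: str      # "kickstart" or "system"
    version: str


def image_info(path: str) -> Optional[ImageInfo]:
    """Extract platform, kind, and version from an image file name."""
    name = path.rsplit("/", 1)[-1]   # drop any bootflash:/ or URL prefix
    match = IMAGE_RE.match(name)
    if match is None:
        return None
    kind = "kickstart" if match["kickstart"] else "system"
    return ImageInfo(match["platform"], kind, match["version"])


def images_match(kickstart_path: str, system_path: str) -> bool:
    """True if the two names look like a kickstart/system pair of one version."""
    kickstart = image_info(kickstart_path)
    system = image_info(system_path)
    if kickstart is None or system is None:
        return False
    return (kickstart.kind == "kickstart"
            and system.kind == "system"
            and kickstart.version == system.version)


if __name__ == "__main__":
    print(images_match("bootflash:///n7000-s1-kickstart-mz.4.0.bin",
                       "bootflash:///n7000-s1-mz.4.0.bin"))
```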

Upgrades and Reboot Checklist


Use the following checklist to prepare for an upgrade:

[ ] Read the Release Notes for the release that you are upgrading or downgrading to.
[ ] Ensure that an FTP or TFTP server is available to download the software images.
[ ] Copy the new image onto your supervisor modules in bootflash: or slot0:.
[ ] Use the show install all impact command to verify that the new image is healthy and the impact that the new load will have on any hardware with regards to compatibility. Check for compatibility.
[ ] Copy the startup-config file to a snapshot configuration in NVRAM. This step creates a backup copy of the startup-config file (see the Rollback chapter in the Cisco NX-OS System Management Configuration Guide).
[ ] Save your running configuration to the startup configuration.
[ ] Back up a copy of your configuration to a remote TFTP server.
[ ] Schedule your upgrade during an appropriate maintenance window for your network.

After you have completed the checklist, you are ready to upgrade the systems in your network.

Note: It is normal for the active supervisor to become the standby supervisor during an upgrade.

Note: Log messages are not saved across system reboots. However, a maximum of 100 log messages with a severity level of critical and below (levels 0, 1, and 2) are saved in NVRAM. You can view this log at any time by entering the show logging nvram command.

Verifying Software Upgrades

You can use the show install all status command from a console, SSH, or Telnet session to watch the progress of an ongoing install all command or to view the log of the last install all command. This command shows the install all output on both the active and standby supervisor modules even if you are not connected to the console terminal.

switch# show install all status
There is an on-going installation... <---------------------- in progress installation
Enter Ctrl-C to go back to the prompt.
Verifying image bootflash:/b-4.0.0.104 -- SUCCESS
Verifying image bootflash:/i-4.0.0.104 -- SUCCESS
Extracting system version from image bootflash:/i-4.0.0.104. -- SUCCESS
Extracting kickstart version from image bootflash:/b-4.0.0.104. -- SUCCESS
Extracting loader version from image bootflash:/b-4.0.0.104. -- SUCCESS

switch# show install all status
This is the log of last installation. <----------------- log of last install
Verifying image bootflash:/b-4.0.0.104 -- SUCCESS
Verifying image bootflash:/i-4.0.0.104 -- SUCCESS
Extracting system version from image bootflash:/i-4.0.0.104. -- SUCCESS
Extracting kickstart version from image bootflash:/b-4.0.0.104. -- SUCCESS
Extracting loader version from image bootflash:/b-4.0.0.104. -- SUCCESS
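If you capture this output in a terminal log, a short script can flag any step that did not end in SUCCESS. The Python sketch below assumes that each step and its result appear on one line separated by " -- ", as in the examples above; summarize_install_status is a hypothetical helper, and real output may wrap the result onto a separate line.

```python
from typing import Dict, List, Tuple, Union

Summary = Dict[str, Union[bool, List[Tuple[str, str]], List[str]]]


def summarize_install_status(output: str) -> Summary:
    """Summarize captured 'show install all status' output.

    Assumes each step line has the form '<step> -- <result>'.
    """
    in_progress = "on-going installation" in output
    steps: List[Tuple[str, str]] = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if " -- " not in line:
            continue  # banner or prompt line, not a step result
        step, _, result = line.rpartition(" -- ")
        steps.append((step.strip(), result.strip()))
    # Any result that does not start with SUCCESS is reported as failed.
    failed = [step for step, result in steps
              if not result.startswith("SUCCESS")]
    return {"in_progress": in_progress, "steps": steps, "failed": failed}


if __name__ == "__main__":
    captured = (
        "There is an on-going installation...\n"
        "Verifying image bootflash:/b-4.0.0.104 -- SUCCESS\n"
        "Verifying image bootflash:/i-4.0.0.104 -- SUCCESS\n"
    )
    print(summarize_install_status(captured))
```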

Verifying a Nondisruptive Upgrade


When you initiate a nondisruptive upgrade, Cisco NX-OS notifies all services that an upgrade is about to start and finds out whether or not the upgrade can proceed. If a service cannot allow the upgrade to proceed at this time, then the service aborts the upgrade and you are prompted to enter the show install all failure-reason command to determine the reason why the upgrade cannot proceed.

Do you want to continue with the installation (y/n)? [n] y
Install is in progress, please wait.
Notifying services about the upgrade.
>[# ] 0% -- FAIL. Return code 0x401E0066 (request timed out).
Please issue "show install all failure-reason" to find the cause of the failure. <---prompt failure-reason
Install has failed. Return code 0x401E0066 (request timed out).
Please identify the cause of the failure, and try 'install all' again.

switch# show install all failure-reason
Service: "cfs" failed to respond within the given time period.
switch#
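When you collect console output for a support case, extracting the return code and the failing service makes the report easier to read. The Python sketch below assumes the wording shown in the example output; parse_return_code and parse_failure_reason are hypothetical helpers, and other failure types may be phrased differently.

```python
import re
from typing import Optional, Tuple

# Patterns based on the wording in the example output.
RETURN_CODE_RE = re.compile(
    r"Return code (?P<code>0x[0-9A-Fa-f]+) \((?P<reason>[^)]+)\)"
)
FAILURE_RE = re.compile(r'Service:\s*"(?P<service>[^"]+)"\s*(?P<reason>.+)')


def parse_return_code(output: str) -> Optional[Tuple[int, str]]:
    """Return (numeric code, reason text) from install output, if present."""
    match = RETURN_CODE_RE.search(output)
    if match is None:
        return None
    return int(match["code"], 16), match["reason"]


def parse_failure_reason(output: str) -> Optional[Tuple[str, str]]:
    """Return (service name, reason text) from failure-reason output."""
    match = FAILURE_RE.search(output)
    if match is None:
        return None
    return match["service"], match["reason"].strip()


if __name__ == "__main__":
    print(parse_return_code(
        "Install has failed. Return code 0x401E0066 (request timed out)."))
    print(parse_failure_reason(
        'Service: "cfs" failed to respond within the given time period.'))
```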

If a failure occurs for whatever reason (such as a save runtime state failure or module upgrade failure) after the upgrade is in progress, then the device reboots disruptively because the changes cannot be rolled back. In such cases, the upgrade has failed.

If you need further assistance to determine why an upgrade is unsuccessful, you should collect the details from the show tech-support command output and the console output from the installation, if available, before you contact your technical support representative.

Using ROM Monitor Mode


If your device does not find a valid system image to load, the system will start in ROM monitor mode. ROM monitor mode can also be accessed by interrupting the boot sequence during startup. From ROM monitor mode, you can boot the device or perform diagnostic tests. On most systems, you can enter ROM monitor mode by entering the reload EXEC command and then pressing the Break key on your keyboard or by using the Break key-combination (the default Break key combination is Ctrl-C) during the first 60 seconds of startup.

Troubleshooting Software Upgrades and Downgrades


This section describes how to troubleshoot a software installation upgrade or downgrade failure.

Software Upgrade Ends with Error


Problem: The upgrade ends with an error.

Possible Cause: The standby supervisor module bootflash: file system does not have sufficient space to accept the updated image.
Solution: Use the delete command to remove unnecessary files from the file system.

Possible Cause: The specified system and kickstart images are not compatible.
Solution: Check the output of the installation process for details on the incompatibility. Possibly update the kickstart image before updating the system image.

Possible Cause: The install all command is entered on the standby supervisor module.
Solution: Enter the command on the active supervisor module only.

Possible Cause: A module was inserted while the upgrade was in progress.
Solution: Restart the installation.

Possible Cause: The system experienced a power disruption while the upgrade was in progress.
Solution: Restart the installation.

Possible Cause: An incorrect software image path was specified.
Solution: Specify the entire path for the remote location accurately.

Possible Cause: Another upgrade is already in progress.
Solution: Verify the state of the system at every stage and restart the upgrade after 10 seconds. If you restart the upgrade within 10 seconds, the command is rejected. An error message displays, indicating that an upgrade is currently in progress.

Possible Cause: Module failed to upgrade.
Solution: Restart the upgrade or use the install module command to upgrade the failed module.

Upgrading Cisco NX-OS Software


To perform an automated software upgrade on any system from the CLI, follow these steps:

1. Log into the system through the console, Telnet, or SSH port of the active supervisor.
2. Create a backup of your existing configuration file, if required.
3. Perform the upgrade by entering the install all command.
4. Exit the system console and open a new terminal session to view the upgraded supervisor module by using the show module command.

Tip: Always carefully read the output of the install all command compatibility check. This compatibility check tells you exactly what needs to be upgraded (such as the BIOS, loader, or firmware) and which modules will experience a disruptive upgrade. If you have any questions or concerns about the results, type n to stop the installation and contact the next level of support.

The following example shows an upgrade using the install all command with the source images located on an SCP server.

switch# install all system scp://testuser@tftp-server1/tftpboot/rel/qa/4.0/final/m9500-sf1ek9-mz.4.0.bin kickstart scp://testuser@tftp-server1/tftpboot/rel/qa/4.0/final/n7000-s1-kickstart-mz.4.0.bin

For scp://testuser@tftp-server1, please enter password:
For scp://testuser@tftp-server1, please enter password:

Copying image from scp://testuser@pal/tftpboot/rel/qa/4.0/final/n7000-s1-kickstart-mz.4.0.bin to bootflash:///n7000-s1-kickstart-mz.4.0.bin.
[####################] 100% -- SUCCESS

Copying image from scp://testuser@pal/tftpboot/rel/qa/4.0/final/n7000-s1-mz.4.0.bin to bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Verifying image bootflash:///n7000-s1-kickstart-mz.4.0.bin
[####################] 100% -- SUCCESS

Verifying image bootflash:///n7000-s1-mz.4.0.bin
[####################] 100% -- SUCCESS

Extracting "slc" version from image bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "ips" version from image bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "svclc" version from image bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "system" version from image bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "kickstart" version from image bootflash:///n7000-s1-kickstart-mz.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "loader" version from image bootflash:///n7000-s1-kickstart-mz.2.1.1a.bin.
[####################] 100% -- SUCCESS

Compatibility check is done:
Module  bootable  Impact          Install-type  Reason
------  --------  --------------  ------------  ------
     1       yes  non-disruptive       rolling
     2       yes  non-disruptive       rolling
     3       yes      disruptive       rolling  Hitless upgrade is not supported
     4       yes      disruptive       rolling  Hitless upgrade is not supported
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset

Images will be upgraded according to following table:
Module  Image      Running-Version   New-Version       Upg-Required
------  ---------  ----------------  ----------------  ------------
     1  slc        2.0(2b)           2.1(1a)           yes
     1  bios       v1.1.0(10/24/03)  v1.1.0(10/24/03)  no
     2  slc        2.0(2b)           2.1(1a)           yes
     2  bios       v1.1.0(10/24/03)  v1.1.0(10/24/03)  no
     3  ips        2.0(2b)           2.1(1a)           yes
     3  bios       v1.1.0(10/24/03)  v1.1.0(10/24/03)  no
     4  svclc      2.0(2b)           2.1(1a)           yes
     4  svcsb      1.3(5m)           1.3(5m)           no
     4  svcsb      1.3(5m)           1.3(5m)           no
     4  bios       v1.1.0(10/24/03)  v1.1.0(10/24/03)  no
     5  system     2.0(2b)           2.1(1a)           yes
     5  kickstart  2.0(2b)           2.1(1a)           yes
     5  bios       v1.1.0(10/24/03)  v1.1.0(10/24/03)  no
     5  loader     1.2(2)            1.2(2)            no
     6  system     2.0(2b)           2.1(1a)           yes
     6  kickstart  2.0(2b)           2.1(1a)           yes
     6  bios       v1.1.0(10/24/03)  v1.1.0(10/24/03)  no
     6  loader     1.2(2)            1.2(2)            no

Do you want to continue with the installation (y/n)? [n] y

Install is in progress, please wait.

Syncing image bootflash:///n7000-s1-kickstart-mz.4.0.bin to standby.
[####################] 100% -- SUCCESS

Syncing image bootflash:///n7000-s1-mz.4.0.bin to standby.
[####################] 100% -- SUCCESS

Setting boot variables.
[####################] 100% -- SUCCESS

Performing configuration copy.
[####################] 100% -- SUCCESS

Module 5: Waiting for module online.
2005 May 20 15:46:03 ca-9506 %KERN-2-SYSTEM_MSG: mts: HA communication with standby terminated. Please check the
-- SUCCESS

"Switching over onto standby".

If the configuration meets all guidelines when the install all command is used, all modules (supervisor and switching) are upgraded.

Troubleshooting Software System Reboots


This section describes how to troubleshoot software reboots.


Power-On or Switch Reboot Hangs


Problem: A power-on or switch reboot hangs for a dual supervisor configuration.

Possible Cause: The bootflash is corrupted.
Solution: Use the Recovery for Systems with Dual Supervisor Modules procedure.

Possible Cause: The BIOS is corrupted.
Solution: Replace this module. Contact your customer support representative to return the failed module.

Possible Cause: The kickstart image is corrupted.
Solution: Power cycle the switch if required and enter CTRL-C when the switch says "Loading Boot Loader" to interrupt the boot process at the loader> prompt. Use the Recovery from the loader> Prompt on Supervisor Modules procedure to update the kickstart image.

Possible Cause: Boot parameters are incorrect.
Solution: Verify and correct the boot parameters and reboot.

Possible Cause: The system image is corrupted.
Solution: Power cycle the switch if required and enter CTRL-] when the switch says "Checking all filesystems....r. done." to interrupt the boot process at the switch(boot)# prompt. Use the Recovery from the switch(boot)# Prompt procedure to update the system image.

Corrupted Bootflash Recovery


All device configurations reside in the internal bootflash. If you have a corrupted internal bootflash, you could potentially lose your configuration. Be sure to save and back up your configuration files periodically.

The regular system boot goes through the following sequence (see Figure 1):
1. The basic input/output system (BIOS) loads the loader.
2. The loader loads the kickstart image into RAM and starts the kickstart image.
3. The kickstart image loads and starts the system image.
4. The system image reads the startup-configuration file.

Figure 1 Regular Boot Sequence

If the images on your system are corrupted and you cannot proceed (error state), you can interrupt the system boot sequence and recover the image by entering the BIOS configuration utility described in the following section. Access this utility only when needed to recover a corrupted internal disk.

Caution: The BIOS changes explained in this section are required only to recover a corrupted bootflash.

Recovery procedures require the regular sequence to be interrupted. The internal sequence goes through four phases between the time that you turn on the system and the time that the system prompt appears on your terminal: BIOS, boot loader, kickstart, and system.

Recovery Interruption

The normal prompt appears at the end of each phase. The recovery prompt appears when the system cannot progress to the next phase.

Phase: BIOS
Normal Prompt: loader>
Recovery Prompt: No bootable device
Description: The BIOS begins the power-on self test, memory test, and other operating system applications. While the test is in progress, press Ctrl-C to enter the BIOS configuration utility and use the netboot option.

Phase: Boot loader
Normal Prompt: Starting kickstart
Recovery Prompt: loader>
Description: The boot loader uncompresses the loaded software to boot an image using its filename as a reference. These images are made available through bootflash. When the memory test is over, press Esc to enter the boot loader prompt.

Phase: Kickstart
Normal Prompt: Uncompressing system
Recovery Prompt: switch(boot)#
Description: When the boot loader phase is over, press Ctrl-] (Control key plus right bracket key) to enter the switch(boot)# prompt. Depending on your Telnet client, these keys may be reserved, and you may need to remap the keystroke. See the documentation provided by your Telnet client. If the corruption causes the console to stop at this prompt, copy the system image and reboot the system.

Phase: System
Normal Prompt: Login:
Recovery Prompt: --
Description: The system image loads the configuration file of the last saved running configuration and returns a switch login prompt.

Figure 2 Regular and Recovery Sequence
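The phase and prompt relationships above can be captured in a small lookup. The following Python sketch is illustrative only: the function name and the idea of matching on the last text seen on the console are assumptions, while the prompts and procedure names are taken from this guide. It maps the prompt at which a console has stopped to the phase that could not complete and to the recovery procedure to follow.

```python
# (phase, normal prompt, recovery prompt), transcribed from the table above.
BOOT_PHASES = [
    ("BIOS", "loader>", "No bootable device"),
    ("Boot loader", "Starting kickstart", "loader>"),
    ("Kickstart", "Uncompressing system", "switch(boot)#"),
    ("System", "Login:", None),
]

# Recovery procedure named in this guide for each recovery prompt.
PROCEDURES = {
    "loader>": "Recovery from the loader> Prompt",
    "switch(boot)#": "Recovery from the switch(boot)# Prompt",
}


def stalled_phase(console_text):
    """Return (phase, procedure) if the console ends at a recovery prompt.

    Returns None when the text does not end with a known recovery prompt.
    """
    last = console_text.rstrip()
    for phase, _normal, recovery in BOOT_PHASES:
        if recovery and last.endswith(recovery):
            return phase, PROCEDURES.get(recovery)
    return None


print(stalled_phase("Checking all filesystems..r.r.r.. done.\nswitch(boot)# "))
```

Note that loader> is both the normal prompt at the end of the BIOS phase and the recovery prompt of the boot loader phase, so a console resting at loader> is treated here as a boot loader that could not start the kickstart image.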


Recovery from the loader> Prompt on Supervisor Modules


Caution: This procedure uses the init system command, which reformats the file system of the device. Be sure that you have made a backup of the configuration files before you begin this procedure.

The loader> prompt is different from the regular switch# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.

Note: If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.

Note: The TFTP boot method is available only as a backup for diagnostics and for repairing bootflash corruption. The TFTP boot method is not intended to bring up the system to a fully operational state. Reloading the system is mandatory after all diagnostics and repairs have been completed.

Use the help command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.

To recover a corrupted kickstart image (system error state) for a system with a single supervisor module, follow these steps:

1. Enter the local IP address and subnet mask for the system at the loader> prompt, and press Enter.
loader> set ip 172.16.1.2 255.255.255.0

2. Specify the IP address of the default gateway.


loader> set gw 172.16.1.1

3. Boot the kickstart image file from the required server.


loader> boot tftp://172.16.10.100/tftpboot/n7000-s1-kickstart-4.0.bin

In this example, 172.16.10.100 is the IP address of the TFTP server, and n7000-s1-kickstart-4.0.bin is the name of the kickstart image file that exists on that server. The switch(boot)# prompt indicates that you have a usable kickstart image.

4. Enter the init system command at the switch(boot)# prompt.
switch(boot)# init system

Caution: Be sure that you have made a backup of the configuration files before you enter this command.

5. Follow the Recovery from the switch(boot)# Prompt procedure.
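Because command completion is not available at the loader> prompt, it can help to prepare the exact command strings in advance. The following Python sketch builds the three loader> commands used in the steps above and checks two things the procedure depends on: that the default gateway is on the local subnet and that the image is given as a full path. The function name and the validation rules are illustrative assumptions; only the set ip, set gw, and boot tftp:// command forms come from this guide.

```python
import ipaddress


def loader_commands(ip, mask, gateway, tftp_server, image_path):
    """Build the loader> command sequence for a TFTP kickstart boot."""
    local = ipaddress.ip_interface(f"{ip}/{mask}")
    # The gateway must be reachable directly from the management address.
    if ipaddress.ip_address(gateway) not in local.network:
        raise ValueError(f"gateway {gateway} is not on {local.network}")
    # The loader needs the full path to the image on the remote server.
    if not image_path.startswith("/"):
        raise ValueError("supply the full path to the image on the TFTP server")
    return [
        f"set ip {ip} {mask}",
        f"set gw {gateway}",
        f"boot tftp://{tftp_server}{image_path}",
    ]


for command in loader_commands(
    "172.16.1.2", "255.255.255.0", "172.16.1.1",
    "172.16.10.100", "/tftpboot/n7000-s1-kickstart-4.0.bin",
):
    print("loader>", command)
```

The addresses are the ones used in the example steps; the commands still have to be typed at the console exactly as printed.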

Recovery from the loader> Prompt


Caution: This procedure uses the init system command, which reformats the file system of the device. Be sure that you have made a backup of the configuration files before you begin this procedure.

Note: The loader> prompt is different from the regular switch# or switch(boot)# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.

Note: If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.

Tip: Use the help command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.

To recover a corrupted kickstart image (system error state) for a system with a single supervisor module, follow these steps:

1. Specify the local IP address and the subnet mask for the system.
loader> set ip 172.21.55.213 255.255.255.224
set ip 172.21.55.213 255.255.255.224
Correct - ip addr is 172.21.55.213, mask is 255.255.255.224
Found Intel 82546GB [2:9.0] at 0xe040, ROM address 0xf980
Probing...[Intel 82546GB]
Management interface
Link UP in 1000/full mode
Ethernet addr: 00:1B:54:C1:28:60
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 0.0.0.0
Gateway: 172.21.55.193

2. Specify the IP address of the default gateway.


loader> set gw 172.21.55.193
Correct gateway addr 172.21.55.193
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 0.0.0.0
Gateway: 172.21.55.193

3. Boot the kickstart image file from the required server.


loader> boot tftp://172.28.255.18/tftpboot/n7000-s1-kickstart.4.0.3.gbin
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 172.28.255.18
Gateway: 172.21.55.193
Filesystem type is tftp, using whole disk
Booting: /tftpboot/n7000-s1-kickstart.4.0.3.gbin console=ttyS0,9600n8nn quiet loader_ver="3.17.0"....
.............................................................................Image verification OK
Starting kernel...
INIT: version 2.85 booting
Checking all filesystems..r.r.r.. done.
Setting kernel variables: sysctl
net.ipv4.ip_forward = 0
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_no_pmtu_disc = 1
.
Setting the System Clock using the Hardware Clock as reference...System Clock set.
Local time: Wed Oct 11:20:11 PST 2008
WARNING: image sync is going to be disabled after a loader netboot
Loading system software
No system image
Unexporting directories for NFS kernel daemon...done.
INIT: Sending processes the KILL signal
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2008, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are owned by other third parties and used and distributed under license.
Certain components of this software are licensed under the GNU General Public License (GPL) version 2.0 or the GNU Lesser General Public License (LGPL) Version 2.1.
A copy of each such license is available at http://www.opensource.org/licenses/gpl-2.0.php and http://www.opensource.org/licenses/lgpl-2.1.php
switch(boot)#

The switch(boot)# prompt indicates that you have a usable kickstart image.

4. Enter the init system command at the switch(boot)# prompt.
switch(boot)# init system

Caution: Be sure that you have made a backup of the configuration files before you enter this command.

5. Follow the Recovery from the switch(boot)# Prompt procedure.

Recovery from the switch(boot)# Prompt


To recover a system image using the kickstart image for a system with a single supervisor module, follow these steps:

1. Change to configuration mode and configure the IP address of the mgmt0 interface.
switch(boot)# config t
switch(boot)(config)# interface mgmt0

2. Follow this step if you entered an init system command. Otherwise, skip to Step 3.
a. Enter the ip address command to configure the local IP address and the subnet mask for the system.
switch(boot)(config-mgmt0)# ip address 172.16.1.2 255.255.255.0

b. Enter the ip default-gateway command to configure the IP address of the default gateway.
switch(boot)(config-mgmt0)# ip default-gateway 172.16.1.1

3. Enter the no shutdown command to enable the mgmt0 interface on the system.
switch(boot)(config-mgmt0)# no shutdown

4. Enter end to exit to EXEC mode.


switch(boot)(config-mgmt0)# end

5. If you believe there are file system problems, enter the init system check-filesystem command. This command checks all internal file systems and fixes any errors that are encountered. This command takes a few minutes to complete.
switch(boot)# init system check-filesystem

6. Copy the system image from the required TFTP server.


switch(boot)# copy tftp://172.16.10.100/system-image1 bootflash:system-image1

7. Copy the kickstart image from the required TFTP server.


switch(boot)# copy tftp://172.16.10.100/kickstart-image1 bootflash:kickstart-image1

8. Verify that the system and kickstart image files are copied to your bootflash: file system.
switch(boot)# dir bootflash:
   12456448     Jul 30 23:05:28 1980  kickstart-image1
      12288     Jun 23 14:58:44 1980  lost+found/
   27602159     Jul 30 23:05:16 1980  system-image1

Usage for bootflash://sup-local
  135404544 bytes used
   49155072 bytes free
  184559616 bytes total

9. Load the system image from the bootflash: file system.

switch(boot)# load bootflash:system-image1
Uncompressing system image: bootflash:/system-image1
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Would you like to enter the initial configuration mode? (yes/no): yes

Note: If you enter no, you will return to the switch# login prompt, and you must manually configure the system.
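The verification in Step 8 can be sketched as a small parser over the dir bootflash: listing. The following Python example is a hedged illustration: the function names are hypothetical, and the six-field line layout (size, month, day, time, year, name) is assumed from the sample listing in this procedure. It reports any expected image that is absent or has a size of zero.

```python
# Listing lines as shown for "dir bootflash:" in Step 8.
DIR_OUTPUT = """\
   12456448     Jul 30 23:05:28 1980  kickstart-image1
      12288     Jun 23 14:58:44 1980  lost+found/
   27602159     Jul 30 23:05:16 1980  system-image1

Usage for bootflash://sup-local
  135404544 bytes used
"""


def bootflash_files(dir_output):
    """Map each listed file name to its size in bytes."""
    files = {}
    for line in dir_output.splitlines():
        fields = line.split()
        # Assumed layout: size, month, day, time, year, name.
        if len(fields) == 6 and fields[0].isdigit():
            files[fields[5]] = int(fields[0])
    return files


def missing_images(dir_output, expected):
    """Return the expected image names that are absent or empty."""
    files = bootflash_files(dir_output)
    return [name for name in expected if files.get(name, 0) == 0]


print(missing_images(DIR_OUTPUT, ["system-image1", "kickstart-image1"]))
```

An empty result means both images are listed with a nonzero size; it says nothing about whether the image contents are intact.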

Recovery for Systems with Dual Supervisor Modules


This section describes how to recover when one or both supervisor modules in a dual supervisor system have corrupted bootflash.

Recovering One Supervisor Module With Corrupted Bootflash

If one supervisor module has a functioning bootflash and the other has a corrupted bootflash, follow these steps:

1. Boot the functioning supervisor module and log on to the system.
2. At the switch# prompt on the booted supervisor module, enter the reload module slot force-dnld command, where slot is the slot number of the supervisor module with the corrupted bootflash.

The supervisor module with the corrupted bootflash performs a netboot and checks the bootflash for corruption. When the bootup scripts discover that the bootflash is corrupted, they generate an init system command, which fixes the corrupt bootflash. The supervisor boots as the HA Standby.

Caution: If your system has an active supervisor module currently running, you must enter the system standby manual-boot command in EXEC mode on the active supervisor module before entering the init system command on the standby supervisor module to avoid corrupting the internal bootflash:. After the init system command completes on the standby supervisor module, enter the system no standby manual-boot command in EXEC mode on the active supervisor module.

Recovering Both Supervisor Modules with Corrupted Bootflash

If both supervisor modules have corrupted bootflash, follow these steps:

1. Boot the system and press the Esc key after the BIOS memory test to interrupt the boot loader.

Note: Press Esc immediately after you see the following message:
00000589K Low Memory Passed
00000000K Ext Memory Passed
Hit ^C if you want to run SETUP....
Wait.....
If you wait too long, you will skip the boot loader phase and enter the kickstart phase.

You see the loader> prompt.

Caution: The loader> prompt is different from the regular switch# or switch(boot)# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.

Tip: Use the help command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.

2. Specify the local IP address and the subnet mask for the system.
loader> set ip 172.21.55.213 255.255.255.224
set ip 172.21.55.213 255.255.255.224
Correct - ip addr is 172.21.55.213, mask is 255.255.255.224
Found Intel 82546GB [2:9.0] at 0xe040, ROM address 0xf980
Probing...[Intel 82546GB]
Management interface
Link UP in 1000/full mode
Ethernet addr: 00:1B:54:C1:28:60
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 0.0.0.0
Gateway: 172.21.55.193

3. Specify the IP address of the default gateway.


loader> set gw 172.21.55.193
Correct gateway addr 172.21.55.193
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 0.0.0.0
Gateway: 172.21.55.193

4. Boot the kickstart image file from the required server.

loader> boot tftp://172.28.255.18/tftpboot/n7000-s1-kickstart.4.0.3.gbin
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 172.28.255.18
Gateway: 172.21.55.193
Filesystem type is tftp, using whole disk
Booting: /tftpboot/n7000-s1-kickstart.4.0.3.gbin console=ttyS0,9600n8nn quiet loader_ver="3.17.0"....
.............................................................................Image verification OK
Starting kernel...
INIT: version 2.85 booting
Checking all filesystems..r.r.r.. done.
Setting kernel variables: sysctl
net.ipv4.ip_forward = 0
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_no_pmtu_disc = 1
.
Setting the System Clock using the Hardware Clock as reference...System Clock set.
Local time: Wed Oct 11:20:11 PST 2008
WARNING: image sync is going to be disabled after a loader netboot
Loading system software
No system image
Unexporting directories for NFS kernel daemon...done.
INIT: Sending processes the KILL signal
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2008, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are owned by other third parties and used and distributed under license.
Certain components of this software are licensed under the GNU General Public License (GPL) version 2.0 or the GNU Lesser General Public License (LGPL) Version 2.1.
A copy of each such license is available at http://www.opensource.org/licenses/gpl-2.0.php and http://www.opensource.org/licenses/lgpl-2.1.php
switch(boot)#

The switch(boot)# prompt indicates that you have a usable kickstart image.

Note: If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.

5. Enter the init system command to repartition and format the bootflash.
6. Perform the steps in the Recovery from the switch(boot)# Prompt procedure.
7. Perform the steps in the Recovering One Supervisor Module With Corrupted Bootflash procedure to recover the other supervisor module.

Note: If you do not enter the reload module command when a boot failure has occurred, the active supervisor module automatically reloads the standby supervisor module within 3 to 6 minutes after the failure.

System or Process Resets


When a recoverable or nonrecoverable error occurs, the system or a process on the system may reset. See Table 2-4 for possible causes and solutions.

Problem: The system or a process on the system resets.

Possible Cause: A recoverable error occurred on the system or on a process in the system.
Solution: The system has automatically recovered from the problem. Use the Recoverable System Restarts procedure and the System or Process Resets procedure.

Possible Cause: A nonrecoverable error occurred on the system.
Solution: The system cannot recover automatically from the problem. Use the Recoverable System Restarts procedure to determine the cause.

Possible Cause: A clock module failed.
Solution: Verify that a clock module failed. Replace the failed clock module during the next maintenance window.

Recoverable System Restarts


Every process restart generates a syslog message and a Call Home event. Even if the event does not affect service, you should identify and resolve the condition immediately because future occurrences could cause a service interruption. To respond to a recoverable system restart, follow these steps:

1. Check the syslog file to see which process restarted and why it restarted.
switch# show log logfile | include error

For information about the meaning of each message, see the Cisco NX-OS System Messages Reference. The system output looks like the following example:

Sep 10 23:31:31 dot-6 % LOG_SYSMGR-3-SERVICE_TERMINATED: Service "sensor" (PID 704) has finished with error code SYSMGR_EXITCODE_SY.

switch# show logging logfile | include fail
Jan 27 04:08:42 88 %LOG_DAEMON-3-SYSTEM_MSG: bind() fd 4, family 2, port 123, addr 0.0.0.0, in_classd=0 flags=1 fails: Address already in use
Jan 27 04:08:42 88 %LOG_DAEMON-3-SYSTEM_MSG: bind() fd 4, family 2, port 123, addr 127.0.0.1, in_classd=0 flags=0 fails: Address already in use
Jan 27 04:08:42 88 %LOG_DAEMON-3-SYSTEM_MSG: bind() fd 4, family 2, port 123, addr 127.1.1.1, in_classd=0 flags=1 fails: Address already in use
Jan 27 04:08:42 88 %LOG_DAEMON-3-SYSTEM_MSG: bind() fd 4, family 2, port 123, addr 172.22.93.88, in_classd=0 flags=1 fails: Address already in use
Jan 27 23:18:59 88 % LOG_PORT-5-IF_DOWN: Interface fc1/13 is down (Link failure or not-connected)
Jan 27 23:18:59 88 % LOG_PORT-5-IF_DOWN: Interface fc1/14 is down (Link failure or not-connected)
Jan 28 00:55:12 88 % LOG_PORT-5-IF_DOWN: Interface fc1/1 is down (Link failure or not-connected)
Jan 28 00:58:06 88 % LOG_ZONE-2-ZS_MERGE_FAILED: Zone merge failure, Isolating port fc1/1 (VSAN 100)
Jan 28 00:58:44 88 % LOG_ZONE-2-ZS_MERGE_FAILED: Zone merge failure, Isolating port fc1/1 (VSAN 100)
Jan 28 03:26:38 88 % LOG_ZONE-2-ZS_MERGE_FAILED: Zone merge failure, Isolating port fc1/1 (VSAN 100)
Jan 29 19:01:34 88 % LOG_PORT-5-IF_DOWN: Interface fc1/1 is down (Link failure or not-connected)
switch#

2. Identify the processes that are running and the status of each process.
switch# show processes

The following codes are used in the system output for the state (process state):
D = uninterruptible sleep (usually I/O)
R = runnable (on run queue)
S = sleeping
T = traced or stopped
Z = defunct (zombie) process
NR = not running
ER = should be running but currently not running

Note: ER usually is the state that a process enters if it has been restarted too many times and has been detected as faulty by the system and disabled.

The system output looks like the following example. (This output has been abbreviated to be more concise.)

PID    State  PC        Start_cnt    TTY   Process
-----  -----  --------  -----------  ----  -------------
    1      S  2ab8e33e            1        init
    2      S         0            1        keventd
    3      S         0            1        ksoftirqd_CPU0
    4      S         0            1        kswapd
    5      S         0            1        bdflush
    6      S         0            1        kupdated
   71      S         0            1        kjournald
  136      S         0            1        kjournald
  140      S         0            1        kjournald
  431      S  2abe333e            1        httpd
  443      S  2abfd33e            1        xinetd
  446      S  2ac1e33e            1        sysmgr
  452      S  2abe91a2            1        httpd
  453      S  2abe91a2            1        httpd
  456      S  2ac73419            1    S0  vsh
  469      S  2abe91a2            1        httpd
  470      S  2abe91a2            1        httpd

3. Show the processes that have had abnormal exits and determine whether there is a stack trace or core dump.

switch# show process log
Process           PID     Normal-exit  Stack-trace  Core     Log-create-time
----------------  ------  -----------  -----------  -------  ---------------
ntp               919     N            N            N        Jan 27 04:08
snsm              972     N            Y            N        Jan 24 20:50

4. Show detailed information about a specific process that has restarted.


switch# show processes log pid 898
Service: idehsd
Description: ide hotswap handler Daemon
Started at Mon Sep 16 14:56:04 2002 (390923 us)
Stopped at Thu Sep 19 14:18:42 2002 (639239 us)
Uptime: 2 days 23 hours 22 minutes 22 seconds
Start type: SRV_OPTION_RESTART_STATELESS (23)
Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGTERM (3)
Exit code: signal 15 (no core)
CWD: /var/sysmgr/work
Virtual Memory:
    CODE      08048000 - 0804D660
    DATA      0804E660 - 0804E824
    BRK       0804E9A0 - 08050000
    STACK     7FFFFD10
Register Set:
    EBX 00000003         ECX 0804E994         EDX 00000008
    ESI 00000005         EDI 7FFFFC9C         EBP 7FFFFCAC
    EAX 00000008         XDS 0000002B         XES 0000002B
    EAX 00000003 (orig)  EIP 2ABF5EF4         XCS 00000023
    EFL 00000246         ESP 7FFFFC5C         XSS 0000002B
Stack: 128 bytes. ESP 7FFFFC5C, TOP 7FFFFD10
0x7FFFFC5C: 0804F990 0804C416 00000003 0804E994 ................
0x7FFFFC6C: 00000008 0804BF95 2AC451E0 2AAC24A4 .........Q.*.$.*
0x7FFFFC7C: 7FFFFD14 2AC2C581 0804E6BC 7FFFFCA8 .......*........
0x7FFFFC8C: 7FFFFC94 00000003 00000001 00000003 ................
0x7FFFFC9C: 00000001 00000000 00000068 00000000 ........h.......
0x7FFFFCAC: 7FFFFCE8 2AB4F819 00000001 7FFFFD14 .......*........
0x7FFFFCBC: 7FFFFD1C 0804C470 00000000 7FFFFCE8 ....p...........
0x7FFFFCCC: 2AB4F7E9 2AAC1F00 00000001 08048A2C ...*...*....,...
PID: 898
SAP: 0
UUID: 0
switch#

5. Determine if the restart recently occurred.


switch# show system uptime
Start Time: Fri Sep 13 12:38:39 2002
Up Time: 0 days, 1 hours, 16 minutes, 22 seconds

To determine if the restart is repetitive or a one-time occurrence, compare the length of time that the system has been up with the time stamp of each restart.

6. View the core files.

switch# show cores
Module-num  Process-name  PID   Core-create-time
----------  ------------  ----  ----------------
5           fspf          1524  Jan 9 03:11
6           fcc           919   Jan 9 03:09
8           acltcam       285   Jan 9 03:09
8           fib           283   Jan 9 03:08

The output shows all cores that are presently available for upload from the active supervisor. The module-num column shows the slot number on which the core was generated. In the previous example, an FSPF core was generated on the active supervisor module in slot 5. An FCC core was generated on the standby supervisor module in slot 6. Core dumps generated on the module in slot 8 include ACLTCAM and FIB.

Copy the FSPF core dump to a TFTP server with the IP address 1.1.1.1, as follows:
switch# copy core://5/1524 tftp://1.1.1.1/abcd

Display the process log for the process with PID 1473 (the IPS Manager service in this example) as follows:


switch# show pro log pid 1473
======================================================
Service: ips
Description: IPS Manager

Started at Tue Jan 8 17:07:42 1980 (757583 us)
Stopped at Thu Jan 10 06:16:45 1980 (83451 us)
Uptime: 1 days 13 hours 9 minutes 9 seconds

Start type: SRV_OPTION_RESTART_STATELESS (23)
Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGNAL (2)
Exit code: signal 6 (core dumped)
CWD: /var/sysmgr/work

Virtual Memory:
    CODE      08048000 - 080FB060
    DATA      080FC060 - 080FCBA8
    BRK       081795C0 - 081EC000
    STACK     7FFFFCF0
    TOTAL     20952 KB

Register Set:
    EBX 000005C1         ECX 00000006         EDX 2AD721E0
    ESI 2AD701A8         EDI 08109308         EBP 7FFFF2EC
    EAX 00000000         XDS 0000002B         XES 0000002B
    EAX 00000025 (orig)  EIP 2AC8CC71         XCS 00000023
    EFL 00000207         ESP 7FFFF2C0         XSS 0000002B

Stack: 2608 bytes. ESP 7FFFF2C0, TOP 7FFFFCF0

0x7FFFF2C0: 2AC8C944 000005C1 00000006 2AC735E2 D..*.........5.*
0x7FFFF2D0: 2AC8C92C 2AD721E0 2AAB76F0 00000000 ,..*.!.*.v.*....
0x7FFFF2E0: 7FFFF320 2AC8C920 2AC513F8 7FFFF42C  ... ..*...*,...
0x7FFFF2F0: 2AC8E0BB 00000006 7FFFF320 00000000 ...*.... .......
0x7FFFF300: 2AC8DFF8 2AD721E0 08109308 2AC65AFC ...*.!.*.....Z.*
0x7FFFF310: 00000393 2AC6A49C 2AC621CC 2AC513F8 .......*.!.*...*
0x7FFFF320: 00000020 00000000 00000000 00000000  ...............
0x7FFFF330: 00000000 00000000 00000000 00000000 ................
0x7FFFF340: 00000000 00000000 00000000 00000000 ................
0x7FFFF350: 00000000 00000000 00000000 00000000 ................
0x7FFFF360: 00000000 00000000 00000000 00000000 ................
0x7FFFF370: 00000000 00000000 00000000 00000000 ................
0x7FFFF380: 00000000 00000000 00000000 00000000 ................
0x7FFFF390: 00000000 00000000 00000000 00000000 ................
0x7FFFF3A0: 00000002 7FFFF3F4 2AAB752D 2AC5154C .
... output abbreviated ...
Stack: 128 bytes. ESP 7FFFF830, TOP 7FFFFCD0

7. Enter the system cores tftp:[//servername][/path] command to configure the system to use TFTP to send the core dump to a TFTP server. This command causes the system to enable the automatic copy of core files to a TFTP server. For example, the following command sends the core files to the TFTP server with the IP address 10.1.1.1:
switch(config)# system cores tftp://10.1.1.1/cores

The following conditions apply:
- The core files are copied every 4 minutes. This time interval is not configurable.
- The copy of a specific core file to a TFTP server can be manually triggered by using the command copy core://module#/pid# tftp://tftp_ip_address/file_name.
- The maximum number of times that a process can be restarted is part of the high-availability (HA) policy for any process. (This parameter is not configurable.) If the process restarts more than the maximum number of times, the older core files are overwritten.
- The maximum number of core files that can be saved for any process is part of the HA policy for any process. (This parameter is not configurable, and it is set to three.)

8. Determine the cause and resolution for the restart condition by contacting your technical support representative and asking the representative to review your core dump.
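When the log file from Step 1 is long, a quick tally of the more severe messages can show which facility to look at first. The following Python sketch is an illustration rather than a Cisco tool: the regular expression, the function name, and the severity threshold are assumptions, and the sample lines are taken from the Step 1 output. It counts messages by facility and mnemonic, keeping only severities at or below a chosen level (lower numbers are more severe).

```python
import re
from collections import Counter

# Matches "%FACILITY-SEVERITY-MNEMONIC:" with an optional space after "%".
MSG = re.compile(
    r"%\s*(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+):"
)

SAMPLE = """\
Sep 10 23:31:31 dot-6 % LOG_SYSMGR-3-SERVICE_TERMINATED: Service "sensor" (PID 704) has finished with error code SYSMGR_EXITCODE_SY.
Jan 27 23:18:59 88 % LOG_PORT-5-IF_DOWN: Interface fc1/13 is down (Link failure or not-connected)
Jan 28 00:58:06 88 % LOG_ZONE-2-ZS_MERGE_FAILED: Zone merge failure, Isolating port fc1/1 (VSAN 100)
Jan 28 00:58:44 88 % LOG_ZONE-2-ZS_MERGE_FAILED: Zone merge failure, Isolating port fc1/1 (VSAN 100)
"""


def summarize(log_text, max_severity=3):
    """Count (facility, mnemonic) pairs with severity <= max_severity."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MSG.search(line)
        if match and int(match["severity"]) <= max_severity:
            counts[(match["facility"], match["mnemonic"])] += 1
    return counts


for (facility, mnemonic), count in summarize(SAMPLE).most_common():
    print(f"{count:3d}  {facility}  {mnemonic}")
```

With the default threshold, the severity 5 IF_DOWN message is ignored and the two ZS_MERGE_FAILED messages are listed first.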

See the Cisco NX-OS High Availability and Redundancy Guide for more information on high-availability policies.
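For the manual core copy described in this procedure, the command strings can be generated from the rows of the show cores output. The following Python sketch assumes the copy core://module#/pid# tftp://tftp_ip_address/file_name form given above; the function name and the destination file naming scheme are hypothetical.

```python
def core_copy_commands(cores, tftp_server):
    """Build one manual copy command per core file.

    cores is an iterable of (module, process, pid) tuples taken from
    the "show cores" output. The destination file name
    "<process>-<pid>.core" is an illustrative choice, not a requirement.
    """
    return [
        f"copy core://{module}/{pid} tftp://{tftp_server}/{process}-{pid}.core"
        for module, process, pid in cores
    ]


# Rows from the "show cores" example in this procedure.
CORES = [(5, "fspf", 1524), (6, "fcc", 919), (8, "acltcam", 285), (8, "fib", 283)]

for command in core_copy_commands(CORES, "10.1.1.1"):
    print("switch#", command)
```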

Unrecoverable System Restarts


An unrecoverable system restart might occur in the following cases:
- A critical process fails and is not restartable.
- A process restarts more times than is allowed by the system configuration.
- A process restarts more frequently than is allowed by the system configuration.

The effect of a process reset is determined by the policy configured for each process. An unrecoverable reset may cause functionality loss, the active supervisor to restart, a supervisor switchover, or the system to restart. To respond to an unrecoverable reset, see the Troubleshooting Cisco NX-OS Software System Reboots procedure.

The show system reset-reason command displays the following information:
- The last four reset-reason codes for the supervisor modules. If either supervisor module is absent, the reset-reason codes for that supervisor module are not displayed.
- The overall history of when and why expected and unexpected reloads occur
- The time stamp of when the reset or reload occurred
- The reason for the reset or reload of a module
- The service that caused the reset or reload (not always available)
- The software version that was running at the time of the reset or reload

The show system reset-reason module number command displays the last four reset-reason codes for a specific module in a given slot. If a module is absent, then the reset-reason codes for that module are not displayed.

switch# show system reset-reason module 6
----- reset reason for Supervisor-module 6 (from Supervisor in slot 6) ---
1) At 281000 usecs after Wed Jun 25 20:16:34 2008
    Reason: Reset Requested by CLI command reload
    Service:
    Version: 4.0(2.45)
2) At 791071 usecs after Wed Jun 25 20:04:50 2008
    Reason: Reset Requested by CLI command reload
    Service:
    Version: 4.0(2.45)
3) At 70980 usecs after Wed Jun 25 19:55:52 2008
    Reason: Reset Requested by CLI command reload
    Service:
    Version: 4.0(2)
4) At 891463 usecs after Wed Jun 18 23:44:48 2008
    Reason: Reset Requested by CLI command reload
    Service:
    Version: 4.0(2)
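The reset-reason output has a regular shape, so it can be turned into records for comparison across modules or over time. The following Python sketch is a hedged illustration: the parsing rules and the function name are assumptions based only on the example output above, and the time stamp is kept as text rather than converted.

```python
import re

SAMPLE = """\
----- reset reason for Supervisor-module 6 (from Supervisor in slot 6) ---
1) At 281000 usecs after Wed Jun 25 20:16:34 2008
    Reason: Reset Requested by CLI command reload
    Service:
    Version: 4.0(2.45)
2) At 70980 usecs after Wed Jun 25 19:55:52 2008
    Reason: Reset Requested by CLI command reload
    Service:
    Version: 4.0(2)
"""

# Start of an entry: "<n>) At <usecs> usecs after <time stamp>".
ENTRY = re.compile(r"^\d+\) At \d+ usecs after (?P<when>.+)$")


def parse_reset_reasons(text):
    """Return one dict per entry with when, reason, service, and version."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        match = ENTRY.match(line)
        if match:
            entries.append({"when": match["when"]})
        elif entries and ":" in line:
            # "Key: value" lines belong to the most recent entry.
            key, _, value = line.partition(":")
            entries[-1][key.strip().lower()] = value.strip()
    return entries


for entry in parse_reset_reasons(SAMPLE):
    print(entry["when"], "|", entry["reason"], "|", entry["version"])
```

An empty service value reflects the note above that the service is not always available.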


Standby Supervisor Fails to Boot


The standby supervisor does not boot after an upgrade. You may see the following system message:

Error Message: SYSMGR-2-STANDBY_BOOT_FAILED: Standby supervisor failed to boot up.

Explanation: This message is printed if the standby supervisor does not complete its boot procedure (that is, it does not reach the login prompt on the local console) 3 to 6 minutes after the loader has been loaded by the BIOS. This message is usually caused by boot variables not properly set for the standby supervisor. This message can also be caused by a user intentionally interrupting the boot procedure at the loader prompt (by pressing Esc).

Recommended Action: Connect to the local console of the standby supervisor. If the supervisor is at the loader prompt, try to use the boot command to continue the boot procedure. Otherwise, issue a reload command for the standby supervisor from a vsh session on the active supervisor, specifying the force-dnld option. Once the standby is online, fix the problem by setting the boot variables appropriately.

Symptom: Standby supervisor does not boot.
Possible Cause: Active supervisor kickstart image booted from TFTP.
Solution: Reload the active supervisor from bootflash:.

Recovering the Administrator Password


If you forget the administrator password, you can still regain access to the system.

Problem: You forgot the administrator password for accessing the system.

Solution: Use the Password Recovery procedure to recover the password using a local console connection.

See Also
Cisco NX-OS/IOS Configuration Fundamentals Comparison

Further Reading
The following links contain further information on this topic from Cisco.com: Cisco Nexus 7000 Series Upgrade/Downgrade Guides Cisco Nexus 7000 Series Release Notes

External Links
External links contain content developed by external authors. Cisco does not review this content for accuracy. Nexus 7000 NX-OS Upgrade (walkthru example)

NX-OS Intro (part 1) (video)

This article describes how to troubleshoot licensing on a Cisco NX-OS device.

Contents
1 Information About Troubleshooting Licensing Issues
  1.1 Chassis Serial Numbers
  1.2 Swapping out a Chassis
  1.3 Grace Period
2 Licensing Guidelines
3 Initial Troubleshooting Checklist
4 Displaying License Information Using the CLI
  4.1 Example: Displays Information About Current License Usage
  4.2 Example: Displays the List of Features in a Specified Package
  4.3 Example: Displays the Host ID for the License
  4.4 Example: Displays All Installed License Key Files and Contents
5 Licensing Installation Issues
  5.1 Serial Number Issues
  5.2 RMA Chassis Errors or License Transfers Between Systems
  5.3 Receiving Grace Period Warnings After License Installation
  5.4 Grace Period Alerts
  5.5 License Listed as Missing
6 See Also
7 Further Reading
8 External Links


Information About Troubleshooting Licensing Issues


Cisco NX-OS requires licenses for select features. The licenses enable those features on your system. You must purchase a license for each system on which you want to enable the licensed features.

Note: You can enable a feature without installing the license. Cisco NX-OS provides a grace period that allows you to try out the feature before purchasing the license.

Chassis Serial Numbers


Licenses are created using the serial number of the chassis where the license file is to be installed. Once you order a license based on a chassis serial number, you cannot use this license on any other system.

Swapping out a Chassis


If you swap out a chassis which included licenses, you must contact TAC to generate a new license. The old license was based on the chassis serial number and will not work with the new chassis.

Grace Period
If you use a feature that requires a license but you have not installed a license for that feature, you are given a 120-day grace period to evaluate the feature. You must purchase and install the number of licenses required for that feature before the grace period ends or Cisco NX-OS will disable the feature at the end of the grace period.

License packages can contain several features. If you disable a feature during the grace period and there are other features in that license package that are still enabled, the clock does not stop for that license package. To suspend the grace period countdown for a licensed feature, you must disable every feature in that license package. Use the show license usage command to determine which features are enabled for a license package.
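The package-wide countdown rule above can be sketched as a small model. This is an illustration only, not how Cisco NX-OS implements the license counter: the function name is hypothetical, and the two feature names are taken from the show license usage LAN_ENTERPRISE_SERVICES_PKG example later in this article.

```python
# Hypothetical model of the rule: the grace-period clock for a license
# package keeps running while ANY feature in that package is enabled.

def grace_clock_running(feature_enabled):
    """Return True while at least one feature in the package is enabled."""
    return any(feature_enabled.values())


# Features listed for LAN_ENTERPRISE_SERVICES_PKG in this article's example.
package = {"pbr": True, "Tunnel": True}

package["pbr"] = False                 # disabling one feature is not enough
print(grace_clock_running(package))    # True

package["Tunnel"] = False              # every feature is now disabled
print(grace_clock_running(package))    # False
```

The model makes the point that disabling a single feature does not suspend the countdown; only an all-disabled package does.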

Licensing Guidelines
Follow these guidelines when dealing with licenses for Cisco NX-OS:
- Do not ignore the grace period expiration warnings. Allow 60 days before the grace period expires to allow time for ordering, shipping, and installation for a new license purchase.
- Carefully determine the license(s) that you require based on the features that require a license.
- Order your license accurately, as follows:
  - Enter the Product Authorization Key that appears in the Proof of Purchase document that comes with your system.
  - Enter the correct chassis serial number when ordering the license. The serial number must be for the same chassis that you plan to install the license on. Use the show license host-id command to obtain your chassis serial number.
  - Enter serial numbers accurately. Do not use the letter "O" instead of a zero in the serial number.
  - Order the license that is specific to your chassis.
- Back up the license file to a remote, secure place. Archiving your license files ensures that you will not lose the licenses in the case of a failure on your system.
- Install the correct licenses on each system, using the licenses that were ordered using that system's serial number. Licenses are serial-number specific and platform specific.
- Use the show license usage command to verify the license installation.
- Never modify a license file or attempt to use it on a system that it was not ordered for.
- If you return a chassis, contact your customer support representative to order a replacement license for the new chassis.

Initial Troubleshooting Checklist


Begin troubleshooting license issues by checking the following issues first:

Checklist (check off each item as you verify it):
- Verify the chassis serial number for all licenses ordered.
- Verify the platform or module type for all licenses ordered.
- Verify that the Product Authorization Key that you used to order the licenses comes from the same chassis that you retrieved the chassis serial number on.
- Verify that you have installed all licenses on all systems that require the licenses for the features you enable.

Displaying License Information Using the CLI


Use the show license commands to display all license information configured on this system.

Example: Displays Information About Current License Usage

    switch(config)# show license usage
    Feature                      Ins  Lic   Status Expiry Date Comments
                                     Count
    --------------------------------------------------------------------------------
    LAN_ADVANCED_SERVICES_PKG     No        In use             Grace 102D 0H
    LAN_ENTERPRISE_SERVICES_PKG   No        In use             Grace 103D 22H
    --------------------------------------------------------------------------------

Example: Displays the List of Features in a Specified Package

    switch(config)# show license usage LAN_ENTERPRISE_SERVICES_PKG
    Application
    -----------
    pbr
    Tunnel
    -----------



Example: Displays the Host ID for the License

    switch# show license host-id
    License hostid: VDH=FOX0646S017

Note: Use the entire ID that appears after the colon (:). The VDH is the Vendor Host ID.
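If you collect the host ID with a script, the note above translates into keeping everything after the first colon. The following is a minimal sketch; the function name is hypothetical, and the input line is the sample output shown in this article.

```python
def host_id_from_output(line):
    """Return the entire ID after the first colon of a 'License hostid' line."""
    _label, separator, value = line.partition(":")
    value = value.strip()
    if not separator or not value:
        raise ValueError("no host ID found in: %r" % line)
    return value


# Sample line from the show license host-id example in this article.
print(host_id_from_output("License hostid: VDH=FOX0646S017"))
```

The VDH= prefix is kept on purpose, because the entire string after the colon is the host ID.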

Example: Displays All Installed License Key Files and Contents

    switch# show license
    entp.lic:
    SERVER this_host ANY
    VENDOR cisco
    INCREMENT LAN_ENTERPRISE_SERVICES_PKG cisco 1.0 permanent uncounted \
            VENDOR_STRING=<LIC_SOURCE>MDS_SWIFT</LIC_SOURCE><SKU>N7K-LAN1K9=</SKU> \
            HOSTID=VDH=TBC10412106 \
            NOTICE="<LicFileID>20071025133322456</LicFileID><LicLineID>1</LicLineID> \

Licensing Installation Issues


Common problems with licenses usually result from incorrectly ordering the license file, installing the license file on an incorrect system, or not ordering the correct number of licenses for your fabric.

Serial Number Issues


Make sure that you use the correct chassis serial number when ordering your license. Use the show license host-id command to obtain the correct chassis serial number for your system using the CLI.

If you use a license meant for another chassis, you may see the following system message:

Error Message: LICMGR-3-LOG_LIC_INVALID_HOSTID: Invalid license hostid VDH=[chars] for feature [chars].
Explanation: The feature has a license with an invalid license Host ID. This can happen if a supervisor module with licensed features for one system is installed on another system.
Recommended Action: Reinstall the correct license for the chassis where the supervisor module is installed.

When entering the chassis serial number during the license ordering process, do not use the letter "O" instead of any zeros in the serial number.
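Before installing a license file, you can compare the HOSTID value in the file with the host ID reported by the chassis. The sketch below assumes the license file uses the HOSTID=VDH=... form shown in the show license example in this article; the function names and the shortened sample text are illustrative.

```python
import re


def license_host_ids(license_text):
    """Return the set of HOSTID values (for example 'VDH=TBC10412106')."""
    return set(re.findall(r"HOSTID=(\S+)", license_text))


def license_matches_chassis(license_text, device_host_id):
    """True only if the file names exactly the host ID of this chassis."""
    ids = license_host_ids(license_text)
    return bool(ids) and ids == {device_host_id}


# Shortened from the license file contents shown in this article.
SAMPLE_LICENSE = r"""
SERVER this_host ANY
VENDOR cisco
INCREMENT LAN_ENTERPRISE_SERVICES_PKG cisco 1.0 permanent uncounted \
        HOSTID=VDH=TBC10412106 \
"""

print(license_matches_chassis(SAMPLE_LICENSE, "VDH=TBC10412106"))
print(license_matches_chassis(SAMPLE_LICENSE, "VDH=FOX0646S017"))
```

A mismatch here corresponds to the condition that the LICMGR-3-LOG_LIC_INVALID_HOSTID message reports.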


RMA Chassis Errors or License Transfers Between Systems


A license is specific to the system for which it is issued and is not valid on any other system. If you need to transfer a license from one system to another, contact your technical support representative.

Receiving Grace Period Warnings After License Installation


If the license installation does not proceed correctly, or if you are using a feature that exists in a license package that you have not installed, you will continue to get grace period warnings.

Symptom: You are receiving grace period warnings after a license installation.

Possible Cause: The license file is copied to the system but is not installed.
Solution: Use the license install command to install the license.

Possible Cause: The license installation failed.
Solution: Check your logs for any system messages for a failed license installation. Use the show license usage command to determine which feature is in use without a license.

Grace Period Alerts

Cisco NX-OS gives you a 120-day grace period. This grace period starts or continues when you are evaluating a feature for which you have not installed a license.

The grace period stops if you disable a feature that you are evaluating. If you enable that feature again without a valid license, the grace period countdown continues where it left off.

The grace period operates across all features in a license package. License packages can contain several features. If you disable a feature during the grace period and there are other features in that license package that are still enabled, the countdown does not stop for that license package. To suspend the grace period countdown for a license package, you must disable every feature in that license package.

The Cisco NX-OS license counter keeps track of all licenses on a system. If you are evaluating a feature and the grace period has started, you will receive console messages, SNMP traps, system messages, and daily Call Home messages.

The frequency of these messages becomes hourly during the last seven days of the grace period. The following example uses the VDC feature. On January 30th, you enabled the VDC feature, using the 120-day grace period. You will receive grace period ending messages as follows:
- Daily alerts from January 30th to May 21st
- Hourly alerts from May 22nd to May 30th

On May 31st, the grace period ends, and the VDC feature is automatically disabled. You will not be allowed to use multiple VDCs until you purchase a valid license.

Note: You cannot modify the frequency of the grace period messages.

Caution: After the final seven days of the grace period, the feature is turned off and your network traffic may be disrupted. Any future upgrade will enforce license requirements and the 120-day grace period.

If you try to use an unlicensed feature, you may see one of the following system messages:

Error Message: LICMGR-2-LOG_LIC_GRACE_EXPIRED: Grace period expired for feature [chars].
Explanation: The unlicensed feature has exceeded its grace time period. Applications using this license will be shut down immediately.
Recommended Action: Install the license file to continue using the feature.

Error Message: LICMGR-3-LOG_LICAPP_NO_LIC: Application [chars] running without [chars] license, shutdown in [dec] days.
Explanation: The Application [chars1] has not been licensed. The application will work for a grace period of [dec] days after which it will be shut down unless a license file for the feature is installed.
Recommended Action: Install the license to continue using the feature.

Error Message: LICMGR-3-LOG_LIC_LICENSE_EXPIRED: Evaluation license expired for feature [chars].
Explanation: The feature has exceeded its evaluation time period. The feature will be shut down after a grace period.
Recommended Action: Install the license to continue using the feature.

Error Message: LICMGR-3-LOG_LIC_NO_LIC: No license(s) present for feature [chars]. Application(s) shutdown in [dec] days.
Explanation: The feature has not been licensed. The feature will work for a grace period, after which the application(s) using the feature will be shut down.
Recommended Action: Install the license to continue using the feature.

Error Message: LICMGR-6-LOG_LICAPP_EXPIRY_WARNING: Application [chars] evaluation license [chars] expiry in [dec] days.
Explanation: The application will exceed its evaluation time period in the listed number of days and will be shut down unless a permanent license for the feature is installed.
Recommended Action: Install the license file to continue using the feature.

Use the show license usage command to display grace period information for a system.

    switch(config)# show license usage
    Feature                      Ins  Lic   Status Expiry Date Comments
                                     Count
    --------------------------------------------------------------------------------
    LAN_ADVANCED_SERVICES_PKG     No        In use             Grace 102D 0H
    LAN_ENTERPRISE_SERVICES_PKG   No        In use             Grace 103D 22H
    --------------------------------------------------------------------------------

License Listed as Missing


After a license is installed and operating properly, it may show up as missing if you modify your system hardware or encounter a bootflash: issue.

Symptom: A license is listed as missing.

Possible Causes:
- The supervisor module was replaced after the license was installed.
- The supervisor bootflash: is corrupted.

Solutions:
- Use the Corrupted Bootflash Recovery procedure to recover from the corrupted bootflash:.
- Reinstall the license.

See Also

Before Contacting Technical Support

Further Reading

The following links contain further information on this topic from Cisco.com:

Cisco Nexus 7000 Series Licensing Information

External Links

External links contain content developed by external authors. Cisco does not review this content for accuracy.

This article describes how to troubleshoot virtual device contexts (VDCs).

Guide Contents Troubleshooting Overview Troubleshooting Installs, Upgrades, and Reboots Troubleshooting Licensing Troubleshooting VDCs (this section) Troubleshooting CFS Troubleshooting Ports Troubleshooting vPCs Troubleshooting VLANs Troubleshooting STP Troubleshooting Routing Troubleshooting Unicast Traffic Troubleshooting WCCP Troubleshooting Memory Troubleshooting Packet Flow Issues Troubleshooting FCoE Before Contacting Technical Support Troubleshooting Tools and Methodology

Contents
1 Information About Troubleshooting VDCs 2 Initial Troubleshooting Checklist 3 VDC Issues 3.1 You Cannot Create a VDC 3.2 You Cannot Log into a Device 3.3 You Cannot Switch to a VDC 3.4 You Cannot Delete a VDC 3.5 You Cannot Allocate an Interface to a VDC 3.5.1 Table: Port Numbers for Cisco Nexus 7000 Series 32-port 10-Gbps Ethernet module 3.6 The VDC Does Not Reflect a Resource Template Change 3.7 The VDC Remains in a Failed State 3.8 You Cannot Copy the Running-Config File to the Startup-Config File in a VDC 4 See Also 5 Further Reading 6 External Links

Information About Troubleshooting VDCs


Cisco NX-OS supports VDCs, which you can use to divide the physical NX-OS device into separate virtual devices. Each VDC appears as a unique device to the connected users. A VDC runs as a separate logical entity within the physical NX-OS device, maintains its own unique set of running software processes, has its own configuration, and can be managed by a separate administrator. VDC issues may not be directly related to VDC management. See the troubleshooting chapter that reflects your symptoms to find other issues related to VDCs. For instance, if you configure a VDC template that limits the number of port channels in that VDC, you may experience problems if you try to create more port channels than the VDC template allows.

VDC templates set limits on the following features:
- Port channels
- SPAN sessions
- IPv4 route map memory
- VLANs
- Virtual routing and forwarding instances (VRFs)

The minimum resource value configures the guaranteed limit for that feature. The maximum resource value represents oversubscription for the feature and is available on a first-come, first-served basis.
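One way to reason about the guaranteed minimum is as a reservation against a fixed pool. The sketch below is a simplified model under that assumption, not the resource manager's actual algorithm; the function name and all numbers are hypothetical.

```python
def can_reserve_minimum(capacity, reserved_minimums, new_minimum):
    """True if a new VDC's guaranteed minimum still fits in the pool.

    capacity          -- hypothetical total of one resource on the device
    reserved_minimums -- minimums already guaranteed to existing VDCs
    new_minimum       -- minimum requested by the template of the new VDC
    """
    return sum(reserved_minimums) + new_minimum <= capacity


# Hypothetical pool of 16 units, with two VDCs already guaranteed 6 each.
print(can_reserve_minimum(16, [6, 6], 4))   # the request fits exactly
print(can_reserve_minimum(16, [6, 6], 5))   # the request exceeds the pool
```

In this model, a template whose minimums cannot be reserved is the kind of condition that prevents a VDC from being created until the template is adjusted.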

Note: When you allocate an interface to a VDC, Cisco NX-OS removes all configuration for that interface. See the Cisco NX-OS Virtual Device Context Configuration Guide for more information on VDCs or for details on any VDC configuration changes recommended in this article.

Initial Troubleshooting Checklist


Begin troubleshooting VDC issues by checking the following issues first:

Checklist (check off each item as you verify it):
- Verify that you are logged into the device as network-admin if you are creating or modifying VDCs.
- Verify that you are in the correct VDC. You must be in the default VDC to configure VDCs.
- Verify that you have installed the Advanced Services license to configure VDCs.
- Verify that you are not attempting to create more than three VDCs.

Use the following commands to display VDC information:
- show vdc membership - Displays information about which interfaces are assigned to a VDC.
- show vdc resource - Displays information about the resources assigned (this command is available only in the default VDC).
- show vdc current-vdc - Displays the VDC you are currently in.

VDC Issues
Problems with VDCs usually occur from logging into the incorrect VDC or misallocating resources for a VDC.

You Cannot Create a VDC


When you have a problem with creating a VDC, you may see one of the following system messages:

Error Message: VDC_MGR-2-VDC_BAD: vdc_mgr: There has been a failure at res_mgr
Explanation: You cannot create a VDC because not enough resources are available based on the template configuration. If no template is used, a default template is applied.
Recommended Action: Verify that you have sufficient resources available to create this VDC by using the show vdc resource [detail] or show vdc resource template command. Modify the template that you are using to create the VDC or create a new template with resource limits that are currently available.

Error Message: VDC_MGR-2-VDC_BAD: vdc_mgr: : There has been a failure at sys_mgr
Explanation: Some services crashed or failed to come up because of insufficient system resources other than what can be reserved using the resource templates. These dynamic resources are based on system utilization and may not be available to support a new VDC.
Recommended Action: Use the show system internal sysmgr service running command to determine what caused the failure.

Symptom: You cannot create a VDC.

Possible Cause: You are not logged in as network-admin.
Solution: Log into the device with an account that has network-admin privileges.

Possible Cause: You are not logged into the default VDC.
Solution: Use the switchto command to switch to the default VDC to allocate resources.

Possible Cause: There are not enough resources.
Solution: Use the show vdc resource [detail] or show vdc resource template command to determine your available resources. Modify your template or create a VDC with fewer resources by using the limit-resource command in VDC configuration mode.

You Cannot Log into a Device


You may have a problem when logging into a device.

Symptom: You cannot log into a device.

Possible Cause: There is no account information for the VDC.
Solution: Log into the device as network-admin and use the switchto command to switch to the VDC and configure the password and network connectivity for this VDC.

Possible Cause: You are using an incorrect VDC username.
Solution: Log into the device with the account created for that VDC.

You Cannot Switch to a VDC


You may have a problem when you switch to another VDC.

Symptom You cannot switch to a VDC.

Possible Cause You are not logged in as network-admin or network-operator.

Solution Log into the device with an account that has the correct privileges.


You Cannot Delete a VDC


When you have a problem with deleting a VDC, you may see one of the following system messages:

Error Message: VDC_MGR-2-VDC_UNGRACEFUL: vdc_mgr: Ungraceful cleanup request received for vdc [dec], restart count for this vdc is [dec]
Explanation: Vdc_mgr has begun an ungraceful cleanup for a VDC.
Recommended Action: No action is required.

Error Message: VDC_MGR-2-VDC_OFFLINE: vdc [dec] is now offline
Explanation: Vdc_mgr has finished deleting a VDC.
Recommended Action: No action is required.

Symptom: You cannot delete a VDC.

Possible Cause: You attempted to delete the default VDC.
Solution: You cannot delete the default VDC.

Possible Cause: Unknown errors occurred when deleting a VDC.
Solution: Use the show tech-support VDC command to gather more information.

You Cannot Allocate an Interface to a VDC


When you have a problem with allocating an interface to a VDC, you may see the following system message:

Error Message: VDC_MGR-2-VDC_BAD: vdc_mgr: There has been a failure at gim (port_affected_list).
Explanation: An interface allocation has failed.
Recommended Action: Use the show vdc membership status or show interface brief command to gather more information.

Symptom: You cannot allocate an interface to a VDC.

Possible Cause: You are not logged in as network-admin.
Solution: Log into the device with an account that has the correct privileges.

Possible Cause: You are not logged into the correct VDC.
Solution: Use the switchto command to switch to the default VDC to allocate resources.

Possible Cause: The interface is part of a dedicated port group.
Solution: Use the show interface capabilities command to determine if the port is dedicated. All ports in a dedicated port group must be in the same VDC.

Possible Cause: The interface is on the Cisco Nexus 7000 Series 32-port 10-Gbps Ethernet module (N7K-M132XP-12).
Solution: You must allocate all ports in a port group to the same VDC for this module. For information about the port number to port group mapping, see Table: Port Numbers for Cisco Nexus 7000 Series 32-port 10-Gbps Ethernet module.

Possible Cause: The VDC allocation has failed.
Solution: Use the show vdc membership [status] or show interface brief command to gather more information.

Table: Port Numbers for Cisco Nexus 7000 Series 32-port 10-Gbps Ethernet module shows the port allocation requirements for the Cisco Nexus 7000 Series 32-port 10-Gbps Ethernet module (N7K-M132XP-12).

Table: Port Numbers for Cisco Nexus 7000 Series 32-port 10-Gbps Ethernet module

Port Group    Port Numbers
1             1, 3, 5, 7
2             2, 4, 6, 8
3             9, 11, 13, 15
4             10, 12, 14, 16
5             17, 19, 21, 23
6             18, 20, 22, 24
7             25, 27, 29, 31
8             26, 28, 30, 32
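The table follows a regular pattern: each block of eight consecutive ports holds two port groups, one with the odd ports and one with the even ports. The sketch below encodes that pattern and reports port groups that a planned allocation covers only partly. The function names are illustrative and the arithmetic is derived from the table above, not from a Cisco API.

```python
def port_group(port):
    """Return the port group (1-8) for a port number (1-32) on this module."""
    if not 1 <= port <= 32:
        raise ValueError("port must be between 1 and 32")
    block = (port - 1) // 8                  # 0..3, eight ports per block
    return 2 * block + (1 if port % 2 else 2)


def ports_in_group(group):
    """Return the four port numbers that belong to a port group."""
    return [p for p in range(1, 33) if port_group(p) == group]


def incomplete_groups(ports):
    """Map each partly covered port group to the ports that are missing."""
    wanted = set(ports)
    result = {}
    for group in sorted({port_group(p) for p in wanted}):
        missing = [p for p in ports_in_group(group) if p not in wanted]
        if missing:
            result[group] = missing
    return result


print(ports_in_group(3))
print(incomplete_groups([1, 3, 5, 7]))
print(incomplete_groups([1, 3, 9]))
```

An empty result means every port group touched by the allocation is fully included, which is the requirement for this module.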

The VDC Does Not Reflect a Resource Template Change


You may have a problem when updating a resource template.

Symptom: The VDC does not reflect a resource template change.

Possible Cause: The template has not been reapplied to a VDC after a template change.
Solution: Use the show vdc resource template command to verify the template. Use the template command in VDC configuration mode to reapply the template to the VDC. You may have to use the reload command to reboot the device or force a stateful switchover to get the new resource limits.

The VDC Remains in a Failed State


You may have a problem when a VDC fails. You configure switchover and high availability (HA) policies for a VDC when you create the VDC. These policies determine what happens when the VDC fails or when a stateful switchover occurs to the standby supervisor.

Symptom: The VDC remains in failed state.

Possible Cause: The VDC failed and the HA policy was set to bring down the VDCs.
Solution: Use the show vdc detail command to verify the HA policy for this VDC. Use the ha-policy command in VDC configuration mode to change the HA policy.

Possible Cause: A supervisor switchover has occurred and the switchover policy was set to bring down the VDCs.
Solution: Use the no vdc command to delete the failed VDC. Recreate the VDC with a different switchover policy using the sw-policy keyword.

You Cannot Copy the Running-Config File to the Startup-Config File in a VDC
You may have a problem when trying to save the configuration in a VDC.

Symptom: You cannot copy the running-config file to the startup-config file in a VDC.

Possible Cause: The resource allocation was not saved in the default VDC.
Solution: You must save the resource allocation from the default VDC before you can save the configuration in a nondefault VDC. Log into the default VDC and use the copy running-config startup-config command to save the resource allocation. Log into the nondefault VDC and save the configuration or use the copy running-config startup-config vdc-all command in the default VDC to save the configuration in all VDCs.

See Also
Before Contacting Technical Support

Further Reading
The following links contain further information on this topic from Cisco.com: Cisco Nexus 7000 Series NX-OS Virtual Device Context Configuration Guide Technical Overview of Virtual Device Contexts Cisco Techwise TV: NXOS Virtual Devices (video)

External Links
External links contain content developed by external authors. Cisco does not review this content for accuracy. Hands on with the Cisco Nexus, Part#2: Virtualization NX-OS Intro - part 8 - VDCs (video) NX-OS and VDCs

This article describes how to troubleshoot Cisco Fabric Services (CFS) problems on a Cisco NX-OS device. Guide Contents Troubleshooting Overview Troubleshooting Installs, Upgrades, and Reboots Troubleshooting Licensing Troubleshooting VDCs Troubleshooting CFS (this section) Troubleshooting Ports Troubleshooting vPCs Troubleshooting VLANs Troubleshooting STP Troubleshooting Routing Troubleshooting Unicast Traffic Troubleshooting WCCP Troubleshooting Memory Troubleshooting Packet Flow Issues Troubleshooting FCoE Before Contacting Technical Support Troubleshooting Tools and Methodology

Contents
1 Information About Troubleshooting CFS 2 Initial Troubleshooting Checklist 2.1 Verifying CFS Using the CLI 3 Troubleshooting Merge Failures 4 Troubleshooting Lock Failures 5 Troubleshooting CFS Regions 5.1 Changing CFS Regions 6 See Also 7 Further Reading 8 External Links

Information About Troubleshooting CFS


Many features in Cisco NX-OS require configuration synchronization across multiple devices in the network. CFS provides a common infrastructure for automatic configuration synchronization for an application in the network. It provides the transport function as well as a rich set of common services to the applications. CFS can discover CFS-capable devices in the network as well as their application capabilities.

Some of the applications that can be synchronized using CFS are as follows:
- Call Home
- RADIUS
- TACACS+
- User roles

Note: Do not enable CFS for an application that you manage using Cisco DCNM.

You can use CFS regions to limit the CFS configuration distribution to a subset of devices on the network.

Initial Troubleshooting Checklist


Begin troubleshooting CFS issues by checking the following issues first:

Checklist (check off each item as you verify it):
- Verify that CFS is enabled for the same applications on all affected devices.
- Verify that CFS distribution is enabled for the same applications on all affected devices.
- If you are using CFS regions, verify that the application is in the same region on all the affected devices.
- Verify that there are no pending changes for an application and that a CFS commit was issued for any configuration changes in a CFS-enabled application.
- Verify that there are no unexpected CFS locked sessions. Clear any unexpected locked sessions.

Verifying CFS Using the CLI


To verify CFS using the CLI, follow these steps:

1. Verify that CFS is globally enabled on all devices in the network or CFS region.

    switch(config)# show cfs status
    Distribution : Enabled
    Distribution over IP : Enabled - mode IPv4
    IPv4 multicast address : 239.255.70.83
    IPv6 multicast address : ff15::efff:4653
    Distribution over Ethernet : Disabled

2. Verify that CFS is enabled for the application on all devices in the network or CFS region.

    switch(config)# show cfs application
    ----------------------------------------------
     Application    Enabled   Scope
    ----------------------------------------------
     ntp            No        Physical-fc-ip
     stp            Yes       Physical-eth
     vpc            Yes       Physical-eth
     igmp           Yes       Physical-eth
     l2fm           Yes       Physical-eth
     role           Yes       Physical-fc-ip
     radius         Yes       Physical-fc-ip
     tacacs         No        Physical-fc-ip
     callhome       Yes       Physical-fc-ip
    Total number of entries = 9

The Physical-fc-ip scope means that CFS uses IP to apply the configuration for that application to all devices in the network or region. The Physical-eth scope means that CFS uses Ethernet to apply the configuration for that application to all devices in the network or region.

3. Verify that CFS distribution is enabled for the application on all devices in the network or CFS region.

    switch(config)# show cfs application name radius
     Enabled        : Yes
     Timeout        : 20s
     Merge Capable  : Yes
     Scope          : Physical-fc-ip
     Region         : 99

4. If you configure CFS regions, verify that the application is in the same region on all applicable devices.

    switch(config)# show cfs regions brief
    ---------------------------------------
     Region    Application    Enabled
    ---------------------------------------
     4         callhome       yes
     99        radius         yes

5. Verify the set of devices that are registered with CFS for that application.

    switch# show cfs peers name radius
    Scope      : Physical-fc-ip
    --------------------------------------------------
     Switch WWN              IP Address
    --------------------------------------------------
     20:00:00:0e:d7:0e:bf:c0 192.0.2.51     [Local]
     20:00:00:0e:d7:00:3c:9e 192.0.2.52
    Total number of entries = 2

6. Compare the output of the show cfs merge status name application-name command and the show cfs peers name application-name command to verify that the network is not partitioned.

    switch# show cfs merge status name radius
    Physical-fc-ip  Merge Status: Success [ Mon Jan 5 11:59:36 2009 ]
     Local Fabric
    ---------------------------------------------------------
     Switch WWN              IP Address
    ---------------------------------------------------------
     20:00:00:05:30:00:4a:de 192.0.2.51     [Merge Master]
     20:00:00:0d:ec:0c:f1:40 192.0.2.204
    Total number of switches = 2

    switch# show cfs peers name radius
    Scope      : Physical-fc-ip
    --------------------------------------------------
     Switch WWN              IP Address
    --------------------------------------------------
     20:00:00:0d:ec:0c:f1:40 192.0.2.51     [Local]
     20:00:00:05:30:00:4a:de 192.0.2.204
    Total number of entries = 2

If the list of switch WWNs in the show cfs merge status name command output is shorter than the list of switch WWNs in the show cfs peers name command output, the network is partitioned into multiple CFS fabrics and the merge status may show that the merge has failed, is pending, or is waiting.

7. Verify that a distribution is not in progress in the network for the application.

    switch# show cfs lock
    Application: callhome
    Scope      : Physical-fc-ip
    -------------------------------------------------------------------------------
     Switch WWN              IP Address       User Name    User Type
    -------------------------------------------------------------------------------
     20:00:00:22:55:79:a4:c1 172.28.230.85    admin        CLI/SNMP v3
     switch
    Total number of entries = 1

If the application does not show in the output, the distribution has completed.

8. Verify that there are no CFS sessions in progress for the application.

    switch(config)# show radius session status
    Last Action Time Stamp : Wed Dec 24 13:25:00 2008
    Last Action : Commit
    Last Action Result : Success
    Last Action Failure Reason : none
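The comparison in step 6 and the lock check in step 7 can be sketched as a script that works on captured command output. This is a minimal sketch: the function names are illustrative, the sample text is shortened from the examples in this article, and it assumes switch WWNs are printed as eight colon-separated hexadecimal bytes.

```python
import re

# A switch WWN as printed by the show cfs commands, e.g. 20:00:00:05:30:00:4a:de
WWN_PATTERN = re.compile(r"\b(?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2}\b")


def switch_wwns(output):
    """Return the set of switch WWNs found in a block of CLI output."""
    return set(WWN_PATTERN.findall(output))


def peers_missing_from_merge(merge_output, peers_output):
    """WWNs listed by show cfs peers but absent from show cfs merge status."""
    return switch_wwns(peers_output) - switch_wwns(merge_output)


def locked_applications(lock_output):
    """Application names that appear in show cfs lock output."""
    return re.findall(r"^\s*Application:\s*(\S+)", lock_output, re.MULTILINE)


MERGE_STATUS = """\
Physical-fc-ip  Merge Status: Success [ Mon Jan 5 11:59:36 2009 ]
 20:00:00:05:30:00:4a:de 192.0.2.51     [Merge Master]
 20:00:00:0d:ec:0c:f1:40 192.0.2.204
"""
PEERS = """\
 20:00:00:0d:ec:0c:f1:40 192.0.2.51     [Local]
 20:00:00:05:30:00:4a:de 192.0.2.204
"""
LOCKS = """\
Application: callhome
Scope      : Physical-fc-ip
 20:00:00:22:55:79:a4:c1 172.28.230.85    admin        CLI/SNMP v3
"""

missing = peers_missing_from_merge(MERGE_STATUS, PEERS)
print("partitioned" if missing else "not partitioned")
print(locked_applications(LOCKS))
```

A non-empty set from peers_missing_from_merge corresponds to the partitioned case described in step 6, and any application returned by locked_applications still has a distribution in progress.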

Troubleshooting Merge Failures


During a merge, the merge managers in the merging networks exchange their configuration databases with each other. The application on the merge master device merges the information, decides if the merge is successful, and informs all devices in the combined network of the status of the merge. When a merge is successful, the merge master distributes the database to all devices in the combined network and the combined network remains in a consistent state. A merge failure indicates that the merged network contains inconsistent data that could not be merged.

If you add a new device to the network and the merge status for any application shows "In Progress" for a prolonged period of time, then there may be an active session for that application in some other device. Use the show cfs lock command to check the lock status for that application on all the devices. The merge will not proceed if there are any locks present for that application on any device in the network or CFS region. Use the application-name commit command to commit the changes or use the clear application-name session command to clear the session lock so that the merge can proceed.

To recover from a merge failure using the CLI, follow these steps:

1. Identify a device that shows a merge failure.

    switch# show cfs merge status
    -------------------------------------------------------------
     Application    Scope             Vsan    Status
    -------------------------------------------------------------
     role           Physical-fc-ip            Success
     radius         Physical-fc-ip            Success
     callhome       Physical-fc-ip            Failed

2. Commit the application configuration to restore all peers in the fabric to the same configuration database.
switch(config)# callhome commit
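The check in step 1 can be sketched in Python. This is a minimal illustration, not a Cisco tool: it assumes the rows of the show cfs merge status output have already been transcribed into tuples, and the function name merges_needing_attention is hypothetical.

```python
def merges_needing_attention(rows):
    """Return the applications whose merge status is not 'Success'.

    rows: iterable of (application, scope, status) tuples transcribed by hand
    from 'show cfs merge status' output (an assumed input format).
    """
    return [app for app, _scope, status in rows if status != "Success"]


# Rows taken from the example output above.
rows = [
    ("role", "Physical-fc-ip", "Success"),
    ("radius", "Physical-fc-ip", "Success"),
    ("callhome", "Physical-fc-ip", "Failed"),
]
print(merges_needing_attention(rows))
```

Any application that this sketch returns is a candidate for the commit in step 2, or for a lock check if its status is "In Progress".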


Troubleshooting Lock Failures


In order to distribute a configuration in the network, CFS must first acquire a lock on all devices in the network or CFS region. Once CFS acquires the locks, CFS issues a commit to distribute the configuration to all devices in the network or CFS region. Under normal circumstances, CFS releases the lock after the commit.

When another application peer acquires a lock, you cannot commit new configuration changes. This is a normal operation and you should postpone any changes to an application until the application peer releases the lock.

An inconsistent lock state can also occur in the following scenarios:
- When locks are not held on all of the devices in the network or CFS region.
- When locks are held on all devices in the network or region, but a CFS session does not exist on the device that holds the lock.

Note: Use the troubleshooting steps in this section only if you believe the lock has not been properly released.

To troubleshoot a lock failure, follow these steps:

1. Determine all the devices that participate in the CFS distribution for this application.
switch1# show cfs peers name radius
Scope : Physical-fc-ip
--------------------------------------------------
Switch WWN               IP Address
--------------------------------------------------
20:00:00:0d:ec:0c:f1:40  192.0.2.51 [Local]
20:00:00:05:30:00:4a:de  192.0.2.204
Total number of entries = 2

2. Check for a lock for this application on all CFS peer devices to determine the name of the administrator who owns the lock for the application.
switch2# show cfs lock
Application: radius
Scope      : Physical-fc-ip
--------------------------------------------------------------------------------
Switch WWN               IP Address   User Name  User Type
--------------------------------------------------------------------------------
20:00:00:05:30:00:4a:de  192.0.2.204  admin      CLI/SNMP v3
switch
Total number of entries = 1

You should check with that administrator before clearing the lock.

3. Connect to the device that owns the CFS lock.

4. Release the CFS lock on the device that owns the lock.
switch2# radius abort

5. If the device does not release the lock, clear the CFS session on the device that owns the lock.
switch2# clear radius session
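Steps 1 and 2 amount to comparing the peer list with the locks seen on each peer. The following Python sketch shows that comparison under stated assumptions: the peer addresses and lock entries have been collected by hand from show cfs peers and show cfs lock, and the function name lock_report and its input shapes are hypothetical.

```python
def lock_report(peers, locks_by_device):
    """Summarize CFS lock ownership for one application.

    peers: list of peer IP addresses from 'show cfs peers name <application>'.
    locks_by_device: dict mapping each device IP to the list of
        (owner_ip, user_name) lock entries seen in 'show cfs lock' on that device.
    Both inputs are assumed to be gathered manually.
    """
    # Devices that report no lock entry for the application.
    without_lock = [ip for ip in peers if not locks_by_device.get(ip)]
    # Distinct lock owners seen anywhere in the network or region.
    owners = sorted({entry for entries in locks_by_device.values() for entry in entries})
    return {"owners": owners, "devices_without_lock": without_lock}


peers = ["192.0.2.51", "192.0.2.204"]
locks = {
    "192.0.2.51": [],
    "192.0.2.204": [("192.0.2.204", "admin")],
}
print(lock_report(peers, locks))
```

A non-empty devices_without_lock list alongside a lock owner suggests the first inconsistent-lock scenario described above. The owners list names the administrator to contact before clearing anything.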

Troubleshooting CFS Regions


The following rules apply to CFS regions:
- When using CFS regions, an application on a given device can only belong to one region at a time.
- An application in a CFS region ignores all CFS distributions in any other region (including the default region).
- All applications that you do not assign to a CFS region exist in the default region.

To resolve a configuration distribution failure to all devices in a CFS region, follow these steps:

1. Verify the list of devices in a region for the application.
switch(config)# show cfs region name radius
Region-ID  : 4
Application: radius
Scope      : Physical-fc-ip
-------------------------------------------------------------------------
Switch WWN               IP Address
-------------------------------------------------------------------------
20:00:00:22:55:79:a4:c1  172.28.230.85 [Local]
switch
Total number of entries = 1

2. Verify that the application distribution is enabled and is in the same region on all devices in the region.
switch2# show cfs application name radius
Enabled       : Yes
Timeout       : 20s
Merge Capable : Yes             >>>>> Application is capable of being merged.
Scope         : Physical-fc-ip
Region        : 1               >>>>> Application is in Region 1.

switch2(config)# cfs region 4
switch2(config-cfs-region)# radius

Note: You must reassign an application to a region whenever you disable that application. CFS assigns new applications in the default region.
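The comparison in step 2 can be sketched in Python. This is a minimal illustration that assumes the Region value has been read from show cfs application name on each device and recorded in a dictionary; the function name devices_outside_region and the device names are hypothetical.

```python
def devices_outside_region(region_by_device, expected_region):
    """Return the devices whose application region differs from the expected one.

    region_by_device: dict mapping a device name to the Region value read
        from 'show cfs application name <application>' on that device.
    expected_region: the region ID the application should be in.
    """
    return {
        device: region
        for device, region in region_by_device.items()
        if region != expected_region
    }


# Hypothetical readings: switch2 reports Region 1 while the others report 4.
regions = {"switch1": 4, "switch2": 1, "switch3": 4}
print(devices_outside_region(regions, expected_region=4))
```

Each device in the result needs the application reassigned to the expected region, as in the cfs region example above.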

Changing CFS Regions


If you move an application from one region to another, you may encounter a database mismatch when attempting a merge. Follow the steps outlined in Troubleshooting Merge Failures to identify and resolve the conflicts.

Note: When an application is moved from one region to another (including the default region), the application loses all CFS history.


See Also
Before Contacting Technical Support

Further Reading
The following links contain further information on this topic from Cisco.com: Configuring CFS

External Links
External links contain content developed by external authors. Cisco does not review this content for accuracy.

This article describes how to identify and resolve problems that can occur with ports in Cisco NX-OS.

Contents
1 Information About Troubleshooting Ports
2 Port Guidelines
3 License Requirements
4 Initial Troubleshooting Checklist
    4.1 Viewing Port Information
5 Troubleshooting Port States from the CLI
    5.1 Example: show interface Command Output
6 Port-Interface Issues
    6.1 You Cannot See The Interface
    6.2 The Interface Configuration Has Disappeared
    6.3 You Cannot Enable an Interface
    6.4 You Cannot Configure a Dedicated Port
    6.5 A Port Remains in a Link Failure or Not Connected State
    6.6 An Unexpected Link Flapping Occurs
    6.7 A Port Is in the ErrDisabled State
        6.7.1 Verifying the ErrDisable State Using the CLI
7 See Also
8 Further Reading
9 External Links

Information About Troubleshooting Ports


Before a switch can relay frames from one data link to another, the characteristics of the interfaces through which the frames are received and sent must be defined. The configured interfaces can be Ethernet interfaces, the management interface (mgmt0), or VLAN interfaces (SVIs).

Each interface has an associated administrative configuration and operational status as follows:
- The administrative configuration does not change unless you modify it. This configuration has various attributes that you can configure in administrative mode.
- The operational status represents the current status of a specified attribute like the interface speed. This status cannot be changed and is read-only. Some values may not be valid when the interface is down (such as the operation speed).

For a complete description of port modes, administrative states, and operational states, see the Cisco NX-OS Interfaces Configuration Guide.

Port Guidelines
Follow these guidelines when you configure a port interface:
- Before you begin configuring a switch, make sure that the modules in the chassis are functioning as designed. Use the show module command to verify that a module is OK or active before continuing the configuration.
- When configuring dedicated ports in a port group, follow these port mode guidelines:
  - You can configure only one port in each four-port group in dedicated mode. The other three ports are not usable and remain shut down.
  - If any of the other three ports are enabled, you cannot configure the remaining port in dedicated mode. The other three ports continue to remain enabled.

License Requirements
There are no licensing requirements for port configuration in Cisco NX-OS.


Initial Troubleshooting Checklist


Begin troubleshooting the port configuration by checking the following issues:

Checklist (check off each item as you verify it):
- Check the physical media to ensure that there are no damaged parts.
- Verify that the SFP (small form-factor pluggable) devices in use are those authorized by Cisco and that they are not faulty.
- Verify that you have enabled the port by using the no shutdown command.
- Use the show interface command to verify the state of the interface. See the Cisco NX-OS Interfaces Configuration Guide for reasons why a port may be in a down operational state.
- Verify that you have configured a port as dedicated and make sure that you have not connected to the other three ports in the port group.

Viewing Port Information

You can use the show interface counters command to view port counters. Typically, you only observe counters while actively troubleshooting, in which case you should first clear the counters to create a baseline. The values, even if they are high for certain counters, can be meaningless for a port that has been active for an extended period. Clearing the counters provides a better idea of the link behavior as you begin to troubleshoot.

Use one of the following commands to clear all port counters or the counters for specified interfaces:
- clear counters interface all
- clear counters interface range

The counters can identify synchronization problems by displaying a significant disparity between received and transmitted frames.
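When clearing counters is not convenient, the same baseline idea can be approximated by subtracting two snapshots. The Python sketch below is only an illustration: the counter names rx_frames and tx_frames, the function names, and the disparity ratio are assumptions, and the guide does not define what counts as a significant disparity.

```python
def counter_deltas(before, after):
    """Return how much each counter grew between two snapshots.

    before, after: dicts mapping a counter name to its value, transcribed
    from two runs of 'show interface counters' (assumed input format).
    """
    return {name: after[name] - before.get(name, 0) for name in after}


def rx_tx_disparity(deltas, rx_key="rx_frames", tx_key="tx_frames"):
    """Return |rx - tx| / (rx + tx), a rough 0..1 measure of imbalance."""
    rx = deltas.get(rx_key, 0)
    tx = deltas.get(tx_key, 0)
    total = rx + tx
    return abs(rx - tx) / total if total else 0.0


# Hypothetical snapshots taken a few minutes apart.
before = {"rx_frames": 100, "tx_frames": 100}
after = {"rx_frames": 1100, "tx_frames": 200}
deltas = counter_deltas(before, after)
print(deltas, rx_tx_disparity(deltas))
```

Whether a given ratio is a problem depends on the traffic pattern, so treat the number as a prompt for further investigation rather than a threshold.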

Use the following commands to gather more information about ports:
- show interface status
- show interfaces capabilities
- show udld
- show tech-support udld

Troubleshooting Port States from the CLI

To display complete information for an interface, use the show interface command. In addition to the state of the port, this command displays the following:
- Speed
- Trunk VLAN status
- Number of frames sent and received
- Transmission errors, including discards, errors, and invalid frames

The following example displays the show interface command output.

Example: show interface Command Output
switch(config)# show interface ethernet 2/45
Ethernet2/45 is down (Administratively down)
  Hardware is 10/100/1000 Ethernet, address is 0019.076c.4dd8 (bia 0019.076c.4dd8)
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
    reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA
  auto-duplex, auto-speed
  Beacon is turned off
  Auto-Negotiation is turned on
  Input flow-control is off, output flow-control is off
  Auto-mdix is turned on
  Last clearing of "show interface" counters never
  1 minute input rate 0 bytes/sec, 0 packets/sec
  1 minute output rate 0 bytes/sec, 0 packets/sec
  L3 Switched:
    input: 0 pkts, 0 bytes - output: 0 pkts, 0 bytes
  Rx
    0 input packets 0 unicast packets 0 multicast packets
    0 broadcast packets 0 jumbo packets 0 storm suppression packets
    0 bytes
  Tx
    0 output packets 0 multicast packets
    0 broadcast packets 0 jumbo packets
    0 bytes
    0 input error 0 short frame 0 watchdog
    0 no buffer 0 runt 0 CRC 0 ecc
    0 overrun 0 underrun 0 ignored 0 bad etype drop
    0 bad proto drop 0 if down drop 0 input with dribble
    0 output error 0 collision 0 deferred
    0 late collision 0 lost carrier 0 no carrier
    0 babble
    0 Rx pause 0 Tx pause 0 reset
  Receive data field Size is 2112
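The first line of the output carries the port state and, in parentheses, the reason. The following Python sketch extracts those fields. It assumes the line has the form "<interface> is up|down (<reason>)" as in the example above; the function name parse_state is hypothetical, and other state strings are not handled.

```python
import re

# Matches lines such as "Ethernet2/45 is down (Administratively down)".
STATE_RE = re.compile(r"^(?P<name>\S+) is (?P<state>up|down)(?: \((?P<reason>[^)]*)\))?")


def parse_state(line):
    """Return (interface, state, reason) or None if the line does not match."""
    match = STATE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("state"), match.group("reason")


print(parse_state("Ethernet2/45 is down (Administratively down)"))
```

A reason of "Administratively down" points to a missing no shutdown command, while "errDisabled" points to the ErrDisabled procedure later in this article.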

Port-Interface Issues
This section includes symptoms and solutions for troubleshooting ports.

You Cannot See The Interface


You may have a problem where an interface does not show up on your device because of the VDC configuration.

Symptom: You cannot see the interface.

Possible Cause: The interface has been allocated to a different VDC.
Solution: Log in as network-admin and use the show vdc membership command to determine which VDC owns the interface.

The Interface Configuration Has Disappeared

You may have a problem where your interface configuration disappears.

Symptom: The interface configuration has disappeared.

Possible Cause: The interface was reallocated to a different VDC.
Solution: Cisco NX-OS removes the interface configuration when you reallocate an interface to a different VDC. You must reconfigure the interface.

Possible Cause: The interface mode has changed to or from the switchport mode.
Solution: Cisco NX-OS removes the interface configuration when you switch between Layer 2 and Layer 3 port mode. You must reconfigure the interface.

You Cannot Enable an Interface

You may have a problem when enabling an interface.

Symptom: You cannot enable an interface.

Possible Cause: The interface is part of a dedicated port group.
Solution: You cannot enable the other three ports in a port group if one port is dedicated. Use the show running-config interface CLI command to verify the rate mode setting.

Possible Cause: The interface configuration is incompatible with a remote port.
Solution: Use the show interface capabilities command on both ports to determine if both ports have the same capabilities. Modify the configuration as needed to make the ports compatible.

Possible Cause: The Layer 2 port is not associated with an access VLAN or the VLAN is suspended.
Solution: Use the show interface brief command to see if the interface is configured in a VLAN. Use the show vlan brief command to determine the status of the VLAN. Use the state active command in VLAN configuration mode to configure the VLAN as active.

Possible Cause: An incorrect SFP is connected to the port.
Solution: Use the show interface brief command to see if you are using an incorrect transceiver. Replace with a Cisco-supported SFP.

You Cannot Configure a Dedicated Port

You may have a problem when trying to configure a port as dedicated.

Symptom: You cannot configure a dedicated port.

Possible Cause: The other three ports in the port group are not shut down.
Solution: Use the shutdown command in interface configuration mode to disable the other three ports in the port group.

Possible Cause: One or more of the other three ports in the port group are not configured in the same VDC.
Solution: Use the show vdc membership command to find out which ports are in a different VDC.

Possible Cause: The port is not the first port in the port group.
Solution: You can only set the first port in a port group to the dedicated mode.
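Two of the causes above (the port is not first in its group, or other ports in the group are still enabled) can be checked mechanically. The Python sketch below is an illustration only: the port-group membership shown is hypothetical and must be taken from the module documentation, the function name dedicated_mode_blockers is an assumption, and the VDC membership check is not covered.

```python
def dedicated_mode_blockers(port, group, enabled_ports):
    """List the reasons a port cannot be set to dedicated mode.

    port: the port you want to dedicate, for example "e1/1".
    group: ordered list of the four ports in its port group (first port first).
    enabled_ports: set of ports that are currently not shut down.
    """
    blockers = []
    if port != group[0]:
        blockers.append("not the first port in the port group")
    others_up = [p for p in group if p != port and p in enabled_ports]
    if others_up:
        blockers.append("other ports not shut down: " + ", ".join(others_up))
    return blockers


# Hypothetical port group; real membership depends on the module.
group = ["e1/1", "e1/3", "e1/5", "e1/7"]
print(dedicated_mode_blockers("e1/1", group, enabled_ports={"e1/3"}))
```

An empty list means neither of these two causes applies; the VDC membership cause still has to be checked with show vdc membership.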

A Port Remains in a Link Failure or Not Connected State


You may have a problem with ports or links becoming operational.

Symptom: A port remains in a link-failure state.

Possible Cause: The port connection is bad.
Solution: Use the show port internal info command to verify the port status is in link-failure. Verify the type of media in use. Is it copper or optical, single-mode (SM), or multimode (MM)? Verify that the media is not broken or damaged. Is the LED on the switch green? Use the shutdown command followed by the no shutdown command to disable and enable the port. If this problem persists, try moving the connection to a different port on the same or another module.

Possible Cause: There is no signal because of a transit fault in the small form-factor pluggable (SFP) or the SFP may be faulty.
Solution: When this problem occurs, the port stays in a transit port state and you see no signal. There is no synchronization at the MAC level. The problem may be related to the port speed setting or autonegotiation. Verify that the SFP on the interface is seated properly. If reseating the SFP does not resolve the issue, replace the SFP or try another port on the switch.

Possible Cause: The link is stuck in the initialization state or the link is in a point-to-point state.
Solution: Use the show logging command to check for a "Link Failure, Not Connected system" message. Use the shutdown command followed by the no shutdown command to disable and enable the port. If this problem persists, try moving the connection to a different port on the same or another module.

An Unexpected Link Flapping Occurs

When a port is flapping, it cycles through the following states, in this order, and then starts over again:
1. Initializing: The link is initializing.
2. Offline: The port is offline.
3. Link failure or not connected: The physical layer is not operational and there is no active device connection.

When you are troubleshooting an unexpected link flapping, you should know the following information:
- Who initiated the link flap.
- The actual link down reason.

Symptom: An unexpected link flapping occurs.

Possible Cause: The bit rate exceeds the threshold and puts the port into the errDisabled state.
Solution: Use the shutdown command followed by the no shutdown command to return the port to the normal state.

Possible Cause: A problem in the system triggers the link flap action by the end device. Some of the causes are as follows:
- A packet drop in the switch occurs, because of either a hardware failure or an intermittent hardware error such as an X-bar sync loss.
- A packet drop results from a software error.
- A control frame is erroneously sent to the device.
Solution: Determine the link flap reason as indicated by the MAC driver. Use the debug facilities on the end device to troubleshoot the problem. An external device may choose to reinitialize the link when it encounters the error. In such cases, the method of reinitializing the link varies by device.

A Port Is in the ErrDisabled State

The ErrDisabled state indicates that the switch detected a problem with the port and disabled the port. This state could be caused by a flapping port or a high amount of bad frames (CRC errors), which could indicate a problem with the media.

Symptom: A port is in the ErrDisabled state.

Possible Cause: The port is flapping.
Possible Cause: The device detected a high amount of bad frames (CRC errors), which might indicate a problem with the media.
Solution: Use the Verifying the ErrDisable State Using the CLI procedure to verify the SFP, cable, and connections.

Verifying the ErrDisable State Using the CLI

To verify the ErrDisable state using the CLI, follow these steps:

1. Use the show interface command to verify that the switch detected a problem and disabled the port.

switch# show interface e1/7
e1/7 is down (errDisabled)

2. Check cables, SFPs, and optics.

3. View information about the internal state transitions of the port.
switch# show system internal ethpm event-history interface e1/7
>>>>FSM: <e1/7> has 86 logged transitions<<<<<
1) FSM:<e1/7> Transition at 647054 usecs after Tue Jan 1 22:44..
    Previous state: [ETH_PORT_FSM_ST_NOT_INIT]
    Triggered event: [ETH_PORT_FSM_EV_MODULE_INIT_DONE]
    Next state: [ETH_PORT_FSM_ST_IF_INIT_EVAL]
2) FSM:<e1/7> Transition at 647114 usecs after Tue Jan 1 22:43..
    Previous state: [ETH_PORT_FSM_ST_INIT_EVAL]
    Triggered event: [ETH_PORT_FSM_EV_IE_ERR_DISABLED_CAP_MISMATCH]
    Next state: [ETH_PORT_FSM_ST_IF_DOWN_STATE]

In this example, port ethernet 1/7 entered the ErrDisabled state because of a capability mismatch, or "CAP MISMATCH."

4. Display the switch log file and view a list of port state changes.

switch# show logging logfile
. . .
Jan 4 06:54:04 switch %PORT_CHANNEL-5-CREATED: port-channel 7 created
Jan 4 06:54:24 switch %PORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel 7 is down (No operational
Jan 4 06:54:40 switch %PORT_CHANNEL-5-PORT_ADDED: e1/8 added to port-channel 7
Jan 4 06:54:56 switch %PORT-5-IF_DOWN_ADMIN_DOWN: Interface e1/7 is down (Administratively down)
Jan 4 06:54:59 switch %PORT_CHANNEL-3-COMPAT_CHECK_FAILURE: speed is not compatible
Jan 4 06:55:56 switch %PORT_CHANNEL-5-PORT_ADDED: e1/7 added to port-channel 7

In this example, an error was recorded when someone attempted to add port e1/7 to port channel 7. The port was not configured identically to port channel 7, so the attempt failed.
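In a long log file, the error lines are easier to find by filtering on the severity digit in the %FACILITY-SEVERITY-MNEMONIC tag, where a lower number is more severe. The Python sketch below shows one way to do that on saved log text. The function name errors_in_log and the default cutoff of severity 3 are assumptions for illustration.

```python
import re

# Matches tags such as "%PORT_CHANNEL-3-COMPAT_CHECK_FAILURE: speed is not compatible".
MSG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)


def errors_in_log(lines, max_severity=3):
    """Return (facility, mnemonic, text) for messages at or below max_severity."""
    hits = []
    for line in lines:
        match = MSG_RE.search(line)
        if match and int(match.group("severity")) <= max_severity:
            hits.append((match.group("facility"), match.group("mnemonic"), match.group("text")))
    return hits


log_lines = [
    "Jan 4 06:54:04 switch %PORT_CHANNEL-5-CREATED: port-channel 7 created",
    "Jan 4 06:54:56 switch %PORT-5-IF_DOWN_ADMIN_DOWN: Interface e1/7 is down (Administratively down)",
    "Jan 4 06:54:59 switch %PORT_CHANNEL-3-COMPAT_CHECK_FAILURE: speed is not compatible",
]
print(errors_in_log(log_lines))
```

Run against the example log, the filter leaves only the compatibility check failure, which is the line that explains why the port could not join port channel 7.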

See Also
Cisco NX-OS/IOS Interface Comparison

Further Reading
The following links contain further information on this topic from Cisco.com: Cisco Nexus 7000 Series NX-OS Interfaces Configuration Guide

External Links
External links contain content developed by external authors. Cisco does not review this content for accuracy.
- NX-OS Intro part 5 - Port channels (video)
- packetlife.net - Errdisable autorecovery


This article describes how to do basic troubleshooting of virtual port channel (vPC) problems on a Cisco Nexus 7000 NX-OS device.

Troubleshooting vPCs on the Nexus 7000 is covered in detail in a Cisco Live presentation. Sections of this presentation cover both platform-independent and platform-specific step-by-step troubleshooting for vPCs, among other things. Access to this presentation is free. Follow these instructions to access the presentation:

1. Visit https://www.ciscolivevirtual.com/
2. Register for free.
3. Click the "Cisco Live Virtual" link.
4. Click the "Sessions" tab at the top and select "2011 Sessions Catalog".
5. In the search box, type "BRKCRS-3144" and submit the search.
6. Select the session. You can either view the session or download the PDF.
7. vPC troubleshooting is covered in slides 64 through 80.


Contents
1 Information About Troubleshooting vPCs
2 Initial Troubleshooting Checklist
3 Verifying vPCs Using the CLI
4 Received Type 1 Configuration Element Mismatch
    4.1 Example: show vpc consistency-parameters Command Output
5 Cannot Enable the vPC Feature
    5.1 Example: show module Command Output
6 vPC in Blocking State
7 VLANs on a vPC moved to suspend state
8 Hosts with an HSRP Gateway Cannot Access Beyond Their VLAN
9 Traffic Disrupted when the Primary vPC Device Goes Down
10 See Also
11 Further Reading
12 External Links

Information About Troubleshooting vPCs


A vPC allows links that are physically connected to two different Cisco Nexus 7000 Series devices to appear as a single port channel to a third device. See the Configuring vPC chapter in the Cisco Nexus 7000 Series NX-OS Interfaces Configuration Guide for more information on vPCs.

Initial Troubleshooting Checklist


Begin troubleshooting vPC issues by checking the following issues first:

Checklist (check off each item as you verify it):
- Verify that all vPC interfaces in a vPC domain are configured in the same virtual device context (VDC).
- Verify that you have a separate vPC peer-link and peer-keepalive link infrastructure for each VDC deployed.
- Is the vPC keepalive link mapped to a separate VRF? If not, it is mapped to the management VRF by default. In this case, do you have a management switch connected to the management ports on both vPC peer devices?
- Verify that the vPC peer-link is configured on an N7K-M132XP-12 module. We recommend at least two N7K-M132XP-12 modules for redundancy.
- Verify that both the source and destination IP addresses used for the peer-keepalive messages are reachable from the VRF associated with the vPC peer-keepalive link.
- Verify that the peer-keepalive link is up; otherwise, the vPC peer-link will not come up.
- Verify that the vPC peer-link is configured as a Layer 2 port channel trunk that allows only vPC VLANs.
- Verify that the vPC number that you assigned to the port channel that connects to the downstream device from the vPC peer device is identical on both vPC peer devices.
- If you manually configured the system priority, verify that you assigned the same priority value on both vPC peer devices.
- Check the show vpc consistency-parameters command to verify that both vPC peer devices have identical type-1 parameters.
- Verify that the primary vPC is the primary STP root and the secondary vPC is the secondary STP root.

Verifying vPCs Using the CLI


To verify vPCs using the CLI, follow these steps:
1. Use the show running-config vpc command to verify the vPC configuration.
2. Use the show vpc command to check the status of the vPC.
3. Use the show vpc peer-keepalive command to check the status of the vPC peer-keepalive link.
4. Use the show vpc consistency-parameters command to verify that both vPC peers have identical type-1 parameters.
5. Use the show port-channel summary command to verify that the members in the port channel are mapped to the vPC.
6. Use the show cfs status command to verify that distribution over Ethernet is enabled.
7. If you enable STP, use the show spanning-tree command on both sides of the vPC peer link to verify that the following STP parameters are identical:
   - BPDU Filter
   - BPDU Guard
   - Cost
   - Link type
   - Priority
   - VLANs (PVRST+)

Received Type 1 Configuration Element Mismatch


You may have a problem where you cannot bring up a vPC link because of a type 1 configuration element mismatch.

Symptom Received a type 1 configuration element mismatch.

Possible Cause The vPC peer ports or membership ports do not have identical configurations.

Solution Use the show vpc consistency-parameters interface command to determine where the configuration mismatch occurs.

Example: show vpc consistency-parameters Command Output

This example shows how to display the vPC consistency parameters on a port channel:
switch# show vpc consistency-parameters interface po 10
Legend:
    Type 1 : vPC will be suspended in case of mismatch

Name                        Type  Local Value            Peer Value
----------------            ----  ---------------------- -----------------------
STP Mode                    1     Rapid-PVST             Rapid-PVST
STP Disabled                1     None                   None
STP MST Region Name         1     ""                     ""
STP MST Region Revision     1     0                      0
STP MST Region Instance to  1
 VLAN Mapping
STP Loopguard               1     Disabled               Disabled
STP Bridge Assurance        1     Enabled                Enabled
STP Port Type               1     Normal                 Normal
STP MST Simulate PVST       1     Enabled                Enabled
Allowed VLANs                     1-10,15-20,30,37,99    1-10,15-20,30,37,99
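Finding the mismatch means scanning for a type 1 row whose local and peer values differ. The Python sketch below illustrates that scan. It assumes the rows have been transcribed into tuples by hand; the function name type1_mismatches is hypothetical, and the mismatched MST value in the sample data is invented for the demonstration.

```python
def type1_mismatches(params):
    """Return the names of type 1 parameters whose local and peer values differ.

    params: list of (name, type, local_value, peer_value) tuples transcribed
    from 'show vpc consistency-parameters' output (assumed input format).
    """
    return [name for name, ptype, local, peer in params if ptype == 1 and local != peer]


# Sample rows; the peer value for STP Mode is altered here to show a mismatch.
params = [
    ("STP Mode", 1, "Rapid-PVST", "MST"),
    ("STP Loopguard", 1, "Disabled", "Disabled"),
    ("STP Bridge Assurance", 1, "Enabled", "Enabled"),
]
print(type1_mismatches(params))
```

Every name returned is a setting to align on both vPC peers before the vPC can come up.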

Cannot Enable the vPC Feature


You may receive an error when you enable the vPC feature.

Symptom: Cannot enable the vPC feature.

Possible Cause: The hardware is incompatible with the vPC.
Solution: Use the show module command to determine the hardware version of each N7K-M132XP-12 Ethernet module. The hardware version must be 1.3 or later to enable the vPC feature.

Example: show module Command Output

This example shows how to display the module hardware version:
switch# show module
Mod  Ports  Module-Type                       Model              Status
---  -----  --------------------------------  -----------------  ------------
2    32     10 Gbps Ethernet Module           N7K-M132XP-12      ok
3    48     10/100/1000 Mbps Ethernet Module  N7K-M148GT-11      ok
5    0      Supervisor module-1X              N7K-SUP1           active *
6    0      Supervisor module-1X              N7K-SUP1           ha-standby
10   32     10 Gbps Ethernet Module           N7K-M132XP-12      ok

Mod  Sw              Hw
---  --------------  ------
2    4.1(5)          1.2     >>> Must be 1.3 or later.
3    4.1(5)          1.0
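The hardware version check can be sketched in Python as a numeric comparison rather than a string comparison, so that a version such as 1.10 is treated as later than 1.3. The function names and the dictionary layout are assumptions for illustration; the model name and the 1.3 minimum come from the text above.

```python
def hw_version_ok(hw, minimum=(1, 3)):
    """Return True if a dotted hardware version is at least the minimum."""
    return tuple(int(part) for part in hw.split(".")) >= minimum


def m132_modules_too_old(modules):
    """Return slot numbers of N7K-M132XP-12 modules below hardware version 1.3.

    modules: dict mapping a slot number to (model, hw_version), transcribed
    from 'show module' output (assumed input format).
    """
    return [
        slot
        for slot, (model, hw) in sorted(modules.items())
        if model == "N7K-M132XP-12" and not hw_version_ok(hw)
    ]


# Slots 2 and 3 from the example output above.
modules = {2: ("N7K-M132XP-12", "1.2"), 3: ("N7K-M148GT-11", "1.0")}
print(m132_modules_too_old(modules))
```

In the example output, slot 2 is flagged because its N7K-M132XP-12 module reports hardware version 1.2.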

vPC in Blocking State


A vPC may be in the blocking state because of Bridge Assurance (BA).

Symptom: vPC is in the blocking state.

Possible Cause: BPDUs are sent on only a single link of a port channel. If a BA dispute is detected, the entire vPC will be in the blocking state.

Solution: Do not enable BA on the vPC.

VLANs on a vPC moved to suspend state


VLANs on a vPC may move to the suspend state.

Symptom: VLANs on a vPC moved to suspend state.

Possible Cause: VLANs allowed on the vPC have not been allowed on the vPC peer-link.

Solution: All VLANs allowed on a vPC must also be allowed on the vPC peer-link. Also, it is recommended that only vPC VLANs are allowed on the vPC peer-link.
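The rule that every VLAN allowed on a vPC must also be allowed on the peer-link is a set comparison. The Python sketch below expands allowed-VLAN strings such as "1-10,15-20,30,37,99" and reports what is missing. The function names are hypothetical, and the peer-link list in the sample is invented to show a gap.

```python
def parse_vlan_list(text):
    """Expand an allowed-VLAN string such as '1-10,15-20,30' into a set of IDs."""
    vlans = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def vlans_missing_on_peer_link(vpc_allowed, peer_link_allowed):
    """Return the vPC VLANs that the peer-link does not allow, in ascending order."""
    return sorted(parse_vlan_list(vpc_allowed) - parse_vlan_list(peer_link_allowed))


# The peer-link list here is a made-up example that omits VLANs 37 and 99.
print(vlans_missing_on_peer_link("1-10,15-20,30,37,99", "1-10,15-20,30"))
```

Any VLAN in the result is a candidate for the suspend state described above until it is added to the peer-link trunk.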

Hosts with an HSRP Gateway Cannot Access Beyond Their VLAN


When HSRP is enabled on both vPC peer devices on a VLAN and hosts on that VLAN set the HSRP address as their gateway, they may not be able to reach anything outside their own VLAN.

Symptom: Hosts with an HSRP gateway cannot access beyond their VLAN.

Possible Cause: If the host gateway MAC address is mapped to the physical MAC address of any one of the vPC peer devices, packets may get dropped due to the loop prevention mechanism in vPC.

Solution: Map the host gateway's MAC address to the HSRP MAC address and not the physical MAC address of any one of the vPC peer devices. Peer-gateway can be a workaround for this scenario. Read the configuration guide for peer-gateway for further information before implementing it.

Traffic Disrupted when the Primary vPC Device Goes Down

Traffic may remain disrupted when the N7K-M132XP-12 module on the primary vPC device goes down.

Symptom: Traffic disrupted when the primary vPC device goes down.

Possible Cause: All core-facing interfaces and vPC peer-links are configured on a single N7K-M132XP-12 module.

Solution: Enable object tracking. With object tracking enabled, all vPCs on the primary will shut down. The vPC secondary will take over as the operational primary and all the vPCs on the secondary will stay up. As a result, traffic will still flow through the secondary, which became the operational primary.

See Also

Troubleshooting Ports

Further Reading

The following links contain further information on this topic from Cisco.com:

Virtual PortChannels: Building Networks without Spanning Tree Protocol (Cisco White Paper)

Configuring vPCs (Cisco Nexus 7000 Series Interfaces Configuration Guide)

Nexus 5000 Virtual PortChannel Quick Configuration Guide

External Links

The following links contain content developed by external authors. Cisco does not review this content for accuracy.

Our Nexus Data Center Network - To vPC or not to vPC

Nexus 7000 Virtual Portchannel Part 1

Nexus 7000 Virtual Portchannel Part 2

Nexus 7000 Virtual Portchannel Part 3

vPC (Virtual Port-Channel) and the Nexus 5000 Platform

Blog on Cisco Nexus Features (VLANs, vPCs)

This article describes how to troubleshoot VLANs.

Contents
1 Information About Troubleshooting VLANs
2 Initial Troubleshooting Checklist
3 VLAN Issues
    3.1 You Cannot Create a VLAN
    3.2 You Cannot Create a PVLAN
    3.3 The VLAN Interface is Down
4 See Also
5 Further Reading
6 External Links

Information About Troubleshooting VLANs


VLANs provide a method of isolating devices that are physically connected to the same network but are logically considered to be part of different LANs that do not need to be aware of one another.

You should use only the following characters in a VLAN name:
- a through z or A through Z
- 0 through 9
- Hyphen (-) or underscore (_)
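The character rule above maps directly to a regular expression. The Python sketch below checks a proposed name against it. The function name is hypothetical, and the check covers only the recommended characters; it does not enforce any length limit, which this article does not state.

```python
import re

# Letters, digits, hyphen, and underscore only; at least one character.
VLAN_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_recommended_vlan_name(name):
    """Return True if the name uses only the recommended characters."""
    return VLAN_NAME_RE.fullmatch(name) is not None


for candidate in ("eng_floor-2", "guest vlan", "lab#1"):
    print(candidate, is_recommended_vlan_name(candidate))
```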

Follow these guidelines when configuring VLANs:
- Keep user traffic off the management VLAN; keep the management VLAN separate from user data.
- You can apply different Quality of Service (QoS) configurations to primary, isolated, and community VLANs.
- To apply output VACLs to all outgoing private VLAN traffic, map the secondary VLANs on the Layer 3 VLAN interface of the primary VLAN and then configure the VACLs on the SVI of the primary VLAN. VACLs that apply to the Layer 3 VLAN interface of a primary VLAN automatically apply to the associated isolated and community VLANs.
- If you do not map the secondary VLAN to the Layer 3 VLAN interface of the primary VLAN, you can have different VACLs for primary and secondary VLANs.
- Because traffic in private VLANs flows in different directions, you can have different VACLs for ingress traffic and for egress traffic.
  Note: We recommend that you keep the same VACLs for the primary VLAN and all secondary VLANs in the private VLAN.
- You can enable DHCP snooping on private VLANs. When you enable DHCP snooping on the primary VLAN, it is propagated to the secondary VLANs. If you configure DHCP on a secondary VLAN, the configuration does not take effect if the primary VLAN is already configured.
- You can configure IEEE 802.1X port-based authentication on a private VLAN port, but do not configure 802.1X with port security or per-user ACL on private VLAN ports.
- 802.1X works with private VLANs, but the 802.1X dynamic VLAN assignment or the guest VLAN assignment does not work with private VLANs.
- IGMP runs only on the primary VLAN and uses the configuration of the primary VLAN for all secondary VLANs. Any IGMP join request in the secondary VLAN is treated as if it is received in the primary VLAN.
- Private VLANs support these Switched Port Analyzer (SPAN) features:
  - You can configure a private VLAN port as a SPAN source port.
  - You can use VLAN-based SPAN (VSPAN) on primary, isolated, or community VLANs or use SPAN on only one VLAN to separately monitor egress or ingress traffic.
  - Do not configure a remote SPAN (RSPAN) VLAN as a private VLAN primary or secondary VLAN.
  - A private VLAN host or promiscuous port cannot be a SPAN destination port. If you configure a SPAN destination port as a private VLAN port, the port becomes inactive.
  - A destination SPAN port cannot be an isolated port. (However, a source SPAN port can be an isolated port.)
  - You can configure SPAN to span both primary and secondary VLANs or, alternatively, to span either one if the user is interested only in ingress or egress traffic.
- A MAC address learned in a secondary VLAN is placed in the shared table of the primary VLAN. When the secondary VLAN is associated to the primary VLAN, their MAC address tables are merged into one shared MAC table.


Initial Troubleshooting Checklist


Troubleshooting a VLAN problem involves gathering information about the configuration and connectivity of individual devices and the entire network. Begin troubleshooting VLAN issues by checking the following issues first:

Checklist (check off each item):
  Verify the physical connectivity for any problem ports or VLANs.
  Verify that you have both end devices in the same VLAN.

The following CLI commands are used to display VLAN information:
  show vlan vlan-id
  show vlan private-vlan
  show vlan all-ports
  show vlan private-vlan type
  show interface vlan vlan-id private-vlan mapping
  show tech-support vlan

VLAN Issues
This section includes symptoms and solutions for VLAN issues.

You Cannot Create a VLAN


You may have a problem when creating a VLAN.

Symptom: You cannot create a VLAN.

Possible Cause: There are not enough resources in the virtual device context (VDC).
Solution: Use the show vdc resource vlan command to determine how many unused VLANs you can configure. If this value is 0, log in as network-admin and use the limit-resource command in VDC configuration mode to add more VLAN resources to this VDC.

Possible Cause: You are using a reserved VLAN ID.
Solution: VLANs 3968 to 4047 and 4094 are reserved for internal use in each VDC; you cannot change or use these reserved VLANs.
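The two possible causes above can be expressed as a small pre-check. This is an illustrative sketch only: the function names are hypothetical, and the unused-VLAN count is assumed to be a number an operator has already read from the show vdc resource vlan output.

```python
# Reserved ranges quoted in this section: VLANs 3968 to 4047 and VLAN 4094.
RESERVED_VLAN_RANGES = [(3968, 4047), (4094, 4094)]


def is_reserved_vlan(vlan_id: int) -> bool:
    """Return True if vlan_id falls in a range reserved for internal use."""
    return any(low <= vlan_id <= high for low, high in RESERVED_VLAN_RANGES)


def why_vlan_creation_fails(vlan_id: int, unused_vlans: int) -> str:
    """Map the two possible causes above to a short diagnosis string.

    unused_vlans is a hypothetical input: the unused-VLAN count read
    manually from the show vdc resource vlan output.
    """
    if is_reserved_vlan(vlan_id):
        return "reserved VLAN ID"
    if unused_vlans <= 0:
        return "no VLAN resources left in this VDC"
    return "no cause from this section applies"


if __name__ == "__main__":
    print(why_vlan_creation_fails(4000, 10))
    print(why_vlan_creation_fails(100, 0))
```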

You Cannot Create a PVLAN


You may experience issues creating a private VLAN (PVLAN).

Symptom: You cannot create a PVLAN.

Possible Cause: The PVLAN feature is not enabled.
Solution: Use the feature pvlan command to enable the PVLAN feature.

The VLAN Interface is Down


You may have a problem when configuring VLAN interfaces.

Symptom: The VLAN interface is down.

Possible Cause: The VLAN does not exist.
Solution: Use the show vlan command to determine if the VLAN exists. Use the vlan command to create the VLAN.

Possible Cause: No interfaces on the VLAN are in the STP forwarding state.
Solution: Use the show vlan internal vlan-info command to check the operating state of the Spanning Tree Protocol (STP). Configure STP so that at least one interface goes into the STP forwarding state.

Possible Cause: One or more services prevented the VLAN interface from coming up.
Solution: Use the show vlan internal vlan-info command to determine the state of the VLAN interface. If the state is oper-es, use the show tech-support interface vlan command to gather more information.

Possible Cause: The VLAN is a secondary VLAN.
Solution: Use the show vlan internal vlan-info command to determine the state of the VLAN interface. Change the VLAN to a primary or user VLAN.

Possible Cause: The interface is in the wrong VRF.
Solution: Use the show vrf interface command to determine the VRF that the VLAN interface is assigned to.

See Also
Before Contacting Technical Support

Further Reading
The following links contain further information on this topic from Cisco.com: Cisco Nexus 7000 Series NX-OS Layer 2 Switching Configuration Guide

External Links
External links contain content developed by external authors. Cisco does not review this content for accuracy.

This article describes how to identify and resolve problems that might occur when implementing the Spanning Tree Protocol (STP).

Contents
1 Information About Troubleshooting STP
2 Initial Troubleshooting Checklist
3 Troubleshooting STP Data Loops
4 Troubleshooting Excessive Packet Flooding
5 Troubleshooting Convergence Time Issues
6 Securing the Network Against Forwarding Loops
7 See Also
8 Further Reading
9 External Links

Information About Troubleshooting STP


STP provides a loop-free network at the Layer 2 level. Layer 2 LAN ports send and receive STP frames at regular intervals. Network devices do not forward these frames but use the frames to construct a loop-free path. See the Cisco NX-OS Layer 2 Switching Configuration Guide for more information on STP.

Follow these guidelines when configuring STP:
  If you are running private VLANs with multiple STP (MST), verify that all secondary VLANs belong to the same MST instance as that of the primary VLANs.
  Disabling spanning tree on the native VLAN of an 802.1Q trunk when you are working in Rapid PVST+ spanning tree mode can cause a spanning tree loop on that VLAN. We recommend that you leave spanning tree enabled on the native VLAN of the 802.1Q trunks. Make sure that your network has no physical loops before you disable spanning tree.
  When you connect two Cisco switches through 802.1Q trunks, the switches exchange spanning tree bridge protocol data units (BPDUs) on each VLAN allowed on the trunks. The BPDUs on the native VLAN of the trunk are sent untagged to the reserved IEEE 802.1D spanning tree multicast MAC address (01-80-C2-00-00-00). The BPDUs on all other VLANs on the trunk are sent tagged to the reserved Cisco Shared Spanning Tree (SSTP) multicast MAC address (01-00-0c-cc-cc-cd).
  In STP, the port-channel bundle is considered as a single port. The port cost is the aggregation of all the configured port costs that are assigned to that channel.
  When a secondary VLAN is associated with the primary VLAN, the STP parameters of the primary VLAN, such as bridge priorities, are propagated to the secondary VLAN. However, STP parameters do not necessarily propagate to other devices. You should manually check the STP configuration to ensure that the spanning tree topologies for the primary, isolated, or community VLANs match exactly so that the VLANs can share the same forwarding database.
  For normal trunk ports, note the following:
    There is a separate instance of STP for each VLAN in the private VLAN.
    STP parameters for the primary and all secondary VLANs must match.
    The primary and all associated secondary VLANs should be in the same MST instance.
  The duplex configuration for both sides of the link should be set to full to prevent collisions under heavy traffic conditions.
  In MST mode, a misconfiguration cannot be detected if you configure one end of a link in trunk mode and the other end of the link in access mode. This misconfiguration will cause an STP loop.
  For nontrunking ports, note the following:
    STP is aware only of the primary VLAN for any private VLAN host port; STP does not run on secondary VLANs on a host port.
  For Rapid PVST+ on private VLANs, note the following:
    On a trunk port, the primary and secondary private VLANs are two different logical ports and must have the exact same STP topology.
    On access ports, STP sees only the primary VLAN.

Note: In some cases, the configuration is accepted with no error messages, but the commands have no effect.
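When reading packet captures or filter listings, it can help to recognize the two reserved BPDU destination addresses regardless of how the MAC address is written (hyphenated, as in this section, or dotted, as in CLI output). The following is a small sketch under that assumption; the function names and the returned labels are hypothetical.

```python
import re

# Reserved destination addresses named in the guidelines above.
IEEE_STP_MAC = "0180c2000000"    # 01-80-C2-00-00-00, untagged native-VLAN BPDUs
CISCO_SSTP_MAC = "01000ccccccd"  # 01-00-0c-cc-cc-cd, tagged BPDUs on other VLANs


def normalize_mac(mac: str) -> str:
    """Strip separators and lowercase, so 0180.c200.0000 == 01-80-C2-00-00-00."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits


def classify_bpdu_destination(mac: str) -> str:
    """Return 'ieee', 'sstp', or 'other' for a destination MAC address."""
    normalized = normalize_mac(mac)
    if normalized == IEEE_STP_MAC:
        return "ieee"
    if normalized == CISCO_SSTP_MAC:
        return "sstp"
    return "other"


if __name__ == "__main__":
    for address in ("01-80-C2-00-00-00", "0100.0ccc.cccd", "0018.bad7.db15"):
        print(address, classify_bpdu_destination(address))
```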

Initial Troubleshooting Checklist

Troubleshooting an STP problem involves gathering information about the configuration and connectivity of individual devices and the entire network.

Begin troubleshooting STP issues by checking the following issues first:

Checklist (check off each item):
  Verify the type of spanning tree configured on your device.
  Verify the network topology including all interconnected ports and switches. Identify all redundant paths on the network and verify that the redundant paths are blocking.
  Use the show spanning-tree summary totals command to verify that the total number of logical interfaces in the Active state is less than the maximum allowed. See the Cisco NX-OS Layer 2 Switching Configuration Guide for information on these limits.
  Verify the primary and secondary root bridge and any configured Cisco extensions.

Use the following commands to view STP configuration and operational details:
  show running-config spanning-tree
  show spanning-tree summary
  show spanning-tree detail
  show spanning-tree bridge
  show spanning-tree mst
  show spanning-tree mst configuration
  show spanning-tree interface interface-type slot/port [detail]
  show tech-support stp
  show spanning-tree vlan

Use the show spanning-tree blockedports command to display the ports that are blocked by STP. Use the show mac address-table dynamic vlan command to determine if learning or aging occurs at each node.

Troubleshooting STP Data Loops


Data loops are a common problem in STP networks. Some of the symptoms of a data loop are as follows:
  High link utilization, up to 100 percent
  High CPU and backplane traffic utilization
  Constant MAC address relearning and flapping
  Excessive output drops on an interface
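To put a number on the high link utilization symptom, the rate lines printed by show interface can be turned into a fraction of the link bandwidth. The sketch below assumes the "1 minute input rate ... bits/sec" line format used in this guide; the function name is hypothetical, and the link bandwidth is an input you supply, since the rate lines do not contain it.

```python
import re

# Matches lines such as "1 minute output rate 3952023552 bits/sec, ...".
RATE_RE = re.compile(r"(input|output) rate (\d+) bits/sec")


def link_utilization(show_output: str, bandwidth_bps: int) -> dict:
    """Return {'input': fraction, 'output': fraction} of link bandwidth in use."""
    return {
        direction: int(bits) / bandwidth_bps
        for direction, bits in RATE_RE.findall(show_output)
    }


if __name__ == "__main__":
    sample = (
        "1 minute input rate 19968 bits/sec, 0 packets/sec\n"
        "1 minute output rate 3952023552 bits/sec, 957312 packets/sec\n"
    )
    # Assumption for illustration only: the port is a 10-Gb/s link.
    for direction, fraction in link_utilization(sample, 10_000_000_000).items():
        print(f"{direction}: {fraction:.1%}")
```

A strongly asymmetric result, with one direction near zero and the other carrying a large share of the link, is the pattern to look for on ports that take part in a loop.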

To troubleshoot STP loops, follow these steps:

1. Identify the ports involved in the loop by looking at the interfaces with high link utilization.

    switch# show interface ethernet 2/1 | include rate
      1 minute input rate 19968 bits/sec, 0 packets/sec
      1 minute output rate 3952023552 bits/sec, 957312 packets/sec

2. Shut down or disconnect the affected ports.

    switch(config)# interface ethernet 2/1
    switch(config-if)# shutdown

3. Locate every switch in the redundant paths using your network topology diagram.

4. Verify that the switch lists the same STP root bridge as the other nonaffected switches.

    switch# show spanning-tree vlan 9
    VLAN0009
      Spanning tree enabled protocol rstp
      Root ID    Priority    32777
                 Address     0018.bad7.db15
                 Cost        4
    ...

5. Verify that the root port is correctly identified as the port with the lowest cost to the root bridge.

    switch# show spanning-tree vlan 9
    VLAN0009
      Spanning tree enabled protocol rstp
      Root ID    Priority    32777
                 Address     0018.bad7.db15
                 Cost        4
                 Port        385 (Ethernet3/1)
                 Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

6. Verify that the root port and alternate ports are regularly receiving BPDUs.

    switch# show spanning-tree interface ethernet 3/1 detail
     Port 385 (Ethernet3/1) of VLAN0001 is root forwarding
       Port path cost 4, Port priority 128, Port Identifier 128.385
       Designated root has priority 32769, address 0018.bad7.db15
       Designated bridge has priority 32769, address 0018.bad7.db15
       Designated port id is 128.385, designated path cost 0
       Timers: message age 16, forward delay 0, hold 0
       Number of transitions to forwarding state: 1
       The port type is network by default
       Link type is point-to-point by default
       BPDU: sent 1265, received 1269

7. If the received BPDU counter is not incremented, check if the BPDUs are received by the internal packet manager.

    switch# show system internal pktmgr interface ethernet 3/1
    Ethernet3/1, ordinal: 36
      SUP-traffic statistics: (sent/received)
        Packets: 120210 / 15812
        Bytes: 8166401 / 1083056
        Instant packet rate: 5 pps / 5 pps
        Average packet rates(1min/5min/15min/EWMA):
        Packet statistics:
          Tx: Unicast 0, Multicast 120210
              Broadcast 0
          Rx: Unicast 0, Multicast 15812
              Broadcast 0

    switch# show system internal pktmgr client 303
    Client uuid: 303, 2 filters
      Filter 0: EthType 0x4242, Dmac 0180.c200.0000
      Filter 0: EthType 0x010b, Snap 267, Dmac 0100.0ccc.cccd
      Options: TO 0, Flags 0x1, AppId 0, Epid 0
      Ctrl SAP: 171, Data SAP 177 (1)
      Rx: 28356632, Drop: 0, Tx: 35498365, Drop: 0

8. If the BPDUs are not received by the packet manager, check the hardware packet statistic (error drop) counters.

    switch# show interface counters errors
    --------------------------------------------------------------------------------
    Port          Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize OutDiscards
    --------------------------------------------------------------------------------
    mgmt0                --         --         --         --         --          --
    Eth1/1                0          0          0          0          0           0
    Eth1/2                0          0          0          0          0           0
    Eth1/3                0          0          0          0          0           0
    Eth1/4                0          0          0          0          0           0
    Eth1/5                0          0          0          0          0           0
    Eth1/6                0          0          0          0          0           0
    Eth1/7                0          0          0          0          0           0
    Eth1/8                0          0          0          0          0           0

9. Check that the designated port is regularly sending BPDUs.

    switch# show spanning-tree interface ethernet 3/1 detail
     Port 385 (Ethernet3/1) of VLAN0001 is root forwarding
       Port path cost 4, Port priority 128, Port Identifier 128.385
       Designated root has priority 32769, address 0018.bad7.db15
       Designated bridge has priority 32769, address 0018.bad7.db15
       Designated port id is 128.385, designated path cost 0
       Timers: message age 16, forward delay 0, hold 0
       Number of transitions to forwarding state: 1
       The port type is network by default
       Link type is point-to-point by default
       BPDU: sent 1265, received 1269

10. If the BPDU send counter is incrementing, check if BPDUs are transmitted by the packet manager.

    switch# show system internal pktmgr interface ethernet 3/1
    Ethernet3/1, ordinal: 36
      SUP-traffic statistics: (sent/received)
        Packets: 120210 / 15812
        Bytes: 8166401 / 1083056
        Instant packet rate: 5 pps / 5 pps
        Average packet rates(1min/5min/15min/EWMA):
        Packet statistics:
          Tx: Unicast 0, Multicast 120210
              Broadcast 0
          Rx: Unicast 0, Multicast 15812
              Broadcast 0

    switch# show system internal pktmgr client 303
    Client uuid: 303, 2 filters
      Filter 0: EthType 0x4242, Dmac 0180.c200.0000
      Filter 0: EthType 0x010b, Snap 267, Dmac 0100.0ccc.cccd
      Options: TO 0, Flags 0x1, AppId 0, Epid 0
      Ctrl SAP: 171, Data SAP 177 (1)
      Rx: 28356632, Drop: 0, Tx: 35498365, Drop: 0

11. If the packet manager BPDU sent counter is incrementing, check the hardware packet statistic counters for a possible BPDU error drop.

    switch# show interface counters errors
    --------------------------------------------------------------------------------
    Port          Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize OutDiscards
    --------------------------------------------------------------------------------
    mgmt0                --         --         --         --         --          --
    Eth1/1                0          0          0          0          0           0
    Eth1/2                0          0          0          0          0           0
    Eth1/3                0          0          0          0          0           0
    Eth1/4                0          0          0          0          0           0
    Eth1/5                0          0          0          0          0           0
    Eth1/6                0          0          0          0          0           0
    Eth1/7                0          0          0          0          0           0
    Eth1/8                0          0          0          0          0           0
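Several of the steps above depend on whether the BPDU sent and received counters are incrementing, which requires comparing two samples of the show spanning-tree interface detail output taken some seconds apart. The following is a minimal sketch of that comparison; it assumes the "BPDU: sent N, received M" line format shown in this section, and the function names are hypothetical.

```python
import re

# Matches the counter line in the detail output, e.g. "BPDU: sent 1265, received 1269".
BPDU_RE = re.compile(r"BPDU: sent (\d+), received (\d+)")


def bpdu_counters(detail_output: str) -> tuple:
    """Return (sent, received) parsed from one sample of the detail output."""
    match = BPDU_RE.search(detail_output)
    if match is None:
        raise ValueError("no BPDU counter line found")
    return int(match.group(1)), int(match.group(2))


def bpdu_progress(first_sample: str, second_sample: str) -> dict:
    """Compare two samples and report which counters incremented."""
    sent_1, received_1 = bpdu_counters(first_sample)
    sent_2, received_2 = bpdu_counters(second_sample)
    return {
        "sent_incrementing": sent_2 > sent_1,
        "received_incrementing": received_2 > received_1,
    }


if __name__ == "__main__":
    first = "BPDU: sent 1265, received 1269"
    # Hypothetical second sample, invented here for illustration.
    second = "BPDU: sent 1270, received 1269"
    print(bpdu_progress(first, second))
```

In this illustration the received counter is stuck, which is the situation where the procedure moves on to the packet manager and hardware error counters.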

Troubleshooting Excessive Packet Flooding


Unstable STP topology changes can trigger excessive packet flooding in your STP network. With Rapid STP or Multiple STP (MST), a change of the port's state to forwarding, as well as the role change from designated to root can trigger a topology change. Rapid STP immediately flushes the Layer 2 forwarding table. 802.1D shortens the aging time. The immediate flushing of the forwarding table restores connectivity faster but causes more flooding. In a stable topology, a topology change should not trigger excessive flooding. Link flaps can cause a topology change, so continuous link flaps can cause repetitive topology changes and flooding. Flooding slows the network performance and can cause packet drops on an interface.

To troubleshoot excessive flooding, follow these steps:

1. Determine the source of the excessive topology change.

    switch# show spanning-tree vlan 9 detail
     VLAN0009 is executing the rstp compatible Spanning Tree protocol
      Bridge Identifier has priority 32768, sysid 9, address 0018.bad8.27ad
      Configured hello time 2, max age 20, forward delay 15
      Current root has priority 32777, address 0018.bad7.db15
      Root port is 385 (Ethernet3/1), cost of root path is 4
      Topology change flag not set, detected flag not set
      Number of topology changes 8 last change occurred 1:32:11 ago
              from Ethernet3/1
      Times: hold 1, topology change 35, notification 2
    ...

2. Determine the interface where the topology change occurred.

    switch# show spanning-tree vlan 9 detail
     VLAN0009 is executing the rstp compatible Spanning Tree protocol
      Bridge Identifier has priority 32768, sysid 9, address 0018.bad8.27ad
      Configured hello time 2, max age 20, forward delay 15
      Current root has priority 32777, address 0018.bad7.db15
      Root port is 385 (Ethernet3/1), cost of root path is 4
      Topology change flag not set, detected flag not set
      Number of topology changes 8 last change occurred 1:32:11 ago
              from Ethernet3/1
      Times: hold 1, topology change 35, notification 2
    ...

3. Repeat step 2 on devices connected to the interface until you can isolate the device that originated the topology change.

4. Check for link flaps on the interfaces on this device.
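The topology-change fields used in steps 1 and 2 can be pulled out of the show spanning-tree vlan detail output and tracked from device to device. The sketch below assumes the "Number of topology changes ... from ..." layout shown in this section; the function name is hypothetical.

```python
import re

# The count, the age of the last change, and the interface it came from.
# The "from <interface>" part is on the following line in the CLI output,
# so the pattern allows any whitespace (including a newline) before it.
TC_RE = re.compile(
    r"Number of topology changes (\d+) last change occurred (\S+) ago\s+from (\S+)"
)


def last_topology_change(detail_output: str):
    """Return (count, age, interface), or None if the lines are not present."""
    match = TC_RE.search(detail_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


if __name__ == "__main__":
    sample = (
        "  Topology change flag not set, detected flag not set\n"
        "  Number of topology changes 8 last change occurred 1:32:11 ago\n"
        "          from Ethernet3/1\n"
    )
    print(last_topology_change(sample))
```

Collecting this tuple twice on the same device shows whether the count is still climbing, and the interface field points to the neighbor on which to repeat step 2.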

Troubleshooting Convergence Time Issues


STP convergence can take longer than expected or result in an unexpected final network topology.

To troubleshoot convergence issues, check the following issues:
  Errors in the documented network topology diagram.
  Misconfiguration: Check that the timers, diameter, and Cisco extension features such as Bridge Assurance, Root Guard, BPDU Guard, and so on are not misconfigured.
  Overloaded switch CPU during convergence that exceeds the recommended logical port (port-vlan) limit.
  Note: The recommended scalability limits are system wide and not per VDC.
  Software defects that affect STP.

Securing the Network Against Forwarding Loops


To handle the inability of STP to deal correctly with certain failures, Cisco has developed a number of features and enhancements to protect the networks against forwarding loops.

Troubleshooting STP helps to isolate and find the cause for a particular failure, while the implementation of these enhancements is the only way to secure the network against forwarding loops.

To protect your network against forwarding loops, follow these steps:

1. Enable the Cisco-proprietary Unidirectional Link Detection (UDLD) protocol on all the switch-to-switch links. See the UDLD section in the Cisco NX-OS Interfaces Configuration Guide.

2. Set up the Bridge Assurance feature by configuring all the switch-to-switch links as the spanning tree network port type.
   Note: You should enable the Bridge Assurance feature on both sides of the links or Cisco NX-OS will put the port in the blocked state because of a Bridge Assurance inconsistency.

3. Set up all the end-station ports as a spanning-tree edge port type. You must set up the STP edge port to limit the number of topology change (TC) notices and the subsequent flooding that can affect the performance of the network. Use this command only with ports that connect to end stations. Otherwise, an accidental topology loop can cause a data-packet loop and disrupt the device and network operation.

4. Enable the Link Aggregation Control Protocol (LACP) for port channels to avoid any port-channel misconfiguration issues. See the LACP section in the Cisco NX-OS Interfaces Configuration Guide.
   Do not disable autonegotiation on the switch-to-switch links. Autonegotiation mechanisms can convey remote fault information, which is the quickest way to detect failures at the remote side. If failures are detected at the remote side, the local side brings down the link even if the link is still receiving pulses.
   Caution! Be careful when you change STP timers. STP timers are dependent on each other and changes can impact the entire network.

5. (Optional) To prevent denial-of-service attacks, use the spanning-tree guard root command in interface configuration mode to secure the network STP perimeter with Root Guard. Root Guard and BPDU Guard allow you to secure STP against influence from the outside.

6. Use the spanning-tree bpduguard enable command to enable BPDU Guard on STP edge ports to prevent STP from being affected by unauthorized network devices (such as hubs, switches, and bridging routers) that are connected to the ports. Root Guard protects STP from outside influences. BPDU Guard shuts down the ports that are receiving any BPDUs (not only superior BPDUs).
   Note: Short-living loops are not prevented by Root Guard or BPDU Guard if two STP edge ports are connected directly or through a hub.

7. Use the vlan command to configure separate VLANs and avoid user traffic on the management VLAN. Keep the management VLAN contained within a building block rather than spanning the entire network.

8. Use the spanning-tree vlan vlan-range root primary command to configure a predictable STP root.

9. Use the spanning-tree vlan vlan-range root secondary command to configure a predictable backup STP root placement. You must configure the STP root and backup STP root so that convergence occurs in a predictable way and builds optimal topology in every scenario. Do not leave the STP priority at the default value.
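The advice not to leave the STP priority at the default follows from how the root is chosen: the bridge with the numerically lowest bridge ID (priority first, then MAC address) wins. With per-VLAN spanning tree, the advertised priority is the configured priority plus the VLAN ID, which is why the earlier show spanning-tree vlan 9 detail output reports priority 32768 with sysid 9 for the local bridge and 32777 for the root. The sketch below illustrates the comparison; the function names are hypothetical, and the 4096 priority is an arbitrary value chosen for the example.

```python
def bridge_priority(configured_priority: int, vlan_id: int) -> int:
    """Advertised priority: configured priority plus the VLAN ID (sysid)."""
    return configured_priority + vlan_id


def elect_root(bridges):
    """Pick the bridge with the lowest (priority, MAC address) pair.

    bridges is a list of (priority, mac) tuples. MAC addresses are expected
    in one consistent format (for example xxxx.xxxx.xxxx) so that comparing
    the lowercase strings matches comparing the numeric values.
    """
    return min(bridges, key=lambda bridge: (bridge[0], bridge[1].lower()))


if __name__ == "__main__":
    vlan = 9
    # Both bridges at the same priority: the lower MAC address decides.
    defaults = [
        (bridge_priority(32768, vlan), "0018.bad8.27ad"),
        (bridge_priority(32768, vlan), "0018.bad7.db15"),
    ]
    print(elect_root(defaults))

    # Lowering the priority on the intended root makes the result predictable.
    tuned = [
        (bridge_priority(4096, vlan), "0018.bad8.27ad"),
        (bridge_priority(32768, vlan), "0018.bad7.db15"),
    ]
    print(elect_root(tuned))
```

With equal priorities, the root is decided by whichever device happens to have the lowest MAC address, which is rarely the device you would choose by design.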

See Also
Cisco NX-OS/IOS STP Comparison

Further Reading
The following links contain further information on this topic from Cisco.com: Cisco Nexus 7000 Series NX-OS Layer 2 Switching Configuration Guide

External Links
External links contain content developed by external authors. Cisco does not review this content for accuracy.

This article describes troubleshooting procedures for routing.

Contents
1 Information about Troubleshooting Routing Issues
2 Initial Troubleshooting Checklist
3 Troubleshooting Routing
4 See Also
5 Further Reading
6 External Links

Information about Troubleshooting Routing Issues


Layer 3 routing involves determining optimal routing paths and packet switching. You can use routing algorithms to calculate the optimal path from the router to a destination. This calculation depends on the algorithm selected, route metrics, and other considerations such as load balancing and alternate path discovery.

Cisco NX-OS uses virtual device contexts (VDCs) to provide a separate management domain and software fault isolation for each VDC. Each VDC supports multiple Virtual Routing and Forwarding instances (VRFs) and multiple routing information bases (RIBs) to support multiple address domains. Each VRF is associated with a RIB, and this information is collected by the Forwarding Information Base (FIB).

See the Cisco NX-OS Unicast Routing Configuration Guide and the Cisco NX-OS Multicast Routing Configuration Guide for more information on routing.


Initial Troubleshooting Checklist


Begin troubleshooting routing issues by checking the following issues first:

Checklist (check off each item):
  Verify that the routing protocol is enabled.
  Verify that the address family is configured if necessary.
  Verify that you have configured the correct VRF for your routing protocol.

Use the following commands to display routing information:
  show ip arp
  show ip traffic
  show tcp statistics udp4
  show ip client
  show tcp client
  show ip fib
  show ip process
  show ip route
  show pktmgr interface
  show frame traffic
  show platform fib
  show platform forwarding
  show platform ip
  show vrf
  show vrf interface

Troubleshooting Routing
To troubleshoot basic routing issues, follow these steps:

1. Verify that the routing protocol is enabled.

    switch(config)# show ospf
                         ^
    % invalid command detected at '^' marker.

If the feature is not enabled, Cisco NX-OS reports that the command is invalid. Use the feature command to enable the routing protocol.

2. Verify the configuration for this routing protocol.

    switch# show running-config eigrp all
    version 4.0(1)
    feature eigrp
    router eigrp 99
      log-neighbor-warnings
      log-neighbor-changes
      log-adjacency-changes
      graceful-restart
      nsf
      timers nsf signal 20
      distance 90 170
      metric weights 0 1 0 1 0 0
      metric maximum-hops 100
      default-metric 100000 100 255 1 1500
      maximum-paths 16
      address-family ipv4 unicast
        log-neighbor-warnings
        log-neighbor-changes
        log-adjacency-changes
        graceful-restart
        router-id 192.0.2.1
        nsf
        timers nsf signal 20
        distance 90 170
        metric weights 0 1 0 1 0 0
        metric maximum-hops 100
        default-metric 100000 100 255 1 1500
        maximum-paths 16

3. Verify the VRF configuration for this routing protocol.

    switch# show running-config eigrp
    version 4.0(1)
    feature eigrp
    router eigrp 99
      address-family ipv4 unicast
        router-id 192.0.2.1
      vrf red
        stub

4. Check the memory utilization for this routing protocol.

    switch# show processes memory | include isis
    8913   9293824  bffff1d0/bffff0d0  isis
    32243  8609792  bfffe0c0/bfffdfc0  isis

5. Verify that the routing protocol is receiving packets.

    switch# show ip client pim
    Client: pim, uuid: 284, pid: 3839, extended pid: 3839
      Protocol: 103, client-index: 10, routing VRF id: 255
      Data MTS-SAP: 1519
      Data messages, send successful: 2135, failed: 0

6. Verify that the routing protocol is enabled on an interface.

    switch# show ip interface loopback0
    loopback0, Interface status: protocol-up/link-up/admin-up, iod: 36,
      Context:"default"
      IP address: 1.0.0.1, IP subnet: 1.0.0.0/24
    ...
      IP multicast groups locally joined:
         224.0.0.2 224.0.0.1 224.0.0.13
    ...

7. Verify that the interface is in the correct VRF.

    switch(config)# show vrf interface loopback 99
    Interface                 VRF-Name                        VRF-ID
    loopback99                default                              1

8. Verify that the routing protocol is registered with the RIB.

    switch(config)# show routing unicast clients
    CLIENT: am
     index mask: 0x00000002
     epid: 3908 MTS SAP: 252 MRU cache hits/misses: 2/1
     Routing Instances:
      VRF: management table: base
     Messages received:
      Register : 1  Add-route : 2  Delete-route : 1
     Messages sent:
      Add-route-ack : 2  Delete-route-ack : 1
    CLIENT: rpm
     index mask: 0x00000004
     epid: 4132 MTS SAP: 348 MRU cache hits/misses: 0/0
     Messages received:
      Register : 1
     Messages sent:
    ...
    CLIENT: eigrp-99
     index mask: 0x00002000
     epid: 3148 MTS SAP: 63775 MRU cache hits/misses: 0/1
     Routing Instances:
      VRF: default table: base notifiers: self
     Messages received:
      Register : 1  Delete-all-routes : 1
     Messages sent:
    ...

9. Verify that the RIB is interacting with the forwarding plane.

    switch# show forwarding distribution multicast client
    Number of Clients Registered: 3
    Client-name    Client-id    Shared Memory Name
    igmp           1            N/A
    mrib           2            /procket/shm/mrib-mfdm
    m6rib          3            /procket/shm/m6rib-mfdm
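The VRF check in step 7 is easy to automate when the show vrf interface output has been captured as text. The sketch below assumes a three-column layout (interface, VRF name, VRF ID) separated by whitespace, as in the loopback99 example above; the function name is hypothetical.

```python
def interface_vrf(show_vrf_interface_output: str, interface: str):
    """Return the VRF name for interface, or None if it is not listed.

    Assumes each data row has exactly three whitespace-separated fields:
    interface name, VRF name, VRF ID.
    """
    for line in show_vrf_interface_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "Interface":
            continue  # skip the header row and anything that is not a data row
        if fields[0].lower() == interface.lower():
            return fields[1]
    return None


if __name__ == "__main__":
    sample = (
        "Interface                 VRF-Name                        VRF-ID\n"
        "loopback99                default                              1\n"
    )
    expected_vrf = "red"  # hypothetical VRF the protocol was configured for
    actual_vrf = interface_vrf(sample, "loopback99")
    if actual_vrf != expected_vrf:
        print(f"loopback99 is in VRF {actual_vrf}, expected {expected_vrf}")
```

A mismatch between the VRF reported here and the VRF configured under the routing protocol is one reason a protocol can be enabled and still exchange no routes on that interface.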

See Also
Cisco NX-OS/IOS BGP (Basic) Comparison
Cisco NX-OS/IOS BGP (Advanced) Comparison
Cisco NX-OS/IOS EIGRP Comparison
Cisco NX-OS/IOS HSRP Comparison
Cisco NX-OS/IOS Layer-3 Virtualization Comparison
Cisco NX-OS/IOS OSPF Comparison


Further Reading
The following links contain further information on this topic from Cisco.com: Cisco Nexus 7000 Series NX-OS Unicast Routing Configuration Guide

External Links
External links contain content developed by external authors. Cisco does not review this content for accuracy. Packetlife Routing Cheatsheets

This article provides only basic information on how to troubleshoot unicast packet flow issues for the M1 Series modules. Troubleshooting Layer 2 and Layer 3 unicast is covered in detail in a Cisco Live presentation. Sections of this presentation cover both platform-independent and platform-specific step-by-step troubleshooting for unicast, among other topics. Access to this presentation is free. Follow these instructions to access the presentation:

1. Visit https://www.ciscolivevirtual.com/
2. Register for free.
3. Click the "Cisco Live Virtual" link.
4. Click the "Sessions" tab at the top, and select "2011 Sessions Catalog".
5. In the search box, type "BRKCRS-3144" and submit the search.
6. Select the session. You can either view the session or download the PDF.
7. Unicast troubleshooting is covered in slides 81 through 93.


Contents
1 Packet is Received into Interface from Wire
2 Linksec Decryption Occurs, 1st stage Port QoS
3 Second Stage Port QoS Occurs
4 Layer 2 Source/Destination MAC Processing
5 Layer 3 Engine Processing
  5.1 Layer 3 Engine Processes Layer 3 Features
  5.2 Layer 3 forwarding for Routed Traffic
6 SFabric Processing Occurs (optional)
7 Layer 2 Engine Performs Source/Destination MAC Processing
8 Egress Port QoS is Performed
9 Linksec Encryption Occurs
10 Packet is Transmitted

Packet is Received into Interface from Wire


During this step, the packet is received into the Nexus 7000 port. When troubleshooting this step, verify transceiver interoperability and check for errors on the interface. Use the following commands:
  show interface interface
  show interface interface transceiver

    PHX2-N7K-1# show interface e1/1
    Ethernet1/1 is up
      Hardware: 10000 Ethernet, address: 0024.986c.00b0 (bia 0024.986c.00b0)
      Description: N7K-vdc-1 connecting to core 6506
      MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
         reliability 255/255, txload 1/255, rxload 1/255
      Encapsulation ARPA
      Port mode is trunk
      full-duplex, 10 Gb/s, media type is 10g
      Beacon is turned off
      Auto-Negotiation is turned off
      Input flow-control is off, output flow-control is off
      Rate mode is shared
      Switchport monitor is off
      Last link flapped 7week(s) 4day(s)
      Last clearing of "show interface" counters never
      1 minute input rate 13056 bits/sec, 9 packets/sec
      1 minute output rate 4608 bits/sec, 0 packets/sec
      Rx
        341190251 input packets 276211313 unicast packets 52112947 multicast packets
        12865991 broadcast packets 0 jumbo packets 0 storm suppression packets
        94295027129 bytes
      Tx
        462437316 output packets 85121 multicast packets
        188251 broadcast packets 0 jumbo packets
        648159081064 bytes
        0 input error 0 short frame 0 watchdog
        0 no buffer 0 runt 0 CRC 0 ecc
        0 overrun 0 underrun 0 ignored 0 bad etype drop
        0 bad proto drop 0 if down drop 0 input with dribble
        0 input discard
        0 output error 0 collision 0 deferred
        0 late collision 0 lost carrier 0 no carrier
        0 babble
        0 Rx pause 0 Tx pause
      1 interface resets

    PHX2-N7K-1# show interface e1/1 transceiver details
    Ethernet1/1
        sfp is present
        name is CISCO-AVAGO    <<< If this says type is (unknown), it is not supported.
        part number is SFBR-7700SDZ
        revision is B4
        serial number is AGD12434116
        nominal bitrate is 10300 MBits/sec
        Link length supported for 50/125um fiber is 82 m(s)
        Link length supported for 62.5/125um fiber is 26 m(s)
        cisco id is --
        cisco extended id number is 4

               SFP Detail Diagnostics Information (internal calibration)
      ----------------------------------------------------------------------------
                                    Alarms                  Warnings
                                High        Low         High          Low
      ----------------------------------------------------------------------------
      Temperature   45.46 C     75.00 C     -5.00 C     70.00 C        0.00 C
      Voltage        3.28 V      3.63 V      2.97 V      3.46 V        3.13 V
      Current        6.92 mA    10.50 mA     2.50 mA    10.50 mA       2.50 mA
      Tx Power      -2.75 dBm    1.69 dBm  -11.30 dBm   -1.30 dBm     -7.30 dBm
      Rx Power      -2.43 dBm    1.99 dBm  -13.97 dBm   -1.00 dBm     -9.91 dBm
      Transmit Fault Count = 0
      ----------------------------------------------------------------------------
      Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
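The note at the end of the diagnostics table explains the flags that mark a reading as out of range. As a worked illustration, the sketch below applies the same idea to one reading and its four thresholds, taken in the column order of the table (alarm high, alarm low, warning high, warning low). The function name is hypothetical, and treating a reading exactly equal to a threshold as out of range is an assumption made for this example.

```python
def dom_flag(value: float, high_alarm: float, low_alarm: float,
             high_warning: float, low_warning: float) -> str:
    """Return '++', '+', '--', '-', or '' for one diagnostics reading.

    Alarm thresholds are checked before warning thresholds because an
    alarm condition also exceeds the corresponding warning threshold.
    Assumption: a reading equal to a threshold counts as crossing it.
    """
    if value >= high_alarm:
        return "++"
    if value <= low_alarm:
        return "--"
    if value >= high_warning:
        return "+"
    if value <= low_warning:
        return "-"
    return ""


if __name__ == "__main__":
    # Rx Power row from the output above: reading and thresholds in dBm.
    print(repr(dom_flag(-2.43, 1.99, -13.97, -1.00, -9.91)))
    # Hypothetical weak receive signal, invented for illustration.
    print(repr(dom_flag(-12.0, 1.99, -13.97, -1.00, -9.91)))
```

A low-warning or low-alarm flag on Rx Power usually directs attention to the fiber path or the remote transmitter rather than to the local port.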

Linksec Decryption Occurs, 1st stage Port QoS


In step 2, Linksec decryption occurs, along with receive-side stage 1 QoS. It is worth stepping back to explain the difference between stage 1 and stage 2 QoS. On the 10G modules, a port can be configured in either shared mode or dedicated mode: 10G of bandwidth is either dedicated to a single port or shared among a group of ports (4 ports on the M132 module). In shared mode, the ports can contend for the 10G of bandwidth behind the 4:1 mux that aggregates them. To alleviate this, some QoS intelligence is implemented in the 4:1 mux. In dedicated mode, no QoS is applied at the mux; all traffic is processed by stage 2 QoS. To summarize: in shared mode, stage 1 QoS ensures fair access to the 10G of port bandwidth, and in both shared and dedicated mode, stage 2 QoS provides ingress queuing to the system. For ingress QoS, we are concerned with the receive-side QoS parameters in the show queuing command.
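The contention condition for shared mode can be expressed as a small Python sketch. The function name and the per-port offered rates are hypothetical inputs; the only facts taken from the text are the 10G of bandwidth per port group and the absence of first-stage contention in dedicated mode.

```python
# Bandwidth available behind one 4:1 mux, as described in the text.
PORT_GROUP_CAPACITY_GBPS = 10.0


def mux_contention(offered_gbps, rate_mode):
    """Return True if the ports of one group can contend at the 4:1 mux.

    offered_gbps: offered ingress rate of each port in the group (hypothetical input)
    rate_mode:    "shared" or "dedicated", as reported by "Rate mode is ..."
    """
    if rate_mode == "dedicated":
        # A single port owns the full 10G, so stage 1 QoS is not involved.
        return False
    return sum(offered_gbps) > PORT_GROUP_CAPACITY_GBPS


if __name__ == "__main__":
    print(mux_contention([4.0, 4.0, 4.0, 0.0], "shared"))  # 12G offered into 10G
    print(mux_contention([2.0, 2.0, 2.0, 2.0], "shared"))  # 8G offered into 10G
```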


Use the show policy-map command to see per-queue dropped packets. The commands to troubleshoot Linksec and Port QoS are as follows:

show cts interface [all | interface]
show queuing interface interface
show policy-map interface (for per-queue drops)

Working example:

switch# show cts interface all
CTS Information for Interface Ethernet2/24:
    CTS is enabled, mode: CTS_MODE_DOT1X
    IFC state: CTS_IFC_ST_CTS_OPEN_STATE
    Authentication Status: CTS_AUTHC_SUCCESS
      Peer Identity: india1
      Peer is: CTS Capable
      802.1X role: CTS_ROLE_AUTH
      Last Re-Authentication:
    Authorization Status: CTS_AUTHZ_SUCCESS
      PEER SGT: 2
      Peer SGT assignment: Trusted
      Global policy fallback access list:
    SAP Status: CTS_SAP_SUCCESS
      Configured pairwise ciphers: GCM_ENCRYPT
      Replay protection: Enabled
      Replay protection mode: Strict
      Selected cipher: GCM_ENCRYPT
      Current receive SPI: sci:1b54c1fbff0000 an:0
      Current transmit SPI: sci:1b54c1fc000000 an:0

Broken example:

PHX2-N7K-1# show cts interface eth 1/8
CTS Information for Interface Ethernet1/8:
    CTS is enabled, mode: CTS_MODE_MANUAL
    IFC state: Unknown
    Authentication Status: CTS_AUTHC_INIT
      Peer Identity:
      Peer is: Not CTS Capable
      802.1X role: CTS_ROLE_UNKNOWN
      Last Re-Authentication:
    Authorization Status: CTS_AUTHZ_INIT
      PEER SGT: 0
      Peer SGT assignment: Not Trusted
    SAP Status: CTS_SAP_INIT
      Configured pairwise ciphers:
      Replay protection:
      Replay protection mode:
      Selected cipher:
      Current receive SPI:
      Current transmit SPI:
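The difference between the working and the broken example comes down to three status fields. A minimal Python sketch of that comparison follows; the function name is hypothetical, and the expected values are simply taken from the working example, so other values may be normal in some modes (for example, manual mode).

```python
import re

# Status values seen in the working example; treated here as the "good" state.
EXPECTED = {
    "Authentication Status": "CTS_AUTHC_SUCCESS",
    "Authorization Status": "CTS_AUTHZ_SUCCESS",
    "SAP Status": "CTS_SAP_SUCCESS",
}

# Abbreviated copies of the two examples (only the lines the check needs).
SAMPLE = """\
CTS Information for Interface Ethernet2/24:
    CTS is enabled, mode: CTS_MODE_DOT1X
    Authentication Status: CTS_AUTHC_SUCCESS
    Authorization Status: CTS_AUTHZ_SUCCESS
    SAP Status: CTS_SAP_SUCCESS
CTS Information for Interface Ethernet1/8:
    CTS is enabled, mode: CTS_MODE_MANUAL
    Authentication Status: CTS_AUTHC_INIT
    Authorization Status: CTS_AUTHZ_INIT
    SAP Status: CTS_SAP_INIT
"""


def cts_problems(output):
    """Map each interface to the status fields that differ from EXPECTED."""
    problems = {}
    current = None
    for line in output.splitlines():
        header = re.match(r"\s*CTS Information for Interface (\S+?):", line)
        if header:
            current = header.group(1)
            problems[current] = []
            continue
        if current is None or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip(), value.strip()
        if field in EXPECTED and value != EXPECTED[field]:
            problems[current].append(f"{field}: {value}")
    return problems


if __name__ == "__main__":
    for interface, issues in cts_problems(SAMPLE).items():
        print(interface, "OK" if not issues else issues)
```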

PHX2-N7K-1# show queuing int eth 1/1


Interface Ethernet1/1 TX Queuing strategy: Weighted Round-Robin
  Port QoS is enabled
  Queuing Mode in TX direction: mode-cos
  Transmit queues [type = 1p7q4t]
  Queue Id                       Scheduling   Num of thresholds
  _____________________________________________________________
  1p7q4t-out-q-default           WRR          04
  1p7q4t-out-q2                  WRR          04
  1p7q4t-out-q3                  WRR          04
  1p7q4t-out-q4                  WRR          04
  1p7q4t-out-q5                  WRR          04
  1p7q4t-out-q6                  WRR          04
  1p7q4t-out-q7                  WRR          04
  1p7q4t-out-pq1                 Priority     04
  Configured WRR
    WRR bandwidth ratios: 25[1p7q4t-out-q-default] 15[1p7q4t-out-q2] 12[1p7q4t-out-q3] 12[1p7q4t-out-q4] 12[1p7q4t-out-q5] 12[1p7q4t-out-q6] 12[1p7q4t-out-q7]
  WRR configuration read from HW
    WRR bandwidth ratios: 25[1p7q4t-out-q-default] 15[1p7q4t-out-q2] 11[1p7q4t-out-q3] 11[1p7q4t-out-q4] 11[1p7q4t-out-q5] 11[1p7q4t-out-q6] 11[1p7q4t-out-q7]
  Configured queue-limit ratios
    queue-limit ratios: 78[1p7q4t-out-q-default] 1[1p7q4t-out-q2] 1[1p7q4t-out-q3] *1[1p7q4t-out-q4] *1[1p7q4t-out-q5] *1[1p7q4t-out-q6] *1[1p7q4t-out-q7] 16[1p7q4t-out-pq1]
      * means unused queue with mandatory minimum queue-limit
  queue-limit ratios configuration read from HW
    queue-limit ratios: 78[1p7q4t-out-q-default] 1[1p7q4t-out-q2] 1[1p7q4t-out-q3] *1[1p7q4t-out-q4] *1[1p7q4t-out-q5] *1[1p7q4t-out-q6] *1[1p7q4t-out-q7] 16[1p7q4t-out-pq1]
      * means unused queue with mandatory minimum queue-limit
  Thresholds:
  COS   Queue                  Threshold   Type   Min   Max
  __________________________________________________________________
  0     1p7q4t-out-q-default               DT     100   100
  1     1p7q4t-out-q-default               DT     100   100
  2     1p7q4t-out-q-default               DT     100   100
  3     1p7q4t-out-q-default               DT     100   100
  4     1p7q4t-out-q-default               DT     100   100
  5     1p7q4t-out-pq1                     DT     100   100
  6     1p7q4t-out-pq1                     DT     100   100
  7     1p7q4t-out-pq1                     DT     100   100

Interface Ethernet1/1 RX Queuing strategy: Weighted Round-Robin
  Queuing Mode in RX direction: mode-cos
  Receive queues [type = 8q2t]
  Port Cos not configured
  Queue Id                       Scheduling   Num of thresholds
  ____________________________________________________________
  8q2t-in-q-default              WRR          02
  8q2t-in-q2                     WRR          02
  8q2t-in-q3                     WRR          02
  8q2t-in-q4                     WRR          02
  8q2t-in-q5                     WRR          02
  8q2t-in-q6                     WRR          02
  8q2t-in-q7                     WRR          02
  8q2t-in-q1                     WRR          02
  Configured WRR
    WRR bandwidth ratios: 20[8q2t-in-q-default] 0[8q2t-in-q2] 0[8q2t-in-q3] 0[8q2t-in-q4] 0[8q2t-in-q5] 0[8q2t-in-q6] 0[8q2t-in-q7] 80[8q2t-in-q1]
  WRR configuration read from HW
    WRR bandwidth ratios: 20[8q2t-in-q-default] 0[8q2t-in-q2] 0[8q2t-in-q3] 0[8q2t-in-q4] 0[8q2t-in-q5] 0[8q2t-in-q6] 0[8q2t-in-q7] 80[8q2t-in-q1]
  No queue-limit ratios user configuration
  ________________________________________
  queue-limit ratios configuration read from HW
    queue-limit ratios: 100[8q2t-in-q-default] 100[8q2t-in-q2] 100[8q2t-in-q3] 100[8q2t-in-q4] 100[8q2t-in-q5] 100[8q2t-in-q6] 100[8q2t-in-q7] 100[8q2t-in-q1]
  Thresholds:
  COS   Queue                  Threshold   Type   Min   Max
  __________________________________________________________________
  0     8q2t-in-q-default                  DT     100   100
  1     8q2t-in-q-default                  DT     100   100
  2     8q2t-in-q-default                  DT     100   100
  3     8q2t-in-q-default                  DT     100   100
  4     8q2t-in-q-default                  DT     100   100
  5     8q2t-in-q1                         DT     100   100
  6     8q2t-in-q1                         DT     100   100
  7     8q2t-in-q1                         DT     100   100

PHX2-N7K-1# show policy-map interface eth 1/2


Global statistics status :   enabled

Ethernet1/2

  Service-policy (queuing) input:   default-in-policy
    policy statistics status:   enabled

    Class-map (queuing):   in-q1 (match-any)
      queue-limit percent 50
      bandwidth percent 80
      queue dropped pkts : 0

    Class-map (queuing):   in-q-default (match-any)
      queue-limit percent 50
      bandwidth percent 20
      queue dropped pkts : 0

  Service-policy (queuing) output:   default-out-policy
    policy statistics status:   enabled

    Class-map (queuing):   out-pq1 (match-any)
      priority level 1
      queue-limit percent 16
      queue dropped pkts : 0

    Class-map (queuing):   out-q2 (match-any)
      queue-limit percent 1
      queue dropped pkts : 0

    Class-map (queuing):   out-q3 (match-any)
      queue-limit percent 1
      queue dropped pkts : 0

    Class-map (queuing):   out-q-default (match-any)
      queue-limit percent 82
      bandwidth remaining percent 25
      queue dropped pkts : 0
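When many interfaces need to be checked, the per-queue drop counters can be pulled out of the show policy-map interface text with a short script. The Python sketch below assumes the line layout shown in this guide; the function name is hypothetical, and the non-zero drop count in the sample is invented for illustration.

```python
import re

# Abbreviated sample in the layout of "show policy-map interface".
# The value 125 is hypothetical; the output in this guide shows 0 everywhere.
SAMPLE = """\
Service-policy (queuing) input: default-in-policy
  Class-map (queuing): in-q1 (match-any)
    queue dropped pkts : 0
  Class-map (queuing): in-q-default (match-any)
    queue dropped pkts : 0
Service-policy (queuing) output: default-out-policy
  Class-map (queuing): out-pq1 (match-any)
    queue dropped pkts : 0
  Class-map (queuing): out-q-default (match-any)
    queue dropped pkts : 125
"""


def queue_drops(output):
    """Return {(direction, class name): dropped packets}."""
    drops = {}
    direction = None
    class_name = None
    for line in output.splitlines():
        match = re.search(r"Service-policy \(queuing\) (input|output):", line)
        if match:
            direction = match.group(1)
            continue
        match = re.search(r"Class-map \(queuing\):\s+(\S+)", line)
        if match:
            class_name = match.group(1)
            continue
        match = re.search(r"queue dropped pkts\s*:\s*(\d+)", line)
        if match and class_name is not None:
            drops[(direction, class_name)] = int(match.group(1))
    return drops


if __name__ == "__main__":
    for key, count in queue_drops(SAMPLE).items():
        if count:
            print(key, count)
```

For ingress troubleshooting, the entries with direction "input" are the ones of interest; the "output" entries matter at the egress port QoS step.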

Second Stage Port QoS Occurs


For ingress QoS, we are concerned with the receive-side QoS parameters in the show queuing command. Use the show policy-map command to view queue drops. The commands to troubleshoot Port QoS are as follows:

show queuing interface interface
show policy-map interface

Layer 2 Source/Destination MAC Processing


In this step, the ASIC submits the packet headers to the Layer 2 engine for lookup, and the Layer 2 engine performs source/destination MAC processing. To validate Layer 2 forwarding, first look at the centralized MAC address table aggregated on the supervisor and confirm that the MAC addresses are present as expected and assigned to the ports where you expect them to reside. Then validate the hardware programming on the ingress linecard to confirm that the MAC address table is properly programmed into the hardware-based Layer 2 engine on the linecard. The commands used to accomplish this are as follows:

show mac address-table
show hardware mac address-table module interface interface

To drill down on a specific MAC address, use grep with these commands to confirm that the MAC address is associated with a particular port and that the hardware programming reflects that:

show mac address-table | grep macaddress
show hardware mac address-table module interface interface | grep macaddress

Note: When evaluating the hardware MAC table, if the Index is set to 0x00400 or the GM bit is set to 1, that traffic will be routed. For example, you will see the Index set to 0x00400 and the GM bit set to 1 for traffic destined to the MAC address local to the device.

PHX2-N7K-1# show mac address-table
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC
        age - seconds since last seen, + - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY   Ports
---------+-----------------+--------+---------+------+------+----------------
G           0023.ac67.dd41    static             False  False  sup-eth1(R)
G 5         0023.ac67.dd41    static             False  False  sup-eth1(R)
* 5         0000.0c07.ac01    dynamic   0        False  False  Eth1/1
* 5         000c.2943.a67e    dynamic   180      False  False  Eth1/1
* 5         000c.294b.c5ca    dynamic   0        False  False  Eth1/1
* 5         000d.ece2.0640    dynamic   180      False  False  Eth1/1
* 5         0013.5f32.aa80    dynamic   0        False  False  Eth1/1
* 5         0018.8b45.41b7    dynamic   0        False  False  Eth1/1
* 5         0019.bb2f.4871    dynamic   0        False  False  Eth1/1
* 5         0019.bbe5.f3b8    dynamic   1230     False  False  Eth1/1
* 5         001a.4b33.ccdc    dynamic   1080     False  False  Eth1/1
* 5         001a.4ba8.6a9c    dynamic   1680     False  False  Eth1/1
* 5         001b.210a.87f9    dynamic   600      False  False  Eth1/1
* 5         001b.d46f.70e0    dynamic   60       False  False  Eth1/1
* 5         001c.c4e5.ac9a    dynamic   150      False  False  Eth1/1
* 5         0023.ac64.6f7c    dynamic   1230     False  False  Eth1/1
* 5         0024.986d.21c8    dynamic   270      False  False  Eth1/1

PHX2-N7K-1# show hardware mac address-table 1 int eth1/1
Valid| PI | BD    | MAC           | Index  | Stat| SW | Modi| Age | Tmr | GM | Sec| TR | NT | RM | RMA | Cap | Fld | Always
     |    |       |               |        | ic  |    | fied| Byte| Sel |    | ure| AP | FY |    |     | TURE|     | Learn
-----+----+-------+---------------+--------+-----+----+-----+-----+-----+----+----+----+----+----+-----+-----+-----+-------
1      1    2       000c.294b.c5ca  0x00422  0     3    0     67    1     0    0    0    0    0    0     0     0     0
1      1    2       0050.567e.58e6  0x00422  0     3    0     68    1     0    0    0    0    0    0     0     0     0
1      1    2       0050.56aa.6067  0x00422  0     3    0     67    1     0    0    0    0    0    0     0     0     0
1      1    2       00c0.b72e.cfa0  0x00422  0     3    0     67    1     0    0    0    0    0    0     0     0     0
1      1    2       0018.8b45.41b7  0x00422  0     3    0     68    1     0    0    0    0    0    0     0     0     0
1      1    2       0013.5f32.aa80  0x00422  0     3    0     68    1     0    0    0    0    0    0     0     0     0
1      1    2       0050.56aa.75ca  0x00422  0     3    0     64    1     0    0    0    0    0    0     0     0     0
1      1    2       00a0.9811.a233  0x00422  0     3    0     39    1     0    0    0    0    0    0     0     0     0

PHX2-N7K-1# show mac address-table | grep 000c.294b.c5ca


* 5 000c.294b.c5ca dynamic 150 False False Eth1/1

PHX2-N7K-1# show hardware mac address-table 1 int eth 1/1 | grep 000c.294b.c5ca
1 0 0 1 0 2 0 000c.294b.c5ca 0x00422 0 0 0 0 0 3 0 67 1 0
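The note about Index 0x00400 and the GM bit can be turned into a quick check. The Python sketch below is an illustration under stated assumptions: it expects the 19-column layout of the full hardware table shown above (MAC in the fourth column, Index in the fifth, GM in the eleventh). The grep output in this guide uses a different column order, so the positions must be adjusted to whatever header your release prints. The function name and the second sample row are hypothetical.

```python
import re

MAC_PATTERN = re.compile(r"^[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}$")

# First row is copied from the hardware table in this guide. The second row is
# hypothetical: it shows how an entry for the switch's own MAC might look.
SAMPLE = """\
1 1 2 000c.294b.c5ca 0x00422 0 3 0 67 1 0 0 0 0 0 0 0 0 0
1 1 2 0023.ac67.dd41 0x00400 1 3 0 0 1 1 0 0 0 0 0 0 0 0
"""

ROUTED_INDEX = 0x00400


def routed_entries(rows):
    """Return the MAC addresses whose traffic will be routed.

    Column positions (assumed): 3 = MAC, 4 = Index, 10 = GM.
    """
    routed = []
    for row in rows.splitlines():
        fields = row.split()
        if len(fields) != 19 or not MAC_PATTERN.match(fields[3]):
            continue  # not a data row in the assumed layout
        mac, index, gm = fields[3], fields[4], fields[10]
        if int(index, 16) == ROUTED_INDEX or gm == "1":
            routed.append(mac)
    return routed


if __name__ == "__main__":
    print(routed_entries(SAMPLE))
```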

Layer 3 Engine Processing


After the Layer 2 engine is finished, it sends the header to the Layer 3 engine. The Layer 3 engine applies Layer 3 features to all packets and Layer 3 forwarding to routed packets. This section therefore divides the troubleshooting into two parts: the Layer 3 features applied to all packets, and Layer 3 forwarding. The Layer 3 features applied to all packets are:

1) ACL
2) QoS
3) Netflow
4) Hardware IPS

After evaluating these features, we will troubleshoot Layer 3 forwarding.

Layer 3 Engine Processes Layer 3 Features


We'll drill through the Layer 3 features one at a time. The first feature is ACL. To troubleshoot an ACL, evaluate the configuration and any relevant hit counters, then verify that the hardware on the linecard has programmed the ACL. Note that to see per-entry counters, you must enable "statistics per-entry" in the ACL.

The commands to troubleshoot ACL are as follows:

show access-lists name
show hardware access-lists module module
show hardware access-lists resource-utilization module module

PHX2-N7K-1# show access-lists sample-86


IP access list sample-86
        statistics per-entry
        10 permit ospf any any [match=0]
        20 permit pim any any [match=0]
        30 permit tcp any any [match=0]
        40 permit ip any any [match=0]

PHX2-N7K-1# show hardware access-list mod 1


VLAN 86 :
=========
 No ingress policies
 No Netflow profiles in ingress direction

 Policies in egress direction:
       Policy type     Policy Id       Policy name
 ------------------------------------------------------------
       RACL            4               sample-86
 No Netflow profiles in egress direction

Tcam 1 resource usage:
----------------------
Label_b = 0x800
 Bank 0
 ------
   IPv4 Class
     Policies: RACL(sample-86)  [Merged]
     1 tcam entries

 1 l4 protocol cam entries
 0 mac etype/proto cam entries
 0 lous
 0 tcp flags table entries
 0 adjacency entries

VDC-1 Ethernet1/1 :
====================
 Policies in ingress direction:
       Policy type     Policy Id       Policy name
 ------------------------------------------------------------
       QoS             1
 No Netflow profiles in ingress direction

 Policies in egress direction:
       Policy type     Policy Id       Policy name
 ------------------------------------------------------------
       QoS             2
 No Netflow profiles in egress direction

...

VDC-1 CoPP :
====================
 Policies in ingress direction:
       Policy type     Policy Id       Policy name
 ------------------------------------------------------------
       QoS             3
 No Netflow profiles in ingress direction

Tcam 1 resource usage:
----------------------
Label_b = 0x1
 Bank 1
 ------
   IPv4 Class
     Policies: QoS()
     100 tcam entries
   IPv6 Class
     Policies: QoS()
     73 tcam entries

 3 l4 protocol cam entries
 0 mac etype/proto cam entries
 2 lous
 0 tcp flags table entries
 0 adjacency entries

 No egress policies
 No Netflow profiles in egress direction

PHX2-N7K-1# show hardware access-list resource utilization mod 1


         ACL Hardware Resource Utilization (Module 1)
         --------------------------------------------
                    Used    Free    Percent
                                    Utilization
-----------------------------------------------------
Tcam 0, Bank 0      1       16383   0.00
Tcam 0, Bank 1      2       16382   0.01
Tcam 1, Bank 0      6       16378   0.03
Tcam 1, Bank 1      176     16208   1.07
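TCAM exhaustion is a common reason for an ACL failing to program in hardware, so it can be useful to flag banks that are close to full. The Python sketch below recomputes utilization from the Used and Free columns of the output above; the function name and the 80 percent default threshold are hypothetical choices, not Cisco recommendations.

```python
import re

# Rows copied from "show hardware access-list resource utilization mod 1".
SAMPLE = """\
Tcam 0, Bank 0      1       16383   0.00
Tcam 0, Bank 1      2       16382   0.01
Tcam 1, Bank 0      6       16378   0.03
Tcam 1, Bank 1      176     16208   1.07
"""

ROW = re.compile(r"^\s*(Tcam \d+, Bank \d+)\s+(\d+)\s+(\d+)\s+\d+\.\d+\s*$")


def busy_banks(output, threshold_percent=80.0):
    """Return [(bank, percent used)] for banks at or above the threshold."""
    busy = []
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        used, free = int(match.group(2)), int(match.group(3))
        percent = 100.0 * used / (used + free)
        if percent >= threshold_percent:
            busy.append((match.group(1), round(percent, 2)))
    return busy


if __name__ == "__main__":
    print(busy_banks(SAMPLE))        # nothing is near the default threshold
    print(busy_banks(SAMPLE, 1.0))   # lower threshold to show the busiest bank
```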

The next feature is QoS. QoS can be applied on both ingress and egress, so check both directions. The command to troubleshoot QoS is as follows:

show policy-map interface interface

PHX2-N7K-1# show policy-map interface eth 1/2
Global statistics status :   enabled

Ethernet1/2

  Service-policy (queuing) input:   default-in-policy
    policy statistics status:   enabled

    Class-map (queuing):   in-q1 (match-any)
      queue-limit percent 50
      bandwidth percent 80
      queue dropped pkts : 0

    Class-map (queuing):   in-q-default (match-any)
      queue-limit percent 50
      bandwidth percent 20
      queue dropped pkts : 0

  Service-policy (qos) output:   test-police-86
    policy statistics status:   enabled

    Class-map (qos):   test-police-86 (match-all)
      Match: dscp 18
      police cir 100 mbps bc 200 ms

  Service-policy (queuing) output:   default-out-policy
    policy statistics status:   enabled

    Class-map (queuing):   out-pq1 (match-any)
      priority level 1
      queue-limit percent 16
      queue dropped pkts : 0

    Class-map (queuing):   out-q2 (match-any)
      queue-limit percent 1
      queue dropped pkts : 0

    Class-map (queuing):   out-q3 (match-any)
      queue-limit percent 1
      queue dropped pkts : 0

    Class-map (queuing):   out-q-default (match-any)
      queue-limit percent 82
      bandwidth remaining percent 25
      queue dropped pkts : 0

Netflow processing is also partly performed in hardware. Netflow statistics are collected in hardware on the linecards and can then be exported by software. The commands to troubleshoot Netflow are as follows:

show flow interface
show flow record
show flow monitor
show hardware flow ip module module

PHX2-N7K-1# show flow interface vlan 86
Interface Vlan86:
    Monitor: sample-86
    Direction: Input
    Monitor: sample-86
    Direction: Output

PHX2-N7K-1# show flow record


Flow record netflow-original:
    Description: Traditional IPv4 input NetFlow with origin ASs
    No. of users: 2
    Template ID: 256
    Fields:
        match ipv4 source address
        match ipv4 destination address
        match ip protocol
        match ip tos
        match transport source-port
        match transport destination-port
        match interface input
        match interface output
        match flow direction
        collect routing source as
        collect routing destination as
        collect routing next-hop address ipv4
        collect transport tcp flags
        collect counter bytes
        collect counter packets
        collect timestamp sys-uptime first
        collect timestamp sys-uptime last

PHX2-N7K-1# show flow monitor


Flow Monitor Solarwinds1:
    Use count: 0
    Flow Record: netflow-original
    Flow Exporter: Solarwinds
Flow Monitor sample-86:
    Use count: 2
    Flow Record: netflow-original

switch(config)# show hardware flow ip module 8


D - Direction; L4 Info - Protocol:Source Port:Destination Port
IF - Interface: ()ethernet, (S)vi, (V)lan, (P)ortchannel, (T)unnel
TCP Flags: Ack, Flush, Push, Reset, Syn, Urgent

D IF    SrcAddr         DstAddr         L4 Info         PktCnt     TCP Flags
-+-----+---------------+---------------+---------------+----------+-----------
I 8/26  007.002.000.002 007.001.000.002 000:00000:00000 0000421885 . . . . . .
I 8/25  007.001.000.002 007.002.000.002 000:00000:00000 0000421900 . . . . . .
O 8/25  007.002.000.002 007.001.000.002 000:00000:00000 0000422213 . . . . . .
O 8/26  007.001.000.002 007.002.000.002 000:00000:00000 0000422228 . . . . . .

Cisco NX-OS supports a hardware-based intrusion detection system (IDS) that performs IP packet verification. These checks catch well-known invalid traffic types that can appear during malicious activity, such as a source address that is a broadcast address or a destination address of 0.0.0.0. You can verify whether any of these checks are dropping packets.

Note: Field experience has shown that it is frequently advantageous to disable IP fragment verification. To do so, use the following command:

no hardware ip verify fragment

To check whether packets are being dropped by IP packet verification, use the following commands:

show hardware forwarding ip verify
show hardware rate-limit module module

PHX2-N7K-1# show hardware forwarding ip verify

IPv4 and v6 IDS Checks         Status     Packets Failed
-----------------------------+---------+------------------
address source broadcast       Enabled    0
address source multicast       Enabled    0
address destination zero       Enabled    0
address identical              Enabled    0
address source reserved        Enabled    8
address class-e                Disabled   --
checksum                       Enabled    0
protocol                       Enabled    0
fragment                       Enabled    0
length minimum                 Enabled    0
length consistent              Enabled    0
length maximum max-frag        Enabled    0
length maximum udp             Disabled   --
length maximum max-tcp         Enabled    0
tcp flags                      Disabled   --
tcp tiny-frag                  Enabled    0
version                        Enabled    0
-----------------------------+---------+------------------
IPv6 IDS Checks                Status     Packets Failed
-----------------------------+---------+------------------
length consistent              Enabled    0
length maximum max-frag        Enabled    0
length maximum udp             Disabled   --
length maximum max-tcp         Enabled    0
tcp tiny-frag                  Enabled    0
version                        Enabled    0
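To spot which verification checks are actually dropping packets, the non-zero counters can be filtered out of the table. The following Python sketch assumes the three-column layout shown in this guide and uses a hypothetical function name. It keys results by check name only, so feed it the IPv4 and IPv6 sections separately if both are needed.

```python
import re

# A few rows from the IPv4 section of "show hardware forwarding ip verify".
SAMPLE = """\
address source broadcast       Enabled    0
address source reserved        Enabled    8
address class-e                Disabled   --
fragment                       Enabled    0
"""

ROW = re.compile(r"^\s*(.+?)\s+(Enabled|Disabled)\s+(\d+|--)\s*$")


def failing_checks(output):
    """Return {check name: packets failed} for enabled checks with drops."""
    failures = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name, status, failed = match.groups()
        if status == "Enabled" and failed != "--" and int(failed) > 0:
            failures[name] = int(failed)
    return failures


if __name__ == "__main__":
    print(failing_checks(SAMPLE))
```

In the output shown in this guide, only the "address source reserved" check has failed packets.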

PHX2-N7K-1# show hardware rate-limit module 1


Units for Config: packets per second
Allowed, Dropped & Total: aggregated since last clear counters

Rate Limiter Class                       Parameters
------------------------------------------------------------
layer-3 mtu                              Config    : 500
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 ttl                              Config    : 500
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 control                          Config    : 10000
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 glean                            Config    : 100
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 multicast directly-connected     Config    : 3000
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 multicast local-groups           Config    : 3000
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 multicast rpf-leak               Config    : 500
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-2 storm-control                    Config    : Disabled

access-list-log                          Config    : 100
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

copy                                     Config    : 30000
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

receive                                  Config    : 30000
                                         Allowed   : 20450903
                                         Dropped   : 0
                                         Total     : 20450903

layer-2 port-security                    Config    : Disabled

layer-2 mcast-snooping                   Config    : 10000
                                         Allowed   : 906
                                         Dropped   : 0
                                         Total     : 906

Layer 3 forwarding for Routed Traffic


The Layer 3 engine performs Layer 3 forwarding only for traffic that is routed through the router, that is, traffic sent to the MAC address of a valid routed interface local to the router. To troubleshoot routed traffic, perform the following tasks:

1. Ensure that the control plane routing is correct.
2. Ensure that the hardware forwarding entries on the ingress module contain the corresponding information.

Note: All routing of traffic is performed on the forwarding engine of the ingress module.

In our example, we troubleshoot the route 86.86.87.0/24, which has a next hop of 86.86.86.1 and routes out of VLAN 86 (an SVI). First, look at the route and confirm that it points to the correct next hop (86.86.86.1) and routes out of VLAN 86. Then confirm that there is an ARP entry for this next hop and that the adjacency is in the adjacency table. As the output below shows, 86.86.87.0/24 routes to 86.86.86.1 out of VLAN 86, and this next hop is associated with MAC address 0011.aabb.ccdd. We will use this information to investigate the hardware next. The commands used to accomplish this are as follows:

show ip route (prefix)
show ip arp (nexthop)
show ip adjacency

PHX2-N7K-1# show ip route 86.86.87.0/24
IP Route Table for VRF "default"

86.86.87.0/24, 1 ucast next-hops, 0 mcast next-hops
   *via 86.86.86.1, Vlan86, [1/0], 01:19:24, static

PHX2-N7K-1# show ip arp 86.86.86.1
IP ARP Table
Total number of entries: 1
Address         Age       MAC Address     Interface
86.86.86.1                0011.aabb.ccdd  Vlan86

PHX2-N7K-1# show ip adjacency
IP Adjacency Table for VRF default
Total number of entries: 1
Address         Age       MAC Address     Pref Source     Interface
86.86.86.1      00:00:37  0011.aabb.ccdd       Static     Vlan86

The above example shows the control plane. Now that we know how things are supposed to work, we can check the hardware to confirm that the entries have propagated properly to the Layer 3 hardware engine. The IP FIB has correctly associated 86.86.87.0/24 with the next hop 86.86.86.1. The hardware entry also shows that this prefix is routed out of VLAN 86, that the RPF is valid (if RPF checking is enabled), and that the route entry is correctly associated with the MAC address 0011.aabb.ccdd. This demonstrates that the forwarding plane is programmed correctly and that forwarding will follow the information supplied by the routing protocols. The commands used to accomplish this are as follows:

show ip fib route prefix module module
show system internal forwarding route prefix detail module module

PHX2-N7K-1# sho ip fib route 86.86.87.0/24 mod 1
IPv4 routes for table default/base
------------------+------------------+---------------------
Prefix            | Next-hop         | Interface
------------------+------------------+---------------------
86.86.87.0/24       86.86.86.1         Vlan86

PHX2-N7K-1# show system internal forwarding route 86.86.87.1/24 detail mod 1


RPF Flags legend:
  S - Directly attached route (S_Star)
  V - RPF valid
  M - SMAC IP check enabled
  G - SGT valid
  E - RPF External table valid

86.86.87.0/24 , Vlan86
  Dev: 1 , Idx: 0x19001 , RPF Flags: V , DGT: 0 , VPN: 1
  RPF_Intf_5: Vlan86 (0x55 )
  AdjIdx: 0x43005, LIFB: 0 , LIF: Vlan86 (0x55 ), DI: 0x0
  DMAC: 0011.aabb.ccdd SMAC: 0023.ac67.dd41
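The comparison between control plane and hardware described in this step can be sketched as a small Python check: take the next hop from show ip route, look up its MAC address in show ip arp, and compare it with the DMAC in the hardware forwarding entry. The function names are hypothetical, and the parsing assumes the output layouts shown in this guide.

```python
import re

MAC = r"[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}"

# Abbreviated copies of the outputs shown in this step.
ROUTE = """\
86.86.87.0/24, 1 ucast next-hops, 0 mcast next-hops
   *via 86.86.86.1, Vlan86, [1/0], 01:19:24, static
"""
ARP = """\
Address         Age       MAC Address     Interface
86.86.86.1                0011.aabb.ccdd  Vlan86
"""
HARDWARE = """\
86.86.87.0/24 , Vlan86
  DMAC: 0011.aabb.ccdd SMAC: 0023.ac67.dd41
"""


def next_hop(route_text):
    """Return (next-hop address, interface) of the best path, or None."""
    match = re.search(r"\*via\s+(\d+\.\d+\.\d+\.\d+),\s*(\S+?),", route_text)
    return (match.group(1), match.group(2)) if match else None


def arp_mac(arp_text, address):
    """Return the MAC address that the ARP table holds for address, or None."""
    for line in arp_text.splitlines():
        fields = line.split()
        if fields and fields[0] == address:
            for field in fields[1:]:
                if re.fullmatch(MAC, field):
                    return field
    return None


def hardware_dmac(hardware_text):
    """Return the DMAC programmed in the hardware route entry, or None."""
    match = re.search(r"DMAC:\s*(" + MAC + r")", hardware_text)
    return match.group(1) if match else None


def is_consistent(route_text, arp_text, hardware_text):
    """True if the hardware DMAC equals the ARP entry of the route's next hop."""
    hop = next_hop(route_text)
    if hop is None:
        return False
    expected = arp_mac(arp_text, hop[0])
    return expected is not None and expected == hardware_dmac(hardware_text)


if __name__ == "__main__":
    print(next_hop(ROUTE))
    print(is_consistent(ROUTE, ARP, HARDWARE))
```

A mismatch between the two MAC addresses suggests that the hardware adjacency is stale, which is the situation this step is designed to detect.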

Fabric Processing Occurs (optional)


This step occurs only if the packet needs to traverse the fabric. In this step, we check whether the fabric modules are functioning properly and whether their utilization is at an acceptable level. View the fabric status and utilization using the following commands:

show hardware fabric-utilization
show module fabric

switch(config)# show hardware fabric-utilization
-----------------------------
Slot  Direction  Utilization
-----------------------------
2     ingress    3%
2     egress     3%
6     ingress    1%
6     egress     1%

PHX2-N7K-1# show module fabric


Xbar Ports Module-Type                      Model              Status
---- ----- -------------------------------- ------------------ ------------
1    0     Fabric Module 1                  N7K-C7010-FAB-1    ok
2    0     Fabric Module 1                  N7K-C7010-FAB-1    ok
3    0     Fabric Module 1                  N7K-C7010-FAB-1    ok

Xbar Sw             Hw
---- -------------- ------
1    NA             1.0
2    NA             1.0
3    NA             1.0

Xbar MAC-Address(es)                        Serial-Num
---- -------------------------------------- ----------
1    NA                                     JAF1252AHRB
2    NA                                     JAF1251CABF
3    NA                                     JAF1252AHBL

* this terminal session
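Whether fabric utilization is "at an acceptable level" depends on the deployment, but the check itself is easy to script against the show hardware fabric-utilization output shown earlier in this step. In the Python sketch below, the function name and the 50 percent default threshold are hypothetical.

```python
import re

# Rows copied from "show hardware fabric-utilization".
SAMPLE = """\
Slot  Direction  Utilization
2     ingress    3%
2     egress     3%
6     ingress    1%
6     egress     1%
"""

ROW = re.compile(r"^\s*(\d+)\s+(ingress|egress)\s+(\d+)%\s*$")


def busy_slots(output, threshold_percent=50):
    """Return [(slot, direction, percent)] at or above the threshold."""
    busy = []
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        slot, direction, percent = int(match.group(1)), match.group(2), int(match.group(3))
        if percent >= threshold_percent:
            busy.append((slot, direction, percent))
    return busy


if __name__ == "__main__":
    print(busy_slots(SAMPLE))       # empty with the default threshold
    print(busy_slots(SAMPLE, 2))    # lower threshold to show slot 2
```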

Layer 2 Engine Performs Source/Destination MAC Processing


To validate Layer 2 forwarding, first look at the centralized MAC address table aggregated on the supervisor and confirm that the MAC addresses are present as expected and assigned to the ports where you expect them to reside. Then validate the hardware programming on the egress module to confirm that the MAC address table is properly programmed into the hardware-based Layer 2 engine on the module. The commands used to accomplish this are as follows:

show mac address-table
show hardware mac address-table module interface interface

The output from these commands is documented in steps 3-4 above.


Egress Port QoS is Performed


As the packet nears its exit from the Cisco Nexus 7000, one of the final steps is the application of Port QoS. Port-level QoS can buffer traffic in times of congestion. To check egress QoS, use the following QoS commands, paying attention to the transmit-side QoS:

show queuing interface interface
show policy-map interface

PHX2-N7K-1# show queuing int eth 1/1
Interface Ethernet1/1 TX Queuing strategy: Weighted Round-Robin
  Port QoS is enabled
  Queuing Mode in TX direction: mode-cos
  Transmit queues [type = 1p7q4t]
  Queue Id                       Scheduling   Num of thresholds
  _____________________________________________________________
  1p7q4t-out-q-default           WRR          04
  1p7q4t-out-q2                  WRR          04
  1p7q4t-out-q3                  WRR          04
  1p7q4t-out-q4                  WRR          04
  1p7q4t-out-q5                  WRR          04
  1p7q4t-out-q6                  WRR          04
  1p7q4t-out-q7                  WRR          04
  1p7q4t-out-pq1                 Priority     04
  Configured WRR
    WRR bandwidth ratios: 25[1p7q4t-out-q-default] 15[1p7q4t-out-q2] 12[1p7q4t-out-q3] 12[1p7q4t-out-q4] 12[1p7
  WRR configuration read from HW
    WRR bandwidth ratios: 25[1p7q4t-out-q-default] 15[1p7q4t-out-q2] 11[1p7q4t-out-q3] 11[1p7q4t-out-q4] 11[1p7
  Configured queue-limit ratios
    queue-limit ratios: 78[1p7q4t-out-q-default] 1[1p7q4t-out-q2] 1[1p7q4t-out-q3] *1[1p7q4t-out-q4] *1[1p7q4t-out-q5] *1[1p7q4t-out-q6] *1[1p7q4t-out-q7] 16[1p7q4t-out-pq1]
      * means unused queue with mandatory minimum queue-limit
  queue-limit ratios configuration read from HW
    queue-limit ratios: 78[1p7q4t-out-q-default] 1[1p7q4t-out-q2] 1[1p7q4t-out-q3] *1[1p7q4t-out-q4] *1[1p7q4t-out-q5] *1[1p7q4t-out-q6] *1[1p7q4t-out-q7] 16[1p7q4t-out-pq1]
      * means unused queue with mandatory minimum queue-limit
  Thresholds:
  COS   Queue                  Threshold   Type   Min   Max
  __________________________________________________________________
  0     1p7q4t-out-q-default               DT     100   100
  1     1p7q4t-out-q-default               DT     100   100
  2     1p7q4t-out-q-default               DT     100   100
  3     1p7q4t-out-q-default               DT     100   100
  4     1p7q4t-out-q-default               DT     100   100
  5     1p7q4t-out-pq1                     DT     100   100
  6     1p7q4t-out-pq1                     DT     100   100
  7     1p7q4t-out-pq1                     DT     100   100
...

PHX2-N7K-1# show policy-map interface eth 1/2

Global statistics status :   enabled

Ethernet1/2

  Service-policy (queuing) input:   default-in-policy
    policy statistics status:   enabled

    Class-map (queuing):   in-q1 (match-any)
      queue-limit percent 50
      bandwidth percent 80
      queue dropped pkts : 0

    Class-map (queuing):   in-q-default (match-any)
      queue-limit percent 50
      bandwidth percent 20
      queue dropped pkts : 0

  Service-policy (queuing) output:   default-out-policy
    policy statistics status:   enabled

    Class-map (queuing):   out-pq1 (match-any)
      priority level 1
      queue-limit percent 16
      queue dropped pkts : 0

    Class-map (queuing):   out-q2 (match-any)
      queue-limit percent 1
      queue dropped pkts : 0

    Class-map (queuing):   out-q3 (match-any)
      queue-limit percent 1
      queue dropped pkts : 0

    Class-map (queuing):   out-q-default (match-any)
      queue-limit percent 82
      bandwidth remaining percent 25
      queue dropped pkts : 0

Linksec Encryption Occurs


The command used to troubleshoot Linksec encryption is as follows:

show cts interface {all | interface}

The output from this command is documented in step 2 above.

Packet is Transmitted
The final step in the process is the transmission of the frame out of the physical egress port. Troubleshooting the physical port is the same as in step 1 and uses the following commands:

show interface interface
show interface interface transceiver

The output from these commands is documented in step 1 above.

This article describes how to troubleshoot Web Cache Communication Protocol version 2 (WCCPv2) on Cisco NX-OS.

Contents
1 Information About Troubleshooting WCCP
2 Problem Scenarios
2.1 Reasons For Service Group Startup Failure
2.2 Client Loss
2.3 Packet Redirect Counts Not Incrementing
2.4 Potential problems

Information About Troubleshooting WCCP

The Web Cache Communication Protocol (WCCP) is a content-routing protocol that enables a Cisco NX-OS router to transparently redirect packets to cache engines. It has built-in load balancing, scaling, fault tolerance, and service-assurance (failsafe) mechanisms. WCCP version 2 (WCCPv2) is the only version supported on Cisco NX-OS devices.

See the Configuring WCCPv2 chapter in the Cisco Nexus 7000 Series NX-OS Unicast Routing Configuration Guide for more information on WCCPv2.

Problem Scenarios

Reasons For Service Group Startup Failure


A service group can fail to start for the following reasons:

The WCCP Client fails to see ISU ("I See You") messages and is stuck in the NOT Usable state.
    Confirm by enabling WCCP event debugging and looking for bad Receive ID messages. The cause is generally a connectivity problem, possibly mismatched speed or duplex settings.

The WCCP Client is requesting a capability that is not supported by the router, probably because of platform limitations.
    Confirm by enabling WCCP event debugging and looking for Capability Mismatch messages.

The WCCP Client is requesting a capability that is not supported by the service group.
    This occurs if a service group is already formed and the WCCP Client configuration does not match the existing service. This is a misconfiguration of the WCCP Client and can be confirmed by enabling WCCP event debugging and looking for Capability Mismatch messages.

HIA event messages may indicate other reasons why the router rejected an incoming "Here I Am" message.

Some WCCP clients do not adhere to the configured forward/return methods and always default to "GRE" forward/return.
    The Cisco Nexus 7000 requires L2 forward/return methods. ACNS, WAAS, and similar clients might need to be configured with the assign-method-strict option. This type of failure can be seen in packet traces: the client does not respond to the Cisco Nexus 7000 with the RXID it was sent, but keeps sending HIA messages with a receive ID of 0.

Client Loss
A WCCP Client is removed from a service group when the router loses contact with the WCCP Client.

The reasons why this might occur include:
- The service group has been disabled on the WCCP Client. Check the WCCP Client configuration.
- The service definition has been changed on the WCCP Client. Check the WCCP Client configuration.
- Physical connectivity to the WCCP Client has been lost. Verify connectivity using the ping command.
- Messages are being lost between the router and the WCCP Client, perhaps because of a heavily utilized link or a flapping interface. Enable WCCP packet debugging and confirm message exchange.

WCCP Client loss from a service group is indicated by the console message:
%WCCP-1-SERVICELOST: Service <Service ID> lost on WCCP client <IP address>

Packet Redirect Counts Not Incrementing


- The WCCP service is not present on the interface. Verify using the show running-config interface ifname command.
- The WCCP service is not active. Check the service state using the show ip wccp [web-cache | service-number] detail command. Verify using the show running-config wccp command.
- The WCCP service definition does not match the traffic to be redirected. Check the service definition using the show ip wccp service-number service command. A WCCP service mismatch is indicated by the console message:
  %WCCP-1-SERVICEMISMATCH: Service <Service ID> mismatched on WCCP client <IP address>
- Matching traffic is excluded from redirection by the redirect ACL. Monitor the redirect ACL count using the show ip wccp service-number command.
- No traffic matching the WCCP service is traversing the interface. Define an extended IP access list to get an independent count of matching traffic.

All traffic redirection is done by the platform hardware.
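When reviewing collected logs, it can help to pull out the two WCCP console messages quoted in this article (SERVICELOST and SERVICEMISMATCH). The following is a minimal Python sketch that is not part of NX-OS. It assumes the service ID is a single token and the client is an IPv4 address; the sample log lines, service IDs, and addresses are hypothetical.

```python
import re

# Pattern built from the two console message formats quoted in this article.
# Assumption: the service ID is one token and the client is an IPv4 address.
WCCP_EVENT = re.compile(
    r"%WCCP-1-(?P<event>SERVICELOST|SERVICEMISMATCH): "
    r"Service (?P<service>\S+) (?:lost|mismatched) on WCCP client "
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3})"
)

def wccp_events(log_lines):
    """Return (event, service, client) tuples for WCCP service messages."""
    events = []
    for line in log_lines:
        match = WCCP_EVENT.search(line)
        if match:
            events.append((match["event"], match["service"], match["client"]))
    return events

# Hypothetical log lines; the service IDs and addresses are made up.
sample_log = [
    "%WCCP-1-SERVICELOST: Service 61 lost on WCCP client 192.0.2.10",
    "%WCCP-1-SERVICEMISMATCH: Service 62 mismatched on WCCP client 192.0.2.11",
    "some unrelated log line",
]

for event, service, client in wccp_events(sample_log):
    print(f"{event}: service {service}, client {client}")
```

Grouping the results by client address is one way to see whether a single WCCP Client is repeatedly leaving a service group.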

Potential problems
Direct communication between WCCP client and host
The Cisco Nexus 7000 requires that the host running the browser and the WCCP clients be attached to different L3 interfaces (they cannot be in the same subnet).

Service definition mismatch
There is no mechanism on a Cisco WCCP Client to mark two services as complementary. This also appears to be true for third-party vendors. As a consequence, the two services can drift apart over time. On service startup there is usually no problem. However, as WCCP clients leave and rejoin either of the services, the assignments change independently, which means that an outgoing connection and the corresponding incoming connection may not be redirected to the same WCCP Client. If that happens, the configuration is broken. Check for this condition by comparing the assignments shown with the show ip wccp [web-cache | service number] detail command. Currently, the only way to correct the condition is to restart both services.

Asymmetric routing
As long as the incoming connection returns to any router in the complementary service group, redirection happens automatically. Note that the connection does not have to go to the exact same router as the outgoing connection, just to the same service group. In any given network there may be many routes to a particular destination, which raises the possibility that traffic returning from an origin server takes a different route from the outgoing traffic and fails to hit routers in the complementary service group. In that case the connection will not be redirected and the configuration will be broken. The only remedy is to ensure that no asymmetric routing is taking place.

This article describes how to troubleshoot memory issues that may occur when configuring and using Cisco NX-OS.

Contents
1 Overview
2 General/High Level Assessment of Platform Memory Utilization
3 Detailed Assessment of Platform Memory Utilization
  3.1 Page Cache
  3.2 Kernel
  3.3 User Processes
    3.3.1 Figuring Out Which Process is Using a Lot of Memory
    3.3.2 Figuring Out How a Specific Process is Using Memory
4 Built-in Platform Memory Monitoring
  4.1 Memory Thresholds
  4.2 Memory Alerts

Overview
Dynamic random access memory (DRAM) is a limited resource on all platforms and must be controlled and monitored to ensure that utilization is kept in check. Cisco NX-OS uses memory in the following three ways:

Page cache - When you access files from persistent storage (CompactFlash), the kernel reads the data into the page cache, which means that when you access the data in the future, you can avoid the slow access times that are associated with disk storage. Cached pages can be released by the kernel if the memory is needed by other processes. Some file systems (tmpfs) exist purely in the page cache (for example, /dev/shm, /var/sysmgr, /var/tmp), which means that there is no persistent storage of this data and that when the data is removed from the page cache, it cannot be recovered. tmpfs-cached files release page-cached pages only when they are deleted.

Kernel - The kernel needs memory to store its own text, data, and Kernel Loadable Modules (KLMs). KLMs are pieces of code that are loaded into the kernel (as opposed to being a separate user process). An example of kernel memory usage is when an inband port driver allocates memory to receive packets.

User processes - This memory is used by Cisco NX-OS/Linux processes that are not integrated in the kernel (such as text, stack, heap, and so on).

When you are troubleshooting high memory utilization, you must first determine what type of utilization is high (process, page cache, or kernel). Once you have identified the type of utilization, you can use additional troubleshooting commands to help you figure out which component is causing this behavior.

General/High Level Assessment of Platform Memory Utilization


You can assess the overall level of memory utilization on the platform by using two basic CLI commands: show system resources and show processes memory.

Note: From these command outputs, you might be able to tell that platform utilization is higher than normal/expected, but you will not be able to tell what type of memory usage is high.

The show system resources command displays platform memory statistics (not per VDC).

N7K# show system resources
Load average:   1 minute: 0.43   5 minutes: 0.30   15 minutes: 0.28
Processes   :   884 total, 1 running
CPU states  :   2.0% user,   1.5% kernel,   96.5% idle
Memory usage:   4135780K total,   3423272K used,   712508K free
                      0K buffers,  1739356K cache

Note: This output is derived from the Linux memory statistics in /proc/meminfo.
total - The amount of physical RAM on the platform
free - The amount of unused or available memory
used - The amount of allocated (permanent) and cached (temporary) memory
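As a rough illustration of how the total, used, and free fields relate, the following Python sketch parses the Memory usage line from the sample output above and computes the used percentage. It is an off-box helper written for this example, not an NX-OS feature, and the function name is hypothetical.

```python
import re

# "Memory usage" line copied from the show system resources sample above.
line = "Memory usage:   4135780K total,   3423272K used,   712508K free"

def parse_memory_usage(text):
    """Extract total/used/free (in KB) from a 'Memory usage' line."""
    return {
        name: int(value)
        for value, name in re.findall(r"(\d+)K (total|used|free)", text)
    }

mem = parse_memory_usage(line)
used_pct = 100.0 * mem["used"] / mem["total"]
print(f"used {used_pct:.1f}% of {mem['total']} KB, {mem['free']} KB free")
```

In this sample, used plus free adds up to total, which is consistent with the note that the used figure includes cached memory.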

The cache and buffers are not relevant to customer monitoring. This information provides a general representation of the platform utilization only. You need more information to troubleshoot why memory utilization is high.

The show processes memory command displays the memory allocation per process for the current VDC (the output also contains non-VDC global processes).

N7K# show processes memory
PID    MemAlloc  MemLimit    MemUsed     StackBase/Ptr      Process
-----  --------  ----------  ----------  -----------------  ----------------
4662   52756480  562929945   150167552   bfffdf00/bfffd970  netstack

While this output is more detailed, it is only useful for verifying process-level memory allocation within a specific VDC.
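To make the per-process columns easier to compare, the sketch below splits the sample netstack row into named fields and expresses MemAlloc and MemUsed as percentages of MemLimit. It is a minimal Python illustration; treating MemLimit as the reference value for both ratios is an assumption made for this example, and the function name is hypothetical.

```python
# Row copied from the show processes memory sample in this section.
row = "4662 52756480 562929945 150167552 bfffdf00/bfffd970 netstack"

def parse_row(text):
    """Split one data row into named fields (column order as in the header)."""
    pid, mem_alloc, mem_limit, mem_used, _stack, name = text.split()
    return {
        "pid": int(pid),
        "alloc": int(mem_alloc),
        "limit": int(mem_limit),
        "used": int(mem_used),
        "process": name,
    }

proc = parse_row(row)
# Assumption: MemLimit is the reference value for both ratios.
alloc_pct = 100.0 * proc["alloc"] / proc["limit"]
used_pct = 100.0 * proc["used"] / proc["limit"]
print(f"{proc['process']}: MemAlloc {alloc_pct:.1f}% and "
      f"MemUsed {used_pct:.1f}% of MemLimit")
```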

Detailed Assessment of Platform Memory Utilization

Use the show system internal kernel meminfo command or the show system internal memory-alerts-log command for a more detailed representation of memory utilization in Cisco NX-OS.

N7K# show system internal kernel meminfo
MemTotal:      4135780 kB
MemFree:        578032 kB
Buffers:          5312 kB
Cached:        1926296 kB
RAMCached:     1803020 kB
Allowed:       1033945 Pages
Free:           144508 Pages
Available:      177993 Pages
SwapCached:          0 kB
Active:        1739400 kB
Inactive:      1637756 kB
HighTotal:     3287760 kB
HighFree:          640 kB
LowTotal:       848020 kB
LowFree:        577392 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:        1903768 kB
Slab:            85392 kB
CommitLimit:   2067888 kB
Committed_AS:  3479912 kB
PageTables:      20860 kB
VmallocTotal:   131064 kB
VmallocUsed:    128216 kB
VmallocChunk:     2772 kB

In the output above, the most important fields are as follows:
MemTotal (kB) - Total amount of memory in the system (4 GB in the Cisco Nexus 7000 Series Sup1)
Cached (kB) - Amount of memory used by the page cache (includes files in tmpfs mounts and data cached from persistent storage /bootflash)
RAMCached (kB) - Amount of memory used by the page cache that cannot be released (data not backed by persistent storage)
Available (Pages) - Amount of free memory in pages (includes the space that could be made available in the page cache and free lists)
Mapped (Pages) - Memory mapped into page tables (data being used by nonkernel processes)
Slab (Pages) - Rough indication of kernel memory consumption

Note: One page of memory is equivalent to 4 kB of memory.

The show system internal kernel memory global command displays the memory usage for the page cache and kernel/process memory.
N7K# show system internal kernel memory global
Total memory in system : 4129600KB
Total Free memory      : 1345232KB
Total memory in use    : 2784368KB
Kernel/App memory      : 1759856KB
RAM FS memory          : 1018616KB
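The page-to-kB note and the totals in the sample outputs can be cross-checked with a few lines of arithmetic. The Python sketch below uses only values copied from the two samples in this section; the variable names are chosen for this example.

```python
PAGE_KB = 4  # one page is 4 kB, per the note in this section

# Values copied from the show system internal kernel meminfo sample.
meminfo_kb = {"MemTotal": 4135780, "MemFree": 578032}
meminfo_pages = {"Allowed": 1033945, "Free": 144508, "Available": 177993}

pages_as_kb = {name: pages * PAGE_KB for name, pages in meminfo_pages.items()}
for name, kb in pages_as_kb.items():
    print(f"{name}: {meminfo_pages[name]} pages = {kb} kB")

# Values copied from the show system internal kernel memory global sample.
total_kb, free_kb, in_use_kb = 4129600, 1345232, 2784368
print("in use equals total minus free:", total_kb - free_kb == in_use_kb)
```

In this particular sample, the Free and Allowed page counts convert exactly to the MemFree and MemTotal values, which is a quick way to confirm that a conversion is being done in the right units.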

Note: In Cisco NX-OS, the Linux kernel monitors the percentage of memory that is used (relative to the total RAM present), and the platform manager generates alerts as utilization passes default or configured thresholds. If an alert has occurred, it is useful to review the logs captured by the platform manager against the current utilization. Additional information about this monitoring is included later in this article.

By reviewing the output of these commands, you can determine if the utilization is high as a result of the page cache, processes holding memory, or the kernel. For more detailed information, see the following topics:
Page Cache
Kernel
User Processes

Page Cache
If Cached or RAMCached is high, you should check the file system utilization and determine what kind of files are filling the page cache. The show system internal flash command displays the file system utilization (the output is similar to df -hT included in the memory alerts log).
N7K# show system internal flash
Mount-on                  1K-blocks  Used     Available  Use%  Filesystem
/                         409600     43008    367616     11    /dev/root
/proc                     0          0        0          0     proc
/sys                      0          0        0          0     none
/isan                     409600     269312   140288     66    none
/var/tmp                  307200     876      306324     1     none
/var/sysmgr               1048576    999424   49152      96    none
/var/sysmgr/ftp           307200     24576    282624     8     none
/dev/shm                  1048576    412672   635904     40    none
/volatile                 204800     0        204800     0     none
/debug                    2048       16       2032       1     none
/dev/mqueue               0          0        0          0     none
/mnt/cfg/0                76099      5674     66496      8     /dev/hda5
/mnt/cfg/1                75605      5674     66027      8     /dev/hda6
/bootflash                1796768    629784   1075712    37    /dev/hda3
/var/sysmgr/startup-cfg   409600     27536    382064     7     none
/mnt/plog                 56192      3064     53128      6     /dev/mtdblock2
/dev/pts                  0          0        0          0     devpts
/mnt/pss                  38554      6682     29882      19    /dev/hda4
/slot0                    2026608    4        2026604    1     /dev/hdc1
/logflash                 7997912    219408   7372232    3     /dev/hde1
/bootflash_sup-remote     1767480    1121784  555912     67    127.1.1.6:/mnt/bootflash/
/logflash_sup-remote      7953616    554976   6994608    8     127.1.1.6:/mnt/logflash/

Note: When reviewing this output, the value of none in the Filesystem column means that it is a tmpfs type.

In this example, utilization is high because /var/sysmgr (or its subfolders) is using a lot of space. /var/sysmgr is a tmpfs mount, which means that the files exist in RAM only. You need to determine what type of files are filling the partition and where they came from (cores, debugs, and so on). Deleting the files will reduce utilization, but you should try to determine what type of files are taking up the space and what process left them in tmpfs.

In Cisco NX-OS release 4.2(4) and later releases, use the following commands to display and delete the problem files from the CLI:
- The show system internal dir full directory path command lists all the files and sizes for the specified path (hidden command).
- The filesys delete full file path command deletes a specific file (hidden command).

Note: Use caution when using this command. You cannot recover a deleted file.

Note: If you are running a Cisco NX-OS release prior to Cisco NX-OS release 4.2(4), you should contact your customer support representative.

You can also use the show hardware internal proc-info pcacheinfo command to determine how much space each file system is using in the page cache (Cached). The command output may help you determine which persistent file systems are using the page cache and how much memory they are using.
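The rule that a Filesystem value of none indicates tmpfs can be applied mechanically to the show system internal flash output. The following Python sketch ranks the tmpfs mounts by used space, using a subset of rows copied from the sample in this section. The tuple layout and function name are choices made for this example.

```python
# Rows (mount point, used 1K-blocks, Use%, filesystem) copied from the
# show system internal flash sample in this section; other rows omitted.
rows = [
    ("/isan", 269312, 66, "none"),
    ("/var/tmp", 876, 1, "none"),
    ("/var/sysmgr", 999424, 96, "none"),
    ("/var/sysmgr/ftp", 24576, 8, "none"),
    ("/dev/shm", 412672, 40, "none"),
    ("/bootflash", 629784, 37, "/dev/hda3"),
    ("/var/sysmgr/startup-cfg", 27536, 7, "none"),
]

def tmpfs_by_usage(table):
    """Return tmpfs mounts (Filesystem column 'none'), largest first."""
    tmpfs = [row for row in table if row[3] == "none"]
    return sorted(tmpfs, key=lambda row: row[1], reverse=True)

for mount, used_kb, use_pct, _ in tmpfs_by_usage(rows):
    print(f"{mount:<26}{used_kb:>10} KB  {use_pct:>3}%")
```

With the sample data, /var/sysmgr is listed first, which matches the conclusion drawn from the output in this section.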


Kernel
Kernel issues are less common, but you can determine the problem by reviewing the slab utilization in the show system internal kernel meminfo command output. Generally, kernel troubleshooting requires Cisco customer support assistance to isolate why the utilization is increasing. If slab memory usage grows over time, use the following commands to gather more information:

The show system internal kernel malloc-stats command displays all the currently loaded KLMs, malloc, and free counts.
N7K# show system internal kernel malloc-stats
Kernel Module Memory Tracking
-------------------------------------------------------------
Module            kmalloc   kcalloc   kfree     diff
klm_usd           00318846  00000000  00318825  00000021
klm_eobcmon       08366981  00000000  08366981  00000000
klm_utaker        00001306  00000000  00001306  00000000
klm_sysmgr-hb     00000054  00000000  00000049  00000005
klm_idehs         00000001  00000000  00000000  00000001
klm_sup_ctrl_mc   00209580  00000000  00209580  00000000
klm_sup_config    00000003  00000000  00000000  00000003
klm_mts           03357731  00000000  03344979  00012752
klm_kadb          00000368  00000000  00000099  00000269
klm_aipc          00850300  00000000  00850272  00000028
klm_pss           04091048  00000000  04041260  00049788
klm_rwsem         00000001  00000000  00000000  00000001
klm_vdc           00000126  00000000  00000000  00000126
klm_modlock       00000016  00000000  00000016  00000000
klm_e1000         00000024  00000000  00000006  00000018
klm_dc_sprom      00000123  00000000  00000123  00000000
klm_sdwrap        00000024  00000000  00000000  00000024
klm_obfl          00000050  00000000  00000047  00000003

By comparing several iterations of this command, you can determine if some KLMs are allocating a lot of memory but are not freeing/returning the memory back (the differential value will be very large compared to normal). The show system internal kernel skb-stats command displays the consumption of SKBs (buffers used by KLMs to send and receive packets).
N7K# show system internal kernel skb-stats
Kernel Module skbuff Tracking
-------------------------------------------------------------
Module        alloc     free      diff
klm_shreth    00028632  00028625  00000007
klm_eobcmon   02798915  02798829  00000086
klm_mts       00420053  00420047  00000006
klm_aipc      00373467  00373450  00000017
klm_e1000     16055660  16051210  00004450

Compare the output of several iterations of this command to see if the differential value is growing or very high. The show hardware internal proc-info slabinfo command dumps all of the slab information (memory structure used for kernel management). The output can be large.
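Comparing iterations of the malloc-stats or skb-stats output amounts to tracking how the diff column changes per module. The Python sketch below illustrates the idea. The first snapshot uses diff values copied from the malloc-stats sample in this section; the second snapshot is hypothetical and was invented only to demonstrate the comparison.

```python
# First snapshot: diff values copied from the malloc-stats sample above.
first = {"klm_usd": 21, "klm_mts": 12752, "klm_pss": 49788, "klm_vdc": 126}
# Second snapshot: hypothetical values, invented only to show the comparison.
second = {"klm_usd": 21, "klm_mts": 12760, "klm_pss": 61500, "klm_vdc": 126}

def growth(before, after):
    """Return per-module change in the diff column, largest growth first."""
    delta = {name: after[name] - before[name] for name in before if name in after}
    return sorted(delta.items(), key=lambda item: item[1], reverse=True)

for module, change in growth(first, second):
    if change > 0:
        print(f"{module}: diff grew by {change}")
```

A module whose diff keeps growing across many iterations is a candidate for further investigation; a single increase between two snapshots is not conclusive on its own.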


User Processes
If page cache and kernel issues have been ruled out, utilization might be high as a result of some user processes taking up too much memory or a high number of running processes (due to the number of VDCs/features enabled). Note: Cisco NX-OS defines memory limits for most processes (rlimit). If this rlimit is exceeded, sysmgr will crash the process and a core file is usually generated. Processes close to their rlimit may not have a large impact on platform utilization but could still become an issue if a crash occurs.

Figuring Out Which Process is Using a Lot of Memory


The following commands can help you identify if a specific process is using a lot of memory:

The show processes memory command displays the memory allocation per process for the current VDC (the output also contains non-VDC global processes).
N7K# show processes memory
PID    MemAlloc  MemLimit    MemUsed     StackBase/Ptr      Process
-----  --------  ----------  ----------  -----------------  ----------------
4662   52756480  562929945   150167552   bfffdf00/bfffd970  netstack

Note: The output of the show processes memory command might not provide a completely accurate picture of the current utilization (allocated does not mean in use). This command is useful for determining if a process is approaching its rlimit.

To determine how much memory the processes are really using, you should check the Resident Set Size (RSS). This value gives you a rough indication of the amount of memory (in KB) that is being consumed by the processes. You can gather this information by using the following command:

The show system internal processes memory command displays the process information in the memory alerts log (if the event occurred).
N7K# show system internal processes memory
PID  TTY STAT TIME     MAJFLT TRS  RSS    VSZ    %MEM COMMAND
4727 ?   Ss   00:00:00 0      1549 123248 132832 2.9  /isan/bin/pixm
4728 ?   Ssl  00:00:00 0      408  78388  143104 1.8  /isan/bin/routing-sw/mrib -m 4
6662 ?   Ssl  00:00:05 0      2762 64024  144396 1.5  /isan/bin/routing-sw/netstack /isan/etc/routing-sw/pm.cfg
4538 ?   Ssl  00:00:00 0      2762 60448  211664 1.4  /isan/bin/routing-sw/netstack /isan/etc/routing-sw/pm.cfg
5865 ?   Ssl  00:00:01 0      2762 60416  113320 1.4  /isan/bin/routing-sw/netstack /isan/etc/routing-sw/pm.cfg
6395 ?   Ssl  00:00:00 0      2762 52008  105552 1.2  /isan/bin/routing-sw/netstack /isan/etc/routing-sw/pm.cfg
4271 ?   Ssl  00:00:00 0      609  49812  61420  1.2  /isan/bin/routing-sw/urib
7879 ?   Ssl  00:00:00 0      1909 44800  90508  1.0  /isan/bin/routing-sw/bgp -t 64000
5696 ?   Ssl  00:00:17 0      337  44696  55252  1.0  /isan/bin/routing-sw/clis -cli /isan/etc/routing-sw/cli
5333 ?   Ssl  00:00:14 0      337  44652  55208  1.0  /isan/bin/routing-sw/clis -cli /isan/etc/routing-sw/cli
4182 ?   Ssl  00:00:15 0      337  44648  55204  1.0  /isan/bin/routing-sw/clis -cli /isan/etc/routing-sw/cli
6076 ?   Ssl  00:00:14 0      337  44624  55284  1.0  /isan/bin/routing-sw/clis -cli /isan/etc/routing-sw/cli
6825 ?   Ssl  00:00:00 0      1402 44576  84020  1.0  /isan/bin/routing-sw/pim -t
4268 ?   Ssl  00:00:00 0      363  27132  38896  0.6  /isan/bin/routing-sw/u6rib
4732 ?   Ssl  00:00:00 0      404  25220  65360  0.6  /isan/bin/routing-sw/m6rib
4726 ?   S<s  00:00:00 0      144  25208  30188  0.6  /isan/bin/pixmc
remaining output omitted

If you see an increase in the utilization for a specific process over time, you should gather additional information about the process utilization.
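Because the same executable can appear several times in this listing, it can be useful to total the RSS per executable before comparing snapshots. The following Python sketch does that for a subset of rows copied from the sample output. It assumes the column order shown in the sample header (RSS seventh, command path tenth); the function name is hypothetical.

```python
from collections import defaultdict

# Rows copied from the show system internal processes memory sample
# (a subset; RSS is the seventh column, the command path is the tenth).
sample = """\
4727 ? Ss 00:00:00 0 1549 123248 132832 2.9 /isan/bin/pixm
4728 ? Ssl 00:00:00 0 408 78388 143104 1.8 /isan/bin/routing-sw/mrib -m 4
6662 ? Ssl 00:00:05 0 2762 64024 144396 1.5 /isan/bin/routing-sw/netstack /isan/etc/routing-sw/pm.cfg
4538 ? Ssl 00:00:00 0 2762 60448 211664 1.4 /isan/bin/routing-sw/netstack /isan/etc/routing-sw/pm.cfg
5865 ? Ssl 00:00:01 0 2762 60416 113320 1.4 /isan/bin/routing-sw/netstack /isan/etc/routing-sw/pm.cfg
6395 ? Ssl 00:00:00 0 2762 52008 105552 1.2 /isan/bin/routing-sw/netstack /isan/etc/routing-sw/pm.cfg
"""

def rss_by_command(text):
    """Sum RSS (KB) per executable name across all listed instances."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        name = fields[9].rsplit("/", 1)[-1]
        totals[name] += int(fields[6])
    return dict(totals)

totals = rss_by_command(sample)
for name, rss in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10}{rss:>8} KB")
```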


Figuring Out How a Specific Process is Using Memory


If you have determined that a process is using more memory than expected, it is helpful to investigate how the memory is being used by the process. The show system internal sysmgr service pid <PID in decimal> command dumps the service information running the specified PID.
N7K# show system internal sysmgr service pid 4727
Service "pixm" ("pixm", 109):
        UUID = 0x133, PID = 4727, SAP = 176
        State: SRV_STATE_HANDSHAKED (entered at time Fri Nov 12 01:42:01 2010).
        Restart count: 1
        Time of last restart: Fri Nov 12 01:41:11 2010.
        The service never crashed since the last reboot.
        Tag = N/A
        Plugin ID: 1

Convert the UUID from the above output to decimal and use it in the next command.

Note: If you are troubleshooting in a lab, you can use the following hidden NX-OS commands for hex/decimal conversion:
hex <dec to convert>
dec <hex to convert>

The show system internal kernel memory uuid <UUID in decimal> command displays the detailed process memory usage, including its libraries, for a specific UUID in the system (convert the UUID from the sysmgr service output).
N7K# show system internal kernel memory uuid 307
Note: output values in KiloBytes
Name                                     rss     shrd  drt     map  heap    ro   dat  bss   stk  misc
----                                     ---     ----  ---     ---  ----    --   ---  ---   ---  ----
/isan/bin/pixm                           7816    5052  2764    1    0       0    0    0     52   0
/isan/plugin/1/isan/bin/pixm             115472  0     115472  0    109176  752  28   6268  0    24
/lib/ld-2.3.3.so                         84      76    8       2    0       76   0    0     0    8
/usr/lib/libz.so.1.2.1.1                 16      12    4       1    0       12   4    0     0    0
/usr/lib/libstdc++.so.6.0.3              296     272   24      1    0       272  20   4     0    0
/lib/libgcc_s.so.1                       1824    12    1812    1    1808    12   4    0     0    0
/isan/plugin/1/isan/lib/libtmifdb.so.0   12      8     4       1    0       8    4    0     0    0
/isan/plugin/0/isan/lib/libtmifdb_stub   12      8     4       1    0       8    4    0     0    0
/dev/mts0                                0       0     0       1    0       0    0    0     0    0
/isan/plugin/1/isan/lib/libpcm_sdb.so.   16      12    4       1    0       12   4    0     0    0
/isan/plugin/1/isan/lib/libethpm.so.0.   76      60    16      1    0       60   16   0     0    0
/isan/plugin/1/isan/lib/libsviifdb.so.   20      4     16      1    12      4    4    0     0    0
/usr/lib/libcrypto.so.0.9.7              272     192   80      1    0       192  76   4     0    0
/isan/plugin/0/isan/lib/libeureka_hash   8       4     4       1    0       4    4    0     0    0
remaining output omitted

This output helps you to determine if a process is holding memory in a specific library and can assist with memory leak identification. The show system internal <service> mem-stats detail command displays the detailed memory utilization including the libraries for a specific service.
N7K# show system internal pixm mem-stats detail
Private Mem stats for UUID : Malloc track Library(103) Max types: 5
--------------------------------------------------------------------------------
TYPE NAME                                          ALLOCS         BYTES
                                                 CURR   MAX    CURR     MAX
   2 MT_MEM_mtrack_hdl                             32    33   16448   16596
   3 MT_MEM_mtrack_info                           424   531    6784    8496
   4 MT_MEM_mtrack_lib_name                       636   743   30054   35112
--------------------------------------------------------------------------------
Total bytes: 53286 (52k)
--------------------------------------------------------------------------------
Private Mem stats for UUID : Non mtrack users(0) Max types: 105
--------------------------------------------------------------------------------
TYPE NAME                                          ALLOCS         BYTES
                                                 CURR   MAX    CURR     MAX
   4 [r-xp]/isan/plugin/0/isan/lib/libacfg.s        0     4       0   51337
   9 [r-xp]/isan/plugin/0/isan/lib/libavl.so       79    81    1568    1608
  25 [r-xp]/isan/plugin/0/isan/lib/libfsrv.s        6     6      34      34
  32 [r-xp]/isan/plugin/0/isan/lib/libindxob        6     6     456     456
  46 [r-xp]/isan/plugin/0/isan/lib/libmpmts.        0     2       0     100
  48 [r-xp]/isan/plugin/0/isan/lib/libmts.so        7    10     816     972
  51 [r-xp]/isan/plugin/0/isan/lib/libpfm_in        0     1       0    3490
  53 [r-xp]/isan/plugin/0/isan/lib/libpss.so      169   196   27316  114880
  57 [r-xp]/isan/plugin/0/isan/lib/libsdb.so      140   140    5632    5632
  62 [r-xp]/isan/plugin/0/isan/lib/libsrg.so        0     1       0    3480
  68 [r-xp]/isan/plugin/0/isan/lib/libsysmgr        3     3    2094    2094
  79 [r-xp]/isan/plugin/0/isan/lib/libutils.       61    69     512   55389
  84 [r-xp]/isan/plugin/1/isan/bin/pixm           238   240  532920  533440
  88 [r-xp]/isan/plugin/1/isan/lib/libpixm.s        0     1       0      48
  92 [r-xp]/lib/ld-2.3.3.so                        21    26    3483    4233
  94 [r-xp]/lib/tls/libc-2.3.3.so                 286   287    8163    8490
 100 [r-xp]/usr/lib/libglib-2.0.so.0.600.1         12    19    6328    6800
--------------------------------------------------------------------------------
Total bytes: 589322 (575k)
remaining output omitted

These outputs are usually requested by the Cisco customer support representative when investigating a potential memory leak in a process or its libraries.
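The UUID conversion step in this procedure (hexadecimal in the sysmgr output, decimal in the kernel memory command) is easy to get wrong by hand. The Python sketch below extracts the UUID from the sample sysmgr line and builds the follow-up command string. It is an off-box illustration only, and the function name is hypothetical.

```python
import re

# Line copied from the show system internal sysmgr service pid sample.
sysmgr_line = "UUID = 0x133, PID = 4727, SAP = 176"

def uuid_decimal(text):
    """Return the UUID from a sysmgr service line as a decimal integer."""
    match = re.search(r"UUID = (0x[0-9a-fA-F]+)", text)
    if match is None:
        raise ValueError("no UUID field found")
    return int(match.group(1), 16)

uuid = uuid_decimal(sysmgr_line)
print(f"show system internal kernel memory uuid {uuid}")
```

For the sample line, the result matches the decimal UUID used in the kernel memory command example in this section.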

Built-in Platform Memory Monitoring


Cisco NX-OS has built-in kernel monitoring of memory usage to help avoid system hangs, process crashes, and other undesirable behavior. The platform manager periodically checks the memory utilization (relative to the total RAM present) and automatically generates an alert event if the utilization passes the configured threshold values. When an alert level is reached, the kernel attempts to free memory by releasing pages that are no longer needed (for example, the page cache of persistent files that are no longer being accessed), or if critical levels are reached, the kernel will kill the highest utilization process. Other Cisco NX-OS components have introduced memory alert handling, such as BGP's graceful low memory handling, that allow processes to adjust their behavior to keep memory utilization under control. Note: While Cisco NX-OS implements VDCs, it is important to remember that a specific VDC's memory utilization is not limited. Platform memory issues will impact all configured VDCs.

Memory Thresholds
Prior to Release 4.2(4), the default memory alert thresholds were as follows:
70% MINOR
80% SEVERE
90% CRITICAL

In Release 4.2(4) and later releases, the memory alert thresholds were changed to the following:
85% MINOR
90% SEVERE
95% CRITICAL

This change was introduced in part due to baseline memory requirements when many features/VDCs are deployed. The thresholds are configurable using the following command:
system memory-thresholds minor percentage severe percentage critical percentage

The show system internal memory-status command allows you to check the current memory alert status.
N7K# show system internal memory-status
MemStatus: OK
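The effect of the threshold change can be illustrated by mapping a utilization percentage to an alert level under both sets of defaults. The Python sketch below is a simplified model written for this article. It assumes that a level is entered when utilization is greater than or equal to its threshold, and it does not reproduce how the platform manager actually measures utilization.

```python
# Default thresholds (percent) as listed in this section.
THRESHOLDS_BEFORE_4_2_4 = {"MINOR": 70, "SEVERE": 80, "CRITICAL": 90}
THRESHOLDS_FROM_4_2_4 = {"MINOR": 85, "SEVERE": 90, "CRITICAL": 95}

def alert_level(used_pct, thresholds):
    """Return the highest alert level whose threshold has been reached.

    Assumption: a level applies when used_pct >= its threshold.
    """
    level = "OK"
    for name in ("MINOR", "SEVERE", "CRITICAL"):
        if used_pct >= thresholds[name]:
            level = name
    return level

for pct in (60, 82, 91, 96):
    print(pct,
          alert_level(pct, THRESHOLDS_BEFORE_4_2_4),
          alert_level(pct, THRESHOLDS_FROM_4_2_4))
```

Under this model, a utilization of 82 percent would be SEVERE with the older defaults but OK with the newer ones.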

Memory Alerts
If a memory threshold has been passed (OK -> MINOR, MINOR -> SEVERE, SEVERE -> CRITICAL), the Cisco NX-OS platform manager captures a snapshot of memory utilization and logs an alert to syslog (as of Release 4.2(4), default VDC only). This snapshot is useful for determining whether memory utilization is high because of memory consumed by the page cache, the kernel, or Cisco NX-OS user processes. The log is generated in the Linux root path (/), and a copy is moved to OBFL (/mnt/plog) if possible.

The show system internal memory-alerts-log command displays the memory alerts log. The memory alerts log consists of the following outputs:

Command - Description
cat /proc/memory_events - Provides a log of timestamps when memory alerts occurred.
cat /proc/meminfo - Shows the overall memory statistics including the total RAM, memory consumed by the page cache, slabs (kernel heap), mapped memory, available free memory, and so on.
cat /proc/memtrack - Displays the allocation/deallocation counts of the KLMs (Cisco NX-OS processes running in kernel memory).
df -hT - Displays file system utilization information (with type).
du --si -La /tmp - Displays file information for everything located in /tmp (symbolic link to /var/tmp).
cat /proc/memory_events - Dumped a second time to help determine if utilization changed during data gathering.
cat /proc/meminfo - Dumped a second time to help determine if utilization changed during data gathering.

This article describes how to troubleshoot packet flow issues for Cisco NX-OS.

Contents
1 Packet Flow Issues
  1.1 Packets Dropped Because of Rate Limits
  1.2 Packets Dropped Because of a QoS Policy
  1.3 Packets Dropped in Hardware
  1.4 show hardware internal statistics rates
    1.4.1 show hardware internal statistics pktflow all

Packet Flow Issues


Packets could be dropped for the following reasons:
- Software-switched packets could be received from the interface but dropped by the supervisor because of rate limits.
- Packets could be dropped because of a QoS policy.
- Hardware-switched packets could be dropped by the hardware because of a bandwidth limitation.

Packets Dropped Because of Rate Limits


Use the show hardware rate-limit command to determine if packets are being dropped because of a rate limit.
dctl-n7010-7# show hardware rate-limit copy

Units for Config: packets per second
Allowed, Dropped & Total: aggregated since last clear counters

Rate Limiter Class                       Parameters
------------------------------------------------------------
copy                                     Config    : 30000
                                         Allowed   : 13651778
                                         Dropped   : 228295 <-- caused by ICMP redirect or OSPF Hello
                                         Total     : 13880073

dctl-n7010-7(config)# show hardware rate-limit module 1

Units for Config: packets per second
Allowed, Dropped & Total: aggregated since last clear counters

Rate Limiter Class                       Parameters
------------------------------------------------------------
layer-3 mtu                              Config    : 500
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 ttl                              Config    : 500
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 control                          Config    : 10000
                                         Allowed   : 17020000
                                         Dropped   : 88790262 <---HSRP, OSPF hello
                                         Total     : 105810262

layer-3 glean                            Config    : 100
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 multicast directly-connected     Config    : 3000
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 multicast local-groups           Config    : 3000
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-3 multicast rpf-leak               Config    : 500
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

layer-2 storm-control                    Config    : Disabled

access-list-log                          Config    : 100
                                         Allowed   : 0
                                         Dropped   : 0
                                         Total     : 0

copy                                     Config    : 30000
                                         Allowed   : 552173
                                         Dropped   : 0
                                         Total     : 552173

receive                                  Config    : 30000
                                         Allowed   : 85134
                                         Dropped   : 0
                                         Total     : 85134

layer-2 port-security                    Config    : Disabled
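To judge how significant a Dropped counter is, it helps to express it as a share of the Total counter for the same rate limiter class. The Python sketch below does this for counters copied from the sample outputs in this section. The dictionary keys and function name are labels chosen for this example.

```python
# (allowed, dropped, total) counters copied from the rate-limit samples.
counters = {
    "copy (show hardware rate-limit copy)": (13651778, 228295, 13880073),
    "layer-3 control (module 1)": (17020000, 88790262, 105810262),
    "layer-3 glean (module 1)": (0, 0, 0),
}

def drop_percent(dropped, total):
    """Return dropped packets as a percentage of total (0.0 when idle)."""
    if total == 0:
        return 0.0
    return 100.0 * dropped / total

for name, (allowed, dropped, total) in counters.items():
    print(f"{name}: {drop_percent(dropped, total):.1f}% dropped")
```

Because the counters are aggregated since the last clear, a percentage computed this way describes the whole interval, not the current drop rate.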

Packets Dropped Because of a QoS Policy


Use the show policy-map interface control-plane command to determine if packets are being dropped because of a QoS policy.
dctl-n7010-7# sh policy-map interface control-plane
class-map copp-system-class-exception (match-any)
  match exception ip option
  match exception ip icmp unreachable
  police cir 360 kbps , bc 250 ms
  module 1 :
    conformed 0 bytes; action: transmit
    violated 0 bytes; action: drop
  module 2 :
    conformed 0 bytes; action: transmit
    violated 0 bytes; action: drop
  module 3 :
    conformed 0 bytes; action: transmit
    violated 0 bytes; action: drop
  module 4 :
    conformed 0 bytes; action: transmit
    violated 0 bytes; action: drop
  module 10 :
    conformed 11614462878 bytes; action: transmit
    violated 3097405384908 bytes; action: drop
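When the policy-map output covers many classes and modules, a short script can list only the modules with nonzero violated counters. The Python sketch below parses an excerpt copied from the sample output in this section. The regular expression assumes the conformed and violated lines follow each module line in the order shown in the sample.

```python
import re

# Excerpt copied from the show policy-map interface control-plane sample.
sample = """\
module 4 :
conformed 0 bytes; action: transmit
violated 0 bytes; action: drop
module 10 :
conformed 11614462878 bytes; action: transmit
violated 3097405384908 bytes; action: drop
"""

# Assumption: each module line is followed by one conformed line and one
# violated line, as in the sample.
PATTERN = re.compile(
    r"module (\d+) :\s+conformed (\d+) bytes;.*?\s+violated (\d+) bytes;"
)

def modules_with_drops(text):
    """Return {module: (conformed, violated)} for modules that dropped bytes."""
    result = {}
    for module, conformed, violated in PATTERN.findall(text):
        if int(violated) > 0:
            result[int(module)] = (int(conformed), int(violated))
    return result

print(modules_with_drops(sample))
```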

Packets Dropped in Hardware


Use the following show hardware commands to determine if packets are being dropped by the hardware:
- show hardware internal statistics rates
- show hardware internal statistics pktflow all

show hardware internal statistics rates


dctl-n7k-dist-1# sh hardware internal statistics rates
+ =============================
+ R2D2 Instance 0
+ =============================
|
|-- Ingress IN
|   |--- Packets/sec
|   |    |--sum: 0
|   |
|   |--- Bytes/sec
|   |    |--sum: 0
|
|-- Ingress OUT
|   |--- Packets/sec
|   |    |--sum: 0
|   |
|
|-- Egress IN
|   |--- Packets/sec
|   |    |--sum: 0
|   |
|
|-- Egress OUT
|   |--- Packets/sec
|   |    |--sum: 0
|   |
|   |--- Bytes/sec
|   |    |--sum: 0
|   |
|
+ =============================
+ Metropolis Instance 0
+ =============================
|
|-- Ingress IN
|   |--- Packets/sec
|   |    |--I1: 2
|   |    |--I1: 0
|   |    |--sum: 0
|   |
|
|-- Ingress OUT
|   |--- Packets/sec
|   |    |--I1: 2
|   |    |--I1: 0
|   |    |--sum: 0
|   |
|   |--- Bytes/sec
|   |    |--I1: 1166
|   |    |--I1: 0
|   |    |--sum: 0
|
|-- Egress IN
|   |--- Packets/sec
|   |    |--I1: 0
|   |    |--I1: 0
|   |    |--sum: 0
|   |
|   |--- Bytes/sec
|   |    |--I1: 0
|   |    |--I1: 0
|   |    |--sum: 0
|
|-- Egress OUT
|   |--- Packets/sec
|   |    |--I1: 0
|   |    |--sum: 0
|   |
|   |
|

show hardware internal statistics pktflow all

This command displays per-ASIC statistics, including the packets into and out of each ASIC, and helps to identify where packet loss is occurring.
dctl-n7k-dist-1# show hardware internal statistics pktflow all
bhv_bitmask:0
|------------------------------------------------------------------------|
| Device:R2D2 Role:MAC |
| Packets
|------------------------------------------------------------------------|
Instance: 0 Ports:
|----------|-------------------|------------------|
|          | IN                | OUT              |
|----------|-------------------|------------------|
|Ingress   | 00000000014a40c0  | 00000000014a40c0 |
|----------|-------------------|------------------|
|Egress    | 000000000007e9dc  | 000000000007e9dc |
|----------|-------------------|------------------|
|------------------------------------------------------------------------|
| Device:Metropolis Role:REWR |
| Packets
|------------------------------------------------------------------------|
Instance: 0 Ports:
|----------|-------------------|------------------|
|          | IN                | OUT              |
|----------|-------------------|------------------|
|Ingress   | 00000000014a40c0  | 0000000001498ccc |
|----------|-------------------|------------------|
|Egress    | 000000000007e9dc  | 000000000007e9dc |
|----------|-------------------|------------------|
|------------------------------------------------------------------------|
| Device:Octopus Role:QUE |
| Packets
|------------------------------------------------------------------------|
Instance: 0 Ports:
|----------|-------------------|------------------|
|          | IN                | OUT              |
|----------|-------------------|------------------|
|Ingress   | 0000000001498ccc  | 0000000001498cc6 |
|----------|-------------------|------------------|
|Egress    | 000000000007e9c5  | 000000000007e9dc |
|----------|-------------------|------------------|
*** Counters above represent packets combined into a larger one ***
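Because the packet counters are long strings that appear to be hexadecimal, IN and OUT values are hard to compare by eye. The following Python sketch shows one possible way to compute IN minus OUT for each device and direction from saved output. It is not a Cisco tool: the function name and the SAMPLE text are illustrative, and it assumes the counters are hexadecimal and that each counter row follows the "|Ingress | IN | OUT |" layout shown above.

```python
import re

# Device and counter rows taken from the capture above.
SAMPLE = """\
| Device:R2D2 Role:MAC |
|Ingress | 00000000014a40c0 | 00000000014a40c0 |
|Egress | 000000000007e9dc | 000000000007e9dc |
| Device:Metropolis Role:REWR |
|Ingress | 00000000014a40c0 | 0000000001498ccc |
|Egress | 000000000007e9dc | 000000000007e9dc |
| Device:Octopus Role:QUE |
|Ingress | 0000000001498ccc | 0000000001498cc6 |
|Egress | 000000000007e9c5 | 000000000007e9dc |
"""

# Matches either a "Device:<name>" marker or one Ingress/Egress counter row.
TOKEN_RE = re.compile(
    r"Device:(\w+)"
    r"|\|\s*(Ingress|Egress)\s*\|\s*([0-9a-fA-F]+)\s*\|\s*([0-9a-fA-F]+)\s*\|"
)


def pktflow_deltas(text):
    """Return a list of (device, direction, in_minus_out) tuples.

    Assumption: counters are hexadecimal, and each row belongs to the most
    recent "Device:" marker that precedes it.
    """
    rows = []
    device = None
    for match in TOKEN_RE.finditer(text):
        if match.group(1):
            device = match.group(1)
        else:
            pkts_in = int(match.group(3), 16)
            pkts_out = int(match.group(4), 16)
            rows.append((device, match.group(2), pkts_in - pkts_out))
    return rows


if __name__ == "__main__":
    for device, direction, delta in pktflow_deltas(SAMPLE):
        if delta != 0:
            print(f"{device} {direction}: IN - OUT = {delta}")
```

For the capture above, the largest difference is on the Metropolis ingress path (46068 more packets in than out). Treat a nonzero difference as a pointer to where to look next rather than as an exact drop count: as the note at the end of the output indicates, some counters represent packets combined into a larger one, so IN and OUT do not always match even without loss.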
