ECN-APJ
3 Argentium
05-10-2015 11:14 PM
https://www.dell.com/community/Connectrix/How-To-Identify-And-Troubleshoot-Slow-Drain-Device-In-Brocade/td-p/7093539 1/14
10/21/2019 How To Identify And Troubleshoot Slow Drain Device In Brocade SAN Environment - Dell Community
Introduction
In a SAN environment, performance degradation is a common issue. Device processing becomes slow or many frame-drop warnings appear, and eventually business applications are affected. Usually one or several devices cause the problem; we call such devices slow drain devices. A slow drain device can be a host, a storage array or a connected switch. For some reason, it receives more frames than it can process, so it cannot return buffer credits to the upstream device quickly enough. This causes network delay, congestion or even frame loss, all of which lead to performance issues. The bottleneck can be at the physical layer (an SFP, a fiber cable or an endpoint device) or a SAN design defect, for example when the actual data volume exceeds the maximum processing capability.
In this article, we describe how to identify and troubleshoot slow drain devices in a Brocade SAN environment.
Detailed Information
To understand the cause of a bottleneck, we need to understand how switches implement flow control, in which buffer credits play the key role. Each switch port has a number of buffer credits, negotiated between the port and the connected device. A port can send a frame only when it has an available credit, and each frame sent consumes one credit. When the remote device has received the frame and freed the buffer, it returns a Receiver Ready (R_RDY) signal, and the sender's available credit count is incremented by one. Because credits are limited, a port that runs out of credits must wait before sending, which introduces delay. If a frame is held for more than 500 ms, it is dropped and the credit is released.
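To make the credit mechanism concrete, here is a toy Python simulation. It is a sketch, not a model of any real switch: time advances in abstract ticks, and the credit count, arrival rate, drain rate and hold time (standing in for the 500 ms limit) are all assumed parameters. A receiver that drains more slowly than frames arrive drives the sender to zero credits and eventually causes dropped frames.

```python
from collections import deque


def simulate(credits, arrivals_per_tick, drain_per_tick, ticks, hold_ticks):
    """Toy buffer-to-buffer credit model of one link (illustrative only).

    Returns (frames_sent, frames_dropped, ticks_at_zero_credit).
    """
    available = credits      # credits the sender may still spend
    in_receiver = 0          # frames occupying the receiver's buffers
    waiting = deque()        # arrival tick of each frame queued at the sender
    sent = dropped = zero_ticks = 0

    for now in range(ticks):
        # The receiver processes frames; each freed buffer returns one credit (R_RDY).
        freed = min(drain_per_tick, in_receiver)
        in_receiver -= freed
        available += freed

        # New frames join the sender's queue.
        waiting.extend([now] * arrivals_per_tick)

        # Frames held longer than the hold time are discarded.
        while waiting and now - waiting[0] > hold_ticks:
            waiting.popleft()
            dropped += 1

        # Each transmitted frame consumes one credit.
        while waiting and available > 0:
            waiting.popleft()
            available -= 1
            in_receiver += 1
            sent += 1

        if available == 0:
            zero_ticks += 1

    return sent, dropped, zero_ticks


# A receiver that keeps up never exhausts the credits.
print(simulate(credits=8, arrivals_per_tick=2, drain_per_tick=4, ticks=100, hold_ticks=50))
# -> (200, 0, 0)

# A slow drain receiver: credits reach zero and frames start to be dropped.
print(simulate(credits=8, arrivals_per_tick=2, drain_per_tick=1, ticks=100, hold_ticks=5))
```

The model deliberately keeps one FIFO queue and one link, so the only variable that matters is how fast the receiver returns credits.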
Because of this credit-based flow control, a bottleneck leads to congestion along the entire data path. If the path includes a cascading link (ISL), all traffic through that link is affected, so a single bottleneck device can congest the whole network. It is therefore important to identify the bottleneck device when troubleshooting a performance issue. Endpoint devices such as hosts or storage report bottleneck issues in their own logs. On Brocade switches, the following message is reported:
To troubleshoot a performance issue, the first step is to clear the switch counters with the following commands:
#>statsclear
#>slotstatsclear
If you have already cleared the counters, you can collect supportshow or supportsave logs directly for analysis. If you have not, first collect a copy of the current supportshow or supportsave output, then clear the counters. This first capture lets you quickly see which ports already have errors, so you can check those ports first. The sfpshow command shows the TX and RX power levels of a particular port.
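When many ports need checking, a small script can pull the power lines out of saved sfpshow output. This is a sketch that assumes a line layout such as `RX Power:    -2.5    dBm (562.3 uW)`; the exact formatting varies by FOS release and SFP type, so treat the regular expression as an assumption to verify against your own logs. The helpers `parse_power` and `readings_consistent` are hypothetical names; the dBm conversion itself is the standard definition (0 dBm = 1 mW).

```python
import math
import re

# Assumed layout of the power lines in sfpshow output, for example:
#   RX Power:    -2.5    dBm (562.3 uW)
# Formatting differs between FOS releases and SFP types; adjust as needed.
POWER_RE = re.compile(
    r"^(RX|TX) Power:\s+(-?\d+(?:\.\d+)?)\s+dBm\s+\((\d+(?:\.\d+)?)\s*uW\)"
)


def uw_to_dbm(microwatts):
    """Convert optical power from microwatts to dBm (0 dBm = 1 mW = 1000 uW)."""
    if microwatts <= 0:
        raise ValueError("no light: power must be positive to express in dBm")
    return 10 * math.log10(microwatts / 1000.0)


def parse_power(text):
    """Return {'RX': (dBm, uW), 'TX': (dBm, uW)} for the power lines found in text."""
    levels = {}
    for line in text.splitlines():
        m = POWER_RE.match(line.strip())
        if m:
            levels[m.group(1)] = (float(m.group(2)), float(m.group(3)))
    return levels


def readings_consistent(dbm, microwatts, tolerance_db=0.1):
    """Check that the printed dBm value agrees with the printed uW value."""
    return abs(uw_to_dbm(microwatts) - dbm) <= tolerance_db


# Illustrative values, not taken from the article.
sample = """RX Power:    -2.5    dBm (562.3 uW)
TX Power:    -2.2    dBm (602.6 uW)"""
for direction, (dbm, uw) in parse_power(sample).items():
    print(direction, dbm, readings_consistent(dbm, uw))
```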
In a single-switch network, all connected devices are hosts or storage. In a multi-switch network, there are also ISLs and E-Ports. Identifying the network topology helps administrators understand the data transmission path. For example, the following islshow output shows the connectivity between the local Brocade switch and the remote switches. No. 1: local switch port 57 connects to remote switch CHD_1C_TLI_SAN1, port 55. No. 2: local switch port 129 connects to remote switch CHD_1D_NGN_SAN1, port 135.
islshow :
<truncated>
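To trace data paths programmatically, the ISL list can be turned into a map from local port to remote switch and port. The sketch below assumes an islshow line layout of `<index>: <local port>-> <remote port> <remote WWN> <domain> <remote switch name> ...`; that layout is an assumption to check against your FOS release, and the WWNs and domain IDs in the sample are placeholders. Only the port numbers and switch names come from the example above.

```python
import re

# Assumed islshow line layout (verify against your FOS release):
#   <index>: <local port>-> <remote port> <remote WWN> <domain> <remote switch name> ...
ISL_RE = re.compile(
    r"^\s*(\d+):\s*(\d+)\s*->\s*(\d+)\s+"
    r"([0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){7})\s+(\d+)\s+(\S+)"
)


def parse_islshow(text):
    """Map local port -> (remote switch name, remote port) for each ISL line."""
    isls = {}
    for line in text.splitlines():
        m = ISL_RE.match(line)
        if m:
            local_port, remote_port = int(m.group(2)), int(m.group(3))
            isls[local_port] = (m.group(6), remote_port)
    return isls


# Placeholder WWNs and domain IDs; ports and names follow the article's example.
sample = """  1: 57-> 55 10:00:00:00:00:00:00:01   2 CHD_1C_TLI_SAN1  sp: 8.000G bw: 8.000G
  2:129->135 10:00:00:00:00:00:00:02   3 CHD_1D_NGN_SAN1  sp: 8.000G bw: 8.000G"""
print(parse_islshow(sample))
```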
As the following diagram shows, there are two Brocade switches in the SAN network.
Based on the information above, we check the status of port 57 with the command portstatsshow 57:
portstatsshow 57
tim_txcrd_z_vc 8-11: 0 0 0 0
tim_txcrd_z_vc 12-15: 0 0 0 0
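The per-virtual-channel Time TX Credit Zero lines can be collected into one table, which makes it easy to total them or compare ports. This sketch assumes each line looks like the ones shown above (`tim_txcrd_z_vc <first>-<last>: <values>`), tolerating extra spaces such as `0- 3:`; `parse_vc_counters` is a hypothetical helper name.

```python
import re

# Matches lines such as "tim_txcrd_z_vc 8-11: 0 0 0 0" from portstatsshow.
VC_LINE = re.compile(r"^tim_txcrd_z_vc\s+(\d+)-\s*(\d+):\s+(\d+(?:\s+\d+)*)$")


def parse_vc_counters(text):
    """Return {virtual channel: Time TX Credit Zero count} from portstatsshow text."""
    per_vc = {}
    for line in text.splitlines():
        m = VC_LINE.match(line.strip())
        if not m:
            continue
        first, last = int(m.group(1)), int(m.group(2))
        values = [int(v) for v in m.group(3).split()]
        if len(values) != last - first + 1:
            raise ValueError(f"unexpected number of values in: {line!r}")
        per_vc.update(zip(range(first, last + 1), values))
    return per_vc


output_57 = """tim_txcrd_z_vc 8-11: 0 0 0 0
tim_txcrd_z_vc 12-15: 0 0 0 0"""
counts = parse_vc_counters(output_57)
print(sum(counts.values()))  # -> 0
```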
The Time TX Credit Zero counter (tim_txcrd_z) shows how long the port had zero transmit credit; it counts 2.5 µs intervals during which the port could not transmit because no credit was available. Zero buffer credit does not by itself mean there is a performance issue. However, if the value is very high, there could be congestion somewhere in the network. As a rule of thumb, a value below 30% of the number of transmitted frames is normal.
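The 30% rule of thumb is easy to apply in a script. This is a sketch that assumes you already have the tim_txcrd_z total and a transmitted-frame count for the same port (for example from portstatsshow's frames-transmitted counter); the threshold is the article's heuristic, not a vendor limit, and the example values are hypothetical.

```python
TICK_MICROSECONDS = 2.5  # tim_txcrd_z counts 2.5 microsecond intervals at zero credit


def credit_zero_seconds(tim_txcrd_z):
    """Convert a Time TX Credit Zero count into seconds spent at zero credit."""
    return tim_txcrd_z * TICK_MICROSECONDS / 1_000_000


def credit_zero_suspicious(tim_txcrd_z, frames_tx, threshold=0.30):
    """Flag the port if the zero-credit count reaches 30% of frames transmitted."""
    if frames_tx == 0:
        # Nothing was sent, so any time at zero credit is worth a look.
        return tim_txcrd_z > 0
    return tim_txcrd_z / frames_tx >= threshold


# Hypothetical counter values for illustration.
print(credit_zero_seconds(400_000))                     # -> 1.0
print(credit_zero_suspicious(150_000, 1_000_000))       # -> False
print(credit_zero_suspicious(450_000, 1_000_000))       # -> True
```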
The c3_timeout counter shows whether frames were lost. Before FOS 6.3.1, this counter has no direction; starting with FOS 6.3.1, it is replaced by the er_rx_c3_timeout and er_tx_c3_timeout counters. When a port sends or receives a frame, the frame occupies a buffer credit. If the port does not get a response within 500 ms, the transmission fails, the frame is dropped and the counter is incremented by one. A non-zero value therefore indicates a performance issue. In this case, er_tx_c3_timeout on port 57 is not zero.
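The direction of the timeout counter tells you where to look next. The sketch below encodes the interpretation used in this article: a transmit-side timeout points at whatever is attached beyond the port, a receive-side timeout means the frames got stuck further along the path, and the pre-6.3.1 counter only says that a drop happened. The function name and labels are hypothetical; `counters` is assumed to be a dict of counter names to values taken from portstatsshow.

```python
def classify_c3(counters):
    """Return labels describing where Class 3 frames timed out on one port."""
    labels = []
    if counters.get("er_tx_c3_timeout", 0) > 0:
        # Frames waiting to leave this port ran out of time: the device or ISL
        # attached to this port is not returning credits fast enough.
        labels.append("egress")
    if counters.get("er_rx_c3_timeout", 0) > 0:
        # Frames that arrived on this port could not be forwarded in time:
        # the blockage is on another port further along the path.
        labels.append("ingress")
    if not labels and counters.get("c3_timeout", 0) > 0:
        # Pre-FOS 6.3.1 counter: a drop occurred but its direction is unknown.
        labels.append("undirected")
    return labels


print(classify_c3({"er_tx_c3_timeout": 12, "er_rx_c3_timeout": 0}))  # -> ['egress']
```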
Next, we check the other end of this ISL, port 55 on CHD_1C_TLI_SAN1:
portstatsshow 55
tim_txcrd_z_vc 8-11: 0 0 0 0
tim_txcrd_z_vc 12-15: 0 0 0 0
The er_rx_c3_timeout counter on port 55 is not zero, which means frames received on this port were also held for more than 500 ms and dropped. Note that the upstream er_tx_c3_timeout value is not always equal to the downstream er_rx_c3_timeout value; it depends on when you cleared the counters and collected the logs.
We have checked the ISL between the two switches; now let's find the congested device. Since we saw er_tx_c3_timeout on the upstream ISL port and er_rx_c3_timeout on the downstream ISL port, there should be an F-Port reporting er_rx_c3_timeout on the upstream switch (where the frames entered) and an F-Port reporting er_tx_c3_timeout on the downstream switch (where the frames could not be delivered).
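The reasoning above (follow transmit-side timeouts across ISLs until they land on an F-Port) can be sketched as a short search. Everything here is illustrative: the fabric is a hand-built dict of per-port counters, the local switch name `local_switch`, the F-Port numbers and all counter values are hypothetical, and only the 57 to 55 ISL to CHD_1C_TLI_SAN1 comes from the example.

```python
def find_slow_drain(fabric, isls, switch, visited=None):
    """Follow er_tx_c3_timeout from `switch` and return suspected (switch, F-Port) pairs."""
    if visited is None:
        visited = set()
    if switch in visited:
        return []
    visited.add(switch)

    suspects = []
    for port, info in sorted(fabric[switch].items()):
        if info.get("er_tx_c3_timeout", 0) == 0:
            continue  # this port delivered its frames in time
        if info["type"] == "F":
            # Frames could not be delivered to the attached device.
            suspects.append((switch, port))
        elif (switch, port) in isls:
            # An E-Port could not send over the ISL: continue on the remote switch.
            remote_switch, _ = isls[(switch, port)]
            suspects.extend(find_slow_drain(fabric, isls, remote_switch, visited))
    return suspects


# Hypothetical counters; "local_switch" and the F-Port numbers are placeholders.
fabric = {
    "local_switch": {
        57: {"type": "E", "er_tx_c3_timeout": 12},
        5: {"type": "F", "er_rx_c3_timeout": 12},
    },
    "CHD_1C_TLI_SAN1": {
        55: {"type": "E", "er_rx_c3_timeout": 9},
        10: {"type": "F", "er_tx_c3_timeout": 9},
        11: {"type": "F"},
    },
}
isls = {
    ("local_switch", 57): ("CHD_1C_TLI_SAN1", 55),
    ("CHD_1C_TLI_SAN1", 55): ("local_switch", 57),
}
print(find_slow_drain(fabric, isls, "local_switch"))  # -> [('CHD_1C_TLI_SAN1', 10)]
```

The `visited` set stops the search from looping back across the same ISL in meshed fabrics.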
How do we find these abnormal ports? We check the porterrshow output of these switches and find that ports 21 and 27 have problems:
portstatsshow 21
portstatsshow 27
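On a large switch, scanning porterrshow by eye is slow. This sketch assumes the output has already been turned into a dict of port to counter values; the counter names in `watch` and all values below are hypothetical, so map them to the columns your FOS release prints. Summing different error types is only a triage ordering, not a severity score.

```python
def rank_error_ports(errors, watch):
    """Return (port, total) for ports with non-zero watched counters, worst first."""
    ranked = []
    for port, counters in errors.items():
        total = sum(counters.get(name, 0) for name in watch)
        if total:
            ranked.append((port, total))
    # Highest total first; ties broken by port number for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical counter names and values for illustration.
watch = ("crc_err", "enc_out", "c3_timeout_tx", "c3_timeout_rx")
errors = {
    3: {"crc_err": 0, "enc_out": 0},
    8: {"crc_err": 4, "enc_out": 10},
    12: {"c3_timeout_tx": 2},
}
print(rank_error_ports(errors, watch))  # -> [(8, 14), (12, 2)]
```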
Are any other ports affected? Since there is only one ISL between the two switches, all ports on this data transmission path have been affected as well. Note that port 26 on the downstream switch has not been affected, since its data is congested on the upstream switch.
In a multi-switch SAN environment, we can follow the same steps to find the abnormal device from the portstatsshow output. In a single-switch environment, we only need to check the F-Ports.
Next, we need to find the cause of the bottleneck. The usual steps are:
5. Check the connected device if there is no finding from the above steps
Back to this case: we find a few errors at the physical layer, and the RX power level is below -7 dBm. So we need to check the fiber cable between the switch and the device.
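The physical-layer check in this case combines a low RX power reading with error counters. A minimal sketch, assuming the RX power in dBm and two error counts (CRC and encoding errors, chosen here as typical physical-layer indicators) have already been extracted for the port. The -7 dBm floor is the value from this case; acceptable levels depend on the SFP type, so it is a parameter rather than a fixed rule.

```python
def physical_layer_suspect(rx_dbm, crc_err, enc_out, rx_floor_dbm=-7.0):
    """Return reasons to inspect the cable/SFP on a port (empty list if none)."""
    reasons = []
    if rx_dbm < rx_floor_dbm:
        reasons.append("low RX power")      # weak light: dirty or damaged fiber, bad SFP
    if crc_err > 0:
        reasons.append("CRC errors")        # corrupted frames on the link
    if enc_out > 0:
        reasons.append("encoding errors")   # bit errors outside frame boundaries
    return reasons


# Hypothetical readings for illustration.
print(physical_layer_suspect(-8.2, 3, 0))   # -> ['low RX power', 'CRC errors']
print(physical_layer_suspect(-3.0, 0, 0))   # -> []
```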
portstatsshow 27
sfpshow 27
After the fiber cable was replaced, the problem was solved, which indicates that the bad fiber cable caused it. Sometimes we may not find any problem on the switches; in that case, we should check the connected device (e.g. the HBA card).
Summary
The key to troubleshooting Brocade SAN performance issues is to follow the congested data path to the bottleneck device. Understanding the difference between er_rx_c3_timeout and er_tx_c3_timeout is very important. We recommend clearing the counters while the devices are working normally; when a performance issue occurs, only the logs collected during that period are meaningful.
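Since only counters from the problem period matter, comparing two snapshots is often more useful than reading absolute values. This sketch assumes each snapshot is a dict of port to counter values parsed from supportshow or portstatsshow output; counter names and values are hypothetical. If a counter is lower in the second snapshot, it is assumed the counters were cleared in between, so the new value is taken as the increase.

```python
def counter_deltas(before, after):
    """Return {port: {counter: increase}} for counters that changed between snapshots."""
    deltas = {}
    for port, counters in after.items():
        base = before.get(port, {})
        changed = {}
        for name, value in counters.items():
            old = base.get(name, 0)
            # A drop in value means the counters were cleared; count from zero.
            increase = value - old if value >= old else value
            if increase:
                changed[name] = increase
        if changed:
            deltas[port] = changed
    return deltas


# Hypothetical snapshots taken before and during a slowdown.
before = {4: {"crc_err": 5, "er_tx_c3_timeout": 0}}
after = {4: {"crc_err": 5, "er_tx_c3_timeout": 3}, 9: {"crc_err": 2}}
print(counter_deltas(before, after))
# -> {4: {'er_tx_c3_timeout': 3}, 9: {'crc_err': 2}}
```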
Labels : Brocade
2 Replies
RRR
5 Osmium
11-04-2015 02:45 AM
But how can we detect where the actual slowest part in the SAN environment is?
ECN-APJ
3 Argentium
11-08-2015 10:45 PM
Are you using Cisco MDS switches? If yes, you may also refer to this article:
http://www.cisco.com/c/en/us/products/collateral/storage-networking/mds-9700-series-multilayer-
direc....