Brocade’s
Fabric Vision Workshop
January 2019
Fabric Vision (diagram layers: Brocade ASIC, Brocade Fabric OS, Fabric Vision software)
• Core capabilities integrated into ASIC
• Enables joint software/hardware features and capabilities
• Reduce common network problems
• Eliminate 48 percent of maintenance costs
8 | Broadcom Proprietary and Confidential. © 2017 Broadcom. All Rights Reserved. The term “Broadcom” refers to Broadcom Limited and/or its subsidiaries.
Brocade Fabric Vision Features
• IO Insight (FOS 8.0): Automatically detect degraded storage IO performance with integrated device latency and IOPS monitoring
• Flow Vision (FOS 7.2): Identifies, monitors, and analyzes the performance of specific flows or frame types
• MAPS (FOS 7.2): Automation that simplifies policy-based monitoring and alerting
• Fabric Performance Impact Monitoring (FOS 7.3): Quickly detects and clearly alerts admins to high levels of latency, helping to identify slow drain devices
• Dashboards (FOS 7.2): Customizable health and performance dashboard, with all critical information on one screen
Brocade Fabric Vision Technology
Fabric Vision Features Introduced in FOS 7.2
• MAPS: Automation that simplifies policy-based monitoring and alerting
• ClearLink Diagnostics: Cable and optic diagnostics that simplify the deployment and support of large fabrics
• Dashboards: Customizable health and performance dashboard, with all critical information on one screen
• Flow Vision: Identifies, monitors, and analyzes performance of specific flows or frame types
• Implementing new SAN infrastructure to replace older SAN hardware. Needed seamless migration.
• Pre-prod testing - ICL links failed ClearLink diagnostic (D-Port traffic link test)
• Physical layer errors detected
Detection: Run proactive ClearLink Diagnostics D-Port on ICL reporting test failures. CRCs incrementing as well as Physical Coding Sublayer (PCS) block errors.
Mitigation: Identified failing optic (QSFP). Replacement resolved the issues.
Best Practice: Test without FEC, enable FEC wherever possible
• Ability to avoid any future physical layer impact to production environment
• Enabled seamless integration of new SAN hardware giving increased confidence in technology
• Ability to pin-point physical layer issues easily
• Avoids project delay
• Costly delays to project avoided.
• Mitigated risk & reputational damage
• Avoided impacts to other projects by maximising usage of change windows.
• Minimized any impact from project dependencies.
• Critical BAU activities retained by reducing time spent on troubleshooting.
• The D-Port Test itself identified which side the problem was on since the
test results on Switch 1 failed due to ‘remote’ port, and Switch 2 failed
due to ‘local’ port.
• Always utilize Clearlink diagnostics with FEC disabled (default)
• QSFP optic replaced on Switch 2
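The fault-localization reasoning in the bullets above (Switch 1 blames the 'remote' port, Switch 2 blames the 'local' port, so the problem sits on Switch 2) can be sketched in Python. This is only an illustration: the function name, the dictionary input, and the "local"/"remote" strings are assumptions made for the example and are not parsed from real D-Port test output.

```python
def locate_faulty_side(results):
    """Guess which switch holds the faulty optic from D-Port failure sides.

    `results` maps a switch name to the side its D-Port test blamed:
    "local" (its own port) or "remote" (the peer's port).
    This input shape is hypothetical, not a real FOS data structure.
    """
    switches = list(results)
    if len(switches) != 2:
        raise ValueError("expected results from exactly two link ends")
    a, b = switches
    votes = {a: 0, b: 0}
    for switch, side in results.items():
        peer = b if switch == a else a
        if side == "local":
            votes[switch] += 1   # this end blames itself
        elif side == "remote":
            votes[peer] += 1     # this end blames the other end
        else:
            raise ValueError(f"unknown side: {side!r}")
    if votes[a] == votes[b]:
        return None  # the two ends disagree, so no conclusion is drawn
    return a if votes[a] > votes[b] else b


# The case described on the slide: both ends point at Switch 2.
print(locate_faulty_side({"Switch 1": "remote", "Switch 2": "local"}))
```

When the two ends disagree, the sketch returns `None` rather than guessing, which mirrors the need to inspect the link further before replacing an optic.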
• Rapid detection & isolation of slow-drain issues.
• Reduction in impact. Propagation of performance degradation to other applications avoided.
• PS value add - limited the man-hours required to investigate issue
• Avoided impact to users and potential public-facing issues.
• Reduced potential effort costs in identifying root cause and resolution actions.
Pre-defined Groups
Intuitive Reporting
3.
MAPS
All the monitoring rules based on the custom thresholds configured in Fabric Watch
All of the monitoring rules based on the active thresholds in Fabric Watch at the time of migration
Pre-defined Policies
• Each policy is based on more than 300 rules with unique actions that have been vetted by Brocade experts
• Takes the guesswork out of defining threshold-based rules and appropriate actions
• Integration with Brocade Network Advisor automates applying the policies across the fabric or multiple fabrics
Aggressive Policy
• Contains rules and actions with very strict thresholds
• When a pristine network is needed
Moderate Policy
• Contains rules and actions with threshold values between the aggressive and conservative policies
Conservative Policy
• Contains rules and actions with more lenient thresholds
• When environments are resilient and can accommodate errors
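To make the difference between the three policies concrete, here is a minimal Python sketch. The threshold numbers are hypothetical placeholders chosen only to show "strict", "in between", and "lenient"; the real values come from the pre-defined policies on the switch and are not stated in this deck.

```python
# Hypothetical CRC-errors-per-minute thresholds, for illustration only.
# The actual thresholds live in the pre-defined MAPS policies.
CRC_PER_MIN_THRESHOLD = {
    "aggressive": 0,     # very strict: any CRC error is a violation
    "moderate": 10,      # between aggressive and conservative
    "conservative": 20,  # lenient: environment can accommodate errors
}


def violated_policies(crc_per_min):
    """Return the policies whose (hypothetical) CRC threshold is exceeded."""
    return [
        name
        for name, limit in CRC_PER_MIN_THRESHOLD.items()
        if crc_per_min > limit
    ]


print(violated_policies(5))   # only the strictest policy would fire
print(violated_policies(25))  # all three policies would fire
```

The same measurement triggers different policies depending on how strict the active policy is, which is the trade-off the slide describes.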
Pre-defined Categories
• 300+ rules grouped by type
• 10 pre-defined Categories
• Every rule is customizable
• Reduces errors and manual effort
Categories: Ports, Switch Status, Fabrics, FRU, Security, Resource, FCIP, Traffic / Performance, Fabric Performance Impact, Backend Ports
Pre-defined Groups
• Enables a group of similar components to be monitored as a single entity
• More than 30 pre-defined groups available
• Ports get automatically assigned to the right group
• Reduces errors and manual effort
Groups: All Host Ports, All Target Ports, All E_Ports, All SFPs, All Fans, All Power Supplies, All FCIP Circuits, and more…
[Diagram labels: Policy (Conservative Policy, Custom Policy), Categories, Groups, Condition, Actions]
Port Health
Monitors port statistics and takes action based on the configured thresholds and actions
Examples (Element: Definition)
• CRC with good EOF markers (crc g_eof): The number of times an invalid cyclic redundancy check error occurs on a port or a frame that computes to an invalid CRC. Invalid CRCs can represent noise on the network. Such frames are recoverable by retransmission. Invalid CRCs can indicate a potential hardware problem
• Class 3 timeouts (C3TXTO): The number of Class 3 discard frames because of timeouts
Switch Status Policy (SSP)
Enables you to monitor the health of the switch by defining the number of types of errors that transition the overall switch state into a state that is not healthy
Examples (Element: Definition)
• Power Supplies (BAD_PWR): Power supply thresholds detect absent or failed power supplies, and power supplies that are not in the correct slot for redundancy
• Core Blade (DOWN_CORE): Faulty core blades (applies to modular switches only)
Fabric State Change
Groups areas of potential problems arising between devices, including measures such as zone changes, fabric segmentation, E_Port down, fabric reconfiguration, domain ID changes, and fabric logins
Examples (Element: Definition)
• Zone changes (ZONE_CHG): Tracks the number of zone changes. Because zoning is a security provision, frequent zone changes may indicate a security breach or weakness. Zone change messages occur whenever there is a change in zone configurations
• Fabric reconfigurations (FAB_CFG): Tracks the number of fabric reconfigurations. These occur when the following events happen:
– Two fabrics with the same domain ID are connected
– Two fabrics are joined
– An E_Port or VE_Port goes offline
– A principal link segments from the fabric
• FRU: Enables you to define rules for field-replaceable units (FRUs), including SFP transceivers, power supplies, and flash memory
• Security: Monitors different security violations on the switch (e.g. invalid logins, certificate expiration) and takes action based on the configured thresholds and their actions
• Resource: System resource monitoring enables you to monitor your system’s RAM, flash, memory, and CPU
Green dot indicates active policy
– To create a new policy, select the switch in the tree, and then click the Add button (next slide)
• This creates an empty policy in which the user can configure new rules
• The Add Port Group dialog can be launched from the MAPS Configuration dialog by clicking the Manage button
• This dialog can be launched in fabric or switch context
• If launched in fabric context, groups are shown for all MAPS-enabled switches in an aggregated view
• Groups can be created, edited, or deleted using this dialog
• Lists details of each MAPS violation including the object affected, the rule condition,
actions triggered, and recommended action to resolve the violation
• Launch points:
– Dashboard widgets
– Fabric/switch/FC port right-click will launch in context of selected object
– MAPS Configuration dialog
Prerequisites (If the Fabric OS, Brocade Network Advisor, or switch does not meet the requirements below, upgrade the firmware or software and/or install the required licenses before proceeding.)
Confirm that the product is a Brocade SAN …
Check if you have a Brocade Fabric Vision technology (or Fabric Watch and …
Determine which one of the pre-defined …
• IO_PERF_IMPACT
– When the Transient Queue Latency (TXQ) is greater than high
threshold (10ms)
– Calculated based on buffer credit zero and transient
queue latency counters
• IO_FRAME_LOSS
– When the Transient Queue Latency (TXQ) is greater than high
threshold (80ms)
– Calculated based on TXQ and Class 3 timeout (C3TXTO) counters
• IO_LATENCY_CLEAR
– When latencies drop to normal levels, the port state changes to IO_LATENCY_CLEAR
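The threshold logic above can be sketched as a small Python classifier. It is a deliberate simplification: it looks only at a single TXQ latency value in milliseconds, whereas the slide states that the real calculation also draws on buffer credit zero and timeout counters. The function and constant names are made up for this example.

```python
# Thresholds quoted on the slide; names are hypothetical.
IO_PERF_IMPACT_MS = 10  # high threshold for IO_PERF_IMPACT
IO_FRAME_LOSS_MS = 80   # high threshold for IO_FRAME_LOSS


def fpi_state(txq_latency_ms):
    """Map a TXQ latency sample (ms) to an FPI state name.

    Simplified sketch: the switch also uses buffer credit zero and
    timeout counters, which are ignored here.
    """
    if txq_latency_ms > IO_FRAME_LOSS_MS:
        return "IO_FRAME_LOSS"
    if txq_latency_ms > IO_PERF_IMPACT_MS:
        return "IO_PERF_IMPACT"
    # At or below the lower threshold the port is treated as recovered.
    return "IO_LATENCY_CLEAR"


for sample in (5, 25, 120):
    print(sample, fpi_state(sample))
```

Checking the higher threshold first ensures that a sample above 80 ms is reported as frame loss rather than as the milder performance-impact state.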
FOS 7.4, CONDOR 3 © 2017 BROCADE COMMUNICATIONS SYSTEMS, INC. COMPANY PROPRIETARY INFORMATION 80
Slow Drain Device Quarantine (SDDQ)
Simplified, advanced detection, and mitigation of slow drain devices in a SAN
porterrshow:
Port Error Stats:
     frames      enc crc crc   too  too  bad enc disc   link loss loss frjt fbsy c3timeout pcs
     tx     rx   in  err g_eof shrt long eof out c3     fail sync sig            tx     rx err
159: 672.3m 2.0g 0   0   0     0    0    0   0   643.9k 0    0    0    0    0    643.9k 7  0
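For scripted analysis, the abbreviated counters in the row above (672.3m, 2.0g, 643.9k) need converting to plain numbers. The Python sketch below assumes that k, m, and g are decimal multipliers (thousand, million, billion) and that the 18 values map to the column names listed in `COLUMNS`; both the multipliers and the column names are one reading of the wrapped header and should be treated as assumptions.

```python
# Assumed decimal multipliers for abbreviated counters.
SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}

# Column names inferred from the two-line header (assumption).
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
    "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
    "link_fail", "loss_sync", "loss_sig", "frjt", "fbsy",
    "c3timeout_tx", "c3timeout_rx", "pcs_err",
]


def parse_counter(text):
    """Convert a counter such as '643.9k' to an approximate integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIX:
        return round(float(text[:-1]) * SUFFIX[text[-1]])
    return int(text)


def parse_row(line):
    """Split one 'port: v1 v2 ...' row into (port, {column: value})."""
    port, _, rest = line.partition(":")
    values = [parse_counter(v) for v in rest.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    return int(port), dict(zip(COLUMNS, values))


row = "159: 672.3m 2.0g 0 0 0 0 0 0 0 643.9k 0 0 0 0 0 643.9k 7 0"
port, stats = parse_row(row)
print(port, stats["disc_c3"], stats["c3timeout_tx"])
```

In this sample the Class 3 discard count and the transmit-side C3 timeout count are the same abbreviated value, which is the pattern the slow-drain discussion is pointing at.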
Framelog:
• Alternative Mitigation:
– Manually place in low QoS zone
– SDDQ
– Port Toggle
– Fence/Decom
• Prevented SAN performance from degrading further due to existence of slow draining device.
• Introduction of automated quarantine capability reduced potentially many hours of manual remediation.
• Increased availability of SAN Admins time
• PS value-add:
– Health Check
– Best-practices
– Defining policies
• Significantly reduced impact to users and public escalation risk.
• Utilization of fully automated proactive monitoring and reactive actions
2016/12/13-06:04:07, [MAPS-1022], 214, SLOT 6 | FID 1, INFO, sansw0303a_vf01, Port 2/19 has been marked as Slow Drain Device.
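A log line like the one above can be picked apart with a regular expression when collecting slow-drain events from many switches. The Python sketch below is written against this single sample only; the field names (`seq`, `origin`, and so on) are labels chosen for the example, and other message formats may need a different pattern.

```python
import re
from datetime import datetime

# Pattern fitted to the one sample line; field names are hypothetical.
RASLOG = re.compile(
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}), "
    r"\[(?P<msg_id>[A-Z]+-\d+)\], "
    r"(?P<seq>\d+), "
    r"(?P<origin>[^,]*), "
    r"(?P<severity>[A-Z]+), "
    r"(?P<switch>[^,]+), "
    r"(?P<text>.*)"
)
SLOW_DRAIN = re.compile(
    r"Port (?P<slot>\d+)/(?P<port>\d+) has been marked as Slow Drain Device"
)


def parse_sddq_event(line):
    """Return a dict for a slow-drain marking message, else None."""
    entry = RASLOG.match(line.strip())
    if entry is None:
        return None
    marked = SLOW_DRAIN.search(entry["text"])
    if marked is None:
        return None
    return {
        "time": datetime.strptime(entry["ts"], "%Y/%m/%d-%H:%M:%S"),
        "msg_id": entry["msg_id"],
        "severity": entry["severity"],
        "switch": entry["switch"],
        "slot": int(marked["slot"]),
        "port": int(marked["port"]),
    }


sample = (
    "2016/12/13-06:04:07, [MAPS-1022], 214, SLOT 6 | FID 1, INFO, "
    "sansw0303a_vf01, Port 2/19 has been marked as Slow Drain Device."
)
event = parse_sddq_event(sample)
print(event["switch"], event["slot"], event["port"])
```

Lines that do not match either pattern return `None`, so the function can be applied to a mixed log without raising.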
• Mark as Special Event, plus any other actions required e.g. email:
• Special Events show up in Custom Event widget:
• Deploy MAPS policy and FPI to proactively alert you to latency bottlenecks
• Utilize SDDQ if you wish to automate proactive mitigation action
• Or manually put device in low quality of service zone
• Or remove device from fabric
• Initiator Checks:
– Number of LUNs/devices used by adapter
– Load balancing
– Queue depth settings
– HBA driver/firmware levels (upgrade if possible)
– Application load
– etc.
COMPASS Policy
• Configuration and Operational Monitoring Policy Automation Services Suite (COMPASS)
• Simplifies deployment of larger fabrics with automated switch and fabric configuration services
Actions: apply policy, monitor drifts
Recovery actions
Customer Challenges
• High utilization on NPIV port – identified during FV Workshop.
• Customer very concerned over alerts and potential other flows that could impact critical applications.
• No Support ticket raised or resolution process in progress
Detection:
• MAPS/FPI RX Util alerts on Host NPIV
• Flow Vision monitor utilised to identify flows behind host NPIV
• BNA historical graph identified peak times of high util caused by a single VM/application
• All other historical workflows showing low usage.
• Both fabrics showing same utilization (balanced).
Mitigation:
• MAPS alerting setup on specific traffic flows to determine frequency going forward.
• Flexibility to move other workflows and/or adding additional host connections to mitigate issue.
Fabric Vision Use Case – Government
Outcomes
Flow Learning
• Non-disruptively discover all flows that go to or
come from a specific host or storage port, or
traverse ISLs
Flow Monitoring
• Non-disruptively monitor any flow across the fabric
• Obtain statistics on specific flows or frame types
Flow Generator
• Full-mesh flow generator for stress testing
• Pre-defined flows to minimize configuration
• Create/Define Flows
– Switch port, Initiator Port, Target Port, LUN ID
– Trunk Groups, XISL & Backbone monitors
– Source to Destination or Bi-Directional
• Learn Flows
• Monitor Flows in BNA
– Frame Monitor Flow
– E2E Monitor Flow
– Top Talker Monitor Flow
– LUN level statistics
• Simplified deployment of Flow Monitor
• Easily accessible, pre-canned templates from right-click menu on Switch/Initiator/Target Port
• Automated new dialog with populated fields based on the pre-canned template selection
• All the flows defined in the context of a fabric are listed in the left column
• User can move the flow definition to the right column to monitor the measures available for the flow
• The selected flows can be plotted on a Performance Graph
• IO Insight Flows
• Now confirmed that the physical host NPIV port had high utilization
• 7-day history from the CLI is not enough to gauge the frequency of the port utilization issue
• Historical BNA dashboards: 1 month of historical data gives a clearer picture via the violations widget
• From the CLI, use ‘portloginshow’ to view all virtual machine login PIDs behind the physical NPIV host port reporting FPI
• A CLI filter was used to select the PIDs associated with the host NPIV port within the output of the ‘sys_mon_all_fports’ learning monitor to gauge the heaviest flows (see output on next slide)
• Used BNA Flow Vision to graph the flows via sys_mon_all_fports as follows:
• Move the monitor to the right-hand side to begin filtering for the PIDs:
• Note: historical graph has been selected to show how often particular flows are peaking.
Fabric Vision: Use Case
Detection
• Historical graph will help determine highest and busiest flows over a time period:
• Identified the same flow (application) on both fabrics as being the flow with the highest peaks over a week/fortnight
• All other flows were showing low utilization
• The results verified the following:
– Non-critical flows were having no impact on the performance of the critical application flow
– A single flow/application was causing the high front port util FPI alerts (the critical app in this case)
– The flow/application was balanced over the dual fabrics, i.e. had the same utilization with no imbalance
• Deploy MAPS policy and FPI to proactively alert you to high utilization
• Utilize historical performance monitoring in BNA
• Use Flow Vision to identify NPIV Flows
• Design/plan NPIV use to account for application workloads
• Add additional physical port connectivity depending on application workload
• Set up/fine-tune host queue depth
• Monitor host and storage device IO workloads and behaviors
• Identify and isolate the source of device or network performance degradation
• Leverage IO statistics to provision and fine-tune the infrastructure
*DATA FLOW SHOWN IN THIS DIAGRAM IS FOR READ OPERATIONS.
IO Insight Demo Video
2.20 mins
10:45 – 11:45 Monitor and Prevent: Using MAPS, Dashboards and FPI
11:45 – 12:30 Troubleshoot & Support: Using FV tools for rapid diagnosis and resolution
Intro into the Analytics and Monitoring Platform - AMP
12:45 – 13:00 Wrap-up and Q&A
• Automate monitoring and alerting of abnormal behaviors
• Quickly pinpoint problems, uncover issues before users are affected
+ Fabric Vision
Basic Support
Initiation –
+ AMP
+ BNA
#1 SAN Infrastructure
#2 Switch Health
Reactive
#4 Availability – lack of redundancy in the fabric
#5 Availability – reduced resiliency due to physical layer port errors and faulty media
#7 Utilization, Ports, Target, Initiator, ISL ports capacity and balance planning.
Proactive
#9 Target and host performance: reporting on pending IO to optimize queue depths and optimize application performance
Finding the Learning Portal Link on MyBrocade
1. On the MyBrocade dashboard, find the Education tab at the top of the
page (red circle) and hover over it.
2. Click the Learning Portal link (blue circle).
Searching for Courses on the Learning Portal
Thank You