
Smarts 101

Automating Problem & Incident Management

EMC CONFIDENTIAL - INTERNAL USE ONLY

What Is the Customer Challenge?


Too much data, not enough information

Difficult to determine the root cause of a problem
Can't distinguish among network, storage, and application faults

Too many trouble tickets being opened in the call center; can't keep up

Difficult to gauge the impact of an IT-level problem to the business

Management solutions send alarms for everything

[Figure: Application Domain, Storage Domain, and Network Domain (IP), showing one Problem and multiple Symptoms across the domains]

Prior Approaches Can't Keep Up


Layers shown:
Sustaining SLAs
Presentation
Root Cause & Impact Analysis: rules writing and maintenance to process events/alarms without understanding of the underlying IT topology
Integration & Correlation End-to-End
Event Collection & Reporting

Managed sources: Applications, Switches, Routers, Databases, Telephony, Servers, Video, Firewalls, Storage, Reporting, SLAs, IDS, Optical

Source: Giga Information Group Infrastructure Management Report 2001

EMC Smarts: Managing Services In Real Time


Automated Actionable Intelligence

Analytics: Codebook Correlation Technology
Abstraction: EMC Smarts Common Information Model
Data and Event Collection: Routers, Applications, Optical, Firewalls, Video, Switches, IDS, Databases, Telephony, Storage, Servers

With patented, industry-leading technologies

ECIM: The EMC Common Information Model

Based on DMTF CIM, extended with rich semantics for integrating and automating management applications

Comprehensive
  Models the complex web of relationships in the real world: within entities, across entities
  Models cross-domain relationships
  Models network, systems, applications, services, business entities
  100+ classes, 40+ relationships
  Infinitely extensible via inheritance

Key to service management, end-to-end
  Pieces together information from heterogeneous sources

Abstract
  To scale to networks of unlimited size and complexity through multiple levels of abstraction
  To decouple management application logic from the specifics of an ever-increasing stream of vendor products

[Figure: EMC Common Information Model example. Labels: Service Subscriber, Subscribes, Service Offering, ComposedOf, Hosted Application, HostedBy, Host, Switch, Router, Neighboring Systems]
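To make the idea of a class model with named relationships and extension by inheritance concrete, here is a minimal sketch. It is not the ECIM implementation: the base class `ManagedElement`, the `relate` helper, the `LoadBalancer` subclass, and all instance names are hypothetical, and only the class and relationship names that appear in the figure labels are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ManagedElement:
    """Hypothetical base class: a named element with named relationships."""
    name: str
    # relationship name -> list of related elements (hidden from repr to avoid cycles)
    relations: dict = field(default_factory=dict, repr=False)

    def relate(self, relationship, other, inverse=None):
        """Record a relationship to `other` and, optionally, its inverse."""
        self.relations.setdefault(relationship, []).append(other)
        if inverse is not None:
            other.relations.setdefault(inverse, []).append(self)


class System(ManagedElement):
    """Hypothetical common parent for hosts, switches, and routers."""


class Host(System):
    """A host system."""


class Switch(System):
    """A switch."""


class Router(System):
    """A router."""


class Application(ManagedElement):
    """An application hosted on a system."""


class ServiceOffering(ManagedElement):
    """A service composed of applications."""


class ServiceSubscriber(ManagedElement):
    """A customer that subscribes to service offerings."""


class LoadBalancer(System):
    """Extension by inheritance: a new class reuses everything on System."""


host, switch, router = Host("host-1"), Switch("switch-1"), Router("router-1")
app = Application("billing-app")
offering = ServiceOffering("billing-service")
customer = ServiceSubscriber("customer-a")

# Cross-domain relationships, from subscriber down to the network.
customer.relate("Subscribes", offering)
offering.relate("ComposedOf", app)
app.relate("HostedBy", host)
host.relate("NeighboringSystems", switch, inverse="NeighboringSystems")
switch.relate("NeighboringSystems", router, inverse="NeighboringSystems")

print([n.name for n in switch.relations["NeighboringSystems"]])  # ['host-1', 'router-1']
```

Because management logic in this sketch only touches `ManagedElement` and its relationship names, a vendor-specific subclass can be added without changing that logic, which is the decoupling the slide describes.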

Application Logic
Logical Port
  Problem Down causes OperationallyDown, ConnectedPortDown
  Local symptom OperationallyDown
  Propagated symptom ConnectedPortDown to connected Logical Ports

Physical Port
  Problem Down causes PortDown
  Propagated symptom PortDown to Logical Ports layered over the Port
  Propagated symptom ConnectedPortDown to Logical Ports layered over the Port

Card
  Problem Down causes PortDown
  Propagated symptom PortDown to Physical Ports in the Card
  Propagated symptom ConnectedPortDown to Physical Ports in the Card

Switch
  Problem Down causes ConnectedPortDown
  Propagated symptom ConnectedPortDown to Cards in the Switch
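The rules on this slide can be read as data: for each class, a Down problem raises named symptoms on the object itself or on related objects. The sketch below encodes that literal, one-hop reading and computes the expected symptom set for a problem. It is an illustration under assumptions, not the product's model language: the relationship labels (`ConnectedTo`, `LayeredOver`, `ComposedOf`), the object names, and the tiny topology are hypothetical.

```python
# Rule table transcribed from the slide: class -> list of (symptom, relationship).
# "self" marks a local symptom; relationship names are hypothetical labels.
RULES = {
    "LogicalPort": [("OperationallyDown", "self"), ("ConnectedPortDown", "ConnectedTo")],
    "PhysicalPort": [("PortDown", "LayeredOver"), ("ConnectedPortDown", "LayeredOver")],
    "Card": [("PortDown", "ComposedOf"), ("ConnectedPortDown", "ComposedOf")],
    "Switch": [("ConnectedPortDown", "ComposedOf")],
}

# A made-up topology: object -> (class, {relationship: [related objects]}).
TOPOLOGY = {
    "sw0": ("Switch", {"ComposedOf": ["sw0/c0"]}),
    "sw0/c0": ("Card", {"ComposedOf": ["sw0/c0/p0"]}),
    "sw0/c0/p0": ("PhysicalPort", {"LayeredOver": ["sw0/c0/p0/l0"]}),
    "sw0/c0/p0/l0": ("LogicalPort", {"ConnectedTo": ["sw1/c0/p0/l0"]}),
    "sw1/c0/p0/l0": ("LogicalPort", {"ConnectedTo": ["sw0/c0/p0/l0"]}),
}


def signature(obj):
    """Return the set of (object, symptom) pairs expected when `obj` is Down."""
    cls, relations = TOPOLOGY[obj]
    expected = set()
    for symptom, relationship in RULES[cls]:
        # A local symptom lands on the object itself; a propagated one on its relatives.
        targets = [obj] if relationship == "self" else relations.get(relationship, [])
        expected.update((target, symptom) for target in targets)
    return expected


for name in ("sw0/c0/p0/l0", "sw0/c0"):
    print(name, "Down ->", sorted(signature(name)))
```

Keeping the rules per class rather than per device is what lets the same table apply to every discovered port, card, and switch.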

Application Topology
[Figure: example topology of four switches (Switch 0 to Switch 3) with cards (C0, C1), physical ports (P0, P1), and logical ports (L0, L1)]

Codebook Correlation Example

[Figure: Codebook matrix, axes labeled Symptoms and Problems]

The first step in creating a Codebook is to use the models that apply to each discovered element to create a list of the problems. These problems represent the columns in the Codebook.

Codebook Correlation Example

[Figure: Codebook matrix, axes labeled Symptoms and Problems]

The second step is to use the local and propagated symptom definitions in the applicable models to create a list of all the symptoms that can be collected in the network. These symptoms represent the rows in the Codebook.

Codebook Correlation Example

[Figure: Codebook matrix, axes labeled Symptoms and Problems]

Finally, models are used to populate the Codebook with the active symptoms for each problem. These are the root-cause problem signatures.
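The three construction steps can be sketched as a small matrix build: problems become columns, symptoms become rows, and each cell records whether the problem causes the symptom. This is a minimal illustration, not the product's data structure; the `SIGNATURES` data and the `build_codebook` function are hypothetical.

```python
# Hypothetical per-problem signatures, as the models would derive them for one topology.
SIGNATURES = {
    "port-A Down": {"port-A OperationallyDown", "port-B ConnectedPortDown"},
    "port-B Down": {"port-B OperationallyDown", "port-A ConnectedPortDown"},
    "card-1 Down": {"port-A OperationallyDown", "port-B ConnectedPortDown", "card-1 PortDown"},
}


def build_codebook(signatures):
    """Return (problems, symptoms, matrix) with matrix[row][column] in {0, 1}."""
    problems = sorted(signatures)                            # step 1: columns
    symptoms = sorted(set().union(*signatures.values()))     # step 2: rows
    matrix = [[int(symptom in signatures[problem]) for problem in problems]
              for symptom in symptoms]                       # step 3: populate cells
    return problems, symptoms, matrix


problems, symptoms, matrix = build_codebook(SIGNATURES)
print("columns:", problems)
for symptom, row in zip(symptoms, matrix):
    print(f"{symptom:28} {row}")
```

Reading down a column gives one problem's signature. In this made-up data, card-1 Down differs from port-A Down only by the `card-1 PortDown` row, which is the kind of distinction the Codebook relies on.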

Codebook Correlation Example

[Figure: Codebook matrix, axes labeled Symptoms and Problems]

Next, a mask is created in memory and populated with symptoms collected using ICMP, SNMP, Traps, and TCP. Each symptom is added as received.
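One simple way to picture the mask is a bit vector with one position per Codebook row, set as each symptom arrives. The sketch below assumes that representation; the `SymptomMask` class, the symptom names, and the `source` tag are hypothetical and stand in for whatever the collectors actually deliver.

```python
# Hypothetical Codebook rows, in a fixed order shared with the Codebook itself.
SYMPTOMS = ["port-A OperationallyDown", "port-B ConnectedPortDown", "card-1 PortDown"]


class SymptomMask:
    """In-memory mask: one bit per known symptom, set when the symptom is received."""

    def __init__(self, symptoms):
        self.index = {symptom: i for i, symptom in enumerate(symptoms)}
        self.bits = [0] * len(symptoms)
        self.sources = {}  # symptom -> collector that last reported it

    def record(self, symptom, source):
        """Mark a symptom active; return False for symptoms outside the Codebook."""
        position = self.index.get(symptom)
        if position is None:
            return False
        self.bits[position] = 1
        self.sources[symptom] = source
        return True


mask = SymptomMask(SYMPTOMS)
mask.record("port-A OperationallyDown", source="SNMP")
mask.record("port-B ConnectedPortDown", source="ICMP")
print(mask.bits)  # [1, 1, 0]
```

Ignoring symptoms that are not rows in the Codebook keeps the mask the same length as every problem signature, so the two can be compared position by position.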

Codebook Correlation Example

[Figure: Codebook matrix, axes labeled Symptoms and Problems]

On a periodic basis, the mask is passed across the Codebook to identify the exact or best root cause.
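The slide does not say how "best" is measured. A minimal sketch, assuming the closest signature by Hamming distance (the number of positions where mask and signature differ), looks like this; the `CODEBOOK` data and function names are hypothetical.

```python
# Hypothetical Codebook: problem -> signature over the same ordered symptom rows.
CODEBOOK = {
    "port-A Down": (1, 1, 0, 0),
    "port-B Down": (0, 0, 1, 1),
    "card-1 Down": (1, 1, 1, 0),
}


def hamming(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def correlate(mask, codebook):
    """Return (problem, distance) for the signature closest to the mask.

    Distance 0 is an exact match; otherwise the result is the best match.
    Ties resolve to the first problem in the codebook's iteration order.
    """
    return min(((problem, hamming(mask, signature))
                for problem, signature in codebook.items()),
               key=lambda pair: pair[1])


print(correlate((1, 1, 0, 0), CODEBOOK))  # ('port-A Down', 0)
print(correlate((1, 0, 0, 0), CODEBOOK))  # ('port-A Down', 1)
```

The second call shows why a best match matters: with one symptom lost in transit, the mask no longer equals any signature, yet the nearest one still points at the same root cause.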

Automating Service Management: Start to Finish


1) Includes a library of generic models for the specific managed technology, which describe the attributes of the physical or virtual managed objects, as well as their relationships within the environment and the problems associated with those objects.

2) Auto-discovery of the environment or inputs from 3rd party databases & tools.

3) Mapping of which generic models in the library exist in the environment, and how these devices are related.

4) Using input from step 3, the Codebook automatically calculates the sets of symptoms, or signatures, that indicate service-affecting authentic problems for the specific topology.

5) Monitors the environment looking for signatures of authentic problems. When Codebook identifies a set of symptoms that indicates an authentic problem, the root cause and its business impact are delivered to the Service console in real time.

[Figure labels: 1 Object Library; Discovery; 3 Object Repository; 4 Codebook Correlation; 5 State Monitoring; Root Cause & Impact Reporting; Analysis; Automatic Interface To Service Desk; Context; Domain Managers; Network, Server, Storage, Applications]
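Step 5 pairs each root cause with its business impact. One way to picture that, assuming impact is found by walking the topology from the failed element to everything that depends on it, is a breadth-first traversal. The slides do not describe the actual mechanism; the `DEPENDENTS` map, its element names, and the `impacted` function are hypothetical.

```python
from collections import deque

# Hypothetical "is depended on by" edges, as could be derived from
# relationships such as HostedBy, ComposedOf, and Subscribes.
DEPENDENTS = {
    "switch-1": ["host-1"],
    "host-1": ["billing-app"],
    "billing-app": ["billing-service"],
    "billing-service": ["customer-a", "customer-b"],
}


def impacted(root_cause):
    """Return every element that depends on `root_cause`, in breadth-first order."""
    seen = {root_cause}
    order = []
    queue = deque([root_cause])
    while queue:
        for dependent in DEPENDENTS.get(queue.popleft(), []):
            if dependent not in seen:  # guard against revisiting shared dependents
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


print(impacted("switch-1"))
# ['host-1', 'billing-app', 'billing-service', 'customer-a', 'customer-b']
```

A console could then report the single root cause alongside the affected services and customers instead of one alarm per symptom.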

SMARTS Architecture
EMC Common Information Model
  Unified data model
  Rich 50+ relationships

Automation engines:
  Codebook Correlation Technology

Auto-Discovery
  Populate topology through mediation layer
  De-coupled from ECIM and analysis

Automatic adaptation to change

Recursive distributed architecture

Open extensible architecture
  De-coupling of layers
  Open-ended import/export APIs
  Incremental product support
  Add new classes and new automation engines

[Figure labels: IC Console; IC Dashboard; Scripting, Java, CLI, C, C++; Standard EAI Bus: JMS, TIBCO; ODBC/SQL Database; CRMs; Common Services; Analysis Models; Analysis Engine; ECIM Model and Object Repository; Mediation]

EMC Approach: IT Process Optimization


Process Optimization Through Functional Domains: Incident/Problem Process, Configuration Process, Compliance Process, Change/Release Process

Business Impact
IT Service Management
Operations Management

Process Optimization Across Technology Domains: Storage, Network, Server, Application, Virtualization, Security, 3rd Party

Triage, Troubleshooting, and Workflow

Proactive device monitoring
Alarm correlation engine
Alarm presentation and workflow
Device drill down and data gathering
CIM-compliant Topology model
Open database

Triage, Troubleshooting, and Workflow

Single Focal Point for Monitoring, Analysis, and Control

Triage, Troubleshooting, and Workflow


SMARTS Screenshots


SMARTS InCharge Components

InCharge Availability Manager, Performance Mgr, and Server Performance Mgr (AM/PM/SPM)
  Auto-discovery of Layers 2/3/logical
  Root cause and impact analysis

InCharge Service Assurance Manager (SAM)
  Cross-domain correlation
  Powerful options for managing and viewing infrastructure

InCharge Business Impact Manager (BIM)
  Determines the effect technology events have on business
  Analyzes events across domains and associates events to customers and services

InCharge Enhanced Server Manager
  Hardware monitoring
  Application process monitoring
  VMware monitoring
  MS Cluster monitoring

InCharge Application Connectivity Manager
  Application (TCP) Port Availability
  Topology and correlation with other SMARTS apps

InCharge SMARTS adapters
  Integration point/leverage existing investment
  InCharge for non-IP domains (Syslog)
