
International Conference on Autonomic Computing

Automating ITSM Incident Management Process


Rajeev Gupta, K Hima Prasad, Mukesh Mohania
IBM Research Lab, Delhi
grajeev@in.ibm.com, hkaranam@in.ibm.com, mkmukesh@in.ibm.com

Abstract

Service desks are used by customers to report IT issues in enterprise systems. Most of these service requests are resolved by level-1 persons (service desk attendants) by providing information/quick-fix solutions to customers. For each service request, level-1 personnel identify important keywords and see if the incoming request is similar to any historic incident. Otherwise, an incident ticket is created and, with other related information, forwarded to the incident's subject matter expert (SME). The incident management process is used for managing the life cycle of all incidents. An organization spends a lot of resources to keep its IT resources incident free and, therefore, timely resolution of incoming incidents is required to attain that objective. Currently, the incident management process is largely manual, error prone and time consuming. In this paper, we use information integration techniques and machine learning to automate various processes in the incident management workflow. We give a method for correlating the incoming incident with configuration items (CIs) stored in the configuration management database (CMDB). Such a correlation can be used for correctly routing the incident to SMEs, incident investigation and root cause analysis. In our technique, we discover relevant CIs by exploiting the structured and unstructured information available in the incident ticket. We present an efficient algorithm which gives more than 70% improvement in accuracy of identifying the failing component by efficiently browsing relationships among CIs.

1. Introduction

Service Desk is a single point of contact (SPOC) for the users who need help for running their IT systems. The service desk is managed by a level-1 (L1) person. Customers contact the service desk for various purposes such as information, configuration change, problem being faced by the customer, etc. Customers can report problems using various methods such as web based, e-mail or telephone. The L1 person tries to satisfy the customer requests as much as possible to facilitate the restoration of normal operational service with minimal business impact on the customer. Usually a database of historic incidents and their corresponding resolutions is maintained at the service desk. The L1 person uses keyword search to see if the service request can be resolved using any of the previously reported incidents. If the incident cannot be resolved, an artifact of the incoming request is created in the form of an incident ticket which initiates the chain of various IT system management (ITSM) processes such as incident management, problem management, configuration management, change management and release management. Directly or indirectly, all these processes start with the incident management process. The incident management process provides the most immediate and visible gains to service quality and cost reduction. Then the problem management process is used to find the root cause of incident(s) so that similar incidents can be avoided in the future. To alleviate the root cause, one or more of the change/configuration/release management processes are involved.

As part of the incident management process, an incident ticket is created and the ticket along with the relevant information is forwarded to a level-2 (L2) person who is the subject matter expert (SME) for the incident. Then the SME tries to resolve the incident through her expertise. This whole process of incident resolution is largely manual; thus it is time consuming and error prone. This paper is aimed at improving the incident management process so that it can be more automated, thus reducing the incident processing time. An incident ticket contains information about the customer and the problem reported by her. Problem related information may include the system which is facing the problem, its severity, a natural language description of the problem, etc. In this paper we use both structured and unstructured information to help in identifying the possible failing component. Next, we describe the ITSM incident management process in detail.

978-0-7695-3175-5/08 $25.00 © 2008 IEEE
DOI 10.1109/ICAC.2008.22
[Workflow stages: Incident detection → Duplicate search → Incident classification → Identifying failing component → Diagnosis → Repair & recovery]

Figure 1: Incident Management Workflow

2. Incident management process

The aim of the incident management process is to quickly resolve incidents that affect the normal running of an organization’s IT services. An incident is an intimation of some error or failure of some component in IT systems. Figure 1 shows the incident management workflow which can be used for resolving an incident. In a typical service desk, an incident is either reported by the customer or automatically generated by a system monitoring/event generation system. Customers report incidents by describing the system condition using natural language text whereas automatically generated incidents only have structured data specifying system and event-class [17]. In this paper we consider customer reported incidents only. For such incidents, the L1 person does a quick “keyword based search” from a database of historic incidents. If any matching incident is found, its solution may be used to resolve the incoming incident. If the L1 person cannot provide any resolution, an incident record is created. This incident record is classified for various purposes such as assigning priority based on urgency and impact, selecting the appropriate SME, etc. Then the failing component is identified by manually associating hardware or software components (configuration items) responsible for the incident. Information about these configuration items (CIs) is maintained in a configuration management database (CMDB) which is also used by other ITSM processes as an underlying data storage framework. The L1 person uses keyword search along with human intelligence to guess the possibly responsible CIs. Then the incident ticket is forwarded to L2 support to diagnose the problem in the selected CI. For diagnosis the CI is monitored and various probes [14] may be used. If the identified CI is wrong, the ticket is bounced back and forth between L1 and L2 support. If any code change is required, external support (L3) is contacted. After resolving the problem the customer is informed and the incident ticket is closed. In this paper we propose techniques to automate and improve various stages of the incident management workflow. Next we describe our approach with the help of an example and outline our contributions.

2.1 Contributions

We explain our contributions with the help of an example. Let us assume that a customer has called and describes the problem as:

Example1: “I tried launching inventory application on server avalanche.server.net from 10.10.10.70. I get an error saying that database transaction failed.”

First, we automatically identify relevant keywords which can be used to search the CMDB and identify the possible failing CI. Presently simple keyword search is used to do this operation. Using actual service desk data we show that simple keyword search may not be sufficient to find the actual failing component. When a customer describes the system problem, (s)he describes only her or his perception of the problem. Thus, the problem description may mention the possible failing component only partially or not at all. We use the incident description and CMDB data to get the context of the search and identify the failing component. For identifying the incident context, incident classification plays a key role as we describe towards the end of the paper. Using performance results over the actual customer data we show that accuracy of search can be improved by more than 70% using our approach. Our proposed automations help the service desk as follows:
• By improving the “duplicate search” step we help reduce the number of incident tickets. By associating CIs and intelligently selected keywords with historic incidents we improve recall of the “duplicate search” step.
• By automatically associating responsible CIs, we help in reducing the number of incident tickets being forwarded (rightly or wrongly) to SMEs. As SMEs are costly to get and maintain, this will help in reducing operational costs.
• We implemented a rule based request router which uses various attributes of the incident ticket, including the associated CI, to automatically route the ticket to the most appropriate SME.
• Using the importance (or role) of the selected CIs, the service personnel can assign priority to the incident. For example, if the selected CI is a machine on which the billing application is installed, then a higher priority can be given compared to the case where the selected CI is the backup software.
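The rule based routing and CI based priority assignment listed among the contributions could look roughly like the following minimal Python sketch. The paper does not give its rule syntax, so the `Ticket` fields, the rules in `ROUTING_RULES`, the queue names and the priority values are all hypothetical and serve only to illustrate the idea of "first matching rule wins".

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are assumptions."""
    description: str
    incident_class: str   # e.g. the result of incident classification
    ci_type: str          # type of the CI associated with the ticket
    ci_role: str          # business role of that CI


# Hypothetical routing rules: (predicate over the ticket, SME queue).
ROUTING_RULES = [
    (lambda t: t.ci_type == "Database", "database-sme"),
    (lambda t: t.ci_type == "ApplicationServer", "appserver-sme"),
    (lambda t: t.incident_class == "Networking", "network-sme"),
]
DEFAULT_QUEUE = "l1-manual-triage"

# Hypothetical priorities by CI role (1 = highest).
ROLE_PRIORITY = {"billing": 1, "backup": 3}
DEFAULT_PRIORITY = 2


def route(ticket):
    """Return the queue of the first rule whose predicate matches."""
    for predicate, queue in ROUTING_RULES:
        if predicate(ticket):
            return queue
    return DEFAULT_QUEUE


def priority(ticket):
    """Derive the priority from the role of the associated CI."""
    return ROLE_PRIORITY.get(ticket.ci_role, DEFAULT_PRIORITY)


ticket = Ticket(
    description="I get an error saying that database transaction failed.",
    incident_class="Database",
    ci_type="Database",
    ci_role="billing",
)
print(route(ticket), priority(ticket))
```

Keeping the rules as plain data makes it easy to add or reorder them without touching the routing function; a production router would presumably load them from configuration.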
The rest of the paper is organized as follows. In Section 3, we give details of the data model of the configuration management database. We use the object data model and relationships between objects for performing context oriented search. Steps to automatically identify keywords from the incident description are presented in Section 4. Searching the CMDB for keywords along with desired relationships is presented in Section 5. Implementation details and performance results showing effectiveness of our technique are presented in Section 6. Related work is presented in Section 7. The paper concludes in Section 8.

[Diagram of CIs (Billing Application, Apache Web Server, WebSphere Application Server 5.2, Oracle Database, DB2 Database, Aix Machines hpux1, hpux2 and hpux3) connected by relationships such as deployed-on, depends-on and runs-on]

Figure 2: Typical enterprise architecture

3. Configuration management database

Effectiveness of the incident management process depends on the speed and the accuracy of the process to identify the failing component. That in turn is closely aligned to the accuracy and design of the configuration management database (CMDB). The CMDB stores configuration items which represent systems, software and people in the enterprise infrastructure. Figure 2 shows a typical enterprise infrastructure represented using configuration items and relationships between them. Nodes shown are CIs stored in the CMDB and edges between them are explicit relationships between CIs. Relationships are also stored in the CMDB as objects. For example, the Oracle database is stored as an object with its attributes like schema, table space, etc. Similarly, an Aix machine hpux2 is also stored with its attributes like CPU type, memory, etc. Another relationship object is created with type runs-on whose source-attribute is the Oracle database object and target-attribute is hpux2. Each edge shows a depth=1 relationship. To find the database on which the billing application is running one needs to traverse to depth=2.

The CMDB has one other kind of relationship, of implicit type. These implicit relationships are part of the CMDB data model. Figure 3 shows an example of an implicit relationship where a Unix Computer is a sub-category of Computer and Operating System is a child of that. These explicit and implicit relationships can be used to know the details of all the Linux machines in the enterprise, such as which machine they are installed on, what software is running on them, etc. When an L1 person searches the CMDB, she mainly considers CIs without bothering about relationships. Usually the CMDB is searched using keywords extracted from the incident description. All CIs having attribute values matching those keywords are returned as keyword search results. If the L1 person is not satisfied by the results, she/he searches again and/or explores CIs related to the returned CIs in a very ad-hoc manner. This whole process is largely manual and heavily dependent on the skills of L1 persons, making it unreliable and costly. For a given incident, we propose a method to search the CMDB, considering both CIs and relationships, so that the failing component can be identified with more accuracy and minimal human intervention. The first step of our automated architecture is identifying important keywords from the incident, which is explained next.

Figure 3: Example CMDB data model

4. Identifying keywords

A typical incident record contains three types of information: 1) Customer details such as name, contact
information, e-mail address, etc. 2) Ticket information including status of the ticket, severity, system/component related to the ticket, etc. 3) Problem description giving the client’s perception of the problem. For identifying relevant keywords mainly the problem description is used. Customer and other ticket related information is mainly used for determining the scope of the search in the CMDB. For example, customer related information can be used to search only a particular part of the CMDB belonging to that customer. Simple rule engines can be created to process the structured customer and ticket information, but processing the natural language description of the problem using simple rules is not possible. The L1 person needs to understand the problem to decide on the keywords to be used to search the CMDB. The L1 person expects that these keywords will result in the possible failing component for the given incident ticket. In this section we describe a method of automatically extracting relevant keywords from the incident description.

Incident tickets are usually managed using a web-based interface. Thus the first step of the incident processing is to extract the incident description using an HTTP parser. As information entered in the incident description may be unclean, we need to clean that unstructured data. We used edit distance between keywords and dictionary words to clean the incident description. As one needs to search for keywords in CI types (e.g. oracle database) and attribute values (e.g. hpux3.ibm.com) of CIs, only nouns or noun phrases need to be extracted from an incident description (in English language). The next step in identifying keywords is to normalize the extracted nouns. For example, a computer name may be mentioned in its short form (hpux3 for hpux3.in.ibm.com); thus, if required, we identify such machine names and normalize them by appending their domain names. Similarly, it is possible that all the attributes use IP addresses whereas the incident description has a URL. Thus, we need to replace them with a commonly agreed format. To further improve the search performance we annotate noun-phrases using named-entity (NE) annotators [12] which annotate keywords based on their types. By associating CI-types with these keyword annotations, we limit the search scope further. For example, if a particular keyword is identified as software-name it need not be searched in CIs of type ComputerSystem. A dictionary matcher can be used to categorize keywords into classes such as hardware and software using a dictionary of names. Similarly, it can be used to identify WebSphere application server as a single keyword.

5. Searching CMDB

All the keywords extracted as explained in the previous section can be used to search the CMDB. But this ‘vanilla’ search is not likely to be efficient due to the following considerations:
• Typical CMDB implementations are not done with keyword search in mind. Instead, the CMDB is mainly used for browsing from one object to its parent/child or other related objects.
• The CMDB is largely implemented as an object oriented database [5]. The concept of keyword search over an object oriented database is not well defined. Various questions such as which attribute(s) to search, how to search child objects, is it possible to index objects, etc. need to be answered for efficiently searching an object oriented database.
• While doing keyword search, relationships are not taken into account. Keyword search and relationship browsing need to be integrated.
• When a customer describes a problem it is his/her perception of the problem, which may not include much information about the actual failing component. Instead, it is likely to have some information about neighboring/dependent objects.

For efficiently and automatically locating failing components we use indexing and object navigation, which are explained in the next two sub-sections.

5.1 Making search efficient

For making CMDB search efficient we index all the CMDB objects. Indexing CMDB objects is non-trivial as we should be able to search not only with keywords but also with relationships. For example, we should be able to efficiently search all the Computer-Systems which have Linux Enterprise Edition v3.1 as operating system. Further, it should be possible to search for relationships which are arbitrarily deep. We solve all these issues by creating two indices over the CMDB.

The aim of the keyword index is to identify objects having a given keyword in the values of their attributes. This index maps each keyword to a set of objects. The keyword index has four columns: keyword, object-id, object-desc and object-class. The object-desc (object description) column is used to display the name of objects so that service personnel can identify and get
details of any of the displayed objects. The object-class column is used to indicate the type of the object, e.g. Computer System, Database, Security certificate, etc. To synchronize the index with the modifications in the underlying CMDB, we poll the CMDB logs periodically to get updates since the last poll. All the newly added/deleted/updated objects are reflected in the index. The frequency of the index synchronizing process should be decided based on the number and frequency of changes occurring in the CMDB. Usually the CMDB is modified using a resource discovery mechanism [] which is run once a day. Thus, we perform the synchronization once a day, which synchronizes the keyword index as well as the relationship index which is explained next.

The relationship index is created to efficiently browse over related objects. We use these relationships to extract the context of the incident search automatically. For example, if an incident describes two machines which are not able to communicate, then the possible failing component may be the CI corresponding to a network-link between them. As for the keyword index, for creating the relationship index all configuration items are obtained one-by-one and all depth 1 relationships of each item are stored in the relationship index. For getting objects related at higher depths, multiple invocations on the relationship index can be done. Each relationship can be defined by its source object, target object and the relationship type between them. Table 1 shows example entries for the relationship index. In the relationship index, oid-1 is the object-id of the object we are parsing, oid-2 is the object-id of the related object, relationship-type gives the name of the implicit/explicit relationship and direction is forward (backward) if the parsed CI is the source (target) CI of the relationship. Both explicit as well as implicit (parent-child) relationships are covered using the relationship index.

Table 1: Example relationship index
oid-1        oid-2        relationship-type   direction
ab4569000    ph9067230    installed On        forward
ph9067230    ab4569000    installed On        backward
ab4569000    dc4587689    parent-child        forward
dc4587689    ab4569000    parent-child        backward

The keyword index and the relationship index are used to efficiently search the CMDB. In the next section, we describe how relationship browsing can be done to automatically obtain search context.

5.2 Limiting search results

Keyword search may result in lots of objects. We do not want to overburden the service desk personnel by showing lots of results. Further, search context is captured by the objects represented by keywords as well as their neighborhood objects; one naïve way is to get all the neighborhood objects of vanilla keyword search results. Such an approach is also likely to return lots of unnecessary objects which the service desk personnel will have to examine. In general all the CMDB objects can be related to each other, thus we need some way of limiting the number of search results. In general, search results can be limited in two ways:

1. Keyword search results are returned with their corresponding TF-IDF [10] scores. In this case objects with top-k relevance scores are shown to the service desk personnel. She can select the failing component from the list shown.

2. In this case a list of objects is shown so that the ratio of the score of the bottom ranking object to that of the top ranking object is above a certain threshold. This approach is motivated by the fact that if the bottom ranked object has a very low relevance score then it is better not to show it to service desk personnel.

We combine the above two criteria to output minimum results to the service desk personnel. The configurable parameter nmax is used to indicate the maximum number of results to be returned (top-k). Another parameter γ is used to indicate the lower bound on the ratio of the score of the bottom ranked object to that of the top-ranked object. Let the results returned by the CMDB search have relevancy scores {c1, c2, ..., cn} in descending order. Thus,

$$ n < n_{max} \quad \text{and} \quad \frac{c_n}{c_1} > \gamma \qquad (1) $$

The actual number of objects returned by the keyword search will be given by the more restrictive of the conditions given by Equation (1).

5.3 Getting search context

As mentioned previously, search context is obtained by using neighborhood objects of the objects directly mentioned in the incident description. If there is no other information then we can simply consider getting neighbors of all the objects resulting from vanilla keyword search. This forms our base-line scheme.
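As a rough illustration of how the relationship index can be browsed to collect neighborhood objects, the sketch below loads the example rows of Table 1 into an in-memory map and reaches deeper neighbors by repeated depth 1 lookups. The function names `build_relationship_index` and `neighbors` and the dictionary layout are assumptions made for this example; the paper's own index is built on Lucene and is not reproduced here.

```python
from collections import defaultdict

# Rows mirror Table 1: (oid-1, oid-2, relationship-type, direction).
ROWS = [
    ("ab4569000", "ph9067230", "installed On", "forward"),
    ("ph9067230", "ab4569000", "installed On", "backward"),
    ("ab4569000", "dc4587689", "parent-child", "forward"),
    ("dc4587689", "ab4569000", "parent-child", "backward"),
]


def build_relationship_index(rows):
    """Map each object-id to its depth 1 relationships."""
    index = defaultdict(list)
    for oid1, oid2, rel_type, direction in rows:
        index[oid1].append((oid2, rel_type, direction))
    return index


def neighbors(index, start_oids, depth):
    """Return {object-id: depth at which it was first reached}.

    Objects in start_oids (e.g. vanilla keyword search results) are
    excluded. Each extra level is one more round of depth 1 lookups.
    """
    seen = set(start_oids)
    frontier = list(start_oids)
    reached = {}
    for level in range(1, depth + 1):
        next_frontier = []
        for oid in frontier:
            for related, _rel_type, _direction in index.get(oid, []):
                if related not in seen:
                    seen.add(related)
                    reached[related] = level
                    next_frontier.append(related)
        frontier = next_frontier
    return reached


index = build_relationship_index(ROWS)
print(neighbors(index, ["ph9067230"], depth=1))
print(neighbors(index, ["ph9067230"], depth=2))
```

Because both forward and backward rows are stored, a lookup from either end of a relationship finds the other end, which is what allows the base-line scheme to fetch neighbors regardless of relationship direction.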
5.3.1 Omni-directional search. As one object may be related, directly or indirectly, to a large number of objects, we cannot access all the related objects. We limit the object navigation up to a certain relationship-depth (rd). E.g., rd=1 signifies that only directly related objects will be used for getting incident context; by keeping rd=2 we get objects which are directly related to the related objects (database to billing application in Figure 2), etc. For getting all the related objects, irrespective of depth, rd can be set to 0. The relevancy score of neighborhood objects depends on the score of the keyword search results to which they are related, the type of relationship and the depth. Equation (2) is used to get the relevancy score of a related CI (RCI) which is related to one keyword search CI (KSCI).

$$ RR = r \times \prod_{0 \le i \le rd} \frac{w_i(rel\_type)}{\alpha} \qquad (2) $$

RR is the relevancy score of the related CI (RCI), r is the relevancy score of the KSCI to which it is related, w(rel_type) is the weight assigned to the type of relationship, and rd is the relationship-depth between the KSCI and its related RCI. The value of w(rel_type) lies between 0 and 1, depending on the type of relationship. We give more weight to the parent-child implicit relationship compared to explicit relationships. Further, among explicit relationships a larger weight is given to a relationship type if any problem in its target may lead to a problem in its source. The configuration parameter α (> 1) makes sure that the RR value of the RCI is less than the relevancy (TF-IDF) score of its related KSCI. If an RCI is related to more than one KSCI then its relevancy score is the sum of all its individual RR values. It should be noted that we are only interested in neighboring objects having a score of more than cn (otherwise they will not be part of the results shown to the service personnel). As one object may be related to more than one object, we calculate scores of all related objects having RR values of more than cn/β, where β (> 1) is a configuration parameter. Thus, an RCI can be part of the result only if:

$$ RR = c_i \times \prod_{0 \le i \le d} \frac{w_i(rel\_type)}{\alpha} > \frac{c_n}{\beta} \qquad (3) $$

This gives us the bounds on the relationship weight and the depth to which relationships are to be explored. These bounds reduce the navigation scope drastically, thus improving search performance. As w(rel_type) ≤ 1 and ci ≤ c1, from Equation (3):

$$ \frac{c_1}{\alpha^{d}} > \frac{c_n}{\beta} \qquad (4) $$

Thus, from Equations (1) and (4) we get,

$$ d < \frac{\log(\beta / \gamma)}{\log(\alpha)} \qquad (5) $$

Similarly, we can obtain the minimum weight of the relationship to be explored using,

$$ w(rel\_type) > \alpha \times \gamma / \beta \qquad (6) $$

Equations (5) and (6) can be used to reduce the search scope without any adverse impact on the quality of results presented to the service personnel. Typical values of various parameters discussed in this section are given in Table 2.

Table 2: Typical parameter values
Parameter-name   Value
nmax             20
α                3
β                3
γ                0.05

5.3.2 Super object search. This case is useful when service desk personnel can guess the type of object which may be responsible for the incoming incident. In this case all child objects are considered as part of their parent object. For example, if we want to find the machine (Computer System) responsible for the incident then we search for the keyword in a super object consisting of the Computer System object along with all its children like Operating System, File System, etc. This ensures that the whole physical object is considered a single logical object, leading to better results. There are two problems with this approach. First, how can we automate the selection of the object type? We answer this question in Section 5.3.4. Second, we are only considering relationships which are implicit in the data model. Thus, results for this depend a lot on the data model. We correct that in our next approach.

5.3.3 Directed search. In this case we want to include both implicit as well as explicit relationships in a manner which is more intelligent than the one presented in Section 5.3.1. We do directed search by using what we call search templates. This scheme is useful when service desk personnel can identify the type of object responsible for the incident along with relationships which can be useful in getting the search context. In general, a search template can be represented as a tree of CIs and relationships as shown in Figure 4. Search templates are a subset of the data model including CMDB object types with implicit and explicit relationships between them. Directed search has the following distinctive features:
o Keyword search using this template will involve CMDB objects of types database server, application server, Java server and application.
o All resultant objects of child type are traversed using the relationship specified on edges to get their corresponding parent objects. Thus, all the resultant objects of type “Application” are traversed using the “Deployed on” relationship to get their corresponding
“Application Server” objects. These objects are further traversed using the “Dependency” relationship to get the corresponding “Database Server” objects.
o The service desk person is presented with objects of the root type, that is, “Database Server”.
o The relevancy score of each object is calculated in a manner similar to the one described in Section 5.3.1. Thus, a “Database Server” object which is the parent of lots of resultant child objects (results of keyword search) will have a higher score.

[Tree with root Database Server; Application Server and Java Server are linked to it by Dependency relationships; Application is linked to Application Server by a Deployed On relationship]

Figure 4: Sample search template

For both super-object search and directed search we are selectively using relationships to get the search context. For super-object search only implicit relationships are considered whereas for directed search both implicit as well as explicit relationships can be used. In both these cases it is important to choose the correct search template/super object. A search template can be created with various object types as root. Service desk personnel can select the appropriate template by guessing the object type which is most likely to be responsible for the incident. We semi-automate this step through incident classification as explained in the next section.

5.3.4 Incident classification. Incidents are normally classified among pre-specified hierarchical classes. The classification is used to assign priority to the incident and to route the incident to the appropriate service team. We automate the incident classification, which also helps in selecting the appropriate search context. As different words are used to describe different kinds of problems, we used keywords and their annotations as classification features. These features are used for learning the class to which the failing object is likely to belong. The support vector machine [16] method was used to get the class for a given incident description. Figure 5 gives significant keywords for example root/super objects. A number of hierarchical classes were created to classify the incidents and a rule-based mapping was used to select the template(s) to be used for a particular class. For example, for incidents of class connectivity problems we can have a template with Computer Systems as root and its network interfaces as non-root objects.

Computer hardware: wireless, key-board, mouse
Desktop problems: windows, software, word, hang
AppServer: websphere, servlet, jsp, page, application
Database: database, db, sql, db2, table, oracle
Networking: connection, ping, access, connectivity

Figure 5: Example features for various classes of incidents

6. Implementation details and performance evaluation

Our system was implemented as per the architecture shown in Figure 6. Implementation details are explained in [18]. The keyword extractor component is implemented for identifying keywords and annotating them. We use a part-of-speech (POS) tagger to identify noun-phrases from the given incident description. Then, we use a regular expression matcher to identify certain kinds of attributes such as IP addresses and URLs, whereas a synonym matcher is used to add domain names to ambiguously specified machine names. IBM's unstructured information management framework (UIMA) [12] is used for annotating keywords based on their object types such as <hardware>, <software>, etc. To implement the keyword and relationship indices we extended the openly available Lucene [1] framework for document indexing. An object has attributes of various types such as integer, float, char, string, etc., besides other objects. From the keyword search point of view, only Strings (or character arrays) are important as all the keywords are usually stored as strings. Further, we assume that all the object attributes can be accessed using public methods (e.g. get*() methods in Java). Results of all such methods are combined, using space separated concatenation, to create a text document corresponding to an object. These documents are fed to Lucene to create the keyword index. The relationship index was created by browsing all relationships of depth 1. The keyword index and the relationship index were placed on the server machine itself, which makes the index synchronization task easier. Object navigation templates were represented in XML format [18] for super-object search as well as directed browsing.
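The step of turning a CMDB object into a text document by concatenating the string results of its getter methods can be sketched as follows. This is only an illustration in Python, not the authors' Java/Lucene code: the class `OracleDatabase`, its getters, the helper names and the small in-memory inverted index (standing in for Lucene) are all assumptions made for the example.

```python
from collections import defaultdict


class OracleDatabase:
    """Hypothetical CI class exposing its attributes through getters."""

    def __init__(self, name, schema, port):
        self._name, self._schema, self._port = name, schema, port

    def get_name(self):
        return self._name

    def get_schema(self):
        return self._schema

    def get_port(self):
        return self._port  # integer attribute, ignored for keyword search


def object_to_document(obj):
    """Concatenate (space separated) all string-valued getter results."""
    parts = []
    for attr in sorted(dir(obj)):
        if not attr.startswith("get_"):
            continue
        method = getattr(obj, attr)
        if callable(method):
            value = method()  # assumes getters take no arguments
            if isinstance(value, str):
                parts.append(value)
    return " ".join(parts)


def build_keyword_index(objects):
    """Toy stand-in for Lucene: keyword -> {(object-id, object-desc, object-class)}."""
    index = defaultdict(set)
    for oid, desc, obj in objects:
        for token in object_to_document(obj).lower().split():
            index[token].add((oid, desc, type(obj).__name__))
    return index


db = OracleDatabase("Oracle Database", "billing_schema", 1521)
document = object_to_document(db)
keyword_index = build_keyword_index([("db1", "Oracle Database", db)])
print(document)
print(sorted(keyword_index["oracle"]))
```

Only string values enter the document, matching the observation that keywords are usually stored as strings; the entries kept per keyword correspond to the object-id, object-desc and object-class columns of the keyword index.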
Figure 6: System architecture

The CMDB server was running on a 2GHz, 2GB dual processing machine with Red Hat Enterprise Linux as operating system. Other components of the architecture including the CMDB client were running on a 2GHz 1GB IBM T43P laptop with MS Windows as OS. All the components were implemented in Java 1.4.2. For performance comparison we use the following schemes:

• All: This is the naive keyword search where the whole of the CMDB is searched for the given incident without considering relationships between objects.

• CI type specific: In this case keyword search is performed only on CIs of given types. Specifically, we use two classes of CIs called hardware and software. The hardware class has Computer System (with its sub-objects) whereas the software class has Database, Web Server, Application Server, and J2EE Application objects. These sets are configurable.

• Omni directional search: This scheme is explained in Section 5.3.1. Maximum depth of navigation was configured to be 2.

• Super object search: This scheme is explained in Section 5.3.2. In this case we considered super objects of types Computer System (including its child objects Operating System and File System), Business process (including applications), servers etc.

• Directed search: In this case directed navigation is done as per Section 5.3.3 with different templates obtained with the help of subject matter experts (SMEs). For each incident of a particular class a set of one or more templates was obtained. Directed search was performed corresponding to each such template and a union of results was presented to service desk personnel.

Table 3: Problem ticket queues
Queue Id   Problem type                   # tickets
LNWAT      LAN Services                   240
DSA        LAN admin operations           495
AIXBK      AIX backup and restore         26
UNIX       Unix support for CA4B          59
WMAIL      Email support                  27
NOTES      Notes and other applications   7487
RESWTG     Web conferencing               12

6.1 Data Collection

CMDB data consisted of 192,000 objects and more than 150,000 relationships among them in an IBM network. Various tickets are categorized (by the L1 person) using the location and type of problem. Different queues are formed for different categories. For example, there are 16 queues for the Watson location for problem categories such as notes, network related, etc. Location information can be used to decide the CMDB server to be used whereas problem-type is used to select the template. Table 3 shows ticket queues along with the number of tickets used in our experiments.

6.2 Performance results

We now present the accuracy and response time results for context oriented search enabled by object navigation and CI type specific search. For each ticket, the responsible CI(s) were identified by subject matter experts as ground truth. With each incident we show a configurable number of results (nmax=20) to the service desk personnel, but to measure accuracy we consider only the top-k results among them. If the actual CI (ground truth) is part of that top-k set, we assume the results to be accurate. Service desk personnel select the resultant CI (or CIs) which can be used to show details of those CIs and their related CIs (all software
specified. We created 54 templates corresponding installed on the machine, network interfaces, etc.) for
to 120 classes of incidents. One to many mapping helping in the incident resolution.
between incident-class to template was created with
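
The top-k accuracy measure described in Section 6.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a ticket is counted as accurate when any one of its ground-truth CIs appears in the top-k results, and the ticket ids, CI ids and the function name are hypothetical.

```python
def top_k_accuracy(ranked_results, ground_truth, k):
    """Fraction of tickets whose top-k results contain a ground-truth CI.

    ranked_results: dict of ticket id -> list of CI ids, best match first
    ground_truth:   dict of ticket id -> set of responsible CI ids
    """
    if not ranked_results:
        return 0.0
    hits = 0
    for ticket_id, results in ranked_results.items():
        expected = ground_truth.get(ticket_id, set())
        # Assumption: one matching CI in the top k is enough for a hit.
        if any(ci in expected for ci in results[:k]):
            hits += 1
    return hits / len(ranked_results)


# Hypothetical tickets and CI ids, for illustration only.
ranked = {
    "t1": ["cs1", "db1", "os1"],
    "t2": ["db1", "cs1"],
    "t3": ["os1", "db1", "cs1"],
}
truth = {"t1": {"cs1"}, "t2": {"cs1"}, "t3": {"cs1"}}

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, truth, k))
```

In this toy data the responsible CI sits at rank 1, 2 and 3 for the three tickets, so accuracy rises from one third to two thirds to one as k goes from 1 to 3.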

Figure 7: Accuracy of results for various schemes

Figure 7 shows percentage accuracy for various templates while varying the number of results presented for k=1, 2, 3. Directed search with an appropriate template gives the best accuracy values whereas naive keyword search performs worst. Accuracy improves when the search scope is limited, either to certain CI types or to templates. We get a slight improvement by including child objects with their parent objects, but it is better if we consider explicit relationships as well. The omni directional scheme performs better than the super object scheme since it explores all the related objects. The super object scheme fails to return the correct CI if it is not explicitly mentioned in the incident description. Accuracy results for bounds are not presented in Figure 7 as they are the same as for the omni directional scheme.

For k=1, we get 84% accuracy for the directional scheme compared to 50% for a naive keyword search. Thus our template based technique can be used for automatically associating CIs without any filtering by service desk personnel; but, in general, CI association should be verified by the service desk personnel before being used for any of the purposes mentioned in Section 2.1.

Figure 8 shows the average number of server accesses required for various schemes. As search response time is dominated by the network and processing delays at the server, the response time and CPU cycles required for a ticket are proportional to the average number of server accesses. These accesses are required for searching over indexes and getting individual CIs to display. For naive keyword search (scheme All) we need to access at most 20 (=nmax) objects, whereas for the other cases we need to access other neighborhood objects as well. We show the number of accesses for omni-directional search with and without the bounds specified in Section 5.3.1, as given by the legends OmniDir and Bounds respectively. The number of accesses for the super-object case is slightly higher than for the directional scheme, as the former explores all the implicit relationships irrespective of their importance. As can be seen from Figure 8, we get better results, compared to the omni-directional search, if we can correctly identify the template to use. The omni-directional scheme not only gives lower accuracy but also has a higher response time (more accesses) compared to the directed search.

Figure 8: Number of CMDB accesses

7. Related works

The importance of incident management is underlined by the work in [8], where IT management by business objectives (MBO) is done by mapping the effect of various system management processes on key performance indicators (KPIs). A way of assessing the impact of incident management on business objectives is given in [7]. By automating a big part of the incident management process, incident resolution can be done faster, leading to reduced impact on business objectives. There has been a lot of work in the literature using database and data mining technologies for helping service desk personnel. The idea of using case-based reasoning by matching incoming problem symptoms against a historical database of symptoms (and their corresponding solutions) is proposed in [6]. In [9] the authors use call stack matching to find the likely cause of crash and hang related problems. These approaches try to find similarities in the problem report information supplied by users without considering their semantic meaning. In contrast, our approach finds the possible failing components even if the problem description does not explicitly mention them. Our technique integrates unstructured information in the form of the incident description with the structured information in the CMDB to generate the context of the incident and identify relevant CIs. In [11] the authors have proposed an algorithm for linking text document with
the relevant data stored in a relational repository. Their algorithm is primarily based on identifying the keywords from the text and searching the RDBMS based on these keywords and predefined templates.

However, for incident management we need to associate an incident with configuration items which are stored as (Java) objects in CMDB. In this paper, we extend the technology for the CIs stored in CMDB and use it for service desk management.

8. Conclusions and future works

In this paper we presented a technique to identify failing component by integrating text specified in the problem ticket with structured data stored in CMDB database along with incident classification. In our system, improvement in incident management process occurs due to three main reasons: automated identification of keywords, search over CMDB using search context, and limiting search scope using directed navigation. We plan to extend the work for autonomic computing and problem management. Our work is also continuing towards identifying CIs corresponding to system events. That work will be useful in correlating system events with client reported incidents.

9. References

[1] Apache Lucene: Full featured text search engine. http://lucene.apache.org/java/docs/index.html.
[2] Java Beans. http://java.sun.com/products/javabeans/docs/index.html.
[3] Reducing IT support costs through automated electronic end-user support. http://www-935.ibm.com/services/us/its/pdf/g510-3499-00.pdf.
[4] Using Java Reflection. http://java.sun.com/developer/technicalArticles/ALT/Reflection.
[5] What do you need from a configuration management database (CMDB)? www.bmc.com/USA/Corporate/BSM/attachments/BMC CMDB wp en.pdf.
[6] T. Acorn and S. Walden. SMART: Support management reasoning technology for Compaq customer service. In Innovative Applications of Artificial Intelligence, Volume 4, 1992.
[7] C. Bartolini and M. Salle. Business driven prioritization of service incidents. DSOM, 2004.
[8] C. Bartolini, M. Salle, and D. Trastour. IT service management driven by business objectives: an application to incident management. NOMS, 2006.
[9] M. Brodie, S. Ma, G. Lohman, T. Syeda-Mahmood, L. Mignet, N. Modani, M. Wilding, J. Champlin, and P. Sohn. Quickly finding known software problems via automated symptom matching. In Workshop on Integrating Data Mining and Knowledge Management, 2005.
[10] S. Chakrabarti. Mining the Web. Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers, 2002.
[11] V. Chakravarty, H. Gupta, P. Roy, and M. Mohania. Efficiently linking text documents with relevant structured information. In International conference on Very Large Databases (VLDB), 2006.
[12] D. Ferrucci and A. Lally. Building an example application with the unstructured information management architecture. IBM Systems Journal, 43(3):433-455, 2004.
[13] T. Li, S. Zhu, and M. Ogihara. Mining patterns from case base analysis. In Workshop on Integrating Data Mining and Knowledge Management, 2001.
[14] I. Rish, M. Brodie, N. Odintsova, S. Ma, and G. Grabarnik. Real-time problem determination in distributed systems using active probing. In Proceedings of Network Operations and Management Symposium, Seoul, Korea, April 2004.
[15] S. Sarawagi. Automation in information extraction and integration (tutorial). In VLDB, 2002.
[16] S.R. Gunn. Support vector machines for classification and regression. Technical Report ISIS-1-98, Department of Electronics and Computer Science, University of Southampton, 1998.
[17] IBM Tivoli Enterprise Console. www.ibm.com/software/tivoli
[18] R. Gupta, K. H. Prasad and M. Mohania. Information integration techniques to automate incident management. (Short paper) NOMS 2008.
