The rest of the paper is organized as follows. In Section 3, we give details of the data model of the configuration management database. We use the object data model and the relationships between objects to perform context-oriented search. Steps to automatically identify keywords from the incident description are presented in Section 4. Searching the CMDB for keywords along with the desired relationships is presented in Section 5. Implementation details and performance results showing the effectiveness of our technique are presented in Section 6. Related work is presented in Section 7. The paper concludes in Section 8.

[Figure: example of explicit relationships in a CMDB (diagram lost in extraction). Nodes: Billing Application, Apache Web Server, WebSphere Application Server 5.2, Oracle Database, DB2 Database, Aix Machine hpux1, Aix Machine hpux2, Aix Machine hpux3. Edge labels: deployed-on, depends-on, runs-on, transactional-]

relationships. To find the database on which the billing application is running, one needs to traverse depth=2.

A CMDB also has another kind of relationship, of implicit type. These implicit relationships are part of the CMDB data model. Figure 3 shows an example of an implicit relationship where a Unix Computer is a sub-category of Computer and Operating System is a child of that. These explicit and implicit relationships can be used to find the details of all the Linux machines in the enterprise, such as which machine they are installed on, what software is running on them, etc. When an L1 person searches the CMDB, she mainly considers CIs without regard to relationships. Usually the CMDB is searched using keywords extracted from the incident description. All CIs having attribute values matching those keywords are returned as keyword search results. If the L1 person is not satisfied with the results, she searches again and/or explores CIs related to the returned CIs in a very ad-hoc manner. This whole process is largely manual and heavily dependent on the skills of L1 persons, making it unreliable and costly. For a given incident, we propose a method to search the CMDB, considering both CIs and relationships, so that the failing component can be identified with more accuracy and minimal human intervention. The first step of our automated architecture is identifying important keywords from the incident, which is explained next.
information, e-mail address, etc. 2) Ticket information, including status of the ticket, severity, system/component related to the ticket, etc. 3) Problem description, giving the client’s perception of the problem. For identifying relevant keywords, mainly the problem description is used. Customer and other ticket-related information is mainly used for determining the scope of the search in the CMDB. For example, customer-related information can be used to search only the particular part of the CMDB belonging to that customer. Simple rule engines can be created to process the structured customer and ticket information, but processing the natural language description of the problem using simple rules is not possible. The L1 person needs to understand the problem to decide on the keywords to be used to search the CMDB. The L1 person expects that these keywords will lead to the possible failing component for the given incident ticket. In this section we describe a method of automatically extracting relevant keywords from the incident description.

Incident tickets are usually managed using a web-based interface. Thus the first step of incident processing is to extract the incident description using an HTTP parser. As information entered in the incident description may be unclean, we need to clean that unstructured data. We used the edit distance between keywords and dictionary words to clean the incident description. As one needs to search for keywords in CI types (e.g. oracle database) and attribute values (e.g. hpux3.ibm.com) of CIs, only nouns or noun phrases need to be extracted from an incident description (in English). The next step in identifying keywords is to normalize the extracted nouns. For example, a computer name may be mentioned in its short form (hpux3 for hpux3.in.ibm.com); thus, if required, we identify such machine names and normalize them by appending their domain names. Similarly, it is possible that all the attributes use IP addresses whereas the incident description has a URL. Thus, we need to replace them with a commonly agreed format. To further improve the search performance we annotate noun phrases using named-entity (NE) annotators [12], which annotate keywords based on their types. By associating CI-types with these keyword annotations, we limit the search scope further. For example, if a particular keyword is identified as a software-name, it need not be searched in CIs of type ComputerSystem. A dictionary matcher can be used to categorize keywords into classes such as hardware and software using a dictionary of names. Similarly, it can be used to identify WebSphere application server as a single keyword.

5. Searching CMDB

All the keywords extracted as explained in the previous section can be used to search the CMDB. But this ‘vanilla’ search is not likely to be efficient due to the following considerations:
o Typical CMDB implementations are not done with keyword search in mind. Instead, the CMDB is mainly used for browsing from one object to its parent/child or other related objects.
o The CMDB is largely implemented as an object oriented database [5]. The concept of keyword search over an object oriented database is not well defined. Various questions, such as which attribute(s) to search, how to search child objects, whether it is possible to index objects, etc., need to be answered for efficiently searching an object oriented database.
o While doing keyword search, relationships are not taken into account. Keyword search and relationship browsing need to be integrated.
o When a customer describes a problem, it is his/her perception of the problem, which may not include much information about the actual failing component. Instead, it is likely to have some information about neighboring/dependent objects.

For efficiently and automatically locating failing components we use indexing and object navigation, which are explained in the next two sub-sections.

5.1 Making search efficient

To make CMDB search efficient we index all the CMDB objects. Indexing CMDB objects is non-trivial as we should be able to search not only with keywords but also with relationships. For example, we should be able to efficiently search all the Computer-Systems which have Linux Enterprise Edition v3.1 as operating system. Further, it should be possible to search for relationships which are arbitrarily deep. We solve all these issues by creating two indices over the CMDB.

The aim of the keyword index is to identify objects having a given keyword in the values of their attributes. This index maps each keyword to a set of objects. The keyword index has four columns: keyword, object-id, object-desc and object-class. The object-desc (object description) column is used to display the name of objects so that service personnel can identify and get
details of any of the displayed objects. The object-class column is used to indicate the type of the object, e.g. Computer System, Database, Security certificate, etc. To synchronize the index with the modifications in the underlying CMDB, we poll the CMDB logs periodically to get updates since the last poll. All the newly added/deleted/updated objects are reflected in the index. The frequency of the index synchronizing process should be decided based on the number and frequency of changes occurring in the CMDB. Usually the CMDB is modified using a resource discovery mechanism [] which is run once a day. Thus, we perform the synchronization once a day, which synchronizes the keyword index as well as the relationship index, which is explained next.

The relationship index is created to efficiently browse over related objects. We use these relationships to extract the context of the incident search automatically. For example, if an incident describes two machines which are not able to communicate, then the possible failing component may be the CI corresponding to a network-link between them. As for the keyword index, to create the relationship index all configuration items are obtained one-by-one and all their depth 1 relationships are stored in the relationship index. To get objects related at higher depths, multiple invocations on the relationship index can be done. Each relationship can be defined by its source object, target object and the relationship type between them. Table 1 shows example entries for the relationship index. In the relationship index, oid-1 is the object-id of the object we are parsing, oid-2 is the object-id of the related object, relationship-type gives the name of the implicit/explicit relationship and direction is forward (backward) if the parsed CI is the source (target) CI of the relationship. Both explicit as well as implicit (parent-child) relationships are covered using the relationship index.

Table 1: Example relationship index
oid-1      oid-2      relationship-type  direction
ab4569000  ph9067230  installed On       forward
ph9067230  ab4569000  installed On       backward
ab4569000  dc4587689  parent-child       forward
dc4587689  ab4569000  parent-child       backward

The keyword index and the relationship index are used to efficiently search the CMDB. In the next section, we describe how relationship browsing can be done to automatically obtain search context.

5.2 Limiting search results

Keyword search may result in lots of objects. We do not want to overburden the service desk personnel by showing lots of results. Further, search context is captured by the objects represented by keywords as well as their neighborhood objects, and one naïve way is to get all the neighborhood objects of vanilla keyword search results. Such an approach is also likely to return lots of unnecessary objects which the service desk personnel will have to examine. In general all the CMDB objects can be related to each other, thus we need some way of limiting the number of search results. In general, search results can be limited in two ways:

1. Keyword search results are returned with their corresponding TF-IDF [10] scores. In this case objects with top-k relevance scores are shown to the service desk personnel. She can select the failing component from the list shown.

2. In this case a list of objects is shown such that the ratio of the score of the bottom-ranked object to that of the top-ranked object is above a certain threshold. This approach is motivated by the fact that if the bottom-ranked object has a very low relevance score then it is better not to show it to the service desk personnel.

We combine the above two criteria to output minimum results to the service desk personnel. The configurable parameter nmax is used to indicate the maximum number of results to be returned (top-k). Another parameter γ is used to indicate the lower bound on the ratio of the score of the bottom-ranked object to that of the top-ranked object. Let the results returned by the CMDB search have relevancy scores {c1, c2, ..., cn} in descending order. Thus,

$$ n < n_{max} \quad \text{and} \quad \frac{c_n}{c_1} > \gamma \qquad (1) $$

The actual number of objects returned by the keyword search will be given by the more restrictive of the conditions given by Equation (1).

5.3 Getting search context

As mentioned previously, search context is obtained by using neighborhood objects of the objects directly mentioned in the incident description. If there is no other information then we can simply consider getting the neighbors of all the objects resulting from vanilla keyword search. This forms our base-line scheme.
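The following is a minimal Python sketch of how the two indices, the result limit of Equation (1) and the base-line neighbor lookup could fit together. It is an illustration under stated assumptions, not the system's implementation: the function names, the in-memory dictionaries, the object descriptions in `KEYWORD_ROWS` and the relevance scores passed in are all hypothetical, and Equation (1) is read here as "at most nmax results". Only the relationship rows are taken from Table 1.

```python
from collections import defaultdict

# Keyword index rows: (keyword, object-id, object-desc, object-class).
# The descriptions and classes below are hypothetical examples.
KEYWORD_ROWS = [
    ("oracle", "ab4569000", "Oracle Database", "Database"),
    ("hpux3", "ph9067230", "hpux3", "Computer System"),
]

# Relationship index rows as in Table 1: (oid-1, oid-2, relationship-type, direction).
RELATIONSHIP_ROWS = [
    ("ab4569000", "ph9067230", "installed On", "forward"),
    ("ph9067230", "ab4569000", "installed On", "backward"),
    ("ab4569000", "dc4587689", "parent-child", "forward"),
    ("dc4587689", "ab4569000", "parent-child", "backward"),
]


def build_keyword_index(rows):
    """Map each lower-cased keyword to the objects whose attributes contain it."""
    index = defaultdict(list)
    for keyword, oid, desc, cls in rows:
        index[keyword.lower()].append((oid, desc, cls))
    return index


def build_relationship_index(rows):
    """Map each object-id to its depth-1 relationships."""
    index = defaultdict(list)
    for oid1, oid2, rel_type, direction in rows:
        index[oid1].append((oid2, rel_type, direction))
    return index


def limit_results(scored, n_max, gamma):
    """Keep at most n_max results whose score ratio to the top score exceeds gamma."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if not ranked or ranked[0][1] <= 0:
        return []
    top = ranked[0][1]
    return [(oid, s) for oid, s in ranked[:n_max] if s / top > gamma]


def baseline_search(keywords, scores, kw_index, rel_index, n_max=20, gamma=0.05):
    """Keyword hits (limited by Equation (1)) plus their depth-1 neighbors.

    `scores` maps object-id to a relevance score (assumed to be given).
    """
    hits = {}
    for kw in keywords:
        for oid, _desc, _cls in kw_index.get(kw.lower(), []):
            hits[oid] = scores.get(oid, 0.0)
    kept = limit_results(list(hits.items()), n_max, gamma)
    kept_ids = {oid for oid, _ in kept}
    neighbors = set()
    for oid in kept_ids:
        for other, _rel_type, _direction in rel_index.get(oid, []):
            neighbors.add(other)
    return kept, sorted(neighbors - kept_ids)
```

Objects at higher depths would be reached by looking up each returned neighbor in the relationship index again, as described for the relationship index.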
5.3.1 Omni-directional search. As one object may be
related, directly or indirectly, to a large number of
objects, we cannot access all the related objects. We limit the object navigation to a certain relationship-depth (rd). E.g., rd=1 signifies that only directly related objects will be used for getting the incident context; by keeping rd=2 we also get objects which are directly related to the related objects (database to billing application in Figure 2), etc. For getting all the related objects, irrespective of depth, rd can be set to 0. The relevancy score of neighborhood objects depends on the score of the keyword search results to which they are related, the type of relationship and the depth. Equation (2) is used to get the relevancy score of a related CI (RCI) which is related to one keyword search CI (KSCI).

$$ RR = r \times \prod_{0 \le i \le rd} \frac{w_i(rel\_type)}{\alpha} \qquad (2) $$

RR is the relevancy score of the related CI (RCI), r is the relevancy score of the KSCI to which it is related, w(rel_type) is the weight assigned to the type of relationship, and rd is the relationship-depth between the KSCI and its related RCI. The value of w(rel_type) lies between 0 and 1, depending on the type of relationship. We give more weight to parent-child implicit relationships compared to explicit relationships. Further, among explicit relationships a larger weight is given to a relationship type if any problem in its target may lead to a problem in its source. The configuration parameter α (> 1) makes sure that the RR value of the RCI is less than the relevancy (TF-IDF) score of its related KSCI. If an RCI is related to more than one KSCI then its relevancy score is the sum of all its individual RR values. It should be noted that we are only interested in neighboring objects having a score of more than cn (otherwise they will not be part of the results shown to the service personnel). As one object may be related to more than one object, we calculate scores of all related objects having RR values more than cn/β, where β (> 1) is a configuration parameter. Thus, an RCI can be part of the result only if:

$$ RR = c_i \times \prod_{0 \le i \le d} \frac{w_i(rel\_type)}{\alpha} > \frac{c_n}{\beta} \qquad (3) $$

This gives us the bounds on the relationship weight and depth to which relationships are to be explored. These bounds reduce the navigation scope drastically, thus improving search performance. As w(rel_type) ≤ 1 and ci ≤ c1, from Equation (3):

$$ \frac{c_1}{\alpha^{d}} > \frac{c_n}{\beta} \qquad (4) $$

Thus, from Equations (1) and (4) we get,

$$ d < \frac{\log(\beta / \gamma)}{\log(\alpha)} \qquad (5) $$

Similarly, we can obtain the minimum weight of the relationship to be explored using,

$$ w(rel\_type) > \alpha \times \gamma / \beta \qquad (6) $$

Equations (5) and (6) can be used to reduce the search scope without any adverse impact on the quality of results presented to the service personnel. Typical values of the various parameters discussed in this section are given in Table 2.

Table 2: Typical parameter values
Parameter-name  Value
nmax            20
α               3
β               3
γ               0.05

With these values, Equation (5) gives d < log(60)/log(3) ≈ 3.7 and Equation (6) gives w(rel_type) > 0.05.

5.3.2 Super object search. This case is useful when service desk personnel can guess the type of object which may be responsible for the incoming incident. In this case all child objects are considered as part of their parent object. For example, if we want to find the machine (Computer System) responsible for the incident then we search for keywords in a super object consisting of the Computer System object along with all its children like Operating System, File System, etc. This ensures that the whole physical object is considered a single logical object, leading to better results. There are two problems with this approach. First, how can we automate the selection of the object type? We answer this question in Section 5.3.4. Second, we are only considering relationships which are implicit in the data model. Thus, results for this depend a lot on the data model. We correct that in our next approach.

5.3.3 Directed search. In this case we want to include both implicit as well as explicit relationships in a manner which is more intelligent than the one presented in Section 5.3.1. We do directed search by using what we call search templates. This scheme is useful when service desk personnel can identify the type of object responsible for the incident along with relationships which can be useful in getting the search context. In general, a search template can be represented as a tree of CIs and relationships as shown in Figure 4. Search templates are a subset of the data model including CMDB object types with implicit and explicit relationships between them. Directed search has the following distinctive features:
o Keyword search using this template will involve CMDB objects of types database server, application server, Java server and application.
o All resultant objects of child type are traversed using the relationship specified on edges to get their corresponding parent objects. Thus, all the resultant objects of type “Application” are traversed using “Deployed on” relationship to get their corresponding
“Application Server” objects. These objects are further traversed using “Dependency” relationship to get the corresponding “Database Server” objects.
o Service desk person is presented with objects of the root type, that is, “Database Server”.
o Relevancy score of each object is calculated in the manner similar to the one described in Section 5.3.1. Thus, a “Database Server” object which is parent of lots of resultant child objects (result of keyword search) will have higher score.

based mapping was used to select template(s) to be used for a particular class. For example, for incidents of class connectivity problems we can have template for Computer Systems as root with its network interfaces as non-root objects.

Computer hardware: wireless, key-board, mouse
Desktop problems: windows, software, word, hang
AppServer: websphere, servlet, jsp, page, application
Database: database, db, sql, db2, table, oracle
Networking: connection, ping, access, connectivity
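The directed search described in the bullets above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the template is simplified to the chain Application, Application Server, Database Server; the relationship weights, object ids and data structures are hypothetical; and the scoring applies one factor w(rel_type)/α per traversed relationship, in the spirit of Equation (2), summing the contributions that reach the same root object.

```python
from collections import defaultdict

ALPHA = 3.0  # configuration parameter alpha (> 1); typical value from Table 2

# Hypothetical search template: (child type, relationship on the edge, parent type).
TEMPLATE = [
    ("Application", "Deployed on", "Application Server"),
    ("Application Server", "Dependency", "Database Server"),
]
ROOT_TYPE = "Database Server"

# Hypothetical relationship weights, each between 0 and 1.
WEIGHTS = {"Deployed on": 0.9, "Dependency": 0.6}


def directed_search(hits, obj_type, parents, template=TEMPLATE,
                    root_type=ROOT_TYPE, weights=WEIGHTS, alpha=ALPHA):
    """Propagate keyword-search scores along template edges up to the root type.

    hits:     {object-id: keyword relevance score}
    obj_type: {object-id: object type}
    parents:  {(object-id, relationship type): [parent object-ids]}
    Returns root-type objects with summed scores, best first.
    """
    hop = {child: (rel, parent) for child, rel, parent in template}
    root_scores = defaultdict(float)
    for oid, score in hits.items():
        stack = [(oid, score)]
        while stack:
            current, rr = stack.pop()
            ctype = obj_type.get(current)
            if ctype == root_type:
                root_scores[current] += rr  # sum over all contributing hits
                continue
            if ctype not in hop:
                continue  # object type is not part of the template
            rel, parent_type = hop[ctype]
            for parent in parents.get((current, rel), []):
                if obj_type.get(parent) == parent_type:
                    stack.append((parent, rr * weights[rel] / alpha))
    return sorted(root_scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example data: two applications share one database server.
OBJ_TYPE = {
    "app1": "Application", "app2": "Application", "app3": "Application",
    "as1": "Application Server", "as2": "Application Server",
    "db1": "Database Server", "db2": "Database Server",
}
PARENTS = {
    ("app1", "Deployed on"): ["as1"],
    ("app2", "Deployed on"): ["as1"],
    ("app3", "Deployed on"): ["as2"],
    ("as1", "Dependency"): ["db1"],
    ("as2", "Dependency"): ["db2"],
}
RANKED = directed_search({"app1": 1.0, "app2": 1.0, "app3": 1.0}, OBJ_TYPE, PARENTS)
```

In this example `db1` ranks above `db2` because two keyword hits reach it, which mirrors the observation that a root object that is the parent of many resultant child objects gets a higher score.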
the help of subject matter experts (SMEs). For each incident of a particular class, a set of one or more templates was obtained. Directed search was performed with each such template, and the union of the results was presented to service desk personnel.
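As a rough illustration of this step, the Python sketch below picks an incident class, looks up the templates mapped to that class, and returns the union of the per-template results. The class keyword lists are the ones shown earlier; everything else is an assumption made for the example: the keyword-overlap classifier is only a stand-in for whatever classifier is actually used, and the template names and the `run_template` callback are hypothetical.

```python
import re

# Class keyword lists as given in the text.
CLASS_KEYWORDS = {
    "Computer hardware": {"wireless", "key-board", "mouse"},
    "Desktop problems": {"windows", "software", "word", "hang"},
    "AppServer": {"websphere", "servlet", "jsp", "page", "application"},
    "Database": {"database", "db", "sql", "db2", "table", "oracle"},
    "Networking": {"connection", "ping", "access", "connectivity"},
}

# Hypothetical mapping from incident class to search template names.
CLASS_TEMPLATES = {
    "Networking": ["computer-system-with-network-interfaces"],
    "Database": ["database-server-root"],
}


def classify(description):
    """Pick the class with the most keyword matches (a naive stand-in classifier)."""
    words = set(re.findall(r"[a-z0-9\-]+", description.lower()))
    counts = {cls: len(words & kws) for cls, kws in CLASS_KEYWORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def search_with_templates(description, run_template):
    """Run directed search once per template of the class and union the results.

    run_template(template_name, description) is assumed to return an
    iterable of object-ids found by directed search with that template.
    """
    cls = classify(description)
    results = set()
    for template in CLASS_TEMPLATES.get(cls, []):
        results |= set(run_template(template, description))
    return cls, results
```

A caller would pass its own directed-search routine as `run_template`; incidents whose class has no template mapped simply yield an empty result set.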
5.3.1 as given by legends OmniDir and Bounds respectively. The number of accesses for the super-object case is slightly higher than for the directional scheme, as the former explores all the implicit relationships irrespective of their importance. As can be seen from Figure 8, we get better results, compared to the omni-directional search, if we can correctly identify the template to use. The omni-directional scheme not only gives lower accuracy but also has a higher response time
the relevant data stored in a relational repository. Their algorithm is primarily based on identifying the keywords from the text and searching the RDBMS based on these keywords and predefined templates.

However, for incident management we need to associate an incident with configuration items which are stored as (Java) objects in CMDB. In this paper, we extend the technology for the CIs stored in CMDB and use it for service desk management.

8. Conclusions and future works

In this paper we presented a technique to identify failing component by integrating text specified in the problem ticket with structured data stored in CMDB database along with incident classification. In our system, improvement in incident management process occurs due to three main reasons: automated identification of keywords, search over CMDB using search context, and limiting search scope using directed navigation. We plan to extend the work for autonomic computing and problem management. Our work is also continuing towards identifying CIs corresponding to system events. That work will be useful in correlating system events with client reported incidents.

9. References

[1] Apache Lucene: Full featured text search engine. http://lucene.apache.org/java/docs/index.html.
[2] Java Beans. http://java.sun.com/products/javabeans/docs/index.html.
[3] Reducing IT support costs through automated electronic end-user support. http://www-935.ibm.com/services/us/its/pdf/g510-3499-00.pdf.
[4] Using Java Reflection. http://java.sun.com/developer/technicalArticles/ALT/Reflection.
[5] What do you need from a configuration management database (CMDB)? www.bmc.com/USA/Corporate/BSM/attachments/BMC CMDB wp en.pdf.
[6] T. Acorn and S. Walden. SMART: Support management reasoning technology for Compaq customer service. In Innovative Applications of Artificial Intelligence, Volume 4, 1992.
[7] C. Bartolini and M. Salle. Business driven prioritization of service incidents. DSOM, 2004.
[8] C. Bartolini, M. Salle, and D. Trastour. IT service management driven by business objectives: an application to incident management. NOMS, 2006.
[9] M. Brodie, S. Ma, G. Lohman, T. Syeda-Mahmood, L. Mignet, N. Modani, M. Wilding, J. Champlin, and P. Sohn. Quickly Finding known software problems via automated symptom matching. In Workshop on Integrating Data Mining and Knowledge Management, 2005.
[10] S. Chakrabarti. Mining the Web. Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers, 2002.
[11] V. Chakravarty, H. Gupta, P. Roy, and M. Mohania. Efficiently linking text documents with relevant structured information. In International conference on Very Large Databases (VLDB), 2006.
[12] D. Ferrucci and A. Lally. Building an example application with the unstructured information management architecture. IBM Systems Journal, 43(3):433-455, 2004.
[13] T. Li, S. Zhu, and M. Ogihara. Mining patterns from case base analysis. In Workshop on Integrating Data Mining and Knowledge Management, 2001.
[14] I. Rish, M. Brodie, N. Odintsova, S. Ma, and G. Grabarnik. Real-time problem determination in distributed systems using active probing. In Proceedings of Network Operations and Management Symposium, Seoul, Korea, April 2004.
[15] S. Sarawagi. Automation in information extraction and integration (tutorial). In VLDB, 2002.
[16] S.R. Gunn. Support vector machines for classification and regression. Technical Report ISIS-1-98, Department of Electronics and Computer Science, University of Southampton, 1998.
[17] IBM Tivoli Enterprise Console. www.ibm.com/software/tivoli
[18] R. Gupta, K. H. Prasad and M. Mohania. Information integration techniques to automate incident management. (Short paper) NOMS 2008.