
Recommending resolutions of ITIL services tickets using

Deep Neural Network

Durga Prasad Muni1, Suman Roy1, Yeung Tack Yan John John Lew Chiang1, Antoine Jean-Marie Viallet*2 and Navin Budhiraja1

1 Infosys Ltd., #44 Electronic City, Hosur Road, Bangalore 560100, India
2 SUPELEC engineering school, Gif-sur-Yvette, France
{Durgaprasad Muni,Suman Roy,Yeung Chiang,Navin.Budhiraja}@infosys.com,viallet.antoine@gmail.com

ABSTRACT
Application development and maintenance is a good example of Information Technology Infrastructure Library (ITIL) services in which a sizable volume of tickets is raised every day for different issues to be resolved in order to deliver uninterrupted service. An issue is captured as a summary on the ticket and, once a ticket is resolved, the solution is also noted down on the ticket as a resolution. It will be beneficial to automatically extract information from the description of tickets to improve operations like identifying critical and frequent issues, grouping of tickets based on textual content, suggesting remedial measures for them etc. In particular, the maintenance people can save a lot of effort and time if they have access to past remedial actions for similar kinds of tickets raised earlier, based on history data. In this work we propose an automated method based on deep neural networks for recommending resolutions for incoming tickets. We use ideas from deep structured semantic models (DSSM) for web search for such resolution recovery. We project a small subset of existing tickets in pairs and an incoming ticket to a low dimensional feature space, following which we compute the similarity of an existing ticket with the new ticket. We select the pair of tickets which has the maximum similarity with the incoming ticket and publish both of its resolutions as the suggested resolutions for the latter ticket. Experiments on our data sets show that we are able to achieve a promising similarity match of about 70% - 90% between the suggestions and the actual resolution.

Keywords
Deep Learning; Neural Network; Deep Neural Network; Ticket; Resolution; Resolution Recovery

* This work was done when Antoine was an intern at Infosys Ltd during July-Sept, 2016.

IKDD CODS 2017, March 9-11 2017, Chennai, India. Copyright 2017 ACM 978-1-4503-2776-3/14/02 $15.00. DOI: 10.1145/3041823.3041831

1. INTRODUCTION
A ticketing system is used as one of the inputs for Information Technology Infrastructure Library (ITIL) services such as problem management and configuration management. Specifically, in production support related application development and maintenance projects, the incident data in the form of tickets are used for different purposes such as SLA calculation, forecasting, optimum resource level checking, quick metrics computation etc. A huge number of tickets are raised by users on the ticketing system for the purpose of resolving their problems while using different support systems. A ticketing system tries to minimize the business impact of incidents by addressing the concerns of the raised tickets. Any prior knowledge which can be obtained by mining ticket data can help in quick redressal of the problem. The incident tickets record symptom descriptions of issues, as well as details on the incident resolution, using a range of structured fields such as date, resolver, affected servers and services, and a couple of free-form entries outlining the description/summary of issues, notes by users/administrators etc.

The issues are captured as summaries on the tickets and once a ticket is resolved, the solution is also noted down on the ticket as a resolution. The maintenance people can save a lot of effort and time if they have access to previous remedial actions for similar kinds of tickets based on past history. In this work we propose a method based on deep learning for suggesting resolutions for new tickets.

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction [11]; these are called deep neural networks. Deep neural networks (DNN) are becoming popular these days for providing efficient solutions for many problems related to language and information retrieval [1, 3, 6, 22, 24, 21]. In this work we use a framework of deep learning network for resolution recovery by lifting ideas from similar deep structured semantic models (DSSM) for web search [9, 8, 7]. In those papers the authors use a DNN to rank a set of documents against a given query in the following steps. First, the DNN carries out a non-linear projection to map the query and the documents to a common semantic space in which the relevance of each document for the query is computed.
Then the neural networks are learned using the clickthrough data by maximizing the conditional likelihood of the clicked document given the query.

Motivated by this we propose a method for recommending resolutions for incoming tickets in ITIL services using a similar approach. We treat existing tickets (or historical tickets) as a set of documents while an incoming ticket resembles a query. The tickets are represented using features consisting of words from their summaries after lemmatization and removal of stop words. We map an incoming ticket and a small subset of existing tickets to a low dimensional feature space. This is done by feeding feature vectors for existing tickets and the incoming ticket to a deep neural network (DNN) and thus obtaining the semantic representation of the feature vectors in a very low dimensional space. Then we compute the similarity between the low-dimensional feature vector of the new ticket and the existing tickets to find the most similar ticket with respect to the new ticket. Subsequently, we publish the resolutions of the most similar (existing) ticket pair as the recommended solutions for the fresh ticket. A schematic diagram of our method is shown in Figure 1.

Figure 1: A schematic diagram of our Approach

The paper is organized as follows. In Section 2 we describe the schema of our data set and the steps for feature extraction from a ticket. We introduce our deep neural network and its training and validation steps in Section 3. Experiments on our company data set are discussed in Section 4. We state related work in Section 5 before concluding in Section 6.

2. TICKET DATA SET
We consider incident tickets with similar schema which are frequent in ITIL. These tickets usually consist of two kinds of fields [10, 14], fixed and free form. Fixed fields are customized and inserted in a menu-driven fashion. Examples of such items are the ticket’s identifier, the time the ticket is raised or closed on the system, or whether a ticket is of incident or request nature. Various other information is captured through these fixed fields, such as the category of a ticket, the employee number of the user raising the ticket etc., and also maintenance team performance parameters like response time, resolution time of the ticket etc. But fixed fields do not convey much information about the incident itself. There is no standard value for free-form fields. The concern/issue for raising a ticket is captured as a “call description” or “summary” as free-formed text; it can be just a sentence that summarizes the problem reported in it, or it may contain a detailed description of the incident. By using the freely generated part of tickets, administrators can get to know about unforeseen network incidents and can also obtain a much richer classification. A small note is recorded as the resolution taken for each ticket. A small part of ticket data is shown in Figure 2.

Figure 2: Snapshot of relevant parts of incident ticket data

2.1 Feature vector creation from Ticket Data
We assume a free field of a ticket to contain a succinct problem description associated with the ticket in the form of a summary (or call description). We consider the collection of summaries of tickets for extracting the feature vector corresponding to a ticket. We use light natural language processing for feature vector generation. As a pre-processing step we remove the tickets which do not contain either a summary or a resolution or both. In the beginning we perform lemmatization of the words in the summary of tickets. Then we use the Stanford NLP tool [4] to parse the useful contents in the summary of the tickets and tag them as tokens. Next we set up some rules for removing tokens which are stop words. We compute the document frequency (DF)^1 of each lemmatized word. We discard the words whose DF is smaller than 3. Ticket summaries may contain some very rare words like the name of a person (the user who raised the ticket) and some noise words. By removing words with DF < 3, we can remove these very rare words which do not contribute to the content of the ticket summary. In this way, the feature vector size could be reduced significantly.

We can model a ticket as a vector T(., . . . , .), where each element T(x_i) represents the importance or the weight of a word w with respect to the ticket. One needs to choose a suitable weighting scheme to best describe these tickets. We shall use the TF*IDF [20] of a word as its weight^2. Finally, a profile of a ticket is given as T = (x_1, . . . , x_n) = x⃗, where x_1, . . . , x_n are the appropriate weights for the chosen words w_1, . . . , w_n respectively from the summary of T. As ticket summaries are short, most of the words may appear only once. If we take TF (term frequency) to represent the weight of an element in the feature vector then most of the entries in the feature vector will be 1, which does not convey much meaningful information. So, we have considered TF*IDF values as weights for the feature vector representation.

^1 Document frequency of a word is the number of tickets (ticket summaries) containing the word in the data set (corpus) [12].
^2 TF*IDF is a popular metric in the data mining literature [12].
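The feature-vector construction above can be summarized in a short sketch. The Java fragment below is illustrative only (not the authors' implementation): it assumes the summaries are already lemmatized and stop-word filtered, applies the DF < 3 cut-off, and uses a standard tf * log(N/df) weighting as one concrete reading of TF*IDF; the class and method names are hypothetical.

```java
import java.util.*;

// Minimal sketch of the TF*IDF feature vector creation of Section 2.1.
// Each ticket summary is assumed to be already lemmatized and stop-word filtered.
class TicketFeatures {

    // Document frequency of every word over all ticket summaries.
    static Map<String, Integer> documentFrequency(List<List<String>> summaries) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> s : summaries)
            for (String w : new HashSet<>(s))
                df.merge(w, 1, Integer::sum);
        return df;
    }

    // Vocabulary = words with DF >= 3 (very rare words are dropped, as in the paper).
    static List<String> vocabulary(Map<String, Integer> df) {
        List<String> vocab = new ArrayList<>();
        for (Map.Entry<String, Integer> e : df.entrySet())
            if (e.getValue() >= 3) vocab.add(e.getKey());
        Collections.sort(vocab);
        return vocab;
    }

    // TF*IDF profile of one ticket: x_i = tf(w_i) * log(N / df(w_i)).
    static double[] tfIdf(List<String> summary, List<String> vocab,
                          Map<String, Integer> df, int numTickets) {
        Map<String, Integer> tf = new HashMap<>();
        for (String w : summary) tf.merge(w, 1, Integer::sum);
        double[] x = new double[vocab.size()];
        for (int i = 0; i < vocab.size(); i++) {
            String w = vocab.get(i);
            if (tf.containsKey(w))
                x[i] = tf.get(w) * Math.log((double) numTickets / df.get(w));
        }
        return x;
    }
}
```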
2.2 Relational schema on fixed elements
The fixed field entries of a ticket can be represented using a relational schema. For that we shall consider only a limited number of fixed fields of a ticket for choosing attributes that reflect its main characteristics (the domain experts’ comments play an important role in choosing the fixed fields); for example the attributes can be application name, category and sub-category. They can be represented as a tuple: Ticket(application name, category, sub-category). Each of the tuples corresponding to entries in the fixed fields of a ticket can be thought of as an instantiation of the schema. Examples of rows of such a schema can be (AS400 - Legacy Manufacturing, Software, Application Errors), (AS400 Legacy - Retail, Software, Application Functionality Issue) etc. The relation key can vary from 1 to the number of distinct tuples in the schema. One such key can hold several Incident IDs, that is, it can contain several tickets with different IDs.
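As a small illustration of how this tuple key buckets tickets, the sketch below groups tickets by their (application name, category, sub-category) tuple; the Ticket record and its field names are hypothetical and only stand in for the fixed fields described above.

```java
import java.util.*;

// Sketch: bucketing tickets by the tuple of fixed fields of Section 2.2 (illustrative names).
record Ticket(String incidentId, String applicationName,
              String category, String subCategory, String summary) {

    // Tuple key of the relational schema: (application name, category, sub-category).
    String tupleKey() {
        return applicationName + " | " + category + " | " + subCategory;
    }

    // One key can hold several incident IDs, i.e. several tickets share a tuple.
    static Map<String, List<Ticket>> groupByTuple(List<Ticket> tickets) {
        Map<String, List<Ticket>> groups = new HashMap<>();
        for (Ticket t : tickets)
            groups.computeIfAbsent(t.tupleKey(), k -> new ArrayList<>()).add(t);
        return groups;
    }
}
```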
3. DEEP NEURAL NETWORK (DNN) FOR RECOMMENDING RESOLUTION
Artificial Neural Networks (ANNs) or Neural Networks (NNs) are networks inspired by biological neural networks. These are made up of interconnected processing units. They are used for tasks such as estimation, function approximation and pattern recognition.

We use feed-forward Deep Neural Networks (DNNs) that have multiple hidden layers. Each layer in a feed-forward Neural Network adds its own level of non-linearity that can solve more complex problems. Our DNN model processes input vectors in two stages. First, the DNN maps the high-dimensional sparse text features of a document layer by layer into a low-dimensional feature vector. In the next stage, the low dimensional feature vectors are passed through a cosine similarity computation gate.

The architecture of the deep neural network (DNN) [9] that we use to recommend resolutions for a new ticket is given in Fig 3. The objective of the DNN model is to find the ticket from the existing tickets with the summary that is the most similar wrt a new ticket. To achieve this goal, we can train the DNN with a set of ticket summaries coupled with their respective similar and dissimilar ticket summaries. These similar and dissimilar tickets are obtained through multiple stages of processing. At first, we assume that tickets with the chosen combination of fixed fields or tuple (application name, category and sub-category combination) are similar to some extent. Alternatively, we assume that tickets from different tuples are dissimilar to each other. To obtain similar tickets for a given ticket, we randomly pick a subset of tickets from the same tuple as that of the given ticket.

Figure 3: Structure of Deep Neural Network

However, we cannot definitely say that tickets from the same tuple are always similar. The combination of fixed fields, application name, category and sub-category (or a sub-combination), broadly buckets the tickets into the same group, but there may be differences among tickets within the same tuple. Moreover, since a user chooses attributes of these fixed fields (application name, category and sub-category) for a ticket in a menu-driven fashion, there is always a chance that the user may have chosen wrong entries for such menu-driven options.

To avoid this issue, we follow two steps to obtain better similar and dissimilar tickets which lead to better training of the model. Firstly, for choosing similar (dissimilar) tickets, we pick the tickets from the same (different) tuple for a given ticket, and then we use the approach based on semantic similarity to choose more similar (dissimilar) tickets out of those and filter out those tickets which are not that similar (dissimilar) wrt the given ticket. The semantic similarity approach is discussed in Subsection 3.1. Secondly, we have designed the DNN in such a way that it considers the tickets in pairs. By considering a pair of chosen similar (dissimilar) tickets, we ensure there is a higher chance that they are more similar (dissimilar) as a combination. It is like a committee of classifiers that improves the performance over a single classifier model [15]. We may consider multiple (≥ 2) chosen tickets for better similarity (dissimilarity), by which the mismatch (match) will be balanced out. In this paper, however, we consider only pairs of similar (dissimilar) tickets for training, for ease of computation.

3.1 Finding semantically similar tickets for training
For training the DNN we need to find similar tickets corresponding to a given ticket. Towards that we use a technique based on descriptive features for computing the similarity of the summaries of two tickets [13]. In this approach (SS), a sentence is a sequence of words each of which carries useful information. A joint word set is formed dynamically using all the distinct words in a pair of sentences. A raw semantic vector is carved out of each sentence with the help of a lexical database which is augmented with information content for words. Finally we calculate the semantic similarity of two tickets T_i and T_j using the cosine distance between these two semantic vectors, denoted as sem-sim(T_i, T_j). This semantic similarity of two tickets is different from the similarity of two tickets in the probabilistic IR model in Section A of the Appendix.

Moreover, we say a ticket T_i is semantically similar (or simply similar) to a ticket T_j if sem-sim(T_i, T_j) > 0.5. Otherwise, T_i is dissimilar to T_j. For training the DNN we need to choose tickets which are semantically similar to the ticket T_i belonging to the same tuple τ. Given a ticket T_i having tuple τ, one needs to pick tickets at random and check if they are similar to T_i. Similarly we need to randomly pick up tickets having tuples other than τ which are dissimilar to T_i. However, we would like to restrict the number of draws in such a way that the probability of choosing similar/dissimilar tickets is maximized.
It is possible to find out the probability of choosing tickets which are similar to a ticket belonging to the same tuple and also those which are dissimilar. Suppose there are N_τ tickets which have the same tuple τ. Further assume that there are N_τ^i tickets having tuple τ which are similar to ticket T_i. Then the probability of choosing a ticket similar to ticket T_i is p = N_τ^i / N_τ. Also the probability of failure, that is, choosing a ticket which is dissimilar to ticket T_i, is q = 1 − p. As we are required to pick a pair of tickets, we need to decide on the number of draws of tickets from the collection for which there is a maximum chance of choosing at least 2 tickets which are similar to the ticket T_i. Based on the computation in Section B of the Appendix we decide to draw 5 tickets and choose the two tickets which have the highest semantic similarity values with the given ticket T_i.

Next we find an optimal number of draws in which exactly 2 dissimilar tickets are chosen. Recall that we pick up dissimilar tickets from the set of tickets having a tuple other than τ (to which the given ticket T_i belongs). We argue that it is enough to draw 2 tickets from the collection of tickets with non-τ tuples and select the tickets having less similarity with T_i. To see this, observe that in this case the success corresponds to the event of selecting tickets which are dissimilar to T_i. Drawing dissimilar tickets from a collection of tickets having tuples other than τ can be seen as independent events. Using the argument in Section C of the Appendix we choose to draw 2 tickets randomly from the collection of tickets having a non-τ tuple.
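A compact sketch of this pair-selection procedure is given below. It is illustrative only: the SemSim interface stands in for the sentence-similarity measure of [13], the helper names are hypothetical, and at least two candidates are assumed to be available in each pool. Five candidates are drawn from the ticket's own tuple and the two most similar are kept; two candidates are drawn from other tuples as the dissimilar pair.

```java
import java.util.*;

// Sketch of the training-pair selection of Section 3.1 (names are illustrative).
class PairSelection {
    // Stand-in for the sentence semantic similarity measure of [13].
    interface SemSim { double of(String a, String b); }

    // Draw 5 random same-tuple summaries and keep the 2 most similar to the given one.
    // Assumes sameTuple has at least 2 entries.
    static List<String> similarPair(String givenSummary, List<String> sameTuple,
                                    SemSim semSim, Random rnd) {
        List<String> candidates = new ArrayList<>(sameTuple);
        Collections.shuffle(candidates, rnd);
        List<String> drawn = new ArrayList<>(candidates.subList(0, Math.min(5, candidates.size())));
        drawn.sort(Comparator.comparingDouble(
                (String s) -> semSim.of(givenSummary, s)).reversed());
        return drawn.subList(0, 2);
    }

    // Draw 2 random summaries from tuples other than τ as the dissimilar pair.
    // Assumes otherTuples has at least 2 entries.
    static List<String> dissimilarPair(List<String> otherTuples, Random rnd) {
        List<String> candidates = new ArrayList<>(otherTuples);
        Collections.shuffle(candidates, rnd);
        return candidates.subList(0, 2);
    }
}
```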
3.2 Structure of the DNN
The structure of the DNN for recommending a resolution for a new ticket is given in Fig 3. The network finds the similarity between the new ticket summary and a subset of existing ticket summaries. Prior to computing similarity, the DNN reduces the high dimensional feature vectors representing ticket summaries into low-dimensional vectors. For that, it uses DNNR, the part of the DNN that reduces the dimension. This is a multilayer feed-forward deep neural network.

Figure 4: DNNR: Part of DNN that reduces the dimension

The structure of our DNNR is given in Fig 4. The input layer of DNNR consists of n nodes, where n is the size of the feature vector of a ticket summary. We assume there are N − 1 hidden layers. Let \vec{x} be the input feature vector and y the output vector. Let h_i, i = 1, 2, . . . , N be the intermediate hidden layers, W_i be the ith weight matrix and b_i be the ith bias term. We have

h_1 = W_1 \vec{x},
h_i = f(W_i h_{i-1} + b_i), \quad i = 2, 3, \ldots, N-1,    (1)
y = f(W_N h_{N-1} + b_N).

We use tanh as the activation function at the output layer and at the hidden layers. Recall the tanh function is defined as

f(z) = \tanh(z) = \frac{1 - e^{-2z}}{1 + e^{-2z}}.    (2)

The output of DNNR is passed through a cosine similarity function as shown in Fig 3. As we consider pairs of tickets, the outputs of two cosine similarity function nodes are combined.
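A literal reading of Eqns (1)-(2) gives the forward pass below. This is only a sketch using plain arrays (no training code and not the authors' Java implementation); the indexing convention for W and b is an assumption made for illustration.

```java
// Sketch of the DNNR forward pass of Eqns (1)-(2): h1 = W1*x,
// h_i = tanh(W_i*h_{i-1} + b_i) for i = 2..N-1, y = tanh(W_N*h_{N-1} + b_N).
class DNNR {
    final double[][][] W;   // W[i] is the weight matrix of layer i+1
    final double[][] b;     // b[i] is the bias of layer i+1 (b[0] unused, as in Eqn 1)

    DNNR(double[][][] W, double[][] b) { this.W = W; this.b = b; }

    static double[] affine(double[][] w, double[] x, double[] bias) {
        double[] out = new double[w.length];
        for (int r = 0; r < w.length; r++) {
            double s = (bias == null) ? 0.0 : bias[r];
            for (int c = 0; c < x.length; c++) s += w[r][c] * x[c];
            out[r] = s;
        }
        return out;
    }

    static double[] tanh(double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = Math.tanh(v[i]);
        return out;
    }

    // Maps a high-dimensional TF*IDF ticket vector to the low-dimensional output y.
    double[] forward(double[] x) {
        double[] h = affine(W[0], x, null);                 // h1 = W1 * x
        for (int i = 1; i < W.length - 1; i++)
            h = tanh(affine(W[i], h, b[i]));                // hidden layers
        return tanh(affine(W[W.length - 1], h, b[W.length - 1]));  // output layer y
    }
}
```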

3.3 Training of the DNN model
For the training of the DNN, we take a set of M ticket summaries (T_m, m = 1, 2, . . . , M). The summary of each ticket T_m is coupled with one pair of similar tickets T^{m+}_{(i,j)} and three pairs of dissimilar tickets T^{m-}_{(i',j')}. We use the abridged notation T^{m+}_{(i,j)} for the chosen similar tickets T_i, T_j corresponding to the ticket T_m. Similarly, we use the notation T^{m-}_{(i',j')} for the chosen dissimilar tickets T_{i'}, T_{j'} corresponding to the ticket T_m. These similar (dissimilar) tickets are chosen using the method mentioned earlier. These four pairs of similar and dissimilar tickets are represented by a set 𝒯_m. The given ticket T_m and each of the pairs of similar and dissimilar tickets are fed to the DNNR one by one. Let y_m be the output feature vector for T_m, and y_i and y_j be the output feature vectors for T_i and T_j respectively.

The cosine similarity between one output y_m and another output y_i is computed using Eqn 3. Then the cosine similarities of the two tickets (T_i, T_j) wrt the ticket T_m are combined to generate R(T_m, T_{(i,j)}) in Eqn 4. The same equations are used for computing the cosine metric and R-value for dissimilar pairs.

\mathrm{cosine}(y_m, y_i) = \frac{y_m^T y_i}{\lVert y_m \rVert \, \lVert y_i \rVert}    (3)

R(T_m, T_{(i,j)}) = \frac{1}{2}\big(\mathrm{cosine}(y_m, y_i) + \mathrm{cosine}(y_m, y_j)\big)    (4)

These R-values of pairs of similar and dissimilar tickets wrt T_m are fed to the Softmax function as shown in Fig 3. The Softmax function computes posterior probabilities [9]. The posterior probability for R(T_m, T^m_{(i,j)})^3 is given in Eqn 5 below:

P(T^m_{(i,j)} \mid T_m) = \frac{\exp\big(\gamma R(T_m, T^m_{(i,j)})\big)}{\sum_{T^m_{(i',j')} \in \mathcal{T}_m} \exp\big(\gamma R(T_m, T^m_{(i',j')})\big)},    (5)

where γ is the smoothing parameter of the Softmax function. As our objective is to find the most similar ticket (pairs) for a given ticket T_m, we maximize the posterior probability for the similar (or positive) pairs. Alternatively, we minimize the following loss function

L(\Omega) = -\log \prod_{(T_m, T^{m+}_{(i,j)})} P(T^{m+}_{(i,j)} \mid T_m),    (6)

where Ω denotes the set of parameters {W_i, b_i : i = 1, 2, . . . , N} of the neural networks. L(Ω) is differentiable wrt Ω as it is continuous and its (partial) derivatives are also continuous. So, the DNN can be trained using a gradient-based numerical optimization algorithm. The detailed derivation is given in Section D of the Appendix.

^3 This posterior probability resembles the probabilistic IR model similarity in Section A of the Appendix for ranking tickets.
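The quantities in Eqns (3)-(6) can be computed directly from the DNNR outputs. The sketch below is illustrative only; it computes the cosine similarity, the R-value of a ticket pair, and the per-sample loss of the positive pair against the negative pairs, with gamma treated as a plain parameter.

```java
// Sketch of the similarity and loss computations of Eqns (3)-(6).
// The y-vectors are DNNR outputs; gamma is the softmax smoothing parameter.
class SimilarityLoss {
    static double cosine(double[] a, double[] b) {                    // Eqn (3)
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i]*b[i]; na += a[i]*a[i]; nb += b[i]*b[i]; }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static double rValue(double[] ym, double[] yi, double[] yj) {     // Eqn (4)
        return 0.5 * (cosine(ym, yi) + cosine(ym, yj));
    }

    // Softmax posterior of the positive pair against all pairs (Eqn 5) and the
    // per-sample loss -log P(positive | T_m) that Eqn (6) sums over samples.
    static double loss(double rPositive, double[] rNegatives, double gamma) {
        double denom = Math.exp(gamma * rPositive);
        for (double r : rNegatives) denom += Math.exp(gamma * r);
        double posterior = Math.exp(gamma * rPositive) / denom;
        return -Math.log(posterior);
    }
}
```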
3.4 Validation of the DNN model
The resolution for a new ticket T is recommended as follows. At first we seek the tuple τ corresponding to T. Then the existing tickets belonging to τ are fed to the DNN. We assume that a ticket having the tuple τ is relevant, at least partially, to the tickets belonging to the same τ in terms of resolution. To find the most relevant resolution corresponding to those tickets, we use the DNN to find the most similar ticket summary wrt the new ticket summary.

The feature vector of the new ticket T and those of the existing tickets {T_k, 1 ≤ k ≤ K} from the tuple τ are fed to DNNR one by one. Let y be the output of DNNR for T and y_k be the output of DNNR for T_k. We find the cosine similarity between y and y_k for each ticket T_k having tuple τ. Then we pair two cosine similarity outputs and obtain the R-value (Eqn 4) for each pair. There are K(K−1)/2 possible pairs. However, we take only K/2 distinct pairs of combinations. If K is odd then we consider the cosine similarity of the remaining ticket summary T_k wrt T together with one of the randomly chosen tickets in the τ-tuple and then compute the R-value. Note that we are computing the cosine similarity of each ticket in tuple τ with the new ticket; we do not leave out any ticket in the tuple.

The ticket pairs are ranked based on their R-values. The higher the value of R, the more similar the ticket pair wrt the new ticket T. Out of these K/2 (or ⌈K/2⌉) pairs, suppose the (T_u, T_v) pair obtains the maximum R-value. Then this pair collectively is the closest pair wrt T. The corresponding resolutions of these two summaries are then published as the recommended resolutions for the new ticket if the R-value of this pair is greater than a threshold value θ. If more than one pair has the same maximum R-value (> θ) or almost the same maximum R-value (within 1% variation) then we publish all the corresponding resolutions of these ticket pairs. In our experiment, we have considered θ = 0.8.
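The recommendation step can be sketched as follows, assuming a trained DNNR has already produced the low-dimensional vectors. The pairing of same-tuple tickets, the threshold θ = 0.8 and the 1% tie window follow the description above, while the pairing of the leftover ticket for odd K and all helper names are illustrative assumptions.

```java
import java.util.*;

// Sketch of the resolution recommendation of Section 3.4 (illustrative only).
class Recommender {
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i]*b[i]; na += a[i]*a[i]; nb += b[i]*b[i]; }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Rank disjoint pairs of same-tuple tickets by their R-value (Eqn 4) wrt the new
    // ticket and return the index pairs whose R is within 1% of the maximum, provided
    // the maximum exceeds the threshold theta (0.8 in the paper).
    static List<int[]> recommendPairs(double[] yNew, double[][] yExisting, double theta) {
        int k = yExisting.length;
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < k; i += 2) pairs.add(new int[]{i, i + 1});
        if (k % 2 == 1 && k > 1) pairs.add(new int[]{k - 1, 0});   // odd K: pair the leftover ticket

        double best = Double.NEGATIVE_INFINITY;
        double[] r = new double[pairs.size()];
        for (int p = 0; p < pairs.size(); p++) {
            int[] pr = pairs.get(p);
            r[p] = 0.5 * (cosine(yNew, yExisting[pr[0]]) + cosine(yNew, yExisting[pr[1]]));
            best = Math.max(best, r[p]);
        }
        List<int[]> winners = new ArrayList<>();
        if (best > theta)
            for (int p = 0; p < pairs.size(); p++)
                if (r[p] >= 0.99 * best) winners.add(pairs.get(p));
        return winners;   // the resolutions of the tickets in these pairs are published
    }
}
```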
4. EXPERIMENTAL RESULTS
In this section we report on our experiments. We implement our deep neural network using Java. We perform our experiments using our internal data sets. Our approach accommodates various architectures of the DNN with many hidden layers, each layer in turn containing a different number of nodes. However, in this paper we have considered one single architecture with two hidden layers. So, in our experiments, we have taken N = 2 + 1 = 3. The first and second hidden layers consist of 200 nodes and 100 nodes respectively. The output layer has only 50 nodes.

4.1 Internal ticket data
We have used data sets from different domains in ITIL services in our company to validate our methodology; these domains are AMD and Retail. The data from the AMD domain portrays information on application maintenance. It consists of 10653 tickets. As mentioned earlier, the ticket summaries are preprocessed and are represented by TF*IDF feature vectors. The dimension of a feature vector that represents a ticket summary is 1253. The Retail data includes 14379 tickets which contain data related to services to customers. Each ticket summary in this domain is represented by a 491-dimensional feature vector. The details of these two data sets are given in Table 1.

Domain   Total tickets   No. of tuples   Input feature vector dimension
AMD      10653           270             1253
Retail   14379           150             491

Table 1: Ticket data from different domains

4.2 Data Partition
We randomly pick 10% of each data set (tickets) as the test set and 20% of the data set as the training set. These 20% of the data set taken as the training set constitute the M tickets used for training. For each ticket of the training set (T_m, m = 1, 2, . . . , M), we take one pair of similar tickets and three pairs of dissimilar tickets from the remaining 70% of the data set (with repetition).

4.3 Recovering resolutions for new tickets
We train the DNN with 50 epochs or iterations. We attempt training of the model with a few values of the learning rate parameter ε_t (Eqn 7). The learning rate ε_t is uniform across the iterations. Out of these trained models using different values of the learning rate, we consider the model for which the learning curve (of the loss function being minimized) was steadily decreasing.

After obtaining the trained model, we validated the model using the test set. For each ticket of the test set, our DNN model recommends two or more resolutions if the R-value of the winning (best) pair(s) is greater than a threshold value θ. In our experiment, we have considered θ = 0.8.

We compared the actual resolution of each test ticket with the recommended resolution pair using a semantic similarity score ranging between 0 and 1. Then we computed the average semantic similarity score over all recommended cases. The total number of test tickets, the number of test tickets for which the model recommended resolutions with θ = 0.8, and the average semantic similarity are given for both data sets in Table 2.

Domain   Total test tickets   #Recommended tickets   Average Similarity (SS evaluator)   Average Similarity (manual evaluation)
AMD      816                  369                    0.71                                0.92
Retail   1384                 618                    0.67                                0.70

Table 2: The performance of DNN on recommending resolutions

Along with the semantic similarity approach (SS)-based evaluation we have also manually evaluated the similarity between two tickets. Towards this we inspected the actual resolution of each ticket and the corresponding recommended resolutions for evaluation purposes. We use three similarity scores of 0, 0.5 and 1. If the meanings of a pair of actual resolution and recommended resolution appear to be the same (using meta language oriented informal semantics) then we assign a similarity score of 1 to this pair. If we find that the meanings of the elements of this pair are not exactly the same, but there is some match, then we provide a score of 0.5 to this pair. Otherwise (in case the resolutions completely differ in their meaning) we score this pair 0. As before we calculate the average manual similarity score over all test tickets.
The effectiveness of our DNN approach in finding similar ticket summaries (comparison between the actual test summary and the summaries of recommended tickets) is given in Table 3. It shows that our DNN approach could successfully find similar ticket summaries for a given new ticket summary. The lower similarity values for recommending resolutions (Table 2) are primarily due to the variations in resolutions as documented by different maintenance personnel for even the same ticket summary.

Domain   Average similarity (SS evaluator)   Average Similarity (manual evaluation)
AMD      0.93                                0.98
Retail   0.83                                0.90

Table 3: The performance of DNN on finding similar ticket summaries

The performance of the DNN for the top 5 tuples (those with the maximum number of tickets) in the test set is given in Table 4 for the AMD domain. In this result, we count a match if the semantic similarity score using the SS approach between the actual resolution and the recommended resolution is greater than 0.5. We have given the names of the tuples in the order of application name, category and sub-category.

Tuple Name (application name, category, sub-category)                                No. of test tickets   No. of matches   % of matches
Report Masks, Administration, Separation Request                                     99                    99               100.00
AS400 Legacy - Direct Response, Software, Application Errors                         19                    12               63.16
AS400 Legacy - Manufacturing/Packaging, Administration, Configuration                47                    34               72.34
AS400 Legacy - Purchasing, Software, Application Functionality Issue                 8                     7                87.50
AS400 Legacy - Manufacturing/Packaging, Software, Application Functionality Issue    2                     1                50.00

Table 4: The performance of DNN in top 5 tuples

We have also conducted an experiment without employing the semantic similarity approach (SS) during training. In this case we pick 2 tickets randomly from the same tuple as that of the original ticket and compute their cosine similarity wrt the latter. The results are shown in Table 5. It highlights the fact that our DNN approach does not depend much on the SS approach to select similar/dissimilar tickets during training.

Domain   Total test tickets   #Recommended tickets   Average Similarity (manual evaluation)
AMD      822                  363                    0.89
Retail   1384                 583                    0.68

Table 5: The performance of DNN on recommending resolutions without using SS approach during training

4.4 Comparison with other methods
We now compare our method with two approaches, one based on the cosine similarity of the feature vectors of tickets (approach 2) and another related to clustering and kNN-search (approach 3). Both approaches are described below.

Approach 2: Cosine-based similarity of tickets: In this approach we consider the TF*IDF vector representation of ticket summaries (the input to the DNN) and compute the similarity of a new ticket with other tickets in the same tuple to find the most similar ticket for resolution recovery. In this approach the feature vector of a ticket T is represented as (x_1, . . . , x_n) using n terms/keyphrases, where x_i denotes the TF*IDF weight of the ith term in ticket T. Given two tickets T and T' with their feature vector representations (x_1, . . . , x_n) and (y_1, . . . , y_n) respectively, their cosine-based similarity [12] is given by

sim_C(T, T') = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}.

Once a new ticket T' arrives, we find the tuple associated with it. Then we compute the similarity of T' with each ticket T in the same tuple. Then we pick out the ticket having the maximum similarity (tickets having similarity within a tolerance of 0.05) and publish its resolution as the recommended resolution for the new ticket.

The performance of the simple cosine similarity (approach 2) is given in Table 6. Note that the performance using the simple cosine similarity approach is lower than that of the DNN approach.

Domain   Total test tickets   #Recommended tickets   Average Similarity (SS evaluator)
AMD      816                  442                    0.56
Retail   1384                 988                    0.60

Table 6: The performance of simple cosine similarity approach on resolution recommendation

For this simple cosine similarity approach, we have also provided the performance in the top 5 tuples of the test set for the AMD domain in Table 7.

Tuple Name (application name, category, sub-category)                                No. of test tickets   No. of matches   % of matches
Report Masks, Administration, Separation Request                                     99                    99               100.00
AS400 Legacy - Direct Response, Software, Application Errors                         65                    41               63.08
AS400 Legacy - Manufacturing/Packaging, Administration, Configuration                41                    28               68.29
AS400 Legacy - Purchasing, Software, Application Functionality Issue                 20                    14               70.00
AS400 Legacy - Manufacturing/Packaging, Software, Application Functionality Issue    5                     2                40.00

Table 7: The performance of simple cosine similarity in top 5 tuples

Approach 3: Clustering and kNN-search: In this approach we consider tickets as bags of items for grouping them into clusters, and we place the new ticket in the appropriate cluster to find its k nearest neighbors for recovering its resolution [18]. In this case we consider both the fixed field entries and the free field (summary) for finding the feature vector of the tickets. That is, a profile of a ticket is given as T = (x_1, . . . , x_k, x_{k+1}, . . . , x_n), where x_1, . . . , x_k are the proper representations (categorical or bucket) of the feature elements corresponding to the fixed field entries of T and x_{k+1}, . . . , x_n are the appropriate weights (we use TF*IDF [20] in this case also) for the chosen keywords/keyphrases from the free form entries of T. For more details see [17]. Also we consider a hybrid approach for computing the distance metric. We use the Jaccard metric to compute the distance on fixed
element entries of the feature space and the Cosine distance metric for keyphrases. Then we take a convex combination of these distances to formulate our metric [18, 17].
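A sketch of such a hybrid distance is given below. It assumes Jaccard distance on the categorical fixed-field values and cosine distance on the TF*IDF keyphrase vector, combined with a convex weight lambda; the weight and helper names are illustrative and not the values used in [18, 17].

```java
import java.util.*;

// Sketch of a hybrid ticket distance: Jaccard distance on fixed-field values combined
// with cosine distance on the TF*IDF keyphrase vector (convex combination, weight lambda).
class HybridDistance {
    static double jaccardDistance(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a); inter.retainAll(b);
        Set<String> union = new HashSet<>(a); union.addAll(b);
        return union.isEmpty() ? 0.0 : 1.0 - (double) inter.size() / union.size();
    }

    static double cosineDistance(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) { dot += x[i]*y[i]; nx += x[i]*x[i]; ny += y[i]*y[i]; }
        return 1.0 - dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    // Convex combination of the two distances; lambda in [0,1] is a tunable weight.
    static double distance(Set<String> fixedA, double[] freeA,
                           Set<String> fixedB, double[] freeB, double lambda) {
        return lambda * jaccardDistance(fixedA, fixedB)
             + (1.0 - lambda) * cosineDistance(freeA, freeB);
    }
}
```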
Given the set of tickets and a hybrid distance metric on them, we group the tickets into different clusters. Once a new ticket arrives we perform natural language parsing on its summary to identify keywords, and thus generate the feature vector for it based on these keywords (keyphrases). Then we compute the distance of this ticket from all the clusters and place the ticket in the particular cluster to which it is closest. Further, we shortlist a group of tickets based on the tuple of the new ticket. Using a kNN-based approach we choose the tickets in this shortlist which are nearest to the new ticket using the distance metric and publish the corresponding resolutions as the recovered resolutions for the new ticket.

The performance of the third approach (clustering and kNN search) is given in Table 8. Our DNN approach performed better than both the cosine similarity approach and the clustering and kNN search method.

Domain   Average semantic similarity
AMD      0.38
Retail   0.50

Table 8: The performance of clustering and kNN-search method

5. RELATED WORK
There are some pieces of work which deal with recommendation of resolutions for incoming tickets. Notable among them is [26], where the authors proposed a kNN (k-nearest neighbor)-based method to recommend resolutions for event tickets using a similarity metric between events and the past resolutions of other events. The method proposed therein heavily depended on the underlying similarity measure in kNN. In a similar work [23] an analysis of the historical event tickets from a large service provider was carried out. Two resolution-recommendation algorithms for event tickets were proposed on historical tickets by considering false positive tickets which would often be generated by monitoring systems. The authors built on the work of kNN-based algorithms for recommending resolutions in [25], where they used SCL (structural correspondence learning) based feature adaptation to uncover feature mappings in different time intervals, as ticket descriptions differ when the servers’ environments change over a period of time during which the resolutions remain unchanged. Further, they applied this algorithm on tickets grouped by different time interval granularities to account for the periodic regularities present in ticket datasets.

All the methods proposed above for recommending resolutions exploited the underlying similarity measure in kNN-based search. In [18], Roy et al. proposed a method meant to be applied in a more general setting. In this approach it was possible to identify tickets similar to a new ticket by suitably clustering the tickets using a Cosine metric. In particular, the authors put forth an automated method based on unsupervised learning for recovering resolutions for incoming tickets using the traditional kNN search, by which they were able to achieve a promising similarity match of about 48% between the suggestions and the actual resolution.

The idea of using deep learning in our work originates from work on learning deep structured latent models for web search [7, 8, 9]. Latent semantic models [5, 2] have been used to bridge the gap between Web documents and search queries by mapping a query to its relevant documents at the semantic level and grouping different terms appearing in similar contexts into the same semantic cluster. There have been further extensions of these approaches. New models for clickthrough data like Bi-lingual Topic Models (BLTMs) and linear Discriminative Projection Models (DPMs) [7, 8] (which consist of queries and their clicked documents) have been used to plug the gap between search queries and web documents. In another approach, the authors in [19] have extracted a hierarchical semantic structure embedded in the query and the document via deep learning. Combining both approaches, Huang et al. have proposed a series of deep structured semantic models for ranking a set of documents for a given query in the following manner [9]. They projected the query and the documents to a common semantic space, following which the relevance of each document wrt the given query was judged using the cosine similarity between their vectors in the common semantic space. By maximizing the conditional likelihood of the clicked document given the query they trained their neural network models.

Motivated by these ideas we use deep structured models to recommend resolutions for tickets in ITIL services. We project existing tickets and a new ticket to a common low dimensional space and then compute the similarity of the new ticket with the other tickets. We select the ticket which has the highest similarity with the new ticket and pick the resolution of the former as the recommended resolution for the new ticket. For training purposes we maximize the posterior probability of the new ticket with similar tickets. However, we do not need to hash the input ticket vectors, because tickets are short text data and hashing increases the dimension of the feature vectors instead of reducing it.

6. CONCLUSION
In this work we have proposed a deep learning based recommendation algorithm for recovering resolutions for incoming tickets in ITIL services. Our learning algorithm provides some advantages over the traditional resolution recommendation techniques. The methods for recommending resolutions in [26, 23, 25] use similarity measures to compute the k nearest neighbors of the incoming ticket to suggest resolutions for the latter. For improving the similarity measure used in kNN the authors therein utilize both the event and resolution information in historical tickets via topic-level feature extraction using the LDA (Latent Dirichlet Allocation) model. In our deep learning based technique we use a high dimensional input vector for tickets which does not require much natural language processing. Further, we do not need to compare the similarity of each ticket in the repository with the new ticket. The clustering-based recommendation algorithm [18] alleviates the comparison of the new ticket with each existing ticket; however, proper clustering of tickets is challenging as they have short text content. Also, in our method we use deep structured models for ranking the tickets in the same tuple against the new ticket. Thus by mapping the high-dimensional input vectors of tickets to low-dimensional vectors we avoid the “curse of dimensionality” phenomenon.
7. REFERENCES
[1] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
[2] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537, 2011.
[4] M.-C. de Marneffe, B. MacCartney, and C. D. Manning. Generating typed dependency parses from phrase structure parses. In International Conference on Language Resources and Evaluation (LREC’06), pages 449–454, 2006.
[5] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6):391–407, 1990.
[6] L. Deng, X. He, and J. Gao. Deep stacking networks for information retrieval. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’13, Vancouver, BC, Canada, pages 3153–3157, 2013.
[7] J. Gao, X. He, and J. Nie. Clickthrough-based translation models for web search: from word models to phrase models. In Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM’10, pages 1139–1148, 2010.
[8] J. Gao, K. Toutanova, and W. Yih. Clickthrough-based latent semantic models for web search. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’11, pages 675–684, 2011.
[9] P. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. P. Heck. Learning deep structured semantic models for web search using clickthrough data. In 22nd ACM International Conference on Information and Knowledge Management, CIKM’13, pages 2333–2338, 2013.
[10] D. Johnson. NOC Internal Integrated Trouble Ticket System Functional Specification Wishlist. RFC 1297, 1992.
[11] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521:436–444, 2015.
[12] J. Leskovec, A. Rajaraman, and J. Ullman. Mining of Massive Datasets. Cambridge University Press, 2nd edition, 2014.
[13] Y. Li, D. McLean, Z. Bandar, J. O’Shea, and K. A. Crockett. Sentence similarity based on semantic nets and corpus statistics. IEEE Trans. Knowl. Data Eng., 18(8):1138–1150, 2006.
[14] A. Medem, M.-I. Akodjenou, and R. Teixeira. Troubleminer: Mining network trouble tickets. In Integrated Network Management-Workshops, IM’09, 2009.
[15] L. Rokach. Ensemble-based classifiers. Artificial Intelligence Review, 33(1-2):1–39, 2010.
[16] S. Ross. A First Course in Probability. Pearson Education Limited, 9th edition, 2014.
[17] S. Roy, D. P. Muni, J. Y. T. Yan, N. Budhiraja, and F. Ceiler. Clustering and labeling IT maintenance tickets. In Service-Oriented Computing - 14th International Conference, ICSOC 2016, Banff, AB, Canada, Proceedings, pages 829–845, 2016.
[18] S. Roy, J. Y. T. Yan, N. Budhiraja, and A. Lim. Recovering resolutions for application maintenance incidents. In IEEE International Conference on Services Computing, SCC’16, San Francisco, CA, USA, pages 617–624, 2016.
[19] R. Salakhutdinov and G. E. Hinton. Semantic hashing. Int. J. Approx. Reasoning, 50(7):969–978, 2009.
[20] G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 1988.
[21] J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
[22] R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL’12, pages 1201–1211, 2012.
[23] L. Tang, T. Li, L. Shwartz, and G. Grabarnik. Recommending resolutions for problems identified by monitoring. In IFIP/IEEE International Symposium on Integrated Network Management (IM’13), pages 134–142, 2013.
[24] G. Tür, L. Deng, D. Hakkani-Tür, and X. He. Towards deeper understanding: Deep convex networks for semantic utterance classification. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’12, pages 5045–5048, 2012.
[25] W. Zhou, T. Li, L. Shwartz, and G. Y. Grabarnik. Recommending ticket resolution using feature adaptation. In 11th International Conference on Network and Service Management, CNSM’15, pages 15–21, 2015.
[26] W. Zhou, L. Tang, T. Li, L. Shwartz, and G. Grabarnik. Resolution recommendation for event tickets in service management. In IFIP/IEEE International Symposium on Integrated Network Management, IM’15, pages 287–295, 2015.

APPENDIX
A. PROBABILISTIC IR MODEL FOR CLASSIFICATION
In this model the historical tickets are ranked according to the probability of being relevant to a new ticket. By this one can compute the semantic similarity of an incoming ticket T' with an existing ticket T in the repository using a similarity function based on conditional probability, sim(T', T) = P(T'|T). Applying Bayes’ rule [16], we can rewrite the above as

P(T'|T) = \frac{P(T|T')\,P(T')}{P(T)}.

For a given class of new tickets, we can take P(T) to be constant and can further assume that all tickets have the same probability. Therefore, in the presence of historical tickets such as T, any incoming ticket T' can be ranked using

sim(T', T) = P(T'|T) = \frac{P(T|T')\,P(T')}{P(T)} = P(T|T').

We shall use a similar probabilistic IR model for ranking tickets.
B. OPTIMIZING THE NUMBER OF DRAWS FOR FINDING SIMILAR TICKETS
We seek to find the probability of choosing tickets which are similar to a ticket belonging to the same tuple. Suppose there are N_τ tickets which are in the same tuple τ. Further assume that there are N_τ^i tickets having tuple τ which are similar to ticket T_i with the same tuple τ. Then the probability of choosing a ticket similar to ticket T_i is p = N_τ^i / N_τ. Then the probability of failure, that is, choosing a ticket which is dissimilar to ticket T_i, is q = 1 − p. As we are required to pick a pair of tickets, we decide on the number of draws of tickets from the collection for which there is a maximum chance of choosing at least 2 tickets which are similar to the ticket T_i.

Suppose we are allowed m draws of tickets from the collection of tickets in tuple τ. We need to determine the probability of picking at least 2 tickets similar to the given ticket T_i. Each such draw of a ticket from the collection can be seen as a Bernoulli trial with probability of success p and probability of failure q. Let X be a random variable which takes the value of the number of tickets similar to T_i drawn from the collection of tickets with tuple τ. Then X follows a binomial distribution with the number of draws being m and the probability of success p.

We need to determine P(X ≥ 2). We find the probability of the complementary event P(X < 2) and then compute 1 − P(X < 2). Now P(X = k) = \binom{m}{k} p^k q^{m-k}, so we have

P(X < 2) = \binom{m}{0} p^0 q^m + \binom{m}{1} p^1 q^{m-1}, \quad P(X \ge 2) = 1 - q^m - m p q^{m-1}.

For different values of q we plot P(X ≥ 2) against m. The resulting graph asymptotically approaches 1. For different values of q less than 0.5 we observe that the curve begins to approach 1 when m is close to 5. In view of the above we choose the number of draws to be 5. That is to say, we randomly pick 5 tickets from the collection of tickets with tuple τ and select the two tickets which have higher semantic similarity with the given ticket T_i.
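The choice m = 5 can be checked numerically. The snippet below simply evaluates P(X ≥ 2) = 1 − q^m − m p q^(m−1) for a few values of p and m; it is purely illustrative and the printed values are not part of the paper.

```java
// Numerical check of Appendix B: P(X >= 2) = 1 - q^m - m*p*q^(m-1) for a binomial X.
class DrawProbability {
    static double pAtLeastTwo(double p, int m) {
        double q = 1.0 - p;
        return 1.0 - Math.pow(q, m) - m * p * Math.pow(q, m - 1);
    }

    public static void main(String[] args) {
        for (double p : new double[]{0.5, 0.6, 0.7}) {        // i.e. q = 1 - p <= 0.5
            for (int m = 2; m <= 6; m++) {
                System.out.printf("p=%.1f m=%d P(X>=2)=%.3f%n", p, m, pAtLeastTwo(p, m));
            }
        }
    }
}
```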

C. OPTIMAL NUMBER OF DRAWS FOR FINDING OUT DISSIMILAR TICKETS
Now we need to find an optimal number of draws in which exactly 2 dissimilar tickets are chosen. Recall that we pick up dissimilar tickets from the set of tickets having a tuple other than τ (the given ticket T_i belongs to this tuple). We argue that it is enough to draw 2 tickets from the collection of tickets with non-τ tuples and select the tickets having less similarity with T_i, because this draw has a high probability of containing two dissimilar tickets.

To see this, observe that the success will correspond to the event of selecting tickets which are dissimilar to T_i. Drawing dissimilar tickets from a collection of tickets having tuples other than τ can be seen as independent events. Suppose we keep on picking up tickets from the collection of tickets having non-τ tuples until r dissimilar tickets are picked. If X denotes the number of trials required (note X follows a negative binomial distribution), then

P(X = m) = \binom{m-1}{r-1} q^r (1-q)^{m-r}.

In this case r = 2. Hence P(X = m) = (m − 1) q^2 (1 − q)^{m−2}. For m = 2 this value becomes q^2. For picking dissimilar tickets from non-τ tuples we can take a higher value of q = 0.7; then q^2 ≈ 0.5. As q^2 (1 − q)^{m−2} is a monotonically decreasing function of m (for fixed q), the above probability value always decreases. Thus we draw 2 tickets randomly from the set of tickets having tuples other than τ.

D. GRADIENT DESCENT
The DNN can be trained using gradient-based numerical optimization algorithms [9] because L(Ω) is differentiable wrt Ω. The parameters in Ω are updated as

\Omega_t = \Omega_{t-1} - \epsilon_t \left.\frac{\partial L(\Omega)}{\partial \Omega}\right|_{\Omega = \Omega_{t-1}},    (7)

where ε_t is the learning rate at the tth iteration, and Ω_t and Ω_{t−1} are the model parameters at the tth and (t − 1)th iterations, respectively.

Let M be the number of ticket summaries (T_m). We consider the combination of a pair of similar (positive) ticket summaries T^{m+}_{(i,j)} and three pairs of dissimilar (negative) ticket summaries T^{m-}_{(i',j')}, 1 ≤ i', j' ≤ 3, for training the DNN. Then we can denote each m-th combination as (T_m, T^{m+}_{(i,j)}), and we can write

L(\Omega) = L_1(\Omega) + L_2(\Omega) + \cdots + L_m(\Omega) + \cdots + L_M(\Omega),    (8)

where L_m(\Omega) = -\log P(T^{m+}_{(i,j)} \mid T_m), \quad 1 \le m \le M,    (9)

and \frac{\partial L(\Omega)}{\partial \Omega} = \sum_{m=1}^{M} \frac{\partial L_m(\Omega)}{\partial \Omega}.    (10)

Expanding L_m(Ω):

L_m(\Omega) = -\log P(T^{m+}_{(i,j)} \mid T_m)
= -\log \left( \frac{\exp(\gamma R(T_m, T^{m+}_{(i,j)}))}{\sum_{T^m_{(i',j')} \in \mathcal{T}_m} \exp(\gamma R(T_m, T^m_{(i',j')}))} \right)
= \log \left( 1 + \frac{\sum_{T^{m-}_{(i',j')} \in \mathcal{T}_m \setminus T^{m+}_{(i,j)}} \exp(\gamma R(T_m, T^{m-}_{(i',j')}))}{\exp(\gamma R(T_m, T^{m+}_{(i,j)}))} \right)
= \log \left( 1 + \sum_{T^{m-}_{(i',j')}} \exp\big( \gamma R(T_m, T^{m-}_{(i',j')}) - \gamma R(T_m, T^{m+}_{(i,j)}) \big) \right)
= \log \left( 1 + \sum_{T^{m-}_{(i',j')}} \exp\big( -\gamma \big[ R(T_m, T^{m+}_{(i,j)}) - R(T_m, T^{m-}_{(i',j')}) \big] \big) \right),

where \Delta^m_{(i',j')} = R(T_m, T^{m+}_{(i,j)}) - R(T_m, T^{m-}_{(i',j')}).

Therefore,

L_m(\Omega) = \log \left( 1 + \sum_{T^{m-}_{(i',j')}} \exp(-\gamma \Delta^m_{(i',j')}) \right).    (11)
∂Lm (Ω) X m (i0 ,j 0 )
= α(i0 ,j 0 ) (12) ∂∆m
∂WN ∂WN ∂Lm (Ω) X m (i0 ,j 0 )
m−
T 0 0 = α(i0 ,j 0 ) (17)
(i ,j ) ∂Wl m−
∂W l
T 0 0
(i ,j )
where
m+ where
∂∆m
(i0 ,j 0 ) ∂R(Tm , T(i,j) ) ∂R(Tm , T(im−
0 ,j 0 ) )
= − (13)
"
∂∆m

∂WN ∂WN ∂WN (i0 ,j 0 ) 1 (Tm ,Tim+ ) T (Tm ,Tim+ ) T
= δl,Tm hl−1,Tm +δ hl−1,T m+
∂Wl 2 l,T m+
i i
and  
(Tm ,Tjm+ ) (Tm ,T m+ )
m −γ exp(−γ∆m (i0 ,j 0 ) ) + δl,Tm hTl−1,Tm + δ m+j hTl−1,T m+
α(i 0 ,j 0 ) = (14) l,Tj j
exp(−γ∆m
P
1+ T m− (i00 ,j 00 ) )
00 00  m−

(i ,j ) (Tm ,T 0 ) T (Tm ,T m−
0 ) T
− δl,Tm i
hl−1,Tm + δ m− i
hl−1,T m−
Let ym , yi and yj be the outputs of DNNR with Tm , Ti l,T 0
i i0
and Tj ticket summaries. #
(Tm ,T m− (Tm ,T m−

0 ) 0 )
m
∂R(Tm , T(i,j) ) T T − δl,Tm j
hTl−1,Tm + δ j
hTl−1,T m−
l,T m−
  
∂ 1 ym yi ym yj j 0 j0
= +
∂WN ∂WN 2 ||ym ||||yi || ||ym ||||yj || (18)
1 (Tm ,Ti ) T (Tm ,Ti ) T
= (δym hN −1,Tm + δyi hN −1,Ti This derivation is based on the derivation given in [9]. We
2
(T ,T ) (T ,T )
modify the derivation in taking into account pairs of tickets
+δymm j hTN −1,Tm + δyj m j hTN −1,Tj ) (documents) instead of a single ticket (document) while con-
(15) sidering positive (similar) and negative (dissimilar) tickets
(documents).
where,

δy(Tmm ,Ti ) = (1 − ym ) ◦ (1 + ym ) ◦ (bc1 yi − a1 c1 b3 ym )

δy(Ti m ,Ti ) = (1 − yi ) ◦ (1 + yi ) ◦ (bc1 ym − a1 bc31 yi )

(T ,Tj )
δymm = (1 − ym ) ◦ (1 + ym ) ◦ (bc2 yj − a2 c2 b3 ym )

(T ,Tj )
δyj m = (1 − yj ) ◦ (1 + yj ) ◦ (bc2 ym − a2 bc32 yj )

T T
a1 = ym yi , a 2 = ym yj

1 1 1
b= , c1 = , c2 =
||ym || ||yi || ||yj ||
The operator ‘◦’ denotes the element-wise multiplication.

For hidden layers, we also need to calculate δ for each ∆m


(i0 ,j 0 ) .
We calculate each δ in the hidden layer l through back prop-
agation as
(T ,Ti ) T (T ,T )
δl,Tm
m
= (1 + hl,Tm ) ◦ (1 − hl,Tm ) ◦ Wl+1 m i
δl+1,T m
(T ,Tj ) T (T ,T )
δl,Tm
m
= (1 + hl,Tm ) ◦ (1 − hl,Tm ) ◦ Wl+1 m j
δl+1,T m
(T ,Ti ) T (T ,T )
δl,Tm = (1 + hl,Ti ) ◦ (1 − hl,Ti ) ◦ Wl+1 m i
δl+1,T (16)
i i
(T ,Tj ) T (Tm ,Tj )
δl,Tm
j
= (1 + hl,Tj ) ◦ (1 − hl,Tj ) ◦ Wl+1 δl+1,T j

with
(T ,T ) (T ,T ) (T ,Tj )
m i
δN,T m
= δy(Tmm ,Ti ) , δN,T
m j
m
= δymm
(T ,Ti ) (T ,Tj ) (T ,Tj )
m
δN,T i
= δy(Ti m ,Ti ) , δN,T
m
j
= δyj m
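Putting Eqns (7) and (11) together, one run of training can be outlined as the loop below. This is only a sketch of the update rule: the gradient of Eqns (12)-(18) is abstracted behind a Gradient interface rather than implemented, and all names are illustrative.

```java
// Sketch of the parameter update of Eqn (7): Omega_t = Omega_{t-1} - eps_t * dL/dOmega.
// The gradient itself (Eqns 12-18) is abstracted behind the Gradient interface.
class GradientDescent {
    interface Gradient { double[] at(double[] omega); }   // dL/dOmega evaluated at omega

    // Runs a fixed number of epochs with a constant learning rate, as in Section 4.3.
    static double[] train(double[] omega, Gradient grad, double learningRate, int epochs) {
        double[] w = omega.clone();
        for (int t = 1; t <= epochs; t++) {
            double[] g = grad.at(w);
            for (int i = 0; i < w.length; i++) w[i] -= learningRate * g[i];
        }
        return w;
    }
}
```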
