Sunteți pe pagina 1din 31

International Journal of Information Technology and Web Engineering

Volume 14 • Issue 4 • October-December 2019

An Evolutionary Feature Clustering


Approach for Anomaly Detection Using
Improved Fuzzy Membership Function:
Feature Clustering Approach for Anomaly Detection
Gunupudi Rajesh Kumar, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India
https://orcid.org/0000-0001-7677-6823

Narsimha Gugulothu, JNTUH, Hyderabad, India


Mangathayaru Nimmala, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India

ABSTRACT

Traditionally, IDS have been developed by applying machine learning techniques and followed single
learning mechanisms or multiple learning mechanisms. Dimensionality is an important concern which
affects classification accuracies and eventually the classifier performance. Feature selection approaches
are widely studied and applied in research literature. In this work, a new fuzzy membership function
to detect anomalies and intrusions and a method for dimensionality reduction is proposed. CANN
could not address R2L and U2R attacks and have completely failed by showing these attack accuracies
almost zero. Following CANN, the CLAPP approach has shown better classifier accuracies when
compared to classifiers kNN, and SVM. This research aims at improving the accuracy achieved by
CLAPP, CANN, and kNN. Experimental results show accuracies obtained using proposed approach
is better when compared to other existing approaches. In particular, the detection of U2R and R2L
attacks to user accuracies are recorded to be very much promising.

Keywords
Anomaly, Attacks, Classification, Classifier, Feature Extraction, Feature Selection, Intrusion,
Membership Function

1. INTRODUCTION

The advancements in computing and communication technology made our life simple and every task
in daily life is driven by technology. Today our life depends on the internet for everything starting
from professional, personal needs to the domestic needs. Internet use for our daily needs such as
shopping, banking, hotel booking, daily news etc., becoming very common in our daily life. As the
computer literacy rate increasing day by day, the use of IT related services is increasing exponentially.
Previously cyber-attacks were limited to only organizations. But, now, the cyber-attacks not only

DOI: 10.4018/IJITWE.2019100102

Copyright © 2019, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


19
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

restricted to organizations but also to the personal computers, laptops or the mobile devices. This
clearly underlines the need for the protection at different levels.
When we investigate the history, first steps for intrusion detection laid by the United States Air
Force (USAF). For the first time in the literature, James P Anderson who was working as deputy for
command and management systems in USAF (Anderson, 1972) uses the phrase “malicious threat”
in his report to US government which, is defined as “external penetration threat in closed or open
systems and boil down to gaining an unauthorized access to classified data directly or indirectly”.
He proposed the use of reference monitor to safeguard classified data, which is currently called as
an intrusion detection system. The formal definition for intrusion detection can be given as “…the
process of discovering the presence or the possibility of presence, of unauthorized uses of network or
computing infrastructure activities on a continuous basis…” The first real-time Intrusion Detection
System (IDS) was researched and developed by Neumann during 1984 and 1986, named as Intrusion
Detection Expert System (IDES) which was developed as a rule-based system that detects malicious
activity from known threats, later known as Next Generation Intrusion Detection Expert System
(NIDES) (Schwab, 2015).
In the late 1990s the design approaches of intrusion detection systems were greatly improved
to accommodate the increase in volume and complexity of network attacks. Till the 1990s the IDS’s
were functioning on principle of correlating the signatures of a new incoming threat with the already
existing knowledge base of attack data, if any match found, will be declared as a threat. This approach
is called as signature-based intrusion detection system. These kinds of systems cannot detect new
threats. When a new attack arrived, will be passed through the IDS without any blockage and, only
on realization damage caused by the attack, its signature pattern will be added to the knowledge base,
thus enabling IDS to detect the same in future. Another method, named as an anomaly detection
system, learns from the history and understands the behavioral patterns of normal packets and
abnormal packets. On receiving any abnormal packet even though its signature is not matching
with the knowledge base, it understands its behavior and warns the user about the intrusion. Strictly
speaking both the approaches are important. Why, because we must shield the IT infrastructure from
known and unknown threats.
There are two levels of intrusion detection system host-based intrusion detection system (HIDS)
and network-based intrusion detection system (NIDS). NIDS fails when some malicious activity takes
place inside the organization’s network behind NIDS, it fails to identify as it is not in its scope and
NIDS is engaged in the process of incoming packets. It happens when a malicious packet enters the
network because of overflow or congestion at the entry point of the organization’s network. Thus, it is
bypassed by the network security mechanism and creates damage within the organization’s network.
Hence, network-based intrusion detection cannot protect alone. Alternately, HIDS can be combined
along with HIDS for combating against incoming threats. HIDS can be helpful in the collection of
audit data from different hosts within the organization’s network and report back to NIDS and NIDS
acts accordingly in the case of any detection of malicious activity.
During the period between 1990 and 2000, the majority of the networks migrated from obsolete
protocol suites to the TCP/IP suite. Moreover, the majority of the applications are based on the
internet, as internet usage started growing. Traditional IDSs which were designed for legacy network
standards started becoming obsolete and the rate of false positives was increasing rapidly which
resulted in the gradual diminishing of the demand for such detection systems. With the revolution in
cloud technologies, the demand for security in the cloud environment again triggered and there was a
drastic change in the design of the intrusion detection systems. Security, being one of the prominent
requirements for any organization as, communication among applications for different services are
routed through the internet.
The challenge for the design of intrusion detection is lack availability of proper public dataset
to test the performance of anomaly detection algorithm. To some extent, this was resolved and the
research in the design of security systems took a great leap with the introduction of DARPA dataset

20
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

in the year 1998. DARPA presented a kernel-based dataset to the research community and organized
technical competitions for the design of IDS, to bring out better approaches for the design of the IDS.
Our previous works (Gunupudi, 2015) (Gunupudi, 2016) (Gunupudi, 2017) proposes a Gaussian-
based similarity measure for the detection of anomalies.
Later information and computer science department of the University of California (UCI)
released another dataset named as KDD Cup 99’. This dataset is used for the “Third International
Knowledge Discovery and Data Mining Tools Competition”, from January 2000 onwards and many
researchers started contributing different approaches for the design of IDS using this dataset. These
are two benchmark datasets available publicly today, where researchers can test their approaches for
the design anomaly detection systems. But, there is a huge uncertainty, whether these two datasets
sufficient to address today’s threat complexities. But works of (Lin, 2015) influenced us and we
started working over new approaches that can be suitable for KDD Cup 99 dataset.

1.1. Different Strategies to Be Followed for the Design of IDS


As the networks are growing and a wide variety of applications are developed, therefore the intrusion
detections systems are also varying, there are different approaches contributed by researchers in the
literature (Aljawarneh, 2011; Vangipuram, 2016; Imran, 2016; Vangipuram, 2014) for the design
of intrusion detection, Figure 1 depicts the different strategies to be followed during the design of

Figure 1. Different strategies to be followed for the design of intrusion detection systems

21
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

the intrusion detection (Aljawarneh, 2011; Wu, 2011; Wang, 2009). As shown in the Figure 1, the
designers of the IDS need answer the following queries before their design process.

1.1.1. What is the Functional Architecture of the IDS?


The functional model may be centralized, distributed or hybrid models. In the case of the centralized
architecture entire protection is taken care by the IDS known as NIDS in this case. Whereas if the
protection is divided among both at the top level and the host level, it is known as a distributed
model. In these cases, the host is responsible for the collection of the audit data time to time and
communicated to the IDS enabling IDS to take decisions. If both the concepts are merged in the
architecture, it is known as hybrid IDS.

1.1.2. Where is the IDS Being Deployed?


The designers need to know where the detection system to be deployed either at the host level or at
the network level.

1.1.3. How is Decision Making Done?


In the case of the cloud environment, the decisions are taken by cooperative IDS model and otherwise,
the decisions may be taken at autonomous IDS.

1.1.4. What Strategy to be Following for the Analysis of the Threat Detection?
The detection process can be done using signature-based detection approaches and anomaly-based
detection approaches as discussed before.

1.1.5. What is the Analysis Timing?


We need to finalize, whether the decisions be taken in real-time or offline.

1.1.6. What is the Type of Response?


Generally, the IDS can respond in two ways either with the active or passive type of action. Active
response means that, once the threat is detected and it is blocked immediately, and the passive means
the signature is recorded and its behavior is being observed to initiate action.

1.1.7. What are the Data Sources for the IDS to Run?
The data sources may be audit records, network traffic live data or some system-oriented statistics
which varies from application to application.
The IDS basically, available for two purposes, safeguarding the host or the network. Generally,
the network-based IDS are deployed outside the intranet, at the firewall and the host-based IDS is
designed to safeguard from internal security threats. Once the architecture and deployment model,
then the type of data that is considered, data sources, and timing either real-time or offline are going to
play an important role. Apart from all these aspects, the important strategy is that whether we follow
the signature based or anomaly-based approaches. In the signature-based IDS, the detection accuracy
is 100% in the case of known and repeated threats. The signature-based IDS perform poorly for new
and novel threats. Unlike the signature-based IDS, the anomaly-based IDS make use of machine
learning techniques, to predict the intrusions.

1.2. Dimensionality Reduction


Many contributions were made for the design of IDS different researchers and, very few contributions
were made to address the dimensionality reduction problem. The performance of the intrusion detection
is mainly depending on the detection accuracy and time of detection process. The lesser the time is
the better the approach and higher the detection accuracy better the algorithm. The volume of the

22
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

data to be processed for the detection process, influence the performance of the algorithm and in
turn will influence the time of the detection process. This issue has a significant role in the online
detection process. Dimensionality reduction methods will play an important role in the performance
of intrusion detection approaches as follows (Nguyen, 2010; Ghanem, 2015):

1. Removes any noisy data that present in each feature;


2. Removes inconsistency in the data;
3. Improves efficiency of the IDS;
4. Reduces time and space required;
5. Improves the visualization of the data.

In this paper, an attempt is made to design IDS using fuzzy-based approach for feature clustering
using a self-constructing approach which performs better detection accuracy than the previous works
CANN (Lin, 2015), CLAPP (Gunupudi, 2017). Section 2 of this paper discusses the previous works
that address dimensionality reduction discussed in the literature.

2. LITERATURE REVIEW

The detection accuracy is 100% in the case of known and repeated threats, for which the signature-
based IDS exhibits excellent performance, and it shows poor performance in the case of the novel
and new attacks. To address this issue, we need a dynamic and intelligent approach to design the IDS
(Boutaba, 2018; Burnap, 2018; Chen, 2017; Corbo, 2016). Anomaly-based intrusion detection system
is the suitable approach, as it makes use of machine learning approaches such as aNN, Bayesian,
genetic algorithm, SVM, kNN, decision tree, deep learning and clustering techniques. While designing
an anomaly-based IDS approach, we have to address the dimensionality reduction problem in order
to improve the performance of the detection algorithm (Ghafir, 2018). The similarity or distance
measures that can be used are cosine measure, Euclidean measure, Manhattan measure, Gaussian-
based similarity measure (Laftah, 2015). These measures can be used for dimensionality reduction as
well as classification and prediction process. If we consider the literature till 2005 no much work was
in progress for the dimensionality reduction process. Special emphasis is given for the dimensionality
reduction process by us and is discussed in the following section (Al-Jarrah, 2017; Yassein, 2015;
Aljawarneh, 2015; Aljawarneh, 2016; Aljawarneh, 2017; Aljawarneh, 2018).

2.1. Dimensionality Reduction


There are many approaches proposed by researchers towards the dimensionality reduction process.
We discuss a few of them that can be correlated with our approach. Dimensionality reduction is
the process of mapping high dimensional data points to lower dimensionality without losing the
geometrical properties of the data. Dimensionality reduction is important for machine learning
approaches, text processing, pattern recognition, intrusion detection, data analysis. In these methods
repeated calculation of similarities makes the computation tougher, if the dimensionality reduction is
not addressed. Dimensionality reduction is the process of moving the data space from one-dimensional
space to lower dimensional space without changing the statistical properties of the data such that it
will not have any severe influence over the decision process (Ahmed, 2016; Shah, 2018).
Recent studies for intrusion detection ensure that, for the intrusion detection process, the
dimensionality reduction is having a major role, which is to be addressed (Shenfield, 2018). Reducing
dimensionality minimizes the influence of unimportant features, noise and improves the accuracy of the
prediction process (Ghanem, 2015; Kabir, 2018; Muller, 2018; Papamartzivanos, 2018; Vangipuram,
2016). Several dimensionality reduction approaches are proposed in the literature such as information
gain approach, Chi-square, mutual Information, cross Entropy (Caropreso, 2001), those based on

23
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

missing values ratio, low variance filter, high correlation filter, random forests, principal component
analysis (PCA), backward feature elimination, forward feature construction (Silipo, 2014). LDA-Linear
Discriminant Analysis, GDA-Generalized Discriminant Analysis, and CCA-Canonical Correlation
Analysis are other techniques used for dimensionality reduction process (Jiang, 2011; Caropreso,
2001). Authors of (Nguyen, 2010) (Pham, 2010) discuss the recent feature reduction approaches
such as CFS, mRMR, CART, GeFS and presents an algorithm’s accuracy with the KDDcup’99
dataset. The problem with these existing mechanisms is that, these approaches are either based on
feature selection. As a result, by removing a few non-important features, we are missing the valuable
information which will also have some influence over the decision process (Tonejc, 2016).
In (Rawat, 1998) the authors proposed an intrusion detection approach and performed their
experimentation on the DARPA dataset. In their works, the authors used cosine similarity and proved
the detection accuracies are better than previous works. In our previous works, we proposed a novel
Gaussian-based similarity measure using the CANN (Lin, 2015) approach for intrusion detection
process. When we perform our experimentation over the KDD Cup ’99 dataset (Olusola, 2010), the
results were not that much satisfactory, which forced us to work for a better algorithm. (Lin, 2015)
presented a novel approach named as “Cluster Center and Nearest Neighbor (CANN)” for intrusion
detection process which performs dimensionality reduction process and uses k-nearest neighbor
approach for classification.
In this approach, all the dimensions are reduced to a single dimension. The authors have used
Euclidean measure for their experimentation. This approach seems to be good enough for the detection
of DoS, Probe attacks, but exhibits poor performance for the Remote2Local (R2L) and User2Root
(U2R) attacks. The complete experimentation of CANN algorithm is done over the KDD Cup ’99
dataset with 6 attributes and 19 attributes (Zhang et al., 2006), out of 41 attributes that were existing
in the original dataset. For the dimensionality reduction, the authors used CANN algorithm and kNN
classifier is used for the detection process. The CANN approach inspired many of the researchers
in the design of intrusion detection system. Research work in (Gunupudi, 2015) (Gunupudi, 2016)
(Gunupudi, 2017) proposes CANN (Lin, 2015) approach with changed similarity measure which was
designed based on the Gaussian function, whereas the authors in CANN approach used the Euclidean
measure and performed experimentation over the DARPA dataset and proved the results to be better
than (Rawat, 2006). Another research contribution by (Jiang, 2010) and CANN (Lin, 2015) inspired
us to work for the design of CLAPP (Gunupudi, 2017) that proposes fuzzy based self-constructing
clustering, feature selection approach. The authors used this approach for document clustering and
classification in which an incremental clustering approach was introduced.

2.2. Self-Constructing Feature Clustering Technique Approach (CLAPP)


Research work in (Jiang, 2010) proposes CLAPP technique, makes use of fuzzy membership function
which generates clusters based on the pre-defined threshold used for dimensionality reduction as well
as it is also used as a distance measure in kNN classification approach. In existing IDS designs, the
detection of Remote2Local and User2Root is almost invisible. We performed this experimentation
over KDD Dataset, in order to improve the accuracy of Remote2Local and User2Root when compared
to (Lin, 2015). The incremental clustering process considers the global database as input. The
major drawback with CANN approach is that it failed in the detection of the R2L and User2Root
(U2R) attacks, which was addressed in this CLAPP approach. In CLAPP approach the clusters were
characterized by a membership function of mean and standard deviation. In this paper, we propose
another gaussian based fuzzy member function that uses is a self-constructing feature clustering
approach for the detection process.
In this paper, we are trying to propose a new fuzzy membership function which performs better
than CANN and CLAPP approaches. Section 3 will discuss the proposed fuzzy-based membership
function. Section 4 will discuss the detailed steps to be followed as a part of dimensionality reduction
and detection process. Section 5 demonstrates the proposed approach with the help of a case study.

24
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

Section 6 discusses the experimentation results over the KDD Cup ’99 datasets using CANN, CLAPP
and proposed approaches. In the experimentation process, we have used two variations of the KDD
Cup ’99 datasets. One is with 19 attributes of the 10% KDD Cup ’99 dataset which is having 494021
instances with 41 dimensions. The 19 attributes are selected based on the experimentation of the
CANN approach and another dataset is with full 41 dimensions 10% KDD Cup ’99 dataset. The
results and discussion section include the confusion matrixes, detection accuracies and ROC values
of different class labels. All the results were presented in the form of tables and graphs.

3. PROPOSED FUZZY MEMBERSHIP FUNCTION

When a communication is initiated, several sessions associated with each connection get invoked in
the background. Here, each connection is a sequence of TCP packets moving from the source and the
destination. A normal run of the session is a function of feature values associated to each connection
such as duration, protocol_type, service, flat, count etc., as mentioned in Figure 3 a total of 41
attributes, whose invocation does not cause any threat. All such sessions associated with respective
connections are identified to be safe are represented as normal in the global vector. Otherwise, marked
as the attack with its name either of Denial of Service (Dos) or Probe or Remote2Local or User2Root.
Each connection is defined as a function of those 41 features w.r.t global vector. The cardinality of
this global vector denotes the connections dimensionality. The idea of this research is to address
the dimensionality reduction of connections and achieve improved intrusion detection systems. Our
approach is motivated by (Jiang, 2010) (Lin, 2015).
For this, the proposed fuzzy measure is used for dimensionality reduction of the connections.
We choose to apply feature clustering to all available features of the global vector using the proposed
fuzzy membership function. The result of feature clustering is a set of clusters. Each connection is
now expressed as a function of fuzzy similarities of connections w.r.t generated clusters.
Let, |c| is the available connections, |f| be the available number of features and ‘|g|’ depicts the
available number of clusters, on applying the proposed feature clustering algorithm approach leads
to the transformation each input process to new dimensionality space. The result dimensionality space
changes from |f| i.e. number of features to new space with |g|. In this approach, we use notations such
  
as, C , F and D each denote connection vector, feature vector and decision vector. The cardinality
  
of, C , F and D equal to |c|, |f| and |d| respectively as shown in Equations (1) to (3):

{ }
C = C (1),C (2),C (3),C (4), …,C (c) = C 
1X c
(1)

{ }
F = F(1), F(2), F(3), F(4), …, F(f ) = F 
1X f
(2)

{ }
D = D(1), D(2), D(3), D(4), …, D(d ) = D 
1X d
(3)

Equations (4) and (5) represent connection and feature vectors:


( )
C (i ) = C (i,1),C (i,2),C (i,3),C (i,4), …,C (i,F ) (4)

25
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019


( )
F(i ) = F( j ,1), F( j ,2), F( j ,3), F( j ,4), …, F( j ,C ) (5)

The elements C (i, f )  and F( j ,c)  in which i, j represents row and column elements of respective
connection and feature vectors of matrix [CF] |c| x |f|.

3.1. Problem Specification


  
(i) ( j)
C , F and D be represents the given connections, features and decision vectors consecutively
and [CF]|c|x|f|, be the matrix representation of the connection-feature vector. The main objective of the
proposed approach is to transform [CF]|c|x|f| of the dimensions |c| X |f|, to perform dimensionality
reduction to equivalent representation expressed as [CG]|c|x|g| with the dimensions of |c|x|g| on using
constraints such as deviation (σ ) , the threshold (θ), and the proposed fuzzy membership function.

3.2. Terminology and Representations


Table 1 gives the notation followed in this paper.

3.3. Feature Probability

 D (d ) 

Prob  j 
 F ( ) 
 

Table 1. Terminology and notations

Description Description
 
C - Connection vector F - Feature vector

D - Decision label vector c(i) – ith connection

f(j) – jth feature D(d) – dth decision label


|c| - no. of connections |f| - no. of features
|d| - no. of decision class label [CF] – Connection-Feature matrix
[CG] – Connection- Cluster matrix C(i,j) - ith connection and jth feature

F(j,i) - jth feature and ith connection d - denotes membership of feature to a decision class

 D (d )  V ( j ) - Feature pattern vector for the jth feature, F


(j )

Prob  j  – Conditional probability of F(j) w.r.t D(d)
 F ( ) 
 
as denoted as c ( j,d )

dC i ,C - Membership function of V(i) and V(j)


( ) (j )

26
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

(j )
Given a feature, F the posterior conditional probability of F( j ) w.r.t D(d ) is given by:

 D (d ) 
l=c ( j ,l )

Prob  j  =
∑ l =1
F X dj
(6)
 F ( )  l=c ( j ,l )
  ∑ l =1
F

where:

1; F ( j ,l ) ∈ D (d )
dj =  (7)
0; F ( j ,l ) ∉ D (d )


Equation (6) is equivalent to Equation (8):

 D (d )  F ( j ,1)X j + F ( j ,2)X j + … + F ( j , p)X j



Pr  j  = d d d
(8)
 F ( )  F
( j ,1)
+F
( j ,2)
+ … + F
( j , p)
 

3.4. Feature Pattern Vector


V (j)

The sequence vector obtained out of conditional probability:

 D 
 (d ) 
Pr  
 F( j ) 

calculated by considering of each feature, F( j ) with respect to each decision class, D(d) is termed as a
 
feature pattern vector and is represented by V ( j ) . The vector V ( j ) is described as:

  D   D   D 
 (1)   2  d
V ( j ) = Prob  , Prob  ( ) , …, Prob  ( )   (9)
 F( j )  F   F 
  ( j )   ( j ) 

Equations (9) is represented in simplified form as given by Equation (10):



V ( j ) = V( j ,1),V( j ,2),V( j ,3), …,V( j ,d ) (10)

where:

27
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

 D 
 (d ) 
V( j ,d ) = Prob    (11)
 F( j ) 

3.5. Pattern Similarity


 
Let, V (i ) , V ( j ) each denote the vector of feature patterns and D(d) is the decision label. The similarity
pattern,    is given by eq. (12):
V (i ) 
,V ( j )

D=d

∏ (12)
D
   = (i , d ) (j , d )
V (i ),V ( j ) V ,V
D =1

where:

 

 D 
 (d ) 
 D 
 (d ) 
2

 −Prob F −Prob F 


 

 ( j )   ( j ) 


exp  σd 
; else
d
   =   (13)
V (p ),V (q )    D    D   D 
2
  D 
   (d )    (d )   d
 
  (d ) 
   ≠ 0a nd Prob  ( ) ≠ 0
 F  
prob 
 Prob  − Prob 
0.5 *  −

 F( j ) 
   ( j )   ;
    F 
 1 + exp  
 
   F( j )   ( j ) 
σd  


D=d

In Equation (12), the notation ∏ represents the product (cumulative) value obtained
d
 
V (i ),V ( j )
D =1
from considering all values of D.

3.6. Extended Membership Function

gV i , µ
() g

When performing evolutionary clustering, it is required to obtain the similarity between pattern and
the cluster mean. The membership function of Equation (12) can be extended to suit is purpose.
g  is called an extended membership function and gives the similarity between the pattern
V (i ),µg

to the cluster mean, µg . Equation (14) is extended membership function used to compute the similarity
between patterns, V (p ) to the cluster mean, µg :

D=d

 g
  = ∏ D  (14)
V (i ), µg V (i ), µg
D =1

28
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

3.7. Sample Computations

Example 1: Consider feature, F(10) from Table 1. The conditional probability of F(10) to D(1) and D(2)
is obtained by applying Equations (15) and (16):

 D  l =9 (10,l )
 (1) 
Prob   = ∑ l =1
S X 110
= 1 (15)
 F(10)  l =9 (10,l )
 ∑ l =1
F

 D  l =9 (10,l )
 (2) 
Prob   = ∑ Sl =1
X 210
= 0 (16)
 F(10)  l =10 (10,l )
 ∑ l =1
F


Example 2: The feature pattern vector for s(10) is represented as C (10) and is given by:

  D   D 
 (1)   2
V (10) = Prob  , Prob  ( ) 
 F(10)   F 
  (10) 

 
Example 3: Suppose, V (1) = (0.1, 0.9) and V (3) = (0.2, 0.8) be feature patterns. The fuzzy
similarity, for 0.5 deviation is given by:

  1.0−0.2   
  2
   0.9−0.8   
  2

   = 0.5     * 0.5    


  0.5    
1 + e   0.5    = 0.961174
− −
 
  
V (1),V (3) 1 + e  

 
Using the improved measure, the similarity V (1) and V (3) for 0.5 is 0.961174 where as it is equal
to 0.9419 using (Rajesh, 2017). This shows the importance of the present membership function.

4. ALGORITHM

4.1. Dimensionality Reduction


The following algorithm exhibits the sequence of the steps involved in the dimensionality reduction
process. As it is already discussed in the previous section, the importance of the dimensionality
reduction, a six-step sequence is proposed for the dimensionality reduction process.
The dataset is to be considered as the primary input and before proceeding with this algorithm,
that performs the dimensionality reduction process. This approach not only gives promising results
but also improves the performance of the approach for the anomaly detection process:

Step 1: Given Connection and Feature matrix, we first identify a threshold which will play an important
role in the dimensionality reduction process. Here, we assume the threshold to be almost equal
to one and but not 1 (nearly 1). With this, the formation of optimal transformation is possible,
which is responsible for the entire detection process. We calculate the posterior probabilities of
all the features w.r.t all the available class labels.

29
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

Step 2: Firstly, to proceed with the further steps, we need to identify the initial cluster. Generally,
the first feature pattern vector is identified as an initial cluster. In this step, we need to assume
the deviation value which is between 0 and 1. Preferably, the deviation selected should be equal
to zero but not equal to zero. Mean of the cluster need to be calculated, for which the mean is
generally the same as the initial cluster.
Step 3: The proposed membership function is used for the calculation purpose of the unsupervised
approach. In this paper, this algorithm makes use of the proposed membership function to
estimate the similarities between each feature vector, to make it easy to assign it to a cluster. A
given feature pattern is added to the cluster only if the similarity is below the threshold. Once the
similarity is more than the prescribed the threshold, a new cluster is created, and that respective
feature vector is mapped to the newer cluster.
Step 4: Once step-3 is completed, we need to update the new mean and deviation immediately so
as to proceed for the calculation of the similarity of another feature vector to each one of the
clusters. The mean and deviation are calculated for each cluster by considering the respective
cluster constituents of the feature vectors. Once a pattern vector is satisfying the similarity with
more than one cluster, then the cluster which exhibits the maximum similarity value is considered
and the pattern vector is assigned to that cluster. The cluster membership and cluster count are
also updated after every pattern vector.
Step 5: This process is continued until all the pattern vectors in the input end are completed. Till
then Step-2 through Step-4 are repeated and cluster count, cluster membership count, mean and
deviation of each cluster is updated.
Step 6: Once the algorithm is stopped after the completion of the all the pattern vectors, the final
clusters are formed and ready for the classification process.

4.2. Classification Algorithm


Step 1: The output of the algorithm 4.1 i.e. transformed feature connection matrix. The number of
features in the transformed matrix is always either less or equal to the number of features in the
original dataset. Read the first entry of the transformed matrix.
Step 2: Apply the algorithm 4.1 over the new incoming connection for the dimensionality reduction
process. We must use the same threshold and deviation as in algorithm 4.1.
Step 3: For each incoming connection feature pattern vector, we need to calculate the similarity using
proposed membership function as in the algorithm 4.1 (dimensionality reduction algorithm).
Step 4: We use classification approaches such as kNN, J48 approaches for the detection process. We
calculate the most similar instances to the new connection feature vector based on the proposed
membership function.
Step 5: Calculate the similarity of the neighbors with all the feature vectors of the training dataset.
Once nearest is achieved the class label of the training dataset is mapped with the new connection
feature vector in the test dataset.
Step 6: Evaluate classification accuracies of classifier.

5. RESULTS AND DISCUSSIONS

In this work, experimentation is carried out using KDD Cup 99 dataset with 19 attributes and 41
attributes. Figure 2 shows various sets of data available in the KDD’99 dataset. We have opted the
kddcup.data_10_percent.gz, a KDD Cup ’99 dataset with 494021 instances with 41 attributes. We
have taken 19 attributes as used in the CANN approach and performed the experiments. Another
dataset is with the use of a full number of 41attributes, we have conducted the experimentation. The
major drawback with CANN approach is that it failed in the detection of the Remote2Local(R2L) and

30
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

Figure 2. Various sets of data available in KDD Cup 99 dataset

User2Root (U2R) attacks, which was addressed in the CLAPP approach. We have opted the kddcup.
data_10_percent.gz, a KDD Cup ’99 dataset with 494021 instances with 41 attributes.
Figure.3 shows the set of 41 attributes in the KDD’99 dataset. These attributes can be categorized
into three major groups, such as basic TCP Connections features, content features within a connection
that varies from application to application and traffic features.

5.1. Classifier Accuracies on KDD Dataset (19 Attributes)


Table 2 and Table 3 represents the confusion matrix of the classifier kNN for k=1 and C4.5 (also
known as J48) classifiers for a chosen deviation, σ = 0.5 and threshold θ = 0.9995. The overall
accuracy achieved for the proposed approach is 99.6644 and 99.6265 respectively. The accuracies
for Remote2Local(R2L) attacks is achieved for kNN (k=1) and the C4.5 classifier is 84.2% for both
and for User2Root (U2R) attacks, it is 35.9% and 50%, respectively. The number of clusters obtained
by this approach are 11 that means the dimensionality is 11.
Table 4 and Table 5 give confusion matrix for the classifiers kNN for k=1 and kNN (k=3) using
a chosen deviation, σ = 0.5 and threshold θ = 0.9999 and θ = 0.999999. The overall accuracy
achieved is 99.65% and 98.89%, respectively.

Figure 3. List of 41 standard attributes in KDD Cup ‘99 dataset

31
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

Table 2. Confusion Matrix for kNN (k=1) Classifier, θ = 0.9995 and σ= 0.5

Normal U2R DoS R2L Probe Accuracies


Normal 96607 22 319 148 182 0.993
U2R 34 14 3 1 0 0.359
DoS 249 2 391112 23 72 0.999
R2L 97 1 6 1018 4 0.842
Probe 256 0 220 19 3612 0.933
Overall Accuracy 99.6644

Table 3. Confusion Matrix of C4.5 Classifier, θ = 0.999999 and σ= 0.5

Normal U2R DoS R2L Probe Accuracies


Normal 96676 13 278 157 154 0.992
U2R 30 18 3 1 0 0.5
DoS 322 3 391069 5 59 0.998
R2L 172 2 4 934 14 0.842
Probe 296 0 320 12 3479 0.939
Overall Accuracy 99.6265

Table 4. Confusion Matrix of kNN(k=1) Classifier, θ = 0.9999 and σ= 0.5

Normal U2R DoS R2L Probe Accuracies


Normal 96572 20 316 171 199 0.993
U2R 34 15 2 1 0 0.405
DoS 252 2 391098 24 82 0.999
R2L 106 0 10 1002 8 0.82
Probe 255 0 222 24 3606 0.926
Overall Accuracy 99.6502

Table 5. Confusion Matrix of kNN(k=3) Classifier, θ = 0.999999 and σ= 0.5

Normal U2R DoS R2L Probe Accuracies


Normal 95138 7 1401 368 364 0.971
U2R 40 8 2 2 0 0.421
DoS 1398 3 390007 24 26 0.996
R2L 657 1 27 402 39 0.475
Probe 770 0 314 50 2973 0.874
Overall Accuracy 98.8881

32
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

The accuracies achieved for attacks of Remote2Local(R2L) class for the classifier kNN (k=1)
for threshold θ = 0.9999 and classifier kNN(k=3), θ = 0.999999 are 82.01% and 47.51% and for
User2Root (U2R) attacks it is 40.5% and 42.1%, respectively. The number of clusters obtained is 12
and 17 respectively, which mean the dimensionality is 12 and 17.
Table 6 give total accuracy and class wise ROC values for the various classifiers for σ= 0.5 for
KDD 19 attributes. In this proposed approach, ROC values for Remote2Local(R2L) for kNN (k=3)
θ = 0.9995, kNN (k=5) θ = 0.9999 are 0.9916 and 0.9911 which is a great improvement over the
CLAPP and CANN approaches. ROC values for User2Root (U2R) for kNN (k=3) and kNN (k=5) θ
= 0.9995 are 0.9237 and 0.9312, which is another achievement with this approach.
As the authors presented their experimental results for CANN using KDD Cup ’99 dataset with
19 attributes as discussed in (Jiang et. al, 2010).

5.2. Classifier Accuracies on KDD Dataset (41 Attributes)


We have performed our experimentation using KDD Cup ’99 dataset also with a full set of 41 attributes.
The results obtained from various threshold values with different classifiers are mentioned in Table
7 to Table 12. The Table 7 gives confusion matrix for C4.5 and kNN (k=1) classifiers for a chosen

Table 6. Total accuracy and class wise ROC values for the various classifiers for σ = 0.5 for KDD 19 attributes

No. of Total ROC Values


Threshold Classifier
Dimensions Accuracy Normal DoS R2L Probe U2R
0.9995 C4.5 11 99.612 0..9965 0.9983 0.9624 0.9661 0.8301
0.9995 kNN k=1 11 99.6644 0.9989 0.9993 0.9867 0.9836 0.902
0.9995 kNN k=3 11 99.6397 0.9994 0.9996 0.9901 0.986 0.9237
0.9995 kNN k=5 11 99.6112 0.9995 0.9996 0.9916 0.9876 0.9312
0.9999 C4.5 12 99.6172 0.9967 0.9984 0.959 0.9694 0.8566
0.9999 kNN k=1 12 99.6502 0.9988 0.9993 0.984 0.983 0.9046
0.9999 kNN k=3 12 99.6302 0.9994 0.9995 0.9901 0.9864 0.921
0.9999 kNN k=5 12 99.6018 0.9995 0.9996 0.9911 0.9871 0.934
0.999999 C4.5 17 99.6265 0.9968 0.9983 0.9565 0.9723 0.8636
0.999999 kNN k=1 17 99 0.9963 0.9977 0.943 0.9681 0.9084
0.999999 kNN k=3 17 98.8881 0.9976 0.9985 0.9539 0.9691 0.919
0.999999 kNN k=5 17 98.8472 0.998 0.9988 0.9606 0.9707 0.9197

Table 7. Confusion Matrix of C4.5 classifier, θ = 0.9995 and σ= 0.5

Normal DoS R2L Probe U2R Accuracies


Normal 97042 4 117 27 88 0.997
U2R 35 16 0 0 1 0.8
DoS 110 0 391284 2 62 0.999
R2L 61 0 5 1060 0 0.972
Probe 109 0 265 2 3731 0.961
Overall Accuracy 99.8203

33
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

deviation, σ = 0.5 and threshold θ = 0.9995. With these parameters, it is observed that the overall
accuracy achieved is 99.82% and 99.63% respectively. Accuracies achieved for Remote2Local attacks
for C4.5 and kNN (k=1) classifier are 97.2% and 96.63% and for U2R attacks it is 80% and 39.5.4%
respectively. Clusters generated are 25 which mean the dimensionality is 25 which is reduced from
41 dimensions in the KDD Cup ’99 original dataset. Table 8 gives confusion matrix for kNN (k =
1) for threshold θ = 0.9995 and σ= 0.5. Remote2Local(R2L) attack detection accuracy is 91.5% and
User2Root (U2R) detection accuracy is 39.5% and overall accuracy is 99.6356.
Table 9 and Table 10 give confusion matrix for kNN for k = 5 and C4.5 classifiers for a
chosen deviation, σ = 0.5 and threshold θ = 0.999999. The overall accuracy achieved is 99.18%

Table 8. Confusion Matrix of kNN (k=1) classifier, θ = 0.9995 and σ= 0.5

Normal DoS R2L Probe U2R Accuracies


Normal 96571 22 208 92 385 0.993
U2R 34 15 0 2 1 0.395
DoS 188 0 391142 0 128 0.999
R2L 76 1 0 1046 3 0.915
Probe 403 0 254 3 3447 0.87
Overall Accuracy 99.6356

Table 9. Confusion Matrix of kNN (K=5) classifier, θ = 0.999999 and σ= 0.5

Normal DoS R2L Probe U2R Accuracies


Normal 95789 7 981 265 236 0.979
U2R 40 8 1 2 1 0.533
DoS 1025 0 390321 44 68 0.997
R2L 391 0 11 700 24 0.657
Probe 587 0 318 54 3148 0.905
Overall Accuracy 99.1792

Table 10. Confusion Matrix of C4.5 classifier, θ = 0.9999 and σ= 0.5

Normal DoS R2L Probe U2R Accuracies


Normal 97074 5 91 22 86 0.997
U2R 29 22 0 0 1 0.815
DoS 92 0 391309 4 53 0.999
R2L 71 0 2 1053 0 0.975
Probe 110 0 268 1 3728 0.964
Overall Accuracy 99.831

34
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

and 99.83% respectively. Specifically, accuracies achieved for Remote2Local(R2L) attacks for
kNN and C4.5 classifier are 65.7% and 97.5% and for User2Root (U2R) attacks it is 53.3% and
81.5% respectively.

5.3. ROC and Accuracies on KDD Dataset (41 Attributes)


Table 11 shows the overall accuracies and ROC values for each class, using subset constraints θ =
0.9995 and σ = 0.5 for classifiers C4.5 and kNN(k=1, k=3 and k=5). The accuracies for
Remote2Local (R2L) and User2Root (U2R) attacks are specified for various values of C4.5 and kNN
for k=1, 3, and 5 are given in the table shows a significant improvement over the CLAPP and CANN
algorithms. Table.16 shows the overall accuracies and ROC values for each class, using subset
constraints θ = 0.9999 and σ= 0.5 for classifiers C4.5 and kNN (k=1, k=3 and k=5). The accuracies
for Remote2Local(R2L) and User2Root (U2R) attacks are specified for various values of C4.5 and
kNN for k=1, 3, and 5 are given in the table also shows a significant improvement over the CLAPP
and CANN algorithms. Using this approach, the minimum ROC value for User2Root (U2R) achieved
is 0.857 for Naivebayes classifier and maximum ROC value is 0.9224 for kNN (k=5). Using this
approach, the minimum ROC value for Remote2Local(R2L) achieved is 0.9103 for naivebayes
classifier and maximum ROC value is 0.9944 for the bayesnet classifier.
The Roc curve area for J48 is recorded to be 0.73 and 0.968, respectively, for U2R and R2L
attacks. The correctly classifier instances are 99.8405% for θ = 0.999999.
Table 12 shows the overall accuracies and ROC values for each class, using subset constraints
θ = 0.9999 and σ = 0.5 for classifiers C4.5 and kNN (k=1, k=3 and k=5). The accuracies for

Table 11. Accuracies and ROC values, where θ = 0.9995 and σ= 0.5 for KDD 41 attributes

Total ROC Values


Classifier
Accuracies Normal DoS R2L Probe U2R
C4.5 99.8203 0.9985 0.9987 0.9913 0.9801 0.8888
kNN k=1 99.6356 0.9987 0.993 0.9901 0.976 0.9051
kNN k=3 99.6207 0.9992 0.9996 0.992 0.982 0.9117
kNN k=5 99.5885 0.9994 0.9996 0.9927 0.9845 0.9224
Bayesnet 94.1306 0.9936 0.996 0.9944 0.988 0.8792
Naivebayes 77.218 0.9104 0.7113 0.9104 0.8091 0.857

Table 12. Accuracies and ROC values for various classifiers where, θ = 0.9999 and σ= 0.5 for KDD 41 attributes

Total ROC Values


Classifier
Accuracies Normal DoS R2L Probe U2R
C4.5 99.831 0.9988 0.999 0.9882 0.9813 0.9058
kNN k=1 99.0393 0.9963 0.9982 0.9363 0.9607 0.8877
kNN k=3 98.9243 0.9977 0.9988 0.9464 0.9632 0.9179
kNN k=5 98.8415 0.9981 0.999 0.9523 0.9669 0.921
Bayesnet 94.6917 0.9929 0.9967 0.9949 0.9877 0.888
Naivebayes 77.2856 0.8643 0.7399 0.9206 0.8108 0.9262

35
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

Remote2Local(R2L) and User2Root (U2R) attacks are specified for various values of C4.5 and
kNN for k=1, 3, and 5 are given in the table shows a significant improvement over the CLAPP and
CANN algorithms. Table 13 presents the ROC values of each class for threshold θ = 0.999999 and
σ= 0.5 for classifiers C4.5 and kNN for k=1, 3, and 5, bayesnet and naivebayes. Remote2Local(R2L)
detection accuracies recorded maximum 0.9951, 0.9883 and 0.984 for classifiers bayesnet, C4.5 and
kNN (k=5) respectively. User2Root (U2R) detection accuracies recorded maximum 0.9376, 0.9323
and 0.9313 for classifiers kNN (k=5), kNN (k=3) and naivebayes respectively.

5.4. ROC and Accuracies on KDD Dataset (19 Attributes)


Table 14 shows the overall accuracies and ROC values for each class, using subset constraints
θ = 0.9995 and σ= 0.5 for classifiers C4.5 and kNN (k=1, k=3 and k=5). The accuracies for
Remote2Local(R2L) and User2Root (U2R) attacks are specified for various values of C4.5 and kNN
for k=1, 3, and 5 are given in the table shows a significant improvement over the CLAPP and CANN
algorithms. Using this approach, the minimum ROC value for User2Root (U2R) achieved is 0.8301
for C4.5 classifier and maximum ROC value is 0.9312 for kNN (k=5). The minimum ROC value
for Remote2Local(R2L) achieved is 0.8732 for naivebayes classifier and maximum ROC value is
0.9916 for kNN (k=5) classifier.
Table 15 shows the overall accuracies and ROC values for each class, using subset constraints θ =
0.9999 and σ= 0.5 for classifiers C4.5 and kNN (k=1, k=3 and k=5). The accuracies for Remote2Local
(R2L) and User2Root (U2R) attacks are specified for various values of C4.5 and kNN for k=1, 3, and
5 are given in the table also shows a significant improvement over the CLAPP and CANN algorithms.
Using this approach, the minimum ROC value for User2Root (U2R) achieved is 0.8509 for bayesnet

Table 13. kNN and C4.5 accuracies, θ = 0.999999 and σ= 0.5 for KDD 41 attributes

Total ROC Values


Classifier
Accuracies Normal DoS R2L Probe U2R
C4.5 99.8405 0.9987 0.9989 0.9883 0.9794 0.8798
kNN k=1 99.2033 0.997 0.9979 0.9626 0.9744 0.9071
kNN k=3 99.214 0.9982 0.9987 0.9774 0.976 0.9323
kNN k=5 99.1792 0.9985 0.9989 0.984 0.9772 0.9376
Bayesnet 94.7612 0.9923 0.9973 0.9951 0.988 0.888
Naivebayes 77.2704 0.8663 0.7179 0.9144 0.8106 0.9313

Table 14. KDD 19 attributes ROC and accuracies using proposed approach with θ = 0.9995 and σ= 0.5

Total ROC Values


Classifier
Accuracies Normal DoS R2L Probe U2R
C4.5 99.612 0..9965 0.9983 0.9624 0.9661 0.8301
kNN k=1 99.6644 0.9989 0.9993 0.9867 0.9836 0.902
kNN k=3 99.6397 0.9994 0.9996 0.9901 0.986 0.9237
kNN k=5 99.6112 0.9995 0.9996 0.9916 0.9876 0.9312
Bayesnet 94.3687 0.9896 0.9961 0.9902 0.9859 0.85
Naivebayes 58.0259 0.7567 0.8008 0.8732 0.8824 0.8792

36
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

Table 15. KDD 19 attributes ROC and accuracies using proposed approach with θ = 0.9999 and σ= 0.5

Total ROC Values


Classifier
Accuracies Normal DoS R2L Probe U2R
C4.5 99.6172 0.9967 0.9984 0.959 0.9694 0.8566
kNN k=1 99.6502 0.9988 0.9993 0.984 0.983 0.9046
kNN k=3 99.6302 0.9994 0.9995 0.9901 0.9864 0.921
kNN k=5 99.6018 0.9995 0.9996 0.9911 0.9871 0.934
Bayesnet 94.4553 0.99 0.9963 0.9903 0.9857 0.8509
Naivebayes 58.0068 0.7357 0.7969 0.871 0.8814 0.8769

classifier and maximum ROC value is 0.934 for kNN (k=5). Using this approach, the minimum ROC
value for Remote2Local(R2L) achieved is 0.871 for naivebayes classifier and maximum ROC value
is 0.9911 for kNN (k=5) classifier.
Table 16 shows the overall accuracies and ROC values for each class, using subset constraints θ
= 0.999999 and σ= 0.5 for classifiers C4.5 and kNN (k=1, k=3 and k=5). The accuracies for R2L
and User2Root (U2R) attacks are specified for various values of C4.5 and kNN for k=1, 3, and 5
are given in the table also shows a significant improvement over the CLAPP and CANN algorithms.
Using this approach, the minimum ROC value for User2Root (U2R) achieved is 0.8573 for bayesnet
classifier and maximum ROC value is 0.9376 for Naivebayes. Using this approach, the minimum
ROC value for Remote2Local (R2L) achieved is 0.9175 for naivebayes classifier and maximum ROC
value is 0.99 for the bayesnet classifier.
Table 17 presents the experimental results of the proposed approach, CLAPP along with the
results of (Lin, 2015) which uses the dimensionality reduction through CANN and its detection
accuracy, simple kNN classifier and simple SVM Classifier with degree 2 over 19 dimensions. From
this table, it is clear that their drastic change in the detection process of the Remote2Local(R2L) and
User2Root (U2R) attacks. It can also be observed that the accuracies of the proposed approach are
showing marginal improvements than the CLAPP approach.
Figure 4 shows overall classifier accuracies for CLAPP and proposed approach over the KDD
Cup ’99 dataset with 41 dimensions. The experimentation is done with proposed approach for
dimensionality reduction in which the number of dimensions was reduced to 25, 29 and 35 respectively
with different threshold values θ = 0.9995, 0.9999 and 0.999999 instead of 41 dimensions. The
Classification is carried out using the kNN classifier for k=1, 3, 5 and J48 classifier. From the Figure

Table 16. KDD 19 attributes ROC and accuracies using proposed approach with θ = 0.999999, σ= 0.5

Total ROC Values


Classifier
Accuracies Normal DoS R2L Probe U2R
C4.5 99.6265 0.9968 0.9983 0.9565 0.9723 0.8636
kNN k=1 99 0.9963 0.9977 0.943 0.9681 0.9084
kNN k=3 98.8881 0.9976 0.9985 0.9539 0.9691 0.919
kNN k=5 98.8472 0.998 0.9988 0.9606 0.9707 0.9197
Bayesnet 94.6606 0.9925 0.9979 0.99 0.9858 0.8573
Naivebayes 58.2605 0.9789 0.7817 0.9175 0.8787 0.9376

37
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

Table 17. kNN and C4.5 accuracies for KDD 19 attributes using proposed, CLAPP, CANN, SVM approaches

Overall
Detection Approach Normal Probe Remote2Local(R2L) DoS U2R
Accuracy

DR with CANN (K=1) (Jiang, 2010) 0.9704 0.8761 0.5702 0.9968 0.0385 99.46%

kNN (K=1) Classifier (Jiang, 2010) 0.9968 0.9849 0.9174 0.9998 0.1731 99.89%

SVM (Degree=2) Classifier (Jiang, 2010) 0.9598 0.9659 0.7895 0.8285 0.6154 95.37%

DR with CLAPP (with J48) θ=0.999999


0.992 0.938 0.999 0.834 0.4 99.64%
(Gunupudi, 2017)

DR with PROPOSED (with J48) θ=0.999999


0.992 0.939 0.998 0.842 0.5 99.63%
(Gunupudi, 2017)

Figure 4. Overall classification accuracies on KDD ‘99 (41 attributes) for CLAPP and proposed approaches

38
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

4, it can be observed that C4.5 (also known as J48) approach performed better when compared to
the kNN approach using the proposed approach. The classification accuracies for the J48 classifier
for the proposed approach are 99.8203, 99.831, and 99.8405 using the datasets after dimensionality
reduction using different thresholds θ = 0.9995, 0.9999 and 0.999999, respectively. The classification
accuracies for the J48 classifier for the CLAPP approach are 99.8223, 99.8275 and 99.8271 using
the same dataset after dimensionality reduction using thresholds θ = 0.9995, 0.9999 and 0.999999,
respectively. It can be observed that the classification accuracies for the proposed approach except
threshold θ = 0.9995 were improved. It can also be observed from Figure 4, that except the J48
classifier the accuracies for kNN (k=1, 3, 5) were not improved over the CLAPP approach. Using
the proposed dimensionality reduction technique, the system can be tuned to detect anomalies using
J48 approach for KDD Cup 99 dataset with 41 attributes.
Figure 5 shows overall classifier accuracies for CLAPP and proposed approach over the KDD
Cup ’99 dataset with 19 dimensions. For this experimentation, the 19 attributes were selected based
on the works of (Lin, 2015). The experimentation is done with proposed approach for dimensionality
reduction in which the number of dimensions was reduced to 11, 12 and 17, respectively, instead of
19 dimensions, with different threshold values θ = 0.9995, 0.9999 and 0.999999. The Classification

Figure 5. Overall classification accuracies on KDD ‘99 (19 attributes) for CLAPP and proposed approaches

39
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

is carried out using the kNN classifier for k=1, 3, 5 and J48 classifier. From the Figure 5, it can be
observed that the kNN (k=1) classifier performed better when compared to other classifiers using
the proposed approach. It can be observed from the Figure 5, for the threshold θ = 0.999999 using
proposed approach for dimensionality reduction, the overall classification accuracies were dropped
when compared to kNN (k=3, k=5) classifier whereas J48 classifier in this case still performs well.
The increase in classifier accuracy is from the use of the membership function designed to suit
the purpose. The membership function helps to transform the original process into the equivalent
process in a different dimension space and in this process of transformation it eliminates the effect of
noise on classification. Figure 6 shows results of overall detection accuracies for different thresholds
θ = 0.995, 0.9999 and 0.999999 using proposed and CLAPP approaches with the J48 classifier.
From the Figure 6, it can observe that there is a marginal increase in the accuracies over the CLAPP
approach. Figure 7 shows results of overall detection accuracies for different thresholds θ = 0.995,
0.9999 and 0.999999 using proposed and CLAPP approaches with kNN (k=1) classifier. From the
Figure 6, it can observe that there is a marginal decrease in the accuracies over the CLAPP approach.
Figure 8 shows results of overall detection accuracies for different thresholds θ = 0.995, 0.9999
and 0.999999 using proposed and CLAPP approaches with kNN (k=5) classifier. From the Figure 8,
it can observe that there is a marginal decrease in the accuracies over the CLAPP approach. Figure 9
exhibits the results of the detection accuracies for different thresholds θ = 0.995, 0.9999 and 0.999999
using proposed and CLAPP approaches with the J48 classifier.
Figure 10 exhibits the results of the detection accuracies for different thresholds θ = 0.995, 0.9999
and 0.999999 using proposed and CLAPP approaches with kNN (k=3) classifier.
Figure 11 exhibits the results of KDD with 41 attributes for detection accuracies for different
thresholds θ = 0.9999 and 0.999999 using proposed and CLAPP approaches with the J48 classifier.
Figure 12 exhibits the results of KDD with 19 attributes for detection accuracies for different
thresholds θ = 0.9999 and 0.999999 using proposed and CLAPP approaches with the J48 classifier.
Figure 13 presents the detection accuracies of the J48 classifier over the proposed, CANN and SVM
approaches. From the Figure 13, we can observe that Remote2Local(R2L) detection accuracies are
0.842, 0.5702, 0.7895 for the proposed, CANN and SVM approaches respectively. User2Root (U2R)

Figure 6. KDD 41 attributes - J48 classifier accuracies for CLAPP and proposed approach

40
International Journal of Information Technology and Web Engineering
Volume 14 • Issue 4 • October-December 2019

Figure 7. KDD 41 attributes – kNN (k=1) classifier accuracies for CLAPP and proposed approach

Figure 8. KDD 19 attributes – C4.5 classifier accuracies for CLAPP and proposed approach

detection accuracies are 0.5, 0.0385 and 0.6154 respectively, and the Probe attack detection accuracies are 0.939, 0.8761 and 0.9659 for the proposed, CANN and SVM approaches. This shows that the proposed approach gives an excellent improvement over the CANN and SVM approaches. The DoS attack detection accuracies are 0.998, 0.9968 and 0.8285 for the proposed, CANN and SVM approaches; only a very marginal increase in DoS attack detection is observed, as CANN already records a good detection accuracy of 0.9968 whereas the proposed measure records 0.998.
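Treating the per-class detection accuracy as the recall of each class (an interpretation assumed here), the following lines, which continue the earlier sketch, show one way such per-class figures could be computed for a trained classifier. The class label names and the choice of classifier are assumptions for illustration only.

# Per-class detection accuracy (recall) for the five KDD '99 classes, continuing
# the earlier sketch. Label names and the chosen classifier are assumptions.
from sklearn.metrics import recall_score

y_pred = classifiers["J48-like tree"].predict(X_test)   # any trained classifier from the sketch above
classes = ["Normal", "DoS", "Probe", "R2L", "U2R"]       # assumed label names
per_class = recall_score(y_test, y_pred, labels=classes, average=None)
for cls, acc in zip(classes, per_class):
    print(f"{cls}: detection accuracy = {acc:.4f}")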
Figure 14 presents the detection accuracies of the kNN (k = 1) classifier on the KDD Cup ’99 dataset with 19 attributes for the proposed, CANN and SVM approaches. From Figure 14, we can observe that the Remote2Local (R2L) detection accuracies are 0.534, 0.5702 and 0.7895 for the proposed, CANN and SVM approaches respectively. The User2Root (U2R) detection accuracies are 0.28, 0.0385 and 0.6154 respectively, and the Probe attack detection accuracies are 0.809, 0.8761 and 0.9659 for the proposed, CANN and SVM approaches. This shows that the results given by the proposed approach for the kNN (k = 1)


Figure 9. KDD 19 attributes – kNN (k=3) classifier accuracies for CLAPP and proposed approach

Figure 10. KDD 19 attributes – kNN (k=5) classifier accuracies for CLAPP and proposed approach

classifier are not encouraging. A very marginal decrease in DoS attack detection is observed, as CANN records a detection accuracy of 0.9968 whereas the proposed measure records 0.996.
Figure 15 presents the detection accuracies of the kNN (k = 3) classifier on the KDD Cup ’99 dataset with 19 attributes for the proposed, CANN and SVM approaches. From Figure 15, we can observe that the User2Root (U2R) detection accuracies are 0.421, 0.0385 and 0.6154 for the proposed,


Figure 11. KDD 41 attributes - J48 classifier accuracies for U2R and R2L classes

Figure 12. KDD 19 attributes - J48 classifier accuracies for U2R and R2L classes

CANN and SVM approaches respectively. The DoS attack detection accuracies are 0.996, 0.9968 and 0.8285 respectively, the Remote2Local (R2L) detection accuracies are 0.475, 0.5702 and 0.7895, and the Probe attack detection accuracies are 0.874, 0.8761 and 0.9659 for the proposed, CANN and SVM approaches. This shows that the proposed approach with the kNN (k = 3) classifier gives an excellent improvement over CANN but stays behind the accuracy of SVM. The User2Root (U2R) performance is far better than that of the CANN approach. A very marginal decrease in DoS attack detection is observed, as CANN records a detection accuracy of 0.9968 whereas the proposed measure records 0.996.
Figure 16 presents the detection accuracies of the kNN (k = 5) classifier on the KDD Cup ’99 dataset with 19 attributes for the proposed, CANN and SVM approaches. From Figure 16, we can observe that the User2Root (U2R) detection accuracies are 0.5, 0.0385 and 0.6154 for the proposed,


Figure 13. KDD 19 attributes – C4.5 classifier accuracies for all classes using proposed, CANN, SVM approaches

Figure 14. KDD 19 attributes – kNN (K=1) classifier accuracies for all classes using proposed, CANN, SVM approaches

CANN and SVM approaches respectively. The DoS attack detection accuracies are 0.995, 0.9968 and 0.8285 respectively, the Remote2Local (R2L) detection accuracies are 0.48, 0.5702 and 0.7895, and the Probe attack detection accuracies are 0.881, 0.8761 and 0.9659 for the proposed, CANN and SVM approaches. This shows that the proposed approach with the kNN (k = 5) classifier gives an excellent improvement over CANN but stays behind the accuracy of SVM. A very marginal decrease in DoS attack detection is observed, as CANN records a detection accuracy of 0.9968 whereas the proposed measure records 0.995.


Figure 15. KDD 19 attributes – kNN (K=3) classifier accuracies for all classes using proposed, CANN, SVM approaches

Figure 16. KDD 19 attributes – kNN (K=5) classifier accuracies for all classes using proposed, CANN, SVM approaches

Overall, the experiments show that the proposed approach performs well compared with the CANN and SVM approaches, and the J48 classifier shows better performance than the kNN classifier.

6. CONCLUSION

This paper presents a novel membership function and shows how the training dataset is transformed into a reduced feature representation using it. The work demonstrates the benefits of dimensionality reduction by overcoming


the effects of noise. In previous works such as CANN, SVM and kNN, the detection of Remote2Local (R2L) and User2Root (U2R) attacks is very poor; this work addresses that problem. Existing anomaly detection approaches suffer from the influence of noise on detection accuracy, and the dimensionality reduction process of the proposed approach minimizes the effect of noise on the detection process. The entire experimentation is carried out on the KDD Cup ’99 dataset with 494,021 instances and 41 features. Besides improving the detection of Probe and DoS attacks, this work achieves better detection of low-frequency attacks such as User2Root (U2R) and Remote2Local (R2L). The work can be extended by introducing further measures and approaches to detect User2Root (U2R) and Remote2Local (R2L) attacks more efficiently.

ACKNOWLEDGMENT

The research work presented in this paper was not supported by the authors' parent institutions and was entirely self-financed by the authors.


REFERENCES

Aaron, H. R. S., & Adae, I. (2014). Seven Techniques for Dimensionality Reduction. Knime. Retrieved from
https://www.knime.com/blog/seven-techniques-for-data-dimensionality-reduction
Ahmed, M., Mahmood, A. N., & Hu, J. (2016). A survey of network anomaly detection techniques. Journal of
Network and Computer Applications, 60, 19–31. doi:10.1016/j.jnca.2015.11.016
Al-Jarrah, O. Y., Al-Hammdi, Y., Yoo, P. D., Muhaidat, S., & Al-Qutayri, M. (2017). Semi-supervised multi-
layered clustering model for intrusion detection. In Digital Communications and Networks. doi:10.1016/j.
dcan.2017.09.009
Al-Yaseen, W. L., Othman, Z. A., & Mohd, Z. A. N. (2015). Intrusion Detection System Based on Modified
K-means and Multi-level Support Vector Machines. In M. W. Berry, A. Mohamed, & B. W. Yap (Eds.), Soft
Computing in Data Science (pp. 265–274). Springer Singapore. doi:10.1007/978-981-287-936-3_25
Aljawarneh, S. (2011). A web engineering security methodology for e-learning systems. Network Security,
2011(3), 12–15. doi:10.1016/S1353-4858(11)70026-5
Aljawarneh, S., Yassein, M. B., & We’am, A. T. (2017, November 01). A resource-efficient encryption algorithm
for multimedia big data. Multimedia Tools and Applications, 76(21), 22703–22724. doi:10.1007/s11042-016-
4333-y
Aljawarneh, S., Yassein, M. B., & We’am, A. T. (2018, May 01). A multithreaded programming approach for
multimedia big data: Encryption system. Multimedia Tools and Applications, 77(9), 10997–11016. doi:10.1007/
s11042-017-4873-9
Aljawarneh, S. A., Alawneh, A., & Jaradat, R. (2017). Cloud security engineering: Early stages of SDLC. Future
Generation Computer Systems, 74, 385–392. doi:10.1016/j.future.2016.10.005
Aljawarneh, S. A., Moftah, R. A., & Maatuk, A. M. (2016). Investigations of automatic methods for detecting
the polymorphic worms signatures. Future Generation Computer Systems, 60, 67–77. doi:10.1016/j.
future.2016.01.020
Aljawarneh, S. A., & Vangipuram, R. (2018). GARUDA: Gaussian dissimilarity measure for feature representation
and anomaly detection in Internet of things. The Journal of Supercomputing. doi:10.1007/s11227-018-2397-3
Aljawarneh, S. A., Vangipuram, R., Puligadda, V. K., & Vinjamuri, J. (2017). G-SPAMINE: An approach to
discover temporal association patterns and trends in internet of things. Future Generation Computer Systems,
74, 430–443. doi:10.1016/j.future.2017.01.013
Anderson, J. P. (1972). Computer Security Technology Planning Study (Vol. 2). Fort Washington, PA: Anderson
and Co.
Bani Yassein, M. O., & Aljawarneh, S. A. (2016, April). A conceptual security framework for cloud computing
issues. International Journal of Intelligent Information Technologies, 12(2), 12–24. doi:10.4018/IJIIT.2016040102
Bartnes, M., & Moe, N. B. (2017). Challenges in IT security preparedness exercises: A case study. Computers
& Security, 67, 280–290. doi:10.1016/j.cose.2016.11.017
Boutaba, R., Salahuddin, M. A., Limam, N., Ayoubi, S., Shahriar, N., Estrada-Solano, F., & Caicedo, O. M.
(2018). A comprehensive survey on machine learning for networking: Evolution, applications and research
opportunities. Journal of Internet Services and Applications, 9(1), 16. doi:10.1186/s13174-018-0087-2
Burnap, P., French, R., Turner, F., & Jones, K. (2018). Malware classification using self organising feature maps
and machine activity data. Computers & Security, 73, 399–410. doi:10.1016/j.cose.2017.11.016
Chen, Z., Yeo, C. K., Lee, B. S., & Lau, C. T. (2017). Detection of network anomalies using Improved-MSPCA
with sketches. Computers & Security, 65, 314–328. doi:10.1016/j.cose.2016.10.010
Corbo, T. (2016). Information Security Trends: Overall Security Spending Remains Strong. Retrieved from
https://www.451alliance.com/Reports.aspx


Ghafir, I., Hammoudeh, M., Prenosil, V., Han, L., Hegarty, R., Rabie, K., & Aparicio-Navarro, F. J. (2018).
Detection of advanced persistent threat using machine-learning correlation analysis. Future Generation Computer
Systems, 89, 349–359. doi:10.1016/j.future.2018.06.055
Ghanem, T. F., Elkilani, W. S., & Abdul-Kader, H. M. (2015). A hybrid approach for efficient anomaly detection
using metaheuristic methods. Journal of Advanced Research, 6(4), 609–619. doi:10.1016/j.jare.2014.02.009
PMID:26199752
Gunupudi, R. K., Nimmala, M., Gugulothu, N., & Gali, S. R. (2017). CLAPP: A self constructing feature
clustering approach for anomaly detection. Future Generation Computer Systems, 74, 417–429. doi:10.1016/j.
future.2016.12.040
Gunupudi, R. K., Nimmala, M., & Narsimha, G. (2017). A feature clustering based dimensionality reduction
for intrusion detection (FCBDR). IADIS International Journal on Computer Science and Information Systems.
Retrieved from http://www.iadisportal.org/ijcsis/papers/2017200103.pdf
Imran, A., Aljawarneh, S. A., & Sakib, K. (2016, April). Web data amalgamation for security engineering:
Digital forensic investigation of open source cloud. Journal of Universal Computer Science, 22(4), 494–520.
Jiang, J. Y., Liou, R. J., & Lee, S. J. (2011). A fuzzy self-constructing feature clustering algorithm for text
classification. IEEE Transactions on Knowledge and Data Engineering, 23(3), 335–349.
Kabir, E., Hu, J., Wang, H., & Zhuo, G. (2018). A novel statistical technique for intrusion detection systems.
Future Generation Computer Systems, 79, 303–318. doi:10.1016/j.future.2017.01.029
Kumar, G. R., Mangathayaru, N., & Narasimha, G. (2015, September). An improved k-Means Clustering
algorithm for Intrusion Detection using Gaussian function. In Proceedings of the International Conference on
Engineering & MIS 2015 (p. 69). ACM. doi:10.1145/2832987.2833082
Kumar, G. R., Mangathayaru, N., & Narasimha, G. (2015, September). Intrusion detection using text processing
techniques: a recent survey. In Proceedings of the International Conference on Engineering & MIS 2015 (p.
55). ACM. doi:10.1145/2832987.2833067
Kumar, G. R., Mangathayaru, N., & Narsimha, G. 2016. An approach for intrusion detection using fuzzy
feature clustering. In 2016 International Conference on Engineering MIS (ICEMIS) (pp. 1–8). doi:10.1109/
ICEMIS.2016.7745345
Kumar, G. R., Mangathayaru, N., & Narsimha, G. (2016, April). An Approach for Intrusion Detection Using
Novel Gaussian Based Kernel Function. Journal of Universal Computer Science, 22(4), 589–604.
Kumar, G. R., Mangathayaru, N., & Narsimha, G. 2016. Design of novel fuzzy distribution function for
dimensionality reduction and intrusion detection. In 2016 International Conference on Engineering MIS (ICEMIS)
(pp. 1–6). doi:10.1109/ICEMIS.2016.7745346
Kumar, G. R., Mangathayaru, N., & Narsimha, G. (2016). Intrusion Detection A Text Mining Based Approach.
arXiv:1603.03837
Kumar, G. R., Mangathayaru, N., & Narsimha, G. (2016). A Novel Similarity Measure for Intrusion Detection
using Gaussian Function. arXiv:1604.07510
Kumar, G. R., Mangathayaru, N., Narsimha, G., & Reddy, G. S. 2017. Evolutionary approach for intrusion
detection. In 2017 International Conference on Engineering MIS (ICEMIS) (pp. 1–6). doi:10.1109/ICEMIS.2017.8273116
Lin, W. C., Ke, S. W., & Tsai, C. F. (2015). CANN: An intrusion detection system based on combining cluster
centers and nearest neighbors. Knowledge-Based Systems, 78, 13–21.
Tonejc, J., Güttes, S., Kobekova, A., & Kaur, J. (2016). Machine Learning Methods for Anomaly Detection in
BACnet Networks. Journal of Universal Computer Science, 22(9), 1203–1224.
Mangathayaru, N., Kumar, G. R., & Narsimha, G. 2016. Text mining-based approach for intrusion detection. In
2016 International Conference on Engineering MIS (ICEMIS) (pp. 1–5). doi:10.1109/ICEMIS.2016.7745351
Muller, S., Lancrenon, J., Harpes, C., Le Traon, Y., Gombault, S., & Bonnin, J.-M. (2018). A training-resistant
anomaly detection system. Computers & Security, 76, 1–11. doi:10.1016/j.cose.2018.02.015


Nguyen, H. T., Petrović, S., & Franke, K. 2010. A Comparison of Feature-Selection Methods for Intrusion
Detection. In I. Kotenko & V. Skormin (Eds.). Computer Network Security (pp. 242–255). Springer.
doi:10.1007/978-3-642-14706-7_19
Olusola, A. A., Oladele, A. S., & Abosede, D. O. (2010, October). Analysis of KDD’99 intrusion detection
dataset for selection of relevance features. In Proceedings of the World Congress on Engineering and Computer
Science (Vol. 1, pp. 20-22).
Papamartzivanos, D., Gómez Mármol, F., & Kambourakis, G. (2018). Dendron: Genetic trees driven rule induction
for network intrusion detection systems. Future Generation Computer Systems, 79, 558–574. doi:10.1016/j.
future.2017.09.056
Radhakrishna, V., Kumar, P. V., & Janaki, V. (2016, April). A Novel Similar Temporal System Call Pattern
Mining for Efficient Intrusion Detection. Journal of Universal Computer Science, 22(4), 475–493.
Schwab, P. (2015). The History of Intrusion Detection Systems (IDS): Part 1. Threatstack. Retrieved from https://
blog.threatstack.com/the-history-of-intrusion-detection-systems-ids-part-1
Shenfield, A., Day, D., & Ayesh, A. (2018). Intelligent intrusion detection systems using artificial neural networks.
ICT Express, 4(2), 95–99. doi:10.1016/j.icte.2018.04.003
Syed, A. R. S., & Issac, B. (2018). Performance comparison of intrusion detection systems and application
of machine learning to Snort system. Future Generation Computer Systems, 80, 157–170. doi:10.1016/j.
future.2017.10.016
