
AN ANALYTICAL APPROACH FOR REAL TIME

INTRUSION DETECTION USING MACHINE


LEARNING PARADIGM


A THESIS


Submitted by

NAVEEN N C


In Partial Fulfillment of the Requirements
for the Degree of

DOCTOR OF PHILOSOPHY





DEPARTMENT OF COMPUTER SCIENCE
AND ENGINEERING
SRM UNIVERSITY, KATTANKULATHUR- 603 203


JANUARY 2013


DECLARATION


I hereby declare that the dissertation entitled AN ANALYTICAL
APPROACH FOR REAL TIME INTRUSION DETECTION USING
MACHINE LEARNING PARADIGM submitted for the Degree of Doctor of
Philosophy is my original work and the dissertation has not formed the basis for the
award of any degree, diploma, associateship, fellowship or other similar title.
It has not been submitted to any other University or Institution for the award of any
degree or diploma.





Place: Chennai

Date:
NAVEEN N C





SRM UNIVERSITY, KATTANKULATHUR 603 203
BONAFIDE CERTIFICATE


Certified that this thesis titled AN ANALYTICAL APPROACH FOR
REAL TIME INTRUSION DETECTION USING MACHINE LEARNING
PARADIGM is the bonafide work of Mr. N.C. Naveen who carried out the
research under my supervision. Certified further that, to the best of my knowledge,
the work reported herein does not form part of any other thesis or dissertation on the
basis of which a degree or award was conferred on an earlier occasion for this or any
other candidate.



Dr. R.SRINIVASAN
SUPERVISOR
Professor- Emeritus
Directorate of Research
SRM University
Kattankulathur - 603203


ACKNOWLEDGEMENTS
It gives me immense pleasure to thank and express my gratitude towards
my supervisor Dr. R Srinivasan, for his support throughout the course of my study.
His constant motivation, support and expert guidance have helped me to overcome
all odds making this journey a truly rewarding experience in my life. I thank him
from the bottom of my heart.
I would like to thank Dr. S Natarajan, Professor, Department of ISE,
PESIT, for his valuable feedback and critical reviews, which have contributed
greatly to the quality of this thesis. His wide knowledge and guidance have been a
great help and have provided a good basis for the present work. I also express my
sincere gratitude for the valuable time he spent reviewing and evaluating my
research work.
I am extremely thankful to Dr. M Ponnavaikko, Vice Chancellor,
SRM University, for giving research the necessary importance at the
University and for always wishing SRM to be at the top of the research fraternity.
I would like to express my heartfelt gratitude and thanks to the most
respected Dr. C Muthamizhchelvan, Director, Faculty of Engineering and
Technology, SRM University, for providing the necessary infrastructure to
carry out this research successfully.
My thanks are due to Dr. S Ponnusamy, Controller of Examinations,
SRM University, for providing the necessary and timely support in the preparation
of the synopsis of my research and this thesis. His advice and kind approach towards
research scholars shall not be forgotten by those who have interacted with him.
I am indebted to Dr. E Poovammal, Head of the Department, and the
teaching and non-teaching staff of the Department of Computer Science, who
have been extremely helpful on numerous occasions.

I am grateful for the support received from Dr. S V Kasmir Raja, Dean
(Research), SRM University, Chennai and for the tremendous support from the
staff at the university libraries and various other university resources.
I am extremely grateful to R V College of Engineering, Bangalore, for
providing support in the form of sponsorship and for providing the audit data set
used to carry out my research work.
I am grateful to all the members of the doctoral committee for their remarks
and comments; without their insightful suggestions, this thesis would not have been
complete.
I would like to thank all my colleagues in ISE Department, RVCE for
their constant support throughout my research work. I am thankful to all the
students, faculty and non-teaching staff of RVCE who have directly or indirectly
helped me in completing this work.
Special thanks to my family for their patience and inspiration that
supported me in successfully completing my research. Finally, I would like to
express my deepest gratitude to my parents and siblings for their unwavering
confidence in my ability that helped me to accomplish my academic dreams.



ABSTRACT


Recent work adopts a wide range of Artificial Intelligence techniques in
Intrusion Detection Systems. Another domain of research in this paradigm is Data
Mining, which offers flexibility and has been a focus of research in recent years.
Intrusion detection can be automated by making the system learn, using
classifiers or clusters derived from a training set. A benefit of Machine Learning is that its
techniques are capable of generalizing from known attacks to variations, or can even
detect new types of intrusion. Recent research focuses on the hybridization of
techniques to improve the detection rates of Machine Learning classifiers. Artificial
Neural Networks and Decision Trees have been applied to develop Intrusion
Detection Systems and have become popular. Several evaluations performed to date
indicate that Intrusion Detection Systems are moderately successful at identifying
known intrusions and considerably less successful at identifying those that have not been seen
before. This remains a promising area of research for the academic and commercial
communities designing Intrusion Detection Systems.
The research work presented in this thesis models the Intrusion Detection
System by an ensemble approach using Outlier Detection, Change Point analysis and Relevance
Vector Machines. The new hybrid detection model combines the
individual base classifiers and Machine Learning paradigms to maximize detection
accuracy and minimize computational complexity. Results illustrate that the
proposed hybrid systems provide a more accurate detection rate. A real-time dataset is
used in the experiments to demonstrate that Relevance Vector Machines can greatly
improve classification accuracy; the approach achieves a higher detection rate
with low false alarm rates and is scalable to large datasets, resulting in an effective
Intrusion Detection System.
We first introduce the Single Layer Feedforward Network as the core
intrusion detector. This can be used to build a scalable and efficient network
Intrusion Detection System that is accurate in attack detection. Experimental
results further demonstrate that our system is robust and performs better than other approaches
such as Decision Trees and Naïve Bayes. We then introduce a logging
framework for network data collection and perform modeling using Change Point
and Outlier Detection to build a real-time Intrusion Detection System. Experimental
results on data sets collected in real time confirm that Change Point and
Outlier Detection algorithms are particularly well suited to detecting attacks.
Experiments were also conducted with Support Vector Machines, and the
performance of Decision Trees was compared with this model.
In this research work a new method for developing an Intrusion Detection
System using the Relevance Vector Machine is implemented. Compared with other
classifier algorithms, the Relevance Vector Machine has the advantage of exploiting
sparseness, reducing false alarms while still maintaining a desirable detection
rate.




TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.

ABSTRACT vi
LIST OF TABLES xiv
LIST OF FIGURES xv
LIST OF SYMBOLS AND ABBREVIATIONS xvii

1 INTRODUCTION 1
1.1 INTRUSION DETECTION 3
1.2 DEFINITIONS AND TERMINOLOGY 4
1.3 OVERVIEW OF POTENTIAL INTRUSIONS 5
1.3.1 Network - Based 5
1.3.2 Network Behavior Anomaly
Detection (NBAD) 5
1.3.3 Host Based IDS 6
1.3.4 Types of Attacks 6
1.3.4.1 DoS Attacks 6
1.3.4.2 User to Root (U2R) 6
1.3.4.3 Remote to Local (R2L) 7
1.3.5 A General IDS Architecture 7
1.3.6 Characteristics of IDS 9
1.3.6.1 Audit Source Location 10
1.3.6.2 Detection Methods 11
1.3.6.3 Behavior of Detection 14
1.3.6.4 Usage Frequency 14
1.4 DETECTION APPROACHES 16
1.4.1 Stateful (Event Correlation)
Intrusion Detection 16
1.4.2 Stateless Intrusion Detection 17
1.5 ARTIFICIAL INTELLIGENCE (AI)
TECHNIQUES APPLIED TO IDS 19
1.5.1 Change Point (CP) 20
1.5.2 Decision Trees 21
1.5.3 Feature Selection 22
1.5.4 Support Vector Machines (SVM) 22
1.5.5 Relevance Vector Machines (RVM) 23
1.6 AREAS OF RESEARCH 23
1.7 CONTRIBUTION AND NOVELTY 25
1.8 STRUCTURE OF THE THESIS 26

2 LITERATURE SURVEY 27
2.1 CURRENT IDS PRODUCTS 28
2.2 OUTLIER DETECTION (OD) 32
2.3 STATISTICAL BASED ANOMALY DETECTION 32
2.4 MACHINE LEARNING FOR ANOMALY
DETECTION 34
2.5 MACHINE LEARNING VERSUS
STATISTICAL TECHNIQUES 35
2.6 INSTANCE BASED LEARNING (IBL) 35
2.7 CHANGE POINT TECHNIQUE 36
2.7.1 Coefficient of Variation 37
2.7.2 Chauvenet's Criterion 37
2.7.3 Peirce's Criterion 37
2.7.4 CUSUM (CUmulative SUM) 38
2.7.5 Generalized Likelihood Ratio (GLR) 38
2.7.6 DDR (Direct Density Ratio) 38
2.8 APPLICATION OF DATA MINING (DM)
IN DEVELOPING IDS 39
2.8.1 Artificial Neural Networks (ANN) 40
2.8.2 Feed Forward Neural Networks (FFNN) 40
2.8.2.1 Multi Layered Feed Forward (MLFF)
Neural Networks 40
2.8.2.2 Radial Basis Function Neural
Networks (RBFNN) 41
2.8.3 Recurrent Neural Networks (RNN) 42
2.8.4 Self Organizing Maps (SOM) 42
2.8.5 Bayesian Networks (BN) 44
2.8.6 Decision Trees (DT) 45
2.8.7 Support Vector Machines (SVM) 46
2.9 IMPORTANCE OF FEATURE
SELECTION FOR IDS 51
2.10 RELEVANCE VECTOR MACHINES (RVM) 52
2.11 CURRENT STATE OF IDS 53
2.11.1 Intrusion Prevention System (IPS) 54
2.11.1.1 Rate based IPS 55
2.11.1.2 Content based IPS 56
2.11.2 Intrusion Response System (IRS) 56
2.11.2.1 Static Decision Making 57
2.11.2.2 Dynamic Decision Making 57
2.11.3 Artificial Immune Systems 57

3 LOG AND AUDIT DATA COLLECTION
FRAMEWORK 58
3.1 INTRODUCTION 58
3.2 REQUIREMENTS OF IDS 61
3.3 PROBLEMS IN DETECTING
NETWORK ATTACKS 61
3.4 GENERAL FRAMEWORK OF
PROPOSED IDS 62
3.4.1 Data Collection Framework 64
3.4.1.1 TCP/IP Protocol 66
3.5 ARCHITECTURE OF COMPUTING
INFRASTRUCTURE 67
3.5.1 Normal Data Collection 68
3.5.2 Collection of Attack Data 68
3.6 PACKET CAPTURE MODULE 71
3.7 SUMMARY 82

4 ANOMALY DETECTION USING
NEURAL NETWORKS 83
4.1 INTRODUCTION 83
4.2 ENSEMBLE OF APPROACHES 84
4.2.1 Offline Processing 85
4.2.2 Multi Sensor Correlation 85
4.3 SURVEY OF AVAILABLE IDS PACKAGES 86
4.4 OPEN PROBLEMS IN THE DESIGN OF IDS 87
4.4.1 Feature Selection 87
4.4.2 Visualization 88
4.4.3 Predictive Analysis 88
4.5 APPLICATION OF ARTIFICIAL NEURAL
NETWORKS (ANN) IN IDS 89
4.5.1 Bayesian Networks (BN) 92
4.5.2 Naïve Bayes (NB) Classification 93
4.6 HYBRID OR ENSEMBLE CLASSIFIERS 96
4.7 CONSTRUCTION OF CLASSIFIER MODEL 97
4.8 MULTI LAYER PERCEPTRONS (MLP) 99
4.9 ARCHITECTURE OF THE MODEL USING SLFN 100
4.9.1 Training Phase 103
4.9.2 Detection Phase 103
4.9.3 Selection of Layers in MLP 103
4.9.3.1 The Input Layer 103
4.9.3.2 The Output Layer 103
4.9.3.3 The Hidden Layers 103
4.10 PROPOSED SLFN ALGORITHM 108
4.11 EXPERIMENTAL RESULTS 108
4.12 SUMMARY 113

5 CHANGE POINT AND OUTLIER DETECTION
FROM NETWORK DATA 114
5.1 INTRODUCTION 114
5.1.1 Signature Detection (SD) 115
5.1.2 Anomaly Detection (AD) 115
5.1.3 Denial of Service (DoS) Detection 116
5.1.4 Data Mining and Machine Learning 117
5.2 MACHINE LEARNING BASED IDS 118
5.3 STATISTICS BASED IDS 121
5.4 MACHINE LEARNING VERSUS STATISTICAL
TECHNIQUES 122
5.5 OUTLIER DETECTION (OD) 122
5.6 PROPOSED ARCHITECTURE OF OUTLIER
DETECTION AND CLASSIFICATION MODULE 127
5.6.1 Dataset Description 130
5.6.2 Proposed Change Point Outlier
Detection (CPOD) Algorithm 131
5.7 STRUCTURE CHART 135
5.8 IMPORTANCE OF SELECTING OPTIMAL
SUBSET OF FEATURES 135
5.9 EXPERIMENTAL RESULTS 137
5.10 SUMMARY 142
6 APPLICATION OF SVM AND RVM
IN DEVELOPING IDS 143
6.1 INTRODUCTION 143
6.2 RELATION BETWEEN DM, ML
AND STATISTICS 145
6.3 SVM TRAINING AND CLASSIFICATION 149
6.4 THE KERNEL MAPPING 153
6.5 RVM TRAINING AND CLASSIFICATION 155
6.6 METHODOLOGY AND PROPOSED
ARCHITECTURE 158
6.6.1 R Statistical Tool 161
6.6.2 RBF Networks 162
6.7 EXPERIMENTAL RESULTS 165
6.8 PERFORMANCE ANALYSIS OF RVM AND SVM 167

7 CONCLUSION 169
7.1 MAJOR CONTRIBUTIONS AND NOVELTY 171
7.2 FUTURE WORK 173
7.3 SIGNIFICANT CHALLENGES AND OPEN ISSUES 174

REFERENCES 175
APPENDIX 1 GLOSSARY OF TECHNICAL TERMS 188
APPENDIX 2 ATTACK DESCRIPTION 190
LIST OF PUBLICATIONS 192
VITAE 193




LIST OF TABLES

TABLE NO. TITLE PAGE NO.

2.1 Ideal requirements of IDS 27
2.2 Leading IDS products currently available 29
3.1 Types of attacks generator 69
3.2 List of network dataset attributes selected 74
3.3 Data set statistics for a month 81
4.1 Types of attack classes 110
4.2 Number of examples considered 110
4.3 Comparison results 111
4.4 Result analysis of SLFN vs. Naïve Bayesian algorithm 111
5.1 Real time data collected 137
5.2 Average packet rate for one week 138
6.1 The error obtained by the RVM with different
parametric values 166
6.2 The error obtained by the SVM with different
parametric values 166
6.3 Comparison between RVM and SVM models 166










LIST OF FIGURES

FIGURE NO. TITLE PAGE NO.

1.1 Cyber intrusion during Mar 2012 – Sep 2012 2
1.2 General architecture of IDS 7
3.1 General framework of proposed IDS 63
3.2 Framework for packet capture process 64
3.3 Campus network diagram for collecting
normal and attack data 67
3.4 Flow chart for packet capture 72
3.5 Flow chart for the packet preprocessing
and Feature extractor 75
3.6 Traffic pattern in the course of a working day (Monday) 76
3.7 TCP Packet count in the course of a day (Monday) 76
3.8 TCP statistics in a course of a working day (Monday) 77
3.9 UDP Packet count in the course of a day (Monday) 77
3.10 UDP statistics in a course of a working day (Monday) 78
3.11 Number of connections in a course of a working
day (Monday) 78
3.12 Connection statistics in the course of a week 79
3.13 Traffic statistics in the course of a week 79
3.14 Traffic statistics in the course of a month 80
3.15 Average packet count in the course of a day 80
4.1 General block diagram of ANN 90
4.2 A framework of SLFN 99
4.3 Block diagram of the model using SLFN 99
4.4 Flow chart for the training phase of neural networks 102
4.5 Flow chart for the anomaly detection using
neural networks 105
4.6 Initial screen 109
4.7 Packet capture screen 109
4.8 Use of WEKA tool for analysis 110
4.9 ROC Curves for 3 weeks data 112
5.1 Architecture of the model using CP and OD 127
5.2 Flow chart for change point and outlier detection 129
5.3 Structure chart of the research work 135
5.4 Initial packet capture screen 137
5.5 Outlier detected for Table 5.2 data 138
5.6 Plot of real time data and detection of change
point and outliers 139
5.7 Plot of real time data and detection of change
point and outliers 139
5.8 Plot of real time data collected for week 1 140
5.9 Plot of real time data collected for week 2 140
5.10 CUSUM analysis 140
5.11 Standard deviation analysis 141
5.12 Outlier detected for the data given in Table 5.2 141
5.13 Snapshot showing threshold of IP addresses captured 141
6.1 Relation between DM, ML and statistics 145
6.2 (a) Original data in the input space.
(b) Mapped data in the feature space 151
6.3 Architecture of the RVM model 159
6.4 RBF networks 163
6.5 ROC curve for the data 165
6.6 Performance chart of SVM 165
6.7 ROC curve obtained for RVM and SVM models 167






LIST OF SYMBOLS AND ABBREVIATIONS

AD - Anomaly Detection
ADS - Active Directory Service
AI - Artificial Intelligence
AIS - Artificial Immune Systems
ANN - Artificial Neural Networks
ARP - Address Resolution Protocol
BBID - Behavior Based Intrusion Detection
BN - Bayesian Networks
CBR - Case Based Reasoning
CP - Change Point
CUSUM - CUmulative SUM
CV - Coefficient of Variation
DDIS - Distributed Intrusion Detection System
DDoS - Distributed Denial of Service
DHCP - Dynamic Host Configuration Protocol
DM - Data Mining
DNA - Dynamic Network Analysis
DoS - Denial of Service
DR - Detection Rates
DT - Decision Trees
ELM - Extreme Learning Machines
ENN - Evolutionary Neural Network
ERM - Empirical Risk Minimization
ES - Expert Systems
FFNN - Feed Forward Neural Networks
FL - Fuzzy Logic
FN - False Negative
FP - False Positive

FPR - False Positive Rate
FS - Feature Selection
FSM - Finite State Machines
GA - Genetic Algorithms
GLR - Generalized Likelihood Ratio
HIDS - Host-based Intrusion Detection Systems
HMM - Hidden Markov Models
IBL - Instance Based Learning
ICMP - Internet Control Message Protocol
IDES - Intrusion Detection Expert System
IDS - Intrusion Detection Systems
IDS/IPS - Intrusion Detection and Prevention System
IG - Information Gain
ILP - Inductive Logic Procedures
IP - Internet Protocol
IPS - Intrusion Prevention System
IRS - Intrusion Response System
KB - Knowledge Based
KDD - Knowledge Discovery in Databases
KNN - k-Nearest Neighbor
LSSVM - Least Squares SVM
MD - Misuse Detection
MFFNN - Multilayer Feed Forward Neural Networks
ML - Machine Learning
MLP - Multi Layer Perceptron
NA - Network Analyzers
NB - Naïve Bayes
NBAD - Network Behavior Anomaly Detection
NIDES - Next-generation Intrusion Detection Expert System
NIDS - Network Intrusion Detection System
NN - Neural Network
OD - Outlier Detection

OSI - Open Systems Interconnection
POD - Ping Of Death
PSVM - Proximal SVM
R2L - Remote to Local
RBF - Radial Basis Function
RBFNN - Radial Basis Function Neural Networks
RBS - Rule Based Systems
RL - Reinforcement Learning
RNN - Recurrent Neural Networks
ROC - Receiver Operating Characteristics
RSVM - Robust SVM
RV - Relevance Vectors
RVM - Relevance Vector Machine
SD - Signature Detection
SLFFN - Single Layer Feed Forward Networks
SLFN - Single-Hidden Layer Feed Forward Neural Network
SOM - Self Organizing Maps
SRM - Structural Risk Minimization
SSH - Secure Shell
SSL - Secure Socket Layer
SV - Support Vectors
SVM - Support Vector Machine
TCP - Transmission Control Protocol
TN - True Negative
TP - True Positive
U2R - User to Root
UDP - User Datagram Protocol
WEKA - Waikato Environment for Knowledge Analysis
WIPS - Wireless Intrusion Prevention Systems





CHAPTER 1
INTRODUCTION

With the advances in network-based technology, the reliable operation of
network-based systems plays a prominent role. The ability to detect intruders in
computer systems is important as computers are increasingly integrated into the
systems that we rely on. Internet security is a critical factor in the performance of an
enterprise, affecting everything from business operations to cost management. Catastrophic
Internet attacks can disrupt business operations, and hence security expertise has become more
valuable. Recent research shows that the number of attacks on networks has dramatically
increased, and consequently interest in the analysis of network intrusions has grown
among researchers. Over the past decade it has become
important to evaluate Machine Learning (ML) techniques for network-based
Intrusion Detection Systems (IDS).
Several factors, such as the choice of data set, the method of validation and data
preprocessing, are found to affect the detection results significantly. These findings
have also enabled a better interpretation of the current body of research. Due to the
nature of the intrusion detection domain, there is an extreme imbalance within the
data sets, which poses a significant challenge to ML. Researchers have demonstrated
that well known techniques such as Artificial Neural Networks (ANN) and Decision
Trees (DT) have often failed to learn. However, this has not previously been recognized as an
issue in intrusion detection. Investigation demonstrates that it is the
class imbalance that causes the poor detection of some classes of intrusion.
The Internet, with its numerous benefits, has also created different ways to
compromise the security and stability of the systems connected to it. Recently,
137,529 incidents were reported to CERT/CC, whereas in 1999 there were 9,859
reported incidents. According to the Department of Homeland Security, around 86 attacks
were reported during Oct 2011 – May 2012 on computer systems in the
United States that control critical infrastructure, factories and databases,
compared with 11 over the same period a year earlier [1]. Security management operations
protect computer networks against Denial of Service (DoS) attacks, unauthorized
access to critical information, and modification, alteration or destruction of data. The
automated detection and immediate reporting of these unauthorized events are
required to provide a timely response to attacks.

Figure 1.1 Cyber intrusions during Mar 2012 – Sep 2012: CERT-In monthly
security bulletin
With the ever growing use of computer technology, computer security
has become important, both for work and for personal use. People using
computers are at some risk of intrusion, even if the computer is not connected to the
Internet or any other network. An intruder can attempt to access and misuse the system
if the computer is left unattended for a long time. The problem is greater if the
computer is connected to a network, particularly the Internet. Users from around the
world can reach the computer remotely and may attempt to access
private or confidential information, or launch some form of attack to bring the
system to a halt or render it ineffective.
Section 1.1 provides an introduction to intrusion detection. Related
terminology and definitions that are used in this thesis are presented in Section 1.2.
Section 1.3 presents an overview of potential intrusions to computers and computer
networks. A taxonomy of IDS is provided in Sections 1.4 and 1.5, which
includes a discussion of the main approaches to detecting intrusions. The main and
emerging areas of research in intrusion detection are discussed in
Section 1.6.
1.1 INTRUSION DETECTION
Intrusion detection is the process of monitoring and analyzing events that
occur in a computer or networked computer system. Detection is carried out by
analyzing behavior that conflicts with the intended use of the system.
Any user of a computer is at some risk of intrusion, even when the
computer is not connected to the Internet. If the computer is left unattended, an
intruder can attempt to access and misuse the system. The problem is much
greater if the computer is connected to a network, particularly the Internet, as any user
from around the world can reach the computer remotely. An intruder may attempt to
access important private or confidential information, or launch a form of attack to
bring the system to a halt or render it ineffective. An intrusion into a
computer system need not be executed manually by a person; it may be
executed remotely and automatically with engineered software. A well known
example of this is the Slammer worm, also known as Sapphire, which performed a
global DoS attack in 2003. The worm exploited a vulnerability in Microsoft's SQL
Server that disabled database servers and overloaded networks. Moore et al. [2] refer
to Slammer as the fastest computer worm in history, infecting many computer
systems around the world within ten minutes. Not only did the Slammer worm
restrict general Internet traffic, it caused network outages and unforeseen
consequences such as cancelled airline flights, interference with elections, and
communication failures.
Malware is a kind of malicious software that intentionally harms a
computer or computer system. Individuals may not have much at stake if they are
targeted by a cyber attack, but it is a serious threat to enterprises and government
organizations. A survey conducted by the Web Application Security Consortium
(2008) revealed that more than 60% of attacks are profit motivated. There are many
examples of cyber attacks in recent news. For example, early in 2010 it was revealed
that the US power grid had been infiltrated by an intruder, leaving behind malware that was
capable of shutting down the entire grid. Later in the same year, a major spy
network, GhostNet, located in China, was claimed to have infiltrated more than
1,000 computers around the world. Similarly, the Russian military was accused of
launching DoS attacks against Georgia during the war over South Ossetia. From
these examples it is very clear that cyber attacks are a threat to national security. This
prompted President Barack Obama to initiate a national cyber security body in
the USA in May 2009, followed shortly by the UK.
There are several mechanisms that can be adopted to increase the security
of computer systems. Three levels of protection are of particular importance, namely attack
prevention, attack avoidance and attack detection, as explained in
Section 1.3.6.2.
1.2 DEFINITIONS AND TERMINOLOGY
IDS are software or hardware systems that monitor network traffic in
order to detect unwanted activity and events, such as illegal and malicious traffic and
traffic that violates security policy. IDS employ techniques for modeling and
recognizing intrusive behavior in a computer system. Intrusive behavior is
considered to be any behavior that deviates from the normal, expected use of the system.
IDS share many of the challenges of fraud detection and fault management or
localization. Although these are not the major focus of this thesis, there is an overlap
between these domains, especially for event correlation. There are many types of
intrusion, which makes it difficult to define intrusion in a single term.
The capability and performance of IDS is normally measured using the
following terms:
a. True Positive (TP): The system correctly classifies an intrusion as an
intrusion. The TP rate, also known as the detection rate, sensitivity or
recall, is often used in the literature.
b. False Positive (FP): The system incorrectly classifies normal
data as an intrusion; this is also known as a false alarm.
c. True Negative (TN): The system correctly classifies normal data
as normal. The TN rate is also known as specificity.
d. False Negative (FN): The system incorrectly classifies an
intrusion as normal.
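The four outcomes above combine into the standard evaluation rates used throughout this thesis (detection rate, false alarm rate, specificity). A minimal sketch in Python; the function name and example counts are illustrative assumptions, not values from the thesis:

```python
def ids_metrics(tp, fp, tn, fn):
    """Compute standard IDS evaluation rates from confusion-matrix counts."""
    detection_rate = tp / (tp + fn)       # TP rate: sensitivity / recall
    false_alarm_rate = fp / (fp + tn)     # FP rate: normal data flagged as intrusion
    specificity = tn / (tn + fp)          # TN rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return detection_rate, false_alarm_rate, specificity, accuracy

# Hypothetical counts: 90 intrusions detected, 10 missed, 5 false alarms
dr, far, spec, acc = ids_metrics(tp=90, fp=5, tn=900, fn=10)
print(f"detection rate={dr:.2f}, false alarm rate={far:.4f}")
```

A good IDS maximizes the detection rate while keeping the false alarm rate low; reporting only accuracy can be misleading when intrusions are rare.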
1.3 OVERVIEW OF POTENTIAL INTRUSIONS
There are several types of IDS technologies, owing to the variety of
network configurations. Each type has advantages and disadvantages in terms of
detection, configuration, and cost.
1.3.1 Network - Based
A Network Intrusion Detection System (NIDS) is one common type of
IDS that captures network traffic at all seven layers of the Open Systems
Interconnection (OSI) model and analyzes it for suspicious activity. NIDS are easy to
deploy on any network and are capable of viewing traffic from many systems at
once. A newer research area is the Wireless Intrusion Prevention System (WIPS), which
monitors and analyzes the wireless radio spectrum in a network for intrusions and
performs countermeasures.
1.3.2 Network Behavior Anomaly Detection (NBAD)
NBAD analyzes traffic on network segments to determine whether an anomaly
exists in the amount or type of traffic. If an unwanted event occurs, segments that
normally see very little traffic may show a change in the amount or type of traffic. NBAD requires sensors
that create a good snapshot of a network, and requires benchmarking and base lining
to determine the nominal amount of a segment's traffic.
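The base-lining step can be illustrated with a simple statistical threshold: learn a segment's nominal packet rate, then flag intervals that deviate by more than k standard deviations. This is only a sketch of the idea; the threshold k = 3 and the sample rates are hypothetical, not drawn from the thesis data:

```python
import statistics

def baseline(rates):
    """Learn a segment's nominal traffic statistics from benchmark intervals."""
    return statistics.mean(rates), statistics.stdev(rates)

def is_anomalous(rate, mean, std, k=3.0):
    """Flag an interval whose packet rate deviates more than k sigma from baseline."""
    return abs(rate - mean) > k * std

# Packets/sec observed on a (hypothetical) quiet segment during benchmarking
history = [102, 98, 110, 95, 105, 99, 101, 97]
mu, sigma = baseline(history)
print(is_anomalous(103, mu, sigma))   # normal load
print(is_anomalous(900, mu, sigma))   # sudden flood on the quiet segment
```

Real NBAD sensors model far richer features (protocol mix, connection counts, time of day), but the benchmark-then-threshold structure is the same.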
1.3.3 Host Based IDS
Host based Intrusion Detection Systems (HIDS) analyze network traffic
and system-specific settings such as software calls, local security policy, local log
audits, and more. A HIDS must be installed on every machine and requires a
configuration specific to that operating system and software.
1.3.4 Types of Attacks
1.3.4.1 DoS Attacks
These attacks interrupt services on a host by preventing it from dealing
with certain requests. A DoS attack may be one step in a destructive multi-stage attack
that crashes a host or prevents it from functioning properly. There are three types of DoS
attacks:
1. Bugs in trusted programs can be used by an attacker to gain
unauthorized access to a computer system. Specific examples of
implementation bugs are buffer overflows, race conditions, and
mishandling of temporary files.
2. Creation of malformed packets that confuse the Transmission
Control Protocol/Internet Protocol (TCP/IP) stack of the machine
that is trying to reconstruct the packet.
3. Fooling a system into giving access by misrepresenting oneself.
1.3.4.2 User to Root (U2R)
These attacks exploit vulnerabilities in operating systems and software to
obtain root or administrator access to the system. Consider, for example, the buffer
overflow attack. A buffer overflow occurs when a program writes more
information into a buffer than the memory allocated for it. This allows an
attacker to overwrite data that controls the program execution path and to seize
control of the program to execute the attacker's code instead of the process code. Poor
programming practices and software bugs are the major risk factors. An effective
solution to the buffer overflow problem is to employ secure coding. While no
security measure is perfect, avoiding programming errors is always the best solution.
1.3.4.3 Remote to Local (R2L)
There are some similarities between this class of intrusion and U2R, as
similar kinds of attacks may be carried out. In this case the attacker does not have an
account on the host and attempts to obtain local access across a network connection.
To achieve this, the attacker can execute buffer overflow attacks and exploit
misconfigurations. Alternatively, the attacker may obtain data by misleading a human
operator, rather than targeting software flaws. These classes may be used in IDS to
classify intrusions, rather than only differentiating between normal behavior and
intrusion. This gives more information about the type of intrusion, which may
affect the chosen method of reporting and acting on the suspected detection. Some
known events may be classified as an intrusion on their own, while other events need to be
observed in the context of one or more further events before they are classified as an intrusion.
This could involve repetition of the same event or a completely different event, but an
IDS should still be able to recognize simple, single-event attacks as well as
complex, multiple-event attacks. As an example, a Ping of Death attack may cause a
system to crash by sending large ping packets to a host.
1.3.5 A General IDS Architecture
In this section the general IDS architecture is discussed. The
common building blocks of an IDS are shown in Figure 1.2.



Figure 1.2 General architecture of IDS
(Figure blocks: Network Sniffers, Log Files and Special Monitoring Module feed
Network Adapters and Parsers; then Data Preprocessing; Diagnosis and
Analysis of Data; Post Processing and Expected Alarm Class; ALARM.)
An IDS in general consists of the following components:
1. Data Collection Phase: For accurate detection of intrusions,
reliable and complete data about the target system's activities is
essential. Reliable data collection is a very complex issue, and most
operating systems offer some form of auditing. Data used for
intrusion detection can be collected as:
a. User access patterns: The sequence of commands issued at the
terminal and the resources requested.
b. Network packet level features: Source and destination IP
addresses, type of packets and rate of occurrence of packets.
c. Application and system level behavior: Data collected from the
sequence of system calls generated by a process, which is also
known as audit patterns. These logs might contain security
relevant events, such as failed login attempts or they might log a
complete report on every system call invoked by every process.
Similarly, routers and firewalls provide event logs for network
activity that logs information, such as network connection
openings and closings, or a complete record of each packet. The
amount of system activity information that a system collects is a
tradeoff between overhead and effectiveness. A system that
records every action in detail might substantially degrade
performance and require huge disk storage. Collecting
information is an expensive task, but collecting the right
information is important. Determining what information to log,
and where and how to collect it, is an open research problem.
2. Data Preprocessing Phase: This phase is responsible for collecting
and providing the log data in the specified form that will be used by
the Diagnosis phase to make a decision. Data preprocessor is, thus,
concerned with collecting the data from the desired source and
converting it into a format that is comprehensible by the analyzer.
3. Diagnosis and Analysis Phase: The analysis or the intrusion detector
phase is the core component which analyzes the audit patterns to
detect attacks. This is a critical component and one of the most
researched phases. Various pattern matching algorithms, ML, Data
Mining (DM) and statistical techniques can be used as intrusion
detectors. The capability of the analyzer to detect an attack often
determines the strength of the overall IDS.
4. Post Processing and Expected Alarm class: This phase controls the
mechanism to react and determine the best way to respond when the
analyzer detects an attack. The system either raises an alert without
taking any action against the source or blocks the source for a
predefined period of time. This action depends upon the security
policy that is predefined in the IDS.
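The four phases above can be sketched end to end as a toy pipeline. Everything concrete here (the raw record format, the parsed fields, the failed-login rule and its threshold) is an invented illustration, not part of a specific IDS implementation:

```python
# Minimal sketch of the four IDS phases. The raw-event format, the parser
# and the single detection rule are invented for this example only.

def collect():
    """Data Collection: raw audit records, e.g. lines from a packet log."""
    return ["10.0.0.5 10.0.0.1 tcp 1",
            "10.0.0.5 10.0.0.1 tcp 1",
            "10.0.0.9 10.0.0.1 udp 0"]

def preprocess(raw):
    """Data Preprocessing: convert each record into a feature dictionary."""
    events = []
    for line in raw:
        src, dst, proto, flag = line.split()
        events.append({"src": src, "dst": dst,
                       "proto": proto, "failed_login": int(flag)})
    return events

def diagnose(events):
    """Diagnosis and Analysis: flag sources with repeated failed logins."""
    failures = {}
    for e in events:
        failures[e["src"]] = failures.get(e["src"], 0) + e["failed_login"]
    threshold = 2  # assumed alert threshold
    return [src for src, n in failures.items() if n >= threshold]

def respond(suspects):
    """Post Processing: raise an alarm for each suspected source."""
    return ["ALARM: suspicious activity from %s" % s for s in suspects]

alarms = respond(diagnose(preprocess(collect())))
print(alarms)  # ['ALARM: suspicious activity from 10.0.0.5']
```

A real system would replace each stub with the corresponding component of Figure 1.2, but the flow of data between the phases is the same.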
Validating whether the predictions made are correct and related to
the actual behavior of IDS implementations is the real challenge. A systematic and
complete validation would require that the predictions made by the approach are
compared with the behavior of actual IDS implementations. Such an activity would
represent an enormous challenge and precisely exemplifies the problem which this
work attempts to address. It would require that one or several rather complex
environments are built so that the IDS can be analyzed under different conditions.
However, the most challenging aspect of any such validation would
be the number and diversity of individual tests to be executed.
1.3.6 Characteristics of IDS
IDS can be classified according to four characteristics:
1. Audit source location : Host based Detection or Network based
detection
2. Detection method : Misuse Detection or Anomaly Detection
3. Behavior of detection : Passive or Active Detection
4. Usage frequency : Real time or Off line Detection
1.3.6.1 Audit Source Location
Normally an IDS operates on one of two levels: on a host or on a
network. A host based IDS monitors the local behavior on a single host. This is
performed by analyzing the status or performance of the system. Similarly,
application based IDS detect attacks against specific applications.
Recent research trend is towards network based IDS that analyze network
traffic. Currently there are IDSs that support both host based and network based
intrusion detection. Some of the limitations and challenges of network based IDS
include:
a. Not able to detect all forms of intrusion attacks, since some may not
generate network traffic.
b. Not able to deal with encrypted data via Secure Socket Layer (SSL)
connections and Secure Shell (SSH).
c. Detecting attacks on different operating system platforms.
Some of the limitations of host based IDS are:
a. Inability to analyze network activity that may be part of an attack
process against the host.
b. Vulnerability of the system if an attacker obtains root/administrator
access.
From the points discussed, it is clear that the two types of IDS can
complement each other to achieve broader detection coverage.


1.3.6.2 Detection Methods
There are two main detection methods:
1. Misuse Detection (MD) or Knowledge Based (KB): Attempts to
detect intrusions by encoding knowledge of known intrusions,
typically as rules, and using this knowledge to analyze events.
2. Anomaly Detection (AD) or Behavior Based Intrusion Detection
(BBID): Attempts to learn from the features of event patterns that
constitute normal behavior.
By observing patterns that deviate from established norms the system
detects that an intrusion has occurred. Some IDS offer both capabilities through
hybridization techniques. A system may also be modeled on both
normal and intrusive data, which has become a common approach in recent research
adopting ML techniques. MD is successful commercially but cannot detect
attacks for which it has not been programmed. It is prone to issue false negatives if
the system is not kept up to date with the latest intrusions. On the other hand MD
systems generally produce few false positives. Currently MD has incorporated
techniques that allow more flexibility and capability of detecting more variations of
attacks. This is made possible with ML techniques such as ANN which are built to
be able to generalize their models of known attacks to classify unseen cases. The
major benefit of AD is the ability to detect new attacks. But with this approach it is
more prone to issuing false positives.
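As a minimal illustration of the MD principle, known intrusions can be encoded as rules and matched against events. The two rules below (an oversized ICMP packet for Ping of Death, and a repeated login failure count) and the event fields are hypothetical examples, not signatures from a real rule base:

```python
# Minimal sketch of misuse detection: events are matched against encoded
# signatures of known intrusions. Both rules and all event fields are
# hypothetical examples invented for illustration.

RULES = [
    ("ping-of-death", lambda e: e["type"] == "icmp" and e["size"] > 65535),
    ("brute-force",   lambda e: e["type"] == "login" and e["failures"] >= 5),
]

def misuse_detect(event):
    """Return the names of all rules the event triggers (empty = normal)."""
    return [name for name, match in RULES if match(event)]

print(misuse_detect({"type": "icmp", "size": 70000}))   # ['ping-of-death']
print(misuse_detect({"type": "login", "failures": 6}))  # ['brute-force']
print(misuse_detect({"type": "icmp", "size": 64}))      # []
```

The last call returning an empty list illustrates the weakness noted above: an attack with no matching rule is silently classified as normal, i.e. a false negative.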
There are different levels of data sources for modeling anomaly detection,
some classifications of which are based on:
a. Keyboard level: Determination of the key that is hit, time since the
last hit, etc.
b. Command level: Analysis of the commands issued and the sequence
of execution. Researchers also consider output parameters and
arguments that are passed to system calls.
c. Session level: Data is collected to analyze end of session events that
produce data such as length of session, overall CPU usage, memory
and input-output usage, terminal name used, and login time. However,
this approach may not be suitable for building a real time IDS, as the
data is obtained only after the user has completed the session; by then
the attacker may already have completed the intrusion.
d. Group level : Aggregating users into groups.
Based on any of the above levels, an AD system may build up several
profiles of users. This can be implemented either as one profile per user or as
profiles for groups of users who have particular rights in the system. A challenge in
developing host based anomaly detection systems is keeping the IDS up to date with
changes in the environment. Continuous training and updating is required to avoid
the increase in false alarms caused by gradual change in normal behavior, also
referred to as behavioral drift. It is possible to model or train an anomaly system
over time, but one particular issue with this is the danger of learning intrusive
behavior as well.
Three levels of protection may be adopted in a computer system, namely:
a. Prevention of Attack: Firewalls, user names and passwords, and
user rights.
b. Avoidance of Attack: Encryption and Decryption.
c. Detection of Attack: Using IDS.
Cryptography and secure protocols alone cannot prevent all intrusions
or fully control the communication between computers. Firewalls block and filter
certain types of data or services from users by enforcing restrictions, but they are
still unable to handle misuse that occurs within the network or on a host computer.
IDS complements these security mechanisms by adding the ability to detect
malicious behavior.
The purpose of IDS is to detect intrusions by analyzing whether the
behavior of a user conflicts with the intended use of the computer or computer
network. This includes committing fraud, hacking into the system to steal
information, or conducting an attack to prevent the system from functioning
properly or even to break it down. Earlier, intrusion detection was performed by
system administrators,
manually analyzing logs of user behavior and system messages, with poor chances
of being able to detect intrusions in progress. This gradually changed by developing
applications that can automatically analyze the data for the system administrators.
The first IDS to achieve this in real time were developed in the early 1990s. As the
magnitude of data in computer networks is increasing, developing IDS is still a
significant challenge.
Recent work adopts a wide range of Artificial Intelligence (AI)
techniques in IDSs. Rule Based Systems (RBS) were the first to be employed
successfully, and are still used in many IDSs. RBS allows for IDSs to automatically
filter network traffic and/or analyze user data to identify patterns of known
intrusions. Suspected intrusions can be reported to the administrator in detail,
together with the rules that flagged the network misbehavior. The major drawback of
RBSs is that they rely on a set of rigid rules and hence may not be able to detect new
intrusions or variations of known intrusions. Recent research focuses more on the
hybridization of techniques to improve the detection rates of ML classifiers. ANNs
and DTs have been applied to develop IDS and have become popular.
The need for effective intrusion detection mechanisms as part of a
security mechanism for computer systems was recommended by Denning and
Neumann. They identified four reasons for utilizing intrusion detection within a
secure computing framework:
1. Many existing systems have security flaws which make them
vulnerable. Detection of this is very difficult because of technical and
economic reasons.
2. Secure systems cannot be installed in place of existing systems with
security flaws because of application and economic considerations.
3. The development of completely secure systems is probably
impossible as the number of attacks is growing.
4. Highly secured systems are still vulnerable to be misused by
legitimate users.
Even with mechanisms such as cryptography and protocols to
control the communication between computers and users, it is impossible to prevent
all intrusions. Firewalls serve to block and filter certain types of data or services
from users on a host computer or a network of computers, aiming to stop some
potential misuse by enforcing restrictions. However, firewalls cannot handle any
form of misuse occurring within the network or on a host computer.
1.3.6.3 Behavior of Detection
An active IDS is a system that is configured to automatically block
suspected attacks without any intervention by an analyst. Active IDS have the
advantage of providing a real time response to an attack. A passive IDS is a system
that is configured to monitor and analyze network traffic activity and alert the
analyst to potential vulnerabilities in case of an attack. It does not have the
capability of performing any protective or corrective functions on its own. The
major advantage of passive IDS is that they can be easily and rapidly deployed.
1.3.6.4 Usage Frequency
Furthermore, intrusions can occur in traffic that appears normal. IDS will
not replace the other security mechanisms, but complement them by attempting to
detect when malicious behavior occurs. The main purpose of IDS is to detect
conflicting user behavior such as committing fraud, stealing information by
hacking, or conducting an attack to prevent the system from functioning properly or
even to break it down. In the early days detection was performed by system administrators, manually
analyzing logs of user behavior and system messages. This led to poor chances of
being able to detect intrusions in progress. Anderson and Denning developed
software that can automatically analyze the data for the system administrators. The
first IDS to achieve this in real time were developed in the early 1990s. However,
due to the increased use of computers, the magnitude of data in contemporary
computer networks still renders this a significant challenge.
Several AI techniques have been adopted in IDS and RBS were the first
to be employed successfully, and are still at the core of many IDS. RBS
automatically filter network traffic and/or analyze user data to identify patterns of
known intrusions. Suspected intrusions are reported to an administrator in a detailed
manner through the set of rules that led to the detection. The drawback of
RBSs is that they are inflexible due to the rigid rules and cannot detect new
intrusions as well as variations of known intrusions.
The Knowledge Discovery in Databases (KDD) Cup 99 data set has
been widely used to evaluate intrusion detection prototypes in the last decade.
Although many researchers apply the same ML techniques to the data set,
contradictory findings have been reported in the literature. Despite the criticisms,
researchers continue to use the data due to a lack of better publicly available
alternatives. Hence, it is important to identify the value of the data set, which has
largely been ignored. Even with ML techniques, the detection of attacks still
remains a challenging task.
By providing appropriate training it is observed that ANNs are capable of
learning from imbalanced data. Multi Layer Perceptron (MLP) that is commonly
trained by the back propagation algorithm aims to minimize the error of the
classifier by correctly identifying the class(es). Evolutionary Neural Network (ENN)
approach trains MLPs by evolving their weights using Genetic Algorithms (GA).
The ENN can successfully learn from imbalanced data, which demonstrates that
MLPs are capable of detecting the minor classes with more appropriate training.
ENN offers only one solution, which may represent an inadequate trade off in
performance. It has been identified as a general problem with existing ML algorithms that they
produce a single solution. Still, different solutions may be obtained by changing the
training data set, changing the configuration parameters, or assigning weights to the
classes a priori. Unless optimal weights are known, the system is not likely to offer
the user the ideal performance.
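The ENN idea of evolving network weights with a GA can be illustrated, in a highly simplified form, by evolving the weights of a single linear neuron on a toy imbalanced data set. The data, the balanced accuracy fitness and all GA parameters below are invented for illustration and are not taken from this thesis:

```python
import random

random.seed(1)

# Toy sketch of the ENN idea: evolve the weights of a single linear neuron
# with a genetic algorithm instead of back propagation.

# Imbalanced toy data: (features, label); label 1 is the minority class.
DATA = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.2), 0), ((0.2, 0.3), 0),
        ((0.1, 0.3), 0), ((0.3, 0.1), 0), ((0.9, 0.8), 1), ((0.8, 0.9), 1)]

def predict(w, x):
    """Single linear neuron with a hard threshold (weights + bias)."""
    return 1 if w[0] * x[0] + w[1] * x[1] + w[2] > 0 else 0

def fitness(w):
    """Balanced accuracy: the minority class counts as much as the majority,
    so a classifier that ignores the minority class scores only 0.5."""
    per_class = {0: [], 1: []}
    for x, y in DATA:
        per_class[y].append(1 if predict(w, x) == y else 0)
    return sum(sum(v) / len(v) for v in per_class.values()) / 2

def evolve(generations=60, pop_size=20):
    """Truncation selection plus Gaussian mutation on the weight vectors."""
    pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = [[g + random.gauss(0, 0.1) for g in random.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("balanced accuracy of best individual:", round(fitness(best), 2))
```

Because the fitness is balanced accuracy rather than overall error, the GA cannot improve by simply predicting the majority class, which is the intuition behind using ENN on imbalanced intrusion data.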
1.4 DETECTION APPROACHES
There are two main approaches in detecting intrusions:
1. Stateful: In this approach attack is considered as being composed of
several events or stages. Event correlation is also considered
synonymous with stateful approach.
2. Stateless: This approach attempts to classify single events as being
an intrusion or not.
Event correlation and stateless intrusion detection are discussed further
followed by a discussion of the advantages and disadvantages of the two approaches.
1.4.1 Stateful (Event Correlation) Intrusion Detection
Stateful refers to processing low level events to identify significant
patterns that can be aggregated into a single higher level event. The principle is to
reduce the number of events and/or increase the semantic level. Aggregated events
will contain more meaning than individual low level events. Stateful approach is the
process of identifying events as forming part of an attack pattern, in which the
aggregated higher level event suggests the type of attack identified.
Event correlation systems may analyze data both spatially and
temporally, building deterministic and/or probabilistic models of intrusions. Spatial
systems analyze events from different sources simultaneously. Temporal systems
consider the order of events and the time between them. For example, event B must
occur within 50 milliseconds after event A has occurred to qualify as an intrusion X.
Rule based systems are commonly used for event correlation. This can be treated as
signature based approach, since the system filters the events according to a set of
rules or signatures to determine the pattern of intrusions. Model based systems such
as Bayesian networks are also available and several non-AI techniques have adopted
event correlation. Some of the popular methods are the Codebook, Finite State
Machines (FSM) and other state based approaches like alert management,
localization, Petri nets, dependency graphs and hyper bipartite networks.
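The temporal rule quoted above (event B occurring within 50 milliseconds after event A qualifies as intrusion X) can be sketched as a small correlator. The event stream and the consume-on-match policy are illustrative assumptions:

```python
# Minimal sketch of temporal event correlation for the rule in the text:
# event "B" within 50 ms after event "A" is aggregated into the higher
# level intrusion event "X". Timestamps are in milliseconds; the stream
# and the consume-on-match policy are invented for illustration.

WINDOW_MS = 50

def correlate(stream):
    """Scan (timestamp, name) pairs and emit 'X' for each qualifying A->B pair."""
    alerts = []
    last_a = None
    for ts, name in stream:
        if name == "A":
            last_a = ts
        elif name == "B" and last_a is not None and ts - last_a <= WINDOW_MS:
            alerts.append(("X", ts))
            last_a = None  # consume the A so it is not reused
    return alerts

stream = [(0, "A"), (30, "B"), (100, "A"), (200, "B"), (250, "A"), (290, "B")]
print(correlate(stream))  # [('X', 30), ('X', 290)]
```

Note how the pair at timestamps 100 and 200 is not reported: the 100 ms gap exceeds the window, which is exactly the temporal constraint that a stateless detector cannot express.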
1.4.2 Stateless Intrusion Detection
A stateless IDS attempts to classify the data collected from network
connections as intrusive or normal. Stateless intrusion detection is normally adopted
in DM, ML communities treating the intrusion detection problem as a classification
task. The raw data collected from network based IDSs needs to be transformed into
suitable feature vectors. The feature vectors may also include some a priori
knowledge, such as the count feature that contains information about the number of
connections from a particular user for a specific duration.
The major drawback of stateless intrusion detection is that multi stage
attacks cannot be detected. Despite this drawback, stateless approaches are faster,
require less memory, and can offer real time intrusion detection. Event correlation
is complicated by the existence of many possible attacks, which makes it difficult to
determine which attacks are in progress. Its computational requirements are higher,
as it is necessary to keep track of many events concurrently.
If implemented with ML, intrusion models are automatically learned from the data
set instead of conducting knowledge engineering to produce a rule base for event
correlation. Many other techniques like ANN also offer some flexibility by allowing
detection of variations of an attack. Conventional rule based approaches need a rule for
every single intrusion and variation thereof. Not only does this require a large
knowledge base, but variations of known attacks may go undetected. However such
systems will be able to give accurate information when detecting an intrusion,
contrary to many ML techniques. Another benefit of state based systems is that they
can execute responses for potential intrusions before they are completed.
A challenge in using either of these approaches is updating the system. For
RBS, this involves adding new rules and updating old ones. The problem is that the
knowledge base of rules may grow very large with time and may not scale well. For
some ML techniques, updating may involve comprehensive re-training, which
requires a process to gather data concerning new intrusions. Some new techniques
are able to learn continuously online, but the danger is that intrusive behavior may
also be learned. Event correlation can be effectively used to develop misuse detection, while
stateless approaches provide an opportunity for misuse and anomaly detection. It is
clear that both approaches have certain pros and cons but neither approach can be
said to be better than the other. As with network based and host based intrusion
detection, state based and stateless implementations complement each other.
Recent research also addresses attacks against the learning phase implemented by
some IDS. When ML techniques are used to construct the IDS, the attacker
may try to pollute the learning data on which the IDS is trained. The attack is
launched by forcing the IDS to learn from properly crafted data, so that during the
operational phase the IDS will not be able to detect certain attacks against the
protected network. Once the events are analyzed and attacks are detected IDS
responds as passive or active. In passive response reports are sent to administrators
who will then take action on the matter depending on the severity. In the case of
active response the IDS automatically initiates replies to attacks. With passive
response, it is possible for an attacker to monitor the email of the organization or to
use a false IP for initiating an attack; in that case it would be of no use to alert
administrators. Active responses initiate an automatic action that is taken when
certain types of intrusions are detected. Before alerting, additional information may
be collected in order to gain more clues of the possible attack. Another way of active
response is to stop the attack in order to avoid future attacks. Detection of intrusions
can be classified as real time (in line) or off line (not in real time). Some
systems combine both types of detection and are called hybrids.
Given the diverse types of attacks, it is a real challenge for any IDS to
detect a wide variety of attacks with very few false alarms in a real time environment.
The system must detect all intrusions with broad attack detection coverage and at the
same time raise very few false alarms. The system must also be efficient
enough to handle large data sets without affecting performance. A high
level of security can be ensured by disabling all resource sharing and
communication between computers, but compared with today's highly networked
computing environment this is not a practical solution, and hence there is a need to
develop better IDS.
1.5 ARTIFICIAL INTELLIGENCE (AI) TECHNIQUES APPLIED
TO IDS
There are several aspects of intrusion detection to be considered when
developing IDS to be deployed in real life, such as:
1. Architecture: The main focus here is on what could be referred to as
a detection module that would exist in a larger IDS framework.
Especially in wireless and mobile ad hoc networks, the architecture is
very important. This includes determining where to deploy the IDS,
which is considered to be a research challenge.
2. Data collection: Many researchers have used the KDD Cup 99 data
set for the work since data collection is not required. It is necessary to
collect data from the environment in which IDS is to be deployed.
This also includes a process of labeling data for supervised learning.
3. Data preprocessing: Data preprocessing is necessary mainly for the
detection methodologies that are based on ML. Although the
availability of the data set is very convenient for researchers, the
challenge lies in the transformations applied.
4. Performance: There are several methods that can be adopted to help
achieve better performing IDS, in terms of detection rates, memory
usage, feature selection and sampling of data. Related to data
transformation, different transformations and feature sets may help
improve intrusion detection. Different
ways of preprocessing the data are available, which may yield
improved performance.
5. Other issues: Detecting new intrusions is always a challenge. This is
due to new software available, which inevitably has vulnerabilities
that can be exploited. Therefore, re-training the IDS is necessary once
a new data set is available. When and how this is to be done, and
whether to use online training or unsupervised learning, is still an
open research challenge.
Related to achieving real time intrusion detection, researchers have
investigated several methods of performing feature selection. The major benefit of
feature selection is that the amount of data required to process is reduced, ideally
without compromising the performance of the detector. In some cases, feature
selection may improve the performance of the detector as it simplifies the
complexity problem by reducing its dimensionality. Many different methods are
available for adopting AI based feature selection techniques for intrusion detection.
1.5.1 Change Point (CP)
The CP detection method has been studied extensively by statisticians.
The CP method verifies whether the observed data set is statistically homogeneous;
if it detects any change, the point in time is recorded as a CP. Much research work
has been done on off line data sets, where the entire data is collected first and a
decision about a CP is then made based on the analysis. On the other hand, CP
detection can be applied to real-time or on-line data where decisions are made on the fly.
In this research work, the attack detection algorithm belongs to real-time
CP detection. The challenge in developing such a system is to reduce memory usage
and computation time, and to design a model which collects packets in real time.
When designing dynamic and complex systems that need to work on the Internet, it
may not be possible to model the total number of session request arrivals. Hence a
robust design has to be made that is specific to the requirement and can detect
different attacks. Many potential application areas still exist where it is necessary to
consider these
models for developing network security with an early detection of attacks in
computer networks that lead to changes in network traffic. Most of the detection
algorithms are based on CP detection theory; they apply a threshold to the test
statistic to achieve reduced false alarms. The real challenge in designing CP
algorithms usually involves optimizing measures such as the average detection delay
and the frequency of false alarms.
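A minimal sketch of such a threshold based CP test is the one sided CUSUM statistic below. The traffic stream, the expected mean, the drift allowance and the threshold are illustrative assumptions, not values used in this thesis:

```python
# Minimal sketch of a one-sided CUSUM change point test on a traffic
# statistic. All parameter values here are illustrative assumptions.

def cusum(stream, target_mean, drift=0.5, threshold=4.0):
    """Return the index at which an upward change is declared, or None."""
    s = 0.0
    for i, x in enumerate(stream):
        # Accumulate deviations above the expected mean minus a drift
        # allowance; the reset at zero forgets normal fluctuation.
        s = max(0.0, s + (x - target_mean) - drift)
        if s > threshold:
            return i
    return None

# Requests per second: steady around 2, then a jump (e.g. a flooding attack).
traffic = [2, 1, 3, 2, 2, 2, 6, 7, 6, 8]
print(cusum(traffic, target_mean=2.0))  # 7: one sample after the level shift
```

The trade off described above is visible in the two parameters: a lower threshold shortens the detection delay but raises the false alarm frequency, and vice versa.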
1.5.2 Decision Trees (DT)
DM techniques can be applied to IDS once the collected data set is
preprocessed and converted to the format suitable for mining process. The data that
is formatted can be used to develop a classification or cluster model. The
classification model can be a rule based, DT based or association rule based. This
model can be used for both misuse detection and anomaly detection, but it is
predominantly used for misuse detection. Classification and clustering are both
similar as they partition the data set into distinct segments called classes. But unlike
clustering, classification analysis requires that the user should have prior knowledge
of how classes are defined. It is also necessary that each dataset used to build the
classifier should have a value for the attribute used to define the classes. The main
objective of a classification model is to explore the data and classify, so from the
new unknown data set it is possible to discover interesting patterns. Classification is
used to assign data set to a pre-defined class and ML technique performs this task by
extracting rules from examples of correctly classified data. Classification models can
be built using a wide variety of algorithms. DT has been used to detect intrusions but
has to be applied by fine tuning the techniques, so that the false alarms are reduced.
They are among the well known ML techniques. A DT is a collection of a decision
node that specifies a test attribute, an edge that corresponds to one of the possible
attribute values and a leaf which contains the class to which the data belongs. The
two major phases of DT are building the tree and classification that is repeated until
a leaf is encountered. Several well known algorithms are developed in which ID3
and C4.5 algorithms being the most popular ones. Information gain is used as a
measure to select the test attribute at each node in the tree. The attribute with the
highest information gain is chosen as the test attribute, and the gain is referred to as
a measure of the goodness of the split. This attribute minimizes the information
needed to classify the samples in the resulting partitions. DM techniques can
dynamically model the data and use classification techniques to accurately predict
probable intrusions.
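The information gain measure used to select the test attribute can be sketched as follows; the tiny labeled data set (one boolean feature, classes "normal"/"attack") is invented for illustration:

```python
from math import log2

# Sketch of the information gain split measure used by ID3/C4.5 style
# decision trees. The toy data set is invented for illustration.

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(rows, labels, feature):
    """Entropy of the whole set minus the weighted entropy of the
    partitions obtained by splitting on the given feature index."""
    gain = entropy(labels)
    for value in set(r[feature] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

rows = [(1,), (1,), (0,), (0,)]
labels = ["attack", "attack", "normal", "normal"]
print(information_gain(rows, labels, 0))  # 1.0: the feature splits perfectly
```

A gain of 1.0 means the split leaves no remaining uncertainty about the class, which is why the attribute with the highest gain is chosen as the test attribute at each node.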
1.5.3 Feature Selection
Another challenge in developing IDS is to obtain a feature set that is
comprehensive enough to separate normal data from intrusive data while keeping the
size of the data set as small as possible. The more features there are, the more
difficult it is to detect intrusions. For many ML algorithms, increasing the number of
features may significantly increase the training time required to learn the intrusion
task. The run time will also slow down and memory requirements will increase with
more features, commonly referred to as the curse of dimensionality. Hence, much
research has been devoted to developing efficient techniques for feature selection.
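A minimal filter style feature selection sketch is shown below, ranking features by the separation of their class conditional means and keeping the top k. The scoring function and the toy data are illustrative assumptions, not a specific published method:

```python
# Minimal sketch of filter-based feature selection: score each feature
# independently, then keep the k best. The score (difference of the
# class-conditional means) and the toy data are invented for illustration.

def score(feature_values, labels):
    """Absolute difference between the feature's mean in each class."""
    a = [v for v, l in zip(feature_values, labels) if l == "attack"]
    n = [v for v, l in zip(feature_values, labels) if l == "normal"]
    return abs(sum(a) / len(a) - sum(n) / len(n))

def select(rows, labels, k):
    """Return the indices of the k highest-scoring features."""
    n_features = len(rows[0])
    scores = [(score([r[j] for r in rows], labels), j)
              for j in range(n_features)]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

# Feature 0 is noise, feature 1 separates the classes well.
rows = [(0.5, 0.1), (0.4, 0.2), (0.5, 0.9), (0.6, 0.8)]
labels = ["normal", "normal", "attack", "attack"]
print(select(rows, labels, k=1))  # [1]
```

Discarding feature 0 here loses nothing for classification but halves the dimensionality, which is exactly the benefit feature selection aims for against the curse of dimensionality.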
1.5.4 Support Vector Machines (SVM)
SVM has been extensively used in developing IDS applications. The
main advantages of using SVM are that:
a. The standard optimization method can be used to find the solution
that maximizes the margin separating two different classes.
b. It minimizes the training errors.
SVM permits the training errors as some training data may not be
linearly separable in the conventional SVM feature space. Using SVM the training
data can be mapped into the SVM feature space. There exists a hyper plane that can
separate these data with a maximal margin. This is made possible by the
introduction of kernel method, which is equivalent to a transformation of the vector
space for locating a nonlinear boundary.
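In standard notation (a textbook formulation, not specific to this thesis), the soft margin problem sketched above reads as follows, where w and b define the separating hyperplane, the slack variables ξ_i permit the training errors mentioned, C penalizes them, and φ is the kernel induced feature mapping:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  + C\sum_{i=1}^{N}\xi_{i}
\quad\text{subject to}\quad
y_{i}\bigl(\mathbf{w}\cdot\phi(\mathbf{x}_{i}) + b\bigr) \ge 1 - \xi_{i},
\qquad \xi_{i}\ge 0 .
```

The kernel method enters through $K(\mathbf{x}_{i},\mathbf{x}_{j}) = \phi(\mathbf{x}_{i})\cdot\phi(\mathbf{x}_{j})$, so the nonlinear boundary is located without ever computing $\phi$ explicitly.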
1.5.5 Relevance Vector Machines (RVM)
A related ML classifier that may be used is the RVM, which unlike SVM
incorporates probabilistic output through Bayesian inference. Its decision function
depends on fewer input variables than that of SVM, which allows for better
classification estimates on small data sets with high dimensionality. In this thesis the
RVM is chosen as the tool for prediction. The RVM is a kernel based learning
machine that has the same functional form as SVM: a linear combination of data
centered basis functions that are generally nonlinear. The RVM model has been
shown to provide equivalent and often superior results compared to the SVM, with
respect to generalization ability and sparseness of the model.
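The functional form just described, a linear combination of data centered basis functions, can be written in the standard RVM notation (a textbook formulation, not specific to this thesis), with one weight w_n per training point x_n and a zero mean Gaussian prior with individual precision α_n on each weight:

```latex
y(\mathbf{x};\mathbf{w}) \;=\; \sum_{n=1}^{N} w_{n}\,K(\mathbf{x},\mathbf{x}_{n}) + w_{0},
\qquad
p(w_{n}\,|\,\alpha_{n}) \;=\; \mathcal{N}\bigl(w_{n}\,\big|\,0,\ \alpha_{n}^{-1}\bigr).
```

During Bayesian inference many of the $\alpha_{n}$ grow large, driving the corresponding weights to zero; the few surviving "relevance vectors" account for the sparseness of the model.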
For the given data set, the process of deploying RVM is as follows:
a. Formulation of model for prediction
b. Finding the new feature vector with the given large data set and
c. Design of a model to generate the lowest prediction error, which is a
real challenge.
In this thesis a comparison of the performance of RVM and SVM, for
classifying network data as normal and attack has been carried out using real
time data.
1.6 AREAS OF RESEARCH
There are several areas of research in the domain of intrusion detection.
The following parameters will help move the field towards looking at developing an
ideal IDS:
a. Accuracy : No false positives.
b. Completeness : No false negatives.
c. Performance : Detection in real time.
d. Fault tolerance : The IDS not becoming a security vulnerability
by itself.
e. Timeliness : Handle large amount of data. How quickly the
IDS can propagate the information through the
network to react to potential intrusions. This is
also referred as scalability.
A large proportion of research in the development of IDS focuses on new
system architectures and detectors to improve the accuracy and completeness of the
IDS. Event correlation is a very important research area, which has been established
well for misuse detection. Related to system architectures, an emerging research area
is intrusion detection in wireless/mobile ad hoc and sensor networks. There is a
trend in applying ML to intrusion detection, which offers flexible detectors and
lends itself conveniently to anomaly detection. It is now common to develop hybrid
systems, which combine misuse and anomaly detectors, host based and network
based modules, and event correlation and stateless detectors. With the increasing
research on hybrid IDS much of the recent research focuses on correlating effective
alerts between the different modules. Related to alert correlation, alert aggregation is
also the focus of recent research that attempts to group similar events into a single
generalized event. This can significantly reduce the false positive rates and the
amount of alerts a system administrator is required to investigate. Much research
addresses scalability by distributing or decentralizing the IDS. This can be done with
event correlation as well as with other detection mechanism based on mobile agents
and artificial immune systems. Probes are issued to gather information about the
performance of the distributed system and utilize Bayesian reasoning to determine
how many probes should be issued and the number of tests they should perform. The
idea for developing this scheme is to reduce the computational costs and achieve real
time detection of the system. An inconsistency with intrusion detection is that the
IDS itself may become a security vulnerability. Many researchers suggest that state
based IDSs are more prone to attacks than stateless approaches, as they can be
flooded with events that prevent them from functioning efficiently. The applications
of ML algorithms, which learn over time, are particularly vulnerable to adversarial
attacks. There is a danger that an adversary may manipulate the training process by
gradually changing the behavior over time so that a new planned attack will not be
detected. Several research papers give an overview of threats to learning algorithms
themselves and discuss ways to protect against and detect attacks.
1.7 CONTRIBUTION AND NOVELTY
The research work presented in this thesis, models the IDS by ensemble
approach using Outlier Detection (OD), CP and RVM. The new hybrid IDS
model developed combines the individual base classifiers and ML paradigms to
maximize detection accuracy and minimize computational complexity. The results
illustrate that the proposed hybrid systems provide more accurate IDS. Real time
dataset is used in the experiments to demonstrate that RVM can greatly improve the
classification accuracy. The approach achieves higher detection rate with low false
alarm rates and is scalable for large datasets, resulting in an effective IDS. Several
contributions from researchers have been made to the general ML domain in order to
develop a real time IDS. In this thesis, contributions are made both in intrusion
detection and ML domains.
The system developed aims to:
1. Detect a broad range of attacks rather than being limited to
previously known attacks.
2. Reduce the number of false alarms generated by the IDS, thereby
improving the attack detection accuracy.
3. Work efficiently in real time.
Issues such as scalability, availability of training data, robustness to noise
and feature selection in the training data are also addressed.
1.8 STRUCTURE OF THE THESIS
The remainder of this thesis is organized as follows.
Chapter 2 provides literature review of IDS. The chapter also reviews
the commonly known attacks and defensive mechanisms.
Chapter 3 provides an introduction to the domain of intrusion detection
followed by literature review.
Chapter 4 describes the framework that is used to build effective and
efficient IDS. The framework developed can identify the network features related to
each packet encountered in the network. The real challenge was to manage and
process large volumes of data, recognize misbehavior, keep false alarms low and
react in real time to avert an intrusion.
Chapter 5 describes how ML techniques can be integrated in the
framework along with the experimental results. To demonstrate the feasibility of the
architecture a prototype implementation has been developed and its performance has
been evaluated on synthetic as well as real intrusion data. This has shown that
applying ML techniques is feasible and even yields low false alarms.
Chapter 6 explains the use of CP, OD and RVM to build a
real time IDS. Results obtained from the model provide accurate classification.
RVM may be preferable to SVM as it provides a Bayesian derived probability as an
output. These results suggest that these ML classifiers show good potential for
developing IDS. Our experimental results suggest that by using RVM, attacks can be
detected by analyzing only a small number of events which results in an efficient
and an accurate system.
Chapter 7 concludes and offers possible directions for future research.
CHAPTER 2
LITERATURE SURVEY
A large proportion of research in developing IDS focuses on developing
new system architectures to improve the accuracy and completeness of the IDS.
Several research areas in the domain of IDS help move the field towards
a set of ideal requirements as listed in Table 2.1.
Table 2.1 Ideal requirements of IDS
Accuracy No false positives
Completeness No false negatives
Performance Real time detection
Fault Tolerance IDS not becoming a security vulnerability itself
Timeliness Quick propagation of information in the network to react to
potential intrusions
Scalability Handling large amounts of data
Intrusion Detection procedures are classified into three categories and
they differ in the reference data that is used for detecting unusual activity. Signature
based or MD considers signatures of unusual activity for detection. AD mechanism
considers a profile of normal system activity and Protocol-Based or Specification
based detection considers constraints that characterize the normal behavior of a
particular protocol or a program. The trend is to apply ML to IDS that offers
flexibility for detection and lends itself conveniently to AD. AD operates on the
assumption that attacks differ from normal activity and tries to focus on
identifying unusual behavior in a host or a network. However it is now common to
develop hybrid systems, which may combine misuse and anomaly detectors, host
based and network based modules, and event correlation and stateless detectors.
With increasing research on hybrid IDS, recent research focuses on correlating alerts
between the different modules in an efficient manner [3, 4]. Alert aggregation is one
such area in which similar alerts/events are grouped into a single generalized event.
With this method, the amount of data the system administrator must analyze to
detect an intrusion is reduced. Event correlation is another research area, which has
been established well for MD. MD based IDS often uses a set of rules or signatures
as attack model, with each rule usually dedicated to detect a different attack. Earlier
research work emphasized that data set for analysis can be obtained by real traffic,
sanitized traffic and simulated traffic [5,6]. But in real time systems, a fast response to
external events within an extremely short time is demanded. Therefore, an
alternative algorithm to implement real time learning is imperative for critical
applications in fast changing environments. Even for offline applications, speed
still matters. A real time learning algorithm that reduces training time and human
effort to nearly zero would always be of considerable value. The advent of new
technologies has greatly increased the ability to monitor and resolve the details of
changes in order to analyze better. Analyzing large amounts of data is still a
challenge. To identify frequently changing trends, data need to be analyzed and
corrected continuously. In some cases, feature selection may improve the performance of the
detection as it simplifies the complexity problem by reducing the dimensions.
Researchers have proposed several methods of feature selection to achieve real time
IDS. The major benefit of feature selection is that the amount of data required to
process is significantly reduced, without compromising the performance of the
detection.
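As an illustrative sketch of the feature selection idea discussed above, the following ranks hypothetical per-packet features by variance and keeps only the most informative ones. The records, the number of features kept and the variance criterion are assumptions for illustration, not the selection method used in this thesis.

```python
import statistics

def select_features(records, keep):
    """Rank features by variance and return the indices of the top
    `keep` features. `records` is a list of equal-length numeric
    feature vectors (a hypothetical stand-in for per-packet features)."""
    n_features = len(records[0])
    variances = [
        statistics.pvariance([row[i] for row in records])
        for i in range(n_features)
    ]
    # Highest-variance features first; near-constant ones carry little signal.
    ranked = sorted(range(n_features), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:keep])

# Example: feature 1 is constant across records, so it is dropped first.
data = [[1.0, 5.0, 10.0], [2.0, 5.0, 30.0], [3.0, 5.0, 20.0]]
print(select_features(data, keep=2))  # -> [0, 2]
```

Only the retained feature indices need to be processed downstream, which is the source of the data reduction described above.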
2.1 CURRENT IDS PRODUCTS
IDS can be classified according to many different features [7,8].
Table 2.2 lists some of the currently available IDS with features.
Table 2.2 Leading IDS products currently available
Name Description
SNORT
An open source network Intrusion Prevention and Detection
System (IDS/IPS). SNORT, developed by Sourcefire, combines
the benefits of signature, protocol and anomaly
based inspection. SNORT is one of the most widely deployed
IDS/IPS technologies worldwide.
COUNTERACT
Delivers an entirely unique approach to prevent network
intrusions. The system stops attackers based on their
proven intent to attack. It does not use signatures, AD or
pattern matching of any kind. To launch an attack, an
attacker needs knowledge about a network's resources. Prior
to an attack, intruders compile vulnerability and configuration
information by scanning and probing. This information is
used to launch attacks based on the unique structure and
characteristics of the targeted network. These characteristics
of intruders are used by COUNTERACT to prevent
intrusions.
AIRMAGNET
Provides a simple, scalable WLAN monitoring solution. This
enables an organization to proactively mitigate all types of
wireless threats.
BRO IDS
An open source, Unix based NIDS. Bro passively
monitors network traffic and looks for suspicious activity.
Bro detects intrusions by first parsing network traffic to
extract its application level semantics and then executing
event oriented analyzers that compare the activity with
patterns deemed to be troublesome.
CISCO INTRUSION
PREVENTION
SYSTEM (IPS)
One of the most widely deployed IPSs. It provides
protection against more than 30,000 known threats, with timely
signature updates and Cisco Global Correlation to
dynamically recognize, evaluate, and stop emerging Internet
threats. Cisco IPS includes industry leading research and the
expertise of Cisco Security Intelligence Operations. It also
protects against increasingly sophisticated attacks, including
directed attacks, worms, botnets, malware and application
abuse. It provides intrusion prevention that stops outbreaks
at the network level and supports a wide range of
deployment options, with near real time updates for the most
recent threats.
JUNIPER
NETWORKS
INTRUSION
DETECTION AND
PREVENTION (IDP)
Offers comprehensive coverage by leveraging multiple
detection mechanisms. Backed by Juniper Networks
Security Lab, signatures for detection of new attacks are
generated on a daily basis. Working very closely with many
software vendors to assess new vulnerabilities, it is not
uncommon for the IDP Series to be equipped to prevent attacks
which have not yet occurred. Such coverage ensures that
organizations need not merely react to new attacks but can
proactively secure their networks from future attacks.
McAFee HOST
INTRUSION
PREVENTION FOR
SERVER
Defends servers from known and new zero day attacks with
McAfee Host Intrusion Prevention. Boosts security at low
cost and simplifies compliance by reducing the frequency of
patching new signatures.
SOURCEFIRE
INTRUSION
PREVENTION
SYSTEM
Built on the foundation of the award-winning Snort rules-
based detection engine. It uses a powerful combination of
vulnerability and AD based inspection methods.
STRATA GUARD IDS
/ IPS
This award winning high speed IDS/IPS gives real time
protection from network attacks and malicious traffic. It
prevents malware, spyware, port scans, viruses, and DoS and
Distributed DoS (DDoS) attacks.
Over the years, researchers and designers have used many techniques to
design IDS. But, there have been one or more issues with the existing IDS. Current
AD methods are mainly classified as Statistical Anomaly Detection, Detection
Based on Neural Network and Detection Based on DM, etc. The IDS for the AD
should first learn the characteristics of normal and abnormal activities, and
then detect traffic that deviates from normal activities. AD tries to determine
whether deviation from established normal usage patterns can be flagged as
intrusions [9]. AD techniques are based on the assumption that misuse or intrusive
behavior deviates from normal system procedure [10]. The advantage of AD is that
it can detect attacks that have never been seen before, but it is ineffective in detecting
insider attacks. Shoubridge [11] developed an IDS that can analyze critical network
events and trends. The authors of [12,13] represent the dynamic network as a directed
graph; similarity measures were calculated that showed a change in the trend of the
network behavior over time. With the same principle, Pincombe [14] developed an
IDS that uses graph distance metrics, such as weight, modality, and diameter, to
compute graph similarities. Cumulative summation and minimum mean square
errors are then used recursively to detect CP. Although this method is faster
compared to previous methods, it did not provide good results for all graph distance
metrics. Hence, an open question still remains as to which distance measure is
suitable for different types of graphs.
Recently DM and ML methods [15-18] for a data stream have been
actively proposed. A data stream is an ordered sequence of objects o1, o2, …, on that must
be accessed in the same order. It can be read only once or a specified number of
times. Hence, it is not possible to maintain all the objects of a data stream in the
main memory. Each object should be examined only once to analyze the data
stream. The memory space for data stream analysis should be confined finitely,
although new objects get generated infinitely over time. Newly generated objects
should be analyzed as quickly as possible to maintain up to date results with
minimum false alarms. Therefore, reducing false positives is a major area of research.
Currently the detection of outliers has gained significant research interest with the
insight that outliers can be the key discovery for a possible new attack.
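The single-scan, bounded-memory constraint described above can be sketched with Welford's one-pass method for running statistics, in which each object of the stream is examined exactly once using O(1) memory. The stream values below are illustrative.

```python
class StreamStats:
    """One-pass running mean/variance with O(1) memory (Welford's method).

    Each object in the stream is examined exactly once, matching the
    single-scan constraint of data stream analysis."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / self.n if self.n else 0.0

stats = StreamStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
print(round(stats.mean, 3), round(stats.variance(), 3))  # -> 5.0 4.0
```

Newly generated objects can be folded in as they arrive, keeping results up to date without retaining the stream itself.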
2.2 OUTLIER DETECTION (OD)
OD refers to the problem of finding interesting patterns in data that are
very different from the rest of the data. Such a pattern found, may often contain
useful information regarding abnormal behavior of the system. These patterns are
usually called outliers or noise. OD is an extensively researched area that finds
immense use in application domains such as credit card fraud detection, illicit access
in computer networks, military surveillance for enemy activities and many others.
OD approaches found in the literature [19-21] have varying scopes and
abilities. Due to the lack of prior knowledge on the collected data set, the OD
problem falls into the category of unsupervised learning. Another area of research is semi
supervised OD, where some examples of outliers and inliers are available as a
training set. Semi supervised OD methods perform better than
unsupervised methods since additional label information is available. But such
outlier samples for training are not always available and if available may be diverse.
Thus learning from known types of outliers is not necessarily useful in detecting
unknown types of outliers. OD searches for objects that do not follow rules and
expectations in the data set. The detection of an outlier may be evidence that there
are new trends/patterns in data. Although outliers are often considered noise or errors,
they may carry important information. OD depends on the applied detection
methods and also data structures that are used. Depending on the approaches used in
OD, the methodologies can be broadly classified as Distance based, Density based
or Soft Computing based. Selecting subspaces in the case of OD is a complex and a
challenging problem [22] and outliers are rare and very hard to collect [23].
Rejecting some dimensions for the sake of easy calculation may lead to some loss of
important and also interesting knowledge.
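A minimal sketch of the distance based OD approach mentioned above: each point is scored by the distance to its k-th nearest neighbour, and large scores suggest outliers. The points and the choice k=2 are illustrative assumptions, not a method or data set from the cited works.

```python
import math

def knn_outlier_scores(points, k=2):
    """Distance-based OD sketch: score each point by its distance to the
    k-th nearest neighbour; larger scores indicate likely outliers."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores

# Four clustered points and one far-away point (the outlier).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(pts, k=2)
print(scores.index(max(scores)))  # -> 4
```

Density based and soft computing based variants replace the raw distance with a local density estimate or a fuzzy membership, but the scoring structure is similar.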
2.3 STATISTICAL BASED ANOMALY DETECTION
Statistics is a widely used tool to build behavior based IDS [24,25].
The system or behavior of the user is measured by a number of variables sampled
over time. This includes
1. User login and logout time of each session.
2. Duration of the resource usage.
3. Amount of processor and memory consumed during that session etc.
4. Number of commands executed that are sampled over time.
One of the popular IDSs is the Intrusion Detection Expert System (IDES),
which works on statistical based AD. IDES monitors the users, remote hosts and target
systems with different parameters that include CPU usage, command usage, network
activity etc. Vectors are formed with these parameters and statistical profiles are
updated to reflect the new user behavior. To detect anomalies, IDES processes each
new data set and verifies it against the known profile. If any deviations are detected,
they are reported as probable intrusions. IDES is not suitable if the parameters have
multi modal distributions. This problem is addressed in the next version of IDES,
known as the Next-generation Intrusion Detection Expert System (NIDES) [26]. NIDES
stores only statistics such as frequencies, means, variances, and covariance of the
profile since storing the audit data itself is too cumbersome. Given a profile with n
measures, NIDES characterizes any point in the n-space of the measures to be
anomalous if it is sufficiently far from an expected or defined value. NIDES
evaluates the total deviation and not just the deviation of each individual measure.
Wisdom and Sense [27] is specifically designed using statistical anomaly detection
that analyzes behavior of users. Based on the activities of users over a period of time
the system updates a set of rules that statistically describe the behavior of the users.
Current behavior is then matched against these rules to detect inconsistent behavior.
These rules are regularly updated to analyze/detect new usage patterns. One of the
methods may be to model a system that keeps averages of all or any one of these
variables and detect whether thresholds are exceeded based on the standard
deviation. This model is too simple to represent the data faithfully. Even
comparing the variables of individual users with aggregate group statistics may not
yield much improvement. Therefore, a more complex model needs to be developed
that compares profiles of long term and short term user or system activities. These
profiles are periodically updated as the behavior of user activities changes, and this
model is now used in a number of intrusion detection tools and prototypes.
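The long term versus short term profile comparison described above can be sketched as follows, with an assumed threshold of three standard deviations; the CPU-usage figures and the threshold are hypothetical, not values from the cited systems.

```python
import statistics

def is_anomalous(long_term, short_term, threshold=3.0):
    """Flag an anomaly when the short-term average of a measure deviates
    from the long-term profile by more than `threshold` standard
    deviations (a simple statistical behavior-profile check)."""
    mu = statistics.mean(long_term)
    sigma = statistics.pstdev(long_term)
    if sigma == 0:
        return statistics.mean(short_term) != mu
    z = abs(statistics.mean(short_term) - mu) / sigma
    return z > threshold

# Long-term CPU-usage profile versus a recent burst of activity.
profile = [10, 12, 11, 9, 10, 11, 10, 12, 11, 10]
print(is_anomalous(profile, [10, 11, 12]))  # -> False
print(is_anomalous(profile, [40, 45, 42]))  # -> True
```

Periodically folding recent observations back into the long term profile gives the update behavior the text describes.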
2.4 MACHINE LEARNING FOR ANOMALY DETECTION
AI is the simulation of human intelligence in machines with the ability
to make decisions. ML is a branch of AI that is specifically concerned with
enabling machines to understand information. Recent research focuses more on the
combination of techniques to improve the detection rates of ML classifiers. For
example, Mahbod et al. [28] examined the performance of seven ML algorithms on
the KDD Cup99 dataset. They found that different techniques performed better on
different classes of intrusion. By combining the best techniques for each class, the
overall performance of the detector was increased. However, there are still
discrepancies in the findings reported in the literature as to how well different
techniques perform on the different classes of intrusions. ML is an ideal technology
for defending against attacks. Knowing that programmers tend to repeat mistakes, it
provides defenders with an advantage by detecting flaws before an intrusion
happens. Sophisticated IDS may use statistical techniques such as Naïve Bayes [29]
to find new vulnerabilities. This enables the defender to capture, mislead or use
other counter measures against the attacker. ML provides an advantage to the
defender because it can detect any anomaly. Thus, the attacker would need to hide
byte patterns in addition to finding and exploiting vulnerabilities. This requires the
attacker to add complexity to bypass defenses. The IDS will learn with each attack
and ML makes the system more intelligent and secure over time. ML is an
algorithmic method in which an application automatically learns from the input and
provides the feedback to improve its performance over time. Unlike statistical
methods, which aim at determining deviations in traffic features, ML based
approaches aim at detecting anomalies by learning from the data itself. ML focuses
on finding relationships in data, and learning approaches are classified as
1. Supervised Learning (SL) Attempts to learn some function with
given input vector and actual output.
2. Unsupervised Learning (UL) Attempts to learn only with given
input vector by identifying relationships among data.
3. Reinforcement Learning (RL) [30,31] Learns with a single bit of
information which indicates to the neuron whether the output is good
or bad.
These ML techniques can also recognize the patterns not presented
during a training phase. Some of the ML techniques used to detect attacks are Naïve
Bayesian, SVM and ANN. Most of the ML algorithms applied to intrusion detection
have not considered minimizing false alarms, and the cost associated with a false
alarm is often higher than that of a missed detection.
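As a concrete illustration of a supervised ML classifier of the kind mentioned above, the following is a minimal Gaussian Naïve Bayes sketch. The two connection features (packets per second, mean packet size) and their values are hypothetical, and this is not the classifier configuration used in the cited studies.

```python
import math
import statistics
from collections import defaultdict

class GaussianNB:
    """Minimal Gaussian Naive Bayes classifier, one of the supervised
    ML techniques commonly applied to intrusion detection."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.params = {}
        for label, rows in grouped.items():
            cols = list(zip(*rows))
            self.params[label] = (
                len(rows) / len(X),                            # class prior
                [statistics.mean(c) for c in cols],            # feature means
                [statistics.pstdev(c) or 1e-6 for c in cols],  # feature stds
            )
        return self

    def predict(self, x):
        def log_posterior(label):
            prior, means, stds = self.params[label]
            lp = math.log(prior)
            for xi, m, s in zip(x, means, stds):
                lp -= 0.5 * math.log(2 * math.pi * s * s)
                lp -= (xi - m) ** 2 / (2 * s * s)
            return lp
        return max(self.params, key=log_posterior)

# Hypothetical per-connection features: (packets per second, mean packet size).
X = [(5, 100), (6, 110), (4, 90), (500, 40), (520, 35), (480, 45)]
y = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = GaussianNB().fit(X, y)
print(model.predict((7, 105)))   # -> normal
print(model.predict((510, 38)))  # -> attack
```

Working in log space keeps the products of small likelihoods numerically stable, which matters once many features are involved.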
2.5 MACHINE LEARNING VERSUS STATISTICAL TECHNIQUES
A wide range of real world applications are discussed in the community
of Statistical Analysis and Data Mining. Statistical techniques usually assume an
underlying distribution of data and require the elimination of data instances
containing noise. Statistical methods, though computationally intense, can be
applied to analyze the data. Statistical methods are widely used to build behavior-
based IDS. The behavior of the system is measured by a number of variables
sampled over time such as the resource usage duration, the amount of processor-
memory-disk resources consumed during that session etc. The model keeps averages
of all the variables and detects whether thresholds are exceeded based on the
standard deviation of the variable.
2.6 INSTANCE BASED LEARNING (IBL)
Researchers have also employed IBL techniques in intrusion detection
and event correlation/fault management as a means to obtain a more flexible system
compared with most Expert Systems (ES). The drawback of using ES is that
knowledge of intrusions must be extracted and coded in the form of rules, which is
difficult and time consuming, as is managing and updating the rule base dynamically.
Another problem is that specific rules cannot detect slight variations of known
attacks. IBL avoids these problems by reasoning from solved instances/cases,
unlike ES which require previous knowledge to determine specific rules [32]. The
knowledge repository of instances/cases can be updated automatically and the
system can learn from its own experience during operation. However, IBL is not as
efficient as ES in performing event correlation and has high memory requirements
as it is necessary to store a large number of cases/rules. Case Based Reasoning
(CBR) may be used to improve the performance in acquiring and representing the
knowledge for IDS. Lane [33] developed an IDS to perform anomaly detection by
means of IBL. In this system, user profiles are built up from UNIX commands
and used to catch long term, unconventional as well as misuse behavior.
The research focus is on data reduction techniques, addressing the general issue of
high memory requirement of IBL. However, IBL was not able to maintain the
characteristics of the users as well as the clustering method. Hence, clustering is
considered the better alternative.
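The IBL idea of labeling a new event from stored cases, without hand-coded rules, can be sketched as a k-nearest-neighbour vote. The case base, features and k=3 are illustrative assumptions; note how new cases can simply be appended, which is the automatic knowledge-repository update discussed above.

```python
import math

def classify_event(case_base, event, k=3):
    """Instance-based learning sketch: label a new event by majority vote
    among its k nearest stored cases, instead of expert-system rules."""
    neighbours = sorted(case_base, key=lambda c: math.dist(c[0], event))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Stored cases: (feature vector, label). Appending new solved cases updates
# the knowledge repository without rewriting any rules.
cases = [((1, 1), "normal"), ((2, 1), "normal"), ((1, 2), "normal"),
         ((9, 9), "intrusion"), ((8, 9), "intrusion"), ((9, 8), "intrusion")]
print(classify_event(cases, (2, 2)))  # -> normal
print(classify_event(cases, (8, 8)))  # -> intrusion
```

The memory cost is visible here as well: every stored case is scanned per query, which is the high memory and lookup burden the text attributes to IBL.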
2.7 CHANGE POINT TECHNIQUE
Large scale computer network intrusions can, in their final stages, be
identified by observing abrupt changes in the network traffic [34].
However, these changes are hard to detect and difficult to distinguish from usual
traffic fluctuations in the early stages. Researchers have developed efficient adaptive
sequential and batch sequential methods for an early detection of attacks/intrusions
that lead to changes in network traffic. These methods employ a statistical analysis
of network traffic to detect very subtle traffic changes. The algorithms are based on
CP detection methods that utilize thresholds to raise an alarm. The CP algorithms
are self learning, allow for the detection of attacks with a small average delay and
are computationally simple and thus can be implemented online. Application of CP
models falls into various categories such as Gaussian observations with varying
mean or variance, Poisson process with a piecewise constant rate, changing linear
regression models and Hidden Markov Models (HMM) with time varying transition
matrices. CP detection methods can be divided into two categories, posterior and
sequential. In posterior tests the entire data set is collected first and CP is detected
off-line based on the analysis on the data collected. In contrast sequential tests are
done on-line with the data collected and the analysis is made on the fly. In the
research work on statistical data analysis, detecting changes in mean of a given data
series plays an important role. Some of the approaches for CP detection [35-37] are
Chauvenet's Criterion, Peirce's Criterion, CUmulative SUM (CUSUM),
Generalized Likelihood Ratio (GLR) and Direct Density Ratio (DDR) estimation;
DDR estimation in particular has been actively explored by the ML community,
e.g., through Kernel Mean Matching.
2.7.1 Coefficient of Variation (CV)
For certain types of data, the spread increases proportionally to the
average: when the average shifts upwards by 50% or more, so does the standard deviation
[38]. Common examples of this include filling processes, system
measurements and the accuracy of systems. For such processes the CV, the ratio of
the standard deviation to the average, gives a better characterization.
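A minimal sketch of the CV computation; the two sample series are illustrative and chosen so that they share the same relative spread at different scales, which is exactly the situation where the CV is more informative than the raw standard deviation.

```python
import statistics

def coefficient_of_variation(samples):
    """CV = standard deviation / mean: a scale-free spread measure suited
    to processes whose spread grows in proportion to their average."""
    return statistics.pstdev(samples) / statistics.mean(samples)

# Same relative spread at different scales gives (almost) the same CV,
# while the raw standard deviations differ by a factor of ten.
low = [10.0, 11.0, 9.0]
high = [100.0, 110.0, 90.0]
print(round(coefficient_of_variation(low), 3))   # -> 0.082
print(round(coefficient_of_variation(high), 3))  # -> 0.082
```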
2.7.2 Chauvenet's Criterion [25,39]
From the mean value of a given sample of N measurements, a scatter band is
defined by this criterion. All data points which fall within a band around the mean
that corresponds to a probability of [1-1/(2N)] should be retained. Data points are
considered for rejection only if the probability of their deviation from the
mean is less than 1/(2N).
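The rejection rule above can be sketched as follows, using the Gaussian two-sided tail probability for the deviation; the sensor readings are illustrative.

```python
import math
import statistics

def chauvenet_filter(data):
    """Chauvenet's criterion sketch: reject a point when the two-sided
    tail probability of its deviation from the mean falls below 1/(2N)."""
    n = len(data)
    mean = statistics.mean(data)
    std = statistics.pstdev(data)
    kept = []
    for x in data:
        z = abs(x - mean) / std
        tail_prob = math.erfc(z / math.sqrt(2))  # P(|Z| >= z) under a Gaussian
        if tail_prob >= 1.0 / (2 * n):
            kept.append(x)
    return kept

readings = [9.9, 10.1, 10.0, 10.2, 9.8, 25.0]
print(chauvenet_filter(readings))  # the 25.0 reading is rejected
```

In practice the mean and standard deviation may be recomputed after rejection and the criterion reapplied, since a large outlier inflates both.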
2.7.3 Peirce's criterion [40]
This technique applies a rigorous method based on probability theory
which can be used to eliminate data outliers or spurious data in a better way.
However, Peirce's criterion can be applied more generally to any data set which
follows a Gaussian distribution. Stephen M. Ross proposed a piecewise segmented
function that caters for time dependent data, where the CPs are qualified as
the points between successive segments. A CP may be detected by discovering the
point such that the total error of the local model fittings of segments to the data before and
after that point is minimized. However, it is computationally expensive to converge
to such a point, as the local model fitting is required as many times as there are
candidate points whenever the data is given as an input.
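The segmented fitting idea can be sketched by exhaustively trying every split of a series into two constant segments and minimising the total squared error; the traffic series is illustrative. The cost concern noted above is visible directly: both segments are refitted for every candidate split.

```python
import statistics

def change_point(series):
    """Two-segment fitting sketch: return the split index that minimises
    the total squared error of fitting a constant to each segment.
    Refitting at every candidate split is what makes the exhaustive
    approach computationally expensive for long series."""
    def sse(seg):
        m = statistics.mean(seg)
        return sum((x - m) ** 2 for x in seg)

    return min(
        range(1, len(series)),
        key=lambda i: sse(series[:i]) + sse(series[i:]),
    )

traffic = [5, 6, 5, 4, 5, 20, 21, 19, 22, 20]
print(change_point(traffic))  # -> 5 (the level shift begins at index 5)
```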
2.7.4 CUSUM (CUmulative SUM) [41-43]
CUSUM charts can be used to detect deviations from given
predetermined values. This method computes the deviation of the observed
data from the desired process mean, accumulated over time to give the
CUSUM at each point. The basic rules for interpreting CUSUM values are: if
the data is above the overall average, the CUSUM value increases; if the data is below
the overall average, the CUSUM value decreases; and a sudden change in direction
indicates that the values have shifted. The CUSUM method applies a hypothesis test to
distinguish between acceptable and unacceptable (quality) attribute values. CUSUM
can also be used to detect a shift in a normal mean based on inferences of the normal
distribution. It should be noted that the data provided to CUSUM calculations have
to follow the normal distribution. The continuous normal or Gaussian probability
distribution is parameterized by the population mean μ and the population variance σ².
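The accumulation rule described above can be sketched as a one-sided CUSUM detector for an upward mean shift. The slack constant k, decision threshold h and traffic values are illustrative assumptions, not parameters from the cited works.

```python
def cusum_alarm(data, target_mean, sigma, k=0.5, h=4.0):
    """One-sided CUSUM sketch: accumulate deviations above the target
    mean (minus a slack of k*sigma) and raise an alarm once the sum
    crosses h*sigma. Returns the alarm index, or -1 if none fires."""
    slack = k * sigma
    s = 0.0
    for i, x in enumerate(data):
        s = max(0.0, s + (x - target_mean) - slack)
        if s > h * sigma:
            return i  # first point at which the alarm fires
    return -1

# Traffic holds near the target mean, then shifts upward.
traffic = [10, 11, 9, 10, 10, 14, 15, 14, 16, 15]
print(cusum_alarm(traffic, target_mean=10.0, sigma=1.0))  # -> 6
```

The clamping at zero is what lets the statistic ignore ordinary fluctuations below the mean while remaining sensitive to a sustained shift, which is why CUSUM detects small persistent changes with a short average delay.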
2.7.5 Generalized Likelihood Ratio (GLR) [37, 44]
This is an intuitive approach for handling the testing problems based on
discrepancy measures. The logarithm of the likelihood ratio between two
consecutive intervals in time series data is monitored to detect change
points. The above premise has been extensively explored in the DM community in
connection with real world applications. Because of the computational cost of the
GLR, nonlinear models such as NN have never been employed, even for off line
analysis. Recent advances in both training algorithms and computing speed have
made it possible to implement GLR for both off line and real time applications.
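A minimal sketch of the GLR idea for Gaussian observations: the score compares how well two separate Gaussians explain consecutive windows against a single pooled Gaussian; a large score suggests a change point between the windows. The window contents are illustrative assumptions.

```python
import math
import statistics

def gaussian_loglik(seg):
    """Log-likelihood of a segment under its own fitted Gaussian (MLE)."""
    n = len(seg)
    var = statistics.pvariance(seg) or 1e-9
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def glr_score(window_a, window_b):
    """GLR-style score: gain in log-likelihood from modeling two
    consecutive windows separately rather than with one pooled Gaussian."""
    pooled = gaussian_loglik(window_a + window_b)
    return gaussian_loglik(window_a) + gaussian_loglik(window_b) - pooled

steady = glr_score([10, 11, 9, 10], [10, 9, 11, 10])
shifted = glr_score([10, 11, 9, 10], [30, 31, 29, 30])
print(shifted > steady)  # a level shift yields a much larger score
```

Sliding the window pair along the series and thresholding the score gives an on-line, sequential form of the test.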
2.7.6 DDR (Direct Density Ratio) [45,46]
This is an estimation technique that has been actively explored in the ML
community. Kernel Mean Matching (KMM) avoids density estimation and directly
gives an estimate of the importance at test points. The values of the importance are
unknown in practice, so there is a need to estimate from the sample data that is
collected. If the training and test densities are estimated separately from the data
samples, then it is possible to estimate the importance by taking the ratio of the two
estimated densities. But this approach suffers from the curse of dimensionality
unless the data has low dimensionality or a simple distribution. Vapnik [47] suggested
that DDR estimation is very crucial in statistical learning.
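To illustrate the quantity that DDR methods estimate, the following naive two-step sketch fits separate Gaussians to training and test samples and takes their ratio as the importance weight w(x) = p_test(x)/p_train(x). Direct methods such as KMM avoid exactly this two-step density estimation; all sample values here are illustrative.

```python
import math
import statistics

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * std * std)) / (
        std * math.sqrt(2 * math.pi)
    )

def importance_weights(train, test, points):
    """Naive two-step density-ratio estimate w(x) = p_test(x)/p_train(x)
    from separately fitted Gaussians. This is the quantity direct
    methods estimate without the intermediate density fits."""
    mt, st = statistics.mean(train), statistics.pstdev(train)
    me, se = statistics.mean(test), statistics.pstdev(test)
    return [gaussian_pdf(x, me, se) / gaussian_pdf(x, mt, st) for x in points]

train = [0.0, 1.0, -1.0, 0.5, -0.5]
test = [3.0, 4.0, 2.0, 3.5, 2.5]
w = importance_weights(train, test, [0.0, 3.0])
print(w[1] > w[0])  # points typical of the test density get larger weight
```

In one dimension the two-step route is workable; in high dimensions each density estimate degrades, which is the motivation for direct estimation.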
2.8 APPLICATION OF DM IN DEVELOPING IDS
Due to the large volume of intrusion detection data sets, researchers have
applied many DM and ML algorithms for detecting intrusions. DM with ML can be
defined as the process of extracting patterns from large data sets by combining
methods from statistics and AI techniques. DM is seen as an increasingly important
tool by an enterprise to transform data into Business Intelligence (BI) giving an
informational advantage. It is also currently used in a wide range of profiling
practices, such as marketing, surveillance, fraud detection, and scientific discovery
[48-50]. The relevance of DM in detecting intrusion is still an open research area in
intelligent computing. DM can be used to clean, classify and study large amount of
network data to correlate violation for intrusion detection. The main reason for using
DM techniques for IDS is due to the enormous volume of existing and newly
appearing network data that require processing. The amount of data accumulated
each day by a network is enormous. DM algorithms can be used for misuse
detection and Anomaly Detection (AD). Many DM algorithms have already been used
for AD, such as DT, Naïve Bayesian (NB), Neural Networks (NN), SVM etc.
The earlier work emphasized that data can be obtained by three
ways [51]:
i. By using real traffic.
ii. Using sanitized traffic.
iii. Using simulated traffic.
But in real time systems, a fast response to external events within an extremely short
time is demanded and expected. Therefore, an alternative algorithm to implement
real time learning is imperative for critical applications in fast changing
environments. Even for offline applications, speed still matters, and a real time
learning algorithm that reduces training time and human effort to nearly zero would
always be of considerable value. Mining data in real time is a big challenge.
2.8.1 Artificial Neural Networks (ANN)
ANN consists of a collection of processing units called neurons that are
well interconnected in a given topology. ANN has the ability of learning by example
and generalization from limited, noisy, and incomplete data. Hence ANN has been
successfully employed in a wide range of data intensive applications. ANN
contributions to the intrusion detection domain can be classified as
follows.
2.8.2 Feed Forward Neural Networks (FFNN)
FFNN is the first and the simplest type of ANN devised. Two types of
FFNN are commonly used in modeling either normal or intrusive patterns.
2.8.2.1 Multi Layered Feed Forward (MLFF) Neural Networks
MLFF uses various learning techniques and the most popular is Back
Propagation (MLFFBP). MLFFBP networks were applied to develop IDS, primarily
for anomaly detection at the user behavior level [52,53]. To distinguish between normal
and abnormal behavior, Seth Freeman [54] used a data set that consists of user
behavior. Ryan [55] considered the command patterns and their frequency of
execution. The recent research interest is to detect software behavior that is
described by sequences of system calls. Since system call sequences are more stable
than commands, Ghosh [56] built a model using MLFFBP for the lpr program and
the DARPA BSM98 dataset. Detailed descriptions of this dataset can be found at
http://www.ll.mit.edu/IST/ideval/data/data_index.html. The network traffic is
another vital data source that can be applied on network packets for the detection of
misuse. Although the training and test iterations required a day to complete,
experiments showed MLFFBP was successful as a binary classifier to correctly
identify attacks in the test data. MLFFBP can also be used as a Multi Class
Classifier (MCC). Such an NN has multiple output neurons and is more flexible.
Mukkamala, Sung and Ajith [57] compared twelve different learning algorithms on
the KDD99 dataset. They found that resilient back propagation achieved the best
performance in terms of accuracy and training time.
2.8.2.2 Radial Basis Function Neural Networks (RBFNN)
RBFNN are another popular type of FFNN. The classification is
performed by measuring distances between inputs and the centers of the RBFNN
hidden neurons. RBFNN are much faster than back propagation and are suitable for
problems with large data sets [58]. Many researchers [59, 60] have developed
systems using RBFNN that can learn from multiple local clusters for well known
attacks and for normal events. A hybrid approach is used that integrates both misuse
and anomaly detections in a hierarchical RBF network. The first layer has an RBF
anomaly detector that identifies whether an event is normal or not. Anomaly events
are then passed through an RBF misuse detector chain for a specific type of attack.
Anomaly events which could not be classified were saved to a database and a
C-Means clustering algorithm clustered these events into different groups. Later a
misuse RBF detector was trained on each group, and added to the misuse detector
chain. Finally all intrusion events were automatically and adaptively detected and
labeled.
Since RBF and MLFF networks are widely used, Jiang and Zhang [61]
compared the RBF and MLFF networks for misuse and anomaly detection on the
KDD99 dataset. Their experiments have shown that for misuse detection, BP has a
slightly better performance than RBF in terms of detection rate and false positive
rate, but requires longer training time. For AD, the RBF network improves
performance with a high detection rate and a low false positive rate, and requires
less training time. In general, RBF networks achieve better performance, which was
also concluded by Hofmann et al. [62] using the DARPA98 dataset.
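The distance-based decision rule described above (classify an input by the strength of Gaussian activations around the hidden centers) can be sketched in a few lines. The centers, labels, and width below are illustrative values, not taken from any cited system:

```python
import math

def rbf_classify(x, centers, labels, width=1.0):
    """Classify x by the RBF hidden unit (center) with the strongest
    Gaussian activation; each center is tagged with a class label."""
    def activation(center):
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, center))
        return math.exp(-dist_sq / (2 * width ** 2))
    best = max(range(len(centers)), key=lambda i: activation(centers[i]))
    return labels[best]

# Illustrative centers: one "normal" cluster and one "attack" cluster.
centers = [(0.0, 0.0), (5.0, 5.0)]
labels = ["normal", "attack"]
print(rbf_classify((0.4, 0.1), centers, labels))   # near the normal center -> normal
```

In a real RBFNN the centers would be learned (e.g., by clustering the training data) and the output layer would combine activations linearly; the nearest-center rule above only illustrates the distance computation.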
2.8.3 Recurrent Neural Networks (RNN)
It is important but difficult to detect attacks spread over a period of time.
The window size defined should be adjustable in predicting the user behavior.
A large window size is needed to enhance deterministic behavior when users
perform a particular job. During this time their behavior is stable and predictable.
When users are switching from one job to another, behavior becomes unstable and
unpredictable. Hence a small window size is required in order to quickly forget
meaningless past events. The inclusion of memory in NN led to the invention of
RNN or Elman network [63]. RNN was used in applications of forecasting, where a
network predicted the next event given an input sequence. If there is a deviation
between a predicted output and an actual event, an alarm was generated. Sheikhan
et al. [64] modified the RNN model with three layers. The results showed that the
model had an improvement in Classification Rate (CR), Detection Rate (DR) and
Cost Per Example (CPE). The model was compared with similar related works and
also the simulated MLP and Elman-based intrusion detectors. Ghosh et al. [65]
compared RNN with MLFFBP network for forecasting system call sequences and
the results showed that RNN achieved the best performance, with a detection
accuracy of 77.3% and zero false positives. Cheng et al. [66] developed an RNN to
detect network anomalies using the KDD99 dataset and emphasized the importance
of payload information in network packets. They showed that discarding the
payload leads to an undesirable information loss, and that with payload
information the system performed better. Much research work confirms that RNN
outperforms MLFF networks in detection accuracy and generalization capability.
The Cerebellar Model Articulation Controller (CMAC) NN [67] is an additional
type of RNN which has the capability of incremental learning. This avoids
retraining the NN every time a new intrusion is detected.
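The window-based forecasting setup discussed above (predict the next event from the last few events, and alarm on a mismatch) can be sketched with a simple generator. The event names and window size below are illustrative:

```python
def sliding_windows(events, window_size):
    """Yield (window, next_event) pairs for sequence forecasting.

    A predictor trained on these pairs forecasts the next event from the
    last `window_size` events; a mismatch between the prediction and the
    actual event can then raise an alarm.
    """
    for i in range(len(events) - window_size):
        yield tuple(events[i:i + window_size]), events[i + window_size]

calls = ["open", "read", "read", "write", "close"]
pairs = list(sliding_windows(calls, window_size=2))
# pairs[0] == (("open", "read"), "read")
```

The window size is the tunable parameter the section describes: a larger window captures stable in-job behavior, while a smaller one forgets past events more quickly around job switches.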
2.8.4 Self Organizing Maps (SOM)
SOM and Adaptive Resonance Theory (ART) are two unsupervised
Neural Networks based on statistical clustering algorithms. They group objects by
similarity measure and are suitable for intrusion detection tasks. When grouped
normal behavior will be densely populated around one or two centers and abnormal
behavior or intrusions appear in sparse regions as outliers. SOM are Single Layer
Feed Forward Networks (SLFFN) where data is clustered in a low dimensional grid
[68]. SOM preserves topological relationships of input data according to their similarity
and is one of the most popular NN. Fox first employed SOM to detect viruses in a
multiuser machine in 1990. Researchers [69, 70] used SOMs to learn patterns of
normal system activities which have been used for misuse detection. Other
classification algorithms, such as FFNN were then trained on the output from the
SOM. Sarasamma et al. [71] proposed a method that calculates the probability of a
record mapped to a heterogeneous neuron being of a type of attack. A confidence
factor was defined to determine the type of attack that dominated the neuron. They
showed that different subsets of features were good at detecting different attacks.
The results showed that false positive rates were significantly reduced in hierarchical
SOMs as compared to single layer SOMs. Rhodes [72] examined network packets
and stated that every network protocol layer has a unique structure and function.
Malicious activities aiming at a specific protocol should also be unique and it is
unrealistic to build a single SOM to tackle all these activities. They organized a
multilayer SOM in which each layer corresponds to one protocol layer. Zanero [73]
analyzed the payload of network packets and proposed a multi layer detection
framework. The K-means algorithm was used to avoid calculating the distance between
each neuron. This greatly improved the runtime efficiency of the algorithm.
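As a rough illustration of the K-means step mentioned above, a minimal plain-Python implementation might look as follows. The points and cluster count are illustrative, not Zanero's actual data:

```python
import math

def kmeans(points, k, iters=10):
    """Plain K-means: assign each point to its nearest centroid, then
    recompute the centroids, repeating for a fixed number of iterations."""
    centroids = points[:k]                     # simple initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        centroids = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups of 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
```

Dense clusters correspond to the normal behavior described above, while points far from every centroid are candidate outliers (intrusions).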
Several NN techniques have been used in intrusion detection and are
described as landmarks in the development of IDS. The aim is to simulate the
operation of the human brain, making it flexible and adaptable to environmental
changes. An alternative approach to training ANNs uses evolutionary algorithms to
evolve the weights of the ANN, referred to as an Evolutionary Neural Network
(ENN) [74]. Hybrid systems developed using NN and Fuzzy Logic [75] performed
well with limited training sets on labeled alerts. Such hybrid systems provided an
excellent improvement, with solutions for real-world problems.
2.8.5 Bayesian Networks (BN)
BN is a probabilistic model that represents a set of variables and their
probabilistic independencies. BN are directed acyclic graphs with nodes
representing variables and edges representing the encoded conditional dependencies
between the variables [76]. They have been applied to AD in different ways and
have been utilized in the decision process of hybrid systems. Ben et al. [77]
developed an AD system employing a two-layer NB model that assumes complete
independence between the nodes. In the decision process of hybrid systems, BN
offer a sophisticated way of dealing with the high false alarm rates that most
hybrid systems obtain, which stem from the simplistic approach of combining the
outputs of the techniques during the decision phase. A hybrid host based AD
system consists of detection techniques such as analyzing string length and
character distribution structure, and identifying learned tokens, in which a BN can
be used to decide the final output classification. Generally, in anomaly intrusion
detection, the number of possible features is large, but an attacker's activity is
usually related to just a few features. Furthermore, the effectiveness of a specific
feature mainly depends on the behavior, and for this reason activity can be analyzed
using individual features independently. A typical AD method relies on statistical
analysis with an advantage that it can generate a concise profile containing only a
statistical summary without maintaining the activities. This can lessen the burden of
computation overhead for real time intrusion monitoring. However, when the value
of each feature varies widely, the statistical summary fails to yield a concise
profile. Moreover, most conventional classification algorithms [78] do not consider
any updates in a data set and are not suitable for real time data. Consequently, the
concept of updating should be incorporated, and a classification method that
considers updates in the data set has been proposed. The basic assumption of
conventional classification algorithms is that the data set is fixed and available
before classification can be performed. This assumption is valid only when static
knowledge embedded in a specific data set is the target of clustering. Therefore, it is
very important to identify an appropriate data set that reflects the characteristics of
the target application domain very well. Hence conventional classification
algorithms pose limitations as the normal behavior of a user is generally analyzed
off-line. Kok-Chin Khor et al. [79] implemented BN by selecting important features
using a feature selection algorithm and a filter approach. With respect to
performance they concluded that the BN performed equivalently well in detecting
network attacks. Mutz [80] extended the work by proposing an application based
IDS that considers system call arguments during analysis of user commands. Most
IDS exclude this information, which is a reason for the occurrence of False
Negatives (FN), as it is possible to execute intrusions with valid system calls.
Authors also focus on parameters like CPU load, since the IDS should not take up
too many resources. This is due to the fact that it may prevent the user from using
the computer efficiently. In their work the CPU load remained relatively low and
during stress tests, the increase in CPU load was within 20% on average.
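The complete-independence assumption of the NB model mentioned above can be sketched on toy categorical data. The connection records, feature values, and smoothing choice below are illustrative, not from any cited system:

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """Fit a Naive Bayes model over categorical features, assuming
    complete independence between the features given the class."""
    class_counts = Counter(label for _, label in samples)
    feat_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    for features, label in samples:
        for i, value in enumerate(features):
            feat_counts[(i, label)][value] += 1
    return class_counts, feat_counts

def classify_nb(features, class_counts, feat_counts):
    """Pick the class maximizing prior * product of per-feature likelihoods."""
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(features):
            counts = feat_counts[(i, label)]
            # Laplace smoothing avoids zero probabilities for unseen values.
            score *= (counts[value] + 1) / (count + len(counts) + 1)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy connection records: (protocol, flag) -> class.
data = [(("tcp", "SF"), "normal"), (("tcp", "SF"), "normal"),
        (("udp", "SF"), "normal"), (("tcp", "REJ"), "attack"),
        (("tcp", "REJ"), "attack")]
model = train_nb(data)
print(classify_nb(("tcp", "REJ"), *model))   # -> attack
```

The per-feature factorization is exactly what lets each feature be analyzed independently, as the section notes; a full BN would instead encode conditional dependencies between the features.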
2.8.6 Decision Trees (DT)
DTs are popular in IDS, as they yield good performance and offer some
benefits over other ML techniques. For example, they learn quickly compared to
ANN and the tree structure built from the training data can be used to produce rules
for ES. DTs cannot generalize to new attacks in the same manner as certain other
ML approaches and they are not suitable for anomaly detection. New findings
demonstrate that DTs are very sensitive to the training data and do not learn well
from imbalanced data. DTs have been successfully implemented to IDS both as a
standalone and as a part of hybrid systems. An example of the success of DTs is an
application of a C5.0 DT [81]. Considerable work has been carried out to examine the
performance of several ML techniques on the KDD Cup 99 data set, including a
C4.5 DT. The DT provided good accuracy but could not perform as well as other
techniques on some classes of intrusion. An ANN and k-means clustering obtained
higher detection rates and were able to generalize from learned data to new, unseen data.
Classification is a method of mapping from a set of attributes to a particular class.
DT induction is one of the classification algorithms in DM. The DT classifies the
given data item using the values of its attributes. The DT is constructed from a set of
pre-classified data set which is also known as training set. The main approach is to
select the attributes, which best divides the data items into their classes. The major
problem is deciding the attribute that will best partition the data into various classes.
The ID3 algorithm uses the Information Gain (IG) approach to solve this problem by
using the concept of Entropy, which measures the impurity of the data items. DT
induction has been implemented with several algorithms. ID3 was later extended to
C4.5 and C5.0. C4.5 avoids overfitting the data and can handle continuous
attributes. C4.5 builds the tree from a set of data items using the best attribute to test
in order to divide the data item into subsets, and then it applies the same procedure
to each subset recursively. The best attribute to divide the subset at each stage is
selected using the IG of the attributes. Intrusion detection can be considered as a
classification problem where each network connection is identified either as an
attack or normal based on some existing data. DT can solve the problem of intrusion
detection by learning the model from the data set. Later using DT it is possible to
classify the new data item into one of the classes specified in the data set. Learning
is based on the training data, and future data can be predicted as either attack or
normal. DT works well with large data sets, and this is important as large amounts of
network data flow across computer networks. The high performance of DT makes them
applicable in real time intrusion detection. Generalization accuracy of DT is another
useful property for intrusion detection model. New attacks on the system with small
variations of known attacks can also be detected after the model is built. This ability
to detect new intrusions is possible due to the generalization accuracy of DT.
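The Entropy and IG computations that drive ID3's attribute choice, as described above, can be sketched directly. The toy connection records below are illustrative:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy of a list of class labels (impurity measure)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on attribute `attr`;
    ID3 picks the attribute with the highest gain at each node."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy connections: attribute 0 separates the classes, attribute 1 does not.
rows = [("tcp", "SF"), ("tcp", "REJ"), ("udp", "SF"), ("udp", "REJ")]
labels = ["attack", "attack", "normal", "normal"]
print(information_gain(rows, labels, 0))   # 1.0: a perfect split
print(information_gain(rows, labels, 1))   # 0.0: an uninformative split
```

C4.5 refines this criterion with the gain ratio to penalize attributes with many distinct values, but the entropy computation itself is unchanged.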
2.8.7 Support Vector Machines (SVM)
SVM is a supervised learning algorithm that is used increasingly in IDS.
The classification performance of the SVM model is better than that of other
classification methods, such as ANN [82]. A benefit of SVM is that it learns very
effectively with high dimensional data. A SVM maps input feature vectors into a
higher dimensional feature space through some nonlinear mapping. SVMs can learn
a larger set of patterns and are able to scale better, because the classification
complexity does not depend on the dimensionality of the feature space. SVMs also
have the ability to update the training patterns dynamically whenever there is a new
pattern detected during classification. The main disadvantage is that SVM can only
handle binary class classification whereas intrusion detection requires multi-class
classification.
The SVM is one of the most successful classification algorithms in the
DM area. However, the training time of SVM is a serious problem in the processing
of large data sets, which limits its use in DM applications that require handling
huge data sets. Normally it would take years to train SVM on a data set consisting of
one million records. Many researchers have carried out work to enhance SVM in
order to increase its training performance [83-85]. This is achieved either through
random selection or approximation of the marginal classifier. These approaches are
still not feasible, as multiple scans of the entire data set are required, which is also
expensive to perform [86]. Seo [87] applied SVM to host-based AD of
masquerades, analyzing sequences of UNIX commands executed by users on a
host. Kim applied SVM with a Radial Basis Function (RBF) kernel,
analyzing commands over a sliding window and achieved a detection rate of 80.1%.
Seo examines two different kernels, K-gram and String kernel, which yielded higher
detection rates of 89.61% and 97.40%, respectively. The drawback is the same as
with the RBF kernel employed by Seo and Cha, that the false positive rate is higher.
Seo also examined a hybrid of the two kernel methods, which gave nearly identical
results as obtained by Kim. An unsupervised class of SVM was proposed by Dennis
[88], which has been adopted in several studies, comparing its performance with
clustering techniques. SVMs are supervised learning algorithms, which have been
applied increasingly to misuse detection in the last decade. One of the primary
benefits of SVMs is that they learn very effectively from high dimensional data.
Furthermore, they are trained very quickly. Mukkamala [89] conducted a
comparative study of feed forward MLP and SVM for misuse detection. Identical
detection rates were obtained, and the SVM training time was less than that of
MLP. SVM algorithms are binary classifiers, which are sufficient only for
distinguishing between normal and attack. Recent SVM algorithms support multi
class learning [90]. The approach is to combine several two classes of SVM. Sung
and Mukkamala [91] applied SVM to network based intrusion detection with five
types of SVM. For each SVM, the training data is partitioned into two classes as
normal or intrusions. The hybrid technique adopted is that SVM with the highest
output value is taken as the final output. Peddabachigari [92] conducted a practical
analysis of SVM and DT as standalone detectors and also as hybrids. With
performance as the evaluation parameter, the results indicate that the hybrid
method performs better. Due to the magnitude of data involved in network-based
intrusion detection, Rung [93] proposed a hybrid which combines SVM with a
weighted voting scheme to shorten the training time. A hierarchical
clustering algorithm was employed to locate boundary points in the data that best
separates the two classes. These classes are then used to train the SVM as an
iterative process. During each iteration the support vectors were recalculated and the
SVM is tested against a stopping criterion. This is to determine if a desirable
threshold of accuracy is exceeded or not. The evaluation was done on the DARPA98
data set and the accuracy was improved. This was mainly due to correctly
classifying more DoS attacks. However, there was an increase in false positive rates.
Song [94] proposed a Robust SVM (RSVM) to better deal with noise. The RSVM
was applied to host based intrusion detection by Hu [95]. The benefit of RSVM is
that it produces fewer support vectors, which makes it a faster algorithm.
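The highest-output combination scheme described above (one two-class SVM per class, final output taken from the classifier with the highest score) can be sketched generically. The linear scoring functions below merely stand in for trained per-class SVM decision functions and are purely illustrative:

```python
def one_vs_rest(scorers, x):
    """Combine several binary classifiers: each scorer returns a real-valued
    output for 'its' class versus the rest, and the class whose scorer
    produces the highest output is taken as the final prediction."""
    return max(scorers, key=lambda label: scorers[label](x))

# Stand-in linear decision functions (a trained SVM would supply these).
# The input x is a pair of hypothetical normalized traffic features.
scorers = {
    "normal": lambda x: -x[0] + 1.0,       # favors a low connection rate
    "dos":    lambda x: x[0] - 0.5,        # favors a high connection rate
    "probe":  lambda x: 2 * x[1] - 1.0,    # favors many distinct ports
}
print(one_vs_rest(scorers, (0.9, 0.2)))   # -> dos
```

This one-vs-rest construction is what turns the inherently binary SVM into the multi-class detector the section describes.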
Ganapathy [96] pointed out that SVM can obtain generalization ability
with less training time through simulation experiments on a few artificial and real
benchmark function approximation and classification problems. They have indicated
that SVMs can perform well in text classification problems. Recently a significant
contribution showing the relationship between Extreme Learning Machines (ELM)
and SVM in the context of classification is made [97]. Recently researchers have
made a more in depth exploration of their relationship, and compared the
performance of ELM, SVM, and Least Squares SVM (LSSVM) [98]. ELM provides
a unified learning platform to different applications, such as regression, binary, and
multiclass classifications for the LSSVM, Proximal SVM (PSVM) [99] and other
regularization algorithms. ELM avoids issues involving manually tuned control
parameters such as the learning rate and the number of learning epochs, which are
difficult to manage in traditional approaches, and reaches good solutions
analytically. ELM can be
implemented and used easily with faster learning speed, response time and ease of
implementation that are keys to the success in the design of IDS. ELM algorithm
tends to achieve similar or better generalization performance at much faster learning
speed than the SVM and LSSVM algorithms. However, there also remain several
aspects needing further consideration. Recent experimental investigations focus
mainly on the comparisons of SVM and ELM. Both are applied to a variety of
examples, but the advantages and disadvantages of applying these methods are still
unknown in the real time network intrusion area. Knowing such information may
provide more insight into the SVM and ELM algorithms because the former is based
on the Structural Risk Minimization (SRM) principle which is especially suited for
learning small samples, while the latter is based on the inductive principle known as
Empirical Risk Minimization (ERM). The results can strengthen the understanding
on the essential relationship between SVM and ELM. This can also serve as
complementary knowledge for the past experimental and theoretical comparisons
between them. As noted earlier, SVM algorithms are binary classifiers, sufficient to
distinguish between normal and intrusive data, while recent SVM algorithms support
multi class learning. The approach combines several two-class SVMs, and for each
SVM the training data is partitioned into two classes so that one represents an
original class and the other represents the attacks. It is also necessary to specify an
upper bound parameter C that can be determined experimentally. This results in a
cross-validation procedure, which is wasteful both of computation and of data.
Kernel based ML algorithms are based on mapping data from the original
input feature space to a kernel feature space of higher dimensionality to solve a
linear problem in that space. These methods allow us to interpret and design learning
algorithms geometrically in the kernel space. SVM is one of the several Kernel
based techniques available in the field of ML. The choice of a proper kernel function
plays an important role in SVM based classification/regression. It is difficult to
choose one which gives the best generalization for a given dataset. Many Kernels
have been proposed in the SVM literature. Cheng [100] created a kernel function
suitable for the training data using a GA mechanism. They showed that their genetic
kernel has good generalization abilities when compared with the polynomial and the
RBF kernel functions.
Ye [101] proposed an orthogonal Chebyshev kernel function. Chebyshev
polynomials are first constructed through Chebyshev formulae. Then based on these
polynomials, Chebyshev kernels are created satisfying the Mercer condition. They
showed that it is possible to reduce the number of support vectors using this kernel.
Wang et al. [102] proposed the Weighted Mahalanobis Distance Kernels. They first
find the data structure for each class in the input space via agglomerative
hierarchical clustering and then construct the weighted Mahalanobis distance kernels
which are affected by the size of clusters they reside in. Xu [103] proposed using the
weighted Levenshtein distance as a kernel function for strings. They used the UCI
splice site recognition dataset for testing their proposed kernel, which obtained the
best results on this problem. They used the boosting paradigm to construct the
learned kernel. Their approach is suitable in learning tasks where the test data
distribution is different from the training data distribution. Lodhi [104] introduced a
novel kernel for comparing two text documents. The kernel is an inner product in the
feature space consisting of all subsequences of length k. A subsequence is any
ordered sequence of k characters occurring in the text, though not necessarily
contiguously. These subsequences are weighted by a decay factor based on their full
length in the text, hence placing some emphasis on contiguous characters. Rieck et al.
[105] proposed an algorithm for computation of similarity measures for sequential
data. The algorithm uses suffix trees for efficient calculation of various kernel
functions. Its worst-case run-time is linear in the length of sequences and
independent of the underlying embedding language, which can cover words,
k-grams or all contained subsequences. Experiments with network intrusion
detection, Dynamic Network Analysis (DNA) and text processing applications
demonstrate the utility of distances and similarity coefficients for sequences as
alternatives to classical kernel functions.
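For reference, the plain (unit-cost) Levenshtein distance underlying the weighted variant used by Xu [103] can be computed with the standard dynamic program; the weighted variant would assign per-operation costs instead of a flat 1:

```python
def levenshtein(s, t):
    """Unit-cost edit distance: the minimum number of insertions,
    deletions, and substitutions turning s into t."""
    prev = list(range(len(t) + 1))          # distances from "" to prefixes of t
    for i, cs in enumerate(s, 1):
        curr = [i]                          # distance from s[:i] to ""
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

print(levenshtein("GATTACA", "GCATGCU"))   # -> 4
```

As a distance between command or system call strings, smaller values indicate more similar sequences; a string kernel can then be built from such a distance.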
Many of the detection results reported to date using ML algorithms with
DT, NN and SVM indicate that attacks involving more features in the data set have
substantially lower detection rates. Hence feature relevance analysis is another
research area of interest to substantiate the performance of ML IDS. The objective is
to investigate the relevance of the features with respect to dataset labels. That is, for
normal behavior and each type of attack the system should determine the most
relevant feature, which best discriminates the given class from the others.
To achieve this, IG, the underlying feature selection measure for
constructing DT, can be used. For a given class, the feature with the highest IG is
considered the most discriminative feature. Researchers have proposed several
methods of feature selection to achieve real time IDS. The major benefit of feature
selection is that the amount of data required to process is significantly reduced,
without compromising the performance of the detection. In some cases, feature
selection may improve the performance of the detection as it simplifies the
complexity problem by reducing the dimensions.
2.9 IMPORTANCE OF FEATURE SELECTION FOR IDS
Data preprocessing is considered an important step in IDS. The
amount of data that needs to be examined for the detection of attacks is very large
even for a small network. Analysis is very difficult, as the number of features
available in the data set can make it harder to detect suspicious behavior patterns.
As complex relationships exist between the features, it is better to reduce the amount
of data to be processed for IDS. This is particularly important if real time intrusion
detection is preferred. Reduction of features can be made by considering the data
that is not useful by filtering. Data can be grouped or clustered by storing the
characteristics of the clusters instead of the individual data. Feature selection can
improve classification performance by reducing the computational complexity and is
an important preprocessing technique. Feature selection is the important step in
building intrusion detection models [106, 107]. This will also increase the available
time for detecting intrusions, but most of the work is still done manually, and
feature selection depends strongly on expert domain knowledge. ML techniques
provide the wrapper and the filter models for automatic feature selection. The major
problem that many researchers face is how to choose the optimal set of features.
This is because not all features are relevant to the learning algorithm. Irrelevant and
redundant features with noisy data can affect the learning algorithm by severely
degrading the performance with respect to training and testing time. Feature
selection was proven to have a significant impact on the performance of the
classifiers. Many researchers as in [108-110] illustrate that feature selection can
reduce the building and testing time of a classifier. Currently two models are of most
importance, namely the filter model and the wrapper model. In the filter model,
statistical characteristics of a data set are considered directly without reference to any
learning algorithm. The filter model uses a measure such as correlation, consistency,
or distance measures to compute the relevance of a set of features. In contrast, the
wrapper model assesses the selected features through the performance of a learning
algorithm. The wrapper model uses the predictive accuracy of a classifier as a
means to evaluate the goodness of a feature set. Hence the wrapper model
requires more time [111], as well as more computational resources to find the
best feature subsets. In order to increase the
computational efficiency, usually the filter method is used for selection of features
from high dimensional data sets. It is well known that the redundant features can
reduce the performance of IDS. A major challenge in the IDS feature selection
process is to choose appropriate measures that can precisely determine the relevance
and the relationship between features of a given data set.
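A filter-model ranking of the kind described above might be sketched as follows, scoring each feature by the absolute Pearson correlation with a numeric class label. The toy data is illustrative, and correlation is only one of the filter measures the section lists:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(rows, labels):
    """Filter-style ranking: score each feature column by the absolute
    correlation with the (numeric) class label; no learner is invoked."""
    n_feats = len(rows[0])
    scores = {f: abs(pearson([r[f] for r in rows], labels))
              for f in range(n_feats)}
    return sorted(scores, key=scores.get, reverse=True)

# Feature 0 tracks the label, feature 1 is noise.
rows = [(0.1, 0.9), (0.2, 0.1), (0.9, 0.5), (1.0, 0.6)]
labels = [0, 0, 1, 1]
print(rank_features(rows, labels))   # -> [0, 1]
```

Because no classifier is trained, this runs far faster than a wrapper search, which is exactly why the filter approach is preferred for high dimensional data sets.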
2.10 RELEVANCE VECTOR MACHINES (RVM)
In spite of good performance with different datasets, SVM still suffers
from shortcomings such as difficulty of model visualization and interpretation,
kernel choice, and kernel-specific parameters. Recently RVM, another kernel based
approach, is being explored for classification and regression problems. RVM,
proposed by Tipping [112], is a sparse ML algorithm that is similar to the SVM in
many respects. RVM is
another area of interest in the research community as they provide a number of
advantages. The advantages of RVM over SVM are the availability of probabilistic
predictions, the use of arbitrary kernel functions, and the absence of a regularization
parameter that must be set. RVM is based on a Bayesian formulation of a linear
model with an appropriate sparse weight prior distribution. The sparseness property
enables selection of the proper kernel at each location by pruning all irrelevant
kernels which results in a sparse data representation. As a result, they can generalize
well and provide inferences at very low computational cost [113]. Through the use
of proper kernels in SVM, good generalization performance can be achieved. Among
its desirable properties, SVM fits functions in high dimensional feature spaces, with a
large space of functions available in the feature space. It is sparse, meaning only a
subset of the training data set is retained at runtime, which improves computational
efficiency. Although relatively sparse, SVM makes unnecessary use of basis
functions, as the number of Support Vectors (SV) required typically grows linearly
with the size of the training data set. SVM outputs a point estimate with regression
and a binary decision in classification. As a result it is difficult to estimate the
conditional distribution to capture the uncertainty during prediction. In RVM the
kernel function must be a continuous symmetric kernel of a positive integral operator
to satisfy the Mercer condition. While maintaining its classification accuracy, RVM has the
ability to yield a decision function that is much sparser than SVM. This leads to
significant reduction in the computational complexity of the decision function and
thereby making it more suitable for real time applications.
The RVM produces a function comprised of a set of kernel
functions, also known as basis functions, and a set of weights. This function
represents a model of the system presented to the learning process through a set of
training data. The kernels and weights are calculated by the learning process, and
the model function defined by the weighted sum of kernels is then fixed. From this set of
training vectors the RVM selects a sparse subset of input vectors which are deemed
to be relevant by the probabilistic learning scheme [114]. This is used for building a
function that estimates the output of the system from the inputs. These relevant
vectors are used to form the basis functions and comprise the model function.
2.11 CURRENT STATE OF IDS
IDS typically consist of security functions, firewalls, IPS/IDS, and some
filtering functions like anti-spam, antivirus, and URL filtering. A recent challenge in
developing IDS is to develop security software solutions and appliances to defend
against the threats faced by enterprise networks. The main focus is to develop
systems that work in real time with detection, prevention and response [115].
Detection can be done either through static signatures or anomaly detection. New
research work focuses on approaches that can secure the network by reasoning
about risks before an attack occurs, thereby limiting exposure to
threats. A general framework of IPS uses a trigger based approach to perform reactive
network measurement [116]. NN approaches combine the complexity of some of
the statistical techniques with the ML objective of imitating human intelligence.
This is done at a more unconscious level and hence there is no accompanying
ability to make learned concepts transparent to the user. Important problems remain
to be solved although variety of security tools incorporating AD functionalities
exists. IDS are continuously evolving with the goal of improving the security and
protection of networks and computer infrastructures but there still exist several open
research issues. Some of the most significant challenges in the area are:
1. Low detection efficiency: The high FP rate calls for the exploration and development of new, accurate processing schemes, as well as better structured approaches to modeling network systems.
2. Low throughput and high cost: High data rates demand that intrusion detection be optimized using grid techniques and distributed detection paradigms.
3. Absence of appropriate metrics: The lack of a general framework to evaluate and compare different techniques makes assessing IDS a real challenge. Research shows that most IDS perform poorly in defending themselves from attacks, and significant effort is needed to improve intrusion detection technology in this aspect.
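The detection-efficiency and metrics challenges above are usually quantified with the detection rate and the false positive rate computed from a confusion matrix. A minimal sketch (the counts are invented for illustration):

```python
def ids_metrics(tp, fp, tn, fn):
    """Detection rate: fraction of actual attacks that were flagged.
    False positive rate: fraction of normal traffic wrongly flagged."""
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate

# Hypothetical evaluation counts, not measurements from this thesis.
dr, fpr = ids_metrics(tp=90, fp=40, tn=960, fn=10)
```

Even a system with a 90% detection rate can swamp operators with alerts: here 4% of a large normal-traffic volume is misflagged, which illustrates why a high FP rate dominates the operational cost.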
2.11.1 Intrusion Prevention System (IPS)
The inadequacies inherent in current defenses have driven the development of a new breed of security products known as IPS [117]. IPS software has all the capabilities of IDS and can also attempt to stop possible incidents. This section provides an overview of IPS technologies and describes the key functions and methodologies that they use. An overview of the major classes of IPS technologies
is also provided in [6, 58]. The purpose of an IPS is not only to detect that an attack is occurring, but also to stop it. To do so, it can be considered an advanced combination of a firewall and an IDS. Recent trends in industry show that more and more companies are choosing IPS-based solutions over IDS-based solutions, primarily due to the need to actively block worm and hacker attacks instead of passively monitoring them as an IDS would do. IPS research took root from IDS research, and some researchers define IPSs as IDSs with added functionalities.
So an IPS can be defined as an in-line product that focuses on identifying and blocking malicious network activity in real time. In general, there are two categories, namely rate based and content based IPS. The devices often look like firewalls and often have some basic firewall functionality, but firewalls block all traffic except that which they have a reason to pass, whereas an IPS passes all traffic except that which it has a reason to block.
2.11.1.1 Rate based IPS
Rate based IPS blocks traffic based on network load indicators, such as too many packets in a specified time, the number of connections per unit time, or the number of errors generated. When these occur, a rate based IPS kicks in and blocks, throttles or otherwise mediates the traffic. The most useful rate based IPS combine powerful configuration options with a range of response technologies, for example limiting queries to the DNS server and/or offering other simple rules covering bandwidth and connection limits. A rate based IPS can set a threshold on the maximum amount of traffic directed at a given port or service. If the threshold is exceeded, the IPS blocks further traffic from the offending source IP only, still allowing other users to use that service.
The major problem in deploying rate based IPS products is deciding what constitutes an overload. For any rate based IPS to work properly, the network owner needs to know not only what normal traffic levels are but also other network details, such as how many connections their servers can handle. However, most commercial products do not yet provide any help in establishing this baseline behavior, but require the services of a trained, product specific systems engineer who often spends hours on site setting up the IPS. Because rate based IPS require frequent tuning and adjustment, they are most useful in very high volume web, application and mail server environments.
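The per-source threshold behavior described above, blocking the offending source IP while leaving other users unaffected, can be sketched with a sliding-window counter. The threshold and window values below are placeholders; as the text notes, real deployments must derive them from measured baseline traffic.

```python
from collections import defaultdict, deque

class RateLimiter:
    """Blocks a source IP once it exceeds max_events within `window`
    seconds; other sources are unaffected (a simplified sketch)."""
    def __init__(self, max_events=100, window=1.0):
        self.max_events = max_events
        self.window = window
        self.history = defaultdict(deque)  # src_ip -> recent timestamps

    def allow(self, src_ip, now):
        q = self.history[src_ip]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_events:
            return False          # throttle: block this source only
        q.append(now)
        return True

limiter = RateLimiter(max_events=3, window=1.0)
decisions = [limiter.allow("10.0.0.9", t) for t in (0.0, 0.1, 0.2, 0.3)]
```

The fourth request from the same source inside the one-second window is rejected, while a different source, or the same source after the window has passed, is still allowed.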
2.11.1.2 Content based IPS
This is also referred to as signature and anomaly based IPS. Content based IPS blocks traffic based on attack signatures and protocol anomalies, and is the natural evolution of the IDS and the firewall. If packets do not comply with TCP/IP, or if any suspicious behavior is detected, the IPS triggers and blocks future traffic from that host. Recent content based IPS offer a range of techniques for identifying malicious content and many options for handling attacks, from simply dropping bad packets to dropping future packets from the same attacker, together with advanced reporting and alerting strategies. As content based IPS offer intrusion detection like technology for identifying and blocking threats, they can be used deep inside the network to complement firewalls and provide security policy enforcement, since they require less manual maintenance and fine tuning than the rate based method. The major challenge in designing an IPS is the fact that it works in line, presenting a potential choke point and single point of failure. If a passive IDS fails, the worst that can happen is that some attempted attacks go undetected. If an in-line device fails, it can seriously impact the performance of the network: latency rises to an unacceptable value and a self-inflicted DoS condition may occur. Even if the IPS device does not fail altogether, it still has the potential to act as a bottleneck, increasing latency and reducing throughput as it struggles to keep up with a Gigabit or more of network traffic.
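The signature-matching core of a content based IPS can be sketched as a scan of the packet payload against a rule set. The two patterns below are toy examples invented for illustration, not a production signature database.

```python
# Hypothetical signature rules: name -> byte pattern to look for.
SIGNATURES = {
    "cmd-injection": b"/bin/sh",
    "dir-traversal": b"../..",
}

def match_signatures(payload):
    """Return the names of all signatures found in a packet payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

hits = match_signatures(b"GET /../../etc/passwd HTTP/1.0")
```

A real engine adds protocol decoding and stateful reassembly before matching, which is precisely the work that makes an in-line device a potential bottleneck.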
2.11.2 Intrusion Response System (IRS)
The task of most traditional IDSs is to detect intrusion, but once the alert is generated human intervention is required, and implementing an automated response is certainly a challenge. For a traditional IRS, such a response involves notifying the central decision core, waiting for its arbitration, and applying the decision.
The current IRSs meet only a subset of the above challenges, and none addresses all of these problems. The general principles followed in the development of IRSs naturally classify them into two categories.
2.11.2.1 Static Decision Making
This class of IRS provides a static mapping of the alert from the detector to the response that is to be deployed. The IRS is basically a look-up table in which the administrator has anticipated all alerts possible in the system and an expert has indicated the response to be taken for each. In some cases, the response site is the same as the site from which the alarm was flagged, as with the responses often bundled with anti-virus products (disallow access to the file that was detected to be infected) or network based IDS (terminate a network connection which matches a signature for anomalous behavior).
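Such a static mapping is, in code, little more than a dictionary lookup with a safe default. The alert names and response actions below are invented placeholders for whatever the administrator has configured.

```python
# Hypothetical administrator-defined alert -> response mapping.
RESPONSE_TABLE = {
    "infected-file": "quarantine_file",
    "signature-match": "terminate_connection",
    "port-scan": "block_source_ip",
}

def static_response(alert_type):
    """Map a detector alert to its pre-configured response; alerts the
    administrator did not anticipate fall back to manual handling."""
    return RESPONSE_TABLE.get(alert_type, "notify_admin")
```

The fallback branch is the weakness of the static approach: any alert not anticipated in the table still requires human intervention, which motivates the dynamic decision making described next.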
2.11.2.2 Dynamic Decision Making
This class of IRS reasons about an ongoing attack based on the observed
alerts and determines an appropriate response to take. The first step in the reasoning
process is to determine which services in the system are likely affected, taking into
account the characteristics of the detector, the network topology, etc. The actual choice of response then depends on a host of factors, such as the amount of evidence about the attack, the severity of the response, etc. The challenges in designing an IRS are that attacks launched through automated scripts are fast moving, and that the owner of the distributed system does not have knowledge of, or access to, the internals of the different services.
2.11.3 Artificial Immune Systems
Artificial Immune Systems (AIS) have been extensively researched in the last decade, mainly for AD, to which the model lends itself conveniently. Several researchers came to the conclusion that the model has problems with scalability, limiting its application to real problems. Consequently, some researchers considered alternative models, while others have in recent years proposed enhancements to address scalability [118-120].
CHAPTER 3
LOG AND AUDIT DATA COLLECTION FRAMEWORK
3.1 INTRODUCTION
Most of today's IDS products focus on Signature Detection (SD). They have failed to keep up with the rapid growth in bandwidth and traffic volume and the increased sophistication of attacks. IDS products often operate in a monitoring only mode, which can detect attacks but cannot effectively and reliably block malicious traffic before the damage is done. Network security managers deploying IDS products today face a number of challenges:
1. Incomplete attack coverage: The IDS products typically focus on
Signature, Anomaly, or DoS detection. Network security managers
have to purchase and integrate solutions from separate vendors or
leave networks vulnerable to attack.
2. Inaccurate detection: IDS detection capabilities can be characterized in terms of accuracy and specificity. Accuracy is often measured as the true detection rate, which specifies how successful a system is in detecting attacks when they happen. IDS products today are lacking in both accuracy and specificity and generate too many false positives, alerting security engineers of attacks when nothing malicious is taking place. In some cases, IDS products have delivered tens of thousands of false positive alerts a day. Much research work remains to be done on network vigilance systems that continually issue false alarms.
3. Detection, not prevention: Systems concentrate on attack detection, which is an inherently reactive activity, often too late to stop the intrusion.
4. Performance challenged: Software applications running on general
purpose PC/server hardware do not have the processing power
required to perform thorough analysis. These underpowered products
result in inaccurate detection and packet dropping, even on low
bandwidth networks.
5. Lack of high availability deployment: Single port products are not able to monitor asymmetric traffic flows. Also, with networks becoming a primary mechanism for interacting with customers and partners, forward-thinking organizations have developed back-up systems. The inability of current IDS products to cope with server failovers renders them virtually useless for any mission-critical network deployment.
6. Poor scalability: Primarily designed for low-end deployments, today's IDS products do not scale for medium and large enterprise or
government networks. The monitored bandwidth, the number of
network segments monitored, the number of sensors needed, alarm
rates, and the geographical spread of the network exceed system
limits.
7. No multiple policy enforcement: Current products generally support the selection of only one security policy for the entire system, even though the product may monitor traffic belonging to multiple administrative domains in an enterprise, such as the Finance, Marketing, or Human Resources functions. This one-size-fits-all approach is no longer acceptable for organizations that require different security policies for each function, business unit, or geography.
8. Require significant IT resources: IDS products today require
substantial hands-on management. For example, the simple task of
frequent signature updates can take up a lot of time and skilled
engineering resources, delivering a very high total cost of ownership.
In response to these limitations, a new architecture that detects and
prevents known, unknown, and DoS attacks was developed for even
the most demanding enterprise and government networks.
As discussed earlier in Section 1.1, there is an increasing need to secure systems. The existing detection methods are inadequate against today's new attack methods, which apply sophisticated threats. Anomaly based systems perform well, as they are able to identify a broader range of network threats. However, every step in the process of applying AD to network defense requires the intervention of a professional network administrator, which significantly delays the implementation of an appropriately fast response and affects overall performance.
In my research work, learning techniques with an automated approach to real time intrusion detection are developed. The system will be able to respond to events quickly, accurately, and dynamically.
Present IDS analyze access logs or data access logs to detect malicious activities. A system can be developed that analyzes the logs, detects specific attacks and raises an alarm. As a result, data needs to be captured in an efficient manner. This process can improve attack detection and also extract interesting or unknown attack signatures that can be used in a variety of intrusion detection applications. An effective IDS should provide network intrusion detection capability and ensure a higher level of security. Building a network based system that monitors network packets and detects attacks effectively at the network level is a real challenge. Detecting network level attacks often requires monitoring every single packet in a real time environment, which may not always be feasible as the number of data requests per unit time is usually large. Intruders normally come up with previously unknown attacks that make detection even more difficult. The requirement of the present IDS is to detect attacks reliably, which is a major challenge, and alternate efficient methods need to be considered.
3.2 REQUIREMENTS OF IDS
Due to the large amount of incoming network data, analysis can be processed efficiently only if automated evaluation is performed. IDS are usually applied to monitor critical servers and are often deployed in conjunction with other security mechanisms, like firewalls, that can support superior security management. Therefore distributed solutions are becoming important in the design of modern IDS. Some of the important requirements of IDS relate to detection efficiency, system adaptation and maintenance. IDS must also be capable of distributing their components, developing efficient signatures and integrating efficient analytical methods.
3.3 PROBLEMS IN DETECTING NETWORK ATTACKS
Most of the currently available products still do not work very well: they have modest success rates and normally generate a high false alarm rate. The reasons for the poor performance are:
1. The Internet environment is very noisy, both at the content level and at the packet level: A large amount of data arrives at a site, and there is not enough context to interpret it, which generates a significant false alarm rate. Reports suggest that many bad packets result from bugs in software, out-of-date or corrupt DNS data, etc.
2. Network attacks are very specific to particular software versions: A general MD tool must maintain a large library of attack signatures that changes constantly.
3. In many cases, commercial organizations appear to buy IDS simply: This may be to satisfy insurers or consultants, without any prerequisite analysis.
4. Difficult to analyze encrypted traffic: Network based IDS have a major drawback in that it is not possible to supervise the contents of encrypted communication channels. These channels can only be supervised by host based IDS, which requires the components to be available on all hosts.
5. The issues of packet capture and preprocessing: Though it is fast and possible to filter at the packet layer, the entire process of capturing and preprocessing is affected by packet fragmentation, and reconstructing the packets of each session takes more computation time. It is also possible to examine application data, but this is still more expensive, as the tool needs to be constantly updated to cope with the advent of new applications.
The main purpose of my thesis is to develop an IDS that specifically achieves the following:
1. To automate the data capturing and processing of network behavior
in real time.
2. To automate the process of classifying anomaly alerts into an
evolving attack taxonomy.
3. To automate the feature selection process to tune the network parameters, eliminating redundant features and enhancing the performance of the detection algorithm.
4. Propose a new model for optimal performance of an AD system.
3.4 GENERAL FRAMEWORK OF PROPOSED IDS
The higher level architecture of the proposed framework is as shown in
Figure 3.1.

Figure 3.1 General framework of proposed IDS
The architectural design process is concerned with establishing a basic
structural framework for a system. The process involves identifying the major
components of the system and communications between these components.
Normally a TCP connection determines a pair of flows: one from the sender of the connection to the receiver, and one from the receiver to the sender. However, a connection need not necessarily be due to the TCP protocol; for example, a stream of User Datagram Protocol (UDP) packets between a source host X and a destination host Y will also result in a connection. Moreover, a connection has no size restriction: each communication between source and destination hosts establishes a connection, even if only a single packet is exchanged.
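The notion of a connection above, two directional flows belonging to one logical exchange, is commonly captured by a canonical key built from the protocol and the sorted endpoint pair. The dictionary field names are an assumed intermediate packet format, not part of the thesis framework.

```python
def connection_key(pkt):
    """Canonical connection key: both directions of the same flow map
    to one key because the two endpoints are sorted."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (pkt["proto"],) + tuple(sorted([a, b]))

# A request packet and its reply travel in opposite directions.
forward = {"proto": "TCP", "src_ip": "10.0.0.1", "src_port": 5050,
           "dst_ip": "10.0.0.2", "dst_port": 80}
reverse = {"proto": "TCP", "src_ip": "10.0.0.2", "src_port": 80,
           "dst_ip": "10.0.0.1", "dst_port": 5050}
```

Grouping captured packets under such keys works equally well for UDP or single-packet exchanges, matching the protocol-independent definition of a connection given above.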
The system consists of the following important functionalities:

1. After the successful connection, normal and attack data set is collected.
2. The collected real time packets are preprocessed.
3. Desired features are extracted.

Figure 3.2 presents the framework for packet capture process and the
creation of data set from the network traffic.


[Figure 3.1 comprises five phases: Network Traffic Data Collection (Phase I), Data Preprocessing (Phase II), Automated Learning (Phase III), Dynamic Detection (Phase IV) and Automatic Response (Phase V).]
Figure 3.2 Framework for packet capture process (the pipeline proceeds from the target network through packet capture, preprocessing of captured data and extraction of desired attributes to the training and testing data sets)
The incoming packets are captured from the network, and the preprocessing module extracts the required features from them. The two phases of IDS operation are the training phase and the testing phase. The window size used in the online data set collection is set to one week for the training and testing process; since the window is a week long, the initial learning period must be at least a week. The deviation threshold was set to a pre-calculated standard deviation, as it is known that most of the time the data lies inside the boundaries of the standard deviation. The outlier threshold and the CP threshold were set by trial and error. The steady period, the period for which the time series data must stay constant or normal, is used as a threshold to detect changes of behavior in the network. Tuning was done by keeping the average of the thresholds for a particular day within the data set. Most of the time the number of data accesses was very large, and monitoring every data request in real time was a real challenge. To detect intruders, an IDS can monitor the application requests, the data requests, or both.
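The standard-deviation boundary used above as a deviation threshold can be sketched as follows. The multiplier k and the use of a single global mean are simplifying assumptions; the thesis tunes its thresholds per day within the data set.

```python
import statistics

def deviation_alerts(series, k=2.0):
    """Flag indices of a traffic time series lying more than k standard
    deviations from the mean (a minimal deviation-threshold sketch)."""
    mu = statistics.mean(series)
    sigma = statistics.stdev(series)
    return [i for i, v in enumerate(series) if abs(v - mu) > k * sigma]

# Hypothetical per-interval request counts with one abnormal spike.
flagged = deviation_alerts([10, 11, 9, 10, 12, 10, 11, 50])
```

Only the spike falls outside the k-sigma band, matching the observation that most of the data lies within the standard-deviation boundaries.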
3.4.1 Data Collection Framework
Network traffic data collection is one of the biggest challenges in a
network security system. The decision on the amount of data, and the type and place
of the data capturing process dramatically influences not only the performance of the
system, but also its trustworthiness and detection scope. Network traffic can be
collected in different manners, such as from a real time computer network, through

simulation of a network, or by creating log files. Depending on the data source, IDS can be categorized as host based or network based. In a host based IDS, data such as system calls, log files and resource usage is collected from the local machine. In a network based IDS, intrusions are detected by examining the network traffic.
Intruders try to attack systems by launching attacks in multiple steps. Hence an advanced level of analysis is required in order to provide information about the security status of the network. When a network change is detected, a threshold may be used to tune and modify the behavior of the IDS. A program/daemon runs periodically and looks at the previous network statistics to adjust the thresholds.
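The periodic threshold-adjustment daemon can be sketched as an exponentially weighted blend of the previous threshold and the latest observed rates. The alpha and margin constants are assumed tuning values, not parameters taken from the thesis.

```python
def update_threshold(prev_threshold, recent_rates, alpha=0.2, margin=1.5):
    """Blend the previous threshold with the latest average observed
    rate (EWMA-style), leaving `margin` headroom above normal traffic."""
    observed = sum(recent_rates) / len(recent_rates)
    return (1 - alpha) * prev_threshold + alpha * margin * observed

# Hypothetical: previous threshold 100, recent per-interval rates.
new_threshold = update_threshold(100.0, [80, 120, 100])
```

Running this on each daemon wake-up lets the threshold drift with genuine changes in baseline traffic while damping the effect of a single noisy interval.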
Data was collected in the research laboratory within the college campus network for two different scenarios, Normal and Attack. A prototype system was created to classify network traffic in real time. Network data is continuously collected from a network port, the collected data is preprocessed, and the attributes suitable for classification are selected. A portion of the packets is sent for classification, and a graphical tool displays the activities taking place on the network port dynamically.
The Packet Capture module is a generic component of IDS that collects activity, including network traffic, user misbehavior, application misbehavior, etc. The activity present on the network is considered as network traffic, which can be categorized as:
1. Lower level protocols such as TCP or UDP
2. Application and service level protocols such as SMTP, HTTP, FTP
3. Content level such as e-mail or web pages.
The two major tasks of this module are
1. Connection management - Establish and close a connection.
2. Assess and examine the data flow.
The main objective is to monitor the communication link, used for
TCP/IP-based network traffic, in real time. The monitoring should at least include
the first three protocol layers, with a connection oriented protocol being deployed on
the transport layer.
3.4.1.1 TCP/IP Protocol
TCP/IP is not a single protocol, but rather a suite of protocols comprised of:
1. Internet Protocol (IP): This is a network layer protocol used to deliver datagrams across connected networks to their intended destination. As it is connectionless, datagrams may not arrive in sequence, may be delayed or duplicated, or may not arrive at all.
2. Transmission Control Protocol (TCP): This protocol provides reliable, connection based delivery and ensures error free, sequential and non-duplicated datagram delivery.
3. User Datagram Protocol (UDP): This protocol provides unreliable datagram delivery: datagrams may not arrive in sequence, may be delayed or duplicated, or may not arrive at all. UDP depends on the IP protocol to pass the datagrams it produces.
4. Internet Control Message Protocol (ICMP): ICMP is used to send messages between computing systems for diagnostic or management purposes.
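The protocols above are distinguished in a captured frame by the protocol field of the IPv4 header. A minimal sketch of decoding that fixed 20-byte header from raw bytes (the sample packet is fabricated for illustration):

```python
import struct

PROTO_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}

def parse_ipv4_header(raw):
    # Fixed 20-byte IPv4 header: version/IHL, TOS, total length, id,
    # flags/fragment offset, TTL, protocol, checksum, source, destination.
    fields = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    version_ihl, _, _, _, _, _, proto, _, src, dst = fields
    return {
        "version": version_ihl >> 4,
        "protocol": PROTO_NAMES.get(proto, str(proto)),
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Hypothetical header: a TCP packet from 10.0.0.1 to 10.0.0.2.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
info = parse_ipv4_header(sample)
```

The protocol number (6 for TCP, 17 for UDP, 1 for ICMP) is what a capture library consults before handing the payload to the matching decoder.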
Given a range of IP addresses, the packet capture module gathers network and host information within that range. The tool can run on any workstation within the designated IP range because of the use of standard tools and protocols. JPCAP and WINPCAP provide libraries to find active or shut-down computers that are part of the network. Once the devices on the network are found, it is possible to identify which operating system each is running, whether it is a router or a workstation, and also the manufacturer details. The tool will also provide details of the way in which the hosts are connected together at the physical layer and at the network layer. Finally, the tool collects the identity of all the networked computers, which includes each host's Domain Name System (DNS) name, Network Basic Input/Output System (NetBIOS) name, and Media Access Control (MAC) address.
3.5 ARCHITECTURE OF COMPUTING INFRASTRUCTURE
The block diagram of the computing infrastructure in the college is as
shown in Figure 3.3. The network includes a firewall (FortiGate) and servers providing Dynamic Host Configuration Protocol (DHCP), Active Directory Service (ADS) and RADIUS (for wireless authentication), hosted on the VMware vSphere hypervisor. The network is supported by a fiber backbone with a speed of 1 Gbps. The infrastructure includes 2000 personal computers with Internet access for both students and faculty. Each department in the college has an isolated Virtual LAN (VLAN); the VLAN architecture is used for easy network management and to avoid virus propagation. The wireless architecture (802.11n) has 90 access points supported by a wireless controller. Kaspersky Antivirus software is installed on all computing terminals, and the network consists of several managed switches with a central chassis used for inter departmental connections, as depicted in Figure 3.3.

Figure 3.3 Campus network diagram for collecting normal and attack data
[The figure shows the Staff, Student and Guest network segments.]
3.5.1 Normal Data Collection
To collect normal data, the application accessed network packets using the window size specified by the user. For data collection, the system was online and the data set was collected with the features specified in Table 3.1. This data came from both faculty and students accessing the Internet; an IP address was used to identify each user.
For the normal data set, collection was made while users accessed the Internet as in Figure 3.1. This resulted in 512 unique sessions composed of 2,125 web requests and 52,635 data requests for one day with a window size of one hour. The data so collected contains information such as the request made by a client, the response of the server, the amount of data transferred, etc. The collected data set is then used as input to the IDS, which is the final module in the framework discussed in the next chapter. The one hour window data sets are then combined to generate a one day log file. Unique IP addresses and durations of usage were collected, resulting in 2117 user sessions along with the features from the web requests and the associated data requests.
The number of Internet connections gradually increased with time and reached its maximum between 11:00 AM and 1:00 PM. Each IP address depicts a user browsing the web and looking at different sites. The summary statistics may include features of a single TCP session between two IP addresses, or network level features such as the load on the server, the number of incoming connections per unit time, and others.
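The grouping of per-hour request logs into user sessions keyed by IP address can be sketched as follows. The idle-gap rule and the one-hour value are assumptions chosen to mirror the one-hour collection window, not the exact sessionization rule of the thesis.

```python
from collections import defaultdict

def group_sessions(requests, gap=3600):
    """Group (ip, timestamp) request records into per-user sessions:
    a new session starts when the same IP is idle longer than `gap`
    seconds (here one hour, matching the collection window)."""
    by_ip = defaultdict(list)
    for ip, ts in sorted(requests, key=lambda r: r[1]):
        sessions = by_ip[ip]
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # continue the current session
        else:
            sessions.append([ts])     # start a new session
    return by_ip

# Hypothetical request log: (client IP, timestamp in seconds).
log = [("10.1.1.5", 0), ("10.1.1.9", 50), ("10.1.1.5", 100), ("10.1.1.5", 8000)]
sessions = group_sessions(log)
```

Counting the resulting sessions per IP, rather than raw requests, is what yields session-level figures like the 2117 user sessions reported above.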
3.5.2 Collection of Attack Data
To collect attack data, several attacks as described in Table 3.1 were generated manually. The network accesses were logged for a specific time window and the logs were combined using the framework for the subsequent data preprocessing phase. For the first data set, 5 different attack sessions were generated and the log contained 15 unique attack sessions. The attacks chosen in the current research work represent a variety of threats on the Internet, including scanning, remote penetration and DoS. These attacks employ the most common protocols on the
Internet: TCP, UDP, and ICMP. The network attack tools were installed on different machines running the Windows operating system. To capture the necessary network data sets, the attack tools were executed repeatedly against the designated target computers. Each time the attack tools were run, the subsequent network traffic was captured and the IDS was inspected to ensure that the attack was actually detected. All the attacks were detected by the IDS as they were captured using varying data sets. This established that the attacks were executed and that the captured data was derived from actual attack traffic.
Table 3.1 Types of attacks generated
ARPpoison: An attacker who has compromised a host on the local network disrupts traffic by listening for ARP-who-has packets and sending forged replies. Address Resolution Protocol (ARP) is used to resolve IP addresses to Ethernet addresses, so the attacker disrupts traffic by misdirecting it at the data link layer.
DoS attack: A DoS attack, or Distributed Denial of Service (DDoS) attack, is an attempt to make a computer resource unavailable to its intended users. This is accomplished by flooding a system with data whose requests take time to process. Web servers, file servers and mail servers are frequent targets of such attacks, and the effects can be very damaging to those providing the service.
Fragment overlap attack: A TCP/IP fragmentation attack is possible because IP allows packets to be broken into fragments for more efficient transport across various media. The TCP packet (and its header) is carried inside the IP packet. In this attack the second fragment contains an incorrect offset, so that when the packet is reconstructed the port number is overwritten.
IPsweep: An IPsweep attack is a surveillance sweep to determine which hosts are listening on a network. This information is useful to an attacker in staging attacks and searching for vulnerable machines.
Land: This is a DoS attack in which a remote host is sent a spoofed packet with the same source and destination address.
Neptune: Floods the target machine with SYN requests on one or more ports, thus causing DoS.
POD: This attack, also known as Ping Of Death, crashes some older operating systems by sending an oversized fragmented IP packet that reassembles to more than 65,535 bytes, the maximum allowed.
Teardrop: Exploits packet reassembly by sending IP fragments with confusing offset values. A fragment carries an offset that the receiving system uses to reassemble the entire packet; the attacker modifies the subsequent fragments so that the receiving system does not know how to handle them, which may cause it to crash.
For the present problem, data is collected from the campus network to measure the detection accuracy for attacks of different types. A host PC running Windows is used as the primary test bed, and the system is connected to the network using an Ethernet controller. The subnet to which the host is connected has other hosts running several programs, so that data constantly flows across the subnet. Traffic in the network changes continuously as users log in and make use of the Internet. For capturing packets in real time, the JPCAP and WINPCAP tools are used to collect the information being transmitted. JPCAP provides facilities to capture and save raw packets live. A raw packet is a packet collected as unprocessed data from the network, without interpreting the protocols; the major benefit of collecting raw data is speed. The JPCAP module can automatically identify packet types and can generate corresponding Java objects for Ethernet, IPv4, IPv6, ARP/RARP, TCP, UDP, and ICMPv4 packets. Packets can also be filtered according to the user's requirements. JPCAP is built on LIBPCAP / WINPCAP, which is implemented in C and Java and is the industry standard tool for link-layer network access. In the Windows environment, WINPCAP allows applications to capture and transmit network packets bypassing the protocol stack. The network data is collected from the interface, which is capable of capturing information flowing within the local network; for example, anomalies can be detected on a single machine, a group of networks, a switch or a router. The TCP/IP packets are collected in real time from the research lab network and dumped for further processing. Depending on the time interval, packets are collected and stored in a file for further preprocessing and classification as described in Section 3.5.1. All other system parameters, such as the number of packets to collect and the parameters associated with the classifier, can be easily configured in the source code. A time interval of 30 minutes is used, as this was suitable for collecting data under different traffic scenarios. This value constitutes the window size used to analyze packets. If the window size is too small, there is a potential risk of losing important associations between packets that would otherwise show interesting patterns. On the other hand, if the window size is too large, the real time effect is lessened, since the graphical updates would be less frequent, especially for hosts with light traffic.
written to the disk is minimized by selecting specific packets using flags and filtering commands. The portion of the information received serves as a feature list for packet analysis. The IP addresses of both the source and the destination are selected, along with the protocol type, which includes TCP/IP or UDP. All variations such as ICMP, ARP and RARP are selected, as all protocol type names are decoded as text.
The information collected from each session is grouped together and used separately in the data preprocessing phase. All these preprocessed sessions can be stored in one or more files. The main objective of an attacker is either to launch a DoS attack or to access the underlying data. To detect such malicious activity it is critical to consider the user behavior together with the behavior of the application, i.e., by analyzing the application's interaction with the underlying data.
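The 30-minute capture window described above can be sketched as a simple grouping step: each packet timestamp is assigned to a fixed window so that every window can be preprocessed and classified as one batch. This is a minimal illustration; the class and method names are hypothetical, not the thesis implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: group captured packet timestamps into fixed
// time windows (the thesis uses a 30-minute window) so that each
// window can be preprocessed and classified as a batch.
public class CaptureWindows {

    // Index of the window a packet falls into, given the capture start
    // time and the window size (all in milliseconds).
    public static int windowIndex(long packetTimeMs, long startMs, long windowMs) {
        return (int) ((packetTimeMs - startMs) / windowMs);
    }

    // Group packet timestamps into per-window buckets.
    public static Map<Integer, List<Long>> group(List<Long> times, long startMs, long windowMs) {
        Map<Integer, List<Long>> windows = new TreeMap<>();
        for (long t : times) {
            windows.computeIfAbsent(windowIndex(t, startMs, windowMs),
                    k -> new ArrayList<>()).add(t);
        }
        return windows;
    }

    public static void main(String[] args) {
        long thirtyMin = 30L * 60 * 1000;
        List<Long> times = Arrays.asList(1000L, thirtyMin + 5L, 2 * thirtyMin + 7L);
        // Three packets, one per window:
        System.out.println(group(times, 0L, thirtyMin).keySet()); // [0, 1, 2]
    }
}
```

A smaller window would split the same packets into more batches, losing cross-packet associations; a larger one would delay the results, which mirrors the trade-off discussed above.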
3.6 PACKET CAPTURE MODULE
The main purpose of the packet capture module is to check for network connectivity, continuously capture real time packets across the LAN through the Ethernet interface, and queue the captured raw packets for the next level of processing. This module listens in promiscuous mode through the NIC (Network Interface Card) for real time packets flowing in the LAN. In promiscuous mode, network intrusion detection collects all the packets on a network segment for analysis. WINPCAP queues the packets and forwards them to the preprocessor module. The WINPCAP Native Handler captures the packets using the WINPCAP native interface and forwards these IP packets to the JPCAP portable handler. The JPCAP Native Handler captures the packets queued from the WINPCAP Native Handler and delivers them one by one to the packet preprocessing module. The queue is able to hold up to 2000 Ethernet packets, with each packet having a size of 1500 bytes. The interface displays a menu to the user for capturing the network packets and specifying the window size. The module also accepts the NIC information to listen for the packets.
The flow chart for the packet capture component is as shown in
Figure 3.4. The sequence of execution is:

1. Check for network connectivity
2. If the connectivity fails, then display an error message else capture real time
packets across LAN through NIC
3. Capture the packets and queue them for next level of packet preprocessing
4. If there are no packets to capture in the network, display error message
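The capture queue described above (up to 2000 Ethernet frames of 1500 bytes each, sitting between the capture thread and the preprocessor) can be sketched with a bounded queue. This is a minimal illustration with hypothetical names; the real module hands frames from WINPCAP to JPCAP.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Illustrative sketch of the bounded capture queue: up to 2000
// Ethernet frames of at most 1500 bytes each are buffered between
// the capture thread and the packet preprocessor.
public class PacketQueue {
    public static final int CAPACITY = 2000;
    public static final int MAX_FRAME = 1500;

    private final ArrayBlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(CAPACITY);

    // Enqueue a captured frame; returns false (frame dropped) when the
    // frame is oversized or the queue is full, mirroring packet loss
    // under flooding.
    public boolean enqueue(byte[] frame) {
        if (frame.length > MAX_FRAME) return false;
        return queue.offer(frame);
    }

    // Dequeue one frame for preprocessing, or null if none is waiting.
    public byte[] next() {
        return queue.poll();
    }

    public int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        PacketQueue q = new PacketQueue();
        q.enqueue(new byte[64]);
        q.enqueue(new byte[1500]);
        System.out.println(q.size()); // 2
    }
}
```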


Figure 3.4 Flow chart for packet capture

After the packets are captured, the packet preprocessor and attribute extractor modules process the incoming raw packets and group them as TCP and UDP connection information. The main purpose of this module is to process the raw data packets and extract the TCP and UDP connection information. The TCPDump module parses the packets from JPCAP and classifies them into TCP or UDP packets based on the header information. The TCP Preprocessor module constructs the connection attributes for the TCP packets. Many network features may be used to detect intrusions, such as the protocol, type of service, number of bytes etc. To detect a single attack class, only a small subset of these attributes is required, since using more attributes makes the system inefficient. For example, to detect Probe attacks, attributes such as the protocol and the type of service are sufficient and more significant than other features.
The attribute extractor module selects the important attributes required to detect the attacks. Probe attacks are intended to acquire information about the target network from an external source. Network connection features such as the duration of the connection and the source bytes are significant, while features like the number of file creations and the number of files accessed do not provide significant information for detecting Probe attacks. Similarly, network traffic features such as the percentage of connections having the same destination host and the same service, and packet level features such as the source bytes and the percentage of packets with errors, are significant. To detect DoS attacks, it may not be important to know whether a user is logged in or not. To detect R2L attacks, both host and network level features are selected. The attributes selected by the attribute extractor module are listed in Table 3.2.


Table 3.2 List of Network Dataset Attributes Selected
1. AppName
2. totalDestinationPackets
3. totalSourcePackets
4. sourcePayloadAsBase64
5. sourcePayloadAsUTF
6. destinationPayloadAsBase64
7. destinationPayloadAsUTF
8. sourceTCPFlagsDescription
9. totalDestinationBytes
10. destinationTCPFlagsDescription
11. direction
12. source
13. protocolName
14. sourcePort
15. destination
16. destinationPort
17. startDateTime
18. totalSourceBytes
19. stopDateTime

Similarly the UDP Preprocessor module constructs the connection
attributes for the UDP packets.
The flow chart for the packet preprocessing and attribute extractor
module is as shown in Figure 3.5.
The sequence of execution is as follows:

1. Process the incoming packets for certain time period T as specified by user
2. Call TCP preprocessor sub module that selects TCP packets and extracts the
desired features to construct the TCP connection attribute information
3. Call UDP preprocessor sub module that selects UDP packets and extracts the
desired features to construct the UDP connection attribute information
4. Find unique IP count for time period T and store all the desired attributes with
the corresponding values in a vector for further processing
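The TCP/UDP dispatch in steps 2 and 3 hinges on the protocol field of the IPv4 header (byte offset 9, where 6 denotes TCP and 17 denotes UDP). A minimal sketch of this classification, with hypothetical names, is:

```java
// Illustrative sketch of the TCPDump-style dispatch: the IPv4
// header's protocol field (byte offset 9) identifies TCP (6) or
// UDP (17), so the matching preprocessor sub-module can be invoked.
public class ProtocolDispatch {

    public static final int TCP = 6;
    public static final int UDP = 17;

    // Read the protocol number from a raw IPv4 header.
    public static int protocolOf(byte[] ipv4Header) {
        return ipv4Header[9] & 0xFF;
    }

    public static String classify(byte[] ipv4Header) {
        switch (protocolOf(ipv4Header)) {
            case TCP: return "TCP";
            case UDP: return "UDP";
            default:  return "OTHER"; // e.g. ICMP (protocol number 1)
        }
    }

    public static void main(String[] args) {
        byte[] header = new byte[20]; // minimal 20-byte IPv4 header
        header[9] = 6;
        System.out.println(classify(header)); // TCP
    }
}
```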


Figure 3.5 Flow chart for the packet preprocessing and feature extractor
Online datasets were prepared by aggregating different event types with a sliding window of five minutes, where the window slides every minute. One week of data was collected with normal traffic as well as inserted attacks. Of these, the first four days are considered the training dataset and the last three days the testing dataset, along with attack data. The first four days of data were used to train the system, and after the training phase was complete the IDS attempted to find attacks in the three days of testing data and returned a list of the attacks found. These lists of discovered attacks were then scored to determine the success rate and false alarm rate across various categories of attacks.
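The five-minute window with a one-minute slide means consecutive windows overlap by four minutes. A minimal sketch of this aggregation, counting events per overlapping window, is given below; the names and numbers in main are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the aggregation step: event timestamps are
// grouped into five-minute windows that slide forward one minute at
// a time, so consecutive windows overlap by four minutes.
public class SlidingWindows {

    // For each window start in [firstStart, lastStart] stepping by
    // strideMs, count the events falling in [start, start + widthMs).
    public static List<Integer> counts(long[] eventsMs, long firstStart,
                                       long lastStart, long widthMs, long strideMs) {
        List<Integer> result = new ArrayList<>();
        for (long start = firstStart; start <= lastStart; start += strideMs) {
            int n = 0;
            for (long t : eventsMs) {
                if (t >= start && t < start + widthMs) n++;
            }
            result.add(n);
        }
        return result;
    }

    public static void main(String[] args) {
        long min = 60_000L;
        long[] events = {30_000L, 2 * min, 6 * min};
        // Two five-minute windows, starting at 0 and at 1 minute:
        System.out.println(counts(events, 0, min, 5 * min, min)); // [2, 1]
    }
}
```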
Figure 3.6 Traffic Pattern in the course of a working day (Monday)


Figure 3.7 TCP Packet count in the course of a day (Monday)

Figure 3.8 TCP statistics in a course of a working day (Monday)


Figure 3.9 UDP Packet count in the course of a day (Monday)

Figures 3.8 to 3.10 depict the TCP and UDP statistics collected during the course of a day. It was found that network traffic was heaviest between 11:00 AM and 1:00 PM.

Figure 3.10 UDP statistics in a course of a working day (Monday)


Figure 3.11 Number of connections in a course of a working day (Monday)

Figure 3.12 Connection statistics in the course of a week


Figure 3.13 Traffic Statistics in the course of a week

Figure 3.14 Traffic statistics in the course of a month


Figure 3.15 Average packet count in the course of a day


Figures 3.11 to 3.15 show the different network packet counts collected for a day, a week and a month, and also provide an analysis of the network traffic distribution. After the packets are collected the data needs to be prepared, which may involve cleaning the data, transforming the data, selecting subsets of records etc. For data sets with large numbers of attributes it is necessary to perform some preliminary feature selection to bring the number of attributes into a manageable range. The attributes are selected depending on the statistical methods being considered, and Table 3.3 lists the totals for the data set that was considered.
Table 3.3 Data set statistics for a month

         Number of connections          Number of   Number of unique
         Normal        Attack           packets     IP addresses
Week 1   1125          105              2532        179
Week 2   1165          132              2600        192
Week 3   1131          145              2421        184
Week 4   1142          126              2201        186

One month of network connection data is considered for analysis. A connection is a user request, from the establishment of the connection until its termination. The data set has a set of normal traces and a set of attack traces. The detection performance improves as more and more training data is collected. The capability of an IDS varies even with a small change in the attack tools. Some attacks can be detected from the complete network information collected, while others require only a significant subset of the attributes of the data set. The results show that it is possible to reduce the attributes and still detect some attacks. However, depending on the captured data, a significant number of attacks may still be missed
and a generic alarm may be issued instead of the actual intrusion alarm. A standard attribute set cannot be used for inspection, as attacks vary across protocols. In this research work, the total number of unique IP addresses collected from the dataset is considered while evaluating the IDS.
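The unique-IP statistic used in this evaluation (the last column of Table 3.3) amounts to pooling the source and destination addresses of every connection and counting each address once. A minimal sketch, with hypothetical names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of the unique-IP count used when evaluating
// the IDS: source and destination addresses from each connection
// record are pooled and each distinct address is counted once.
public class UniqueIpCount {

    // Each connection record is a {sourceIp, destinationIp} pair.
    public static int countUnique(List<String[]> connections) {
        Set<String> seen = new HashSet<>();
        for (String[] c : connections) {
            seen.add(c[0]); // source IP
            seen.add(c[1]); // destination IP
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String[]> conns = Arrays.asList(
            new String[]{"10.0.0.1", "10.0.0.2"},
            new String[]{"10.0.0.2", "10.0.0.3"});
        System.out.println(countUnique(conns)); // 3
    }
}
```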
3.7 SUMMARY
For online analysis, the major challenge is to deal with the huge number of alerts in a fast and efficient manner. New attacks are commonplace in today's networks, and identifying them rapidly and accurately is critical. Designing an IDS has become an important defensive measure to protect computer systems from intruders. For an IDS to perform precisely, it is important to collect data with high accuracy. This is achieved in the current research work by designing and developing:
1. An underlying scalable and adaptive IDS architecture for collection
of network data set.
2. An efficient and integrated platform for testing and performance
study.
3. A system that can collect large data set by specifying window size as
per the requirement.
Detecting previously unseen attacks is equally important as it minimizes
the losses. Detecting attacks at an early stage is also important as it minimizes the
impact. The new proposed framework provides a way to detect the change in
network behavior and tune the thresholds.
A new algorithm is developed, based on a real time series analysis method, to detect changes in network behavior using NN. Depending on the nature of the analytic problem, the next stage is to apply DM techniques. During this stage an elaborate analysis using statistical methods is performed in order to identify the most relevant variables for online threshold tuning. The complexity and the general nature of the models used are explained in Chapter 4.
CHAPTER 4
ANOMALY DETECTION USING NEURAL NETWORKS

4.1 INTRODUCTION
By effectively combining the strategies of DM and ES, a promising IDS can be designed. However, combining multiple techniques in the design of an IDS is a recent research trend that still suffers from structural and performance problems and needs further improvement. DM techniques use different algorithms to extract useful information, patterns and trends that were previously unknown. Researchers have applied DM techniques to the analysis of cyber crimes, security against terrorism, fraudulent credit card behavior etc.
The major challenge in using these techniques is to detect and prevent attacks while extracting useful information. Many DM algorithms such as DT, the Naïve Bayesian Classifier, NN, GA and SVM can be used for classifying intrusion detection datasets. When NN are used, the learning speed of the network is generally slow, and a research challenge is to improve the performance of the learning algorithm. For approximation over a finite training set, researchers [121,122] have shown that a Single-Hidden Layer Feed Forward Neural Network (SLFN) with at most N hidden nodes and with almost any nonlinear activation function can exactly learn N distinct observations. An SLFN with at most N hidden neurons can learn N distinct samples with zero error, and the weights connecting the input neurons and the hidden neurons can be chosen arbitrarily. As suggested, the input weights of the hidden nodes need not be adjusted and can instead be assigned randomly. FFNN [123] is one of the most popular research methods; it consists of one input layer, one or more hidden layers, and one output layer sending the network output to the external environment.
In this research work this method is applied. The results are encouraging and the learning is fast.
4.2 ENSEMBLE OF APPROACHES
DM is the process of extracting useful and previously unknown patterns from large data sets, and is a component of the KDD process. DM techniques can be differentiated by their model functions and representations. Using the classification model of DM it is possible to identify normal activity or a malicious attack. Additionally, DM systems provide the means to easily perform data summarization and visualization, aiding the security analyst in identifying areas of concern. The models must be represented in some form. Common representations for DM techniques include rules, DT, linear and non-linear functions (including neural nets), instance-based examples, and probability models. Although many methods use various preference criteria such as processing cost, the primary concern is accuracy. Mining for knowledge security employs a number of search algorithms, such as statistical analysis, deviation analysis, rule induction, neural abduction, association making, correlation, and clustering. With these techniques, one can find hidden patterns of previously undetected intrusions. This allows one to transcend the limitations of many current IDS, which rely on a static set of intrusion signatures (misuse detection systems), and to evolve from memorization to generalization. Such a system represents an anomaly detection system. Anomaly detection attempts to quantify the usual or acceptable behavior and flags any irregular behavior. Given the high data rates on current networks, even events that happen with low probability occur at a non-trivial rate. Hence there is a need to employ more advanced techniques to detect additional attacks and decrease the false alarm rate. The output of an ideal IDS would provide the identity of the intruder, the intruder's activity, and the observed threats or attack rates. Such a system would allow a human analyst to more easily assess the situation and respond accordingly, and may advance well beyond current IDS.

4.2.1 Offline Processing
The detection of intrusions should ideally be performed in real time, but there are some advantages in performing intrusion detection in an offline environment. In offline analysis, it is implicitly assumed that all connections have been completed, and therefore it is possible to compute all the features and check the detection rules one by one. The estimation and detection process is highly mathematical and processor intensive; hence, the problem is more tractable in an offline environment. DM is therefore better suited to batch processing of a number of collected records, and a daily processing run provides a good trade-off between timeliness and processing efficiency. While offline processing would seem to be solely a compromise between efficiency and timeliness, it provides some unique functionality. For instance, periodic batch processing allows related results (such as activity from the same source) to be grouped together, and all of the activity can be ranked in the report by relative threat level. Offline processing overcomes many shortcomings that exist in real time IDS. For example, many IDS drop packets when flooded with data faster than they can process it; meanwhile, the attacker can break into the real target system without fear of detection.
4.2.2 Multi Sensor Correlation
Multi sensor correlation is necessary to detect some types of malicious activity. Scan activity, for example, looks for a small number of services across a large number of potentially highly distributed hosts. Analyzing the data from multiple sensors should increase the accuracy of the IDS. Correlation of information from different sources allows additional information to be inferred that may be difficult to obtain directly. Such correlation is also useful in assessing the severity of other threats. Severity may be due to an attacker making a concerted effort to break into a particular host, or to the fact that the source of the activity is a worm with the potential to infect a large number of hosts in a short amount of time.
It is well known in the DM literature that an appropriate combination of a number of weak classifiers can yield a highly accurate global classifier. Combining classifiers leads to different learning methods, such as hill-climbing and genetic evolution. The process of learning the correlation between these ensembles of techniques [124] is known by names such as multi-strategy learning or meta-learning. In reality there are many different types of intrusions, and different detectors are needed to detect them. The use of multiple sensors requires the use of multiple methods: if one method or technique fails to detect an attack, then another should detect it. Combining evidence from multiple base classifiers is likely to improve the effectiveness in detecting intrusions, and a hierarchical arrangement is a natural way to accomplish this. The ensemble approach allows one to quickly add new classifiers to detect previously unknown activity. This can be done without any loss, and possibly with a gain, in classification performance. There are many ways to go about the meta-classification task; popular methods include Bayesian statistics, belief networks, and covariance matrices. The use of Fuzzy Logic (FL) [125] may be useful in enhancing the accuracy of these approaches. IDES and its successor NIDES use a covariance matrix to correlate multiple statistical measures in order to calculate a quantifiable measure of how much a given event differs from the user's profile. It is observed that the best way to make intrusion detection models adaptive is to combine existing models with new models trained on new intrusion data or new normal data.
4.3 SURVEY OF AVAILABLE IDS PACKAGES
Among the early IDS were the system of the Center for Education and Research in Information Assurance and Security (CERIAS), which performs data fusion and cross sensor correlation for the Information Security Officer's Assistant, and the Distributed Intrusion Detection System (DIDS). Both used rule based expert systems to perform the centralized analysis. The primary difference between the two was that CERIAS was more focused on AD and DIDS on misuse detection. The Event Monitoring Enabling Responses to Anomalous Live Disturbances (EMERALD) architects employed numerous approaches such as statistical analysis, an ES and modular analysis engines. It is believed that no one paradigm can cover all types of
threats; hence there is a need to endorse a pluralistic approach. The data fusion and correlation capabilities of IDS span a wide range. A few products are specifically designed to do centralized alarm collection and correlation. For example, Real Secure Site Protector claims to carry out advanced data correlation and analysis by interoperating with the other products in ISS's Real Secure line. Some products, such as Symantec Man Hunt and nSecure nPatrol, integrate the means to collect alarms, and the ability to apply multiple statistical measures to the data they collect, directly into the IDS itself. Most IDS, such as the Cisco IDS or Network Flight Recorder (NFR), provide the means to do centralized sensor configuration and alarm collection. The problem with all these systems is that they are designed more for prioritizing what conventional intrusion (misuse) detection systems already detect, and not for finding new threats. Other products, such as Computer Associates' eTrust Intrusion Detection Log View and Net Secure Log, are more focused on capturing log information to a database and doing basic analysis on it.
Most intrusion detection techniques beyond basic pattern matching
require sets of data to train on. Datasets allow different systems to be quantitatively
compared. They provide an alternative to the prior method of dataset creation, which
involved every researcher collecting data from a live network and using human
analysts to thoroughly analyze and label the data. None of the literature explicitly
discusses the use of separate training sets for meta-classifiers and the classifiers they
incorporate.
4.4 OPEN PROBLEMS IN THE DESIGN OF IDS
4.4.1 Feature Selection
Feature selection from the available data is required for representing the
effectiveness of the methods employed. Having a set of features whose values in
normal audit records differ significantly from the values in intrusion records is
essential for having good detection performance.

4.4.2 Visualization
A DM system for intrusion detection can offer numerous ways to
visualize the data to aid human analysts to identify attacks. Additionally, the analyst
should quickly learn the visual patterns of certain types of anomalous activity, aiding
in its future detection.
4.4.3 Predictive Analysis
DM based IDS not only detect intrusions but also provide some degree of predictive analysis. A typical attack session can be split into three phases: a learning phase, a standard attack phase, and an innovative attack phase. With this information one should be able to predict standard and innovative attacks to some degree based on prior activity.
While a lot of research has been done in applying DM techniques to network connection data for intrusion detection purposes, much remains that needs further study. There are a number of open problems in the area that require further research. For example,
1. For baseline purposes, deriving the accuracy of a modern, signature
based network intrusion detector on the standard evaluation
datasets is a challenging task.
2. Proposing an ideal feature set for different DM techniques.
3. Improving the performance of DM techniques by grouping related
packets in connectionless protocols like UDP and ICMP, and
treating them as a single connection.
4. Finding the need to use separate training sets for meta classifiers.
5. Determining the accuracy for offline network IDS.
6. Determining the amount of data required to properly train IDS.
7. Determination of normal usage profiles of individual hosts and
services.
8. Identifying the different forms of data compression.
9. Determining the predictive capabilities of offline network IDS.
10. Determining the improvement in performance while incorporating
classifiers using different data sources, such as alerts from real-time
IDS, system logs, or system call data.
11. Improvement of accuracy by not looking at the connections
themselves, but instead looking at the cumulative state of a host or
group of hosts, where each connection acts as a state transition
operator.
12. Determination of the ideal time window, that depends on the
current state of a host.
13. Determination of similarities or differences that exist in the traffic
characteristics between different types of networks.
14. Reduction of false alarms through the use of user feedback and
learning algorithms.
4.5 APPLICATION OF ARTIFICIAL NEURAL NETWORKS (ANN)
IN IDS
An ANN is a system that simulates the working of the neurons in the human brain. As shown in Figure 4.1, a general ANN consists of inputs, weights, a summation module, an activation function and one or more neurons. The importance of a particular input can be intensified by the weights, which simulate the neuron's activation behavior. The input signals are multiplied by the values of the weights, and the results are then added in the summation block. The sum is sent to the activation block, where it is processed by the activation function. The computational units or neurons are indexed as I = {1, . . . , n}, where n = |I| is the network size. Some of these units may serve as external inputs or outputs, and the network will have n input and m output neurons. The remaining ones are called hidden neurons. The units are densely connected into a graph representing the architecture of the network, in which each edge (i, j) leading from neuron i to neuron j is labeled with a weight w(i, j). The absence of a connection in the design corresponds to a zero weight between the respective neurons. Before the computation begins, the NN is placed in an initial state y(0), which may also include an external input. The network state is continuously updated by a selected subset of neurons collecting their inputs from the outputs of their incident neurons. The weighted connections and the transformation of these input values may also be provided as input to the next stages. Finally, a global output from the network is produced at the end of the computation. There are many different algorithms to train NN models, and a given algorithm will often fail to train the model for a particular problem. As the models are complex in nature, no single algorithm can be claimed the best for training across the different scenarios that arise in real time. Depending on the complexity of the problem, the number of layers and the number of neurons in the hidden layer need to be changed. Training the model becomes more complex as the number of layers and the number of neurons in the hidden layer is increased.




Y = F( Σ (k = 0 .. n) Wk · Xk )

Figure 4.1 General block diagram of ANN
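The weighted-sum-and-activation computation of Figure 4.1 can be sketched directly. The sigmoid activation and the method names here are illustrative choices, not the thesis implementation.

```java
// Illustrative sketch of the neuron of Figure 4.1: inputs are
// weighted, summed, and passed through an activation function F,
// here chosen to be a sigmoid.
public class Neuron {

    // Weighted sum of inputs: sum over k of w[k] * x[k].
    public static double sum(double[] w, double[] x) {
        double s = 0.0;
        for (int k = 0; k < w.length; k++) s += w[k] * x[k];
        return s;
    }

    // Sigmoid activation F.
    public static double activate(double s) {
        return 1.0 / (1.0 + Math.exp(-s));
    }

    public static double output(double[] w, double[] x) {
        return activate(sum(w, x));
    }

    public static void main(String[] args) {
        double[] w = {0.5, -0.25};
        double[] x = {1.0, 2.0};
        // Weighted sum is 0.5*1.0 - 0.25*2.0 = 0, so sigmoid gives 0.5:
        System.out.println(output(w, x)); // 0.5
    }
}
```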
MLPs, and most of these misuse detection applications are network based. MLPs are also applied to AD, but SOMs have been more widely used there.
NN are used in IDS by training the network with a succession of benchmark training data sets. The data points, i.e. the selected attributes, are provided to the input layer and profiles are established. The trained profile is matched against the testing data set, and performance is analyzed based on the output of the matching process. Different types of neural networks exist, and three main approaches are used for training the Feed Forward Neural Network:
1. Gradient Descent based (e.g. the Back Propagation (BP) method)
[126]: here, the effect of initial weight selection on Feed Forward
Networks learning simple functions with the Back Propagation
technique has been demonstrated through the use of Monte Carlo
techniques. The magnitude of the initial condition vector (in weight
space) is a very significant parameter in convergence time variability.
2. Standard optimization method based (e.g. SVMs, for a specific
type of SLFNs, the so-called Support Vector Network). Rosenblatt
[127] suggested a learning mechanism where only the weights of the
connections from the last hidden layer to the output layer were
adjusted.
3. Least-square based network learning [128], utilizes nonlinear
optimization of the first layer parameters which is beneficial only
when a minimal network is required to solve a given problem.
The major advantage of using NN in designing IDS is that retrieving information from the trained network is very fast, although training takes some time. The FFNN, also known as the MLP, is a nonlinear regression or classification model in which a set of input variables is related to an output variable. Research on the approximation capabilities of multilayer FFNN has focused on two aspects: universal approximation and approximation on a finite set. If the activation function is continuous, bounded and non-constant, then continuous mappings can be approximated in measure by NN over compact input sets. It is also known that standard SLFNs with at most N hidden neurons, including biases, can learn N distinct samples with zero error, and the weights connecting the input neurons and the hidden neurons can be chosen arbitrarily. As a learning technique, FFNN has demonstrated good results in resolving regression and classification problems. Compared to the Back Propagation algorithm, SLFNs provide better results if the number of instances is greater than the number of features. In real time, fast response to external events within an extremely short time is highly demanded and expected. Therefore, an alternative algorithm implementing real time learning is highly desirable for critical applications with fast changing environments. Even for offline applications speed is still a need, and a real-time learning algorithm that reduces training time and human effort to nearly zero would always be of considerable value.
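The SLFN idea described above, where the input-to-hidden weights are assigned randomly and only the hidden-to-output weights are fitted, can be sketched with a least-squares solve for the output layer. This is a minimal illustration under those assumptions (random tanh hidden layer, ridge-regularized normal equations), not the thesis implementation, and all names are hypothetical.

```java
import java.util.Random;

// Illustrative sketch: an SLFN whose hidden weights are random and
// fixed; only the output weights beta are fitted by least squares.
public class RandomHiddenSlfn {
    final double[][] w; // hidden input weights, hidden x inputs
    final double[] b;   // hidden biases
    double[] beta;      // output weights, fitted by least squares

    public RandomHiddenSlfn(int inputs, int hidden, long seed) {
        Random r = new Random(seed);
        w = new double[hidden][inputs];
        b = new double[hidden];
        for (int i = 0; i < hidden; i++) {
            b[i] = r.nextGaussian();
            for (int j = 0; j < inputs; j++) w[i][j] = r.nextGaussian();
        }
    }

    // Hidden-layer response h(x) with tanh activation.
    double[] hiddenOut(double[] x) {
        double[] h = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double s = b[i];
            for (int j = 0; j < x.length; j++) s += w[i][j] * x[j];
            h[i] = Math.tanh(s);
        }
        return h;
    }

    // Fit beta by solving (H^T H + ridge*I) beta = H^T t.
    public void fit(double[][] xs, double[] t, double ridge) {
        int L = w.length, n = xs.length;
        double[][] H = new double[n][];
        for (int k = 0; k < n; k++) H[k] = hiddenOut(xs[k]);
        double[][] A = new double[L][L + 1]; // augmented [A | rhs]
        for (int i = 0; i < L; i++) {
            for (int j = 0; j < L; j++)
                for (int k = 0; k < n; k++) A[i][j] += H[k][i] * H[k][j];
            A[i][i] += ridge;
            for (int k = 0; k < n; k++) A[i][L] += H[k][i] * t[k];
        }
        beta = gauss(A);
    }

    public double predict(double[] x) {
        double[] h = hiddenOut(x);
        double y = 0;
        for (int i = 0; i < h.length; i++) y += beta[i] * h[i];
        return y;
    }

    // Gaussian elimination with partial pivoting on [A | rhs].
    static double[] gauss(double[][] a) {
        int n = a.length;
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = a[r][n];
            for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    public static void main(String[] args) {
        // A handful of random hidden nodes suffices to fit XOR.
        double[][] xs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] t = {0, 1, 1, 0};
        RandomHiddenSlfn net = new RandomHiddenSlfn(2, 10, 42);
        net.fit(xs, t, 1e-8);
        for (double[] x : xs) System.out.printf("%.2f%n", net.predict(x));
    }
}
```

Since training reduces to one linear solve, there is no iterative gradient descent at all, which is what makes this style of SLFN training fast compared to Back Propagation.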
4.5.1 Bayesian Networks (BN)
A current research area is to utilize BN for the decision process in hybrid systems. BN offer a more sophisticated way of handling the decision process, since most hybrid systems suffer from high false alarm rates. BN can be used with a simple threshold based approach for detection. Researchers have focused on using CPU load as a threshold, since the IDS should not take up too many resources; if proper care is not taken, it might prevent the user from using the computer efficiently.
BN provide a convenient way of modeling the relationship between attack detection and alerts. Using BN it is possible to infer unknown attacks based on the training. The main tasks associated with BN are:
1. Given values of the variables corresponding to observed nodes, the BN should infer the values of the variables corresponding to unknown nodes. In this research work this corresponds to predicting whether an attack step has been achieved based on detector alerts.
2. The next task is to learn the conditional probabilities in the model
based on available data which in our context corresponds to
estimating the reliability of the detectors and the probabilistic
relations between different attack steps.
3. Finally the system should learn the structure of the network based on
available data. All three tasks have been extensively studied in the
ML literature and, despite their difficulty in the general case, may be
accomplished relatively easily in the case of a BN.
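The first task above, in its simplest two-node form, is a direct application of Bayes' rule: infer the probability that an attack step occurred given that a detector raised an alert. The detector reliability numbers below are purely hypothetical, chosen only to make the arithmetic concrete.

```java
// Illustrative worked example of BN inference in the two-node case:
// given an observed alert, infer the probability of an attack step
// via Bayes' rule, using hypothetical detector reliability numbers.
public class AlertInference {

    // P(attack | alert) = P(alert|attack) P(attack) /
    //   [ P(alert|attack) P(attack) + P(alert|normal) P(normal) ]
    public static double posterior(double pAttack, double pAlertGivenAttack,
                                   double pAlertGivenNormal) {
        double num = pAlertGivenAttack * pAttack;
        double den = num + pAlertGivenNormal * (1.0 - pAttack);
        return num / den;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1% attack base rate, 90% detection
        // rate, 5% false alarm rate.
        System.out.println(posterior(0.01, 0.90, 0.05));
        // About 0.154: at a low base rate most alerts are still false alarms.
    }
}
```

This base-rate effect is one reason the false alarm rate matters so much in IDS evaluation: even a reliable detector yields mostly false alerts when attacks are rare.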
4.5.2 Naïve Bayes (NB) Classification
NB is a simplified version of BN that offers ML capabilities. BN requires a priori knowledge about the problem to determine the probabilities; though it is possible to extract probabilities from training data, doing so is computationally intensive. NB, in contrast, assumes that all the features in the data are independent of each other. Researchers have successfully applied NB as a classifier model to develop network based IDS.
DT also obtains a higher degree of overall accuracy than NB,
but NB obtains better detection rates on the three minor classes, namely Probing,
U2R and R2L intrusions, as shown by Ben Amor et al. The benefit of NB is that
it is robust compared to other ML techniques. Many researchers have shown that in
the best cases the NB classifier performs significantly better on the minor classes
U2R and R2L. The main aim of the current research work is the classification of
packets and the detection of TCP/UDP flooding.
The module that is developed also analyzes TCP session data; this analysis is
performed using the Waikato Environment for Knowledge Analysis (WEKA) tool. The
system detected all the attacks, which motivated potential hybridizations of
techniques.
The advantages of the NB classifier are as follows:
1. NB operation is simple as it relies only on basic laws of probability.
2. It accommodates limited information as it does not require
observations of all independent variables.
3. It is robust to outliers.
4. It can account for information received at different points in time.
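As a sketch of how NB exploits the independence assumption, the following toy classifier scores hypothetical (protocol, flag) features with Laplace smoothing; the training tuples and feature values are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny categorical Naive Bayes over hypothetical (protocol, flag) features.
train = [
    (("tcp", "SYN"), "attack"),
    (("tcp", "SYN"), "attack"),
    (("udp", "ACK"), "normal"),
    (("tcp", "ACK"), "normal"),
]

class_counts = Counter(label for _, label in train)
feat_counts = defaultdict(Counter)       # (feature index, label) -> value counts
for feats, label in train:
    for i, v in enumerate(feats):
        feat_counts[(i, label)][v] += 1

def predict(feats, alpha=1.0, n_values=2):
    """Most probable class under the feature-independence assumption,
    with Laplace smoothing (each feature here has n_values possible values)."""
    best, best_score = None, float("-inf")
    n = len(train)
    for label, c in class_counts.items():
        score = math.log(c / n)          # log prior
        for i, v in enumerate(feats):
            num = feat_counts[(i, label)][v] + alpha
            score += math.log(num / (c + alpha * n_values))
        if score > best_score:
            best, best_score = label, score
    return best
```

Because each feature likelihood is estimated independently, training reduces to counting, which is what makes NB simple and robust compared to a full BN.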
4.6 HYBRID OR ENSEMBLE CLASSIFIERS
The main goal in the development of IDS is to achieve the accuracy and
this has lead to the design of hybrid approaches for the attack detection problem.
Hybrid classifiers combine several ML techniques so that the IDS performance can
be significantly improved. Normally a hybrid approach consists of a component that
takes raw data as input and generates intermediate results and the other that produces
the output. They may use some clustering based approach to preprocess the input
samples in order to eliminate unreliable training data set examples from each class.
Then, the output of the clustering technique is used as training examples for
classifier design. The hybrid classifier may be based on either supervised or
unsupervised learning techniques. Similarly hybrid classifiers can also be developed
by integrating two different techniques. In the first stage the learning performance
may be optimized and in the second stage it is possible to integrate the model for
prediction.
Recent research has proposed ensemble classifiers that improve on the
classification performance of a single classifier. An ensemble combines
multiple weak learning algorithms, or weak learners. By training the weak
learners on different samples, the overall performance is effectively improved.
For combining weak learners, majority vote, boosting and bagging methods are
predominantly used.
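A minimal sketch of the majority-vote combination, with three hypothetical weak learners each keyed to a different packet field (the field names and thresholds are invented for illustration):

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine weak learners by plain majority vote."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three hypothetical weak learners, each keyed to a different packet field.
weak = [
    lambda pkt: "attack" if pkt["syn_rate"] > 100 else "normal",
    lambda pkt: "attack" if pkt["dst_port"] in (23, 445) else "normal",
    lambda pkt: "attack" if pkt["payload_len"] == 0 else "normal",
]

pkt = {"syn_rate": 250, "dst_port": 445, "payload_len": 512}
label = majority_vote(weak, pkt)   # two of the three learners vote "attack"
```

Boosting and bagging differ only in how the training samples for each weak learner are chosen; the final combination step is of this voting form.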
DT are popular in IDS, as they yield good performance and offer some
benefits over other ML techniques. For example, they learn quickly compared to
ANN, and the tree structure built from the training data can be used to produce
rules for ES. DT cannot generalize to new attacks in the same manner as certain
other ML approaches, and they are not well suited for anomaly detection. New
findings demonstrate that DT are very sensitive to the training data and do not
learn well from imbalanced data. DT have been successfully applied to IDS both
standalone and as part of hybrid systems; one example of the success of DT is an
application of a C5.0 DT. Much work has been carried out to examine the
performance of several ML techniques on the KDD Cup 99 data set, including a
C4.5 DT. The DT provided good accuracy but could not perform as well as other
techniques on some classes of intrusion. An ANN and k-means clustering obtained
higher detection rates and were able to generalize from learned data to new,
unseen data. Some researchers have observed that NB is better at detecting some
intrusions than a DT. A drawback of DT is that they do not deal well with unseen
data; hence new attacks may be classified into some default class, which causes
more false negatives. This can be mitigated by changing the way in which the
trees are built, producing and choosing attributes that are less likely to produce
false positives.
ML based methods identify novel attacks and have the ability to improve
performance by learning from experience. This learning ability enables ML
techniques to make predictions and to demonstrate their strength by detecting
unseen or new attacks.
Some of the properties that are desirable for developing IDS using ANN
are:
1. Ability to learn complex patterns using data sets and generalize from
known patterns to new patterns that makes them successful.
2. Flexibility with respect to noisy/missing data; they are also capable
of continuously learning at run time.
3. If an attribute in the selected data set is irrelevant, the ANN is
capable of learning to ignore it.
However the drawbacks of using ANNs are:
1. The training requirement: a large amount of training data is necessary,
and it directly affects the performance of the network.
2. They are black boxes.
3. Determining the topology of the ANN is difficult and time
consuming.
The main focus on the capabilities of Multilayer Feed Forward Neural
Networks (MFFNN) is their capability of universal approximation and approximation
on a finite set. Research work has shown that if the activation function is
continuous, bounded and non constant, then continuous mappings can be
approximated in measure by NN over compact input sets. It has also been proved
that feed forward networks with a non polynomial activation function can
approximate continuous functions. In real time applications NN are trained using
a finite input data set. It is a known fact that N arbitrary distinct samples
(x_i, t_i) can be learned precisely by standard SLFNs with N hidden neurons
(including biases) and an activation function. Particular input samples need not
be found, and the weights for the hidden layer can be chosen arbitrarily and
gradually changed depending on the requirement. The nonlinearities of the hidden
layer neurons are not restricted by the activation function. The success of the
method used depends on the activation function and the distribution of the input
samples, because for some activation functions arbitrarily chosen weights may
cause the inputs of hidden neurons to lie within a linear subinterval of the
nonlinear activation function.
4.7 CONSTRUCTION OF CLASSIFIER MODEL
Construction of a classifier model is a research challenge while building
efficient IDS. Many DM algorithms are popular in classifying intrusion detection
datasets. Some popular methods are DT, NB, NN, GA, SVM, etc. Improving the
classification accuracy is still a major challenge with existing DM algorithms,
as it is difficult to detect new or unknown attacks. It is a known fact that
attackers continuously change their attack patterns and hence the FP are usually
high. The performance of an IDS depends on its Detection Rate (DR) and FP. DR is
the total number of intrusion instances detected divided by the total number of
intrusion instances present in the dataset. An FP is an alarm that is raised when
there is not really an attack. A good IDS will maximize the DR and minimize the
FP. Therefore classifier construction while designing IDS is another major
challenge in the field of DM.
4.8 MULTI LAYER PERCEPTRONS (MLP)
MLPs are widely used as classifiers to perform network intrusion
detection. The selection of topology is very important, as it impacts significantly
on the performance and also on the method used to achieve attack detection. There
are different ways in which an MLP can produce output as a classification. One way
is to use a single output neuron that gives a binary classification to imply
whether an attack has been identified or not. Researchers also consider the range
of the output: the classification is accepted only if the value of the output
exceeds a certain threshold. The threshold can be used to adjust the ability of
the MLP to detect new or unknown attacks. By adopting an MLP with three output
neurons it is possible to classify traffic as normal, attack or unknown. The
unknown class can be used to extract new attacks that will be used for future
training of the ANN; after training, the attack becomes a known attack.
Moradi [129] focused on Probing and DoS attacks and obtained detection rates of
approximately 90% with a four layered feed forward MLP. Bouzida [130] included a
threshold parameter in the classification process; the threshold is used to
determine whether predicted classifications should be accepted or not. This was
implemented by checking if the value of the firing neuron exceeds a specified
threshold. If not, the instance is classified as a new class and is considered an
anomaly, which requires further analysis.
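The thresholded three-way decision described above (normal / attack / unknown) can be sketched as a post-processing step on the network's output activations; the 0.7 acceptance threshold is an assumed value, not one from the cited works.

```python
def label_from_outputs(outputs, threshold=0.7):
    """Map the MLP's output activations to a class label; return 'unknown'
    when the winning neuron does not clear the acceptance threshold."""
    classes = ["normal", "attack"]
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    if outputs[winner] < threshold:
        return "unknown"   # candidate new attack, kept for future retraining
    return classes[winner]

verdict = label_from_outputs([0.55, 0.60])   # low confidence, so "unknown"
```

Raising the threshold routes more borderline instances into the unknown class for later analysis and retraining; lowering it makes the classifier commit more often.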
Although the MLP is capable of performing AD, offline intrusion detection alone
is no longer sufficient, and other algorithms have been shown to be more
successful at AD. Employing an Elman network [131], which is a recurrent ANN,
provides a significant performance gain and can be considered for real time
intrusion detection.
Attribute selection in intrusion detection using DM algorithms involves
the selection of a subset of d attributes from a total of D original attributes of
the dataset, based on a given optimization principle. The input pattern captured
from each session consists of M records. A is a constant value that is used as a
single input to the network. There is one hidden layer, which consists of N nodes.
Attribute selection methods search through the subsets of attributes and try to
find the best one among the competing 2^D candidate subsets according to some
evaluation function. For approximation on a finite training set, a SLFN with at
most N hidden nodes and almost any nonlinear activation function can exactly learn
N distinct observations. As suggested, one need not adjust the input weights of
the hidden nodes; they can be assigned randomly. The Feed Forward Neural Network
is one of the most popular research methods; it consists of one input layer, one
or more hidden layers, and one output layer sending the network output to the
external environment. In this research we apply this method; the result is
encouraging and the learning is fast. As a learning technique, the Feed Forward
Network has performed well in resolving Regression and Classification problems.
Compared to the Back Propagation algorithm, SLFNs as shown in Figure 4.1 provide
a better result if the number of instances is greater than the number of features.
Earlier work emphasized that data can be obtained in three ways: using real
traffic, using sanitized traffic and using simulated traffic, but IDS are tested
mainly on a standard dataset. In real time, a fast response to external events
within an extremely short time is highly demanded and expected. Therefore, an
alternative algorithm to implement real time learning is highly demanded for
critical applications with fast changing environments. Even for offline
applications speed is still a need, and a real-time learning algorithm that
reduces training time and human effort to nearly zero would always be of
considerable value.
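The exhaustive attribute-subset search mentioned above can be sketched as follows; the attribute names and the scoring function are hypothetical stand-ins for a real evaluation function.

```python
from itertools import combinations

def best_subset(attributes, score):
    """Exhaustive search over all 2^D - 1 non-empty attribute subsets,
    keeping the one with the highest evaluation score."""
    best, best_val = (), float("-inf")
    for r in range(1, len(attributes) + 1):
        for subset in combinations(attributes, r):
            v = score(subset)
            if v > best_val:
                best, best_val = subset, v
    return best

# Hypothetical attributes and a stand-in evaluation function that rewards
# including 'src_bytes' and penalises larger subsets.
attrs = ["duration", "src_bytes", "dst_port"]
chosen = best_subset(attrs, lambda s: ("src_bytes" in s) - 0.1 * len(s))
```

The exponential number of candidate subsets is exactly why practical attribute selection uses heuristic search rather than this brute-force enumeration once D grows.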
Figure 4.2 A framework of SLFN
4.9 ARCHITECTURE OF THE MODEL USING SLFN
A detailed description of the design strategies, the lower level
subcomponents and the modules developed is given in this section. The
proposed architecture of the model, as shown in Figure 4.3, gives a brief idea of
the components and the interaction between them. The decisions taken are based on
certain design considerations, constraints and dependencies which affect the
subsequent functioning of the product. Based on the need for IDS, the system
architecture has an AD component using NN. The main aim is to classify the data
set and detect intruders from this classification. NN are selected to accomplish
the task as they have properties that make them suitable for classification.

Figure 4.3 Block diagram of the model using SLFN
The various modules involved in the development of the work are described
in depth and the flow of control is explained with the help of flow charts.
In my research work, real time network data consisting of both
normal and attack traffic is used to train the ANN. After training, the system
classifies the real time packets collected; if an attack is found, an
alert message or an alarm is generated. WEKA, an open source tool, is used to
carry out effective DM to extract useful information. The challenge in using
these techniques is to detect and/or prevent attacks and eliminate False
Positives and False Negatives as far as possible. For the detected attacks a
statistical approach is further applied for outlier detection, which is explained
in the next chapter. The functionality of the AD module has two phases, a
Training Phase and a Detection Phase.
4.9.1 Training Phase
The objective of the trainer module is to train the NN for classifying the
log of real time network data with desired attributes as attack or normal. Some of the
advantages of NN are:
1. They have the inherent property of learning through training.
2. The complex internal structures enable them to learn and
accommodate large number of patterns.
3. They can generalize the knowledge acquired through training for
similar patterns.
4. They have efficient storage capability for a large set of patterns.
These characteristics of NN motivated its choice for classification, since
in the training phase both the input and output data sets are available and a
supervised learning method can be used. In this phase, the system gathers
knowledge about the normal behavior of the network users from the preprocessed
input data, and stores
the acquired knowledge. In the detection phase, the system detects attacks based
on the knowledge acquired during the training phase, and notifies the system
administrator. The input to the module is the real time network data packets, and
it accepts TCP or UDP connections. A MLP is trained on patterns of attack and
normal cases. The trainer module as shown in Figure 4.2 has a training logic
using NN which classifies the TCP/UDP connection records into normal or attack.
This is achieved by matching the connection record to attack or normal cases.
During the training phase the real time network packet data set, already
preprocessed, is classified into different attack cases. These classified records
are used to train a MLP neural network that classifies the captured packets into
two output classes, attack or normal. During the testing phase, if any new
attacks are found the NN will learn from the captured live data.
The flow chart for the training phase of the AD module is as shown in
Figure 4.4. The sequence of steps is as follows.

1. Parse the network log file.
2. Fetch each unique packet IP and increase its count.
3. Create a log of unique packet connection attributes, and record the threshold
of the IP in a particular time window.
4. Record the TCP/UDP connection that will be used by the ANN to classify the
packets as attack or normal.
5. Call the ANN module for classification of packets.
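Steps 1-3 of this training-phase flow can be sketched as follows; the 10-second window and the 100-packet threshold are assumed values, not parameters from the thesis.

```python
from collections import Counter, defaultdict

# Sketch of steps 1-3: count packets per source IP in fixed time windows.
WINDOW_SECONDS = 10   # assumed window length
THRESHOLD = 100       # assumed per-window packet threshold

def window_counts(records):
    """records: iterable of (timestamp, src_ip) parsed from the log file."""
    counts = defaultdict(Counter)
    for ts, ip in records:
        counts[int(ts) // WINDOW_SECONDS][ip] += 1
    return counts

def flag_heavy_hitters(records):
    """Return the IPs whose per-window packet count exceeds the threshold."""
    flagged = set()
    for ips in window_counts(records).values():
        flagged.update(ip for ip, n in ips.items() if n > THRESHOLD)
    return flagged

flagged = flag_heavy_hitters([(0.5, "10.0.0.5")] * 150 + [(0.5, "10.0.0.9")] * 5)
```

The per-IP, per-window counts are exactly the connection attributes that step 4 would hand to the ANN module alongside the raw TCP/UDP record.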



Figure 4.4 Flow chart for the training phase of neural networks






4.9.2 Detection Phase
During this phase, the system that has already been trained uses the acquired
knowledge to detect intruders or abnormal behavior of authorized users. The input
data is retrieved from a file in the same way as in the training phase. The
network is initialized with the weights, using a threshold as a parameter; this
threshold is used to find the abnormal behavior.
4.9.3 Selection of Layers in MLP
Performance of the NN architecture depends on the selection of values
for the number of layers and the number of nodes in each of these layers.
4.9.3.1 The Input Layer
The number of neurons comprising this layer is determined once the
training data set features are known. In my research work, the number of neurons
comprising this layer is equal to the number of features in the data set that is 19 as
listed in Table 3.2.
4.9.3.2 The Output Layer
Like the input layer, every NN has exactly one output layer. Its size in
number of neurons is determined by the chosen model configuration. In
my research work this layer returns a class label, NORMAL / ATTACK.
4.9.3.3 The Hidden Layers
The major issue in selecting this layer is performance, and the
optimal size of the hidden layer usually lies between the sizes of the input and
output layers. In my research work the number of hidden layers is equal to one,
and the number of neurons in that layer is selected as the mean of the neurons in
the input and output layers.
The sequence of execution steps is as follows.

1. Create a MLP with an input of 19 features, 1 hidden layer with 10 neurons and 2
output neurons.
2. Read each record from log of packets captured.
3. MLP neural network is trained for patterns of attack and normal cases using
SLFN algorithm.
4. NN configuration file is created that is used to classify the live packets into
attack or normal.
5. Read the TCP/UDP connection records queued up from packet preprocessing
module.
6. Call NN to classify the connection records into attack or normal.
7. If the status returned is normal, display message as normal and update IP in the
text file for next level of filtering.
8. If the status returned is attack, raise an alarm in the alarm panel and update the
packet details for next level of filtering.
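A minimal sketch of steps 1 and 5-8 above, with random weights standing in for the trained NN configuration file; the input record here is synthetic, not a captured connection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: a 19-10-2 MLP as in the text. The random weights below are
# placeholders for the trained NN configuration file.
W1, b1 = rng.standard_normal((19, 10)), rng.standard_normal(10)
W2, b2 = rng.standard_normal((10, 2)), rng.standard_normal(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(record):
    """Steps 5-8: push one 19-feature connection record through the MLP and
    label it from the winning output neuron (index 0 normal, index 1 attack)."""
    hidden = sigmoid(record @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)
    return "attack" if out[1] > out[0] else "normal"

label = classify(rng.standard_normal(19))  # an "attack" label raises the alarm
```

In the actual system the weights would be loaded from the configuration file produced in step 4, and the returned label drives the alarm panel and the next filtering level.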






Figure 4.5 Flow chart for the anomaly detection using neural networks

There is a rapid increase in the usage of intelligent DM approaches to
predict intrusions in local area networks. An approach for IDS which embeds an ES
using DM techniques behaves intelligently and can be considered a system
integrated with intelligent subsystems.
Using this data set, the model was first updated after collecting a week
of data. The Week I model was trained on the first week of data, the Week II
model on the first two weeks, and the Week III model on all three weeks of data,
continuing the process. The model is evaluated after processing each week of
data. To keep the evaluation from being biased by when the attacks occurred, a
data set that includes both normal and attack data was created. The detection
models are periodically and automatically updated by the system as more data is
collected.
NN have been used by researchers in anomaly intrusion detection and are
modeled to learn the typical characteristics of system users and identify
statistically significant variations from previous user behavior. The NN
component is incorporated into an existing or modified ES and filters the
incoming data for suspicious events, which improves the effectiveness of the
detection system. Similarly, the NN receives data from the real time network
stream and analyzes it for attacks by identifying instances. NN have the ability
to learn new characteristics that have not been observed before.
DM algorithms extract new knowledge from the data obtained, and this
can be used to make predictions about novel data in the future. A Bayesian
Network is a model that encodes probabilistic relationships among variables of
interest. This method is generally used for developing IDS in combination with
statistical schemes. It yields several advantages, including the capability of
encoding interdependencies between variables and of predicting events, as well as
the ability to incorporate both prior knowledge and data. However, a serious
disadvantage of using Bayesian Networks is that their results are similar to
those derived from threshold-based systems, while considerably higher
computational effort is required. k-Nearest Neighbor (k-NN) is Instance Based
Learning (IBL) for
classifying objects based on closest training examples in the feature space. It is a
type of lazy learning where the function is only approximated locally and all
computation is deferred until classification. The k-NN algorithm is one of the
simplest of all ML algorithms and an object is classified by a majority vote of its
neighbors, with the object being assigned to the class most common amongst its k
nearest neighbors. If k=1, then the object is simply assigned to the class of its
nearest neighbor. The k-NN algorithm uses all labeled training instances as a model
of the target function. During the classification phase, k-NN uses a similarity-based
search strategy to determine a locally optimal hypothesis function. Test instances are
compared to the stored instances and are assigned the same class label as the k most
similar stored instances.
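A minimal sketch of the lazy, majority-vote behaviour of k-NN described above; the two-feature connection records and their labels are invented for illustration.

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """Lazy learning: defer all work to query time and label x by majority
    vote of the k nearest stored instances (squared Euclidean distance)."""
    dist = lambda a: sum((ai - xi) ** 2 for ai, xi in zip(a, x))
    nearest = sorted(train, key=lambda item: dist(item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical 2-feature connection records (packets/s, bytes/s), labelled.
train = [((1, 2), "normal"), ((2, 1), "normal"),
         ((8, 9), "attack"), ((9, 8), "attack"), ((9, 9), "attack")]
```

With k=1 the query simply inherits the label of its single nearest stored instance, as the text notes.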
The FFNN, also known as the multilayer perceptron, is a nonlinear
regression or classification model in which a set of input variables is related
to an output variable. Research on the approximation capabilities of multilayer
feed forward neural networks has focused on two aspects: universal approximation
and approximation on a finite set. It has been shown that if the activation
function is continuous, bounded and non constant, then continuous mappings can be
approximated in measure by NN over compact input sets.
Current research shows that change detection methods can be used
effectively for a wide range of real time applications. The real time learning
capability of Neural Networks is needed whenever a new threat is faced and a new
knowledge map has to be built. In this research work a simple and efficient SLFN
is used, which resulted in a system that can detect attacks faster compared to
other methods. As a learning technique, SLFN has demonstrated good potential in
resolving Regression and Classification problems. Results from the proposed
algorithm are truly encouraging and perform better compared to gradient-based
and/or iterative approaches.


4.10 SLFN ALGORITHM

Algorithm 1 : SLFN Algorithm

Given a training set X = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, 2, ..., N},
an activation function g(x), and hidden node number N:

Step 1: Randomly assign the input weights w_i and biases b_i, i = 1, 2, ..., N.
Step 2: Calculate the hidden layer output matrix H.
Step 3: Calculate the output weight β = H†T, where H† is the Moore-Penrose
generalized inverse of the matrix H and T = [t_1, ..., t_N]^T.

Standard SLFNs with N hidden nodes and activation function g(x) are
mathematically modeled as

    Σ_{i=1..N} β_i g_i(x_j) = Σ_{i=1..N} β_i g(w_i · x_j + b_i) = o_j,
    j = 1, ..., N                                                        (4.1)

where w_i = [w_i1, w_i2, ..., w_in]^T is the weight vector connecting the i-th
hidden node and the input nodes, β_i = [β_i1, β_i2, ..., β_im]^T is the weight
vector connecting the i-th hidden node and the output nodes, b_i is the threshold
of the i-th hidden node, and w_i · x_j denotes the inner product of w_i and x_j.
The value of N is calculated as in Equation (4.1). MATLAB is used for finding
the Moore-Penrose inverse: the original matrix H is passed as a parameter from
the Java application and the resultant matrix H† is obtained.
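Steps 1-3 of the algorithm can be sketched in Python, with NumPy's pseudoinverse standing in for the MATLAB computation of H†; the two-blob data set below is a toy example, not the intrusion dataset.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    """Steps 1-3: random input weights and biases, hidden output matrix H,
    then output weights beta = pinv(H) @ T (Moore-Penrose inverse)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # w_i, assigned randomly
    b = rng.standard_normal(n_hidden)                 # b_i
    H = np.tanh(X @ W + b)                            # activation g
    beta = np.linalg.pinv(H) @ T                      # single analytic step
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy two-class problem: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
T = np.vstack([np.tile([1.0, 0.0], (20, 1)), np.tile([0.0, 1.0], (20, 1))])
W, b, beta = elm_train(X, T, n_hidden=10)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
```

Because the hidden weights are never adjusted, training reduces to one pseudoinverse computation, which is what gives the SLFN its speed advantage over gradient-based back propagation.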
4.11 EXPERIMENTAL RESULTS
The experiment was carried out on a real data stream called Intrusion
Dataset, which is collected from the server in real time using JPCAP and
WINPCAP tools, as shown in Figure 4.6 to Figure 4.8. The collected data is stored
in a file containing the details of the network connections, such as the protocol
type, Source IP, Destination IP, Source Port, Destination Port, number of bytes
from the source, etc.

Figure 4.6 Initial screen

Figure 4.7 Packet capture screen

Figure 4.8 Use of WEKA tool for analysis
Each data sample in the dataset represents the attribute values of a class in the
network data flow, and each class is labeled either as normal or as an attack
with exactly one specific attack type. The features are used in the training
dataset, and each connection can be categorized into two main classes as shown in
Table 4.1.
Table 4.1 Types of attack classes

Main Attack Class          Attack Classes
Probing                    Ipsweep, Nmap, Portsweep
Denial of Service (DoS)    Neptune, Smurf, Teardrop
This project has two phases, namely learning and classifying the training
data. In the first phase the data set is preprocessed and the training network is
updated. In the second phase the data collected online is provided for analysis
to detect intrusions. Table 4.2 and Table 4.3 give the results of the analysis.
Table 4.2 Number of examples considered

Attack Type          Training Examples    Testing Examples
Normal               83453                74563
Denial of Service    12783                10435
Probing              1348                 3221
Total Examples       97854                88219

Table 4.3 Comparison results

Time taken to detect attacks in milliseconds
Method               Without using SLFN    Using the SLFN algorithm
Normal               0.08                  0.03
Denial of Service    0.06                  0.04
Probe                0.21                  0.13

Table 4.4 Result analysis of SLFN v/s Naïve Bayesian algorithm

SLFN:
Distinct records     Records used    SLFN          Attacks found
for training         for testing     percentage
250                  100             0.05          Teardrop
500                  200             0.12          Teardrop, U2R
5000                 300             0.61          Teardrop, U2R, R2L
10000                400             0.82          Teardrop, U2R, R2L, Portsweep
25000                500             0.91          Teardrop, U2R, R2L, Portsweep

Naïve Bayesian:
Distinct records     Records used    Naïve Bayesian   Attacks found
for training         for testing     percentage
250                  100             0.03             Teardrop
500                  200             0.11             Teardrop, U2R
5000                 300             0.52             Teardrop, U2R, R2L
10000                400             0.63             Teardrop, U2R, R2L, Portsweep
25000                500             0.77             Teardrop, U2R, R2L, Portsweep

There are two important measurements in evaluating the performance of
IDS: the DR and the False Positive Rate (FPR). The DR is the percentage of
attacks present in the data that the system detected. The FPR is the percentage
of normal data which the system claims to be real attacks. Performance is
optimized when the DR is maximized and the FPR is minimized. By plotting the DR
against the FPR at different thresholds, the Receiver Operating Characteristic
(ROC) curve is obtained. An effective method for comparing different models is to
compare their ROC curves on the same data set. Figures 4.9 (a-c) show the ROC
curves for the three-week data set that includes attacks. The x-axis shows the FP
in percentage and the y-axis the detection accuracy in percentage. The results
show that as the detection rate increases the FP rate reduces. From the ROC
curves it is noted that there is an improvement in model performance as more data
is collected from the system.
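The DR and FPR definitions above can be computed directly; the label lists below are toy examples, not results from the experiment.

```python
def dr_fpr(y_true, y_pred):
    """DR  = detected attacks / total attacks in the data;
    FPR = normal records flagged as attacks / total normal records."""
    attacks = sum(1 for t in y_true if t == "attack")
    normals = len(y_true) - attacks
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == "attack")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == "normal" and p == "attack")
    return tp / attacks, fp / normals

y_true = ["attack", "attack", "normal", "normal", "normal"]
y_pred = ["attack", "normal", "attack", "normal", "normal"]
dr, fpr = dr_fpr(y_true, y_pred)   # DR = 1/2, FPR = 1/3
```

Sweeping the classifier's acceptance threshold and recording one (FPR, DR) pair per setting traces out the ROC curve described above.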

Figure 4.9 (a)-(c) ROC curves for 3 weeks data. Each panel plots the Detection
Rate (y-axis) against False Positives (x-axis), with the diagonal reference line
and curves for Week 1, Week 2 and Week 3.
4.12 SUMMARY
With the advent of new technologies, building IDS has become more
complex, and designing IDS for real time use has become even more challenging.
The real time learning capability of NN is needed whenever a new threat is faced
and a new knowledge map has to be built.
In my research work an efficient SLFN is used, which resulted in a
system that can detect attacks faster compared to other methods. As a learning
technique, SLFN has demonstrated good potential in resolving the classification
problem. With the use of the proposed Huang's algorithm the results are truly
encouraging and perform better compared to gradient-based and/or iterative
approaches. When the number of pattern classes is large (say, larger than 10),
the training time cost is most likely higher. In general, with the use of Soft
Computing Paradigms such as NN, ES, Agents, BN, Fuzzy Logic, Immune Systems and
GA, IDS may detect intrusions while tedious, time consuming trials of other
algorithms can be avoided.
The main aim of this research was to compare different NN architectures.
It was possible to formulate the assumptions for making a benchmark data set as
well as for selecting a particular NN architecture when applied to IDS. Though
selection of the input data is a very important issue, representation of all
types of attacks and normal activity was a real challenge. The results obtained
show that the detection rate was best for the SLFN algorithm and, in particular,
the number of false alarms decreased. This was achieved because representations
of more different types of normal activity were added to the input vectors during
the learning phase.
An elaborate analysis using statistical methods will provide a way to
detect the change in network behavior. A new algorithm based on CP and OD is
implemented and is discussed in Chapter 5.
CHAPTER 5
INTRUSION DETECTION USING CHANGE POINT
AND OUTLIER DETECTION METHODS

5.1 INTRODUCTION
Most of the IDS proposed are based on AI techniques, including ES, NN,
DM and many others. CP methods have been a topic of research interest from both
the mathematical statistics and algorithmic points of view. The approach
undertaken in my research work belongs to the class of anomaly IDS. In general,
network intrusions occur at unknown points in time and may lead to changes in the
statistical properties of the traffic. It is therefore natural to formulate the
problem of detecting attacks as a CP problem. The goal of CP detection is to
detect changes using statistical models as fast as possible while keeping the
generation of false alarms under control. In the standard formulation of the CP
network intrusion detection problem, there is a sequence of observations whose
distribution may change abruptly at some unknown point in time. The aim of my
research work is to show recent advances in CP detection mechanisms and their
suitability for the design of IDS. The method proposed is robust and does not add
significant computational overhead.
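As an illustration of the CP formulation, a one-sided CUSUM detector over hypothetical per-second connection counts; the drift allowance k and threshold h are assumed values, not parameters of the proposed method.

```python
def cusum(xs, target_mean, k=0.5, h=5.0):
    """One-sided CUSUM: alarm at the first index where the accumulated
    positive drift above target_mean + k exceeds the threshold h."""
    s = 0.0
    for i, x in enumerate(xs):
        s = max(0.0, s + (x - target_mean - k))
        if s > h:
            return i          # change point alarm raised here
    return None               # no change detected

# Hypothetical per-second connection counts: normal near 10, then a surge.
traffic = [10, 9, 11, 10, 10, 30, 32, 31, 29, 30]
alarm_at = cusum(traffic, target_mean=10.0)
```

Raising h trades detection delay for fewer false alarms, which is exactly the controlled false-alarm behaviour the CP formulation aims for.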
The need for effective intrusion detection mechanisms as part of a
security mechanism for computer systems was proposed by Denning and Neumann.
The following reasons for utilizing intrusion detection within a secure computing
framework were identified:
1. Many existing systems have security flaws which make them
vulnerable; these flaws are very difficult to identify and eliminate
because of technical and economic reasons.
2. Existing system with security flaws cannot be easily replaced by
more secure systems because of application and economic
considerations.
3. The development of completely secure systems is probably impossible.
4. Even highly secure systems are vulnerable to misuse by legitimate
users.
Typically Network IDS products focus their efforts around one of the
following areas.
5.1.1 Signature Detection (SD)
Hackers often attack networks through tried and tested methods from
previously successful assaults. These attacks are analyzed by network security
vendors and a detailed profile, or attack signature will be created. SD techniques
identify network assaults by looking for the attack fingerprint within network traffic
and matching against an internal database of known threats. Once an attack signature
is identified, the security system delivers an attack response, in most cases a simple
alarm or alert. Success in preventing these attacks depends on an up to the minute
database of attack signatures, compiled from previous strikes. The drawback to
systems that rely mainly, or only, on signature detection is clear: they can only
detect attacks for which there is a released signature. If signature detection
techniques are employed in isolation to protect networks, the infrastructure
remains vulnerable to any variants of known signatures, first strike attacks, and
DoS attacks.
5.1.2 Anomaly Detection (AD)
AD techniques are required when hackers discover new security
weaknesses and rush to exploit the new vulnerability. When this happens there are
no existing attack signatures. The Code Red virus is a very good example of an
attack which could not be detected through an available signature. In order to
identify these first strikes, IDS products can use AD techniques, where network
traffic is compared against a baseline to identify abnormal and potentially harmful
behavior. These anomaly techniques will look for statistical abnormalities in the data
traffic, protocol ambiguities or atypical application activity. Current IDS products generally do not provide enough specific anomaly information to prevent sophisticated attacks, and if used in isolation, AD techniques can miss attacks that are identifiable only through signature detection.
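The baseline comparison described above can be sketched with a simple statistical rule. The baseline counts and the three-sigma threshold are illustrative assumptions:

```python
import statistics

# Baseline of "normal" per-minute connection counts (illustrative values).
baseline = [52, 48, 50, 47, 53, 49, 51, 50]
mu = statistics.mean(baseline)        # 50.0
sigma = statistics.stdev(baseline)    # 2.0

def is_anomalous(observed, k=3.0):
    """Flag traffic deviating more than k standard deviations from baseline."""
    return abs(observed - mu) > k * sigma

normal_minute = is_anomalous(51)     # within normal fluctuation
attack_minute = is_anomalous(240)    # far outside the learned baseline
```

Because no signature is involved, such a rule can flag a first-strike attack, but it also illustrates the false-alarm risk mentioned above if the baseline is poorly chosen.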
5.1.3 Denial of Service (DoS) Detection
The objective of DoS and Distributed DoS attacks is to deny legitimate
users access to critical network services. Hackers achieve this by launching attacks
that consume excessive network bandwidth or host processing cycles or other
network infrastructure resources. DoS attacks have caused some of the world's biggest brands to disappoint customers and investors as Web sites became inaccessible to customers, partners and users, who sometimes wait for hours to get the desired results. IDS products often compare current traffic behavior with acceptable normal behavior to detect DoS attacks, where normal traffic is characterized by a set of pre-programmed thresholds. This can lead to false alarms, or to attacks being missed because the attack traffic stays below the configured threshold.
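A minimal sketch of the threshold comparison described above, using a sliding one-second window over packet timestamps; the window length and the packets-per-window threshold are illustrative pre-programmed values:

```python
from collections import deque

# Sketch of threshold-based DoS detection over a sliding time window.
# Too low a threshold causes false alarms, too high a value misses
# attacks, exactly the trade-off described above.

THRESHOLD = 100  # illustrative packets-per-window limit

def detect_flood(timestamps, window=1.0, threshold=THRESHOLD):
    """Return True if any sliding window of `window` seconds holds more
    than `threshold` packets."""
    recent = deque()
    for t in timestamps:
        recent.append(t)
        while recent and t - recent[0] > window:
            recent.popleft()            # drop packets older than the window
        if len(recent) > threshold:
            return True
    return False

burst = [i * 0.001 for i in range(500)]   # 500 packets in half a second
normal = [i * 0.2 for i in range(50)]     # 50 packets spread over 10 seconds
```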
IDS can also analyze user statistics to determine misuse; this method is called Statistical Analysis (SA). SA provides some of the most powerful features in intrusion detection. It can be used to identify trends in data and also to assess the damage. Statistics are gathered from the previous history of the user's behavior and compared with the current data; if a difference is detected, a warning flag is raised automatically. This method has the advantage of detecting misuse without requiring signatures. Once an attack is detected, an automated response needs to be generated, and this can take the form of predefined response scenarios. Responses are generated automatically by the response subsystem and require no human intervention. The newer challenge is that these automated responses can themselves be used by attackers as an excellent DoS mechanism against the network. Automated responses usually require a recovery mechanism for the system, such as logging off the user and/or disabling a user's account, shutting down the target system, or breaking the network connection. Logging off users can be done when a user has been positively identified as misusing the system. This is not very efficient, as the user can log in again, and it may become a DoS when a legitimate user has been mistaken for a bad user and is logged off. It may so happen that a normal user is repeatedly logged
off and is prevented from doing his/her job. Disabling an account may be a more effective alternative, as the intruder will not be able to log on and is effectively forbidden from using the system. This requires involvement from the administrator to re-enable the account. If an intruder abuses the account of a normal user, then the normal user is denied access to the account. Shutting down the target system is the most severe of all automated responses: it denies service to all of its users and requires time to recover. Stopping a process may shut down only the offending process, leaving all normal processes to do their jobs. Breaking the network connection is used primarily by IDS when an attack connection is detected. Similarly, denying network requests is a method in which the attacker spoofs an authorized user's IP address, whose service can later be blocked. Automated real-time response systems detect intrusions by monitoring and responding within a short time. Most current IDS are designed to operate in real time but still have some performance implications.
5.1.4 Data Mining and Machine Learning
The amount of data in the world is overwhelming and still increasing, while new mass-storage technology has made online storage inexpensive. Learning means acquiring knowledge and awareness through information, a task that is far from trivial for computers; the main interest here is in systems that improve their performance in new situations. DM is the process of automatically discovering useful and interesting patterns in large sets of data. DM algorithms such as rule mining and frequent-episode mining can also be used in constructing anomaly- and misuse-based IDS. New rules and frequent episodes are found from the audit data, and these are then applied to new audit data to compute scoring functions. These functions can be used to measure the discrepancy of the newly found rules from the old profile rules.
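The scoring idea above can be sketched as follows. The (antecedent, consequent) pair encoding of rules and the discrepancy measure are simplified illustrative assumptions, not the exact scoring functions used:

```python
# Sketch of rule-based scoring: rules mined from old audit data form the
# profile, rules mined from new audit data are compared against it, and
# the discrepancy becomes the score. The feature names below are
# illustrative placeholders.

profile_rules = {("service=http", "flag=SF"), ("service=smtp", "flag=SF")}
new_rules = {("service=http", "flag=SF"), ("service=telnet", "flag=REJ")}

def discrepancy_score(profile, new):
    """Fraction of newly mined rules not explained by the old profile rules."""
    if not new:
        return 0.0
    return len(new - profile) / len(new)

score = discrepancy_score(profile_rules, new_rules)   # 0.5: one unseen rule
```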
Learning using DM applications can be done in different ways:
1. Learning by Classification - Learning scheme is presented with a
set of classified examples from which it is expected to learn a way of
classifying unseen examples.
2. Learning by Association - Association among features is sought, not
just ones that predict a particular class value.
3. Clustering - Groups of examples that belong together are sought.
4. Numeric Prediction - The outcome to be predicted is not a discrete
class but a numeric quantity.
Based on the method used for profiling, anomaly-based IDS can be classified as ML-based or statistics-based.
5.2 MACHINE LEARNING BASED IDS
The key technology for profiling intruders in IDS using DM is the use of ML algorithms. ML algorithms are mainly used to
segment a data set and automate the manual process of searching and discovering
key features and intervals. They can also segment a database into statistically
significant clusters based on a desired output, such as the identifiable characteristics
of suspected intruders. Like NN, they can generate graphical DT or IF/THEN rules.
These rules can be used by analysts to understand and gain important insight into the
attributes of intruders. ML algorithms operate by interrogating a data set and trying to discover the attributes that are important for identifying a potential attacker.
ML algorithms can be used to learn normal behavior from a training set, and their output is a set of hypotheses that describes the training set. A hypothesis H is a statement that the entire data set of n objects comes from an initial distribution model. After the hypothesis is discovered from the data, future audit instances can be classified by testing them against the hypothesis. A further advantage of ML algorithms is that the generalization ability of the hypothesis can be controlled during the training phase. Generalization is the ability of a learning system to classify future samples of data correctly; a good example of this ability is the SVM, explained in Chapter 6. The first comprehensive IDS to use ML employed NN for AD. The aim of building an IDS is to
automatically improve the performance with experience. Classification is one of the
standard methods in ML. A classifier function will assign a class label from a finite
set of labels to an instance. These class labels can also be constructed automatically.
The classifiers that are built can be interpreted as rules, DT, SVM, NN etc.
In my research work the focus is on automatically constructing interpretable classifiers. This is achieved using a supervised ML technique that is
applied in two phases: learning phase and testing phase. In the learning phase, the
classifier is given a set of instances together with correct class labels, which allows it
to modify its internal structure. In the testing phase, the classifier is presented with
unlabeled, previously unseen instances, for which it predicts class labels. The testing
phase allows the user to evaluate the performance of a classifier.
Most of the classification methods are binary, in which the classifier
assigns one of two possible classes. Binary classification is very well understood
and has some properties that make it particularly appealing. A scoring classifier is an
extension of a binary classifier that assigns scores to instances. A calibrated binary classifier outputs the probability that an instance belongs to a specific class.
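The distinction between a scoring classifier and a calibrated one can be sketched as below. The linear weights, features and threshold are illustrative, not learned from real traffic:

```python
import math

# Sketch: a scoring classifier outputs a raw score, a calibrated
# classifier maps that score to a probability, and the binary decision is
# derived from the calibrated output. Weights and features are invented
# for illustration.

weights = {"src_bytes": 0.002, "failed_logins": 1.5}
bias = -4.0

def score(instance):
    """Raw score: higher means more attack-like (scoring classifier)."""
    return bias + sum(weights[f] * v for f, v in instance.items())

def probability(instance):
    """Sigmoid-calibrated score: estimated P(attack | instance)."""
    return 1.0 / (1.0 + math.exp(-score(instance)))

def classify(instance, threshold=0.5):
    """Binary decision derived from the calibrated output."""
    return "attack" if probability(instance) >= threshold else "normal"

benign = {"src_bytes": 300, "failed_logins": 0}
suspect = {"src_bytes": 1000, "failed_logins": 4}
```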
Training data set that is collected for the training phase is greatly
influenced by:
1. Whether the learner module receives direct or indirect feedback.
2. To what degree the learner module can control the sequence of
training examples it is given.
3. How well the distribution of training examples represents the distribution of examples that the system is measured against.
My research work deals with learning systems that receive direct input
from the setup. There are four types of learning depending on the availability of
labeled instances:
1. Supervised learning: Uses a set of example class label pairs to
construct a classifier C. Classifier C can be subsequently used to
classify new instances.
2. Semi-supervised learning: Uses two data sets, a labeled set of example class-label pairs and an unlabeled set. During the learning stage some examples of outliers and inliers are available as a training set. The additional information this method needs may not always be available in practice, and since outliers are diverse, the method may not be useful in detecting unknown types of outliers.
3. Active learning: The learner interactively chooses which data points to label. The aim is that interaction can substantially reduce the number of labels required, making problems more practical to solve with ML. Active learning is a subfield of ML and, more generally, AI; its key feature is that the learning algorithm is allowed to choose the data from which it learns.
4. Unsupervised learning: Uses only an unlabeled set with the goal to
detect interesting patterns in the data. Some common examples are
clustering, association rule mining, outlier detection and time series
learning.
ML techniques focus on predicting the value of one target variable based on known values of other attributes. Through unsupervised ML techniques, DM can perform nontrivial extraction of implicit, previously unknown, and potentially useful and interesting patterns from data. Unsupervised learning algorithms are useful in the management of intrusion detection alerts.
My research work concentrates on using statistical methods for OD, which have proved to be realistic in building real-time IDS applications. The problem is addressed by first collecting the inliers. The OD problem is to find outlier instances in the test set based on the training set. The collected data is separated into a training
set consisting only of inlier samples observed in the past, and a test set consisting of recent data from which the algorithm will try to find outliers.
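A minimal sketch of this setup: the training set holds only past inliers, and test points are flagged when they deviate too far from the training statistics. The unique-IP counts and the three-sigma rule are illustrative assumptions:

```python
import statistics

# Train only on past inliers, then screen the test set for points that
# fall outside the range of behavior seen in training.

train_inliers = [12, 15, 14, 13, 16, 15, 14, 13]   # past normal observations
mu = statistics.mean(train_inliers)
sigma = statistics.stdev(train_inliers)

def find_outliers(test_set, k=3.0):
    """Return (index, value) pairs deviating more than k sigma from training."""
    return [(i, x) for i, x in enumerate(test_set)
            if abs(x - mu) > k * sigma]

outliers = find_outliers([14, 13, 90, 15])   # only the spike is reported
```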
The ML component in my research work is formulated with the
following features:
1. The system can learn from training examples.
2. It builds a classifier whose correctness and behavior can be verified.
3. It incorporates the required background knowledge and is efficient enough to perform real-time learning.
4. The system can learn incrementally.
5. The system presents the learned concept in an interpretable form.
5.3 STATISTICS BASED IDS
Use of statistics in building IDS is another well-known method. To detect anomalies, the IDS processes each new data set against the known profile and reports an intrusion if any deviation is detected. The IDS may operate with a set of rules that statistically describe the behavior of users based on their activities over a period of time. Present behavior can then be matched against the generated rules to detect abnormal behavior. The rules should be updated regularly to accommodate new patterns; the rule base can be updated automatically, and the process of updating the patterns can be done under supervision. Statistical analysis of network traffic parameters can be used successfully to identify attacks. During normal operation the properties of the attributes that describe the network traffic either remain constant or vary slowly with time. Malicious activity usually transforms these attributes in such a way that their statistical properties no longer remain constant, resulting in abrupt changes.
5.4 MACHINE LEARNING VERSUS STATISTICAL TECHNIQUES
A wide range of real world applications are discussed in the community
of Statistical Analysis and Data Mining. Statistical techniques usually assume an
underlying distribution of data and require the elimination of data instances
containing noise. Statistical methods, though computationally intense, can be applied to analyze the data. Statistical methods are widely used to build behavior-based
Intrusion Detection system. The behavior of the system is measured by a number of
variables sampled over time such as the resource usage duration, the amount of
processor-memory-disk resources consumed during that session etc. The model
keeps averages of all the variables and detects whether thresholds are exceeded
based on the standard deviation of the variable. Some of these techniques derive from standard statistics, while others are more closely associated with ML. The main emphasis of statistics has been on testing hypotheses, whereas ML is more concerned with formulating the process of generalization as a search through possible hypotheses. The DT induction method developed by J. Ross Quinlan, which can infer classification trees from examples, is a good example. The use of nearest-neighbor methods for classification is another standard statistical technique that has been extensively adapted by ML researchers, improving classification performance and making the procedure computationally more efficient. The technique used in this work incorporates a statistical model that constructs and refines the initial data set, applies standard statistical methods, and visualizes data by selecting relevant attributes to discover outliers. Statistical tests are used to validate ML models and to evaluate ML algorithms.
5.5 OUTLIER DETECTION (OD)
The network data set that is collected may contain data that do not comply with the normal behavior; such data points are called outliers. DM methods often discard outliers, treating them as noise or exceptions, yet in many real-time applications such as fraud detection these rare events may be more interesting than
the regular patterns. The analysis of such data is known as Outlier Analysis or Outlier Detection. Outliers can be detected using statistical models that assume a distribution. Other OD methods include probabilistic models of the data, distance measures, clustering, etc. Similarly, deviation-based methods try to identify outliers by examining differences in the main characteristics of objects in a group.
OD may uncover intruders by detecting a given user's resource usage in comparison with the regular changes that may occur. OD can also be performed by monitoring the frequency or threshold of network usage over a given time. Outliers can be caused by different measurement techniques, execution errors or inherent data variability. Many DM algorithms try to minimize the influence of outliers or eliminate them altogether. If outliers are eliminated, the chance of losing important hidden information is high, as these outliers may be of particular interest. The research community regards outlier detection and analysis as an interesting DM task, and it has been applied to many applications.
Given a set of n data points and p, the expected number of outliers, the task is to find the top p data points that are significantly dissimilar, exceptional, or inconsistent with respect to the remaining normal data. The problem of OD can be solved in two stages:
1. In the first stage, given a data set, define which data can be considered inconsistent.
2. In the second phase, find outliers using an efficient method from the
data set so defined.
Using a regression model, analysis of the residuals can give a good estimate of data extremeness, which can be used to define outliers. The task becomes more difficult when finding outliers in time-series data, and still more challenging when analyzing multidimensional data, as a combination of different dimensions may be at extremes. Users can visualize only two to three dimensions and may
become weak at detecting outliers among many categorical attributes or in data with high dimensionality. Automatic detection of outliers falls into four categories.
1. Statistical method: A model is generated that allows a small number of observations to be randomly sampled from distributions D_1, ..., D_n, differing from the target distribution T, which is often taken to be a normal distribution. This approach assumes a distribution or probability model for the given data set and then identifies outliers with respect to the model.
2. Distance based method: These methods are usually based on local
distance measures and are capable of handling large databases.
3. Density based method: These methods cluster the data set into dense
regions in the data space that are separated by regions of low density.
4. Deviation based method: This method does not use statistical or
distance based measures to identify outliers. Instead, it identifies
outliers by examining the main characteristics of data in a group.
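The distance-based method (item 2 above) can be sketched for one-dimensional data: a point is declared an outlier when fewer than a minimum number of other points lie within a given radius. The radius and neighbor count are illustrative parameters:

```python
# Sketch of distance-based outlier detection: a point is an outlier when
# fewer than `min_neighbors` other points lie within `radius` of it.
# Both parameters and the data are illustrative.

def distance_outliers(points, radius=2.0, min_neighbors=2):
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points)
                        if i != j and abs(p - q) <= radius)
        if neighbors < min_neighbors:
            outliers.append(p)
    return outliers

data = [10, 11, 12, 11, 40]          # 40 sits far from the dense region
isolated = distance_outliers(data)   # [40]
```

Density- and deviation-based methods refine this idea by comparing local densities or group characteristics instead of raw distances.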
It is important to note that each outlier discovered by these approaches is indeed a real outlier. Hence OD has become an essential component in designing IDS: a failure to detect outliers, or their ineffective handling, can have serious effects on the accuracy of the IDS. Although a large number of techniques are available for OD, selecting the most suitable one poses a big challenge to the research community. There is no standard technique for OD, but checking for outliers should be an integral part of any data analysis.
An intensively investigated research area is real-time series analysis, with the assumption that the properties or parameters describing the data are either constant or vary slowly with time. A CP is a change in characteristics that occurs very fast with respect to the sampling period of the measurements, if not instantaneously. The detection of these changes refers to tools that help to decide
whether such a change in the characteristics occurred and needs to be considered for further analysis. A CP thus refers to a time instant at which properties suddenly change, but before and after which the properties are constant or stationary. CP detection by no means implies changes of large magnitude; it is concerned with detecting small changes. The problem of identifying malicious activities can be formulated as a CP detection problem: detect changes in the network traffic's statistical properties as quickly as possible with a minimal false-alarm rate.
Internet technology makes massive data available, such as IP network traffic, financial transactions, media, scientific data and much more, and a number of methods have been developed for managing and analyzing such large data. In the development of IDS, changes in traffic patterns might indicate an intrusion or an attack that needs immediate attention. Similarly, in financial streams, a change in the pattern of trades might represent abnormal behavior by users, so that an analyst can take the necessary action. A change in the data distribution may cause the model to become stale and degrade detection accuracy.
The problem with CP detection while developing IDS is to identify the measure and threshold for change. Another problem is identifying the two data sets that form the basis for detecting a change. Normally CP methods specify a window size W within which the query for detecting change is executed. A research issue is the determination of W. If the window size is fixed, it can only work well when information on the time scale of change is available. On the other hand, if the system requires continuous detection of changes, a sliding-window model can be used. If the window is smaller than the changing rate, the chance of missing the detection is higher, and the change may only be captured later, when gradual changes accumulate over time. A window larger than the changing rate may delay detection until the end of the window. One way to determine the window size is to search over multiple sliding windows of different sizes and perform some analysis. Though it is not feasible to search linearly over all possible window sizes, a weighted-window mechanism can be used. In this method a weight is associated with each window depending on the importance of the data set, and the
choice of the weight should match the unknown rate of change. The current need is a
CP method that can accurately determine a change with a short delay.
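The search over multiple sliding windows described above can be sketched as follows; the window sizes and the mean-shift threshold are illustrative, and a simple two-window mean comparison stands in for the full change test:

```python
import statistics

# The same stream is tested for a mean shift at several window sizes,
# since the true time scale of change is unknown.

def mean_shift_detected(series, window, threshold=5.0):
    """Compare the means of the last two adjacent windows of size `window`."""
    if len(series) < 2 * window:
        return False
    recent = statistics.mean(series[-window:])
    previous = statistics.mean(series[-2 * window:-window])
    return abs(recent - previous) > threshold

def any_window_detects(series, windows=(4, 8, 16)):
    """Accept a change if any candidate window size reports a shift."""
    return any(mean_shift_detected(series, w) for w in windows)

stream = [10] * 16 + [30] * 8    # abrupt level shift at index 16
```

Here the smallest window (w = 4) already lies entirely past the shift and sees nothing, while w = 8 straddles the change and detects it, which is exactly why a single fixed window size is fragile.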
In my research work a CP algorithm is proposed that is guaranteed to keep track of CPs incrementally and detect a change with minimum delay. The algorithm is space efficient, and a performance study is carried out on synthetic and real-time data sets. With the algorithm implemented, the system was accurate and robust and could detect changes. The study, performed on real traces of network packets, demonstrated the accuracy of CP algorithms and can help in modeling better IDS. The results with labeled attacks showed that CP detection is a promising method that can be applied to intrusion detection monitoring.
In my research work a statistical analysis technique is used to identify changes in the behavior of the network traffic that unveil both ongoing and new attack patterns. The Change Point Outlier Detection (CPOD) algorithm, explained in Section 5.6.2, is implemented using the features of the CP and OD methods. A detailed analysis of the effectiveness of the algorithm on real-time network traffic is performed, and it is successful in detecting CPs related to different malicious activities. The standard OD problem falls into the category of unsupervised learning due to the lack of prior knowledge of the normal data.
When attacks are detected it is expected that the statistical properties of the traffic parameters will no longer remain constant. These CPs can be detected using sequential analysis methods such as CUSUM. The analysis of time-varying parameters is carried out over attribute values covered by a window of finite length. Change detection is performed either by considering all previous values or by analyzing values within a window of finite length. A better way to set the window size is to assign a minimum value whenever a change in a system parameter is detected. The window size remains fixed until a new CP is detected or the current CP is terminated; after the CP is detected, the window is either reset or a new value can be specified.
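A one-sided CUSUM check of the kind referenced above can be sketched as follows; the target mean, slack value k and decision threshold h are illustrative tuning parameters:

```python
# Upper CUSUM: the statistic accumulates deviations above the target mean
# and raises an alarm when it crosses the decision threshold h.

def cusum_alarm(series, target, k=1.0, h=8.0):
    """Return the index at which the upper CUSUM first exceeds h, or None."""
    s = 0.0
    for i, x in enumerate(series):
        s = max(0.0, s + (x - target - k))   # accumulate upward deviations
        if s > h:
            return i
    return None

traffic = [50, 51, 49, 50, 52, 60, 61, 62, 63, 64]
alarm_at = cusum_alarm(traffic, target=50)   # fires at index 5, where the shift starts
```

The slack k absorbs normal fluctuation, while h trades detection delay against the false-alarm rate, the same trade-off discussed for window sizes above.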
5.6 PROPOSED ARCHITECTURE OF OUTLIER DETECTION AND
CLASSIFICATION MODULE
Firewalls use IP router information and filtering rules to determine whether users have access to the network. Packet-filter firewalls allow or deny network communication by using rules that examine incoming or outgoing packets to allow or disallow the transmission. The basis for generating these rules is often the source IP address, the destination port and the protocol used. The current generation of firewalls uses filters and gateways to control traffic in the network, acting as an intermediary between the internal network and the Internet. When network packets pass through the firewall, it is possible to collect and analyze data and headers within the traffic.

In my research work an OD and classification mechanism is used to reduce the false positives, which are a major drawback of all AD systems.
Figure 5.1 Architecture of the model using CP and OD
As shown in Figure 5.1, the real-time network statistics are studied for a week and threshold levels are determined for each participating entity in the LAN. OD and CP detection mechanisms are applied to the selected attributes, such as unique IP count, source bytes, etc.
(Figure 5.1 blocks, arranged in two stages: Real Time Data, Training Data and Feature Extraction; Outlier Detection Algorithm, Knowledge Base and Alarm Unit.)
After setting the threshold at different time windows, the OD mechanism is used to calculate and analyze the mean, standard deviation, coefficient of variation and CUSUM charts of the unique IP count for each IP in the LAN. Initially the normal network profile is considered and the system is trained using these values as inliers. The attack tools were executed repeatedly against the designated target computers, and the unique IP counts of the detected attacks are stored in a text file for further processing.

The next stage of the implementation involves training the system. Data thus collected is preprocessed and used to detect CPs in network performance characteristics. Feature Extraction handles the conversion of raw packet or connection data into a format that DM algorithms can utilize, storing the results in the knowledge base. Rather than operating on a raw network dump file, the algorithm uses summary information to perform the analysis: data is preprocessed to generate summary lines about each connection found in the dump file, and the resulting summary file is then parsed and processed by the algorithm to give a score to each data point at each time instant, with a higher score indicating a higher possibility of being an outlier and/or a change point. The Knowledge Base stores the data as rules produced by the detection algorithm for the further mining process; it may also hold information for the preprocessor, such as patterns for recognizing attacks and conversion templates. The Training Data component is responsible for generating the initial rule sets used for deviation analysis, and it can be triggered automatically based on time or on the amount of preprocessed data available.
The flow chart for OD is shown in Figure 5.2. The sequence of execution steps is as follows.
1. The unique packet count for each IP is counted and stored.
2. Using the OD module, the mean and standard deviation are calculated. CV and CUSUM charts, as explained in Sections 2.7.1 and 2.7.4 respectively, are drawn to detect outliers from the observed statistics in real-time traffic.
3. The attacks detected in the first level of intrusion detection are compared against the above statistics and stored as records.
4. If the result is normal, classify it as normal; otherwise classify it as an attack.
5. Repeat from Step 2, logging all data sets for the specified time window size.

Figure 5.2 Flow chart for change point and outlier detection
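Steps 1 and 2 above can be sketched as a per-IP summary of mean, standard deviation and coefficient of variation (CV); the IP addresses and counts are illustrative:

```python
import statistics

# Per-IP unique packet counts summarized by mean, standard deviation and
# coefficient of variation (CV). Addresses and counts are invented.

counts_per_ip = {
    "10.0.0.5": [120, 118, 121, 119, 122],
    "10.0.0.9": [40, 42, 300, 41, 39],    # one suspicious spike
}

def summarize(values):
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return {"mean": mu, "stdev": sigma, "cv": sigma / mu}

profiles = {ip: summarize(v) for ip, v in counts_per_ip.items()}
# A markedly higher CV singles out the IP whose traffic fluctuates
# abnormally, making it a candidate for Steps 3-4 (attack vs. normal).
```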
5.6.1 Dataset Description
The process of detecting outliers using the CPOD algorithm is depicted in Figure 5.1. The first stage of the implementation involves training the system. For the present problem, data is collected from our campus network to measure the accuracy and attacks of different types. Data thus collected is preprocessed and used to detect change points in network performance characteristics. Traffic in the network changes continuously as users log in and make use of the Internet. For capturing packets in real time, the JPCAP and WINPCAP tools are used to collect the information being transmitted. JPCAP provides facilities to capture and save raw packets live. It can automatically identify packet types and can generate corresponding Java objects for Ethernet, IPv4, IPv6, ARP/RARP, TCP, UDP, and ICMPv4 packets, and packets can also be filtered according to user requirements. JPCAP is built on LIBPCAP/WINPCAP, the industry-standard tool for link-layer network access, which is implemented in C and Java. In the Windows environment WINPCAP allows applications to capture and transmit network packets bypassing the protocol stack.
The network data is collected from the interface which is capable of
capturing information flowing within the local network. For example, anomalies can
be detected on a single machine, a group of network switches or a router. For the
current research work, TCP/IP packets are collected in real time from the research lab network and dumped for further processing.
5.6.2 Proposed Change Point Outlier Detection (CPOD) Algorithm
To illustrate the problem, network data collected in the college local area network is observed against threshold values at regular time-window intervals over a certain period. The analysis shows that the threshold variations are statistically regular most of the time. Once in a while there may be a point that deviates from the normal pattern, which can be marked as an outlier point. Detection of such outliers is very important, as they may be due to an anomaly within the network or to the external environment.
In sequential analysis, CP detection methods are categorized as offline or online change detection algorithms. In an offline change detection algorithm the process of data acquisition is completed before the algorithm is applied. In an online change detection algorithm, the challenge is to detect change as early as possible, which is critical for network operations. Suppose that a random process R is sampled at a fixed time interval t, resulting in a sequential observation R_t. After each sampling period, a decision is computed as to whether or not there is a transformation in the process's statistical properties that resulted in a CP.

The requirements of the CPOD algorithm are as follows. The statistics should be computed online, meaning an outlier has to be detected as it appears. A CP has to be detected within some constant number of observations after the change happens. No specific assumptions regarding distributions are made, so the detection can adapt to a non-stationary time series and is robust to a wide variety of distributions.
The major theme of my research is to demonstrate a unified solution for detecting outliers and change points while satisfying the above requirements. The algorithm satisfies the online requirement because it computes an outlier classification gain for a data point immediately at the time it occurs, without waiting. CPs are detected after a constant number of successive outliers. As a sliding-window technique is used, the algorithm satisfies the adaptive requirement: the technique overwrites old data and adapts the thresholds after CPs are detected.
The algorithm considers a real-time data sequence {x_i : i = 1, 2, ..., N}, where i denotes the time-window variable. x_i denotes the data point in the time series that is currently being considered for analysis, and t and s are the thresholds for CP and OD respectively. The threshold value, which is user defined, is the mean magnitude of fluctuation allowed in the data points within which they will not be classified. The median and mean are computed over windows of size w and v respectively. When the median is taken over a small window it is less sensitive to outliers and the deviations are localized. With w < v, it is found that selecting (w/v) < 0.5 is optimal. A vector is maintained that records the classification state of each selected data point. The states are {0, 1, 2, 3}, where 0 means neither outlier nor change point, 1 means outlier, 2 means outlier with a high probability that the previous point was the change point, and 3 means a CP confirmed from the previous two observations.

If the thresholds are well tuned, CPOD can detect most outliers and change points in a time series.
CPOD Algorithm (Input: p, s, t, w, v)

Step 1: The iteration over all data points is done after initializing i = p+1.
Step 2: The median over window w and the mean over window v are computed as in
        Equation (5.1) and Equation (5.2).

        Median of values in window w:

            u_i = median(x_{i-w+1}, ..., x_i)                        (5.1)

        Mean of values in window v:

            m_i = (1/v) * (x_{i-v+1} + ... + x_i)                    (5.2)

Step 3: Gain_1 is the absolute difference between the current data point and the
        median, amplified by the threshold and divided by the mean:

            Gain_1i = (|x_i - u_i| * t) / m_i                        (5.3)

        Gain_2 is a normalized ratio between two distance magnitudes:

            Gain_2i = (|u_i - m_i| / |x_i - u_i|) * 100              (5.4)
The median is calculated over the short term window w and the mean over the
longer term window v. This ratio makes Gain_2 much more robust to fluctuations
that may happen in the long term. Using a brute force method it is found that
selecting (w/v) < 0.5 is the best choice, as this depends on the performance of
outlier detection. If Gain_1 > Gain_2, the data point is classified as an outlier;
Gain_2 is used as a data dependent threshold that classifies Gain_1 as outlier or
not. Gain_1 is sensitive to the mean and to the variation of the current data point
from the median. Gain_2 is sensitive to the variation of the mean from the median
and to the deviation of the current data point from the median. In the presence of
outliers in window v, Gain_1 will be greater than Gain_2.
Step 4: A stronger possibility of the current data point being an outlier or CP
        is indicated if Gain_1 is higher than Gain_2. To classify the current
        data point as a CP, an additional check is made to find whether the
        point lies beyond a certain band around the median, represented by the
        limits LL_i and UL_i. The classification state is then saved in
        vector GV.

            GV_i = 1 : Gain_1i > Gain_2i and CPx_i lies outside (LL_i, UL_i)
            GV_i = 0 : otherwise, i.e. LL_i <= CPx_i <= UL_i         (5.5)
Step 5: Gain information in vector GV is used to classify outliers and CPs. If
        the current point has a higher Gain_1, as indicated in GV_i, the past
        three states are considered for classification. GVs_i is the sum of the
        states GV_i, GV_{i-1} and GV_{i-2}, which stores the state of the past
        two data points with respect to the selected current data point. If the
        value of GVs_i is 3, it indicates that outliers were detected in the
        past and the current point could be the change point. The previous
        states are tested to make sure that no change point was detected in the
        past two data points; if one was, the CP is inferred as an outlier.
        Similarly, if the value of GVs_i is 1, it is possible that the current
        point is an outlier, as shown in Equation (5.6) and Equation (5.7).

            GV_i = 1 : GVs_i = GV_i + GV_{i-1} + GV_{i-2}
            GV_i = 0 : GVs_i = 0                                     (5.6)

            GVs_i = 3 and (GVs_{i-1} = 3 or GVs_{i-2} = 3) : GVs_i = 1
            GVs_i = 1 and (GVs_{i-1} = 1 or GVs_{i-2} = 1) :
                GVs_i = GVs_i + GVs_{i-1}                            (5.7)
Finally, CPx_i is classified depending on the value of GVs_i. If the value for
the current point is 0, it is inferred that there is no significant deviation. If it
is 1 or 2 and the current data point deviates by more than the threshold t%, it is
inferred as an outlier, which signifies a higher possibility of CPx_{i-1} being a
CP. If the state of the current point is 3, it is inferred with accuracy that the
point two positions before the current one is the CP.

            GVs_i = 0 : no change; adjust LL, UL to u_i
            GVs_i = 1 or 2, and CPx_i deviates by more than t% : outlier  (5.8)
            GVs_i > 2 : change point; adjust LL, UL
Step 6: The classification of outliers and change points, as signified in the
        GVs vector, is reported. The gains and the state elements of vectors GV
        and GVs for the past data points N, N-1 and N-2 are persisted.
        Persisting the median and mean over the window sizes while classifying
        the current data point enables online implementation.
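As a concrete illustration, the scoring side of Steps 1 to 5 can be sketched in Python. This is a minimal sketch under assumed, illustrative parameters (w=3, v=8, t=2.0) with simplified state labels and the band check of Step 4 omitted; it is not the tuned implementation used in the experiments.

```python
from statistics import mean, median

def cpod_states(series, w=3, v=8, t=2.0):
    """Label each point 'normal', 'outlier', or 'change-point' when the
    summed state of the last three points reaches 3 (Steps 3-5)."""
    gv, states = [], []
    for i, x in enumerate(series):
        if i < v:                                  # not enough history yet
            gv.append(0)
            states.append("normal")
            continue
        u = median(series[i - w:i])                # short-window median, Eq. (5.1)
        m = mean(series[i - v:i])                  # long-window mean, Eq. (5.2)
        dev = abs(x - u)
        gain1 = dev * t / m if m else 0.0          # Eq. (5.3)
        gain2 = abs(u - m) / dev * 100 if dev else float("inf")   # Eq. (5.4)
        gv.append(1 if gain1 > gain2 else 0)       # Eq. (5.5), band check omitted
        gv_sum = gv[i] + gv[i - 1] + gv[i - 2]     # Eq. (5.6)
        if gv_sum >= 3:
            states.append("change-point")
        elif gv[i] == 1:
            states.append("outlier")
        else:
            states.append("normal")
    return states
```

On a stream of roughly 1000 packets per window containing a single burst of 5000, only the burst is flagged as an outlier and a steady stream stays entirely normal.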

5.7 STRUCTURE CHART
The overall flow in the research work between the different modules is shown in
Figure 5.3.

Figure 5.3 Structure chart of the research work
The proposed CPOD Algorithm examines the network data and creates a
description of differences and stores in the knowledge base for further reference. If a
deviation is detected it signals the alarm unit. A strategy for invoking the deviation
analyzer is by querying periodically the knowledge base for the new profiles. Also
the profiler may signal when a new profile is added to the knowledge base. The
Alarm Unit is responsible for informing the administrator when the deviation
analyzer reports unusual behavior in the network stream. This can be in the form of
SMS, e-mails, console alerts, log entries etc.
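The alarm flow described above can be sketched as follows. The AlarmUnit class, its method names and the logger name are hypothetical, introduced only to illustrate how the deviation analyzer hands reports to the administrator-facing channels.

```python
import logging

class AlarmUnit:
    """Forwards deviation reports to the administrator (console alert and
    log entry here; SMS and e-mail channels would be hooked in the same place)."""
    def __init__(self):
        self.log = logging.getLogger("ids.alarm")  # assumed logger name
        self.sent = []                             # record of dispatched alerts

    def report(self, host, description):
        message = f"Unusual behaviour on {host}: {description}"
        self.log.warning(message)                  # console alert / log entry
        self.sent.append(message)
        return message
```

The deviation analyzer would call `report()` whenever a profile in the knowledge base deviates from the observed stream.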
5.8 IMPORTANCE OF SELECTING OPTIMAL SUBSET OF
FEATURES
Selecting appropriate features is one of the important steps that must be
performed before any kind of data manipulation. Several research efforts have been
successful in areas such as statistical pattern recognition, statistics, ML and DM.
Most of these research efforts were also successfully applied
in AI fields such as image retrieval, text categorization and IDS. For selecting an
optimal subset of features from the given data set, the FS process generally performs
generation and evaluation of subset from the original data set with a stopping
criterion and finally the result is validated. The subset generation and evaluation
process is repeated until the stopping criterion is met. Recent research work has
shown that reaching an optimal set of features is an NP-hard task. The FS process
normally does not produce or combine new features. Initially it starts from a given
set of fixed features and gradually searches for the optimal subset. During the subset
generation step new sets of features are selected based on an internal searching or
heuristic method. The factors that affect this step are the search direction and
starting point. The backward method starts from the full set of features and
gradually removes features as needed, while the forward method starts with an
empty set of features and incrementally adds features as needed. The bidirectional
approach combines both methods, and a random approach tries to avoid local
minima. The search strategy is used to create the next set of candidate features. The
common searching strategies are heuristics, complete search or sequential search and
random search.
Once a feature subset is generated, the subset evaluation phase will
evaluate the features against a certain criterion. This stage is a complex task which
can be achieved with the help of expert knowledge information or using ML
algorithms or both. The stopping criterion is used to end the subset generation
process. This may happen when the process completes its search, or a specified
threshold is reached such as maximum number of iterations. In the final stage the
result validation will provide an empirical proof of the selected feature set with the
use of expert knowledge or by conducting a performance experiment. In practice, a
feasible way to evaluate the results is to conduct experiments before and after the FS
process and compare the overall performance of the algorithm.
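The generation-evaluation-stopping loop described above can be sketched as a forward search. The `evaluate` callback, the iteration cap, and the stand-in scoring function in the usage note are assumptions for illustration, not the evaluation criterion used in the thesis.

```python
def forward_selection(features, evaluate, max_iters=10):
    """Grow a subset from the empty set, adding the feature that most
    improves evaluate(); stop on max_iters or when no candidate improves."""
    selected, best_score = [], float("-inf")
    for _ in range(max_iters):                     # stopping criterion
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # subset generation: try adding each remaining feature
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        score, feature = max(scored)
        if score <= best_score:                    # no improvement: stop
            break
        best_score, selected = score, selected + [feature]
    return selected
```

With a toy scorer that rewards two informative features and mildly penalizes subset size, the loop selects exactly those two features and then stops.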
Chapter 6 presents the proposed model for network level feature
evaluation. The model provides a solution for the subset evaluation step of the
feature selection process. The model uses statistical methods combined with SVM
and Relevance Vector Machines (RVM). The method is primarily designed for
network traffic based features.
5.9 EXPERIMENTAL RESULTS
Statistics of IP addresses and their packet counts observed in our research
laboratory for time periods of 30 minutes, specified as windows w_1, w_2 and w_3,
are given in Table 5.1.

Figure 5.4 Initial packet capture screen

Table 5.1 Real time data collected

IP Address       Time Window w_1   Time Window w_2   Time Window w_3
172.16.30.28          1010              1011              2000
172.16.30.91            86                86                90
172.16.30.75           415               492               512
172.16.30.108          140               140               140
172.16.30.70            24                24                24
172.16.30.92            58                58                58
172.16.30.68           175               175               179
172.16.30.69            14                14                14
172.16.30.96            14                14                14
172.16.30.35             7                 7                 7
172.16.30.95           100               100               100
172.16.30.114            7                 7                 7
172.16.30.18             3                 3                 3
172.16.30.88             2                 2                 2
172.16.30.49            35                36                36
172.16.30.101           11                12                12
For experimental analysis, the packet count for all the IP addresses was captured
on all working days for one week, in time windows of 30 minutes from 9:00 AM to
5:00 PM. The average packet rate was calculated for each time window; the
statistics are shown in Table 5.2.
Table 5.2 Average packet rate for one week
Week 1 Average Packet rate
Time Day 1 Day 2 Day 3
9:00 1010 1000 900
9:30 1000 900 1000
10:00 3000 1000 1123
10:30 900 923 950
11:00 1020 900 856
11:30 1011 1000 1200
12:00 1000 1000 1000
12:30 5000 5950 5020
1:00 900 950 912
1:30 1001 1011 1010
2:00 1000 1000 1000
2:30 1120 1210 1000
3:00 980 900 1000
3:30 850 900 850
4:00 1020 1220 1211
4:30 1120 1020 1112
5:00 20 20 20


Figure 5.5 Outlier detected for Table 5.2 data
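A quick sanity check on the Day 1 column of Table 5.2 shows how simple robust statistics expose the 10:00 and 12:30 bursts and the 5:00 drop-off. The 2.5x-median band used here is an illustrative threshold, not the tuned CPOD thresholds of the experiments.

```python
from statistics import median

# Day 1 column of Table 5.2 (average packet rate per 30-minute window,
# 9:00 AM through 5:00 PM)
day1 = [1010, 1000, 3000, 900, 1020, 1011, 1000, 5000,
        900, 1001, 1000, 1120, 980, 850, 1020, 1120, 20]

med = median(day1)                                 # robust centre of the day
outliers = [i for i, rate in enumerate(day1)
            if rate > 2.5 * med or rate < med / 2.5]
# flags indices 2 (10:00), 7 (12:30) and 16 (5:00)
```

The flagged windows match the visually obvious anomalies in Figure 5.5.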
Figure 5.5 shows the outliers detected for the network data over the period
9:00 AM to 5:00 PM for a particular host.
For the real time statistics, the data was collected for one week and the graphs
are shown in Figures 5.6 and 5.7.

Figure 5.6 Plot of real time data and detection of change point and outliers

Figure 5.7 Plot of real time data and detection of change point and outliers


Figure 5.8 Plot of real time data collected for week 1

Figure 5.9 Plot of real time data collected for week 2
Figure 5.10 shows CUSUM chart, which helps in analyzing the change
point and variation in the behavior for a particular host IP.

Figure 5.10 CUSUM analysis
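The chart in Figure 5.10 is based on the cumulative sum statistic, which can be sketched as follows; the one-sided form and the target/slack values in the usage note are illustrative assumptions, not the exact configuration behind the figure.

```python
def cusum(series, target, slack):
    """One-sided CUSUM: S_0 = 0, S_i = max(0, S_{i-1} + x_i - target - slack).
    A sustained shift above target accumulates; isolated noise decays to 0."""
    s, chart = 0.0, []
    for x in series:
        s = max(0.0, s + (x - target - slack))
        chart.append(s)
    return chart
```

A packet rate that steps from 1000 to 1400 per window makes the statistic grow by 300 per window, so it crosses any fixed decision threshold shortly after the change, which is how the chart localizes the change point.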
Figure 5.11 shows the standard deviation analysis, which is used for drawing the
normal network profile from all the observed statistics and results.

Figure 5.11 Standard deviation analysis

Figure 5.12 Outlier detected for the data given in Table 5.2

Figure 5.13 Snapshot showing threshold of IP addresses captured
5.10 SUMMARY
Lot of work on techniques for detecting changes in data is still in
progress. CPOD algorithm is based upon statistical method which is fast and space
efficient. The algorithm in practice with synthetic and real data sets revealed the
effectiveness in detecting changes.
In my research, a solution for classifying outliers and CPs from real time data
is implemented, addressed in two parts: scoring and classification. Scores
reflecting outliers are computed incrementally so as to keep the state of outliers
in the data series. The algorithm is characterized by its ability to address
outliers and change points at the same time, which enables it to deal with frequent
and fast changes in the source. The current implementation and usage indicate the
success of the algorithm, giving a unifying view of outlier detection and change
point detection in real time network data. In this work, a system with OD was
designed and successfully implemented.
My research work addresses the management of IDS, where significant detection of
attacks presents major difficulties. Using RVM, a probabilistic kernel-based
learning machine model is implemented, which is explained in Chapter 6.



CHAPTER 6
APPLICATION OF RELEVANCE VECTOR MACHINES
IN DEVELOPING IDS

6.1 INTRODUCTION
A reasonable level of security is provided by static defense mechanisms such as
firewalls and software updates. Dynamic mechanisms, such as IDS and Network
Analyzers (NA), can also be used to achieve security. The main difference between
an IDS and an NA is that the IDS aims at the specific goal of detecting attacks,
whereas the NA aims to determine changing trends in networks of computers. Earlier
work emphasized that data can be obtained in three ways: using real traffic,
sanitized traffic or simulated traffic. In real time, however, a fast response to
external events with reduced false positives is demanded and expected within an
extremely short time. Therefore the design of alternative algorithms that
implement real time learning is imperative for critical applications in fast
changing environments. Even for offline applications speed is still a need, and a
real time learning algorithm that reduces training time and human effort to nearly
zero would always be of considerable value. Mining data in real time is still a
big challenge.
IDS involve automatic identification of unusual activity by collecting
data, and comparing it with reference data. An assumption of IDS is that a networks
normal behavior is distinct from abnormal or intrusive behavior, which can be a
result of various attack/s.
In my research work, flow analysis is used for network traffic analysis
which searches for behavioral characteristics in a flow. There are various
characteristics such as transferred bytes, packets, flow length, inter-arrival times,
inter-packet gaps, etc., that are monitored and computed. Data collected from
flows can be used on high-speed networks, as there is no deep packet inspection.
In recent years, there has been a growing interest in the development
of Change Detection (CD) techniques for the analysis of intrusion detection. This
interest stems from the wide range of applications in which CD methods can be
used. Detecting the changes by observing data collected at different times is one of
the most important applications of network security. Research in exploring CD
techniques for medium/high network data can be found for the new generation of
very high resolution data. The advent of these technologies has greatly increased the
ability to monitor and resolve the details of changes and makes it possible to
analyze. At the same time, they present a new challenge over other technologies in
that a relatively large amount of data must be analyzed and corrected for registration
and classification errors to identify frequently changing trend. In my research work
an approach for IDS which embeds a Change Detection Algorithm with Relevance
Vector Machine (RVM) is implemented. IDS are considered as a complex task that
handles a huge amount of network related data with different parameters. Current
research work has proved that kernel learning based methods are very effective in
addressing these problems. In contrast to SVM, the RVM provides a probabilistic
output while preserving the accuracy. The focus of my work is to model RVM
that can work with large network data set in a real environment and develop RVM
classifier for IDS. The new model consists of Change Point Outlier Detection
(CPOD) algorithm (explained in Chapter 5) and RVM. The model is competitive in
processing time and improves the classification performance compared to other
known classification models like SVM. The goal is to make the system simple but
efficient in detecting network intrusion in an actual real time environment. Results
show that the model learns more effectively, automatically adjusting to the
changes as well as the threshold while minimizing the false alarm rate with
timely detection.
In my research work, a hybrid approach is proposed for improving the performance
of the detection algorithm by building more intelligence into the system. In this
direction, CP detection is considered for discovering change points when properties of
network behavior change. A CP is a change in characteristics that occurs very fast
with respect to the sampling period of the measurements, if not instantaneously. The
detection of changes refers to tools that help to decide whether such a change has
occurred in the characteristics or not. OD is another major step in DM, which
discovers abnormal or deviating data points with respect to the distribution of
the data. Outliers are often considered as error or noise, although they may carry
very important information.
A real time detection system is one in which network intrusion detection
happens while an attack is occurring. A real time IDS captures the present network
traffic data, which is online data. Bayesian learning algorithms, like RVM, allow the
user to specify a probability distribution over possible parameter values from the
learned classifier. This will provide one solution to the over fitting problem as the
algorithm can use prior distribution to regularize the classifier.
6.2 RELATION BETWEEN DM, ML AND STATISTICS

Figure 6.1 Relation between DM, ML and statistics
Figure 6.1 shows the relation between DM, ML and statistics. DM can
help to improve IDS by employing one or more of the following techniques:
1. Data summary: computing statistics, which includes finding outliers.
2. Visualization: Presenting a graphical summary of the acquired
answers.
3. Clustering of the data into categories.
4. Rule Discovery: Defining normal activity and enabling to discover
anomalies.
5. Classification: Predict the category to which a particular dataset
belongs.
Using DM techniques it is convenient to extract patterns and evaluate them on
issues relating to feasibility, usefulness, effectiveness and scalability. A few
specific things that DM can contribute to the design of IDS are:
1. Remove normal activity from the alarm data set and allow analysis that
focuses on real attacks.
2. Identify false alarms and insignificant signatures
3. Find anomalous activity that uncovers a real attack.
4. Identify important, interesting ongoing patterns in time series data
set.
Some of the benefits of using DM techniques are:
1. Large data sets may contain valuable implicit information that can be
discovered automatically.
2. Such applications are difficult to program using traditional manual
programming.
3. Classification of security issues involves a vast amount of data that
needs to be analyzed and DM is well suited to discover interesting
patterns.
Use of DM approaches in developing IDS
1. IDS are difficult to program using ordinary programming languages.
2. ML is suitable as it is adaptive and dynamic in nature.
3. The environment in which an IDS works is dependent on personal
preferences. Hence the ability of computers to learn with improved
performance is really a challenging task.
Research shows that many ML techniques can be used for data classification, and
popular supervised learning techniques give high detection accuracy for IDS.
A. M. Turing identified ML as a precondition for intelligent systems. ML is
generally used for automatic computing procedures that are based on logical or
binary operations. These procedures learn a task from a series of examples; in my
research work, the usage of ML is concerned with classification. DT
approaches the problem by applying a sequence of logical steps and is capable of
representing the most complex problems given sufficient data. This means an
enormous data set needs to be collected.
Procedures (ILP) are currently active research areas that allow dealing with more
general types of data. These are helpful if the number and type of attributes vary
and additional layers of learning are introduced. Some of the main characteristics
of ML and DM are:
1. ML focuses on the prediction, based on known properties learnt from
the training data
2. DM focuses on the discovery of (previously) unknown properties on
the data
3. DM uses many ML methods with a slightly different goal in mind.
4. ML employs DM methods as unsupervised learning or as a
preprocessing step to improve learner accuracy.
5. In ML, the performance is usually evaluated with respect to the
ability to reproduce known knowledge
The main aim of ML algorithms is to generate classifying expressions
simple enough to be understood by the users. Like statistical approaches,
background knowledge may be used in development but operation is assumed to be
automatic without user intervention.
A wide range of real world applications are discussed in the community
of Statistical Analysis and DM. Statistical techniques usually assume an underlying
distribution of data and require the elimination of data instances containing noise.
Statistical methods though computationally intense can be applied to analyze the
data. They are widely used to build behavior based IDS. The behavior of the system
is measured by a number of variables sampled over time, such as the resource usage
duration, the number of processors, and the memory and disk resources consumed
during that session. The model keeps averages of all the variables and detects whether
thresholds are exceeded based on the standard deviation of the variable. Very few
online (real time) network IDS approaches have been proposed so far. Subaie [132] uses
Hidden Markov Models (HMM) over NN in anomaly intrusion detection to classify
normal network activity and attack using a large training dataset. The approach was
evaluated by analyzing how it affected the classification results. Authors in [133]
propose a hybrid intelligent systems using DT, SVM and Fuzzy SVM for anomaly
detection (unknown or new attacks). The results show that the hybrid SVM approach
improves the performance for all the classes when compared to a SVM approach.
ML is concerned with building systems whose performance improves with experience.
Classification is one of the standard tasks in ML. A classifier is a function F(x)
that assigns a class label C from a finite set of labels. Given a handwritten
character, for example, the classifier might identify the letter; such classifiers
can be constructed automatically. One method to build classifiers automatically is
to use supervised ML techniques that are executed in two stages: a learning phase
and a testing phase.
A classifier C is a function F(i) : I -> C, where I is an input vector space and
C = {C_1, C_2, ..., C_n} is a class space. A binary classifier F_b is a classifier
mapping an instance space I to a binary class space: F_b(i) : I -> {P, N}.
SVM, proposed by Cortes and Vapnik [134], is a supervised learning algorithm that
is used increasingly in IDS. The classification performance of the SVM model is
better than that of other classification methods, such as ANN. The benefit of SVMs
is that they learn very effectively from high dimensional data. Rui [135] uses
Incremental RVM to detect intrusions. The features selected by this method prove
to be effective and also decrease the space density of data. This improves the
generalization performance of RVM, and the results are better than those of RVM
and SVM. This guarantees the reliability of using an RVM based approach for
designing IDS. RVM has a better generalization performance than SVM due to the
generation of fewer support vectors.
RVM is capable of delivering a fully probabilistic output and it is proved
to have nearly identical performance to, if not better than, that of SVM in several
benchmarks. Di He [136] proposes an IDS approach based on the RVM where a
Chebyshev chaotic map is introduced as the inner training noise signal. The result
shows that the approach can reach higher detection probabilities under different
kinds of intrusions while the computational complexity is reduced efficiently.
6.3 SVM TRAINING AND CLASSIFICATION
The SVM is one of the most successful classification algorithms in the
DM area. SVM is a supervised learning algorithm that is used increasingly in IDS.
The classification performance of the SVM model is better than that of other
classification methods, such as ANN. The benefits of SVM are:
1. They learn very effectively with high dimensional data.
2. They map input feature vectors into a higher dimensional feature space
through some nonlinear mapping.
3. They can learn a larger set of patterns and scale better, because the
classification complexity does not depend on the dimensionality of the
feature space.
SVMs also have the ability to update the training patterns dynamically
whenever there is a new pattern during classification. The main disadvantage is
that SVM can only handle binary classification, whereas intrusion detection
requires multi-class classification.
SVM algorithms are binary classifiers that are sufficient to distinguish
between normal and intrusive data. Recent SVM algorithms support multi class
learning. The approach combined several two-class SVMs and for each SVM, the
training data is partitioned into two classes so that one represents an original class
and the other class represents the attacks. It is also necessary to specify an upper
bound parameter C, which can be determined experimentally. This results in a
cross-validation procedure, which is wasteful both in computation and in data.
SVM uses the kernel trick to apply linear classification methods to nonlinear
classification problems. SVM tries to separate two classes of data points in a
multi-dimensional space using a maximum margin hyperplane, the hyperplane that
has the maximum distance to the closest data points from both classes. The problem of
learning SVMs is theoretically well founded and well understood. They are
particularly useful for application domains with a high number of dimensions, such
as text classification, image recognition, bioinformatics or medical applications. The
disadvantage of these methods is that the models are difficult to understand.
Suppose an empirical data set is given:

    (x_1, y_1), ..., (x_n, y_n) in X x Y                             (6.1)

where X is a nonempty set of predictor variables x_i, and the y_i in Y are called
response variables.
No assumptions are made on the domain X, and in order to generalize to unseen
data points an additional structure is required. In the case of binary
classification, given a new input x in X, predict the corresponding y in {+1, -1};
that is, choose y such that (x, y) is similar to the training examples in some
sense. A similarity measure is therefore required in X and in {+1, -1}. For X, a
function

    k : X x X -> R, (x, x') -> k(x, x'), for all x, x' in X          (6.2)

    k(x, x') = <f(x), f(x')>                                          (6.3)

where f(x) maps into some dot product space H, called the feature space. The
similarity measure k is usually called a kernel, and f is called its feature map.
The main advantage of using such a kernel is to construct algorithms in dot
product spaces using the similarity measure.
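Equation (6.3) can be verified numerically for a simple case. The homogeneous degree-2 polynomial kernel below is chosen purely as an example, not as the kernel used in this work: it equals a dot product under an explicit 3-dimensional feature map f.

```python
from math import sqrt, isclose

def k(x, y):
    """Homogeneous degree-2 polynomial kernel evaluated in the input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def f(x):
    """Explicit feature map into the 3-dimensional space H."""
    return (x[0] ** 2, sqrt(2) * x[0] * x[1], x[1] ** 2)

x, y = (1.0, 2.0), (3.0, -1.0)
lhs = k(x, y)                                      # kernel value
rhs = sum(a * b for a, b in zip(f(x), f(y)))       # <f(x), f(y)>
assert isclose(lhs, rhs)                           # Equation (6.3) holds
```

The kernel side needs only a 2-dimensional dot product, while the feature-map side works in 3 dimensions; this gap is what makes the kernel trick cheap for much higher-dimensional (even infinite-dimensional) feature spaces such as the RBF kernel's.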
The detection of intruders from a data set can be realized using many
methods by identifying an unknown pattern from a set of known patterns. SVM are
supervised learning methods that are used for classification. SVM is a kernel method
and the selection of a kernel function used in SVM is very crucial in determining the
performance. The idea of SVM is to identify maximum margin using the kernel
methods. The data can be first implicitly mapped to a high dimensional kernel space.
The max margin classifier is determined in the kernel space whereas the
corresponding decision function in the original space can be non-linear.

Figure 6.2 (a) Original data in the input space; (b) mapped data in the feature space
Data that is non-linearly separable in the input space is made linearly separable
in the kernel-induced feature space, as illustrated in Figure 6.2. Consider a
classification problem where the discriminant function is nonlinear, as in
Figure 6.2 (a). Applying a mapping function f(x), the data under consideration can
become linearly separable in the feature space, as in Figure 6.2 (b). The aim of
SVM classification is to find an optimal hyperplane separating relevant and
irrelevant vectors by maximizing the size of the margin between classes. The user
can initially construct the training set by selecting samples from several
validation data sets.
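The mapping idea of Figure 6.2 can be illustrated with XOR-style data, which no straight line separates in the input space. The one-dimensional feature map below is an assumed toy example, not the kernel used in this thesis.

```python
normal = [(1, 1), (-1, -1)]        # one class, e.g. normal traffic
attack = [(1, -1), (-1, 1)]        # the other class; XOR layout

def f(x):
    """Assumed one-dimensional feature map: the coordinate product."""
    return x[0] * x[1]

# In the input space no hyperplane separates the two classes, but after
# mapping, a simple threshold at zero does.
assert all(f(x) > 0 for x in normal)
assert all(f(x) < 0 for x in attack)
```

After the mapping, "find a separating hyperplane" reduces to picking a single threshold, which is exactly the simplification the kernel-induced feature space buys the SVM.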
In my research work, all known IP addresses are considered as normal, and by
providing a bigger training set the accuracy and invariance increase. The IDS uses
an ML approach, and an effort is made to automate intrusion detection in large
data sets, thereby improving retrieval accuracy.
absence of class labels, unsupervised clustering can be employed. Intrusion
detection depends on similarity measure. However classification can be performed
using several techniques that neither require nor make use of similarity measures.
During the training process specifying the kernel parameter is important and if it is
too small then generalization performance may suffer from overfitting. If sufficient
data set is available, the kernel parameter can be found using cross validation. Using
cross validation single elements from the data set are removed one at a time and the
SVM is trained on the remaining elements and then tested on the removed data set.
Tight bounds on the generalization can be obtained using the approximation that the
set of Support Vectors (SV) does not change by removing single patterns. The SV
are the training points closest to the separating hyperplane; the margin is the
perpendicular distance between the separating hyperplane and a parallel
hyperplane through these closest points. Recent model selection strategies
will give a reasonable estimate for the kernel parameter based on theoretical
arguments without the use of validation data. Using a limited number of datasets,
these model selection strategies appear to work well. In real life datasets, the data
points are not labeled. This is of particular importance in special situations, where
labeling data is expensive or the dataset is large and not labeled. SVM constructs the
hypothesis using a subset of the data containing the most informative patterns. These
patterns are good candidates for active or selective sampling techniques which
would predominantly request the labels for those patterns that will become SV.
SVM searches for SV that are data points found to lie at the edge of an
area in space which is a boundary from one class of points to another. The SV are
used to identify a hyperplane that separates the classes. Modeling of SVM deals with
these support vectors, rather than the whole training dataset, and so the size of the
training set is not an issue. If the data is not linearly separable, then kernels are used
to map the data into higher dimensions so that the classes are linearly separable.
Many researchers have proposed SVM as a novel technique for
developing IDS. SVM map input feature vectors into a higher dimensional feature
space through some nonlinear mapping and are developed on the principle of
Structural Risk Minimization (SRM). SRM seeks to find a hypothesis h for which
one can find lowest probability of error. The traditional learning techniques for
intrusion detection are based on the minimization of the empirical risk, which
attempt to optimize the performance of the learning set. Computing the hyper plane
to separate the data points to train SVM leads to a quadratic optimization problem.
SVM uses a linear separating hyper plane to create a classifier but all the problems
cannot be separated linearly in the original input space. SVM uses kernels to
solve this problem. Kernel functions transform linear algorithms into nonlinear
ones via a map into feature spaces; common kernels include polynomial, Radial
Basis Function (RBF) and two layer sigmoid neural net kernels. The user may
provide one of these functions at the time of training the classifier, which
selects the SV. SVM classifies data using these SV, which are members of the set
of training inputs that outline a hyperplane in feature space.
6.4 THE KERNEL MAPPING
The key issue in SVMs is to use f(x) to map the data into a higher
dimensional space. Cover's theorem guarantees that any data set becomes arbitrarily
separable as the data dimension grows. Finding such nonlinear transformations is far
from trivial. To achieve this task, a class of functions called kernels is used. Roughly
speaking, a kernel K(x,y) is a real-valued function K : X × X → R for which there
exists a function f : X → Z, where Z is a real vector space, with the property
K(x,y) = f(x)^T f(y). This function f(x) is precisely the mapping in Equation (6.2).
The kernel K(x,y) acts as a dot product in the space Z. In the SVM literature X and
Z are called, respectively, input space and feature space. Kernel methods give a
systematic and principled approach to ML and the good generalization performance
achieved can be readily justified using statistical learning theory or Bayesian
arguments.
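This dot-product property can be checked numerically. The following Python sketch (illustrative only, not from the thesis, which implements everything in R) uses a hypothetical degree-2 polynomial kernel in two dimensions and its explicit feature map to confirm that K(x,y) = f(x)^T f(y):

```python
import math

def poly2_kernel(x, y):
    # Degree-2 polynomial kernel K(x, y) = (x . y)^2, evaluated
    # entirely in the input space X.
    return sum(a * b for a, b in zip(x, y)) ** 2

def poly2_features(x):
    # Explicit feature map f : X -> Z for the same kernel in 2-D:
    # f(x) = (x1^2, x2^2, sqrt(2)*x1*x2), so K(x, y) = f(x) . f(y).
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, 0.5)
print(poly2_kernel(x, y))                         # kernel value in input space
print(dot(poly2_features(x), poly2_features(y)))  # dot product in feature space Z
```

Both prints give 16.0: the kernel evaluates the feature-space dot product without ever constructing Z explicitly.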
The emphasis will be on using RBF kernels which generate RBF
networks. This approach is general since other types of learning machines can be
readily generated with different choices of kernel. RBF networks have been widely
used because they exhibit good generalization and universal approximation. This is
achieved by using RBF nodes in the hidden layer. A new approach to designing
RBF networks based on kernel methods is applied and this technique has a number
of advantages. The emphasis is on classification and novel intrusion detection. The
kernel representation of data amounts to a nonlinear projection of data into a high
dimensional space where it is easier to separate the two classes of data. With a
suitable choice of kernel, the data can become separable in feature space despite
being non-separable in the original input space.
In real time problems the task is often not to classify but to detect novel or
abnormal instances. In the current research work, which involves classification of
intrusions, the system does not correctly detect an intrusion whose abnormal
behavior is distinct from all normal behavior in the training set. Novelty
detection would potentially highlight such data as abnormal by modeling the support of
the data distribution. The main objective is to create a binary valued function which is
positive in those regions of input space where the data predominantly lies and
negative elsewhere. The approach is to find a hypersphere with a minimal radius R
and center a which contains most of the data; novel test points are those that lie
outside the boundary of this hypersphere.
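A minimal Python sketch of this idea follows (illustrative only, and a deliberate simplification of the minimal-radius formulation above: the center a is taken as the data mean and the radius R as the largest training distance, rather than solving the underlying optimization; all data values are invented):

```python
import math

def fit_hypersphere(points):
    # Simplified fit: center a = mean of the training data;
    # radius R = largest training distance from a.
    n, d = len(points), len(points[0])
    a = [sum(p[i] for p in points) / n for i in range(d)]
    R = max(math.sqrt(sum((p[i] - a[i]) ** 2 for i in range(d))) for p in points)
    return a, R

def is_novel(point, a, R):
    # A test point lying outside the hypersphere is flagged as
    # novel/abnormal.
    return math.sqrt(sum((point[i] - a[i]) ** 2 for i in range(len(a)))) > R

normal = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2)]
a, R = fit_hypersphere(normal)
print(is_novel((5.0, 5.0), a, R))   # far from the normal region -> True
print(is_novel((0.0, 0.0), a, R))   # inside the sphere -> False
```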

Some of the unique features of SVM and Kernel Methods are
1. They are explicitly based on a theoretical model of learning
2. They come with theoretical guarantees about their performance
3. They have a modular design that allows one to separately implement
and design their components
4. They are not affected by local minima
5. They do not suffer from the curse of dimensionality
The major advantage of kernel based systems is that once a valid
kernel function is selected, one can practically work in spaces of any dimension
without increasing the computational cost. The user need not even know which
features are being used. A further advantage is that it is possible to design a
kernel for a particular problem and apply it directly to the data without a
separate feature extraction process. When a large data set with many features is
available, the feature extraction process is very important. The strength of kernel
based learning methods is that any valid kernel can be used with a kernel based
algorithm. The R interfaces provided in e1071 [137] and klaR [138] include an
interface to SVM along with other classification tools such as Regularized
Discriminant Analysis (RDA). libsvm is an integrated software package for
support vector classification, and R offers an interface to libsvm, a very
efficient SVM implementation. libsvm provides a robust and fast SVM
implementation and produces results for most classification and regression
problems. Most of the libsvm and klaR SVM code is in C++ and hence can be
enhanced by modifying the code and adding new kernels.
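The claim that dimension does not affect cost can be made concrete: kernel algorithms only ever consume pairwise kernel evaluations, i.e. the Gram matrix. The sketch below (Python for illustration, although the thesis works in R; the points and the RBF width sigma are arbitrary choices) shows this:

```python
import math

def rbf(x, y, sigma=1.0):
    # Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def gram(points, kernel):
    # Kernel methods only ever need this Gram matrix
    # K[i][j] = k(x_i, x_j); the dimension of the implicit feature
    # space never appears in the computation.
    return [[kernel(p, q) for q in points] for p in points]

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram(pts, rbf)
print(K[0][0], K[0][1])
```

The diagonal entries are 1.0 (each point at zero distance from itself) and the matrix is symmetric, as required of a valid kernel.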
6.5 RVM TRAINING AND CLASSIFICATION
RVM is a sparse ML algorithm that is similar to the SVM in many
respects and is of much interest in the research community, as it provides a
number of advantages. RVM is based on a Bayesian formulation of a linear model
with an appropriate prior that results in a sparse data representation. As a result, RVMs
can generalize well and provide inferences at very low computational cost. SVM
has several desirable properties: it fits functions in high dimensional feature
spaces through the use of kernels, and with the possibly large space of functions
available in feature space, good generalization performance can be achieved. It is
sparse, meaning only a subset of training examples is retained at runtime, thereby
improving computational efficiency. Although relatively sparse, SVM makes
unnecessary use of basis functions, as the number of SVs required typically grows
linearly with the size of the training data set. SVM outputs a point estimate in
regression and a binary decision in classification; as a result it is difficult to
estimate the conditional distribution and capture the uncertainty of a prediction.
In SVM the kernel function must be a continuous symmetric kernel of a positive
integral operator in order to satisfy the Mercer condition, a restriction the RVM
does not impose. While maintaining classification accuracy, RVM can yield a
decision function that is much sparser than that of SVM. This leads to a
significant reduction in the computational complexity of the decision function,
making it more suitable for real time applications.
The RVM produces a function composed of a set of kernel
functions, also known as basis functions, and a set of weights. This function
represents a model of the system presented to the learning process through a set of
training data. The kernels and weights are calculated by the learning process, and the
model function defined by the weighted sum of kernels is then fixed. From the set of
training vectors the RVM selects a sparse subset of input vectors which are deemed
relevant by the probabilistic learning scheme. This subset is used to build a
function that estimates the output of the system from the inputs; these relevant
vectors form the basis functions and comprise the model function.
RVM is a probabilistic sparse kernel model identical in functional form
to the SVM, making predictions based on a function of the form

	y(x) = Σ_{n=1}^{N} ω_n K(x, x_n) + ω_0		(6.4)

where ω_n are the model weights and K(·,·) is a kernel function. It adopts a Bayesian
approach to learning, by introducing a prior over the weights ω
	p(ω, β) = Π_{m=1}^{M} N(ω_m | 0, α_m^{-1}) · Gamma(β | a_β, b_β)		(6.5)
governed by a set of hyper-parameters α, one associated with each weight, whose
most probable values are iteratively estimated from the data. Sparsity is achieved
because in practice the posterior distribution of many of the weights is sharply
peaked around zero. Furthermore, unlike in the SVM classifier, the non-zero weights in
the RVM are not associated with examples close to the decision boundary, but rather
appear to represent prototypical examples. These examples are termed Relevance
Vectors (RV). The training function returns an object containing the model parameters
along with indexes of the RVs, the kernel function and the hyper-parameters used.
Because of these advantages, RVM has been applied in many areas,
such as object detection and classification, target detection in images, and
classification of microcalcifications from mammograms.
In the classification phase each of the network records selected in the
feature selection phase is classified as normal data or attack data. This phase
uses two main datasets, one for training and one for testing. In the first phase,
training is performed using RVM on a set of network records with known class
labels. Based on the training, the IDS model can classify the data in each record
as normal network activity or as one of the main attack types. The model is then
tested with new, untrained data, where each record was captured in a real time
environment in the college research lab.
For an input vector x, an RVM classifier models the probability
distribution of its class label C ∈ {−1, +1} using logistic regression as

	p(C = 1 | x) = 1 / (1 + exp(−f_RVM(x)))		(6.6)

where the classifier function f_RVM(x) is given by

	f_RVM(x) = Σ_{i=1}^{N} ω_i K(x, x_i)		(6.7)

where K(·,·) is a kernel function and x_i, i = 1, 2, ..., N, are the training samples. The
parameters ω_i, i = 1, 2, ..., N, in f_RVM(x) are determined using Bayesian estimation,
introducing a sparse prior on ω_i. The parameters ω_i are assumed to be statistically
independent, obeying a zero-mean Gaussian distribution with variance α_i^{-1}, which
forces them to be highly concentrated around zero and leads to very few nonzero terms
in f_RVM(x).
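Equations (6.6) and (6.7) can be sketched directly in code. The following Python fragment is illustrative only (the thesis uses R); the relevance vectors and trained weights are invented, since learning them requires the Bayesian estimation described above:

```python
import math

def f_rvm(x, relevance_vectors, weights, kernel):
    # Equation (6.7): f_RVM(x) = sum_i w_i * K(x, x_i) over the
    # relevance vectors retained after training.
    return sum(w * kernel(x, xi) for w, xi in zip(weights, relevance_vectors))

def p_attack(x, relevance_vectors, weights, kernel):
    # Equation (6.6): p(C = 1 | x) = 1 / (1 + exp(-f_RVM(x))).
    return 1.0 / (1.0 + math.exp(-f_rvm(x, relevance_vectors, weights, kernel)))

# Hypothetical RBF kernel, relevance vectors and weights.
rbf = lambda x, y: math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))
rvs = [(0.0, 0.0), (1.0, 1.0)]
ws = [2.5, -1.0]
print(p_attack((0.0, 0.1), rvs, ws, rbf))
```

Because only the (typically few) relevance vectors enter the sum, evaluating the decision function is cheap, which is the source of the RVM's runtime advantage discussed above.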
6.6 METHODOLOGY AND PROPOSED ARCHITECTURE
Over 90% of Internet traffic uses TCP. Because of its widespread use
and impressive growth, this research focuses on the detection of anomalous
behavior within TCP traffic. Exploring the TCP packet attributes enables a
classifier to identify normal and abnormal activity on a packet-by-packet basis.
From these attributes a DT is built, which enables identifying and classifying
different attacks and violations. The process of building a classifier model using
RVM is depicted in Figure 6.3.








Figure 6.3 Architecture of the RVM model
In my research work, the design of IDS is treated as a traditional
classification problem where each abnormal behavior corresponds to a class label.
Researchers have found SVM classifiers well suited for designing IDS, and hence
this class of classification algorithms was chosen. Since SVM is a binary classifier,
the multi-class problem must be decomposed into binary problems. Traffic in the
network changes continuously as users log in and make use of the Internet.
For capturing the packets in real time, the JPCAP and WinPcap tools are used to
collect the information being transmitted. A data set covering 30 minutes of
traffic is collected, containing both normal and attack records. In order to mine
the contextual information contained in the data set, it is required to detect the
attacks and extract the required information.
The procedure is as follows:

1. Collect data set with normal and attack behavior.
2. Extract features and derive a subset of features that are necessary and sufficient
to be used in a classifier.
3. Train SVM and use SVM for classification.

The network data is collected from an interface capable of
capturing information flowing within the local network. For example, anomalies can
be detected on a single machine, a group of networks, a switch or a router. For the
current research work the TCP/IP packets are collected in real time from the research
lab network and dumped for further processing. The Data Preprocessing phase handles
the conversion of raw packet or connection data into a format that algorithms can
utilize, and stores the results in the knowledge base. Rather than operating on a raw
network dump file, the algorithm uses summary information to perform the analysis.
Data is preprocessed to generate summary lines about each connection found in the
dump file. The resulting summary file is then parsed and processed by the algorithm
to assign a score to each data point/time point, with a higher score indicating a higher
probability of being an outlier/change point.
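As an illustration of this summarization step, the Python sketch below (the field names and packet tuples are hypothetical, not the thesis's actual record format) groups raw packets into per-connection summary records:

```python
from collections import defaultdict

# Hypothetical raw packet records: (src_ip, dst_ip, payload_bytes).
packets = [
    ("10.0.0.1", "10.0.0.9", 120),
    ("10.0.0.1", "10.0.0.9", 300),
    ("10.0.0.2", "10.0.0.9", 60),
]

# One summary line per (src, dst) connection: packet and byte counts.
summary = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, dst, size in packets:
    summary[(src, dst)]["packets"] += 1
    summary[(src, dst)]["bytes"] += size

for (src, dst), s in sorted(summary.items()):
    print(src, dst, s["packets"], s["bytes"])
```

The detection algorithm would then score these summary records rather than the raw dump, which is what keeps the analysis tractable.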
The Log File stores the data as rules produced by the detection algorithm
for the subsequent mining process. It may also hold information for the preprocessor,
such as patterns for recognizing attacks and conversion templates. This training data is
responsible for generating the initial rule sets needed for deviation analysis. It can be
triggered automatically based on time or on the amount of pre-processed data
available.
The proposed Outlier Detection algorithm examines the network data,
creates a description of the differences and stores it in the outlier vectors for further
reference. If a deviation is detected, it signals the alarm unit. One strategy for invoking
the deviation analyzer is to periodically query the outlier vectors for new profiles;
alternatively, the profiler may signal when a new profile is added. The Alarm
Unit is responsible for informing the administrator when the deviation analyzer
reports unusual behavior in the network stream. This can take the form of SMS,
e-mails, console alerts, log entries, etc.
In the data preprocessing step shown in Figure 6.3, packets are
captured using the JPCAP library and information is extracted from each packet,
including the IP header, TCP header, UDP header, and ICMP header. The packet
information is then partitioned and formed into records by aggregating
information every 30 minutes. Each record consists of data features considered
as the key signature features representing the main characteristics of network data
and activities.
Experiments were performed using the R statistical framework, and the
performance of the process at each level was measured. In order to train the SVM,
unique IP addresses with 19 different feature sets as listed in Table 3.2 were used,
with 3043 training instances and 375 normal and 459 attack test instances.
6.6.1 R Statistical Tool
The R statistical tool is used for the implementation, and it is well
known that different choices of kernel, kernel parameters and the way the
quadratic problems are solved result in very different models. The R software has
built-in functions that can be used for solving the quadratic problems. Its API
facilities make it extremely flexible and capable of performing evaluation
measurements. DM combines concepts, tools, and algorithms from ML and
Statistics for the analysis of huge datasets. This allows users to gain insight,
understanding, and knowledge of a data set, and many commercially available
products offer sophisticated analytical tools. R is ideally suited for many
challenging tasks associated with DM, offering a complete statistical computing
product and a programming language for the skilled statistician.
R provides the SVM in e1071 as an interface to libsvm, which gives a
very efficient and fast implementation. ksvm, provided in kernlab [139] for kernel
learning, is integrated into R so that different kernels can easily be explored. kernlab
currently has an implementation of the RVM which can be used for regression.
A new class called kernel is also introduced, and kernel functions are objects of this
class. An issue with SVM is that parameter tuning is not easy and is
computationally expensive. The usual approach is to build multiple models with
different parameters and choose the one with the lowest expected error; this can
lead to suboptimal results unless a fairly extensive search is performed. Research
has shown that it is possible to predict the performance of models under different
parameter settings. It has been found that learning is difficult when working
directly with the large set of available features. By changing the representation of
the features, the ability to reason and learn improves significantly. Learning is
easier if the entities are projected into a higher dimensional space, which can be
done by computing dot products of the data. Different kernels result in different
projections and have demonstrated excellent performance on many ML and pattern
recognition tasks. However, they are sensitive to the choice of kernel, may be
intolerant to noise, and cannot deal well with missing data or data of mixed types.
Choosing a suitable kernel is vital, as different kernels result in different
mappings of the data space to the feature space. The R tool provides many
kernels, and for this work a linear and an RBF kernel are used. Results show that an
RBF kernel works better for detection, while the linear kernel could not perform a
mapping to a higher dimensional space. Because of its larger number of parameters,
the polynomial kernel was not used here.
kernlab in R is an extensible package for kernel based ML methods
which provides a framework for creating and using kernel based algorithms. The
package contains dot product kernels, implementations of SVM and RVM, Gaussian
processes, a ranking algorithm, kernel PCA, kernel CCA, kernel feature analysis and
a range of clustering algorithms.
6.6.2 RBF Networks
In my research work an RBF kernel is used. An RBF network consists of three
layers, as described below:
1. Input Layer: This layer broadcasts the values of the input vector to
each of the units in the hidden layer. One neuron in the input layer
corresponds to one predictor variable. If the values are categorical
variables, n-1 neurons are used where n is the number of categories in
the input vector.
2. Hidden Layer: Each unit in this layer produces an activation
based on its associated radial basis function. The hidden layer
consists of a variable number of neurons, each containing a radial
basis function centered on a point with the same dimensions as the
predictor variables.
3. Output Layer: Each unit in this layer computes a linear combination
of the activations of the hidden units. The layer has a weighted sum
of outputs from the hidden layer to form the network outputs.



	f(x) = Σ_{j=1}^{m} w_j h_j(x)

	h_j(x) = exp( −(x − c_j)² / r_j² )
Figure 6.4 RBF networks
Figure 6.4 shows an RBF network, where f(x) is the function
corresponding to the output unit, a linear combination of the radial basis
functions h_1(x), h_2(x), ..., h_m(x). Each h_j(x) is a Gaussian activation function
with parameters r, the radius or standard deviation, and c, the center or average
taken from the input vector, defined separately for each RBF unit. The learning
process is based on adjusting the parameters of the network.
This can be achieved by adjusting the three parameters

1. Weight w between the hidden nodes and the output nodes
2. Center c of each neuron of the hidden layer
3. Unit width r.
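The forward pass of such a network can be sketched as follows (Python for illustration; the centers, widths and weights are invented parameters standing in for the three quantities listed above):

```python
import math

def rbf_forward(x, centers, widths, weights):
    # f(x) = sum_j w_j * h_j(x) with Gaussian hidden units
    # h_j(x) = exp(-||x - c_j||^2 / r_j^2), as in Figure 6.4.
    hidden = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (r * r))
        for c, r in zip(centers, widths)
    ]
    return sum(w * h for w, h in zip(weights, hidden))

# Hypothetical network: two hidden units with centers c_j,
# widths r_j and output weights w_j.
centers = [(0.0, 0.0), (5.0, 5.0)]
widths = [1.0, 1.0]
weights = [2.0, -1.0]
print(rbf_forward((0.0, 0.0), centers, widths, weights))
```

At the first center the first hidden unit outputs 1 while the distant second unit contributes almost nothing, so the output is close to the first weight, 2.0; training amounts to adjusting w, c and r until such outputs match the targets.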

Tests were performed comparing the linear kernel model with the RBF
kernel by calculating the performance over all classes. The RBF kernel gave
better performance, but some bad choices of RBF kernel parameters resulted in
accuracy even below 60%. Sufficient tests must therefore be conducted to find
good parameters, as these can determine the performance of the IDS.
Changing the values of h(x) and the weights affects the model, and in order to
find the optimal parameters it is necessary to try different values and compare the
results. In this work the best parameter setting for each of the feature sets of
Table 6.1 was determined.
As specified in Table 6.1, the input data set x to the classifier (SVM or
RVM) is collected over a window of 30 minutes. The window size was chosen
empirically for our research work, large enough to cover all attacks in the dataset.
The training samples were preprocessed by applying a Gaussian function to detect
the centers of all manually identified attacks. For applying RVM, several parameters
need to be tuned for best performance during the training phase. The most important
are the type of kernel function (polynomial vs RBF) and its associated parameter
(the order p for the polynomial, or the kernel width σ for RBF). To determine these
for the classifier model, a cross validation procedure was applied on the training set
to fix the parameter settings of the RVM classifier.
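A cross validation loop of this kind can be sketched as below (Python for illustration; the thesis does not specify the number of folds, so contiguous folds over index ranges are an assumption):

```python
def kfold_indices(n, k):
    # Contiguous k-fold split of n sample indices: each candidate
    # parameter setting is trained on k-1 folds and validated on the
    # held-out fold, and the setting with the lowest average
    # validation error is kept.
    size = n // k
    folds = []
    for i in range(k):
        start = i * size
        stop = start + size if i < k - 1 else n
        test = list(range(start, stop))
        train = [j for j in range(n) if j < start or j >= stop]
        folds.append((train, test))
    return folds

for train, test in kfold_indices(10, 5):
    print(test)
```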
For each attack in the test set, the attack detection process was carried out
by the following steps:

1. A trained classifier (RVM or SVM) was applied with a threshold to classify each
attack in the dataset as NORMAL or ATTACK.
2. A confusion matrix of the potential attacks was generated.
3. The detected attacks were grouped into attack objects.
4. The performance of the detection algorithm was evaluated using the Receiver
Operating Characteristic (ROC) curve.
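The ROC evaluation in the last step sweeps the decision threshold and records one (FPR, TPR) point per threshold. A minimal Python sketch (illustrative scores and labels, not the thesis's data):

```python
def roc_points(scores, labels, thresholds):
    # labels: 1 = ATTACK, 0 = NORMAL; a score above the threshold is
    # classified as ATTACK.  Returns one (FPR, TPR) point per threshold.
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

print(roc_points([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0], [0.5]))  # -> [(0.0, 1.0)]
```

Here a threshold of 0.5 separates the two classes perfectly, giving the ideal corner point (FPR 0, TPR 1); sweeping many thresholds traces the full curve shown in Figure 6.5.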

6.7 EXPERIMENTAL RESULTS
ROC [140] curves plot the correct detection rate, the True Positive (TP)
rate, versus the average number of False Positives (FP) per dataset, varied over the
continuum of the decision threshold.

Figure 6.5 ROC curve for the data

Figure 6.6 Performance chart of SVM
Table 6.1 Error obtained by the RVM with different parametric values

Poly Kernel    Degree 1   Degree 2   Degree 3   Degree 4
Error Rate     0.092      0.084      0.063      0.042
RBF Kernel     σ = 1      σ = 4      σ = 6      σ = 9
Error Rate     0.072      0.064      0.043      0.039

Table 6.2 Error obtained by the SVM with different parametric values

Poly Kernel    Degree 1   Degree 2   Degree 3   Degree 4
Error Rate     0.102      0.093      0.081      0.052
RBF Kernel     σ = 1      σ = 4      σ = 6      σ = 9
Error Rate     0.094      0.072      0.051      0.043

The training dataset has a total of 121 examples for the ATTACK class
and 1291 examples for the NORMAL class. This dataset was collected after the
preprocessing phase, and these examples formed the training set. The RVM was
trained, and Tables 6.1 and 6.2 summarize the training results, listing the
generalization error obtained by each classifier under different parametric values.
A similar set of results was also obtained for training the SVM. The best result for
RVM was obtained with a degree 4 polynomial kernel, while SVM achieved its best
error level with a degree 5 polynomial kernel.
Table 6.3 Comparison between RVM and SVM models

Model   Number of training data   Number of vectors   Testing performance
SVM     100                       25                  0.76
        500                       129                 0.81
        1000                      230                 0.86
        5000                      540                 0.88
RVM     100                       17                  0.71
        500                       109                 0.72
        1000                      170                 0.81
        5000                      240                 0.82

At these parametric values, the number of relevance vectors for RVM
was found to be smaller than the number of support vectors for SVM. Table 6.3
summarizes the number of support vectors and relevance vectors found from the
training dataset. With their parameters tuned, both the RVM and SVM classifier
models were retrained using all the samples in the training set. The trained
classifiers were subsequently used for performance evaluation.

Figure 6.7 ROC curve obtained for RVM and SVM models
6.8 PERFORMANCE ANALYSIS OF RVM AND SVM
Both the RVM and SVM classifiers were evaluated using all the attacks
in the test dataset. As can be deduced from Figure 6.7, the RVM classifier achieves
essentially the same performance as the SVM, but with much reduced
computational complexity. Table 6.3 shows the comparison between the SVM and
RVM models. The testing performance of the RVM model is effectively the same as
that of SVM, with fewer vectors, and the runtime performance of the RVM is also
better than that of the SVM. The current implementation and usage indicate the
success of the algorithm and give a unifying view of the detection of attacks in real
time network data. RVM also showed competitive accuracy while maintaining
sparseness. Experimental results showed that the RVM model
achieved essentially the same performance as the previously developed SVM model
with a much sparser model. This greatly reduced computational complexity makes
RVM more feasible for real time processing when designing IDS. The proposed
method is competitive with respect to processing time and allows the use of a
selected training data set. The results show an improvement in RVM classification
performance.

CHAPTER 7
CONCLUSION

This thesis has made contributions to two key research areas, namely
Intrusion Detection and ML. The contributions apply specifically to the application
of ML to intrusion detection. Several factors were found to affect the results
significantly, and further investigation demonstrated that this is indeed a critical
challenge in designing an IDS. SLFN was capable of detecting a particular class of
intrusion. The CP algorithm was proposed to optimize the detection of outliers;
this approach was found to be successful and able to detect the class of intrusion.
Since single objective optimization was performed, there was no control over the
classification. To address this limitation, RVM was proposed, and class imbalance
was identified as a significant challenge for intrusion detection.
The overall performance and scope of detection of the IDS directly
depends on the feature selection stage. The main focus of this thesis is on mining the
most useful network features for attack detection. In order to do this a network
feature classification schema is proposed and a deterministic feature evaluation
procedure that helps to identify the most useful features that can be extracted from
network packets. The difference however is in the time of collection, size of the
network, throughput, and also the type of users that the networks have.
The proposed method uses mathematical, statistical and RVM techniques
to rank the participation of individual features into the detection process. The
presented experimental results empirically confirm that the proposed model can
successfully be applied to mine new features in the detection process.
An ideal dataset for IDS should be labeled at packet level and must
contain a considerable number of attacks. The current work does not differentiate
the final results based on the speed of the attacks. We believe that an interesting
further study would be to analyze the sets of features that are appropriate for fast or
slow attacks. To do that, however, the dataset used would need an equal number of
attacks in each attack category. Multiple datasets can be considered, and the data
sets need to be extracted from a set of diverse networks.
An extensive review of Artificial Intelligence (AI) applied to intrusion
detection was conducted, and the findings are reported in the literature survey.
Various research studies have adopted ML techniques and evaluated them on the
KDD Cup 99 data set. The results thus obtained are varied and also contradictory.
This motivated an investigation into the causes of the discrepancies, and it was
found that a critical challenge to intrusion detection is the collection of the data set
and the detection of a particular class of intrusion in real time.
SLFN was proposed to optimize the weights and the selection of layers in
MLP for better learning, and was found to be successful. The system was able to
detect a previously unknown class of intrusion. The data set selected posed several
challenges during the selection of ML algorithms, such as:
1. Working with high dimensional data and large memory requirements.
2. Learning speed that is affected by the very large data set.
3. Feature selection from the large data set.
4. Implementing learning that is incremental/continuous.
5. Detecting new, unknown intrusions.
As explained before, the CP and OD methods combined with RVM were
carried out without much loss of accuracy. The system has proved good with
respect to architecture, data processing, alert aggregation, and reporting
mechanisms. The methods proposed here have addressed the critical
challenge of learning from a large data set and providing the user with a set of
solutions. The user can select a solution and incorporate it into an IDS framework
as a detection module. Furthermore, there is always room for improvement in the
proposed methods concerning scalability and performance, which are discussed
further in Section 7.2.
The CP and OD combination was a successful method for improving the
performance of IDS. The results obtained in this thesis support the observation that
combining different methods can improve the performance of IDS. Current
approaches to creating hybrids are likely to succeed because they may yield a
solution with a good classification trade-off. It has been demonstrated in this thesis
that the FPR was comparatively low and that the approach outperformed other
methods.
7.1 MAJOR CONTRIBUTIONS AND NOVELTY
The main contribution of this thesis to the intrusion detection domain is
the demonstration of the suitability and application of RVM in building robust and
efficient IDS. A novel framework is developed that addresses three critical issues
which affect the performance of anomaly and hybrid IDS in high speed networks.
The following three issues are addressed:

1. Attack detection coverage
2. Generating less false alarms
3. Efficiency in operation.

As a result of this research, a framework is built to develop efficient IDS.
The framework offers customization and ease of detection for different varieties of
attacks. The system can identify the type of attack, and a specific intrusion response
mechanism can be initiated by the user so that the impact of the attack is minimized.
CP and OD are efficient methodologies for building robust and
efficient IDS, and integrating the framework with these two techniques can be used
to build an effective IDS. Using CP and OD as intrusion detectors resulted in very
few false alarms, and the attacks could be detected with very high accuracy.
The logging framework developed using JPCAP, WinPcap and Java
can capture network data significant for detecting attacks. The framework can be
used for a variety of applications that require an IDS as a plug-in.
Network sessions need to be reassembled in order to detect attacks with
high accuracy, and the Feature Extractor can be effectively used to model the events
and select the required features. Using CPOD, attacks can be detected with a smaller
window size and a good choice of threshold. A range of experiments was performed;
in order to detect intrusions effectively, it is critical to model the correlations
between multiple features. Treating feature sets as independent makes the model
complex and inefficient, as it affects the attack detection capability. The framework
developed allows specific features to be easily defined and extracted, which enables
building effective intrusion detectors. Our framework is customizable and can be
used to build efficient network IDS which can detect a wide variety of attacks.
Experimental results and comparison with other well known methods for intrusion
detection, such as Decision Trees, Naive Bayes and Support Vector Machines, have
shown better accuracy and detection rate without affecting the overall system
performance.
The notable part of our research work is the improvement in attack
detection accuracy. Statistical tests using CPOD demonstrate higher assurance in
detection accuracy. As the system developed is not based on attack signatures, it is
capable of detecting novel attacks. Experimental results confirm that our system,
based on the CPOD and RVM methods, can detect attacks at an early stage by
analyzing only a small number of data records, resulting in an efficient system
which can block attacks in real time.

7.2 FUTURE WORK
The task of detecting intrusions in networks is critical and leaves little margin for error. Identifying the best possible approach to attack detection is extremely difficult, and developing a single solution that works for every network and application is a real challenge. In this research work a novel framework is developed using different methods which perform better than previously known approaches. To improve overall performance, domain knowledge is used to select better features for training, which is justified by the critical nature of the intrusion detection task. An interesting direction for future research is to develop a completely automatic IDS. Another is a faster implementation that employs our approach on multi-core processors.
IPS/IRS, which aim at preventing attacks rather than simply detecting them, are another area that can be explored. This can be achieved by integrating the IDS with the known security policy of individual networks, which would also help minimize the false alarms raised by the IDS.
One of our objectives in this work was to detect and classify network attacks. Future research in this area is definitely needed, and other DM methods can be incorporated. Studies could also be conducted with more attack types that are totally new, or variations on existing types. In this vein, other studies could address the problem of classifying rarely seen attack types such as U2R and R2L. Future work will explore expanding the output of the individual classifiers so that the exact source of a given attack can be identified more easily. Adding a prediction layer could reduce the complexity of carefully tuning the thresholds and window sizes: the idea is to develop a layer that predicts the next data point with some probability, learns to readjust those probabilities from current deviations, and thereby enhances accuracy. Although this study makes a contribution to IDS classification, there are other DM methods, such as memory-based systems, logistic regression and discriminant analysis, that can be further explored.
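A minimal sketch of the prediction layer proposed above: an exponentially weighted moving-average predictor for the next data point, whose deviation score stands in for the probability readjustment described in the text (the smoothing factor alpha is an illustrative choice, not a tuned value):

```python
class PredictionLayer:
    """Predicts the next data point and scores how far each new
    observation deviates from that prediction."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha        # weight given to the newest observation
        self.prediction = None    # current one-step-ahead prediction

    def observe(self, x):
        """Return the absolute deviation of x from the prediction,
        then readjust the prediction toward x."""
        if self.prediction is None:
            self.prediction = float(x)
            return 0.0
        deviation = abs(x - self.prediction)
        self.prediction = self.alpha * x + (1 - self.alpha) * self.prediction
        return deviation

layer = PredictionLayer()
scores = [layer.observe(x) for x in [10, 10, 11, 10, 60]]  # last score is large
```

Feeding the deviation scores into a fixed cutoff would replace per-feature threshold and window tuning with a single sensitivity parameter.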
7.3 SIGNIFICANT CHALLENGES AND OPEN ISSUES
1. It is very hard to trace the true source of an attack. If a reliable method is developed that can trace packets back to their actual source, many attacks could be prevented. Though some solutions are available, a global effort is required, which is a real challenge ahead; the true source must be identified without affecting the overall performance of the system.
2. New methods based on user profiling can be developed that learn normal user activity and use the learnt model to detect any deviations from it. Most works in this area are based on thresholds, so a detailed empirical analysis can be performed to develop such an IDS.
3. In the current Internet era, keeping pace with rapidly and ever-changing networks and applications is still a major task. Research on IDS must synchronize with present networks that support wireless technologies, ad hoc networks and mobile devices. IDS must be developed so that they can integrate with such networks and devices, and support these advances in a comprehensible manner.
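The user-profiling idea in point 2 can be sketched as a frequency profile learned from normal activity, with sessions scored by how many rare or unseen actions they contain. The 5% rarity threshold is hypothetical, standing in for the empirically tuned thresholds the point refers to:

```python
from collections import Counter

def learn_profile(actions):
    """Learn a user's normal behaviour as relative action frequencies."""
    counts = Counter(actions)
    total = len(actions)
    return {action: c / total for action, c in counts.items()}

def deviation_score(profile, session, rarity=0.05):
    """Fraction of session actions that are rare or unseen for this user."""
    return sum(1 for a in session if profile.get(a, 0.0) < rarity) / len(session)

profile = learn_profile(["ls", "cd", "vim", "ls", "cd", "make"] * 10)
score = deviation_score(profile, ["nc", "wget", "chmod", "ls"])  # 0.75
```

Comparing the score against a tuned cutoff reproduces the threshold-based detection that most existing user-profiling work relies on.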


REFERENCES

[1] Levent Koc, Mazzuchi T.A., Shahram Sarkani, A Network Intrusion Detection System Based on a Hidden Naive Bayes Multiclass Classifier, Expert Systems with Applications, 39, pp 13492-13500, 2012
[2] Andreas Fuchsberger, Intrusion Detection Systems and Intrusion Prevention
Systems, Information Security Technical Report, 10, pp 134 139, 2005
[3] Peyman Kabiri, Ali Ghorbani A, Research on Intrusion Detection and
Response: A Survey, International J ournal of Network Security, 1(2),
pp 84 102, Sep. 2005
[4] Aleksandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur, J aideep
Srivastava, A Comparative Study of Anomaly Detection Schemes in
Network Intrusion Detection, SIAM Conference on Data Mining , 2003
[5] Douglas J . Brown, Bill Suckow, Tianqiu Wang, A Survey of Intrusion
Detection Systems, Department Of Computer Science, University of
California, San Diego, USA, 2008
[6] Karen Scarfone , Peter Mell, Guide to Intrusion Detection and Prevention
Systems (IDPS), Recommendations of the National Institute of Standards
and Technology, 2008
[7] Next Generation Intrusion Detection Systems (IDS), McAfee Network
Protection Solutions, 2007
[8] Leslie T. O., Current Trends in IDS and IPS, May 29, 2007
[9] Animesh P., J ugn-Min P., An Overview of Anomaly Detection Techniques:
Existing Solutions and Latest Technological Trends, Computer Networks,
51, pp. 3448-3470, 2007
[10] Garcia P., Diaz J., Macia G.F., Vazquez E., Anomaly-Based Network
Intrusion Detection: Techniques, Systems and Challenges, Computer &
Security, 28, pp. 18-28, 2009
[11] Richard Heady, George Luger, Arthur Maccabe, Mark Servilla, The
Architecture of A Network Level Intrusion Detection System, Technical
Report NM 87131. Computer Science Department, University of New
Mexico, Albuquerque Mexico., 1990
[12] Barford P., Kline J., Plonka D., Ron A., A Signal Analysis of Network
Traffic Anomalies, In Internet Measurement Workshop, Marseille,
November 2002
[13] Lakhina A, et al., Mining Anomalies Using Traffic Feature Distributions,
Proc. ACM SIGCOMM, 2005
[14] Kim S. S, Narasimha Reddy A.L, Marina Vannucci, Detecting Traffic
Anomalies through Aggregate Analysis of Packet Header Data, In
Networking, 2004
[15] Wu S., Yen E., Data Mining-Based Intrusion Detectors, Expert Systems with Applications, pp 5605-5612, 2009
[16] Shelly Xiaonan Wu, Wolfgang Banzhaf, The Use of Computational
Intelligence in Intrusion Detection Systems: A Review, Applied Soft
Computing, pp 1-35, 2010
[17] Georgios P. S, Sokratis K. K, Reducing False Positives In Intrusion
Detection Systems, Computers & Security, 29, pp 35 44, 2010
[18] Chih-Fong Tsai, Yu-Feng Hsu, Chia-Ying Lin, Wei-Yang Lin, Intrusion
Detection by Machine Learning: A Review, Expert Systems with
Applications , 36, pp 11994 12000, 2009
[19] Petrovskiy M., Outlier Detection Algorithms in Data Mining Systems, Programming and Computer Software, 29(4), pp 228-237, 2003
[20] Jian Tang, Zhixiang Chen, Ada Waichee Fu, David Cheung W., Capabilities of Outlier Detection Schemes in Large Datasets, Framework and Methodologies, Knowledge and Information Systems, 11, pp 45-84, 2006
[21] Varun Chandula, Arindam Banerjee, Vipin Kumar, Outlier Detection: A
Survey, Technical Report TR 07-017, Dept of CSE, University of
Minnesota, USA
[22] Joel Branch W., Chris Giannella, Boleslaw Szymanski, Ran Wolff , Hillol
Kargupta, In Network Outlier Detection in Wireless Sensor Networks,
Knowledge and Information Systems, 34, 2013
[23] Takeuchi J , Yamanishi K., A Unifying Framework For Detecting Outliers
and Change Points From Time Series, IEEE Transactions on Knowledge
and Data Engineering, 18(4), pp 482 492 , APRIL 2006
[24] Ramaswamy S., Rastogi R., Shim K., Efficient Algorithms for Mining
Outliers from Large Data Sets, ACM SIGMOD International Conference
On Management of Data, Dallas, TX, USA, pp. 427-438, 2000
[25] Basseville M., Nikiforov V., Detection of Abrupt Changes: Theory and
Applications, Prentice-Hall Inc., Englewood Cliffs, N. J., 1993
[26] Ertoz L., Eilertson E., Lazarevic A., Tan P. N., Kumar V., Srivastava J.,
Dokas P., The MINDS - Minnesota Intrusion Detection System, Next
Generation Data Mining Boston, MIT Press, 2004.
[27] Mohammed Nazer G., Lawrence Selvakumar A., Current Intrusion
Detection Techniques in Information Technology - A Detailed Analysis,
European J ournal of Scientific Research, Euro J ournals Publishing,
pp 611-624, Inc. 2011
[28] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, Ghorbani Ali A., A Detailed
Analysis of the KDD CUP 99 Data Set, CI in Security and Defense
Applications, 2009
[29] Huang G.B., Zhu Q.Y., Siew C.K., Real-Time Learning Capability of
Neural Networks, IEEE Transactions on Neural Networks, 17(4),
pp 863 878, JULY 2006
[30] Yongqiang Liu, A Review About Transfer Learning Methods and
Applications, International Conference on Information and Network
Technology IACSIT Press, Singapore IPCSIT, 4, pp 7 11, 2011
[31] Sinno J ialin Pan, Qiang Yang, A Survey on Transfer Learning, IEEE
Transactions on Knowledge and Data Engineering, 22(10), pp 1345 1359,
October 2010
[32] Weon I-Y, Doo Heon Song, Chang-Hoon Lee, Effective Intrusion Detection
Model Through The Combination of A Signature-Based Intrusion Detection
System and A Machine Learning-Based Intrusion Detection System,
J ournal of Information Science and Engineering, 22, pp 1447-1464, 2006
[33] Terran Lane, Brodley Carla E., An Empirical Study of Two Approaches to Sequence Learning for Anomaly Detection, Machine Learning, 51, pp 73-107, 2003
[34] Hodge V., Austin J., A Survey of Outlier Detection Methodologies, Artificial Intelligence Review, 22, pp 85-126, 2004
[35] William Chauvenet, A Manual of Spherical and Practical Astronomy, Lippincott, Philadelphia, 1st Ed. (1863); Reprint of 1891 5th Ed.: Dover, NY (1960)
[36] Kifer D., Ben-David S., Gehrke J., Detecting Change in Data Streams, 30th International Conf. on Very Large Data Bases, pp 180-191, 2004
[37] Gustafsson, F., Adaptive Filtering and Change Detection, J ohn Wiley &
Sons Inc., 2000
[38] Joao B. D. Cabrera, Gosar J., Wenke Lee, Mehra Raman K., On the Statistical Distribution of Processing Times in Network Intrusion Detection, Proceedings of the 43rd IEEE Conference on Decision and Control, Bahamas, December 2004
[39] Benjamin Peirce, Criterion for the Rejection of Doubtful Observations, Astronomical Journal II, 45, pp 161-163, 1852
[40] Gould B.A., On Peirce's Criterion for the Rejection of Doubtful Observations, With Tables for Facilitating Its Application, Astronomical Journal IV, 83, pp 81-87, 1855
[41] Wei Lu, Hengjian Tong, Detecting Network Anomalies Using CUSUM and EM Clustering, ISICA 2009, LNCS 5821, pp 297-308, 2009
[42] Alexander Tartakovsky G., Boris Rozovskii L., Rudolf Blazek B., Hongjoong Kim, Detection of Intrusions in Information Systems by Sequential Change-Point Methods, Statistical Methodology, 3, pp 252-293, 2006
[43] Cheifetz N., Same A., Aknin P., Verdalle E., A CUSUM Approach for
Online Change-Point Detection On Curve Sequences, Computational
Intelligence and Machine Learning, Bruges (Belgium), pp 25-27, April 2012
[44] Fabio Pacifici, Change Detection Algorithms: State of the Art, v1.2, Earth
Observation Laboratory, Tor Vergata University, Rome, Italy, Feb 2007
[45] Shohei Hido , Yuta Tsuboi , Hisashi Kashima , Masashi Sugiyama ,
Takafumi Kanamori , Statistical Outlier Detection Using Direct Density
Ratio Estimation, Knowledge and Information Systems. 26(2), pp 309-336,
2011
[46] Yoshinobu Kawahara, Masashi Sugiyama , Sequential Change Point
Detection Based on Direct Density Ratio Estimation, Statistical Analysis
and Data Mining, 5(2), pp 114 127, 2012
[47] Vapnik Vladimir N., An Overview of Statistical Learning Theory, IEEE
Transactions on Neural Networks, 10(5), September 1999
[48] Wenke Lee, Stolfo Salvatore J ., Chan Philip K., Real Time Data Mining-
based Intrusion Detection
[49] Usman Asghar Sandhu, Sajjad Haider , Salman Naseer, Obaid Ullah Ateeb,
A Survey of Intrusion Detection & Prevention Techniques, International
Conference on Information Communication and Management , IPCSIT, 16,
pp 66 71 , 2011
[50] Adedayo Adetoye, Andy Choi, Marina Md. Arshad ,Olufemi Soretire,
Network Intrusion Detection & Response System, September 2003
[51] Olin Hyde, Machine Learning For Cyber Security at Network Speed & Scale, 1st Public Edition: AI-ONE Inc, October 11, 2011
[52] Iftikhar Ahmad, Azween Abdullah and Abdullah Alghamdi, Towards the
Selection of Best Neural Network System for Intrusion Detection,
International Journal of the Physical Sciences , 5(12), pp 1830-1839,
October, 2010
[53] Liao Y., Vemuri V., Use of K-nearest Neighbor Classifier for Intrusion Detection, Computers & Security, 21(5), pp 439-448, 2002
[54] Freeman S., Bivens A., Branch J., Host-Based Intrusion Detection Using
User Signatures, Research Conference, NY, 2002
[55] J ake Ryan, Meng-J ang Lin, Intrusion Detection with Neural Networks,
Advances in Neural Information Processing Systems , Cambridge, MA, MIT
Press, 1998
[56] Ghosh Anup K., Schwartzbard A., Michael Schatz, Learning Program
Behavior Profiles for Intrusion Detection , Proceedings of the Workshop on
Intrusion Detection and Network Monitoring, 1999
[57] Srinivas Mukkamala, Sung Andrew H., Ajith Abraham, Intrusion Detection Using an Ensemble of Intelligent Paradigms, Journal of Network and Computer Applications, 28, pp 167-182, 2005
[58] Wlodzislaw Duch, Norbert Jankowski, Survey of Neural Transfer
Functions, Neural Computing Surveys, (2), pp 163-212, 1999

[59] Zheng Zhang, Jun Li, Manikopoulos C.N., Jay Jorgenson, Jose Ucles,
HIDE: A Hierarchical Network Intrusion Detection System Using
Statistical Preprocessing and Neural Network Classification, Proceedings of
the IEEE Workshop on Information Assurance and Security United States
Military Academy, West Point, NY, pp 85 90, 2001
[60] Huang G-B, Zhu Q-Y, Siew C-K., Extreme Learning Machine: A New
Learning Scheme of Feedforward Neural Networks, Extreme Learning
Machine: Theory and Applications, Neurocomputing, 70, pp 489-501, 2006
[61] Chunlin Zhang, Ju Jiang, Mohammed Kamel, Comparison of BPL and RBF Network in Intrusion Detection System, Proceedings of the 9th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, pp 466-470, 2003
[62] Hofmann C., Schmitz B., Sick, Rule Extraction from Neural Networks For
Intrusion Detection In Computer Networks, IEEE International Conference
on Systems, Man and Cybernetics, 2, pp 12591265, 2003
[63] Anyanwu Longy O., J ared Keengwe, Arome Gladys A., Scalable Intrusion
Detection with Recurrent Neural Networks, International J ournal of
Multimedia and Ubiquitous Engineering , 6(1), pp 21 28, 2011
[64] Mansour Sheikhan, Zahra Jadidi, Ali Farrokhi , Intrusion Detection Using
Reduced-Size RNN Based on Feature Grouping, Neural Computing and
Applications, 21(6), pp 1185 1190, September 2012
[65] Ghosh A.K., Michael C., Schatz M., A Real-Time Intrusion Detection System Based on Learning Program Behavior, Proceedings of the 3rd International Workshop on Recent Advances in Intrusion Detection (RAID'00), Toulouse, France, 1907, pp 93-109, 2000
[66] Cheng E., J in H., Han Z., Sun J., Network-Based Anomaly Detection Using
An Elman Network, Networking and Mobile Computing, 3619,
pp 471 480, 2005
[67] Cannady J ., Applying CMAC-Based On-Line Learning To Intrusion
Detection, Proceedings of the IEEE-INNS-ENNS International J oint
Conference on Neural Networks (IJCNN00), 5, pp 405 410, 2000
[68] Liberios Vokorokos, Anton Balaz, Martin Chovanec, Intrusion Detection
System Using Self Organizing Map, Acta Electrotechnics Informatica, 6
(1), pp 1 6, 2006
[69] Yuan Cao, Haibo He, Hong Man, Xiaoping Shen, Integration of Self-
organizing Map (SOM) and Kernel Density Estimation (KDE) for Network
Intrusion Detection, Proc. of SPIE, 7480, 2009
[70] Alan Bivens, Chandrika Palagiri, Rasheda Smith, Boleslawszymanski,
Network-Based Intrusion Detection Using Neural Networks, ANNIE , 12,
pp 579 584, 2002
[71] Sarasamma S.T., Zhu Q.A., Huff J., Hierarchical Kohonen Net for Anomaly Detection in Network Security, IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(2), pp 302-312, 2005
[72] Rhodes B.C., Mahaffey J.A., Cannady J.D., Multiple Self-Organizing Maps for Intrusion Detection, Proceedings of the 23rd National Information Systems Security Conference, pp 16-19, 2000
[73] Zanero S., Analyzing TCP Traffic Patterns Using Self Organizing Maps,
International Conference on Image Analysis and Processing (ICIAP05),
3617, pp 8390, 2005
[74] Hoque M S., Md. Abdul Mukit, Md. Abu Naser Bikas, An Implementation
of Intrusion Detection System Using Genetic Algorithm , International
Journal of Network Security & Its Applications (IJNSA), 4(2), pp 109 120,
March 2012
[75] Chet Langin, Shahram Rahimi, Soft computing in Intrusion Detection: The
State of the Art, J Ambient Intell Human Comput, 1, pp 133 145, 2010
[76] Sangkatsanee P., Wattanapongsakorn N., Charnsripinyo C., Practical Real-
Time Intrusion Detection Using Machine Learning Approaches, Computer
Communications , 34, pp 2227 2235, 2011
[77] Amor N. Ben, Benferhat S. , Z. Elouedi, Naive Bayes vs Decision Trees in
Intrusion Detection Systems, Proceedings of the ACM symposium on
Applied computing, pp 420 424, 2004
[78] Weiming Hu, Steve Maybank, AdaBoost-Based Algorithm for Network
Intrusion Detection, IEEE Transactions on Systems, Man, and Cybernetics,
38(2), pp 577 583, April 2008
[79] Khor K-C, Ting C-Y, Amnuaisuk S-P, From Feature Selection to Building
of Bayesian Classifiers: A Network Intrusion Detection Perspective,
American Journal of Applied Sciences, 6 (11), pp 1948-1959, 2009
[80] Altwaijry H., Algarny S., Bayesian Based Intrusion Detection System,
J ournal of King Saud University, Computer and Information Sciences, 24,
pp 16 , 2012
[81] Pan Z-S, Chen S-C, Hu G-B, Zhang D-Q, Hybrid Neural Network and C4.5
for Misuse Detection, Proceedings of the Second International Conference
on Machine Learning and Cybernetics, Xian, pp 2463 2467, November
2003
[82] Moguerza J avier M., Munoz A., Support Vector Machines with
Applications, Statistical Science, 21(3), pp 322 336, 2006
[83] Rung-Ching Chen, Kai-Fan Cheng, Chia-Fen Hsieh, Using Rough Set and
Support Vector Machine for Network Intrusion Detect, International Journal
of Network Security & Its Applications (IJNSA), 1(1), pp 1 13, April 2009
[84] Shaohua Teng, Hongle Du, Naiqi Wu, Wei Zhang, J iangyi Su, A
Cooperative Network Intrusion Detection Based on Fuzzy SVMs, J ournal
of Networks, 5(4), pp 475 483, April 2010
[85] Latifur Khan, Mamoun Awad, Bhavani Thuraisingham, A New Intrusion
Detection System Using Support Vector Machines and Hierarchical
Clustering, The VLDB Journal, 16, pp 507 521, 2007
[86] Yang Yi, J iansheng Wu, Wei Xu, Incremental SVM Based on Reserved Set
For Network Intrusion Detection, Expert Systems with Applications, 38,
pp 7698 7707, 2011
[87] J eongseok Seo, Sungdeok Cha, Masquerade Detection Based on SVM and
Sequence-Based User Commands Profile, ASIACCS07, 2007
[88] Dennis Decoste, Bernhard Scholkopf, Training Invariant Support Vector
Machines, Machine Learning, 46, pp 161 190, 2002
[89] Srinivas Mukkamala, Guadalupe J anoski, Andrew Sung, Intrusion
Detection: Support Vector Machines and Neural Networks
[90] Duan K-B, Sathiya Keerthi S., Which is the Best Multiclass SVM Method?
An Empirical Study, N.C. Oza et al. (Eds.), LNCS 3541, pp 278 285,
2005
[91] Sung Andrew H., Srinivas Mukkamala, Identifying Important Features for
Intrusion Detection Using Support Vector Machines and Neural Networks,
Symposium on Applications and the Internet - SAINT , pp 209 - 217, 2003
[92] Sandhya Peddabachigari, Ajith Abraham, Crina Grosanc, J ohnson Thomas,
Modeling Intrusion Detection System Using Hybrid Intelligent Systems,
J ournal of Network and Computer Applications, Elsevier Ltd, pp 1 16,
2005
[93] Chen R-C., Chen S-P., Intrusion Detection Using A Hybrid Support Vector
Machine Based on Entropy and TF-IDF, International J ournal of Innovative
Computing, Information and Control, 4(2), pp 413 424, 2008
[94] Qing Song, Wenjie Hu, Wenfan Xie, Robust Support Vector Machine for
Bullet Hole Image Classification, IEEE Transaction on Systems, Man and
Cybernetics, 32(4), pp 440 448, Nov 2002
[95] Wenjie Hu, Yihua Liao, Vemuri V. Rao, Robust Anomaly Detection Using
Support Vector Machines, Proceedings of the International Conference on
Machine Learning, pp 1 - 7
[96] Ganapathy S., Yogesh P., Kannan A., Intelligent Agent-Based Intrusion
Detection System Using Enhanced Multiclass SVM, Hindawi Publishing
Corporation Computational Intelligence and Neuroscience , pp 1- 10, 2012
[97] Huang G-B, Zhu Q-Y, Siew C-K, Extreme Learning Machine: Theory and
Applications, Neurocomputing ,70, pp 489 501, 2006
[98] Zhong L L., Zhang Ya Ming, Zhang Yu Bin, Network Intrusion Detection
Method by Least Squares Support Vector Machine Classifier, IEEE
Transactions, 2010
[99] Chengjie G U, Shunyi Zhang, He Huang, Online Internet Traffic
Classification Based on Proximal SVM, J ournal of Computational
Information Systems, 7(6), pp 2078 2086, 2011
[100] Huang C-L, Wang C-J, A GA-Based Feature Selection and Parameters Optimization for Support Vector Machines, Expert Systems with Applications, 31, pp 231-240, 2006
[101] Ning Ye, Ruixiang Sun, Yingan Liu, Lin Cao, Support Vector Machine
With Orthogonal Chebyshev Kernel, IEEE Transactions, 2006
[102] Defeng Wang, Yeung D S, Tsang E C, Weighted Mahalanobis Distance
Kernels for Support Vector Machines, IEEE Transaction on Neural
Networks, 2007
[103] J ianhua Xu , Xuegong Zhang, Kernels Based on Weighted Levenshtein
Distance, Proceedings IEEE International Joint Conference, 2004
[104] Lodhi H., Craig Saunders, J ohn Shawe-Taylor, Text Classification Using
String Kernels, J ournal of Machine Learning Research, 2, pp 419 - 444,
2002
[105] Konrad Rieck, Pavel Laskov, Linear-Time Computation of Similarity
Measures for Sequential Data, J ournal of Machine Learning Research, 9,
pp 23 48, 2008
[106] Wenke Lee, Stolfo Salvatore J ., A Framework for Constructing Features
and Models for Intrusion Detection Systems, ACM Transactions on
Information and System Security, 3(4), pp 227 261, November 2000
[107] Srinivas Mukkamala, Sung Andrew H., Feature Selection for Intrusion
Detection Using Neural Networks and Support Vector Machines, Technical
Report, pp 1 17
[108] Luis Talavera, An Evaluation of Filter and Wrapper Methods for Feature
Selection in Categorical Clustering, Technical Report
[109] Huan Liu, Lei Yu, Toward Integrating Feature Selection Algorithms for
Classification and Clustering, IEEE Transactions on Knowledge and Data
Engineering, 17(4), pp 491 502, 2005
[110] Nguyen H T., Katrin Franke, Slobodan Petrovic, Towards a Generic
Feature-Selection Measure for Intrusion Detection, International
Conference on Pattern Recognition, IEEE, pp 1529 1532, 2010
[111] Ivan Kojadinovic, Thomas Wottka, Comparison Between A Filter and A
Wrapper Approach To Variable Subset Selection in Regression Problems,
ESIT 2000, pp 311 321, 2000
[112] Tipping M.E., The Relevance Vector Machine Advances in Neural
Information Processing Systems, 12, pp 652 - 658
[113] Tipping M.E., Sparse Bayesian Learning and the Relevance Vector
Machine, J ournal of Machine Learning Research, 1, pp 211 244, 2001
[114] Zhiqiang Zhang, J ianzhong Cui, Network Intrusion Detection Based on
Robust Wavelet RVM Algorithm, J ournal of Information & Computational
Science, pp 2983 2989, 2011
[115] Natalia Stakhanova, Samik Basu, J ohnny Wong, A Taxonomy Of Intrusion
Response Systems, International J ournal of Information and Computer
Security, 1(1/2), pp 1 18, 2007
[116] Bingrui Foo, Glause Matthew W., Howard Gaspar M., Wu Yu-Sung,
Saurabh Bagchi, Spafford Eugene H., Intrusion Response Systems: A
Survey, CERIAS Tech Report, 2008
[117] Andreas Fuchsberger, Intrusion Detection Systems and Intrusion Prevention
Systems, Information Security Technical Report,10, pp 134 139, 2005
[118] Powers Simon T., A Hybrid Artificial Immune System and Self Organising
Map for Network Intrusion Detection, Preprint submitted to Elsevier, 2012
[119] Julie Greensmith, Amanda Whitbrook, Uwe Aickelin, Artificial Immune
Systems, Technical Report
[120] Jungwon Kim, Bentley Peter J., Immune System Approaches to Intrusion
Detection - A Review, Technical Report
[121] Li K, Huang, Fast Construction of Single Hidden Layer Feedforward
Networks, Handbook of Natural Computing. Springer, Berlin, Mar 2010
[122] Li M-B, Huang G-B , Saratchandran P., Sundararajan N., Fully Complex
Extreme Learning Machine, Neurocomputing, 68, pp 306 314, 2005
[123] Gang Wang, J inxing Hao , J ian Ma, Lihua Huang, A New Approach To
Intrusion Detection Using Artificial Neural Networks and Fuzzy Clustering,
Expert Systems with Application, 2010
[124] Srilatha Chebrolu, Ajith Abraham, Thomas Johnson P., Feature Deduction
and Ensemble Design of Intrusion Detection Systems, Computers &
Security, 24, pp 295 307, 2005
[125] Shanmugavadivu R, Nagarajan N An Anomaly-Based Network Intrusion
Detection System Using Fuzzy Logic, Indian J ournal of Computer Science
and Engineering (IJ CSE), 2(1), pp 101 111
[126] Dao Vu N.P., Rao Vemuri, A Performance Comparison of Different Back
Propagation Neural Networks Methods in Computer Network Intrusion
Detection, Technical Report
[127] Rosenblatt F., Principles of Neurodynamics: Perceptrons and the Theory of
Brain Mechanisms, Spartan Books, New York, 1962
[128] Abdulkadir Sengur, Multiclass Least-Squares Support Vector Machines for
Analog Modulation Classification, Expert Systems with Applications,
36(3), pp 6681 6685, 2009
[129] Mehdi Moradi, Mohammad Zulkernine, A Neural Network Based System
for Intrusion Detection and Classification of Attacks, International
Conference on Advances in Intelligent Systems, Theory and Applications,
Luxembourg, IEEE, November 2004
[130] Bouzida, Cuppens F., Neural Networks Vs. Decision Trees For Intrusion
Detection, IEEE/IST Workshop on Monitoring, Attack Detection and
Mitigation
[131] Xiaojun Tong, Zhu Wang, Haining Yu, A Research Using Hybrid
RBF/Elman Neural Networks for Intrusion Detection System Secure
Model, Computer Physics Communications, 180(10), pp 1795 1801, 2009
[132] Al-Subaie M., Zulkernine M., Efficacy of Hidden Markov Models Over
Neural Networks in Anomaly Intrusion Detection, International Computer
Software and Applications Conference (COMPSAC06), pp 325 332, 2006
[133] Ahmad Ghodselahi, A Hybrid Support Vector Machine Ensemble Model
for Credit Scoring, International J ournal of Computer Applications, 17(5),
pp 1 5, March 2011
[134] Cortes C., Vapnik V., Support Vector Networks, Machine Learning, 20,
pp 273 297, 1995
[135] Li Rui, Computer Network Attack Evaluation Based on Incremental
Relevance Vector Machine Algorithm, J ournal of Convergence Information
Technology, J CIT, 7(1), J anuary 2011
[136] Di He, Improving the Computer Network Intrusion Detection Performance
Using the Relevance Vector Machine with Chebyshev Chaotic Map, IEEE,
2011
[137] Package e1071 September 12, 2012, http://cran.r-project.org /web
/packages /e1071 /e1071.pdf
[138] Package klaR, August 28, 2012http://cran.r-project.org/ web/ packages/
klaR/ klaR.pdf
[139] Package kernlab, November 28, 2012, http://cran.r-project.org /web
/packages /kernlab/kernlab.pdf
[140] R. A. Maxion, R. R. Roberts, Proper Use of ROC Curves in
Intrusion/Anomaly Detection, Technical Report Series, CS-TR-871,
November 2004



APPENDIX 1
GLOSSARY OF TECHNICAL TERMS

Alert A message generated by IDS whenever it detects an event of interest.
An alert typically contains information about the attack or some
unusual activity that was detected.
Anomaly Any significant deviation from the normal behavior/pattern
Attack An intelligent act that is a deliberate attempt (especially in the sense
of a method or technique) to evade security services and violate the
security policy of a system; in other words, an intrusion attempt.
Event Activity detected by the IDS which may result in an alert. For
example, N failed logins in T seconds might indicate a brute-force
login attack
False Negative Occurs if the IDS does not identify an event that is part of an attack
as being malicious
False Positive Occurs if the IDS identifies an event that is not part of an attack as
being malicious.
Intrusion Any set of actions that attempt to compromise the confidentiality,
integrity or availability of system or network resources. Any
intrusion is a consequence of an attack, but not all attacks lead to an
intrusion
Intrusion Detection System Monitors computer systems and/or networks and analyzes the data for
possible hostile attacks originating from the external world, and also
for system misuse or attacks originating from inside the enterprise
Network Security Protection of the integrity, availability and confidentiality of
network assets and services from associated threats and
vulnerabilities, so as to maintain service availability, avoid
financial losses and damage to image, and protect personnel, customer and
business secrets, etc.
Promiscuous Mode A network interface card set in promiscuous mode not only
accepts the packets intended for it but also receives and processes all
other packets moving around in the network
Signature/Pattern-based intrusion detection The intrusion detection system contains a database of known
vulnerabilities in the form of a sequence of strings. It monitors traffic
and seeks a pattern or signature match
True Negative Occurs when no alerts are triggered for events which are not part
of an attack
True Positive Occurs when alerts are triggered for events which are part of an
attack
Vulnerability A flaw or weakness in a system's design, implementation, or
operation and management that could be exploited to violate the
system's security posture
Security Policy A set of rules and practices that specify or regulate how a system
or organization provides security services to protect sensitive and
critical system resources
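The "N failed logins in T seconds" example under Event can be sketched as a sliding-window check (N = 5 failures within T = 10 seconds are illustrative values, not recommended settings):

```python
from collections import deque

def brute_force_alert(fail_times, n=5, t=10.0):
    """Return True if any n failed logins fall within a t-second window.
    fail_times are timestamps in seconds, in ascending order."""
    window = deque()
    for ts in fail_times:
        window.append(ts)
        while window and ts - window[0] > t:
            window.popleft()          # drop failures older than t seconds
        if len(window) >= n:
            return True               # event of interest: raise an alert
    return False

print(brute_force_alert([0, 1, 2, 3, 4]))      # True: 5 failures in 4 seconds
print(brute_force_alert([0, 20, 40, 60, 80]))  # False: failures too sparse
```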









APPENDIX 2
ATTACK DESCRIPTION

ARPpoison An attacker who has compromised a host on the local
network disrupts traffic by listening for ARP-who-has
packets and sending forged replies. ARP (address resolution
protocol) is used to resolve IP addresses to Ethernet
addresses. Thus, the attacker disrupts traffic by misdirecting
traffic at the data link layer
DoS attack A denial-of-service attack or distributed denial-of-service
attack (DDoS attack) is an attempt to make a computer
resource unavailable to its intended users. Although the
means to, motives for, and targets of a DoS attack may vary,
it generally consists of the concerted, malevolent efforts of a
person or persons to prevent an Internet site or service from
functioning efficiently or at all, temporarily or indefinitely
by choking the network bandwidth, and/or consuming
computing resources like memory and CPU
Fragment overlap
attack

A TCP/IP Fragmentation Attack is possible because IP
allows packets to be broken down into fragments for more
efficient transport across various media. The TCP packet (and its
header) is carried in the IP packet. In this attack the second
fragment contains an incorrect offset; when the packet is
reconstructed, the port number is overwritten
IPsweep An IPsweep attack is a surveillance sweep to determine
which hosts are listening on a network. This information is
useful to an attacker in staging attacks and searching for
vulnerable machines
Land This is a denial-of-service attack where a remote host is sent
a UDP packet with the same source and destination address and port
Neptune Floods the target machine with SYN requests on one or
more ports, thus causing Denial of service
POD This attack, also known as Ping Of Death, crashes some
older operating system by sending an oversize fragmented IP
packet that reassembles to more than 65,535 bytes, the
maximum allowed by the IP protocol. It is called ping of
death because some older versions of Windows 95 could be
used to launch the attack using ping -l 65510
Smurf This is a distributed network flooding attack initiated by
sending ICMP ECHO REQUEST packets to a broadcast
address with the spoofed source address of the target. The
target is then flooded with ECHO REPLY packets from
every host on the broadcast address
Teardrop This attack reboots the host by sending a fragmented IP
packet that cannot be reassembled because of a gap between
the fragments
UDP storm: An attacker floods the local network by setting up a
loop between an echo server and a client machine (or another
echo server) by sending a UDP packet to one server with the
spoofed source address of the other.
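Several of the attacks above (most directly Land) reduce to simple per-packet signatures that a real time detector can test as soon as the header is parsed. The sketch below is illustrative only and is not the detection method of this thesis; the field names src_ip, dst_ip, src_port and dst_port are assumed to come from an already-parsed packet header.

```python
def is_land_packet(pkt):
    """Return True if a parsed packet matches the Land attack
    signature: identical source and destination address and port.
    `pkt` is assumed to be a dict of pre-parsed header fields."""
    same_addr = pkt.get("src_ip") == pkt.get("dst_ip")
    same_port = pkt.get("src_port") == pkt.get("dst_port")
    return same_addr and same_port

# A spoofed packet whose source equals its destination, and a normal one
spoofed = {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.5",
           "src_port": 139, "dst_port": 139}
normal = {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9",
          "src_port": 40000, "dst_port": 80}
print(is_land_packet(spoofed))  # True
print(is_land_packet(normal))   # False
```

Signature checks of this kind catch only the fixed-pattern attacks in the table; the flooding attacks (Neptune, Smurf, UDP storm) instead require rate-based or statistical features of the traffic.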










LIST OF PUBLICATIONS

1. Naveen N.C, Dr. Srinivasan R., Dr. Natarajan S., Application of Change
Point Outlier Detection Methods in Real Time Intrusion Detection,
International Conference on Advanced Computer Science Applications and
Technologies (ACSAT 2012), Kuala Lumpur, Malaysia, 26-28 Nov 2012,
accepted for publication in IEEE Xplore, 2013
2. Naveen N.C, Dr. Srinivasan R., Dr. Natarajan S., Application of Relevance
Vector Machines in Real Time Intrusion Detection, (IJACSA) International
Journal of Advanced Computer Science and Applications, 3(9), pp 48-53,
2012
3. Naveen N.C, Anisha B.S, Arvind Murthy, A Unified Approach for Outlier
Detection Using Change Point for Intrusion Detection, IFRSA's
International Journal of Computing, 2(3), pp 550-555, July 2012
4. Naveen N.C, Dr. Srinivasan R., Dr. Natarajan S., A Unified Approach for
Real Time Intrusion Detection using Intelligent Data Mining Techniques,
International Journal of Computer Applications (IJCA) Special Issue on
Network Security and Cryptography (NSC), pp 13-17, 2011
5. Naveen N.C, Dr. Srinivasan R., Dr. Natarajan S., Research Direction in
Intrusion Detection, Prevention and Response System - A Survey, IFRSA
International Journal of Data Warehousing & Mining (IIJDWM), 1(1), pp
95-100, Aug 2011



VITAE
Currently working as Associate Professor in the Dept of ISE, R V
College of Engineering, Bangalore.
Responsibilities held in current designation:
Handles subjects for MTech Software Engineering and Information
Technology
Co-ordinator for Placement, NBA and TEQIP of the ISE Department
Event coordinator and organizer for workshops conducted for faculty of
various engineering colleges
Conducted various technical and cultural fests in the department
Plays a major role in college administrative activities
BOS (Board of Studies) and BOE (Board of Examiners) member under the
autonomous scheme
Professional Training
1. Successfully completed training in ORACLE at TULEC, Bangalore
2. Successfully completed training in C, C++ at SPAN
3. Successfully completed training in Java, EJB
4. Successfully completed training on .NET conducted by Microsoft
Industry Exposure
Working as a corporate trainer for induction batches of Wipro, Tata Elxsi,
YAHOO, SAP Labs and Sabre Holdings, and as faculty for MS, BITS Pilani.
Books Publication
Solution Manual for the custom Cryptography and Network Security, 4th
Edition, Pearson, 2011, ISBN 978-81-317-5906-6
Naveen N C
