
2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS)

A Neural-Network Based DDoS Detection System Using Hadoop And HBase

Teng Zhao, Dan Chia-Tien Lo, Kai Qian
The School of Computing and Software Engineering
Kennesaw State University
Marietta, Georgia, USA
tzhao@spsu.edu, clo@spsu.edu, kqian@spsu.edu

Abstract: This paper presents a detection system for Distributed Denial of Service (DDoS) attacks based on a neural network, implemented on an Apache Hadoop cluster and the HBase system. While there are already many approaches to DDoS detection, two main challenges remain: the learning capability of a DDoS detection system and the ability to process a huge unstructured dataset. The main contribution of this paper is a DDoS detection system with the learning capability to adapt to new types of DDoS attacks and the ability to store and analyze a huge unstructured dataset collected from network logs. In particular, a neural network architecture is designed for the DDoS detection system, and a list of training samples is developed to train the neural network. The approach is validated with a series of generated datasets covering different scenarios. The results show that the system with the well-trained neural network is able to detect DDoS attacks efficiently and successfully.

Keywords: DDoS Detection, Neural Networks, Hadoop, HBase.

978-1-4799-8937-9/15 $31.00 © 2015 IEEE
DOI 10.1109/HPCC-CSS-ICESS.2015.38

I. INTRODUCTION
Distributed denial-of-service (DDoS) attacks [1] have become a serious threat to Internet servers. A DDoS attack mainly drains the resources of website servers and stops legitimate customers or website users from accessing the servers. This type of attack may seem trivial on a website with a small traffic load, but it is a big issue if it occurs, for example, on a stock trading website or an e-commerce website, because a one-second delay may cause millions of dollars of losses. A variety of DDoS detection systems have been designed and implemented, but there are still challenges that need to be overcome.
The challenges for a DDoS detection system fall mostly into two aspects. First, while researchers are designing new DDoS detection systems, hackers are also improving their attack techniques. As a result, in order to detect DDoS attacks correctly and efficiently, the system needs the learning capability to adapt to new types of DDoS attacks and detect them. Second, hackers use a distributed system of thousands or millions of zombie computers to start a DDoS attack, which results in a huge unstructured dataset on the server for the detection system to analyze. Thus, the ability to process big data is important for DDoS detection.
In this paper, we implement and validate a neural network based DDoS detection system using Hadoop and HBase. Related work is reviewed in Section II, and the problem formulation is given in Section III. The proposed core system architecture and neural network model are presented in Section IV. Lastly, experimental results and comparisons to previous research are covered in Section V.

II. RELATED WORK
Neural networks have been used in many different types of applications and systems. In general, there are two main research directions: one is to apply neural networks to different types of applications or systems to solve real-world problems; the other is to improve neural network algorithms or models for particular problems. For the first direction, researchers have applied neural networks to autonomous robot navigation, as described in [2]. For the second direction, Wu et al. proposed methods to improve neural network performance in daily flow prediction, as described in [3].
Researchers have proposed several approaches to detect DDoS attacks. Gavrilis and Dermatas proposed a real-time DDoS detection approach using RBF neural networks, which uses a small number of statistical descriptors of incoming packets to describe DDoS attack behavior, as discussed in [4]. A detection approach using an ensemble of neural classifiers is proposed by Kumar and Selvakumar in [5]; it was tested with the KDD Cup dataset, a benchmark dataset used by many researchers. In [6], Alfantookh presented an intelligent approach for detecting DoS (not DDoS) attacks using neural networks, which can detect 91% of all DoS attacks. Stevanovic et al. proposed an approach for detecting malicious visitors using unsupervised neural network learning in [7].
The neural network analysis approach can be categorized as a multi-factor detection approach. There are other multi-factor detection approaches, such as Bayesian networks, which are described in [8]. The difference between Bayesian networks and neural networks is that each node in a Bayesian network represents a meaningful variable, and each edge has a meaning as well. In a neural network, however, only the input layer and the output layer have a meaning; the hidden layer is only used to compute the analysis result.
In this paper, a cloud cluster of multiple computers constructed with Hadoop and HBase is used to process and store a huge unstructured dataset. In addition, a supervised neural network is used, and a specific neural network model is designed to analyze the processed result and accomplish the detection of DDoS attacks. The specifically designed neural network gives the whole system the learning capability to detect different types of DDoS attacks, and the cloud cluster enables the system to process huge network log files and store core data in the HBase NoSQL database.

III. PROBLEM FORMULATION
A PHP website is implemented and hosted on an Apache server as depicted in Fig. 1, and the website is used as a simulation environment. The background is that this website is designed for a startup company, and few people know about it. As a result, the traffic of this website is relatively low on "normal days." In addition, if the founder of the startup company invites other people to visit the website, it will have a lot of traffic, but only on those days, which are referred to as "busy days." A cluster of computers is used to simulate a DDoS attack on this website, and a data collector is installed on the server. Another cluster of computers constructed with Hadoop and HBase imports data from the data collector and analyzes it to provide analytical results about a possible DDoS attack on "normal days," "busy days," and "attack days."

Fig.1. The overall process for the designed scenario.

IV. NEURAL-NETWORK BASED DDOS DETECTION WITH HADOOP AND HBASE
In order to help the robot to move out of the maze autonomously, a neural network is developed to select optimal actions for the robot. In particular, the neural network will accept the sonar measurement data as its input, and select an action for the robot so that it can navigate in the maze successfully. The proposed neural network architecture is shown in Fig. 2.

A. Entire DDoS Detection System Architecture

Fig.2. Entire DDoS detection system architecture.

The whole cluster is composed of two modules: one is the Hadoop & HBase module and the other is the neural network analysis module. The Hadoop & HBase module gives the whole system the ability to process and store a huge unstructured dataset, and the neural network analysis module enables the system to analyze the data and predict whether there is a DDoS attack.

B. Neural Network Architecture

Fig.3. The neural network architecture for DDoS detection system.

In the input layer of this neural network architecture, there are three nodes, which represent the three input parameters of this neural network model, as shown in Fig.3. These input

parameters are obtained from the Hadoop and HBase cluster module and include average CPU usage, average packet size, and total number of TCP connections. These three are the most important factors that change when a DDoS attack occurs. In order to bring these numbers into the form of the training samples' inputs, a baseline value is recorded during the experiments and is used to convert the raw parameters into the proper format before they are passed into the neural network. For example, assuming the baseline value is 1000 and the actual value is 1100, the converted value is the relative deviation (1100 - 1000) / 1000 = 0.1.
In order to give this neural network its learning, analyzing, and predicting capability, the design of the hidden layer is extremely important. Based on past experience of other researchers, the number of hidden layer nodes is calculated with the formula m = 2n + 1, in which n is the number of input layer nodes. As shown in Fig.3, the number of nodes in the input layer is 3, so the number of nodes in the hidden layer is determined to be 7.
The output layer has two nodes: one is the possibility of "normal," and the other is the possibility of "attack." The system can tell whether the server is under a DDoS attack by checking the values of these two outputs. The output that is closer to 1 is the detection result.

C. Back Propagation Algorithms

Initialize all $v_{ih}$ and $w_{hj}$ to rand$(-0.01, 0.01)$
Repeat
    For all $(x^t, r^t) \in X$ in random order
        For $h = 1, \ldots, H$
            $z_h \leftarrow \mathrm{sigmoid}(w_h^T x^t)$
        For $i = 1, \ldots, K$
            $y_i = v_i^T z$
        For $i = 1, \ldots, K$
            $\Delta v_i = \eta (r_i^t - y_i^t) z$
        For $h = 1, \ldots, H$
            $\Delta w_h = \eta \left( \sum_i (r_i^t - y_i^t) v_{ih} \right) z_h (1 - z_h) x^t$
        For $i = 1, \ldots, K$
            $v_i \leftarrow v_i + \Delta v_i$
        For $h = 1, \ldots, H$
            $w_h \leftarrow w_h + \Delta w_h$
Until convergence

Fig.4. BP algorithm pseudo code [7].

As shown in Fig.4, the BP algorithm is basically a process that is repeated until stable weights W and weights V are found. If the values of the weights become stable, the neural network is called "well-trained." Whether the weights have become stable is measured by checking whether the adjustment to each weight is below a specified small number, such as 0.0001. According to the pseudo code in Fig.4, this algorithm can be divided into five steps:
1. Initialize the arrays of weights W and weights V. The initial values are usually chosen within a range from -0.01 to 0.01.
2. Train the neural network with the specifically designed training samples. For each sample, steps 3 to 5 are repeated until the values of weights W and weights V are stable.
3. Based on the values of weights W and weights V, calculate the value of each node in the Hidden Layer (H) and the Output Layer (O).
4. In order to adjust weights W and weights V, calculate the back-propagated errors, that is, the adjustments $\Delta v$ and $\Delta w$ of Fig.4.
5. Check the values of $\Delta v$ and $\Delta w$. If they are below a sufficiently small number, the network has become stable and the algorithm can move on to the next training sample.
The whole process finally provides stable weights W and weights V, which can be used to calculate the value of the Output Layer (O) for future input values.

D. Training Samples
Eight training samples are specifically designed to train this neural network model, and they are listed as follows:

No.   CPU usage   Packet size   TCP connections   Attack   Normal
(1)   0           0             0                 0        1
(2)   0           0             1                 0        1
(3)   0           1             0                 0        1
(4)   1           0             1                 1        0
(5)   2           1             2                 1        0
(6)   1           1             0                 1        0
(7)   2           2             1                 1        0
(8)   1           1             1                 1        0

In each sample, the first three integers represent average CPU usage, average packet size, and total number of TCP connections, and the last two digits give the detection result, in the order attack and normal (as the examples below and TABLE I imply). For example, the first training sample teaches the neural network that if all the parameters are low, then it is probably not an attack. The fourth training sample teaches the neural network that if the average CPU usage is high and the number of TCP connections is high, then it is probably an attack.

E. Training Results
In order to determine whether the neural network is "well-trained," three experiments are performed with different numbers of nodes in the Hidden Layer (H). In each graph, the final trend of each curve indicates whether it becomes stable or not. Fig.5 shows the training results when the number of nodes in the Hidden Layer (H) is 7, which is also the number used in the neural network model of the whole detection system. In Fig.5(a) and Fig.5(b), five curves represent the changing trend of weights W and weights V.
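The back-propagation procedure of Fig.4 and the eight training samples of Section IV.D can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the learning rate `eta`, the epoch limit, the random seed, and the bias inputs (a constant 1 prepended to the input and hidden vectors) are assumptions, and the target columns are read in the order (attack, normal). The 3-7-2 layout, the initialization range of -0.01 to 0.01, and the 0.0001 stability threshold come from the text.

```python
import numpy as np

# Eight training samples from Section IV.D: three inputs
# (CPU usage, packet size, TCP connections), then targets (attack, normal).
SAMPLES = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [2, 1, 2, 1, 0],
    [1, 1, 0, 1, 0],
    [2, 2, 1, 1, 0],
    [1, 1, 1, 1, 0],
], dtype=float)
X, R = SAMPLES[:, :3], SAMPLES[:, 3:]


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def train(X, R, hidden=7, eta=0.1, tol=1e-4, max_epochs=2000, seed=0):
    """Online back-propagation for a sigmoid hidden layer and linear outputs."""
    rng = np.random.default_rng(seed)
    n, k = X.shape[1], R.shape[1]
    # Step 1: small random weights; the extra column is an assumed bias weight.
    W = rng.uniform(-0.01, 0.01, size=(hidden, n + 1))  # input -> hidden
    V = rng.uniform(-0.01, 0.01, size=(k, hidden + 1))  # hidden -> output
    for _ in range(max_epochs):
        largest = 0.0
        for t in rng.permutation(len(X)):  # samples in random order
            x = np.append(1.0, X[t])
            z = np.append(1.0, sigmoid(W @ x))   # hidden values
            y = V @ z                            # output values
            err = R[t] - y
            # Both adjustments are computed before either weight array changes.
            dV = eta * np.outer(err, z)
            back = (err @ V[:, 1:]) * z[1:] * (1.0 - z[1:])
            dW = eta * np.outer(back, x)
            V += dV
            W += dW
            largest = max(largest, np.abs(dV).max(), np.abs(dW).max())
        if largest < tol:  # every adjustment in this epoch was below the threshold
            break
    return W, V


def predict(W, V, x):
    """Return the (attack, normal) outputs for one converted input vector."""
    z = np.append(1.0, sigmoid(W @ np.append(1.0, x)))
    return V @ z
```

Calling `W, V = train(X, R)` and then `predict(W, V, np.array([0.0, 0.0, 0.0]))` returns the two output values for an all-low input. Training may stop at the epoch limit rather than at the threshold, so the limit is worth tuning in practice.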

(a) The weight history of weights (V).
(b) The weight history of weights (W).
Fig.5. Weights W and weights V changing trend when the number of nodes in Hidden Layer (H) is 7.

As shown in Fig.6(a) and Fig.6(b), these graphs are the results of the experiment when the number of nodes in the Hidden Layer (H) is 5. Fig.6(a) indicates that weights V become stable. However, in Fig.6(b), weights W are very unstable in the end. As a result, if the number of nodes in the Hidden Layer (H) is too small, it is hard for the neural network to reach stable weights.

(a) The weight history of weights (V).
(b) The weight history of weights (W).
Fig.6. Weights W and weights V changing trend when the number of nodes in Hidden Layer (H) is 5.

As shown in Fig.7(a) and Fig.7(b), these graphs are the results of the experiment when the number of nodes in the Hidden Layer (H) is 15. These two graphs do not approach a constant value in the end, which means the neural network is not really "well-trained." As a result, setting the number of nodes in the Hidden Layer (H) too high also prevents the neural network from becoming "well-trained."

(a) The weight history of weights (V).
(b) The weight history of weights (W).
Fig.7. Weights W and weights V changing trend when the number of nodes in Hidden Layer (H) is 15.
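The stability judgment applied to the weight-history curves in Fig.5 to Fig.7 can be expressed as a small check. The sketch below assumes a hypothetical recording format (one row of weight values per epoch) and a hypothetical `window` of recent epochs to inspect; only the 0.0001 threshold is taken from the text.

```python
import numpy as np


def is_stable(history, tol=1e-4, window=10):
    """Return True if no recorded weight moved by tol or more per epoch
    over the last `window` epochs.

    history: array-like of shape (epochs, n_weights), one snapshot per epoch.
    """
    history = np.asarray(history, dtype=float)
    if len(history) <= window:
        return False  # not enough snapshots to judge the final trend
    recent_steps = np.abs(np.diff(history[-(window + 1):], axis=0))
    return bool(recent_steps.max() < tol)
```

A curve that flattens out, like those described for 7 hidden nodes, passes this check, while a curve that is still moving in the final epochs does not.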

V. EXPERIMENTAL RESULTS
After repeated experiments, baseline values are assumed for CPU usage, the number of connections, and the average packet length. The baseline CPU usage is around 19%, the baseline number of connections is around 210,000, and the baseline average packet length is around 500. These baselines are used when converting the measured values into values that can be used by the neural network.

A. Experiment in Normal Days
As shown in TABLE I, this analysis result is for a network log recorded on normal days. The CPU value is 18.71589, which is converted to -0.0149842; the number of connections is 208143, which is converted to 0.040715; the converted value of average packet length is -0.09. All the converted values are nearly 0, and, as training sample 1 indicates, if all inputs are near 0 the situation is determined to be "Normal." In TABLE I, "Normal" has a value of around 1.0246 and "Attack" has a value of around -0.010, so the value of "Normal" is closer to 1 and the analysis result of the system is "Normal."

TABLE I. Analysis result in HBase of normal days.
Type                     Value
Attack                   -0.01099
Normal                   1.02462
Average CPU Usage        18.71589
Number of Connections    208143.0
Average Packet Length    273.00363

B. Experiment in Attack Days
As shown in TABLE II, this analysis result is for a network log recorded on days when an attack happens. The CPU value is 64.25917, which is quite high, and is converted to 2.40609; the number of connections is 323856, which is converted to 0.61928; the converted value of average packet length is 0.112. It is obvious that the website is under attack because both the CPU usage and the number of connections are high, which corresponds to training sample 5. The "Attack" field has a value of 1.71033 and the "Normal" field has a value of -0.66896, so the attack value is closer to 1, which means the system decides this is an "Attack."

TABLE II. Analysis result in HBase of attack days.
Type                     Value
Attack                   1.71033
Normal                   -0.66896
Average CPU Usage        64.25917
Number of Connections    323856.0
Average Packet Length    556.9996

C. Experiment in Busy Days
As shown in TABLE III, this analysis result is for a network log recorded on busy days. The CPU value is 34.18883, which is a little higher than on normal days, and is converted to 0.577995; the number of connections is 262840.0, which is converted to 0.3142; the converted value of average packet length is 0.108. All the converted values are a bit higher than the values of normal days, but they do not become too high. In this situation, the algorithm provides values that indicate the possibility or trend of "Normal" or "Attack." In TABLE III, since everything becomes a little higher, "Attack" has a value of 0.61429 and "Normal" has a value of 0.40277, which means the current situation is possibly an attack, but it may also be a busy period.

TABLE III. Analysis result in HBase of busy but not attack days.
Type                     Value
Attack                   0.61429
Normal                   0.40277
Average CPU Usage        34.18883
Number of Connections    262840.0
Average Packet Length    554.20276

D. Comparison with Single-Factor Detection Approach
In a single-factor detection approach, the system takes only one factor into consideration when determining whether the server is under attack, for example, only the number of TCP connections. In some situations this may produce correct analysis results, but the following experiment shows that this type of approach cannot produce correct analysis results in certain situations, which is why a multi-factor detection approach is needed, like the neural network based approach proposed in this paper.
The following experiment is conducted in a scenario in which attackers send huge data packets to the server in a short time period, which causes the CPU usage to increase significantly. However, the number of connections stays within the normal range.

TABLE IV. Analysis result in HBase of different type of attack days.
Type                     Value
Attack                   1.32590
Normal                   -0.30765
Average CPU Usage        54.252224
Number of Connections    203030.0
Average Packet Length    1037.6412

With the single-factor detection approach, a threshold value is selected as the boundary between "Attack" and "Normal." As mentioned before, the normal number of connections is around 210,000. Based on the statistics in TABLE IV, the number of connections is 203,030, which is below 210,000. As a result, this scenario is determined to be "Normal" by the single-factor detection approach, which is the wrong result.
With the multi-factor detection approach, the neural network based approach proposed in this paper, as shown in TABLE IV, the "Attack" value is 1.325909 and the "Normal" value is -0.30765, which determines this scenario to be an "Attack," which is correct.
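The baseline conversion and the two decision rules compared above can be sketched in a few lines of Python. This is an illustration under assumptions: the function names are hypothetical, the conversion formula is inferred from the 1000 to 1100 example in Section IV.B, and the single-factor threshold is taken to be the connection baseline of 210,000 quoted in the text. The network outputs are copied from TABLE I to TABLE IV rather than recomputed.

```python
def normalize(actual, baseline):
    """Relative deviation from the baseline (1100 against 1000 gives 0.1)."""
    return (actual - baseline) / baseline


def single_factor_decision(connections, threshold=210_000):
    """Single-factor rule: look only at the number of TCP connections."""
    return "Attack" if connections > threshold else "Normal"


def multi_factor_decision(attack_output, normal_output):
    """Neural-network rule: the output that is closer to 1 is the result."""
    if abs(1.0 - attack_output) < abs(1.0 - normal_output):
        return "Attack"
    return "Normal"


# Values reported in TABLE IV (large packets, normal connection count).
TABLE_IV = {"attack": 1.32590, "normal": -0.30765, "connections": 203030.0}

if __name__ == "__main__":
    print(single_factor_decision(TABLE_IV["connections"]))
    print(multi_factor_decision(TABLE_IV["attack"], TABLE_IV["normal"]))
```

On the TABLE IV values the two rules disagree, which is the point of the comparison: the connection count alone stays under the threshold, while the network outputs place the sample closer to "Attack."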

VI. CONCLUSIONS
A DDoS detection approach based on neural networks using Hadoop and HBase was developed so that the system is able to detect DDoS attacks efficiently. First, a Hadoop and HBase cluster was set up to process a huge unstructured dataset. Then, a neural network model was designed for DDoS detection, and a series of eight training samples was used to train the neural network model, which allows it to detect DDoS attacks. Finally, the trained neural network was tested in three different scenarios, which demonstrates its feasibility.

ACKNOWLEDGMENT
This material is based in part upon work supported by the
National Science Foundation under Grant Numbers
1438858, 1244697, and 1241651. Any opinions, findings,
and conclusions or recommendations expressed in this
material are those of the author(s) and do not necessarily
reflect the views of the National Science Foundation.
REFERENCES
[1] CERT Coordination Center, "Denial of Service Attacks," http://www.cert.org/tech_tips/denial_of_service.html, accessed on Nov. 17, 2014.
[2] D. A. Pomerleau, "Efficient training of artificial neural networks for autonomous navigation," Neural Computation, MIT Press, pp. xxx-xxx, 1991.
[3] C. L. Wu, K. W. Chau, and Y. S. Li, "Methods to improve neural network performance in daily flows prediction," Journal of Hydrology, Elsevier, pp. xxx-xxx, 2009.
[4] D. Gavrilis, E. Dermatas, "Real-time detection of distributed denial-of-service attacks using RBF networks and statistical features," Computer Networks, Vol. 48, Issue 2, pp. 235-245, 2005.
[5] P. A. R. Kumar, S. Selvakumar, "Distributed denial of service attack detection using an ensemble of neural classifier," Computer Communications, Vol. 34, pp. 1328-1341, 2011.
[6] A. A. Alfantookh, "DoS Attacks Intelligent Detection using Neural Networks," Journal of King Saud University - Computer and Information Sciences, Vol. 18, pp. 31-51, 2006.
[7] D. Stevanovic, N. Vlajic, A. An, "Detection of malicious and non-malicious website visitors using unsupervised neural network learning," Applied Soft Computing, Vol. 13, pp. 698-708, 2013.
[8] M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, Martin Hagan, 2002.
[9] V. R, Soc. for Electron. Transactions & Security, Chennai, India ; Raghavan, S.V. ; Ravindran, B.
