International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conf on Embedded Software
and Systems (ICESS)
parameters are obtained from the Hadoop and HBase cluster module: average CPU usage, average packet size, and total number of TCP connections. These three are the factors that change most when a DDoS attack occurs. To put these numbers into the form of the training samples’ inputs, a baseline value is recorded during experiments, and each raw parameter is converted to its relative deviation from that baseline before being passed to the neural network. For example, if the baseline value is 1000 and the actual value is 1100, the converted value is (1100 - 1000) / 1000 = 0.1.
To give this neural network the capability to learn, analyze, and predict, the design of the hidden layer is extremely important. Based on the past experience of other researchers, the number of hidden layer nodes is calculated with the formula m = 2n + 1, in which n is the number of input layer nodes. As shown in Fig.3, the input layer has 3 nodes, so the hidden layer has 7 nodes.
The output layer has two nodes: one is the possibility of “normal” and the other is the possibility of “attack.” The system tells whether the server is under a DDoS attack by checking the values of these two outputs: the one that is closer to 1 is the detection result.
C. Back Propagation Algorithms
Initialize all $v_{ih}$ and $w_{hj}$ to rand(-0.01, 0.01)
Repeat
    For all $(x^t, r^t) \in X$ in random order
        For $h = 1, \ldots, H$
            $z_h \leftarrow \mathrm{sigmoid}(w_h^T x^t)$
        For $i = 1, \ldots, K$
            $y_i^t = v_i^T z$
        For $i = 1, \ldots, K$
            $\Delta v_i = \eta (r_i^t - y_i^t) z$
        For $h = 1, \ldots, H$
            $\Delta w_h = \eta \left( \sum_i (r_i^t - y_i^t) v_{ih} \right) z_h (1 - z_h) x^t$
        For $i = 1, \ldots, K$
            $v_i \leftarrow v_i + \Delta v_i$
        For $h = 1, \ldots, H$
            $w_h \leftarrow w_h + \Delta w_h$
Until convergence
Fig.4. BP algorithm pseudo code [7].
As shown in Fig.4, the BP algorithm repeats until it finds stable weights W and weights V; once the values of the weights are stable, the neural network is called “well-trained.” Stability is measured by checking whether the adjustment to each weight is below a specified small number, such as 0.0001. According to the pseudo code in Fig.4, this algorithm can be divided into five steps:
1. Initialize the arrays of weights W and weights V. The initial values are usually drawn from the range -0.01 to 0.01.
2. Train the neural network with specifically designed training samples. For each sample, steps 3 to 5 are repeated until the values of weights W and weights V are stable.
3. Based on the values of weights W and weights V, calculate the value of each node in the Hidden Layer (H) and the Output Layer (O).
4. To adjust weights W and weights V, calculate the reverse errors Δv and Δw, which are the weight adjustments in Fig.4.
5. Check the values of Δv and Δw. If they are below a sufficiently small number, the network has become stable and the algorithm can move on to the next training sample.
The whole process finally provides stable weights W and weights V, which can be used to calculate the value of the Output Layer (O) for future input values.
D. Training Samples
Eight training samples are specifically designed to train this neural network model, and they are listed as follows:
(1) 0 0 0 0 1
(2) 0 0 1 0 1
(3) 0 1 0 0 1
(4) 1 0 1 1 0
(5) 2 1 2 1 0
(6) 1 1 0 1 0
(7) 2 2 1 1 0
(8) 1 1 1 1 0
In each sample, the first three integers represent average CPU usage, average packet size, and total number of TCP connections, and the last two digits give the detection result, attack and normal, in that order. For example, the first training sample teaches the neural network that if all the parameters are low, then it is probably not an attack. The fourth training sample teaches the neural network that if the average CPU usage is high and the number of TCP connections is high, then it is probably an attack.
E. Training Results
To determine whether the neural network is “well-trained,” three experiments are performed with different numbers of nodes in the Hidden Layer (H). In each graph, the final trend of each curve indicates whether it becomes stable. Fig.5 shows the training results when the number of nodes in the Hidden Layer (H) is 7, which is also the number used in the neural network model of the whole detection system. In Fig.5(a) and Fig.5(b), there are five curves, which represent the changing trend of weights W and weights V.
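As a rough illustration, the back propagation procedure of Fig.4, applied to the eight training samples of Section D, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the learning rate `eta = 0.1`, the bias terms appended to the input and hidden layers, the epoch limit, and all function names are assumptions, and the two targets are read as (attack, normal) to match the order of the result tables.

```python
import math
import random

# Section D samples: three inputs (CPU usage, packet size, TCP connections)
# and two targets, read here as (attack, normal): an assumption.
SAMPLES = [
    ((0, 0, 0), (0, 1)), ((0, 0, 1), (0, 1)), ((0, 1, 0), (0, 1)),
    ((1, 0, 1), (1, 0)), ((2, 1, 2), (1, 0)), ((1, 1, 0), (1, 0)),
    ((2, 2, 1), (1, 0)), ((1, 1, 1), (1, 0)),
]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, W, V):
    """Return (inputs with bias, hidden values with bias, outputs)."""
    xb = list(x) + [1.0]                      # bias input: an assumption
    z = [sigmoid(sum(w * xi for w, xi in zip(w_h, xb))) for w_h in W]
    zb = z + [1.0]                            # bias hidden unit: an assumption
    y = [sum(v * zh for v, zh in zip(v_i, zb)) for v_i in V]
    return xb, zb, y


def init_weights(n_in, n_out, rng):
    """Step 1: small random weights; hidden layer size m = 2n + 1."""
    m = 2 * n_in + 1
    W = [[rng.uniform(-0.01, 0.01) for _ in range(n_in + 1)] for _ in range(m)]
    V = [[rng.uniform(-0.01, 0.01) for _ in range(m + 1)] for _ in range(n_out)]
    return W, V


def train(samples, W, V, eta=0.1, tol=1e-4, max_epochs=2000, rng=None):
    """Online back propagation; stops early if every adjustment is below tol."""
    rng = rng or random.Random(0)
    order = list(samples)
    for _ in range(max_epochs):
        rng.shuffle(order)
        largest = 0.0
        for x, r in order:
            xb, zb, y = forward(x, W, V)                      # step 3
            err = [ri - yi for ri, yi in zip(r, y)]
            dV = [[eta * e * zh for zh in zb] for e in err]   # step 4
            dW = []
            for h in range(len(W)):
                back = sum(err[i] * V[i][h] for i in range(len(V)))
                grad = eta * back * zb[h] * (1.0 - zb[h])
                dW.append([grad * xi for xi in xb])
            for i, row in enumerate(dV):
                for h, d in enumerate(row):
                    V[i][h] += d
                    largest = max(largest, abs(d))
            for h, row in enumerate(dW):
                for j, d in enumerate(row):
                    W[h][j] += d
                    largest = max(largest, abs(d))
        if largest < tol:                                     # step 5
            break
    return W, V


def squared_error(samples, W, V):
    """Total squared error of the network outputs over all samples."""
    return sum((ri - yi) ** 2
               for x, r in samples
               for ri, yi in zip(r, forward(x, W, V)[2]))
```

A typical use is `W, V = init_weights(3, 2, random.Random(0))` followed by `train(SAMPLES, W, V)`; `forward(x, W, V)[2]` then returns the two output values for a converted input `x`. The sketch checks stability once per pass over the samples rather than per sample, which is a simplification of step 5.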
(a) The weight history of Weights(V). (b) The weight history of Weights(W).
Fig.6. Weights W and Weights V changing trend when number of nodes
in Hidden Layer(H) is 5.
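The baseline conversion that prepares raw measurements for the input layer (baseline 1000, actual value 1100, converted value 0.1) can be sketched as follows. The function name `to_network_input` is hypothetical, and the formula is inferred from that single worked example, so treat it as an assumption about how the conversion is done.

```python
def to_network_input(actual, baseline):
    """Relative deviation from the baseline: (actual - baseline) / baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (actual - baseline) / baseline
```

With this form, a measurement equal to its baseline maps to 0, which is the "low" level used in the training samples, and larger deviations map to proportionally larger inputs.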
V. EXPERIMENTAL RESULTS
After many experiments, a basic assumption is applied to determine the baseline values for CPU usage, the number of connections, and the average packet length. The baseline for CPU usage is around 19%; for the number of connections, around 210,000; for the average packet length, around 500. These assumptions are used when converting the real values to values that can be used by the neural network.
A. Experiment in Normal Days
As shown in TABLE I, this analysis result is for a network log recorded on normal days. The CPU value is 18.71589, which is converted to -0.0149842; the number of connections is 208143, which is converted to 0.040715; the converted value of the average packet length is -0.09. All the converted values are nearly 0 and, as training sample 1 indicates, if all inputs are near 0 the result is “Normal.” In TABLE I, “Normal” has a value of around 1.0246 and “Attack” has a value of around -0.011, so the value of “Normal” is closer to 1 and the analysis result of the system is “Normal.”
TABLE I. Analysis result in HBase of normal days.
Type                     Value
Attack                   -0.01099
Normal                   1.02462
Average CPU Usage        18.71589
Number of Connections    208143.0
Average Packet Length    273.00363
B. Experiment in Attack Days
As shown in TABLE II, this analysis result is for a network log recorded on days when an attack happens. The CPU value is 64.25917, which is high, and is converted to 2.40609; the number of connections is 323856, which is converted to 0.61928; the converted value of the average packet length is 0.112. It is clear that the website is enduring an attack because the CPU usage and the number of connections are both high, which corresponds to training sample 5. The “Attack” field has a value of 1.71033 and the “Normal” field has a value of -0.66896, so the attack value is closer to 1, which means the system decides this is an “Attack.”
TABLE II. Analysis result in HBase of attack days.
Type                     Value
Attack                   1.71033
Normal                   -0.66896
Average CPU Usage        64.25917
Number of Connections    323856.0
Average Packet Length    556.9996
C. Experiment in Busy Days
As shown in TABLE III, this analysis result is for a network log recorded on busy days. The CPU value is 34.18883, which is a little higher than on normal days, and is converted to 0.577995; the number of connections is 262840.0, which is converted to 0.3142; the converted value of the average packet length is 0.108. All the converted values are somewhat higher than on normal days, but they do not become too high. In this situation, the algorithm provides values that indicate the possibility or trend of “Normal” or “Attack.” In Fig.10, since everything becomes a little higher, “Attack” has a value of 0.61429 and “Normal” has a value of 0.40277, which means the current situation is possibly an attack, but it may also be a busy period.
TABLE III. Analysis result in HBase of busy but not attack days.
Type                     Value
Attack                   0.61429
Normal                   0.40277
Average CPU Usage        34.18883
Number of Connections    262840.0
Average Packet Length    554.20276
D. Comparison with Single-Factor Detection Approach
In a single-factor detection approach, the system takes only one factor into consideration when determining whether the server is attacked, for example only the number of TCP connections. In some situations this may produce correct analysis results, but the following experiment shows that this type of approach cannot produce correct analysis results in certain situations, which is why a multi-factor detection approach is needed, such as the neural network based approach proposed in this paper.
The following experiment is conducted in a scenario in which attackers send huge data packets to the server in a short time period, which causes the CPU usage to increase significantly. However, the number of connections stays within the normal range.
TABLE IV. Analysis result in HBase of different type of attack days.
Type                     Value
Attack                   1.32590
Normal                   -0.30765
Average CPU Usage        54.252224
Number of Connections    203030.0
Average Packet Length    1037.6412
With the single-factor detection approach, a threshold value is selected as the boundary between “Attack” and “Normal.” As mentioned before, the normal number of connections is around 210,000. Based on the statistics in TABLE IV, the number of connections is 203,030, which is below 210,000. As a result, this scenario is determined as “Normal” by the single-factor detection approach, which is the wrong result.
With the multi-factor detection approach, the neural network based approach proposed in this paper, as shown in TABLE IV, the “Attack” value is 1.325909 and the “Normal” value is -0.30765, which determines this scenario as “Attack,” which is correct.
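The contrast between the two approaches can be sketched as two small decision functions. This is an illustrative sketch only: the function names are hypothetical, the single-factor rule is assumed to be a simple comparison against the 210,000-connection threshold, and the multi-factor rule only applies the "closer to 1" criterion to the two output values already produced by the trained network.

```python
def single_factor_verdict(connections, threshold=210_000):
    """Flag an attack only when the connection count exceeds the threshold."""
    return "Attack" if connections > threshold else "Normal"


def multi_factor_verdict(attack_score, normal_score):
    """Pick the label whose network output is closer to 1."""
    if abs(1.0 - attack_score) < abs(1.0 - normal_score):
        return "Attack"
    return "Normal"
```

Fed the TABLE IV values, `single_factor_verdict(203030)` yields "Normal" while `multi_factor_verdict(1.325909, -0.30765)` yields "Attack", which mirrors the disagreement described above.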
VI. CONCLUSIONS
A DDoS detection approach based on neural networks
using Hadoop and HBase was developed so that the system
is able to detect DDoS attacks efficiently. First, a Hadoop
and HBase cluster was set up to process a huge unstructured
dataset. Then, a neural network model was designed for
DDoS detection, and a series of eight training samples was
used to train the neural network model, which enables it to
detect DDoS attacks. Finally, the trained neural
network was tested with three different scenarios, which
demonstrates its feasibility.
ACKNOWLEDGMENT
This material is based in part upon work supported by the
National Science Foundation under Grant Numbers
1438858, 1244697, and 1241651. Any opinions, findings,
and conclusions or recommendations expressed in this
material are those of the author(s) and do not necessarily
reflect the views of the National Science Foundation.
REFERENCES
[1] CERT Coordination Center, "Denial of Service Attacks," http://www.cert.org/tech_tips/denial_of_service.html, accessed on Nov. 17, 2014.
[2] D. A. Pomerleau, "Efficient training of artificial neural networks for autonomous navigation," Neural Computation, MIT Press, 1991, pp. xxx-xxx.
[3] C. L. Wu, K. W. Chau, and Y. S. Li, "Methods to improve neural network performance in daily flows prediction," Journal of Hydrology, Elsevier, 2009, pp. xxx-xxx.
[4] D. Gavrilis, E. Dermatas, "Real-time detection of distributed denial-of-service attacks using RBF networks and statistical features," Computer Networks, Vol. 48, Issue 2, pp. 235-245, 2005.
[5] P. A. R. Kumar, S. Selvakumar, "Distributed denial of service attack detection using an ensemble of neural classifier," Computer Communications, Vol. 34, pp. 1328-1341, 2011.
[6] A. A. Alfantookh, "DoS Attacks Intelligent Detection using Neural Networks," Journal of King Saud University - Computer and Information Sciences, Vol. 18, pp. 31-51, 2006.
[7] D. Stevanovic, N. Vlajic, A. An, "Detection of malicious and non-malicious website visitors using unsupervised neural network learning," Applied Soft Computing, Vol. 13, pp. 698-708, 2013.
[8] M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, Martin Hagan, 2002.
[9] V. R, Soc. for Electron. Transactions & Security, Chennai, India ;
Raghavan, S.V. ; Ravindran, B.