As use of the internet grows, cyber security problems grow with it. Attackers carry
out various malicious activities to obtain a victim's sensitive information, and then
use that information for illegal purposes. To address this, we propose a system whose
main objective is to detect phishing emails in data collected by a honeypot. In this
project we design an architecture, built on top of Big Data frameworks, that aims to
mitigate cyber security problems such as phishing.
3.1 Introduction
Security issues become more critical due to factors such as the large volume and
variety of data that may be vulnerable, the diversity of data sources and formats, and
the velocity at which data are generated, typically as high-volume streams.
Enterprises usually collect terabytes of security-relevant data, including network
traffic and software application events, among others. However, well-established
techniques are, most of the time, not scalable and typically produce many false
positives when dealing with large amounts of data, which degrades their efficacy.
To face these emerging problems, big data analytics has attracted the interest
of the security community. The use of big data frameworks for security solutions
presents several benefits, such as the possibility of storing and using large quantities
of security data. Although the analysis of logs, network flows, and system events has
been used in security solutions for several decades, conventional technologies are not
adequate for such long-term, large-scale volumes. In general, traditional
infrastructure keeps the data only for a limited period. Besides that,
traditional techniques are inefficient when performing analytics and complex queries
on large, unstructured datasets, while big data platforms perform these operations
efficiently. In this paper we present an architecture for cybersecurity applications
based on big data frameworks. Our architecture has the capability of collecting data
from different sources, storing, combining, and processing them effectively. For
example, sources such as pcap files and other logs from a honeypot, as well as data
streams collected from blacklist sites, can all be stored in our system.
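To illustrate how data from different sources might be combined, the following is a
minimal sketch and not the project's actual implementation. The record layout
(SecurityRecord), the tab-separated honeypot log format, and the JSON-lines blacklist
feed with "seen" and "url" keys are all assumptions made for this example. Pcap
parsing is left out, since it would need a packet-parsing library.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SecurityRecord:
    """Common shape for records from any source (hypothetical layout)."""
    source: str     # which collector produced the record
    timestamp: str  # kept as the raw string supplied by the source
    payload: str    # raw content to be analysed later


def from_honeypot_log(lines: Iterable[str]) -> Iterator[SecurityRecord]:
    """Parse assumed 'timestamp<TAB>message' honeypot log lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        timestamp, _, message = line.partition("\t")
        yield SecurityRecord("honeypot_log", timestamp, message)


def from_blacklist_feed(lines: Iterable[str]) -> Iterator[SecurityRecord]:
    """Parse an assumed JSON-lines feed with 'seen' and 'url' keys."""
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        yield SecurityRecord("blacklist_feed", entry["seen"], entry["url"])


def combine(*streams: Iterable[SecurityRecord]) -> Iterator[SecurityRecord]:
    """Chain several record streams into one stream for storage or processing."""
    for stream in streams:
        yield from stream
```

Normalising every source into one record type early keeps the storage and processing
stages independent of the individual input formats.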
End User: The end user is the person who operates the system. A user interacting
with the system should be able to understand how it operates.
Technical User: Any technical user will be able to operate the system and interact
with it easily.
Non-Technical User: A non-technical user will also be able to operate the system,
because the GUI will be designed to make interaction easy. Documentation will also
be provided to help the user understand how the system works.
3.2.1 System Features 1: Any project requirement needs to be well thought out,
balanced, and clearly understood by all involved, but perhaps most important is that
requirements are not dropped or compromised halfway through the project. By
definition, a functional requirement specifies something the system should do.
Some of the functional requirements are:
REQ-1: The main requirement is the detection of spam emails in the big data set.
REQ-2: The time the algorithm needs to run successfully and detect the spam emails
should be short.
REQ-3: If invalid input is provided, such as data other than emails, a pop-up
message should appear reporting the invalid input.
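The check behind REQ-3 could look like the sketch below. It is only an illustration:
the function name validate_email_input and the rule that an email must carry "From"
and "Subject" headers are assumptions, and the returned message stands in for the
text that the GUI would show in its pop-up.

```python
from email import message_from_string

# Assumption for this sketch: input counts as an email only if these headers exist.
REQUIRED_HEADERS = ("From", "Subject")


def validate_email_input(raw: str):
    """Return (True, "") for plausible email text, else (False, pop-up message)."""
    if not raw or not raw.strip():
        return False, "Invalid input: the file is empty."
    # The standard-library parser is lenient, so we inspect the parsed headers
    # ourselves instead of relying on it to reject non-email text.
    msg = message_from_string(raw)
    missing = [name for name in REQUIRED_HEADERS if msg[name] is None]
    if missing:
        return False, "Invalid input: not an email (missing " + ", ".join(missing) + ")."
    return True, ""
```

For example, validate_email_input("just some text") returns False together with a
message that names the missing headers, which the GUI can display directly.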
Consistency:
The data provided to the system as input should be managed by the system, and the
project should execute on different types of computers without any
modification of the data.
5.1 Advantage
The spam emails are detected successfully.
Provides security from harmful emails.
Provides accuracy of 99.5%.
Reusable
5.2 Disadvantage
Complexity is average.
Requires more time to find the spam messages because of the large data set.
5.3 Applications
It works like the spam filtering of a Gmail account, but handles a larger volume of input.
It can be used in software companies to detect spam emails.
06. Results
We expect that the designed system will be able to detect phishing/spam emails, and
to do so in less than the stipulated time. The system should work and produce
results at the stated accuracy.
07. Conclusion
The proliferation of data sources and data collecting structures has led to a large
increase in the data available to cyber security experts. To process such large
volumes of data, scalable massive data processing solutions are needed. As
mentioned in the literature survey, the existing work uses the LSH algorithm, which
detects spam emails with an accuracy of 98.1%, and its system complexity is high.
Our system will reduce the complexity while increasing the accuracy to 99.5%, and
spam emails will be detected successfully in less time.
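For readers unfamiliar with LSH, the sketch below shows one common variant, assuming
that LSH here refers to locality-sensitive hashing. The report does not say which
variant the earlier work uses, so the choice of MinHash signatures with banding, the
word-shingle size, and all function names are assumptions made for illustration.
The idea is that near-duplicate emails, such as copies of one spam campaign, land in
the same bucket without every pair of emails being compared.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles of a lower-cased text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=32):
    """MinHash signature: the minimum hash of the set under each seed."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def candidate_pairs(docs, num_hashes=32, bands=16):
    """Return pairs of document ids that share at least one LSH band bucket."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            # Documents whose signatures agree on a whole band share a bucket.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs
```

Using more bands with fewer rows each makes the scheme more willing to pair loosely
similar emails, at the cost of more false candidates that then need an exact check.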
09. Annexure
Annex A:
Glossary