Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools
Muhammad Najmi Ahmad Zabidi
International Islamic University Malaysia
MOSC 2012
1/34
About
I am a research grad student at Universiti Teknologi
Windows executables
For the past few years (since 2003), I have been a Subversion (SVN)
committer for the KDE localization project for the Malay language (though I rarely commit now; a new intern is needed to take over :) )
Interconnected machines
Previously less connected, now socialized machines
Brought real problems to the cyberworld
Risks
Types of adversaries
Spam
Annoying
Productivity wasted on unnecessary file deletion
Difficult to find important email (in extreme cases)
Scam
Preys on naive victims
Sounds too good to be true, yet some people still believe it
Organized crime/syndicates... with mules cooperating
Phishing
Similar to a scam, but uses different tactics
More sophisticated, and does not need a mule or physical meetup
Main purpose is to obtain important details, such as online banking details
Malware
Safe to say, it covers trojans, viruses, dialers, rabbits, worms, and rootkits (bundled nowadays)
Has been infecting computers since the 1980s, threat is more
Some manually crafted, some automated
React relatively fast, difficult to trace
Too many (for example, spam), hence too time-consuming
In-house analysis
Given enough expertise, in-house analysis could be useful
Helps maintain reputation, with an in-house group of analysts to handle incidents
Try to minimize costs; use open source tools whenever possible
Categories
Machine Learning
Associated with Artificial Intelligence
Mimics human (brain) learning
Learns through experience
Deals with known and unknown patterns
Overlaps with (or somehow originated from) Data Mining and Pattern Recognition
What to look for?
We look for patterns
In some cases, have a corpus of spam and phishing mails ready
We call these patterns features
Spam/scam
The language being used
Perhaps words like "You have won GBP100,000,000"
reasons
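As a rough illustration of turning the language of a message into features, the sketch below counts a few suspicious phrases in an email body. The phrase list and the currency pattern are assumptions made up for this example, not a vetted spam vocabulary.

```python
import re

# Hypothetical phrase list, chosen only for illustration.
SUSPICIOUS_PHRASES = ["you have won", "lottery", "claim your prize", "urgent"]

def keyword_features(text):
    """Return a dict mapping each feature name to its count in the text."""
    lowered = text.lower()
    features = {phrase: lowered.count(phrase) for phrase in SUSPICIOUS_PHRASES}
    # Currency amounts such as "GBP100,000,000" are another candidate feature.
    features["currency_amount"] = len(
        re.findall(r"\b(?:gbp|usd|eur)\s?\d[\d,]*", lowered)
    )
    return features

sample = "Congratulations! You have won GBP100,000,000. Claim your prize now."
print(keyword_features(sample))
```

The resulting counts can be fed to any of the tools discussed later as a numeric feature vector.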
Phishing mails
Look for URLs
Current efforts, for example by PhishTank, are done by using
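A minimal sketch of looking for URLs in a mail body, using only the Python standard library. The chosen features (raw IP host, number of dots, presence of "@") are assumptions for illustration; they are often treated as warning signs, but this is not PhishTank's method.

```python
import re
from urllib.parse import urlparse

# Simple pattern for http/https links; real mail parsing needs more care.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def url_features(body):
    """Extract URLs from an email body and derive a few simple features."""
    rows = []
    for url in URL_PATTERN.findall(body):
        host = urlparse(url).hostname or ""
        rows.append({
            "url": url,
            "host": host,
            # A raw IP address in place of a domain name looks suspicious.
            "host_is_ip": bool(re.fullmatch(r"\d{1,3}(?:\.\d{1,3}){3}", host)),
            "num_dots": host.count("."),
            "has_at_sign": "@" in url,
        })
    return rows

sample = "Please verify your account at http://192.168.10.5/bank/login.php today."
for row in url_features(sample):
    print(row)
```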
Malware
Researchers tend to look at the Application
Some examples
The datasets
Spam email research has already been around for quite some time
since it is unwanted email. Might as well be categorized as sub-spam
Phishing email samples:
Sample dataset: http://phishtank.com
Feature Selection/Extraction
When analyzing, we're interested in features
What kind of features? Important keywords, strong features
Unimportant features will be phased out as unnecessary
Some features might be redundant
There are algorithms meant for this:
Information Gain
Support Vector Machine (SVM)
others... some may be hybrid algorithms (combining several algorithms together), also known as ensembles
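To make Information Gain concrete, here is a small sketch that scores a feature by how much it reduces the entropy of the class labels. The toy labels and the two binary features (`has_won`, `has_hello`) are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for fv, lab in zip(feature_values, labels) if fv == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Toy data: four mails, two spam and two ham (assumed for illustration).
labels = ["spam", "spam", "ham", "ham"]
has_won = [1, 1, 0, 0]    # separates the classes perfectly
has_hello = [1, 0, 1, 0]  # tells us nothing about the class

print("has_won:", information_gain(has_won, labels))
print("has_hello:", information_gain(has_hello, labels))
```

A feature with a gain near zero, like `has_hello` here, is a candidate to be phased out.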
List of tools
Weka
R language
Octave (as a replacement for Matlab)
Python SciPy with Matplotlib
Figure 2: Weka
Weka
Obtained data are in numbers and visualizations
Need to do some reading on how to interpret them
Test with different algorithms to get the best results
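The idea of testing different algorithms can be sketched outside Weka as well. The Python example below compares two toy classifiers with k-fold cross-validation. Both classifiers and the data set are hypothetical stand-ins invented for this sketch, not Weka algorithms.

```python
from collections import Counter

def majority_classifier(train):
    """Baseline: always predict the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common

def threshold_classifier(train):
    """Predict 'spam' when the feature exceeds the midpoint of the class means."""
    spam = [x for x, label in train if label == "spam"]
    ham = [x for x, label in train if label == "ham"]
    cut = (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2
    return lambda x: "spam" if x > cut else "ham"

def cross_val_accuracy(build, data, k=5):
    """Train on k-1 folds, test on the held-out fold, and pool the accuracy."""
    correct = 0
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        model = build(train)
        correct += sum(model(x) == label for x, label in test)
    return correct / len(data)

# Toy data: (number of suspicious keywords, label), invented for illustration.
data = [(0, "ham"), (5, "spam"), (1, "ham"), (6, "spam"), (0, "ham"),
        (7, "spam"), (2, "ham"), (4, "spam"), (1, "ham"), (5, "spam")]

for name, build in [("majority", majority_classifier),
                    ("threshold", threshold_classifier)]:
    print(name, cross_val_accuracy(build, data))
```

Comparing every candidate against a trivial baseline such as the majority classifier helps show whether a reported accuracy is actually meaningful.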
R language
Not merely a tool, but a language in itself
Usually used by data analysts
Octave
LABoratory)
Works almost the same way as Matlab does
Python Scipy
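#!/usr/bin/env python
"""
Example: simple line plot.
Show how to make and save a simple line plot with labels, title and grid
"""
import numpy
import pylab

# Sample the time axis from 0 to 1 second in 0.01 s steps.
t = numpy.arange(0.0, 1.0 + 0.01, 0.01)
# Cosine wave with a frequency of 2 Hz.
s = numpy.cos(2 * 2 * numpy.pi * t)
pylab.plot(t, s)

# Label the axes, add a title and a grid.
pylab.xlabel('time (s)')
pylab.ylabel('voltage (mV)')
pylab.title('About as simple as it gets, folks')
pylab.grid(True)

# Save the figure to disk, then display it.
pylab.savefig('simple_plot')
pylab.show()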
The flow
Feature Selection, Feature Categorization: Weka, Octave, R
Clustering: Weka, Octave, R
Classification: SciPy, Octave, R
Visualization: SciPy, Octave, R
Conclusion
Malicious/unwanted threats from spam, scam, phishing
organization/enterprise reputation
Get in touch!
http://mypacketstream.blogspot.com
These slides were created with LaTeX Beamer
najmi.zabidi @ gmail.com