
International Journal of Computer Science

and Business Informatics


(IJCSBI.ORG)

ISSN: 1694-2507 (Print) | ISSN: 1694-2108 (Online)
VOL 6, NO 1, OCTOBER 2013
IJCSBI.ORG

Table of Contents: VOL 6, NO 1, OCTOBER 2013

An Efficient Classification Mechanism for Network Intrusion Detection System Based on Data Mining Techniques: A Survey
Subaira A. S. and Anitha P.

Automated Biometric Verification: A Survey on Multimodal Biometrics
Rupali L. Telgad, Almas M. N. Siddiqui and Dr. Prapti D. Deshmukh

Design and Implementation of Intelligence Car Parking Systems
Ogunlere Samson, Maitanmi Olusola and Gregory Onwodi

Intrusion Detection Techniques for Mobile Ad Hoc and Wireless Sensor Networks
Rakesh Sharma, V. A. Athavale and Pinki Sharma

Performance Evaluation of Sentiment Mining Classifiers on Balanced and Imbalanced Dataset
G. Vinodhini and R. M. Chandrasekaran

Demosaicing and Super-resolution for Color Filter Array via Residual Image Reconstruction and Sparse Representation
Jie Yin, Guangling Sun and Xiaofei Zhou

Determining Weight of Known Evaluation Criteria in the Field of Mehr Housing using ANP Approach
Saeed Safari, Mohammad Shojaee, Mohammad Tavakolian and Majid Assarian

Application of the Collaboration Facets of the Reference Model in Design Science Paradigm
Lukasz Ostrowski and Markus Helfert

Personalizing Education News Articles Using Interest Term and Category Based Recommender Approaches
S. Akhilan and S. R. Balasundaram
International Journal of Computer Science and Business Informatics

IJCSBI.ORG

An Efficient Classification
Mechanism for Network Intrusion
Detection System Based on Data
Mining Techniques: A Survey
Subaira A. S.
PG Scholar
Dr. N. G. P. Institute of Technology,
Coimbatore, India

Anitha P.
Assistant Professor
Dr.N. G. P. Institute of Technology
Coimbatore, India

ABSTRACT
In spite of the wide growth of information systems, security has remained a hard-hitting area for computers as well as networks. In information protection, an Intrusion Detection System (IDS) is used to safeguard data confidentiality, integrity and system availability from various types of attacks. Data mining is an efficient technique that can be applied to intrusion detection to ascertain new patterns from massive network data, and it reduces the strain of manually compiling normal and abnormal behaviour patterns. This work reviews the present state of data mining techniques and compares the various techniques used to implement an intrusion detection system, such as Support Vector Machine, Genetic Algorithm, Neural Network, Fuzzy Logic, Bayesian Classifier, K-Nearest Neighbour and Decision Tree algorithms, by highlighting the advantages and disadvantages of each.
Keywords
Classification, Clustering, Intrusion Detection System, Data mining, Anomaly detection,
Misuse Detection
1. INTRODUCTION
In the era of the information society, as network-based computer systems play fundamental roles, they have become targets for intrusions by attackers and criminals. Intrusion prevention techniques such as firewalls, user authentication, information protection and data encryption have failed to completely shield networks and systems from increasingly sophisticated attacks and malware. Intrusion Detection Systems (IDS) are designed to protect computers and networks from various cyber-attacks and viruses. An IDS is a mechanism that monitors network or

ISSN: 1694-2108 | Vol. 6, No. 1. OCTOBER 2013 1


system actions for malicious activities and produces reports to a
management station [1].
Intrusion detection is a significant application area of data mining, aiming to solve the problem of analyzing enormous volumes of data [8]. Using data mining techniques, IDSs build efficient clustering and classification models to distinguish normal behaviour from abnormal behaviour. This study lays a foundation in this field of research and reviews intrusion detection models based on data mining technology.

2. TRADITIONAL INTRUSION DETECTION


There are two types of traditional intrusion detection system
2.1 Anomaly Detection
It refers to detection of abnormal behaviour of a host or network. The system stores features of users' usual behaviour in a database, and then compares a user's present behaviour with that database. If any deviation occurs, the tested data is marked abnormal [6]. The patterns detected are called anomalies; anomalies are also referred to as outliers.
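A minimal sketch of this idea: one numeric feature of a user's usual behaviour (logins per hour, chosen here purely for illustration) is profiled as a mean and standard deviation, and the present observation is flagged when it deviates too far. The feature and the 3-sigma threshold are assumptions, not prescriptions from the survey.

```python
from statistics import mean, stdev

def build_profile(history):
    """Store the user's usual behaviour as (mean, std-dev) of one feature."""
    return mean(history), stdev(history)

def is_anomalous(value, profile, k=3.0):
    """Flag the current observation if it deviates more than k sigmas."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

# Hypothetical history: logins per hour observed for one user.
history = [4, 5, 6, 5, 4, 6, 5, 5]
profile = build_profile(history)
print(is_anomalous(5, profile))    # usual behaviour -> False
print(is_anomalous(50, profile))   # large deviation -> True
```

In a real system the profile would cover many features and be updated over time; the comparison step, however, has exactly this shape.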
2.2 Misuse Detection
The misuse detection approach defines abnormal system behaviour first, and then defines any other behaviour as normal. It assumes that abnormal behaviour is simple to model. It produces a high intrusion detection rate and raises a low percentage of false alarms. However, it fails to discover attacks not already captured in the feature library, so it cannot detect the abundant new attacks [16].
2.3 Host Based IDS
It refers to intrusion detection that takes place on a single host system. It gets audit data from host audit trails and monitors activities such as system integrity, file changes, host-based network traffic, and system logs. If any unlawful change or movement is detected, it alerts the user by a pop-up window, informs the central management server, has the central management server block the movement, or performs a combination of the above three [17]. The judgment is based on the policy that is installed on the local system.
2.4 Network Based IDS
It is used to supervise and investigate network traffic to protect a system from network-based threats. It tries to detect malicious activities such as denial-of-service (DoS) attacks and network traffic attacks. A network-based IDS includes a number of sensors to monitor packet traffic, one or more servers for network management functions, and one or more management consoles for the human interface [18].

2.5 Hybrid Intrusion Detection
The recent trend in intrusion detection is to combine both types, host-based and network-based IDS, to design hybrid systems. A hybrid intrusion detection system offers flexibility and increases the security level. It combines IDS sensor locations and reports attacks that are aimed at particular segments or at the entire network [28].

3. TYPES OF ATTACKS
3.1 DoS Attack
A denial-of-service attack or distributed denial-of-service attack is an effort to make a computer resource unavailable to its intended users [32]. This type of attack slows down or shuts down the system, disrupting the service and denying it to legitimate authorized users. This attack causes high network traffic [15].
3.2 User to Root Attack (U2R)
In this type of attack, the attacker starts with user-level access, for example obtained by password sniffing or a dictionary attack, and finally achieves root access to the system.
3.3 Probing
In this type of attack, an attacker examines a network to gather information or discover well-known vulnerabilities. An attacker who has a record of which machines and services are accessible on a given network can make use of this information to look for weak points.
3.4 Remote to User Attack (R2U)
In this type of attack, an attacker who can send packets to a machine over a network but does not have an account on that machine exploits some vulnerability to gain local access as a user of that machine.
3.5 Eavesdropping attack
Eavesdropping is a network layer attack consisting of capturing packets
from the network transmitted by others' computers and reading the sensitive
information like passwords, session tokens, or any kind of confidential
information.
3.6 Man-In-The-Middle Attack
In this attack, the attacker makes independent connections with the victims and relays messages between them, making them believe that they are talking directly to each other over a private connection, while in fact the entire conversation is controlled by the attacker.
4. DRAWBACKS OF IDS
Intrusion Detection Systems (IDS) have become an important component in security infrastructures as they permit network administrators to identify policy violations. These policy violations range

from outside attackers trying to gain unauthorized access to intruders abusing their access. Current IDSs have a number of considerable drawbacks.
4.1 False Positives
A major problem is the number of false positives an IDS will produce. Developing distinctive signatures is a complicated task, and it is much trickier to pick out a legitimate intrusion attempt if a signature also alerts regularly on valid network activity.
4.2 False Negatives
A false negative occurs when the IDS does not generate an alert while an intrusion is actually taking place. Simply put, if a signature has not been written for a particular exploit, there is a very good chance that the IDS will not detect it.
Table 1: Performance Measure

                    Classified Intrusion     Classified Normal
Actual Intrusion    True Positives (TP)      False Negatives (FN)
Actual Normal       False Positives (FP)     True Negatives (TN)


The Accuracy Rate and False Positive Rate can be computed with the following two formulas:

Accuracy Rate = (TP + TN) / (TP + TN + FP + FN)

False Positive Rate = FP / (FP + TN)
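These two measures can be computed directly from the confusion-matrix counts of Table 1. The sketch below uses made-up counts for a hypothetical evaluation run, purely to show the arithmetic.

```python
def accuracy_rate(tp, tn, fp, fn):
    """Fraction of all test records classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def false_positive_rate(fp, tn):
    """Fraction of normal records wrongly flagged as intrusions."""
    return fp / (fp + tn)

# Hypothetical counts from one evaluation run of 200 records.
tp, fn, fp, tn = 90, 10, 5, 95
print(accuracy_rate(tp, tn, fp, fn))       # 0.925
print(false_positive_rate(fp, tn))         # 0.05
```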
5. DATA MINING ASSISTS IN INTRUSION DETECTION
The central theme of intrusion detection using a data mining approach is to detect security violations in an information system. Data mining can process large amounts of data and discover hidden and ignored information. To detect intrusions, data mining employs processes such as classification, clustering, and regression [3]. It monitors the information system and raises alarms when security violations are found.


Fig. 1[30]: The Data Mining Process of Building ID Models


5.1 Support Vector Machine (SVM)
SVM is a learning method for classification and regression analysis of both linear and nonlinear data. It uses a hypothesis space of linear functions and maps input feature vectors into a higher-dimensional space through some nonlinear mapping [2]. SVM constructs a hyperplane or set of hyperplanes; good separation is achieved by the hyperplane with the largest margin [7][13], which gives the maximal separation between classes. Training an SVM amounts to solving a quadratic optimization problem [4].
In SVM the classifier is created by a linear separating hyperplane, but not every problem can be separated linearly in the original input space. SVM uses a function called a kernel to solve this problem: the kernel turns a nonlinear problem into a linear one by implicitly mapping the data into a feature space. Radial basis functions, polynomials, and two-layer sigmoid neural nets are some of the kernel functions. When training the classifier, the user may provide one of these functions, and the support vectors are selected along the surface of this function. The implementation of SVM tries to accomplish maximum separation between the classes [25]. An intrusion detection system has two phases: training and testing. SVMs can learn a large set of patterns and provide good classification, because the classification difficulty does not depend on the dimensionality of the feature space. SVMs also have the ability to update the training patterns dynamically whenever there is a new pattern during classification [11].
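To make the kernel idea concrete: instead of mapping points into the feature space explicitly, an SVM only ever needs kernel values such as the radial basis function below, which equals an inner product in an implicit feature space. This is a standalone sketch of the RBF kernel computation, not the full SVM training procedure; the gamma value is an illustrative choice.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2).
    Acts as the inner product of x and y in an implicit feature space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Identical points have maximal similarity 1.0; distant points decay toward 0.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))   # 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # exp(-12.5), nearly 0
```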
5.2 Genetic Algorithms
Genetic algorithms were initially introduced in the field of computational biology. Since then they have bloomed into various fields with promising results [24]. Nowadays researchers have tried to incorporate this algorithm into IDSs. Using a genetic approach, in 1995

Giordana and Neri proposed an intrusion detection algorithm called REGAL. The REGAL system is based on a distributed genetic algorithm. REGAL is a concept learning system that learns First Order Logic multi-modal concept descriptions. The learning examples are stored in a relational database and represented as relational tuples.
Gonzalez and Dasgupta [26] applied a genetic algorithm, though they examined host-based IDSs, not network-based ones. They used the algorithm only for the meta-learning step instead of running it directly on the feature set, relying on statistical classifiers for labelled vectors. A 2-bit binary encoding is used to express the abnormality of a particular feature, ranging from normal to abnormal. Chittur [27] used a genetic algorithm with a decision tree, where the decision tree is used to represent the data. They achieved a high detection rate while keeping the false positive rate low; false positive occurrences were minimized by utilizing human input in a feedback loop [10].
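None of the GA-based systems above is specified in enough detail to reproduce here, but the core genetic loop they all share (selection, crossover, and mutation over bit strings, as in the 2-bit encodings mentioned) can be sketched. The fitness function below is a toy one-max objective standing in for a real detection-rate score; population size, chromosome length, and rates are illustrative.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def fitness(chrom):
    # Toy objective: count of 1-bits (a stand-in for a detection-rate score).
    return sum(chrom)

def evolve(pop_size=20, length=16, generations=40, mut_rate=0.05):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Tournament selection: best of 3 random individuals.
            return max(random.sample(pop, 3), key=fitness)
        nxt = []
        while len(nxt) < pop_size:
            a, b = pick(), pick()
            cut = random.randrange(1, length)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < mut_rate) for bit in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))  # close to the maximum of 16
```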
5.3 K-nearest Neighbour
K-Nearest Neighbour (k-NN) is a type of lazy learning: it simply stores the given training tuples and waits until it is given a test tuple. It is an instance-based learner that classifies objects based on the closest training examples in the feature space. For a given unknown tuple, k-NN searches the pattern space for the k training tuples that are closest to the unknown tuple. It is among the simplest of all machine learning algorithms. The object is classified by a majority vote of its neighbours; only in the case of k = 1 is the object simply assigned to the class of its single nearest neighbour. As its model of the target function the algorithm uses all labelled training instances, and it obtains its hypothesis through similarity-based search. Intrusions are detected by combining this technique with statistical schemes. The technique is computationally expensive, requires efficient storage, and benefits from implementation on parallel hardware.
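The majority-vote idea above can be sketched in a few lines over labelled feature vectors. The two-feature connection records (duration, bytes) and k = 3 are illustrative assumptions, not a real intrusion dataset.

```python
from collections import Counter

def knn_classify(query, training, k=3):
    """training: list of (feature_vector, label) pairs.
    Returns the majority label among the k closest training tuples."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    nearest = sorted(training, key=lambda rec: sq_dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical connection records: (duration, bytes) scaled to [0, 10].
training = [
    ((1.0, 1.0), "normal"), ((1.2, 0.8), "normal"), ((0.9, 1.1), "normal"),
    ((8.0, 9.0), "intrusion"), ((9.0, 8.5), "intrusion"),
]
print(knn_classify((1.1, 1.0), training))  # "normal"
print(knn_classify((8.5, 9.0), training))  # "intrusion"
```

Note that every query scans all stored tuples, which is exactly the storage and speed cost the survey attributes to this method.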
5.4 Neural Networks
The term neural network traditionally referred to a network of biological neurons. In [20], neural networks have been used for both anomaly and misuse intrusion detection. In anomaly intrusion detection, the neural networks were modelled to recognize statistically significant variations from the users' established behaviour and to identify the typical characteristics of system users. In misuse intrusion detection, the neural network collects data from the network stream and analyses it for instances of misuse [22]. Misuse intrusion detection with neural networks can be implemented in two ways. The first approach incorporates the neural network component into an existing expert system: the neural network sorts the incoming data for suspicious events and forwards them to the expert system. This improves the efficiency of the detection

system. The second method uses a standalone misuse detection system: it receives data from the network stream and analyses it for misuse intrusion. It has the ability to learn the characteristics of misuse attacks and to identify instances unlike any that have been observed before, and it recognizes known suspicious events with a high degree of accuracy. In general, neural networks are used to learn complex nonlinear input-output relationships [12].
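As a minimal, self-contained illustration of how a network adapts its weights from data (far simpler than the multi-layer models used in the cited IDS work), a single perceptron can learn the linearly separable AND function; the learning rate and epoch count are illustrative choices.

```python
def step(x):
    return 1 if x >= 0 else 0

def train_perceptron(samples, epochs=10, lr=0.1):
    """samples: list of ((x1, x2), target). Returns learned (weights, bias)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = step(w[0] * x1 + w[1] * x2 + b)
            err = target - pred            # perceptron learning rule
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([step(w[0] * x1 + w[1] * x2 + b) for (x1, x2), _ in AND])  # [0, 0, 0, 1]
```

The structure (weights adjusted by the error flowing through the network) is the same adaptation principle the comparison table below attributes to neural networks.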
5.5 Bayesian Classifier
A Bayesian classifier provides high accuracy and speed when handling large databases. As a network model, a Bayesian classifier encodes the probabilistic relationships among the variables of interest. In intrusion detection this classifier is combined with statistical schemes to encode interdependencies between the variables and to predict events. It is a graphical model of causal relationships, defined by two components: a directed acyclic graph (DAG) and a set of conditional probability tables. Each node of the DAG represents a random variable, which may be discrete or continuous. For each variable the classifier maintains one conditional probability table (CPT), which requires considerable computational effort.
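The prediction step can be sketched with Bayes' rule under a naive conditional-independence assumption over two binary connection features. The feature names, priors, and likelihoods below are made-up illustrative values, not estimates from real audit data.

```python
def naive_bayes_posterior(features, priors, likelihoods):
    """P(class | features) under naive independence:
    score = P(class) * prod_i P(feature_i | class), then normalize.
    features: dict of observed binary features (0 or 1).
    likelihoods[cls][feat] = P(feat = 1 | cls)."""
    scores = {}
    for cls, prior in priors.items():
        p = prior
        for feat, value in features.items():
            p_true = likelihoods[cls][feat]
            p *= p_true if value else (1.0 - p_true)
        scores[cls] = p
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

# Hypothetical estimates that might come from labelled audit records.
priors = {"normal": 0.9, "intrusion": 0.1}
likelihoods = {
    "normal":    {"many_failed_logins": 0.01, "odd_hour": 0.2},
    "intrusion": {"many_failed_logins": 0.7,  "odd_hour": 0.6},
}
post = naive_bayes_posterior({"many_failed_logins": 1, "odd_hour": 1},
                             priors, likelihoods)
print(max(post, key=post.get))  # "intrusion"
```

A full Bayesian network would replace the independence assumption with the CPTs attached to each node of the DAG.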
5.6 Decision Tree
Decision trees are a classification technique in data mining for predictive models. A decision tree is a flowchart-like tree structure where each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label. The model is learned inductively from a pre-classified data set, where each data item is defined by its attribute values. Initially the decision tree is constructed from this set of pre-classified data. The key step is to select the attributes that best divide the data items into their respective classes; based on these attributes the data items are partitioned [5].
This process is applied iteratively to each partitioned subset of the data items, and terminates when all the data items in the current subset belong to the same class. Each node has a number of outgoing edges, each labelled with a possible value of the attribute tested at that node and connecting the node to a child node. Leaves are always labelled with a decision value for classification of the data [21]. To classify an unseen object, the process starts at the root of the decision tree and follows the branches. Decision trees can be used for misuse intrusion detection: they learn a model from the training data and predict the various types of attacks in future data. They work well with large data sets. Decision tree models can also be used in rule-based techniques with minimal processing, and they provide high generalization accuracy [9].
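The attribute-selection step described above is commonly scored by information gain, i.e. the entropy reduction a split achieves. The sketch below computes it over a tiny hypothetical labelled set; the attribute names and records are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(records, attr):
    """records: list of (attribute_dict, label). Entropy reduction from
    splitting the records on the values of `attr`."""
    labels = [label for _, label in records]
    gain = entropy(labels)
    for value in {attrs[attr] for attrs, _ in records}:
        subset = [label for attrs, label in records if attrs[attr] == value]
        gain -= (len(subset) / len(records)) * entropy(subset)
    return gain

# Hypothetical pre-classified records.
data = [
    ({"flag_sf": 1, "odd_hour": 0}, "normal"),
    ({"flag_sf": 1, "odd_hour": 1}, "normal"),
    ({"flag_sf": 0, "odd_hour": 0}, "intrusion"),
    ({"flag_sf": 0, "odd_hour": 1}, "intrusion"),
]
print(information_gain(data, "flag_sf"))   # 1.0: splits the classes perfectly
print(information_gain(data, "odd_hour"))  # 0.0: uninformative split
```

The tree-building loop simply picks the highest-gain attribute at each node and recurses on the partitions.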

5.7 Fuzzy Logic
Fuzzy logic is derived from fuzzy set theory and uses rule-based systems for classification. Fuzzy logic can be thought of as the application side of fuzzy set theory, dealing with well-thought-out real-world expert values for a complex problem [29]. Fuzzy data mining techniques are used to extract behaviour patterns: sets of fuzzy association rules are mined from the network audit data to model normal behaviour and to detect anomalous behaviour [30][31]. The audit data and the mined normal data are compared to measure their similarity; if the similarity value falls below a threshold, an alarm is raised [14].
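A minimal sketch of the fuzzy-threshold idea: a ramp-shaped membership function maps a crisp similarity score into a degree of "normal", and an alarm is raised when that degree drops below a cut-off. The shape parameters and cut-off are illustrative assumptions, not values from the cited work.

```python
def membership_normal(similarity, low=0.4, high=0.7):
    """Degree (0..1) to which a similarity score counts as 'normal':
    0 below `low`, 1 above `high`, linear ramp in between."""
    if similarity <= low:
        return 0.0
    if similarity >= high:
        return 1.0
    return (similarity - low) / (high - low)

def raise_alarm(similarity, cutoff=0.5):
    """Alarm when the fuzzy degree of 'normal' falls below the cut-off."""
    return membership_normal(similarity) < cutoff

print(raise_alarm(0.9))   # high similarity to the mined normal profile -> False
print(raise_alarm(0.3))   # low similarity -> True
```

Unlike a crisp threshold on the raw score, the ramp lets borderline scores be treated as partially normal, which is the "fuzzy thresholds" advantage listed in the comparison table.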
6. A COMPARATIVE ANALYSIS OF DATA MINING
TECHNIQUES FOR INTRUSION DETECTION SYSTEM
TABLE 2. GENERAL CLASSIFIER COMPARISON

Support Vector Machine
Method: A classification and regression technique that constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space.
Advantages: 1. High accuracy. 2. Able to model complex and nonlinear decision boundaries. 3. Less prone to overfitting than other methods.
Disadvantages: 1. High algorithmic complexity and extensive memory requirements. 2. The choice of the kernel is difficult. 3. Training and testing speed is slow.

Genetic Algorithm
Method: Learning examples are stored in a relational database and represented as relational tuples.
Advantages: 1. Solves every optimisation problem. 2. Solves problems with multiple solutions. 3. Easily transferred to existing models.
Disadvantages: 1. No guaranteed global optimum. 2. No constant optimisation response time.

K-Nearest Neighbour
Method: An object is classified by the majority vote of its neighbours, being assigned to the class most common amongst its k nearest neighbours. If k = 1, the object is simply assigned to the class of its nearest neighbour.
Advantages: 1. Analytically tractable. 2. Simple to implement. 3. Highly adaptive behaviour. 4. Easy to parallelize.
Disadvantages: 1. High storage requirements. 2. Highly susceptible to the curse of dimensionality. 3. Slow in classifying and testing tuples.

Neural Network
Method: An adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.
Advantages: 1. Requires less formal statistical training. 2. Implicitly detects complex nonlinear relationships between dependent and independent variables. 3. Highly tolerant of noisy data. 4. Availability of multiple training algorithms.
Disadvantages: 1. Black-box process. 2. Greater computational burden. 3. Prone to overfitting. 4. Requires long training time.

Bayesian Method
Method: A classifier based on Bayes' rule; it uses the joint probabilities of sample classes and observations and estimates the conditional probabilities of classes given an observation.
Advantages: 1. The naive Bayesian classifier simplifies the computations. 2. Exhibits high accuracy and speed when applied to large databases.
Disadvantages: 1. Relies on the class-conditional independence assumption. 2. Lack of available probability data.

Decision Tree
Method: Builds a tree for classification. Each node represents a binary predicate on one attribute; one branch represents the positive instances of the predicate and the other branch the negative instances.
Advantages: 1. Construction does not require domain knowledge. 2. Can handle high-dimensional data. 3. The representation is easy to understand. 4. Able to process both numerical and categorical data.
Disadvantages: 1. The output attribute must be categorical. 2. Limited to one output attribute. 3. Decision tree algorithms are unstable. 4. Trees created from numeric datasets can be complex.

Fuzzy Logic
Method: Fuzzy logic has been used for both anomaly and misuse intrusion detection.
Advantages: 1. Uses linguistic variables. 2. Allows imprecise inputs. 3. Permits fuzzy thresholds. 4. Reconciles conflicting objectives. 5. The rule base or fuzzy sets are easily modified.
Disadvantages: 1. Hard to develop a model from a fuzzy system. 2. Requires more fine-tuning and simulation before becoming operational.
7. CONCLUSIONS
In this paper, many data mining techniques proposed to improve the classification mechanism of network intrusion detection have been reviewed. Different classifiers bring different knowledge to the problem, so combining more than one data mining algorithm can remove the demerits of each, and an ensemble of trained classifiers leads to performance superior to any single classifier. Combining SVM with a genetic algorithm brings advantages in accuracy rate and optimization results; likewise, combining the k-nearest neighbour approach with a decision tree produces faster classification and handles high-dimensional data. Overall, these techniques provide better intrusion detection accuracy and faster running time, and they fragment a complex problem into sub-problems whose solutions are simpler to realize, execute, supervise and update.

REFERENCES
[1] W. Lee, S.J. Stolfo, K.W. Mok, A data mining framework for building intrusion detection models, in: Proceedings of IEEE Symposium on Security and Privacy, 1999, pp. 120-132.
[2] W. Feng, Q. Zhang, G. Hu, J. Xiangji Huang, Mining network data for intrusion detection through combining SVMs with ant colony networks, Future Generation Computer Systems, 2013.
[3] T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: an efficient data clustering method for very large databases, in: Proceedings of SIGMOD, ACM, 1996, pp. 103-114.
[4] L. Khan, M. Awad, B. Thuraisingham, A new intrusion detection system using support vector machines and hierarchical clustering, The VLDB Journal 16 (2007) 507-521.
[5] X. Xu, Adaptive intrusion detection based on machine learning: feature extraction, classifier construction and sequential pattern prediction, Information Assurance and Security 4 (2006) 237-246.
[6] J.X. Huang, J. Miao, B. He, High performance query expansion using adaptive co-training, Information Processing & Management 49 (2) (2013) 441-453.
[7] Y. Liu, X. Yu, J.X. Huang, A. An, Combining integrated sampling with SVM ensembles for learning from imbalanced datasets, Information Processing & Management 47 (4) (2011) 617-631.
[8] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 1999.
[9] F. Marcelloni, Combining supervised and unsupervised learning for data clustering, Neural Computing & Applications 15 (3-4) (2006) 289-297.
[10] C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin, W.-Y. Lin, Intrusion detection by machine learning: a review, Expert Systems with Applications 36 (2009) 11994-12000.
[11] S.X. Wu, W. Banzhaf, The use of computational intelligence in intrusion detection systems: a review, Applied Soft Computing 10 (2010) 1-35.
[12] H. Brahmi, I. Brahmi, S.B. Yahia, OMC-IDS: at the cross-roads of OLAP mining and intrusion detection, in: Advances in Knowledge Discovery and Data Mining, LNCS, vol. 7302, 2012, pp. 13-24.
[13] S.-J. Horng, M.-Y. Su, Y.-H. Chen, T.-W. Kao, R.-J. Chen, J.-L. Lai, C.D. Perkasa, A novel intrusion detection system based on hierarchical clustering and support vector machines, Expert Systems with Applications 38 (2011) 306-313.
[14] Q. Zhang, G. Hu, W. Feng, Design and performance evaluation of a machine learning-based method for intrusion detection, in: Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, Studies in Computational Intelligence, vol. 295, Springer, 2010, pp. 69-83.
[15] T.A. Longstaff, J.T. Ellis, S.V. Hernan, H.F. Lipson, R.D. McMillan, L.H. Pazente, D. Simmel, Security of the Internet, in: F. Froehlich, A. Kent (Eds.), The Froehlich/Kent Encyclopedia of Telecommunications, Vol. 15, Marcel Dekker, 1998, pp. 231-254.
[16] S. Axelsson, Research in intrusion detection systems: a survey, Tech. Rep. TR98-17, Chalmers University of Technology, Goteborg, Sweden, 2000.
[17] S. Freeman, J. Branch, Host-based intrusion detection using user signatures, in: Proceedings of the Research Conference, RPI, 2002.

[18] D. Marchette, A statistical method for profiling network traffic, in: Proceedings of Workshop on Intrusion Detection and Network Monitoring, 1999, pp. 119-128.
[19] T.F. Lunt, A survey of intrusion detection techniques, Computers and Security 12 (4) (1993) 405-418.
[20] J. Ryan, M.-J. Lin, R. Miikkulainen, Intrusion detection with neural networks, in: Proceedings of AAAI-97 Workshop on AI Approaches to Fraud Detection and Task Management, 1997, pp. 92-97.
[21] H. Teng, K. Chen, S. Lu, Security audit trail analysis using inductively generated predictive rules, in: Proceedings of the 6th Conference on Artificial Intelligence Applications, Vol. 1, 1990, pp. 24-29.
[22] D.E. Denning, An intrusion-detection model, IEEE Transactions on Software Engineering 13 (2) (1987) 222-232.
[23] F. Monrose, A. Rubin, Authentication via keystroke dynamics, in: Proceedings of the 4th ACM Conference on Computer and Communications Security, 1997.
[24] F. Neri, Comparing local search with respect to genetic evolution to detect intrusion in computer networks, in: Proc. of the 2000 Congress on Evolutionary Computation (CEC00), La Jolla, CA, IEEE Press, 16-19 July 2000, pp. 238-243.
[25] F. Neri, Mining TCP/IP traffic for network intrusion detection, in: R.L. de Mantaras, E. Plaza (Eds.), Proc. of Machine Learning: ECML 2000, 11th European Conference on Machine Learning, Lecture Notes in Computer Science, vol. 1810, Barcelona, Spain, Springer, May 31-June 2, 2000, pp. 313-322.
[26] D. Dasgupta, F.A. Gonzalez, An intelligent decision support system for intrusion detection and response, in: Proc. of International Workshop on Mathematical Methods, Models and Architectures for Computer Networks Security (MMM-ACNS), St. Petersburg, Springer-Verlag, 21-23 May 2001.
[27] A. Chittur, Model generation for an intrusion detection system using genetic algorithms, High School Honors Thesis, Ossining High School, in cooperation with Columbia Univ., 2001.
[28] M. Crosbie, E.H. Spafford, Active defense of a computer system using autonomous agents, Technical Report CSD-TR-95-008, Purdue Univ., West Lafayette, IN, 15 February 1995.
[29] G.J. Klir, Fuzzy arithmetic with requisite constraints, Fuzzy Sets and Systems 91 (1997) 165-175.
[30] http://wenke.gtisc.gatech.edu/project/image004.gif
[31] J. Luo, Integrating fuzzy logic with data mining methods for intrusion detection, Master's thesis, Mississippi State Univ., 1999.
[32] C. Douligeris, A. Mitrokotsa, DDoS attacks and defense mechanisms: classification and state-of-the-art, Computer Networks 44 (5) (2004) 643-666.


Automated Biometric Verification:
A Survey on Multimodal Biometrics

Rupali L. Telgad
Research student, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad

Almas M. N. Siddiqui
Research student, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad

Dr. Prapti D. Deshmukh
Principal, MGM's Dr. G.Y.P.C.C.S. and I.T., Aurangabad

ABSTRACT
In the world of computer science and information technology, security is an essential and important issue. Identification and authentication techniques play an important role in dealing with security and integrity. Human physical characteristics like fingerprints, face, hand geometry, voice and iris are known as biometrics. These features are used to provide authentication for computer-based security systems. Biometric verification refers to the automatic verification of a person based on specific biometric features derived from his/her physiological and/or behavioural characteristics. Biometrics is the science and technology of measuring and analyzing biological data of the human body, extracting a feature set from the acquired data, and comparing this set against the template set in the database. The future of biometrics seems to belong to multimodal biometrics (biometric systems using more than one biometric feature), as a unimodal biometric system (one using a single biometric feature) has to contend with a number of problems. In this paper, a survey of some multimodal biometric systems is conducted.
Keywords
Biometrics, Unimodal Biometrics, Multimodal Biometrics, Verification, Identification,
Recognition.

1. INTRODUCTION
The term biometric comes from the Greek words bios, which means life, and metrikos, which means measure. It is well known that humans intuitively use body characteristics such as face, gait or voice to recognize each other, and a wide variety of applications require reliable verification schemes that confirm the identity of an individual by recognizing humans on the basis of their characteristics [1]. These characteristics include voice, fingerprints, body contours, retina and iris, face, soft biometrics, etc.

Biometric systems based on single source of information are called


Unimodal systems [2]. Unimodal biometric system has some limitations of
these Systems are considered when deploying with the real World
applications. Some of the challenges encountered by these systems are

noise in sensed data, intra-class variations, inter-class similarities, non-universality and spoof attacks [3]. Multibiometric (multimodal) systems seek to alleviate some of the drawbacks encountered by unimodal biometric systems by consolidating the evidence presented by multiple biometric traits or sources [4]. A biometric system can also be designed to recognize a person based on information acquired from multiple biometric sources; such a system is known as a multimodal biometric system. A multibiometric system can offer substantial improvement in the matching accuracy of a biometric system, depending upon the information being combined and the fusion methodology adopted. It addresses the issue of non-universality, or insufficient population coverage, and it effectively addresses the problem of noisy data. These systems also help in continuous monitoring and tracking of individuals in situations where a single trait is not sufficient. Fusion schemes are employed to combine the information presented by the multiple biometric sources. Various data-combination levels can be considered, for example the feature level, score level and decision level [5].

This paper proceeds as follows. Section 2 introduces unimodal biometrics and their limitations, Section 3 presents a literature survey of multimodal biometrics, Section 4 describes multimodal biometrics in more detail, Section 5 reviews fusion methods, and Section 6 concludes.
2. UNIMODAL BIOMETRICS
Biometric systems based on a single source of information are called unimodal biometric systems. A unimodal biometric system considers a single biometric trait and relies on the evidence of that single source of information to authenticate a person. Though unimodal biometric systems have many advantages, they face a variety of problems, such as the following [6]:

2.1 NOISY DATA
The susceptibility of biometric sensors to noise leads to inaccurate matching, as noisy data may lead to false rejection.
2.2 INTRA CLASS VARIATION
The biometric data acquired during verification will not be identical to the data used to generate the template during enrollment for an individual. This is known as intra-class variation. Large intra-class variations increase the False Rejection Rate (FRR) of a biometric system.

2.3 INTERCLASS SIMILARITIES


Inter-class similarity refers to the overlap of feature spaces corresponding to
multiple individuals. Large inter-class similarities increase the False
Acceptance Rate (FAR) of a biometric system.
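The interplay between these two error rates can be illustrated with a small score-thresholding sketch. All score values below are made up for illustration; they are not measurements from any system in this survey.

```python
# FRR counts genuine attempts wrongly rejected (driven by intra-class
# variation); FAR counts impostor attempts wrongly accepted (driven by
# inter-class similarity). Scores here are illustrative.

def far_frr(genuine_scores, impostor_scores, threshold):
    """Accept a claim when its match score reaches the threshold."""
    # FAR: fraction of impostor attempts wrongly accepted.
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    # FRR: fraction of genuine attempts wrongly rejected.
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

genuine = [0.91, 0.85, 0.78, 0.88, 0.52, 0.95]   # same-user comparisons
impostor = [0.20, 0.35, 0.55, 0.41, 0.70, 0.15]  # different-user comparisons

far, frr = far_frr(genuine, impostor, threshold=0.60)
print(far, frr)
```

Raising the threshold lowers FAR but raises FRR, which is why both rates must be reported together.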

2.4 NON UNIVERSALITY
Some persons cannot provide the required standalone biometric, owing to
illness or disabilities [7].

2.5 SPOOFING
Unimodal biometrics is vulnerable to spoofing, where the biometric data can be imitated.
3. LITERATURE SURVEY ON MULTIMODAL BIOMETRICS
The authors of [11] proposed a multimodal biometric system using fingerprint and iris features. They use a hybrid approach based on 1) fingerprint minutiae extraction and 2) iris template encoding through a mathematical representation of the extracted iris region. This approach was based on two recognition modalities, and each part provided its own decision. The final decision was taken by combining the unimodal decisions through an AND operator. No experimental results on recognition performance have been reported.
The authors of [12] proposed a multimodal biometric system using fingerprint and face. They use the Scale Invariant Feature Transform (SIFT), fingerprint verification based on minutiae matching, and feature-level fusion for recognition. The paper presents a multimodal biometric system based on the integration of face and fingerprint traits at the feature-extraction level. Both fingerprint and face images are processed with compatible feature-extraction algorithms to obtain comparable features from the raw data. The literature claims that an ensemble of classifiers operating on uncorrelated features increases performance in comparison to correlated features.
The authors of [13] proposed a multimodal biometric system using ear and face. They use the Iterative Closest Point (ICP) algorithm, local 3D features, and PCA. The approach is based on local 3D features, which are very fast to compute and robust to pose and scale variations and to occlusions due to hair and earrings. An expression-robust multimodal ear-face biometric recognition approach with fusion at the score level is proposed in this paper.
The authors of [14] proposed a multimodal biometric system using fingerprint and iris. They use PCA (Principal Component Analysis) and FLD (Fisher Linear Discriminant) for biometric recognition. The paper compares the Borda count method with the logistic regression method. The comparison shows that rank-level fusion with the logistic regression approach provides better performance in terms of error rate and increases the recognition rate of multibiometric systems, because in this approach weights are assigned to different matchers according to their performance.

The authors of [15] proposed a multimodal biometric system using the face alone. They use PCA (Principal Component Analysis) together with a Reduced Multivariate Polynomial Model (RMPM) to develop the system. The paper presents a stereo face recognition formulation that combines appearance and depth at the feature level. An RMPM was adopted to fuse the appearance and disparity images, and it is extended so that the problem of new-user registration can be overcome. The face recognition approach is useful for some online applications such as visitor identification.
The authors of [16] proposed a multimodal biometric system using face and finger veins, with LDA as the methodology. The paper presents a multimodal recognition system that fuses low-resolution face and finger-vein biometrics at the score level. The proposed system is very efficient, reducing the FAR to 0.000026 and increasing the GAR to 97.4%. Its drawback is the extra processing required for the feature spaces.
The authors of [17] proposed a multimodal biometric system using fingerprint and voice. They use the Leave-One-Out Cross Validation (LOOCV) technique and a Gaussian mixture model for score-level fusion. The paper implements an optimum reliability-ratio-based integration weight optimization scheme for the fingerprint and voice modalities, and the performance of the system is evaluated under different noise conditions. One drawback of this method is that under extreme noise conditions it gives attenuating fusion.
The authors of [18] proposed a multimodal biometric system using fingerprint and finger vein. They use the modified Hausdorff distance (MHD) algorithm as well as minutiae extraction and matching based on a ternary vector. The paper proposes a score-level fusion system based on fingerprint and finger vein, with recognition experiments conducted on a homologous multimodal biometrics database.
The authors of [19] proposed a multimodal palm print system using rank-level fusion. They investigated rank-level combination for palm print matchers using four different approaches, i.e., Borda count, weighted Borda count, highest rank and product of ranks, and Bucklin majority voting, and also proposed a new nonlinear approach for combining the ranks. The experimental results in this paper suggest that considerable improvement in recognition accuracy can be achieved from rank-level combinations compared to individual palm print representations.
The authors of [20] proposed a multimodal biometric system using fingerprint and face, with a normalization method and an adaptive method. The paper studies the performance of multimodal biometric authentication systems using state-of-the-art Commercial Off-the-Shelf (COTS) fingerprint and face biometric matchers on a population approaching 1,000 individuals, larger than in most earlier evaluations.
The authors of [21] proposed a multimodal biometric system using face and signature with a score-level fusion technique. The performance of single-modality biometric recognition suffers from noisy data, non-universality of biometric data, and susceptibility to spoofing; a multimodal biometric system can improve the performance of the system. The paper shows that a face-and-signature bimodal biometric system can improve the accuracy rate by about 10% over a single face- or signature-based biometric system.

4. MULTIMODAL BIOMETRICS
Noisy data, intra-class variation, inter-class similarities, non-universality, spoofing and similar problems affect unimodal biometric systems and tend to increase the False Acceptance Rate (FAR) and False Rejection Rate (FRR), ultimately degrading system performance. Some of the limitations of unimodal biometrics can be overcome by using multiple sources of information to establish the identity of a person [7]. Multimodal biometrics refers to the use of a combination of two or more biometric modalities in a verification or identification system. Such systems address the problem of non-universality, since multiple traits ensure sufficient population coverage [8].

Multimodal biometrics also addresses the problem of spoofing: since it concerns multiple traits or modalities, it would be very difficult for an imposter to spoof or attack multiple traits of a genuine user simultaneously. Multimodal biometric systems have the potential to be widely adopted in a very broad range of civilian applications: banking security such as ATM security, check cashing and credit card transactions, and information system security such as access to databases via login privileges. A decision made by a multimodal biometric system is either a "genuine individual" type of decision or an "imposter" type of decision. In principle, the Genuine Acceptance Rate (GAR), False Rejection Rate (FRR), False Acceptance Rate (FAR) and Equal Error Rate (EER) are used to measure the accuracy of the system. Generally, multimodal biometrics operates in two phases, the enrollment phase and the authentication phase, described as follows [9]:

4.1 ENROLLMENT PHASE


In the enrollment phase, the biometric traits of a user are captured and stored in the system database as a template for that user, which is then used in the authentication phase.

4.2 AUTHENTICATION PHASE
In the authentication phase, the traits of a user are captured once again and the system uses them to either identify or verify the person. Identification is one-to-many matching, comparing the captured data with the templates of all users in the database, while verification is one-to-one matching, comparing the captured data with the template of the claimed identity only [7].
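The distinction between the two matching modes can be sketched as follows. The matcher, the template database and the 0.8 threshold are all illustrative stand-ins, not taken from any system in this survey.

```python
# Verification (1:1) versus identification (1:N) against an enrolled
# template database. match_score() is a toy similarity measure.

def match_score(features, template):
    """Toy similarity score in (0, 1]: higher means a closer match."""
    diff = sum(abs(a - b) for a, b in zip(features, template)) / len(template)
    return 1.0 / (1.0 + diff)

def verify(features, claimed_id, database, threshold=0.8):
    """Verification, one-to-one: compare only against the claimed identity."""
    return match_score(features, database[claimed_id]) >= threshold

def identify(features, database, threshold=0.8):
    """Identification, one-to-many: compare against every enrolled template."""
    best_id = max(database, key=lambda uid: match_score(features, database[uid]))
    if match_score(features, database[best_id]) >= threshold:
        return best_id
    return None  # no enrolled identity matches closely enough

database = {"alice": [0.20, 0.40, 0.90], "bob": [0.70, 0.10, 0.30]}
probe = [0.21, 0.42, 0.88]  # captured features, close to alice's template

print(verify(probe, "alice", database))  # True: the claim is confirmed
print(identify(probe, database))         # "alice": best match over all users
```

Note that identification must score the probe against every template, so it is inherently more expensive than verification as the database grows.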

5. METHODS FOR MULTIMODAL FUSION


The fusion methods are divided into three categories: rule-based methods, classification-based methods, and estimation-based methods [10]. This categorization is based on the basic nature of these methods and inherently reflects the classification of the problem space: for example, a problem of estimating parameters is solved by estimation-based methods. Similarly, the problem of reaching a decision based on certain observations can be solved by classification-based or rule-based methods. However, if the observations are obtained from different modalities, the method requires fusion of the observation scores before estimating or making a classification
decision.

5.1 RULE-BASED FUSION METHODS

The rule-based fusion methods include a variety of basic rules for combining multimodal information. These include statistical rule-based methods such as linear weighted fusion (sum and product), MAX, MIN, AND, OR, and majority voting, as well as custom-defined rules constructed for a specific application. The rule-based schemes generally perform well if the quality of temporal alignment between different modalities is good.
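The statistical rules listed above can be sketched over normalized match scores. One score per modality is assumed, already in [0, 1]; the score values and the 0.65 vote threshold are illustrative only.

```python
# Minimal rule-based score fusion: sum, product, MAX, MIN and majority vote.

def sum_rule(scores, weights=None):
    """Linear weighted fusion (sum); equal weights by default."""
    weights = weights or [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))

def product_rule(scores):
    """Product rule: multiply the per-modality scores."""
    result = 1.0
    for s in scores:
        result *= s
    return result

def max_rule(scores):
    return max(scores)

def min_rule(scores):
    return min(scores)

def majority_vote(decisions):
    """Decision-level rule: accept when more than half the matchers accept."""
    return sum(decisions) > len(decisions) / 2

scores = [0.9, 0.6, 0.7]  # e.g. face, fingerprint and iris matcher outputs
fused_sum = sum_rule(scores)
fused_prod = product_rule(scores)
accepted = majority_vote([s >= 0.65 for s in scores])  # 2 of 3 accept
print(fused_sum, fused_prod, max_rule(scores), min_rule(scores), accepted)
```

The product rule penalizes any low score heavily, while the sum rule is more forgiving; which behaves better depends on how reliable the individual matchers are.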

5.2 CLASSIFICATION-BASED FUSION METHODS


This category of methods includes a range of classification techniques that
have been used to classify the multimodal observation into one of the pre-
defined classes. The methods in this category are the support vector

machine, Bayesian inference, Dempster-Shafer theory, dynamic Bayesian
networks, neural networks and maximum entropy model. Note that we can
further classify these methods as generative and discriminative models from
the machine learning perspective. For example, Bayesian inference and
dynamic Bayesian networks are generative models, while support vector
machine and neural networks are discriminative models.
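As a concrete instance of the generative side of this category, the vector of matcher scores for one claim can be treated as a feature vector and classified with a naive Gaussian model. The training score vectors below are made up for illustration.

```python
import math

# Classification-based fusion with a hand-rolled naive Gaussian classifier:
# fit per-dimension Gaussians to "genuine" and "impostor" score vectors,
# then assign a probe to the class with the higher log-likelihood.

def fit_gaussian(vectors):
    """Per-dimension mean and variance for one class of score vectors."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(d)]
    variances = [sum((v[i] - means[i]) ** 2 for v in vectors) / n + 1e-6
                 for i in range(d)]  # small floor avoids zero variance
    return means, variances

def log_likelihood(vector, model):
    means, variances = model
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
               for x, m, var in zip(vector, means, variances))

genuine_train = [[0.90, 0.80], [0.85, 0.90], [0.80, 0.75]]
impostor_train = [[0.30, 0.40], [0.20, 0.25], [0.40, 0.30]]

gen_model = fit_gaussian(genuine_train)
imp_model = fit_gaussian(impostor_train)

probe = [0.82, 0.78]  # e.g. face and fingerprint scores for one claim
is_genuine = log_likelihood(probe, gen_model) > log_likelihood(probe, imp_model)
print(is_genuine)  # the probe lies in the genuine cluster
```

A discriminative model such as an SVM would instead learn the decision boundary between the two clusters directly, without modelling each class distribution.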

5.3 ESTIMATION-BASED FUSION METHODS


The estimation category includes the Kalman filter, extended Kalman filter and particle filter fusion methods. These methods have been primarily used to better estimate the state of a moving object on the basis of multimodal data. For example, for the task of object tracking, multiple modalities such as audio and video are fused to estimate the position of the object.
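A minimal one-dimensional Kalman filter illustrates this kind of fusion: two modalities deliver noisy measurements of the same position and are folded into one estimate. All numbers below are illustrative, not taken from any cited system.

```python
# 1-D Kalman filter sketch of estimation-based fusion: audio and video
# localization of a (roughly stationary) speaker are fused sequentially,
# with the noisier modality given a larger measurement variance.

def kalman_update(x, p, z, r):
    """Fuse one measurement z (noise variance r) into estimate x (variance p)."""
    k = p / (p + r)        # Kalman gain: trust the measurement more when r is small
    x = x + k * (z - x)    # corrected estimate
    p = (1.0 - k) * p      # uncertainty shrinks after each fusion step
    return x, p

x, p = 0.0, 1.0            # initial guess of the position and its variance
q = 0.01                   # process noise added at each prediction step

audio = [2.3, 1.8, 2.2, 1.7]    # noisier audio measurements (true position 2.0)
video = [2.05, 1.95, 2.1, 1.9]  # cleaner video measurements

for za, zv in zip(audio, video):
    p += q                                 # predict: position assumed static
    x, p = kalman_update(x, p, za, r=0.5)  # fuse the audio reading
    x, p = kalman_update(x, p, zv, r=0.1)  # fuse the video reading

print(round(x, 2))  # the fused estimate settles near the true position
```

Because the video readings are assigned a smaller variance, the filter weights them more heavily; an extended Kalman filter or particle filter generalizes the same idea to nonlinear motion models.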

6. CONCLUSIONS
Unimodal biometric systems fail when proper biometric data for a particular trait is lacking, so it is more robust to use multiple biometrics for authentication. We have observed that multimodal biometrics overcomes the problems associated with unimodal biometrics, such as noisy data, inter-class similarities, intra-class variation, non-universality and spoofing. Many multimodal biometric systems exist for the authentication of a person, but the selection of appropriate modalities, the choice of the optimal fusion level, and redundancy in the extracted features remain challenges in designing a multimodal biometric system.

7. ACKNOWLEDGMENTS
We are thankful to our Guide Dr. P. D. Deshmukh for providing valuable
guidance and technical support.

REFERENCES
[1] Neena Godbole, Information Security System, Wiley Publication.
[2] V. M. Mane and D. V. Jadhav, Review of Multimodal Biometrics: Applications, Challenges and Research Areas, International Journal of Biometrics and Bioinformatics (IJBB), Volume 3, Issue 5.
[3] Arun A. Ross, Karthik Nandakumar and Anil K. Jain, Handbook of Multibiometrics, Springer International Edition.
[4] Arun Ross and Rohin Govindarajan, Feature Level Fusion Using Hand and Face Biometrics, in Proc. of SPIE Conference on Biometric Technology for Human Identification II.
[5] Fortuna, J., Sivakumaran, P., Ariyaeeinia, A. and Malegaonkar, A., Relative effectiveness of score normalization methods in open-set speaker identification, Proc. IEEE Speaker and Language Recognition Workshop (Odyssey'04), pp. 369-376, 2004.
[6] P. S. Sanjekar and J. B. Patil, An Overview of Multimodal Biometrics, Signal & Image Processing: An International Journal (SIPIJ), Vol. 4, No. 1, February 2013.

[7] M. Golfarelli, D. Maio and D. Maltoni, On the Error-Reject Tradeoff in Biometric Verification Systems, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 786-796, July 1997.
[8] A. Ross and A. Jain, Information Fusion in Biometrics, Pattern Recognition Letters, vol. 24, pp. 2115-2125, 2003.
[9] V. Mane and D. Jadhav, Review of Multimodal Biometrics: Applications, Challenges and Research Areas, International Journal of Biometrics and Bioinformatics (IJBB), vol. 3, no. 5, pp. 90-95, 2009.
[10] Dapinder Kaur and Gaganpreet Kaur, Level of Fusion in Multimodal Biometrics: a Review, International Journal of Advanced Research in Computer Science and Software Engineering, 3(2), February 2013.
[11] F. Besbes, H. Trichili and B. Solaiman, Multimodal biometric system based on fingerprint identification and iris recognition, in Proc. 3rd Int. IEEE Conf. Inf. Commun. Technol.: From Theory to Applications (ICTTA 2008), pp. 1-5. DOI: 10.1109/ICTTA.2008.4530129.
[12] A. Rattani, D. R. Kisku, M. Bicego and M. Tistarelli, Feature Level Fusion of Face and Fingerprint Biometrics.
[13] S. M. S. Islam, M. Bennamoun, A. S. Mian and R. Davies, Score Level Fusion of Ear and Face Local 3D Features for Fast and Expression-Invariant Human Recognition, ICIAR 2009, LNCS 5627, pp. 387-396, Springer-Verlag Berlin Heidelberg, 2009.
[14] N. Radha and A. Kavitha, Rank Level Fusion Using Fingerprint and Iris Biometric, Indian Journal of Computer Science and Engineering (IJCSE), ISSN: 0976-5166, Vol. 2, No. 6, Dec 2011-Jan 2012.
[15] Jian-Gang Wang, Kar-Ann Toh, Eric Sung and Wei-Yun Yau, A Feature-level Fusion of Appearance and Passive Depth Information for Face Recognition, in Face Recognition, book edited by Kresimir Delac and Mislav Grgic, ISBN 978-3-902613-03-5, pp. 558, I-Tech, Vienna, Austria, June 2007.
[16] Muhammad Imran Razzak, Muhammad Khurram Khan, Khaled Alghathbar and Rubiyah Yusof, Multimodal Biometric Recognition Based on Fusion of Low Resolution Face and Finger Veins, International Journal of Innovative Computing, Information and Control, ISSN 1349-4198, Volume 7, Number 8, pp. 4679-4689, August 2011.
[17] Anzar S. M. and Sathidevi P. S., Optimal Score Level Fusion using Modalities Reliability and Separability Measures, International Journal of Computer Applications (0975-8887), Volume 51, No. 16, August 2012.
[18] Feifei Cui and Gongping Yang, Score Level Fusion of Fingerprint and Finger Vein Recognition, Journal of Computational Information Systems, 7:16 (2011), pp. 5723-5731.
[19] Ajay Kumar and Sumit Shekhar, Personal Identification Using Multibiometrics Rank-Level Fusion, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews.
[20] Robert Snelick, Umut Uludag, Alan Mink, Michael Indovina and Anil Jain, Large Scale Evaluation of Multimodal Biometric Authentication Using State-of-the-Art Systems, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 3, pp. 450-455, March 2005.
[21] Kazi M. M., Rode Y. S., Dabhade S. B., Al-dawla N. N. H., Mane A. V., Manza R. R. and Kale K. V., Multimodal Biometric System Using Face And Signature: A Score Level Fusion Approach, Advances in Computational Research, ISSN: 0975-3273 & E-ISSN: 0975-9085, Volume 4, Issue 1, 2012.


Design and Implementation of


Intelligence Car Parking Systems
Ogunlere Samson
Computer Science, Babcock University,
Ilisan Remo, Ogun State, Nigeria

Maitanmi Olusola
Computer Science, Babcock University,
Ilisan Remo, Ogun State, Nigeria

Gregory Onwodi
School of Science & Technology,
National Open University of Nigeria

ABSTRACT
The Intelligent Car Park System is a system designed to prevent problems usually associated with car parks. This study covers a wireless transmitter and receiver, which constitute the sensor. The principle of the electromagnetic field is employed: an inductor is wound and buried under the entry and exit posts of the parking garage as a sensor to detect the metal under a vehicle; once the vehicle passes across the search coil, the parking garage door automatically opens and closes again after a few minutes. This is aimed at solving congestion, indiscriminate parking and the problem of locating empty parking lots. Other significant contributions of the paper are to eliminate the need for manual operations, to make life easier and more secure for car owners, and to eradicate human inconsistencies. It also examines in detail how the automatic gate system works, so that the concepts involved can be incorporated into an intelligent car parking system.

Keywords
Car parking systems, electromagnetic field.

1. INTRODUCTION
For over thirty years, traffic information has been provided to help motorists make en-route decisions. The development of Intelligent Transportation Systems (ITS) and Advanced Traffic Management Systems (ATMS) has begun to improve transportation through the use of technology. Along the same lines, systems like Intelligent Vehicle Highway Systems (IVHS) acquire, analyse, communicate, and present information to assist surface-transportation travellers moving from a starting location to their desired destination. Data from IVHS can now be utilized as information for en-route assistance as well as for the collection of traffic data. Information technology is beginning to recognize the importance of post-trip information dissemination by providing information on the location and


availability of parking. Real-time information can be accurately provided to motorists through Intelligent Parking Systems (IPS) to reduce congestion in or near parking areas, insufficient utilization of the available parking space stock, road congestion caused by space-searching traffic, access problems, and safety hazards caused by illegal parking and environmental strains [1], [5], [2]. Recently, people have found parking easy where the parking system is fully automatic [2], [3]. This complexity calls for better management of the parking system, which involves technical improvement of the system used. Additionally, users of parking systems have commented that a bigger parking space means they have to spend more time finding a parking area [3].

1.2 Specific Objectives
1. To design and implement an intelligent car parking system for both domestic and official environments.
2. To alert customers when the parking garage is filled, so that vehicles are not allowed into the garage until spaces are available.

2 LITERATURE REVIEW OF SIMILAR PROJECTS

2.1 Parking Guidance in Tapiola
Difficulty in finding an unreserved parking space is a major challenge in developing countries. Research by [4] opined that the insufficiency of car parks is not the only problem; so is the lack of information about the location of available spaces. The searching traffic caused by this lack of information can in some cases be estimated to account for as much as 20 to 30% of the total traffic in cities. The unnecessary traffic generated by searching for vacant parking spaces aggravates congestion on streets and increases traffic volumes. The general purpose of parking guidance is to guide the driver to a suitable parking space along a suitable route and thus reduce searching traffic. The paper outlines the research done by [1] in Tapiola, Finland in 1992. The system consists of a series of gate-arm counters and induction-loop detectors located at the entrances and exits of the car parks, which count the vehicles going in and out. This data is processed in car park counting units and sent to the control center. The central computer assesses the overall situation and decides what to display on each of the 19 changeable message signs on the streets. With this data available, drivers are aware of parking vacancies and locations, thereby reducing traffic congestion.
2.2 Vehicle Arrival/Departure Management System Using Remote
Bar Code Readers


Research carried out by [2] discusses the development of a remote bar code reader-based management system for arriving and departing vehicles. The system, designed for use at a substation or similar facility, scans a bar code sticker on the windshield of an approaching vehicle and sends signals to open a motor-driven entrance gate, while at the same time automatically recording the type of vehicle, license plate number, vehicle-owner's name, time of arrival/departure and other relevant data.
2.3 Siespace Car Park Control and Information Systems
Siemens' car park control and information system, SIESPACE, guides the motorist to vacant parking areas. SIESPACE is a modular system designed for use in an urban network where a choice of car parking is available. A computer-based instation collates information and determines the appropriate messages to be displayed on guidance signs.
2.4 Saint Paul Advanced Parking Information System
Developments in Minnesota (1995) show that the Advanced Parking Information System (APIS) was deployed in late 1995 and early 1996 as a test case under the Minnesota Guidestar Program. Minnesota Guidestar provides overall direction for the Minnesota Department of Transportation (MnDOT); it is a program that provides a focus for strategic planning, project management and evaluation. The APIS itself was utilized in the downtown Saint Paul area to inform drivers of parking location and availability so that they would have the opportunity to make advance decisions, hopefully helping reduce congestion and pollution. The goal of MnDOT is to use the system continuously; however, the APIS was only deployed during events in which the Civic Center attracted more than 2,000 visitors, the Music Center attracted more than 1,000 visitors, or a combination of events occurred simultaneously.

The system operates using loop detectors, ticket splitters or cash registers as vehicle-counting equipment located at each garage or lot. A controller interface is also required, since the equipment is not capable of calculating a space-availability number. This interface counts and calculates space availability as each car enters or exits the lot in real time. This number is then transmitted via modem to a central computer at the city. The central computer then sends the required signal along to the variable message signs with the help of MnDOT's Microsoft-based Ramp Management Software. The operator controls the system from the central computer and, at any time, has the ability to read and modify sign messages, correct parameters, check the state of an entire mounted electronic sign mast or parking facility, and take appropriate action as required [5].


2.5 Security Assured Car Park


Derbyshire-based Parksafe Systems Ltd is the first car park operator in the UK to guarantee the safety of vehicles and their contents. The system was installed in Derby City Council's Bold Lane car park under a partnership agreement. Operations started in January 1998, and the company claims, with justification, to have created the most secure public car park system. In two years of operation there has not been a single theft of either a vehicle or its contents. This contrasts with seven vehicles stolen and 71 broken into during the year ended December 1997.

Bold Lane car park is an edge-of-city-centre multi-storey car park with 440 spaces. It has automatic entry and exit gates, individual bay sensors, secure pedestrian entry points, sophisticated closed-circuit television (CCTV), and panic buttons every 15 metres on the parking decks and in the stairwells. The strengths of the Parksafe system are that not only is casual access prevented, but also that it does not rely on the vigilance of its attendants to monitor CCTV cameras. In addition to cameras in stairwells and pedestrian walkways, there is one covering every four parking spaces. However, the relevant camera is only switched on, and an attendant alerted, if a bay sensor is activated or a panic button is pushed. In that event, attendants are able to observe, record and take appropriate action.

3 METHODOLOGY

Circuit design and Analysis


The methodology employed in this project is experimental, using the devices and components described below.

3.1 Principle of Operation


The principle of operation is based on a sensor which detects a car entering or exiting the building. The garage door automatically opens once the vehicle moves towards the building gate, and closes after the car has passed through. Three displays are used for monitoring ENTRY/EXIT. The sensor is installed at the door posts of the parking garage. The project senses the metal underneath the car: the sensor is buried and installed underground, and the metal is detected using the beat-frequency technique, whereby an inductor serves as a search coil forming part of an inductor-capacitor (LC) network. The LC network is part of a Colpitts oscillator stage, which oscillates at a constant frequency. The field strength of the coil reduces as metal enters the field, which consequently tends to dampen the oscillations and also drift the frequency. The output is fed to a peak detector (which senses the dc level) and comparators with a variable reference for sensitivity


adjust. Once metal enters the field, the output of the comparator goes LOW and triggers a monostable stage for a time (T), during which two operations are achieved.
1. A monostable multivibrator clocks the counter to count up for ENTRY, while the EXIT counter counts down as cars leave. An independent counter is configured to count UP/DOWN to monitor both entry and exit; this counter displays the NET count, which is the number of cars still left inside a particular car lot or garage.
2. The switching of the sliding door is achieved using a logic gate and a timer to control the opening and closing of the door. Once the metal under the car is detected, two monostables are triggered: one is designed to activate the switching circuit responsible for opening the gate, while the other later activates the switching circuit responsible for closing the gate. Once the full capacity of the garage is reached, the garage gate automatically denies entry to other cars coming in; trucks and bikes outside the set range of capacities are also disallowed.
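The counting and gate-control behaviour described in these two steps can be sketched in software. The capacity value and message strings below are illustrative stand-ins, not part of the hardware design.

```python
# Entry/exit counting logic: an up/down counter tracks the NET number of
# cars inside, and entry is denied once the garage reaches capacity.

class CarParkController:
    def __init__(self, capacity):
        self.capacity = capacity
        self.count = 0  # NET count: cars currently inside

    def vehicle_at_entry(self):
        """Entry sensor triggered: open the gate only if space remains."""
        if self.count >= self.capacity:
            return "ENTRY DENIED - GARAGE FULL"
        self.count += 1  # count UP on entry
        return "GATE OPEN"

    def vehicle_at_exit(self):
        """Exit sensor triggered: always open the gate, count DOWN."""
        if self.count > 0:
            self.count -= 1
        return "GATE OPEN"

garage = CarParkController(capacity=2)
print(garage.vehicle_at_entry())  # GATE OPEN (1 car inside)
print(garage.vehicle_at_entry())  # GATE OPEN (2 cars inside, now full)
print(garage.vehicle_at_entry())  # ENTRY DENIED - GARAGE FULL
garage.vehicle_at_exit()          # one car leaves
print(garage.vehicle_at_entry())  # GATE OPEN again
```

In the hardware this same logic is realized by the monostable-clocked UP/DOWN counter and the comparator-driven switching circuits rather than by software.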

This project employed the principle of an AC-to-DC converter for the power supply unit, from which two different voltages were derived: 5V DC and 12V DC. The 5V supplies the entire circuit, while the 12V powers the relays and the auto-reverse DC motor.

3.2 Oscillator Stage Search Coil


The oscillator stage is designed around a Colpitts oscillator in which the search coil is the inductor L. Fig. 1 shows the oscillator circuit. The oscillator stage generates a varying output when metal is in the field.
Figure 1. Colpitts oscillator stage. Source: [5]


For the Colpitts oscillator stage the frequency of oscillation fo is given by:
fo = 1 / (2π√(LC))
Where:


fo = 1 kHz = 1,000 Hz
C = C1C2 / (C1 + C2)
C1 = 100 nF = 1 × 10⁻⁷ F
C2 = 220 nF = 2.2 × 10⁻⁷ F
L = ?
C = [(1 × 10⁻⁷) × (2.2 × 10⁻⁷)] / [(1 × 10⁻⁷) + (2.2 × 10⁻⁷)]
Therefore,
C = 6.875 × 10⁻⁸ F (or 68.75 nF)
Transposing for L and substituting all the values in fo:
fo = 1 / (2π√(LC))
L = 1 / [(2πfo)² × C]
L = 1 / [(2 × 3.14 × 1,000)² × 6.875 × 10⁻⁸]
L = 0.368814519 H (or 368.8 mH), the required inductance.
To get a uniform field, a toroid has to be designed to realize this inductance of 368.8 mH.
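The tank calculation above can be checked numerically. This sketch uses fo = 1 kHz, the value actually substituted in the derivation, and math.pi rather than the 3.14 approximation used in the text:

```python
import math

# Series combination of C1 and C2 sets the Colpitts tank capacitance.
C1, C2 = 100e-9, 220e-9           # farads (values from the text)
C = C1 * C2 / (C1 + C2)           # = 68.75 nF

fo = 1000.0                       # hertz, the value substituted in the text
# fo = 1 / (2*pi*sqrt(L*C))  =>  L = 1 / ((2*pi*fo)**2 * C)
L = 1 / ((2 * math.pi * fo) ** 2 * C)

print(round(C * 1e9, 2), "nF")    # 68.75 nF
print(round(L * 1e3, 1), "mH")    # ~368.4 mH (text gets 368.8 mH with pi = 3.14)
```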

Fig 2 shows the search coil (toroid).

Figure 2. Search coil (toroid). Source: [5]


The inductance is L = (μ0AN²) / Lo.
Therefore, N² = (Lo × L) / (μ0 × A).
The number of turns to give the desired inductance can then be calculated.
Where: μ0 is the permeability of free space (H m⁻¹)
A is the cross-sectional area (m²)
N is the number of turns of the winding
Lo is the length of the loop
Since L = 368.8 mH
and μ0 = 4π × 10⁻⁷ H m⁻¹ (a constant)
For Lo = 15 cm = 0.15 m loop length


and A = πr²
where d = 1 cm = 2r
therefore, r = 0.5 cm
Thus r² = (0.5)²
A = π × (0.5)²
A = (22/7) × 0.25 = 0.79 cm²
From the equation,
L = (μ0AN²) / Lo
Substituting all the values,
0.368814519 = [((4π × 10⁻⁷) × 0.79 × N²) / 0.15]
Therefore, N² = (Lo × L) / (μ0 × A)
Therefore, N = 236 turns.
Hence winding 236 turns for the inductor will give the required inductance.
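The turns calculation can be reproduced as follows. Note that this sketch substitutes A = 0.79 exactly as the text does (i.e., the cm² value is used directly, matching the paper's arithmetic), which yields the stated 236 turns:

```python
import math

L_henry = 0.368814519            # required inductance from the oscillator design
loop_len = 0.15                  # loop length Lo in metres
mu0 = 4 * math.pi * 1e-7         # permeability of free space (H/m)
A = 0.79                         # cross-sectional area as substituted in the text

# N^2 = (Lo * L) / (mu0 * A)  =>  N = sqrt(Lo * L / (mu0 * A))
N = math.sqrt(loop_len * L_henry / (mu0 * A))
print(round(N), "turns")         # 236 turns
```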
Resistors R1, R2 and R4 are DC bias resistors for the transistor TR1.

3.3 Comparator - Calibrator


The peak detector output is fed to the inverting input of the comparator, where it is compared with a reference voltage (akin to the reference oscillator in the analog beat-frequency type). When metal enters the field, the voltage at the comparator input drops below the reference to give a LOW output that triggers a 555 timer monostable stage. Fig. 3 shows the comparator stage.

Figure 3. Comparator stage. Source: [5]

VR2 is set at 2.6 V (because when metal is detected the voltage drops from approximately 4 V to 1.5 V); any voltage below 2.6 V at the inverting input will make the output of the comparator go LOW and trigger the monostable stage, since
Vout = A0 × Vin
Where A0 = open-loop voltage gain.


And Vin = V+ − V−.
Vout will drop to 0 V for the slightest negative difference in voltage, since A0 is usually very large (on the order of 20,000).

3.4 One Shot Monostable Stage


The monostable stage generates a one-shot pulse once triggered from the sensor stage. When the comparator output goes LOW, the one-shot monostable generates a pulse of one second (1 s) to update the counter. Fig. 4 shows the one-shot monostable.
Figure 4. One-shot monostable. Source: [5]


Since T = 1.1RC and the required time duration of the monostable is 1 s (to allow fast clocking of the counter):
Letting C = 10 µF
gives R = 1 / (1.1 × 10 µF)
= 90.9 kΩ.
The one-shot monostable stage generates one clock pulse each time the search coil detects a metal. It is triggered from the output of the comparator and is built around IC2 in Fig. 4.
For the second monostable, T = 1.1RC with a required duration of 5 s (to allow continuous beeping for time T after metal has been detected):
Letting C = 100 µF
gives R = 5 / (1.1 × 100 µF)
= 45.5 kΩ
= 47 kΩ (preferred value).
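The 555 monostable timing relation T = 1.1RC can be rearranged to find the resistor for any desired pulse width. A small sketch covering both one-shots described above:

```python
def monostable_r(T, C):
    """Resistance needed for a 555 monostable period T (s) with capacitor C (F).

    From T = 1.1 * R * C  =>  R = T / (1.1 * C).
    """
    return T / (1.1 * C)

r1 = monostable_r(1.0, 10e-6)    # counter-clock one-shot: ~90.9 kOhm
r2 = monostable_r(5.0, 100e-6)   # 5 s beeper one-shot: ~45.5 kOhm (47 k preferred)
print(round(r1), round(r2))      # 90909 45455
```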

3.5 Switching Transistor for Relay


The switching transistor switches the power to the auto-reverse DC motor, as shown in Fig. 5.


Figure 5. Switching transistor stage. Source:[5]

The transistor as a switch operates in class A mode. A base resistor is required to ensure proper switching of the transistor into saturation. Diode D5 protects the transistor from the back EMF that might be generated, since the relay coil presents an inductive load.
In this case RC, the collector resistance, is the resistance of the relay coil, which is 400 Ω for the relay type used in this project.
Hence, given that RC = 400 Ω (relay coil resistance)
V+ = 12 V (regulated voltage from the power supply stage)
VBE = 0.6 V (silicon)
VCE = 0 V (when the transistor is switched on)
Vin = 3.5 V (from the comparator output)
hFE = 300 (from the BC337 data sheet)
Since,
V+ = ICRC + VCE ------------------------------------------- equation (1.0)
Vin = IBRB + VBE ------------------------------------------- equation (2.0)
IC = hFE × IB ------------------------------------------- equation (3.0)
RB = (Vin − VBE) / IB ------------------------------------------- equation (4.0)
Where,
IC = collector current
IB = base current
Vin = input voltage
V+ = supply voltage
VCE = collector-emitter voltage
hFE = current gain.
From 1.0: 12 = IC(400) + 0
so IC = 30 mA.
From 3.0: IB = 30 mA / 300 = 100 µA.
From 2.0: 3.5 = (100 µA)RB + 0.6
RB = 2.9 V / 100 µA
= 29 kΩ
= 30 kΩ (preferred value).
Hence R11-R14 = 30 kΩ.
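The base-resistor derivation above follows directly from equations (1.0) to (4.0) and can be reproduced as a quick sanity check:

```python
v_supply, r_relay = 12.0, 400.0    # volts; ohms (relay coil as collector load)
v_in, v_be, h_fe = 3.5, 0.6, 300   # comparator drive, Si junction drop, BC337 gain

i_c = v_supply / r_relay           # eq. 1.0 with VCE ~ 0 V in saturation -> 30 mA
i_b = i_c / h_fe                   # eq. 3.0 rearranged -> 100 uA
r_b = (v_in - v_be) / i_b          # eq. 4.0 -> 29 kOhm (30 k preferred value)
print(i_c * 1e3, "mA;", i_b * 1e6, "uA;", r_b / 1e3, "kOhm")
```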

3.6 Counter / Decoder Driver Stage


The counter stage is a cascaded 3-digit counter using 7490 decade counters. To reset the counters, the reset inputs must be taken HIGH; they are held LOW to allow counting. The D output (MSB) of the first counter clocks the second counter, while resetting of the counters can be done from the cascaded reset inputs via switch S1. Each counter stage counts from 0-9 before being reset. The output of the counter is fed to the decoder/driver stage to provide a decimal display that lets the user know the number of objects that have passed the sensor. The 7447 is a BCD-to-seven-segment decoder which accepts a 4-bit BCD input and produces the appropriate outputs for selection of segments in a 7-segment display arrangement used to represent the decimal numbers 0 to 9. The outputs (a, b, c, d, e, f, and g) of the decoder select the corresponding segments, as shown in Fig. 6.
Figure 6. Seven Segment Digital Display. Source: [7]


The value of the limiting resistor Rx is calculated as shown below. The display has the following specifications:
Max. forward current (IF) = 16 mA
Voltage drop across each LED, VLED = 1.7 V
and V+ = 5 V
V+ = IFRx + VLED
Rx = (V+ − VLED) / IF
= (5 − 1.7) / 16 mA
= 3.3 / 16 mA
= 206.25 Ω
= 220 Ω (preferred value).
Hence R8-R10 = 220 Ω.
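The segment-resistor choice above is a single Ohm's-law step and can be checked directly:

```python
v_supply, v_led, i_f = 5.0, 1.7, 16e-3   # supply, LED drop, max forward current

# V+ = IF * Rx + VLED  =>  Rx = (V+ - VLED) / IF
r_x = (v_supply - v_led) / i_f           # ~206.25 Ohm -> 220 Ohm preferred value
print(r_x)                               # ~206.25
```

Picking the next standard value up (220 Ω) keeps the forward current just under the 16 mA maximum.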

Fig. 7 below shows the counter and decoder driver stage.


Figure 7. Counter / decoder driver stage. Source: [6]

3.7 Power Supply Stage


All stages in the project use +5 V except the relay circuit, which uses +12 V. The power supply stage is a linear supply comprising a step-down transformer, filter capacitors, and voltage regulators to give the various voltage levels. The power supply circuit diagram is shown in Fig. 8.


Figure 8. Power supply stage. Source: [6]

The choice of the filter capacitor depends on the output current. Given that:
Vr(rms) = 2.4 IL / C .......... (1)
Where Vr(rms) = rectified DC ripple voltage (V)
IL = load current (mA)
C = filter capacitance (µF)
For a load current of 500 mA and a ripple factor of 5%:
Vpeak = Vrms × √2
= 15 V × √2
= 21.2 V
For a ripple factor of 5%,
Vr(rms) = (5/100) × 21.2
= 1.06 V
From (1): 1.06 = 2.4 × 500 / C
C = 2.4 × 500 / 1.06
= 1,132 µF
= 1000 µF (preferred value).
Hence, C1 = 1000 µF, C2 = 47 µF.
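The filter-capacitor sizing above can be reproduced numerically, using the textbook rule of thumb Vr(rms) = 2.4·IL/C with IL in mA and C in µF:

```python
import math

i_load = 500.0                    # load current in mA
v_rms = 15.0                      # transformer secondary voltage (V rms)
v_peak = v_rms * math.sqrt(2)     # ~21.2 V peak after rectification
v_ripple = 0.05 * v_peak          # 5% ripple factor -> ~1.06 V

# Vr(rms) = 2.4 * IL / C  =>  C = 2.4 * IL / Vr(rms)   (C in uF)
c_uf = 2.4 * i_load / v_ripple    # ~1131 uF (text: 1132 uF with Vr rounded to 1.06)
print(round(v_peak, 1), "V;", round(c_uf), "uF")
```

The next standard value down, 1000 µF, is accepted at a slightly higher ripple.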
3.8 Logic Gate (XOR Gate)
The logic control circuit in this project controls the switching of the relay that is responsible for opening and closing the gate. The XOR gate gives a HIGH output only when one input is HIGH and the other is LOW (AB' + A'B); hence the term "inequality comparator", as shown in Fig. 9.
Figure 9. Exclusive-OR gate symbol. Source: [5]


Table 1. Truth Table

INPUT INPUT OUTPUT


A B C
LOW LOW LOW
LOW HIGH HIGH
HIGH LOW HIGH
HIGH HIGH LOW
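The truth table above can be expressed as a one-line Boolean function and verified exhaustively (False standing for LOW, True for HIGH):

```python
def xor_gate(a, b):
    """Exclusive-OR: HIGH only when the inputs differ (AB' + A'B)."""
    return (a and not b) or (not a and b)

# Reproduce Table 1 row by row.
for a in (False, True):
    for b in (False, True):
        print(a, b, xor_gate(a, b))
```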

3.9 Comprehensive Circuit Diagram

Figure 10. Comprehensive circuit diagram. Source: [7]

3.10 Component List


1. IN4007 - RECTIFIER DIODE
2. 7806 - 5V DC LINEAR REGULATOR
3. 7812 - 12V DC LINEAR REGULATOR
4. 7 SEGMENT COMMON ANODE DISPLAY
5. 7447 - DECODER
6. 7490 - DECADE COUNTER
7. LM393 - VOLTAGE COMPARATOR
8. TIP41 - BUFFER TRANSISTOR
9. BC337 - SWITCHING TRANSISTOR
10. 7411 - LOGIC AND GATE
11. NE555 - TIMER


12. 3300 µF (25 V) - FILTERING CAPACITOR
13. 47 µF (25 V)
14. 10 µF (25 V)
15. 4.7K - RESISTOR
16. 15K
17. 10K
18. AUTO REVERSE DC MOTOR
19. LIGHT EMITTING DIODE
20. COIL
21. IC SOCKET
22. ELECTROMAGNETIC SWITCH (RELAY)

4 IMPLEMENTATION (CONSTRUCTION AND TESTING)


This section examines the workability and practicability of the intelligent car parking system when fully constructed and operational. The implementation of this work was first carried out on a breadboard. Power was initially taken from a bench power supply in the school electronics lab (to confirm the workability of the circuits before the power supply stage was soldered). Stage-by-stage testing was done according to the block representation on the breadboard before soldering commenced on Vero board. The various circuits and stages were then soldered in tandem to meet the desired workability of the project.
4.1 Construction
The construction of the project was done in two different stages:
1. The soldering of the circuits to the boards
2. The coupling of the entire project into the casing.
The first stage was completed before the second. The circuitry was soldered on four Vero boards because of the complexity of the circuit.

The first Vero board contains the power supply, oscillator, and comparators; the second contains the XOR gate and the monostable stages; the third contains the counter, decoder, and display stages; and the fourth contains the switching circuits (the board layouts could not be shown because of file size).

4.2 Casing and Boxing


The second phase of the project construction is the casing of the project. The project was housed in a casing of either wrought metal (stainless steel) or fiber-glass plastic, designed with special perforations and vents and sprayed to ensure insulation and give aesthetic value (the casing could not be shown here because of its large file size).

TABLE 2. Bill of Quantity

ITEM QUANTITY
in4007 rectifier diode 9
7806 5v dc linear regulator 1
7812 12v dc linear regulator 1
7 segment common anode display 3
7447 decoder 3
7490 decade counter 3
lm393 voltage comparator 4
tip 41 buffer transistor 1
BC 337 switching transistor 4
7411 logic XOR gate 1
ne555 timer 6
3300uf(25v) filtering capacitor 1
47uf(25v) 6
10uf(25v) 1
4.7k resistor 5
15k 5
10k 5
auto reverse dc motor 2
light emitting diode 1
Coil 4
IC socket (8 pin) 8, (14 pin) 2, (16 pin) 6
electromagnetic switch relay 4

5. CONCLUSION
The project, the design and construction of an intelligent car parking system, was designed with factors such as economic application, design economy, availability of components and research materials, efficiency, compatibility, portability, and durability in mind. The
performance of the project after testing met the design specifications. However, the general operation and performance of the project depend on the user, who is prone to human error such as failure to perform or omitting a task, slips of action, performing a task incorrectly, lapses of memory, knowledge-based mistakes, etc. The operation also depends on how well the soldering is done and on the positioning of the components on the Vero board. If poor solder is used, the circuit might develop dry joints early, and in that case the project might fail. Also, if logic elements are soldered near components that radiate heat, overheating might occur and affect the performance of the entire system. Other factors that might affect performance include transportation, packaging, ventilation, quality of components, handling, and usage.


REFERENCES
[1] Mehta, V. K., Principles of Electronics (pp. 117-205, Transistors, and General References), S. Chand & Company Ltd, 2003.
[2] Boylestad, R. L. and Nashelsky, L., Electronic Devices and Circuit Theory (eighth edition), Prentice-Hall, 2002.
[3] Maddock, R. J. and Calcutt, D. M., Electronics: A Course for Engineers (pp. 341-349, IC Timers; 249-263, Counters; 290-293, Decoder Drivers), Longman, 1994.
[4] Duncan, T., Success in Electronics (pp. 44-75, Other Passive Components; 107-119, Opto Devices and Transducers), Longman, 1983.
[5] Loveday, G., Essential Electronics (pp. 241-244, Transistors, General References), Pitman, 1984.
[6] Millman, J., Micro Electronics (General References), McGraw-Hill Book Company, 1979.
[7] Luecke, G., Mize, J. P. and Carr, W. N., Semiconductor Memory Design and Applications (Chap. 3 & 4, General References), McGraw-Hill Book Company, 1973.
[8] Everyday Electronics Journal, May 1998 edition (Power Supplies and Multivibrators), Wimborne Publishing.
[9] Hallmark, C., IC Cookbook (Pin Configurations of all the ICs), McGraw-Hill Book Company, 1986.
[10] NTE Data Book, 12th edition (General Data Sheets for all the components).


Intrusion Detection Techniques for Mobile Ad Hoc and Wireless Sensor Networks
Rakesh Sharma
Department of Computer Science & Engineering, HCTM Technical Campus
Kaithal, Haryana, India

V. A. Athavale
Department of Computer Science & Engineering, Gulzar Group of Institutions
Khanna, Punjab, India

Pinki Sharma
Department of Computer Science & Engineering, HCTM Technical Campus
Kaithal, Haryana, India

ABSTRACT
Mobile ad hoc networks and wireless sensor networks promise a wide variety of applications. However, they are often deployed in potentially adverse or even hostile environments, so they cannot be readily deployed without first addressing security challenges. Intrusion detection systems provide a necessary layer of in-depth protection for wired networks; however, relatively little research has been performed on intrusion detection in the areas of mobile ad hoc networks and wireless sensor networks. In this article, we first briefly introduce mobile ad hoc networks and wireless sensor networks and their security concerns. Then we concentrate on their intrusion detection capabilities. Specifically, we present the challenges of constructing intrusion detection systems for mobile ad hoc networks and wireless sensor networks, survey the existing intrusion detection techniques, and indicate important future research directions.
Keywords
Mobile ad-hoc networks, wireless sensor networks, attacks, AODV, IDS, secure aggregation.

1. INTRODUCTION

Mobile Ad-hoc Networks (MANETs) and Wireless Sensor Networks (WSNs) are relatively new communication paradigms. MANETs do not require expensive base stations or wired infrastructure. Nodes within radio range of each other can communicate directly over wireless links, while those that are far apart use other nodes as relays. Each host in a MANET also acts as a router, since routes are mostly multi-hop. The lack of fixed infrastructure and centralized authority makes a MANET suitable for a broad range of applications in both military and civilian environments. For example, a MANET could be deployed quickly for military communications in the battlefield, or in scenarios such as a meeting room, a city transportation wireless network, fire fighting, and so on. To form such a cooperative and self-configurable network, every mobile host should be a friendly node willing to forward messages for others. In the original design of a MANET, global trust among nodes within the whole network is a fundamental security assumption. Recent progress in wireless communications and Micro-Electro-Mechanical Systems (MEMS) technology has made it feasible to build miniature wireless sensor nodes that integrate sensing, processing, and communication capabilities. These miniature wireless sensor nodes can be extremely small, as tiny as a cubic centimeter. Compared with conventional computers, these low-cost, battery-powered sensor nodes have a limited energy supply, stringent processing and communication capabilities, and inadequate memory. The design and implementation of relevant services for WSNs must keep these limitations in mind. Based on the collaborative efforts of a large number of sensor nodes, WSNs have become good candidates to provide economically viable solutions for a wide range of applications, such as environmental monitoring, scientific data collection, health monitoring, and military operations [1]. An example WSN is illustrated in Fig. 1, where the WSN is deployed to detect targets.

Figure 1. Example of a wireless sensor network.


When sensor nodes detect a target, they collaboratively route the information to a base station for analysis. The base station can then transmit the information onward to users through another communications infrastructure, for example, the Internet. Despite the wide variety of potential applications, MANETs and WSNs are often deployed in adverse or even hostile environments. Therefore, they cannot be readily deployed without first addressing security challenges. Because of the features of an open medium, the low degree of physical protection of mobile nodes, a dynamic topology, a limited power supply, and the absence of a central management point [2], MANETs are more vulnerable to malicious attacks than traditional wired networks are. In WSNs, the lack of physical security combined with unattended operation leaves sensor nodes at elevated risk of being captured and compromised, making WSNs vulnerable to a variety of attacks. So far, research into security solutions for MANETs and WSNs has mainly taken a prevention point of view. For example, for both kinds of network there exist many key distribution and management schemes that can be designed on top of link-layer security architectures, prevention of denial-of-service attacks, and secure routing protocols. There is also research targeted at specific services and applications. For example, one of the most important functions of deploying WSNs is to collect relevant data. During a data collection process, aggregation is required to save energy, thus prolonging the lifetime of a WSN. However, aggregation primitives are vulnerable to node compromise attacks, which lead to incorrect aggregate results from a compromised aggregator. Hence, effective techniques are needed to verify the integrity of aggregate results. Prevention-based approaches can significantly reduce potential attacks, but they cannot completely eliminate intrusions. Once a node is compromised, all the secrets associated with the node are open to attack. This renders prevention-based techniques less helpful for guarding against malicious insiders; in practice, insiders can cause much greater damage. Therefore, Intrusion Detection Systems (IDSs), serving as the next line of defense, are essential for a highly secured system. By modeling the behavior of legitimate activities, an IDS can effectively identify potential intruders and thus provide in-depth protection. In this article, we first give a brief introduction to IDSs. Then we present challenges in constructing IDSs for mobile ad hoc networks and wireless sensor networks and survey their existing intrusion detection techniques. Finally, we point out important future research directions.

2. INTRUSION DETECTION TECHNIQUES


An intrusion is defined as a set of actions that compromises the confidentiality, availability, or integrity of a system. Intrusion detection is a security technology that attempts to identify those who are trying to break into and misuse a system without authorization, as well as those who have legitimate access to the system but are abusing their privileges. The system may be a host computer, network equipment, a firewall, a router, a corporate network, or any system being monitored by an intrusion detection system. An IDS dynamically monitors a system and users' actions in the system to detect intrusions. Because a system can suffer from various kinds of security vulnerabilities, it is both technically difficult and economically costly to build and maintain a system that is not susceptible to attacks. Experience teaches us never to rely on a single defensive technique. An IDS, by analyzing the system's and users' operations in search of unwanted and suspicious behavior, can effectively monitor and defend against threats. Generally, there are two types of intrusion detection: misuse-based detection and anomaly-based detection [3]. A misuse-based detection method encodes known attack signatures and system vulnerabilities and stores them in a database. If the deployed IDS finds a match between current activities and the signatures, an alarm is generated. Misuse detection techniques are not effective at detecting novel attacks, because the corresponding signatures are lacking. An anomaly-based detection technique creates normal profiles of system states or user behaviors and compares them with current behavior. If a significant deviation is observed, the IDS raises an alarm. Anomaly detection can identify unknown attacks. However, normal profiles are usually very difficult to build. For example, in a MANET, mobility-induced dynamics make it challenging to distinguish between normalcy and anomaly; it is therefore harder to tell false alarms apart from real intrusions. The ability to establish normal profiles is crucial in designing efficient, anomaly-based IDSs. As a promising alternative, specification-based detection techniques combine the advantages of misuse detection and anomaly detection by using manually developed specifications to characterize legitimate system behaviors. Specification-based detection approaches are similar to anomaly detection techniques in that both detect attacks as deviations from a normal profile. However, because specification-based approaches rest on manually developed specifications, they avoid the high rate of false alarms; their drawback is that the development of detailed specifications is time-consuming.
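The core anomaly-detection idea, flagging activity that deviates significantly from a learned normal profile, can be sketched minimally. The feature and threshold below are illustrative assumptions of ours, not taken from any cited system:

```python
from statistics import mean, stdev

def build_profile(normal_samples):
    """Profile = mean and std-dev of a feature observed during normal operation."""
    return mean(normal_samples), stdev(normal_samples)

def is_anomalous(value, profile, k=3.0):
    """Raise an alarm when the observation deviates more than k sigma from normal."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

# Illustrative feature: routing-table change rate observed per interval.
profile = build_profile([4, 5, 6, 5, 4, 6, 5])
print(is_anomalous(5, profile))    # within the normal profile
print(is_anomalous(40, profile))   # large deviation -> alarm
```

A misuse-based detector would instead match observations against a database of known attack signatures; the two approaches are complementary.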


[Figure: a path from source S through intermediate nodes A, B, C to the destination D; node A can overhear B's transmission to decide whether B is misbehaving.]

Figure 2. Watchdog mechanism for MANETs.
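The watchdog idea illustrated in Fig. 2 (buffer each packet handed to a neighbor and tally a failure when the neighbor is not overheard forwarding it) can be sketched as follows. This is a simplified illustration; the threshold and data structures are our assumptions, not a published implementation:

```python
class Watchdog:
    """Sketch of the watchdog tally kept by node A for its next hop B."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.buffered = set()   # packets handed to the neighbor, not yet overheard
        self.failures = 0

    def sent_to_neighbor(self, packet_id):
        self.buffered.add(packet_id)

    def overheard_from_neighbor(self, packet_id):
        self.buffered.discard(packet_id)   # neighbor did forward the packet

    def timeout(self, packet_id):
        # Packet was never overheard within the allowed time: count a failure.
        if packet_id in self.buffered:
            self.buffered.remove(packet_id)
            self.failures += 1

    def is_misbehaving(self):
        return self.failures > self.threshold
```

A path rater can then combine such per-neighbor ratings into a per-path metric, as discussed in Section 2.1.1.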


2.1 Intrusion Detection in MANETs: Attack Models
It is very difficult to present a once-for-all detection approach. The analysis of existing attack models can help with the extraction of effective features, which turns out to be one of the most important steps in building an IDS. The following are representative types of attacks in the context of a MANET IDS:
Routing Logic Compromise: In routing protocols, typical attack scenarios include black hole, routing update storm, fabrication, and modification of various fields in routing control packets (for example, route request messages, route reply messages, route error messages, etc.) during different phases of the routing procedure. All of these attacks can cause serious malfunctions in a MANET.
Traffic Distortion: This includes attacks such as packet dropping, packet corruption, data flooding, and so on. Motivated by their different objectives, attackers may take different actions to manipulate packets. For example, attackers may randomly, periodically, or selectively drop received packets, either to selfishly save power or to deliberately prevent other nodes from receiving data.

In addition to these, attacks such as rushing, wormhole, and spoofing have also been discussed in the context of a MANET. Furthermore, it is not difficult to fabricate intrusions based on combinations of the attacks mentioned previously.
2.1.1 Existing Research
The pioneering IDS research in the context of a MANET appears in a series of works in [2]. In the proposed system, an agent is attached to each node, and every node can perform intrusion detection and response functionality independently. One of the most important steps in IDS research is to construct effective features. Focusing on MANET routing protocols, Zhang et al. [2] use an unsupervised method to construct a feature set and select an essential set of features (e.g., distance to a destination, node moving rate, the percentage of changed routes, the percentage of changes in the sum of hops of all routes, etc.) that have high information gain. Information gain is an important metric for measuring the effectiveness of features; features with high information gain help the resulting IDS achieve desirable performance. Different routing protocols may result in different feature sets. Intrusion detection is formulated as a pattern classification problem, in which classifiers are designed to classify observed activities as normal or intrusive. In [2], based on an identified feature set, Zhang et al. apply two well-known classifiers, RIPPER and Support Vector Machine (SVM) Light, to construct a set of anomaly detection models. RIPPER is a decision-tree-equivalent classifier for rule induction: by separating the provided data into appropriate categories, RIPPER can compute rules for the system. SVM Light can produce a more accurate classifier when the provided data cannot be described by the given set of features. Because of the locality of a single intrusion session, post-processing is also introduced to filter out false alarms: if there are more abnormal predictions than normal predictions within a predefined period of time, the activities in that period are deemed abnormal. In this way, spurious errors that occur during normal sessions are removed. Because of the importance of feature selection in IDS research, Huang et al. [4] further introduce a new learning-based method that uses cross-feature analysis to capture inter-feature correlation patterns. Suppose that L features, f1, f2, ..., fL, are identified, where each fi denotes one feature characterizing either topology or route activities. The classification problem to be solved is to build a set of classification models Ci: {f1, ..., fi-1, fi+1, ..., fL} -> fi from the training process. Here one feature fi is chosen as the target to classify, and the classification model Ci is used to establish the temporal correlation between that feature and all of the other features. The prediction of Ci is highly accurate in normal situations; however, when there are malicious events, the prediction of Ci becomes unreliable. Based on this, normal events and abnormal events can be distinguished. Local detection alone is not enough because of the distributed nature of a MANET. Huang and Lee [5] further elaborate on mechanisms by which one node can collaborate with its neighbors and initiate a detection process over a wider range. This provides not only more accurate detection results, but also more information in terms of attack types and sources. By fairly and periodically electing a monitoring node within a cluster of neighboring MANET nodes, a cluster-based detection scheme is proposed. Each node maintains a finite state machine, with possible states of Initial, Clique, Done, and Lost. Based on the finite state machine, a set of protocols is detailed, including a clique computation protocol, a cluster-head computation protocol, a cluster-valid assertion protocol, and a cluster recovery protocol. Resource constraint issues faced by a MANET are addressed in the design of these protocols. Based on a specification-based approach to describe the major functionality of Ad-hoc On-demand Distance Vector (AODV) routing algorithms at the data and routing layers, Huang and Lee [6] propose an Extended Finite State Automaton (EFSA), in which transitions and states can carry a finite set of parameters. In this way, the proposed EFSA can detect invalid state violations, incorrect transition violations, and unexpected action violations. The construction of the EFSA leads naturally to a specification-based approach. Based on a set of statistical features, machine learning algorithms are then adapted to detect abnormal patterns from abnormal basic events. Based on Dynamic Source Routing (DSR) protocols, Marti et al. [7] propose to install additional facilities, watchdog and pathrater, to identify and respond to routing misbehavior in a MANET. During data transmission, a node may misbehave by agreeing to forward packets and then failing to do so. Consider the example illustrated in Fig. 2 to understand the watchdog approach. Suppose a path exists from a source node S to a destination node D through intermediate nodes A, B, and C. Node A can overhear node B's transmissions, but node A cannot transmit directly to node C and must go through node B. To detect whether node B is misbehaving, node A maintains a buffer of packets it recently sent. Node A then compares every packet overheard from node B with a buffered packet to see if there is a match. A failure tally for node B increases whenever node A finds that node B was supposed to forward a packet but failed to do so. If the tally exceeds a threshold, node B is deemed to be misbehaving. Each node maintains a rating for every node it knows about in the network, and a path metric is calculated by averaging the node ratings along the path. Pathrater [7] then selects the

ISSN: 1694-2108 | Vol. 6, No. 1. OCTOBER 2013 7


path with the highest metric. Marti et al. [7] also discuss several
limitations of this approach, including limitations resulting from packet
collisions, false reports of node misbehavior, and potential watchdog evasion
mechanisms. Focusing on AODV routing protocols, Tseng et al. [8] propose
a specification-based ID technique. A finite state machine (FSM) is constructed
to specify the correct behaviors of AODV, that is, to maintain each branch of a
route request/route reply (RREQ/RREP) flow by monitoring all of the RREQ and
RREP messages from a source node to a destination node. The constructed
specification is then compared with the actual behaviors of monitored neighbors.
A distributed network monitor passively listens to AODV routing traffic,
captures RREQ and RREP messages, and detects run-time violations of the
specifications. A tree structure and a node coloring scheme are also proposed to
detect most of the serious attacks. Sun et al. [9]
propose using a Markov chain (MC) to characterize the normal
behaviors of MANET routing tables. An MC-based local detection engine can
effectively capture the temporal characteristics of MANET routing behaviors. Because
of the distributed nature of a MANET, an individual alert raised by one node
should be aggregated with alerts from others to improve performance. Motivated by this, a non-
overlapping zone-based intrusion detection system (ZBIDS) is proposed to
facilitate alert correlation and aggregation [9], as illustrated in Figure 3.
Specifically, the whole network is divided into non-overlapping zones. Gateway
nodes (also referred to as inter-zone nodes, i.e., those nodes that have physical
connections to different zones) of each zone are responsible for
aggregating and correlating the locally generated alerts within the zone. Intra-
zone nodes, upon detecting a local anomaly, generate an alert
and broadcast it within the zone. Only gateway nodes use alerts to
generate alarms, which can effectively reduce false alarms. In ZBIDS, the
aggregation algorithm can reduce the false alarm ratio and improve the
detection ratio. An alert data model conforming to the Intrusion Detection
Message Exchange Format (IDMEF) is also given to facilitate the interoperability
of IDS agents. Based on this, gateway nodes can further offer a wider view of
attack scenarios. Considering that one of the main challenges in building a
MANET IDS is to integrate mobility with IDSs and to adjust IDS behavior
accordingly, Sun et al. [10] demonstrate that a node's moving speed, a commonly
used parameter in tuning MANET performance, is not an effective metric for
tuning IDS performance under different mobility models. Sun et al. then propose
an adaptive scheme, in which suitable normal profiles and corresponding proper
thresholds can be selected adaptively by each local IDS

through periodic measurement of its local link change rate, a proposed
performance metric that can reflect mobility levels. The proposed scheme is
less dependent on the underlying mobility models and can further
improve performance.
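Returning to the watchdog mechanism of Marti et al. [7] described earlier, its failure-tally logic can be sketched in Python. The buffer size, the timeout notion, and the threshold value are illustrative assumptions, since [7] treats them as tunable implementation parameters:

```python
from collections import deque

class Watchdog:
    """Monitors a neighbor by matching overheard packets against
    packets this node recently asked the neighbor to forward."""

    def __init__(self, threshold=3, buffer_size=64):
        self.threshold = threshold                # failures before flagging
        self.pending = deque(maxlen=buffer_size)  # packets awaiting forwarding
        self.failure_tally = 0

    def sent(self, packet_id):
        # Record a packet handed to the neighbor for forwarding.
        self.pending.append(packet_id)

    def overheard(self, packet_id):
        # The neighbor forwarded this packet: clear it from the buffer.
        if packet_id in self.pending:
            self.pending.remove(packet_id)

    def timeout(self, packet_id):
        # The packet was never overheard within its deadline: count a failure.
        if packet_id in self.pending:
            self.pending.remove(packet_id)
            self.failure_tally += 1

    def misbehaving(self):
        return self.failure_tally > self.threshold
```

A node would run one such monitor per downstream neighbor and feed the resulting ratings into the pathrater's path metric.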

(Figure: IDS-equipped nodes partitioned into non-overlapping zones; alerts raised inside a zone are forwarded to gateway nodes, which serve as alert concentration points.)

Figure 3. The zone-based intrusion detection system for MANETs.

2.2 Intrusion detection in a WSN


Similar to security research in a MANET, many prevention-based
approaches in a WSN have been proposed. These approaches address challenges
including key establishment, trust setup, privacy, authentication, secure routing, and
high-level security services. However, the large-scale decentralized deployment of a
WSN and the lack of physical security render prevention-based
schemes inadequate when sensor nodes are compromised. Therefore,
an IDS can also offer an adequate layer of security protection for a WSN.
In this section, we present a survey of existing IDS research in the
context of a WSN. Compared with a MANET, a WSN represents a relatively
newer communication paradigm, and there are therefore fewer works that address

the development of a WSN IDS. Furthermore, the different
applications and services enabled by WSNs demonstrate different
characteristics. It is therefore necessary to integrate ID approaches with the
corresponding applications, because attacks targeted at different
applications and services manifest themselves differently. In the
following, we use two important services of a WSN, secure aggregation and
secure localization, to exemplify current WSN IDS research efforts.
2.2.1 Challenges
The distinctive characteristics of sensor nodes pose challenges to the
development of a WSN IDS. A WSN has a restricted power supply,
requiring energy-efficient protocols and applications to maximize the
lifetime of the sensor network. Sensor nodes have tight system resources in
terms of memory and processing capability, making intensive calculations
impractical. Sensor nodes are prone to failure, which leads to frequent
configuration changes. Also, a WSN usually is densely deployed, causing
serious radio channel contention and scalability issues. The design of an
effective WSN IDS must take all of these challenges into account.
2.2.2 Secure Localization in WSNs
Many WSN applications require that sensor nodes have location information. Because
of cost concerns, it is still not practical to equip every sensor node with a Global
Positioning System (GPS) receiver. Therefore, many localization protocols have been
proposed to help sensor nodes approximate their locations. To support
localization, some special nodes, called beacon nodes, usually
are used. These beacon nodes are assumed to know their own locations and
transmit them to non-beacon nodes via beacon
packets. Non-beacon nodes also take certain measurements (e.g., the
received signal strength indicator) from received beacon packets. Such
measurements, together with the location information contained in the beacon packets,
usually are referred to as location references. Once non-beacon nodes collect
an adequate number of location references, these nodes can then
estimate their own locations. Localization protocols may become vulnerable when a
WSN is deployed in a hostile environment. For example, beacon nodes
may be compromised and provide misinformation to mislead location
estimation at non-beacon nodes. Therefore, secure location discovery services
are needed to ensure the normal operation of a WSN. Utilizing the
deployment knowledge of a WSN and based on the fact that the probability
distribution functions of sensor locations usually can be modeled before
deployment, Du et al. [11] propose that every non-beacon node can
efficiently detect location anomalies by verifying whether its estimated

locations are consistent with the deployment knowledge. For example, if a
group of sensor nodes is dropped from an airplane sequentially
as the plane flies forward, normal distributions can be used to model the
deployment distribution of this group of sensor nodes. Every non-beacon
node can compare its estimated location with the deployment knowledge. If
the level of inconsistency is above a predefined threshold, the sensor node
can conclude that the received location references are malicious. Liu et
al. [12] also propose a set of approaches to filter out malicious location
references. The first approach is based on the minimum mean square error.
Based on the observation that malicious location references and benign ones
are usually inconsistent, non-beacon nodes can compute an
inconsistency level for the received location references. The inconsistency level is
characterized by the mean square error of estimation. If the mean square error is larger
than a threshold, non-beacon nodes may conclude that the received set of
location references is malicious. The second approach is a voting-based
location estimation method. Specifically, the deployed area is divided into a
grid of cells. The non-beacon node then has each received location
reference vote on the cells in which the node may reside, thereby determining
how likely the node is to be in each cell. After the voting process, the
center of the cell with the highest number of votes may be used as the
estimated location.
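The voting-based method of Liu et al. [12] can be sketched as follows. This is a simplified reading: each reference is modeled as a beacon position plus a distance measurement with a tolerance, and ties between highest-vote cells are broken arbitrarily, whereas [12] averages the centers of all highest-vote cells:

```python
import math

def voting_location_estimate(references, grid_size, cell):
    """Each location reference (beacon x, beacon y, measured distance,
    tolerance) votes for grid cells whose center is consistent with it.
    The center of the cell with the most votes is the location estimate."""
    votes = {}
    for (bx, by, dist, tol) in references:
        for i in range(grid_size):
            for j in range(grid_size):
                cx, cy = (i + 0.5) * cell, (j + 0.5) * cell  # cell center
                # The reference votes for cells whose distance to the
                # beacon matches the measurement within the tolerance.
                if abs(math.hypot(cx - bx, cy - by) - dist) <= tol:
                    votes[(i, j)] = votes.get((i, j), 0) + 1
    best = max(votes, key=votes.get)
    return ((best[0] + 0.5) * cell, (best[1] + 0.5) * cell)
```

A malicious reference then votes for cells inconsistent with the benign majority, and its influence is outvoted rather than averaged in.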
2.2.3 Secure Aggregation in WSNs
Aggregation has become one of the required operations for a WSN to
save energy. An example aggregation tree is
illustrated in Figure 4. Nodes A, B, ..., N denote different sensor
nodes in the WSN, and f denotes an aggregation function
(average, sum, maximum, minimum, count, etc.). If node I is compromised, it
can send false reports to node J. However, many existing schemes are
designed without enough security in mind and cannot detect this
malicious behavior. Preventing such behavior is the secure
aggregation problem. Based on statistical estimation theory,
Wagner [13] proposes a theoretical framework to model and analyze the
resilient data aggregation problem. After concluding that commonly used
aggregation functions are insecure, Wagner proposed using robust
statistics for resilient aggregation. Finally, several general techniques, such as
truncation (placing upper and lower bounds on the acceptable range of a sensor
reading) and trimming (for instance, ignoring the highest 5 percent and
the lowest 5 percent of sensor readings), are used to help
improve the resilience of aggregation functions. Combining prevention-based
and detection-based approaches, Yang et al. [14] propose the Secure

Hop-by-Hop Data Aggregation Protocol (SDAP) for WSNs. The
design of SDAP is based on divide-and-conquer and commit-and-attest
principles. Specifically, a probabilistic grouping method is employed to
dynamically divide the nodes into multiple logical groups of comparable sizes. In
each logical group, a hop-by-hop aggregation is performed and one
aggregate is generated per group. This hop-by-hop aggregation is
enhanced to ensure that a group cannot deny its committed aggregate.
Upon receiving all the group aggregates, the base station applies
an approach based on the Grubbs test to identify
suspicious groups. This approach helps single out outliers among the received
aggregates. Finally, each group under suspicion must participate in the
attestation process and prove the correctness of its group aggregate. After the
attestation process, the base station calculates the final aggregate
over all the group aggregates that are either normal or have passed
attestation.
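The base station's outlier screening can be sketched with the Grubbs statistic. The critical value against which G is compared (tabulated from the t-distribution for a chosen significance level) is left to the caller here, and SDAP's grouping and attestation message exchanges are not modeled:

```python
import math

def grubbs_statistic(values):
    """Grubbs test statistic G = max |x_i - mean| / stddev.
    A large G indicates a likely outlier among the group aggregates."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    deviations = [abs(v - mean) for v in values]
    g = max(deviations) / std
    suspect = deviations.index(max(deviations))  # index of the extreme value
    return g, suspect

# A base station could flag the group maximizing G whenever G exceeds
# the tabulated critical value, then demand attestation from that group.
g, idx = grubbs_statistic([20.1, 19.8, 20.3, 19.9, 45.0])
```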
Motivated by research in computer vision and automated cartography, Buttyán et al.
[15] propose a random sample consensus (RANSAC) paradigm for resilient
aggregation in a WSN. RANSAC is an outlier elimination
technique that can handle a high percentage of outliers in the measurement data.
Specifically, RANSAC uses as few non-attacked data points as possible to
determine an initial model. Assuming that the non-attacked
data follow normal distributions, the RANSAC algorithm
uses maximum likelihood estimation (MLE) to estimate the parameters of
the initial model. Once the initial model is determined, RANSAC tries to
enlarge the initial data set with consistent data. Outlier measurements
can then be filtered out, even if a large number of sensor nodes
are compromised.
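A minimal sketch of RANSAC-style resilient aggregation follows. It is a simplification of RANBAR [15]: the MLE model-fitting step is replaced by the mean of a small random sample, and the sample size, tolerance, and round count are illustrative parameters:

```python
import random
import statistics

def ransac_mean(readings, sample_size=2, tolerance=2.0, rounds=50, seed=0):
    """Repeatedly fit a model (here simply the mean of a small random
    sample), keep the consensus set of readings within `tolerance` of
    the model, and aggregate over the largest consensus set found."""
    rng = random.Random(seed)
    best_consensus = []
    for _ in range(rounds):
        sample = rng.sample(readings, sample_size)
        model = statistics.mean(sample)  # initial model from few points
        consensus = [r for r in readings if abs(r - model) <= tolerance]
        if len(consensus) > len(best_consensus):
            best_consensus = consensus
    return statistics.mean(best_consensus)  # robust aggregate
```

Because a sample drawn from the attacked readings yields a small consensus set, the final aggregate is computed from the benign majority even when many readings are forged.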
(Figure: an aggregation tree rooted at the base station; wireless sensor nodes A through N report their measurements v_i upward, and intermediate nodes apply the aggregation function f, e.g., f(v1, v2, v3), before forwarding.)

Figure 4. An example aggregation tree in WSNs.

2.2.4 Future research directions
In this section, we discuss future research directions for constructing IDSs
for both MANETs and WSNs. From a system perspective, IDS research for both
MANETs and WSNs requires a distributed architecture and the collaboration
of a group of nodes to make correct decisions. ID techniques also need
to be integrated with existing MANET and WSN applications. This requires
an understanding of the deployed applications and related attacks
in order to deploy suitable ID mechanisms. Attack models must be carefully
established to facilitate the deployment of ID methods. Also, solutions must
consider resource constraints in terms of computation, energy,
communication, and memory. This is especially important in the context
of a WSN.
2.2.5 Extended Kalman Filter-Based Secure Aggregation for a WSN
In this section, we use the secure in-network aggregation problem in a
WSN as an example of how to devise a lightweight ID mechanism [16].
In a WSN, consecutive observations of sensor nodes usually are
highly correlated in the time domain. This correlation, along with the
collaborative nature of WSNs, makes it possible to predict future observed
values based on previous ones. Therefore, estimating
aggregated in-network values based on constructed normal profiles is a
viable approach. In practice, however, because of high packet-loss rates, harsh
environments, sensing uncertainty, and other issues, it is challenging to produce
an accurate estimate of the actual aggregated value. Also, the lack
of time synchronization among child and parent nodes may cause
aggregation nodes to use different sets of values for aggregation. The
complexity of existing aggregation protocols also contributes to the
challenges of modeling in-network aggregated values. To construct normal
profiles for aggregated in-network values in the face of these
challenges, solutions based on statistical estimation theory can be
applied. Suitable models must consider the required service and
the application environment. For instance, suppose that we
are interested in estimating temperature values, which are scalar
variables. We may adopt an Extended Kalman Filter
(EKF), because an EKF can provide an accurate
and lightweight estimation [16]. By enabling neighbor-monitoring
mechanisms, every node can use an EKF to monitor the behavior
of one of its neighbors. An EKF-based mechanism is suitable
for WSN nodes because it can address the incurred
uncertainties in a lightweight manner and compute relatively accurate

estimates of aggregated values, from which a normal range can be
approximated. Using a threshold-based mechanism, a promiscuously
overheard value is then compared with the locally computed normal
range to decide whether the two are significantly different.
Furthermore, the monitored environment exhibits spatial and temporal
characteristics. Therefore, it is promising to integrate these characteristics into
the construction of ID models. For instance, there are existing works that model
the spatial and temporal properties of correlated data in a WSN. It is,
therefore, desirable to integrate these models into the construction of
normal profiles for in-network aggregated values. In this way,
an anomaly-based ID service can be provided for secure aggregation
in a WSN. A WSN often is deployed to monitor emergency phenomena
(such as the outbreak of a forest fire), for which well-behaved nodes can
trigger necessary events and generate unusual yet important data.
Node collaboration is essential for sensor networks to make correct decisions
about abnormal events. Therefore, for WSNs, intrusion detection modules
(IDM) and system monitoring modules (SMM) must integrate with one another to
work effectively [16]. When node A raises an alert on node B because
of an event E, to decide whether E is malicious or emergent,
node A may initiate a further investigation of E by collaborating with
existing SMMs. WSNs usually are densely deployed to collaboratively
monitor events. To save energy, some sensor nodes are
periodically scheduled to sleep. Based on this, node A can wake up the sensor
nodes (denoted as co-detectors in Figure 5) around node B and request from these
nodes their opinions on the behavior of node B regarding event E.
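For scalar readings such as temperature, the EKF reduces to a one-dimensional Kalman filter, so the neighbor-monitoring idea can be sketched as follows. The noise variances, the random-walk process model, and the three-sigma normal range are illustrative assumptions not prescribed by [16]:

```python
class ScalarKalmanMonitor:
    """One-dimensional Kalman filter (the linear special case of an EKF)
    that tracks a neighbor's reported readings and flags values falling
    outside the predicted normal range."""

    def __init__(self, x0, p0=1.0, q=0.01, r=0.5, k_sigma=3.0):
        self.x = x0            # state estimate (e.g., temperature)
        self.p = p0            # estimate variance
        self.q = q             # process noise variance (assumed)
        self.r = r             # measurement noise variance (assumed)
        self.k_sigma = k_sigma # width of the normal range in sigmas

    def step(self, z):
        # Predict: random-walk model, so the variance grows by q.
        p_pred = self.p + self.q
        # Normal range around the prediction.
        bound = self.k_sigma * (p_pred + self.r) ** 0.5
        anomalous = abs(z - self.x) > bound
        if not anomalous:
            # Update the estimate only with readings judged normal.
            k = p_pred / (p_pred + self.r)  # Kalman gain
            self.x += k * (z - self.x)
            self.p = (1 - k) * p_pred
        return anomalous
```

An overheard value that falls outside the bound would raise an alert on the monitored neighbor, to be corroborated by the co-detectors described next.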


(Figure: a fire event monitored by normal nodes and co-detectors; compromised nodes send false reports while alerts are transmitted toward the base station.)

Figure 5. Collaboration between IDM and SMM to differentiate malicious events from
emergency events.

Once node A collects the information from these nodes, if it finds that the majority
of sensor nodes believe that event E may happen, node A concludes
that E was triggered by some emergency event. On the other hand, if
node A finds that the majority of sensor nodes believe that event E should not
happen, then node A concludes that E was triggered by either a malicious node or a
faulty yet well-behaved node. To reach a final judgment, node A can further wake
up the nodes around event E and request their opinions about it. If
node A again finds that the majority of sensor nodes believe that event E should not
happen, node A then suspects that node B is malicious.
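The two-round collaboration just described can be sketched as a pair of majority votes. The function names and the three-way outcome labels are illustrative, and [16] does not prescribe this exact control flow:

```python
def classify_event(opinions):
    """Majority vote over co-detector opinions (True = 'event E is
    plausible'). Returns 'emergency' if most co-detectors corroborate
    the event, else 'suspicious'."""
    yes = sum(1 for o in opinions if o)
    return "emergency" if yes > len(opinions) / 2 else "suspicious"

def judge_reporter(first_round, second_round):
    # A reporter is flagged only if both rounds fail to corroborate E.
    if classify_event(first_round) == "emergency":
        return "emergency event"
    if classify_event(second_round) == "suspicious":
        return "reporter malicious"
    return "reporter faulty but benign"
```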

3. INTEGRATION OF MOBILITY AND INTRUSION DETECTION IN MANETs
One of the main difficulties in building MANET IDSs is considering
how mobility impacts the design of detection engines. This is

especially important in the context of MANETs because most of the
dynamics in MANETs are caused by mobility. MANET IDSs that do not
properly consider mobility are susceptible to a high false positive
ratio, which renders them less effective. The link
change rate can be used to capture the impact of mobility on IDS engines.
Based on the link change rate, a properly trained normal profile is
selected adaptively at different mobility levels. Using different
mobility models, such as the random waypoint model, the random drunken model, and the
obstacle mobility model, the adaptive scheme is demonstrated to be
less dependent on the underlying mobility models and can further reduce
the false positive ratio [16]. However, the performance of the
proposed adaptive scheme at high mobility levels is still not as good
as expected. It is also very challenging to construct mobility-independent MANET
IDSs, because this requires the extraction of mobility-independent features.
Furthermore, how to systematically test the performance of MANET IDSs
is still an ongoing work.
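The link change rate and the adaptive selection of a normal profile can be sketched as follows; the profile structure and the example rates and thresholds are illustrative assumptions:

```python
def link_change_rate(prev_neighbors, curr_neighbors, interval):
    """Link changes per second: neighbors gained plus neighbors lost,
    normalized by the measurement interval."""
    changes = len(prev_neighbors ^ curr_neighbors)  # symmetric difference
    return changes / interval

def select_profile(rate, profiles):
    """Pick the normal profile trained at the mobility level whose
    link change rate is closest to the measured rate."""
    return min(profiles, key=lambda p: abs(p["rate"] - rate))

# Hypothetical profiles trained offline at three mobility levels.
profiles = [
    {"rate": 0.1, "threshold": 0.2},  # low mobility
    {"rate": 1.0, "threshold": 0.5},  # medium mobility
    {"rate": 5.0, "threshold": 0.9},  # high mobility
]
```

Each local IDS would periodically recompute the rate from its neighbor table and swap in the matching profile and detection threshold.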
4. CONCLUSION
Intrusion detection systems can identify malicious activities and help
provide adequate protection. Therefore, an IDS has become an
indispensable component of defense-in-depth security mechanisms for both
MANETs and WSNs. In this article, we provided an
introduction to mobile ad hoc networks and wireless sensor networks and
presented the challenges of constructing IDSs for MANETs and WSNs. We then
surveyed existing intrusion detection techniques in the context of MANETs
and WSNs. Finally, using secure in-network aggregation for WSNs and
the integration of mobility and intrusion detection for MANETs as
examples, we discussed important future research directions.
REFERENCES
[1] F. Akyildiz et al., Wireless Sensor Networks: A Survey, Elsevier Comp. Networks, vol. 38, no. 2, 2002, pp. 393–422.
[2] Y. Zhang, W. Lee, and Y. Huang, Intrusion Detection Techniques for Mobile Wireless Networks, ACM Wireless Networks, vol. 9, no. 5, Sept. 2003, pp. 545–556.
[3] H. Debar, M. Dacier, and A. Wespi, A Revised Taxonomy for Intrusion Detection Systems, Annales des Telecommun., vol. 55, 2000, pp. 361–378.
[4] Y. Huang et al., Cross-Feature Analysis for Detecting Ad-hoc Routing Anomalies, Proc. IEEE ICDCS 03, Providence, RI, May 2003, pp. 478–487.
[5] Y. Huang and W. Lee, A Cooperative Intrusion Detection System for Ad Hoc Networks, ACM SASN 03, Fairfax, VA, 2003, pp. 135–147.
[6] Y. Huang and W. Lee, Attack Analysis and Detection for Ad Hoc Routing Protocols, Proc. RAID 04, French Riviera, France, Sept. 2004, pp. 125–145.
[7] S. Marti et al., Mitigating Routing Misbehavior in Mobile Ad Hoc Networks, ACM Mobicom 2000, Boston, MA, Aug. 2000, pp. 255–265.
[8] C.-Y. Tseng et al., A Specification-based Intrusion Detection System for AODV, ACM SASN 03, Fairfax, VA, 2003, pp. 125–134.
[9] B. Sun, K. Wu, and U. Pooch, Alert Aggregation in Mobile Ad-Hoc Networks, ACM WiSe 03 in conjunction with ACM Mobicom 03, San Diego, CA, 2003, pp. 69–78.
[10] B. Sun et al., Integration of Mobility and Intrusion Detection for Wireless Ad Hoc Networks, Wiley Intl. J. Commun. Sys., vol. 20, no. 6, June 2007, pp. 695–721.
[11] W. Du, L. Fang, and P. Ning, LAD: Localization Anomaly Detection for Wireless Sensor Networks, J. Parallel and Distrib. Comp., vol. 66, no. 7, July 2006, pp. 874–886.
[12] D. Liu, P. Ning, and W. Du, Attack-Resistant Location Estimation in Sensor Networks, ACM/IEEE IPSN 05, Los Angeles, CA, Apr. 2005, pp. 99–106.
[13] D. Wagner, Resilient Aggregation in Sensor Networks, ACM SASN 04, Washington, DC, 2004, pp. 78–87.
[14] Y. Yang et al., SDAP: A Secure Hop-by-Hop Data Aggregation Protocol for Sensor Networks, ACM Mobihoc 06, Florence, Italy, 2006, pp. 356–367.
[15] L. Buttyán, P. Schaffer, and I. Vajda, RANBAR: RANSAC-Based Resilient Aggregation in Sensor Networks, ACM SASN 06, Alexandria, VA, 2006, pp. 83–90.
[16] B. Sun et al., Integration of Secure In-Network Aggregation and System Monitoring for Wireless Sensor Networks, IEEE ICC 07, Glasgow, U.K., June 2007.


Performance Evaluation of Sentiment Mining Classifiers on Balanced and Imbalanced Dataset
G.Vinodhini
Department of Computer science and Engineering,
Annamalai University, Annamalai Nagar -608002.

R M. Chandrasekaran
Department of Computer science and Engineering,
Annamalai University, Annamalai Nagar -608002.

ABSTRACT
The transition from Web 2.0 to Web 3.0 has enabled the dissemination of social
communication without limits in space and time. Sentiment analysis has really come into its own
in the past couple of years. It has been a part of text mining technology for some time, but with the
rise in social media popularity, the amount of unstructured textual data that can be used as a
machine learning data source is enormous. Marketers use this data as an intelligent indicator of
customer preferences. This paper aims to evaluate the performance of sentiment mining classifiers
on unbalanced and balanced large data sets for three different products. The
classifiers used for sentiment mining in this paper are the Support Vector Machine (SVM), Naïve
Bayes, and C5. The results show that the performance of the classifiers depends on the class
distribution in the dataset. Balanced data sets also achieve better results than unbalanced data sets
in terms of the overall misclassification rate.

KEYWORDS
Sentiment, opinion, SVM, classifiers, balanced, imbalanced.
1. INTRODUCTION
Sentiment analysis is a part of text mining technology, but with the rise in social
media popularity, the amount of unstructured textual data that can be used as a
machine learning data source is enormous. Sentiment analysis is understanding
the meanings and feelings behind statements made in social media and other
forums (Pang & Lee, 2004; Kunpeng Zhang et al., 2010; X. Fu et al.,
2013). Public opinions and sentiments can have a major impact on our society.
They can affect the sales of products, the change of government policy, and even
people's votes in elections. Thus, it is of high significance to study sentiment
analysis, also known as opinion mining. In the age of the Web, more and more

ISSN: 1694-2108 | Vol. 5, No. 1. SEPTEMBER 2013 1



people choose to express their opinions on a wide range of topics on the Web in
the form of blogs, product/service reviews, and comments (A. Balahur et al.,
2012). The amount of data exchanged over social media has witnessed major
growth in the last few years. Opinion mining at both the document level and the
sentence level has been too coarse to determine precisely what users like or
dislike (Turney, 2002). To address this problem, this work performs sentiment mining at
the attribute level, extracting opinions on a product's specific attributes
from reviews (Magdalini et al., 2012). Various studies in different
domains have investigated extracting sentiment information from this exchanged data.
Less attention has been directed toward studying the effect of the class imbalance problem
in sentiment mining. In recent years, the class imbalance problem has emerged as one
of the challenges in the data mining community. This situation is significant since it is
present in many real-world classification problems.
Previous studies have used balanced datasets; however, in the product domain it is
commonly the case that the ratio of positive and negative reviews is unbalanced.
This paper therefore focuses on investigating the effects of the size and ratio of
a dataset. The proposed system architecture takes customer reviews as input to
each of the classifiers and outputs the dataset split into positive and negative
reviews.
In this work, we analyze the performance of three different classifiers, SVM,
Naive Bayes (NB), and C5, for sentiment mining. The classification model uses
product attributes as features. The models are empirically validated using review
data sets for a Nokia phone, an iPod, and a Nikon camera. To analyse the effect of class
distribution, two data models are developed: Model A uses a balanced class
distribution, i.e., an equal number of positive and negative instances, and Model B uses an
unbalanced class distribution, i.e., an unequal number of positive and negative
instances. The results of the three classifiers are compared for both Model A and
Model B.
This paper is outlined as follows. Section 2 discusses related work.
Section 3 describes the proposed method. The various classification methods
used to model the prediction system are introduced in Section 4. The
experimental analysis is reported in Section 5. Section 6 summarizes the
results and Section 7 concludes our work.
2. RELATED WORK
The area of sentiment mining has seen a large increase in academic interest in the
last few years. Researchers in the areas of natural language processing, data
mining, machine learning, and others have tested a variety of methods of
automating the sentiment analysis process. A number of machine learning
techniques have been adopted to classify the reviews based on sentiment. Various


machine learning methods such as Support Vector Machines (SVM), Naive Bayes
(NB), Maximum Entropy (ME), K-Nearest Neighbour, ID3, C5, and centroid-based
classification have already been applied to sentiment classification
(Songho Tan et al., 2008; Qingliang et al., 2009; Rui Xia et al., 2011; Hassan Saif
et al., 2012). Various comparative studies have been done to find the best choice
of machine learning method for sentiment classification. As the result of a
sentiment analysis varies according to the domain, the feature composition
method, and the type of learning algorithm, a need to perform comparative analysis
arises.
Beyond single classifiers, much work has been done in recent
years focusing on combinations of classifiers, such as hybrid and ensemble
methods, to improve classification accuracy (Rudy Prabowo et al., 2009;
Whitehead et al., 2008). From the literature review, it is also observed that
only very few studies have analysed the performance of
classifiers under class-imbalanced conditions. Most existing works are based on
product review datasets, because a review usually focuses on a specific product
and contains little irrelevant information. These datasets have an even number of
positive and negative reviews; however, in the product domain it is typical that
there are substantially more positive reviews than negative reviews. Our
work will therefore compare the effects of balanced and unbalanced datasets.
The main objective of this work is to perform feature-based sentiment mining to
decide whether opinions are positive or negative. Moreover, the main focus is
on evaluating the performance of various classifiers under two different data
distributions, i.e., class-balanced and class-imbalanced.
3. METHOD
The following list describes the methodology of the proposed work:
i. Identify the data sources.
ii. Create two datasets, i.e., balanced and unbalanced, for each product.
iii. Preprocess the data to remove noise and redundancy.
iv. Identify the features for creating a word vector model.
v. Develop two word vector models:
a. Model A using the balanced dataset with the term presence method.
b. Model B using the unbalanced dataset with the term presence method.
vi. Develop the classification models:
a. Naïve Bayes
b. Support Vector Machine
c. C5
vii. Predict the classification results and compare them with the actual results.


viii. Evaluate the performance of the classifiers using the overall
misclassification rate.

a. Classification Methods
The following section describes the various classification methods used in
this work. Most of the literature shows that SVM, Naïve Bayes and C5 are
effective methods for sentiment classification.
Naïve Bayes Classifier
Bayesian learning algorithms use probability theory as an approach to concept
classification. Bayesian classifiers produce probabilities for class assignments
rather than a single definite classification. The Naïve Bayes classifier (NBC) is
perhaps the simplest and most widely studied probabilistic learning method. It
learns from the training data the conditional probability of each attribute Ai
given the class label C. The major (and strong) assumption is that all attributes Ai
are independent given the value of the class C. Classification is therefore done by
applying Bayes' rule to compute the probability of each value of C and then
predicting the class with the highest posterior probability. The assumption of
conditional independence of a collection of random attributes is critical.
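A minimal numeric illustration of the rule just described, with made-up probabilities: the posterior is proportional to the prior times the product of per-attribute conditionals, under the conditional independence assumption:

```python
# Hypothetical priors and conditionals P(word present | class); the
# product form below is exactly the independence assumption in the text.
priors = {"pos": 0.5, "neg": 0.5}
cond = {
    "pos": {"great": 0.7, "poor": 0.1},
    "neg": {"great": 0.2, "poor": 0.6},
}

def posterior(words, cls):
    # Unnormalized posterior: P(C) * product of P(Ai | C)
    p = priors[cls]
    for w in words:
        p *= cond[cls][w]
    return p

scores = {c: posterior(["great"], c) for c in priors}
pred = max(scores, key=scores.get)
print(pred)   # "pos": 0.5 * 0.7 beats 0.5 * 0.2
```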
Support Vector Machines
Support Vector Machines (SVMs) are pattern classifiers that can be expressed in
the form of hyper-planes that discriminate positive instances from negative
instances. SVMs have been applied successfully to many tasks, including
classification. They perform structural risk minimization and identify key
"support vectors". Risk minimization measures the expected error on an
arbitrarily large test set given the training set. SVMs non-linearly map
their n-dimensional input space into a high-dimensional feature space; in this
high-dimensional feature space a linear classifier is constructed, which is
non-linear in the original input space. Given a set of points which belong to
either of two classes, a linear SVM finds the hyper-plane leaving the largest
possible fraction of points of the same class on the same side, while maximizing
the distance of either class from the hyper-plane. The hyper-plane is determined
by a subset of the points of the two classes, named support vectors, and has a
number of interesting theoretical properties.
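The max-margin idea can be illustrated with a toy 2-D linear SVM (synthetic points; the support vectors are the boundary-defining subset mentioned above):

```python
# Two well-separated clusters; a linear SVM places the maximum-margin
# hyper-plane between them, determined only by the support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [0, 1], [3, 3], [4, 4], [3, 4]], dtype=float)
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)                    # the boundary-defining points
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))   # one point from each side
```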
C5
C5 is one of the simplest forms of supervised learning algorithm. It has been
used extensively in areas such as statistics and machine learning for the
purposes of classification and prediction. C5 classifiers can generalize beyond
the training sample, so that unseen samples can be classified with as high
accuracy as possible. C5 trees are non-parametric and a useful means of representing


the logic embodied in software routines. C5 takes as input a case or example
described by a set of attribute values, and outputs a Boolean decision. In the
classification case, when the response variable takes values in a set of previously
defined classes, each leaf node is assigned to the class that represents the highest
proportion of its observations.
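The behaviour described, attribute values in and a Boolean decision out, can be sketched with a small decision tree learner standing in for C5 (scikit-learn's CART, not Quinlan's C5.0; the data is a synthetic Boolean function):

```python
# Learn the Boolean AND of two binary attributes and print the induced
# rules, which read like the "logic embodied in software routines".
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # two binary attributes
y = [0, 0, 0, 1]                       # Boolean AND of the attributes

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["a1", "a2"]))
print(tree.predict([[1, 1], [0, 1]]))  # Boolean decisions for unseen cases
```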
b. Data Preparation
In order to compare the classifiers, we looked at results based on a balanced
and an unbalanced dataset, and also at the consistency of results when the
dataset sizes differed. To conduct our experiments we created the following
datasets:
Unbalanced dataset - all reviews extracted, a realistic representation
of the ratio of positive and negative reviews (Model B).
Balanced dataset - all negative reviews and the same number of
positive reviews (Model A).
We used the publicly available customer review datasets (http://www.cs.uic.edu
/~liub/FBS/sentiment-analysis.html). These datasets contain annotated customer
reviews of various products. We selected reviews of three different products:
Nokia 6600, iPod and Nikon Coolpix. The reviews are presented in plain-text
format. The dataset consists of negative, positive and neutral reviews; as this is
a binary classification problem, we considered only the positive and negative
reviews. The product attributes discussed in the review sentences are collected
for each review sentence. Unique product features are grouped, which results in a
final list of product attributes (features). A word-vector representation of the
review sentences is created for Models A and B; the word-vector set can then be
reused and applied to different classifiers. To create the word-vector list, the
review sentences are pre-processed. Descriptions of the review dataset models
used in the experiment are given in Table 1. For our investigation we created two
data models, one balanced and the other unbalanced; the dataset is made balanced
by random sampling.
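The balancing step can be sketched as random undersampling of the majority class; the counts below mirror the Nokia 6600 row of Table 1, while the review identifiers are placeholders:

```python
# Balance the dataset by randomly sampling as many positive reviews as
# there are negative reviews (Model A construction).
import random

random.seed(0)
positives = [f"pos_{i}" for i in range(414)]   # placeholder review ids
negatives = [f"neg_{i}" for i in range(186)]

balanced = random.sample(positives, len(negatives)) + negatives
print(len(balanced))   # 372 reviews, 186 per class
```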
Table 1. Descriptions of review datasets

Product          Model A (Balanced)      Model B (Unbalanced)
                 Positive   Negative     Positive   Negative
Nokia 6600       175        175          414        186
iPod             120        120          328        122
Nikon Coolpix     98         98          176         98


4. RESULTS & DISCUSSION


The classification models are implemented using the Weka tool, with the
default parameter values available in the tool. Experiments used 10-fold cross
validation: each dataset was randomly split into 10 folds, with 9 folds used
for training and 1 fold for testing, and the average over the 10 folds was used
for performance analysis. In order to evaluate the accuracy of a classification
model, the overall misclassification rate is used as the metric. The
misclassification rate is defined as the ratio of the number of wrongly classified
reviews to the total number of reviews classified by the prediction system; it
accounts for both positive and negative reviews. We first focus on the commonly
used balanced dataset: Table 2 and Figure 1 show the overall misclassification
rate of each classifier. We then focus on the unbalanced dataset: Table 3 and
Figure 1 show the corresponding rates.
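The evaluation protocol can be sketched as follows (a hypothetical scikit-learn illustration, whereas the paper uses Weka; the binary term-presence data here is synthetic):

```python
# 10-fold cross-validation with the overall misclassification rate
# computed as 1 - mean accuracy over the folds.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 20))   # binary term-presence vectors
y = (X[:, 0] | X[:, 1])                  # labels tied to two "sentiment" terms

acc = cross_val_score(BernoulliNB(), X, y, cv=10).mean()
print("overall misclassification rate:", round(1 - acc, 3))
```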

Table 2: Results on the balanced dataset

Product          Overall misclassification rate
                 Naïve Bayes   SVM    C5
Nokia 6600       12.6          11.1   12.1
iPod             11.3           9.8   10.5
Nikon Coolpix    10.5           9.2    9.9

Table 3: Results on the unbalanced dataset

Product          Overall misclassification rate
                 Naïve Bayes   SVM    C5
Nokia 6600       22.8          20.8   21.7
iPod             21.4          19.6   20.8
Nikon Coolpix    19.3          18.5   19.1

The overall misclassification rate is considerably lower for Model A than for
Model B for all three methods used. This is due to the class-imbalanced nature of
Model B, which has substantially more positive reviews than negative reviews
(see Table 1). This results in a higher Type I error (the number of negative
reviews wrongly classified as positive) and hence increases the overall
misclassification rate.
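The Type I error described here can be computed from a confusion matrix; the counts below are illustrative, not the paper's results:

```python
# Two of four negative reviews are wrongly called positive (Type I),
# which also inflates the overall misclassification rate.
from sklearn.metrics import confusion_matrix

y_true = [1] * 6 + [0] * 4            # 1 = positive review, 0 = negative
y_pred = [1] * 6 + [1, 1, 0, 0]       # two negatives misclassified

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
type1_rate = fp / (fp + tn)                     # 2 / 4 = 0.5
misclassification = (fp + fn) / len(y_true)     # 2 / 10 = 0.2
print(type1_rate, misclassification)
```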


Figure 1. Overall misclassification of Model A and Model B

5. CONCLUSION
The major contribution of this paper is the application of three different
machine learning algorithms to predict the sentiment orientation of review
sentences and the evaluation of the effect of class distribution on classifier
performance. Three different product review datasets were utilized for this task.
The results suggest that machine learning algorithms can be successfully applied
to sentiment mining under a balanced distribution of classes. Though all
classifiers perform better under the balanced distribution, it was found that
among the three classifiers (C5, NB and SVM), SVM performs best in both
balanced and imbalanced conditions. As research continues, practitioners and
researchers may apply various under-sampling and over-sampling methods to
construct a balanced model from an imbalanced one. We plan to replicate our
study to build models based on hybrid machine learning algorithms under
data-imbalanced conditions.

REFERENCES
[1]. A. Balahur, J.M. Hermida, A. Montoyo, Detecting implicit expressions of emotion in
text: a comparative analysis, Decision Support Systems 53 (2012) 742–753.
[2]. Hassan Saif, Yulan He and Harith Alani, Semantic Sentiment Analysis of Twitter,
Knowledge Media Institute, The Open University, United Kingdom, 2012.
[3]. Kunpeng Zhang, Ramanathan Narayanan, Voice of the Customers: Mining Online
Customer Reviews for Product, 2010.


[4]. Magdalini Eirinaki, Shamita Pisal, Japinder Singh, Feature-based opinion mining
and ranking, Journal of Computer and System Sciences 78 (2012) 1175–1184.
[5]. M. Whitehead, L. Yaeger, Opinion mining using ensemble classification models, in:
International Conference on Systems, Computing Sciences and Software Engineering
(SCSS 08), Springer, 2008.
[6]. Pang, Bo, & Lee, L. (2004). A sentimental education: Sentiment analysis using
subjectivity summarization based on minimum cuts. In Proceedings of the 42nd ACL.
[7]. Popescu, A. M., Etzioni, O.: Extracting Product Features and Opinions from Reviews,
In Proc. Conf. Human Language Technology and Empirical Methods in Natural
Language Processing, Vancouver, British Columbia, 2005, 339–346.
[8]. Qingliang Miao, Qiudan Li, Ruwei Dai, AMAZING: A sentiment mining and
retrieval system, Expert Systems with Applications 36 (2009) 7192–7198.
[9]. Sujan Kumar Saha, Sudeshna Sarkar, Pabitra Mitra, Feature selection techniques for
maximum entropy based biomedical named entity recognition, Journal of
Biomedical Informatics 42 (2009) 905–911.
[10]. Rudy Prabowo, Mike Thelwall, Sentiment analysis: A combined approach,
Journal of Informetrics (2009) 143–157.
[11]. Rui Xia, Chengqing Zong, Shoushan Li, Ensemble of feature sets and classification
algorithms for sentiment classification, Information Sciences 181 (2011) 1138–1152.
[12]. Songbo Tan, Jin Zhang, An empirical study of sentiment analysis for Chinese
documents, Expert Systems with Applications 34 (2008) 2622–2629.
[13]. Turney, P. D. 2002. Thumbs up or thumbs down? Semantic orientation applied to
unsupervised classification of reviews, In Proceedings of the 40th Annual Meeting of
the Association for Computational Linguistics.
[14]. X. Fu, G. Liu, Y. Guo, Z. Wang, Multi-aspect sentiment analysis for Chinese online
social reviews based on topic modeling and HowNet lexicon, Knowledge-Based
Systems 37 (2013) 186–195.


Demosaicing and Super-resolution for


Color Filter Array via Residual Image
Reconstruction and Sparse Representation
Jie Yin
School of Communication and Information Engineering
Shanghai University, Shanghai, China
Guangling Sun
School of Communication and Information Engineering
Shanghai University, Shanghai, China

Xiaofei Zhou
School of Communication and Information Engineering
Shanghai University, Shanghai, China

Abstract
A novel approach to demosaicing and super-resolution for Color Filter Array (CFA)
images based on residual image reconstruction and sparse representation is proposed.
Given an intermediate image produced by a certain demosaicing and super-resolution
method, a residual image between the final reconstructed image and the intermediate
image is reconstructed using sparse representation; richer edges and details are found
in the final reconstructed image. Specifically, a generic dictionary is learned from a
large set of composite training data composed of intermediate data and residual data.
The learned dictionary implies a mapping between the two kinds of data. A specific
dictionary adaptive to the input CFA is learned thereafter. Using the adaptive
dictionary, the sparse coefficients of the intermediate data are computed and
transformed to predict the residual image. The residual image is added back into the
intermediate image to obtain the final reconstructed image. Experimental results
confirm state-of-the-art performance in terms of PSNR and subjective visual perception.

Keywords
Demosaicing; Super-resolution; Residual image reconstruction; Sparse representation

1. INTRODUCTION
A single sensor chip with a color filter array (CFA) is used in most
resource-constrained digital image/video capture devices [1]. The most popular
Bayer pattern is illustrated in figure 1. Often, a full color image and an
enlarged image produced from a low spatial resolution CFA are both required.
Demosaicing is executed to get a full color image and super-resolution (SR) is executed to

ISSN: 1694-2108 | Vol. 6, No. 1. OCTOBER 2013 1



get an enlarged spatial resolution image. Generally, there are two categories
of schemes to achieve this goal in the literature. The first is to demosaic the
CFA and then superresolve the demosaiced image [2]; obviously, any approach for
general image SR can be used in the SR step. The second is to superresolve the
CFA and then demosaic the superresolved CFA [3, 4]. The major drawback of
the latter is that good methods for superresolving general images are not
suitable for CFA images; moreover, it is very difficult to design an
appropriate SR solution for a complex CFA pattern, and a different SR solution
has to be designed for each CFA pattern. The comparison indicates the greater
feasibility and flexibility of the former scheme.
While plenty of demosaicing and SR techniques have been investigated
and their combinations have obtained satisfying results [2, 5, 6], there is still
much room for improvement. For instance, as stated in [2], if SR is implemented
after demosaicing in each spectral channel individually, the color artefacts
introduced by demosaicing will be worsened during SR. As the chromaticity
channel is much smoother than the intensity channel [2], performing SR in the
intensity and chromaticity channels gets better performance. We also study the
problem along this direction.
In this work, a full color and enlarged image is first obtained as an intermediate
result by using a certain demosaicing and SR method; relying on it, a residual
image is then found by means of sparse representation to complement the edges
and details lacking in the intermediate image. Two aspects distinguish this from
our previous work [7]: one is that a specific dictionary adaptive to the current
image is further learned to improve the residual image reconstruction quality;
the other is that the sparse coefficients of the intermediate data are transformed
to obtain the residual image, instead of a simple strategy of scaling the residual
image. It is necessary to point out that, in essence, arbitrary demosaicing and SR
techniques or their combinations can be used to get the intermediate image;
however, better demosaicing and SR techniques are still expected to achieve
more satisfying results. Hence, in this work, the intermediate image is obtained
using the method proposed in [2].
The rest of the paper is structured as follows: section 2 outlines and
discusses the proposed method in detail; section 3 provides experimental
results; section 4 concludes the paper.


2. THE PROPOSED METHOD


2.1 Framework of the proposed method
The framework of the proposed method is outlined in Table 1. As mentioned
above, since it is preferable that the intensity and chromaticity channels are
processed separately, and the intermediate image is acquired by the method
presented in [2], the intermediate result contains three such channels: the Green
channel, the RG difference and the BG difference, where the two difference
channels are R − G and B − G. The Green channel corresponds to the intensity
channel and the two difference channels correspond to the chromaticity channel.
Generally, structural and texture information is contained in the intensity
channel, while the chromaticity channel is merely related to chromatic
information. Therefore, the residual image is reconstructed only for the Green
channel, and nothing is done for the two difference channels, without degrading
visual quality. Another important issue must be emphasized: the color space of
Green plus two differences could certainly be transformed into another color
space such as YCbCr; nevertheless, the artefacts in the Green and the two
difference channels brought by demosaicing and SR would then accumulate in
the Y channel. That is the reason why we maintain the Green channel, RG
difference and BG difference channels.
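The channel decomposition kept throughout the method can be sketched as follows; the round trip between RGB and (G, R−G, B−G) is lossless, which is why the difference channels can simply be added back at the end (cf. equation (7)):

```python
# Decompose an RGB image into the (G, R-G, B-G) representation used in
# the paper, then reassemble RGB exactly from the three stored channels.
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((4, 4, 3))
R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]

channels = {"G": G, "RG": R - G, "BG": B - G}   # intensity + chromaticity

# Reassemble: add the difference channels back onto the Green channel.
R2 = channels["G"] + channels["RG"]
B2 = channels["G"] + channels["BG"]
back = np.stack([R2, channels["G"], B2], axis=-1)
print(np.allclose(back, rgb))   # True
```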
2.2. Dictionary learning for residual image reconstruction
In recent years, sparse representation based on dictionaries learned from data
has been applied successfully to image restoration tasks such as image
denoising, deblurring and SR [8]. Inspired by [9], we also use sparse
representation and dictionary learning to address our problem. Different from
[9], in addition to learning a generic dictionary, we also learn a specific
dictionary adaptive to the content characteristics of the input CFA image.
2.2.1 Generic dictionary learning
Given a training image set, the generic dictionary is learned as follows.
First, each original image m is down-sampled to a low spatial resolution
image and further sub-sampled to a CFA image m′ with the Bayer pattern
(other patterns can easily be accommodated). Second, m′ is demosaiced and
superresolved with certain techniques to get m̃. This procedure is performed
for each training image. Third, from m and m̃ (in fact from their Green
channels), numerous image patch pairs of the same size and at the same
locations are extracted; from two patches p and p̃, a residual patch
p_r = p − p̃ is produced, and all such residual patches compose a residual
image. Finally, p_r and the edges of p̃ are concatenated to form a training
datum. To detect edges from p̃, first-order edge extraction operators in the
horizontal, vertical, diagonal and anti-diagonal directions are convolved
with m̃; the four operators are shown in figure 2. A target function over the
dictionary and the sparse representation is designed in model (1),
constrained so that each atom is a unit vector:

\{\hat{D}, \hat{S}\} = \arg\min_{D,S} \frac{1}{2}\|DS - X\|_F^2 + \lambda\|S\|_{1,1}
\quad \text{s.t.} \quad d_j^T d_j = 1,\; j = 1, 2, \dots, K \qquad (1)
where D and S are the dictionary and the representation coefficient matrix
respectively, and X, K and λ denote the training data matrix, the number of
dictionary atoms and the regularization factor respectively. An alternating
scheme is utilized to solve the function containing the l_{1,1}-norm
regularization of the representation coefficient matrix: given the dictionary,
the sparse coefficients are calculated, and given the sparse coefficients, the
dictionary is learned. The algorithm used for obtaining the dictionary is the
one introduced in [10], and the algorithm used for obtaining the sparse
coefficients of each datum is coordinate descent, introduced in [11].
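The alternating scheme just described can be sketched with scikit-learn's `DictionaryLearning`, which likewise alternates a sparse coding step (here lasso coordinate descent, cf. [11]) with a dictionary update under a unit-norm atom constraint; the sizes and data below are toy stand-ins, not the paper's 180-dimensional training vectors:

```python
# Alternating dictionary learning / sparse coding on random toy data.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))   # 200 training vectors of dimension 20

dl = DictionaryLearning(n_components=32, alpha=0.5,
                        fit_algorithm='cd', transform_algorithm='lasso_cd',
                        max_iter=20, random_state=0)
S = dl.fit_transform(X)              # sparse coefficients, one row per datum
D = dl.components_                   # learned dictionary, atoms as rows

# Atoms are constrained to (at most) unit norm, matching the constraint in (1).
norms = np.linalg.norm(D, axis=1)
print(norms.min(), norms.max())
```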
2.2.2 Adaptive dictionary learning
To make the generic dictionary more adaptive to the content of the input CFA
image, the generic dictionary is further modified according to the input CFA
image. Specifically, the input CFA is demosaiced first and the filled Green
channel is used to generate training data in the same manner as for the
generic dictionary learning. The learning process is also driven with the
same methods as the generic dictionary learning. The only difference lies in
the dictionary initialization: random training data compose the initial
dictionary for generic dictionary learning, whereas the learned generic
dictionary composes the initial dictionary for adaptive dictionary learning.
2.2.3 Separated data representation


Once D̂ is obtained, it is separated into a residual dictionary and an edge
dictionary, denoted D^r and D^e respectively. Meanwhile, any training datum
x is approximated well by D̂ and s as follows:

x \approx \hat{D} s, \quad d_j^T d_j = 1,\; j = 1, 2, \dots, K \qquad (2)

Equation (2) can also be rewritten in terms of the residual part and the edge
part:

x_r \approx D^r s

x_e \approx D^e s
  = \left[\frac{d_1^e}{\|d_1^e\|_2}, \frac{d_2^e}{\|d_2^e\|_2}, \dots, \frac{d_K^e}{\|d_K^e\|_2}\right]
    \begin{bmatrix} s_1\|d_1^e\|_2 \\ s_2\|d_2^e\|_2 \\ \vdots \\ s_K\|d_K^e\|_2 \end{bmatrix}
  = \bar{D}^e \bar{s},
\quad \bar{d}_j^{e\,T} \bar{d}_j^e = 1,\; j = 1, 2, \dots, K \qquad (3)

where D̄^e denotes the normalized dictionary, of which each atom is a unit
vector, and ‖·‖₂ denotes the l2-norm of a vector. Equation (3) implies that the
sparse representation coefficients obtained from the normalized edge
dictionary can be used to approximate the residual data by relying on the
relation between s and s̄.
2.3. Residual Green channel reconstruction
After an initial enlarged and full Green channel G̃ is obtained, it is
convolved with the four directional edge extraction operators. The convolved
result of the Green channel is then partitioned into overlapping patches, and
each patch is represented by a vector y_e composed of the four directional
edge features. The optimal sparse linear combination of atoms of the edge
dictionary for y_e is searched for by minimizing the following formula:


\bar{s} = \arg\min_{s} \frac{1}{2}\|\bar{D}^e s - y_e\|_2^2 + \lambda \|s\|_1 \qquad (4)

Similar to the problem contained in (1), this convex optimization regularized
by the l1-norm is solved by the coordinate descent algorithm [11]. Obviously,
s̄ plays the same role as in equation (3), so that a residual patch can be
predicted as follows:

y_r = D^r \begin{bmatrix} \bar{s}_1 / \|d_1^e\|_2 \\ \bar{s}_2 / \|d_2^e\|_2 \\ \vdots \\ \bar{s}_K / \|d_K^e\|_2 \end{bmatrix} \qquad (5)

Once each residual patch y_r is reconstructed sequentially, the whole residual
Green channel G_r is reconstructed. Consequently, the final reconstructed
Green channel Ĝ is obtained as follows:

\hat{G} = \tilde{G} + G_r \qquad (6)

Finally, the B channel and R channel are reconstructed as follows:

\hat{R} = \hat{G} + (\tilde{R} - \tilde{G}), \qquad \hat{B} = \hat{G} + (\tilde{B} - \tilde{G}) \qquad (7)
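Equations (5)–(7) can be illustrated numerically; everything below (sizes, random dictionaries, patch and channel values) is invented for illustration:

```python
# Predict a residual patch from transformed coefficients (eq. 5), add it
# back to the intermediate Green channel (eq. 6), then rebuild R and B
# from the stored difference channels (eq. 7). Toy sizes throughout.
import numpy as np

rng = np.random.default_rng(1)
K = 8
D_e = rng.standard_normal((16, K))   # edge part of the dictionary
D_r = rng.standard_normal((4, K))    # residual part of the dictionary
atom_norms = np.linalg.norm(D_e, axis=0)

s_bar = rng.standard_normal(K)               # coefficients solving (4)
y_r = D_r @ (s_bar / atom_norms)             # equation (5)

G_tilde = rng.random((2, 2))                 # intermediate Green patch
G_r = y_r.reshape(2, 2)                      # predicted residual patch
G_hat = G_tilde + G_r                        # equation (6)

RG = rng.random((2, 2)); BG = rng.random((2, 2))   # difference channels
R_hat, B_hat = G_hat + RG, G_hat + BG              # equation (7)
print(G_hat.shape, R_hat.shape, B_hat.shape)
```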

3. EXPERIMENTAL RESULTS
3.1. Training set option and parameter setting
An image set provided by [9] is selected as the training set. A patch size of
6×6 is selected, so that the dimension of the edge component is 36×4 = 144 and
the dimension of the residual component is 36; thus the total dimension of a
complete dictionary atom is 180. The number of dictionary atoms is chosen as
1024 and the regularization factor λ is set to 0.5. To allow a comparison with
[2], an enlarging factor of 2 is tested.
3.2. Testing image and parameter setting
The Kodak database is chosen as the testing image set; see figure 3. The
overlap between adjacent patches is 2 pixels.
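The patch settings above (6×6 patches with a 2-pixel overlap, i.e. a stride of 4) can be sketched as:

```python
# Extract overlapping square patches from a 2-D channel; with patch=6
# and overlap=2, adjacent patches share a 2-pixel border (stride 4).
import numpy as np

def extract_patches(img, patch=6, overlap=2):
    step = patch - overlap
    patches = []
    for i in range(0, img.shape[0] - patch + 1, step):
        for j in range(0, img.shape[1] - patch + 1, step):
            patches.append(img[i:i + patch, j:j + patch])
    return np.stack(patches)

img = np.arange(18 * 18, dtype=float).reshape(18, 18)
P = extract_patches(img)
print(P.shape)   # (16, 6, 6): a 4x4 grid of overlapping patches
```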


3.3. Evaluation by PSNR and subjective visual perception


The proposed method is compared with Zhang's scheme proposed in [2]
both in terms of PSNR and subjective visual perception. Table 2 lists the
average PSNR over the three channels. For all testing images, higher
performance is achieved by the proposed method. By the subjective visual
perception criterion, enhancements of edges and details can be observed
clearly in figure 4. In this experiment, three images are tested to demonstrate
the effects: image (a) is the input CFA image, image (b) is the result of
Zhang's method, and image (d) is the result of our method. To show the
quality of the residual image reconstruction, the residual image of each
testing image is shown in image (c). We can see that, from large to small,
multi-scale residual edges and details have been found by our method.

Figure 1. Bayer pattern of CFA

Figure 2. Edge extraction operators (first-order kernels in the horizontal,
vertical, anti-diagonal and diagonal directions)


Figure 3. Twenty-four testing images from Kodak PhotoCD (referred to as image 1 to image
24, enumerated from left to right and top to bottom).

Table 1. Framework of proposed method

1. Input: A low spatial resolution CFA.


2. Main Steps:
2.1 The input CFA is demosaiced and superresolved to produce G̃, R̃−G̃
and B̃−G̃.

2.2 A residual image of G̃ is reconstructed by using an adaptive
dictionary and sparse representation.

2.3 The residual Green channel is added back into G̃ and the final Green
channel Ĝ is obtained accordingly.
3. Output: A high spatial resolution and full color image, obtained by
converting from Ĝ and the R−G and B−G differences back into RGB.


Table 2. PSNR (dB) of the proposed and Zhang's schemes

Image name   Zhang's   Proposed     Image name   Zhang's   Proposed
Kodim01      24.82     25.22        Kodim13      22.49     22.68
Kodim02      31.16     31.45        Kodim14      26.18     26.41
Kodim03      31.99     32.29        Kodim15      30.97     31.19
Kodim04      31.08     31.28        Kodim16      29.88     30.07
Kodim05      24.42     24.91        Kodim17      30.36     30.72
Kodim06      26.28     26.52        Kodim18      26.61     26.97
Kodim07      30.90     31.08        Kodim19      26.70     27.21
Kodim08      22.12     22.46        Kodim20      29.79     30.42
Kodim09      30.49     30.98        Kodim21      26.87     27.15
Kodim10      30.64     31.00        Kodim22      28.25     28.58
Kodim11      27.42     27.77        Kodim23      31.92     32.31
Kodim12      31.57     31.94        Kodim24      25.26     25.51

(a) (b)

(c) (d)
Figure 4-1


(a) (b)

(c) (d)
Figure 4-2


(a) (b)

(c) (d)
Figure 4-3

Figure 4. Comparison of two methods for the purpose of subjective visual perception

4. CONCLUSION
In this paper, a novel scheme for demosaicing and SR of CFA images via
residual image reconstruction and sparse representation is presented. Using a
training image set, a mapping between the edges of the filled and superresolved
Green channel and the corresponding residual image is obtained by dictionary
learning. Given an intermediate Green channel, edges are extracted from it
and sparse coefficients are computed using the edge part of the dictionary. The
transformed sparse coefficients are used to linearly combine the residual part
of the dictionary to generate the residual image. Finally, the residual image
is added back into the intermediate Green channel to produce the final
reconstructed Green channel; the intermediate RG and BG channels are
retained. The proposed scheme is capable of improving the reconstruction
quality of arbitrary demosaicing and SR methods. The experimental results
demonstrate state-of-the-art performance in both PSNR and subjective visual
perception.


5. ACKNOWLEDGMENTS
The authors are grateful to Professor Shuozhong Wang for his assistance in
improving the language usage.

REFERENCES
[1] R. Lukac, K.N. Plataniotis. Color filter arrays: design and performance analysis, IEEE
Transactions on Consumer Electronics, 51 (4) (2005) 1260–1267.
[2] L. Zhang, David Zhang. A joint demosaicing–zooming scheme for single chip digital
color cameras, Computer Vision and Image Understanding 107 (2007) 14–25.
[3] K.-L. Chung et al. New joint demosaicing and zooming algorithm for color filter array,
IEEE Transactions on Consumer Electronics, 55 (3) (2009) 1477–1486.
[4] R. Lukac, K.N. Plataniotis, D. Hatzinakos. Color image zooming on the Bayer pattern,
IEEE Transactions on Circuits and Systems for Video Technology 15 (2005)
1475–1492.
[5] X. Li, Bahadir Gunturk, L. Zhang. Image demosaicing: a systematic survey,
Proceedings of the SPIE, Vol. 6822 (2008) 68221J.
[6] G. Cristobal et al. Superresolution imaging: a survey of current techniques.
Proceedings of the SPIE, Vol. 7074 (2008) 70740C.
[7] G. Sun, Y. Chen, Z. Shen. Demosaicking and zooming for Color Filter Array via residual
image reconstruction. Proceedings of the Second International Conference on Internet
Multimedia Computing and Service, 2010, 139–142.
[8] Michael Elad, Mario A.T. Figueiredo. On the role of sparse and redundant
representations in image processing, Invited Paper, Proceedings of the IEEE Special
Issue on Applications of Sparse Representation and Compressive Sensing,
2010, 972–982.
[9] J. Yang, J. Wright, T. Huang, Y. Ma. Image super-resolution via sparse representation.
IEEE Transactions on Image Processing, 19 (11) (2010) 2861–2873.
[10] M. Yang, L. Zhang, J. Yang and D. Zhang. Metaface learning for sparse representation
based face recognition. In ICIP, 2010.
[11] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for
generalized linear models via coordinate descent. Journal of Statistical Software,
33 (1) (2010) 1–22.


Determining Weight of Known


Evaluation Criteria in the Field of
Mehr Housing using ANP Approach
Saeed Safari
MS degree in industrial engineering, Islamic Azad University, Arak, Iran.

Mohammad Shojaee
MS degree in governmental management, Arak, Iran.

Mohammad Tavakolian
MS degree in governmental management, Arak, Iran.

Majid Assarian
MS degree in governmental management, Arak, Iran.

ABSTRACT
Given the considerable increase in house prices, especially in recent years,
organizations and departments are trying to meet this need, at least for their
own employees. Accordingly, in many organizations, units or, more precisely,
companies called housing cooperative companies have been established to
address this need for their staff. On the other hand, organizations
continuously analyze and evaluate their performance. But such evaluation is
unreliable because of the relatively high economic turbulence of recent years,
and so the decline or improvement of organizations or companies is not
assessed correctly; an evaluation also has to be made in comparison with
competitors to be more reliable. In this paper, based on these facts,
evaluation criteria were first identified from expert opinion and previous
research in this field; then, after establishing the communication network
among these criteria, we evaluated and weighted them using the ANP (Analytic
Network Process) model. Finally, suggestions are proposed to improve the
efficiency of the evaluation criteria and of the Mehr housing cooperative.

Keywords
Mehr Housing Cooperative, Evaluation Criteria, Matrix of Paired
Comparisons, ANP Model.

1. INTRODUCTION
In the modern era, with considerable developments in management science, an
evaluation system is unavoidable: the lack of an evaluation system covering
the different dimensions of an organization, including the use of resources
and facilities, staff, objectives and strategies, is considered a symptom of
organizational illness.
Every organization, in order to know the average utility and quality of its
activities, especially in a complicated and dynamic environment, urgently
needs an evaluation system. On the other hand, the lack of a control and
evaluation system means a lack of communication with the internal and
external environment, the consequences of which are stagnation and, finally,
organizational death. It is possible that the onset of organizational death is
not felt by top managers because it does not occur suddenly. Studies show
that the lack of a feedback system makes the reforms necessary for the growth
and improvement of an organization's activities impossible; the consequence
of this phenomenon is organizational death. The performance evaluation
problem has challenged researchers and practitioners for many years. In the
past, trade organizations considered financial indicators the instrument of
performance evaluation, until Kaplan and Norton, in the early 1990s, after
investigating and evaluating management systems, revealed many inefficiencies
of such information for performance evaluation in organizations, an
inefficiency resulting from increased organizational complexity, environmental
mobility and market competition (Kaplan & Norton, 1992).
The current research uses the ANP method with the above approach to identify
the functional dimensions of active housing cooperative companies in Arak
city and to determine the importance of each effective factor.

2. RELATED WORKS

2.1 ANP model:


The ANP is a generalization of the AHP; it includes the AHP as a special case and can be used to treat more sophisticated decision problems than the AHP. The ANP makes it possible to deal systematically with all kinds of dependence and feedback in a decision system. The ANP is a coupling of two parts. The first consists of a control hierarchy or network of criteria and sub-criteria that control the interactions in the system under study. The second is a network of influences among the elements and clusters (Saaty, 2001). A decision problem that is analyzed with the ANP is often studied through a control hierarchy or network. A decision network is structured of clusters, elements, and links. A cluster is a collection of

relevant elements within a network or sub-network. For each control
criterion, the clusters of the system with their elements are determined. All
interactions and feedbacks within the clusters are called inner dependencies
whereas interactions and feedbacks between the clusters are called outer
dependencies (Saaty, 1999). Inner and outer dependencies are the best way
decision-makers can capture and represent the concepts of influencing or
being influenced, between clusters and between elements with respect to a
specific element. Then pairwise comparisons are made systematically
including all the combinations of element/cluster relationships. ANP uses
the same fundamental comparison scale (1-9) as the AHP. This comparison
scale enables the decision-maker to incorporate experience and knowledge
intuitively (Harker and Vargas, 1990) and indicate how many times an
element dominates another with respect to the criterion. It is a scale of
absolute (not ordinal, interval or ratio scale) numbers. The decision maker
can express his preference between each pair of elements verbally as
equally important, moderately more important, strongly more important,
very strongly more important, and extremely more important. These
descriptive preferences would then be translated into numerical values 1, 3,
5, 7, 9, respectively, with 2, 4, 6, and 8 as intermediate values for
comparisons between two successive judgments. Reciprocals of these
values are used for the corresponding transposed judgments.
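As a rough illustration (not part of the original study), the reciprocal structure of such a judgment matrix can be sketched in Python; `pairwise_matrix` is a hypothetical helper, and the judgments below are invented:

```python
# Sketch of building a Saaty-scale pairwise comparison matrix.
# Verbal judgments (equal=1, moderate=3, strong=5, very strong=7, extreme=9)
# fill the upper triangle; reciprocals fill the transposed positions.

def pairwise_matrix(n, upper):
    """upper: dict mapping (i, j) with i < j to a 1-9 judgment."""
    m = [[1.0] * n for _ in range(n)]
    for (i, j), v in upper.items():
        m[i][j] = float(v)
        m[j][i] = 1.0 / v  # reciprocal for the transposed judgment
    return m

# Hypothetical example: element 0 is strongly more important than 1 (5)
# and moderately more important than 2 (3); 1 and 2 are judged equal (1).
M = pairwise_matrix(3, {(0, 1): 5, (0, 2): 3, (1, 2): 1})
```

The diagonal is always 1 (an element compared with itself), which is why only the upper-triangle judgments need to be supplied.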

2.2 Mehr Housing Scheme:


As of January 2011, the banking sector, particularly Bank Maskan, had given loans of up to 102 trillion rials ($10.2 billion) to applicants of the Mehr housing project. Under this scheme, real estate developers are offered free land in return for building cheap residential units for first-time buyers on 99-year lease contracts. The government then commissioned agent banks to offer loans to the real estate developers to prepare the land and begin construction projects, in an attempt to increase production and create equilibrium between supply and demand (2008). Close to 400,000 units have been built and permits have been issued for another 12,000 [11]. The Mehr Housing project is expected to provide 600,000 residential units in its first phase. About 3.7 million people have so far registered for the Mehr Housing Plan (2008). About 10 million rials is to be paid by applicants for preparing the land, and another 10 million is to be given by the government in the form of banking facilities. Applicants should pay about 20 percent of the construction costs. In addition, about 140 million rials worth of housing loans will be granted to them (10,000 rials = 1 USD in 2008) [12]. While most Iranians have difficulties obtaining small home loans, 90 persons have managed to secure collective facilities totaling $8 billion from banks [12].


3. METHODOLOGY

3.1 Choosing criteria of performance evaluation:


The basis of any evaluation is the set of criteria against which the studied cases are assessed. It is therefore necessary to obtain evaluation indicators by studying previous research and current papers in this field and by consulting experts. In this research, all of these steps were carried out, and 14 criteria in the field of housing cooperatives were finally identified and used.

Criteria

1-The amount of state funds (mortgage) allocated to each applicant.


2- Number of cooperative members.
3- Average of participation of cooperative members (number of
meeting hours during one month).
4- Number of replaced managers during project.
5- Cooperative member education (language variable).
6- First charge of each member.
7-Monthly carrying charges (without considering mortgage).
8- Final cost of each square meter of residential apartments (total payment divided by the measurement of each apartment).
9- Duration of preparing each apartment (days of the project divided by the number of apartments).
10- Number of apartments in each flat.
11- Measurement of each apartment.
12-Condominium rate based on square meter.
13- Number of built blocks.
14- Number of people in reservation list of each company.

3.2 Clustering criteria:


After determining the criteria, in order to weight them, a communication network among the criteria has to be established so that the criteria can be ranked using the ANP technique. The criteria network, their influences on each other, and the magnitude of these effects are as follows:

Cluster1: Housing criteria
A1: The amount of state funds (mortgage) allocated to each
applicant.
A2: Duration of preparing each apartment (days of the project divided by the number of apartments).
A3: Number of apartments in each flat.
A4: Measurement of each apartment.
A5: Condominium rate based on square meter.
A6: Number of built blocks.
A7: Final cost of each square meter of residential apartments (total payment divided by the measurement of each apartment).
Cluster2: Company criteria
B1: Number of cooperative members.
B2: Average of participation of cooperative members (number of meeting
hours during one month).
B3: Number of replaced managers during project.
B4: Number of people in reservation list of each company.
Cluster3: Member criteria
C1: First charge of each member.
C2: Monthly carrying charges (without considering mortgage).
C3: Cooperative member education (language variable).

Figure 1: Communication network of criteria.

3.3 Example of paired comparison matrices and calculation of the consistency rate:

To obtain the weights, the Geometric Mean method [13] has been used. For example, to calculate the weights of the first matrix, the following steps have been carried out [14]:

Matrix1: Sample Paired comparison matrix.


A1 A2 A3 A4 A5 A6 A7
C1

A1 1 3 5 5 7 7 3
A2 0.333 1 5 3 5 5 1
A3 0.200 0.200 1 1 1 1 0.333
A4 0.200 0.333 1 1 1 3 0.333
A5 0.143 0.200 1 1 1 1 0.143
A6 0.143 0.200 1 0.333 1 1 0.143
A7 0.333 1 3 3 7 7 1

As can be seen in Figure 1, cluster 3 affects cluster 1. As an example, the paired comparison matrix of the 7 criteria of cluster 1, judged with respect to the first criterion of cluster 3 (C1), is given above; in the next step the corresponding weights are calculated.
W1 = (1 × 3 × 5 × 5 × 7 × 7 × 3)^(1/7) = 3.780
W2 = (0.333 × 1 × 5 × 3 × 5 × 5 × 1)^(1/7) = 1.993
W3 = (0.200 × 0.200 × 1 × 1 × 1 × 1 × 0.333)^(1/7) = 0.540
W4 = (0.200 × 0.333 × 1 × 1 × 1 × 3 × 0.333)^(1/7) = 0.679
W5 = (0.143 × 0.200 × 1 × 1 × 1 × 1 × 0.143)^(1/7) = 0.456
W6 = (0.143 × 0.200 × 1 × 0.333 × 1 × 1 × 0.143)^(1/7) = 0.390
W7 = (0.333 × 1 × 3 × 3 × 7 × 7 × 1)^(1/7) = 2.040

Sum: W1 + W2 + … + W7 = 9.877


In the next step, the obtained weights are normalized:

WN1 = 3.780 / 9.877 = 0.383
WN2 = 1.993 / 9.877 = 0.202
WN3 = 0.540 / 9.877 = 0.055
WN4 = 0.679 / 9.877 = 0.069
WN5 = 0.456 / 9.877 = 0.046
WN6 = 0.390 / 9.877 = 0.039
WN7 = 2.040 / 9.877 = 0.207
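The Geometric Mean weighting and normalization above can be sketched in Python; this is a minimal illustration (the matrix entries are copied from Matrix 1, with the same rounded reciprocals, so the results match the paper's rounded figures):

```python
import math

# Geometric Mean method: each row weight is the 7th root of the product of
# the row entries; normalized weights divide by the sum of all row weights.

matrix = [
    [1,     3,     5, 5,     7, 7, 3    ],
    [0.333, 1,     5, 3,     5, 5, 1    ],
    [0.200, 0.200, 1, 1,     1, 1, 0.333],
    [0.200, 0.333, 1, 1,     1, 3, 0.333],
    [0.143, 0.200, 1, 1,     1, 1, 0.143],
    [0.143, 0.200, 1, 0.333, 1, 1, 0.143],
    [0.333, 1,     3, 3,     7, 7, 1    ],
]

def geometric_mean_weights(m):
    n = len(m)
    w = [math.prod(row) ** (1.0 / n) for row in m]
    total = sum(w)
    return w, [wi / total for wi in w]

weights, normalized = geometric_mean_weights(matrix)
# weights    ≈ [3.780, 1.993, 0.540, 0.679, 0.456, 0.390, 2.040], sum ≈ 9.877
# normalized ≈ [0.383, 0.202, 0.055, 0.069, 0.046, 0.039, 0.207]
```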

To calculate the consistency rate of the first matrix, the following steps have been carried out (Excel was used), as in this example:

Matrix2: Paired comparison matrix with vertical sum

A1 A2 A3 A4 A5 A6 A7
C1

A1 1 3 5 5 7 7 3
A2 0.333 1 5 3 5 5 1
A3 0.200 0.200 1 1 1 1 0.333
A4 0.200 0.333 1 1 1 3 0.333
A5 0.143 0.200 1 1 1 1 0.143
A6 0.143 0.200 1 0.333 1 1 0.143
A7 0.333 1 3 3 7 7 1

Sum: 2.352 5.933 17.000 14.333 23 25.000 5.952


λmax = 2.352 × 0.383 + 5.933 × 0.202 + 17.000 × 0.055 + 14.333 × 0.069 + 23.000 × 0.046 + 25.000 × 0.039 + 5.952 × 0.207 = 7.288

C.I. = (λmax - n) / (n - 1) = (7.288 - 7) / 6 = 0.048

C.R. = C.I. / R.I. = 0.048 / 1.32 = 0.036

Table 1: Consistency Index and Consistency Ratio of the example matrix

Column sum × normalized weight: 0.900, 1.197, 0.929, 0.986, 1.061, 0.986, 1.229
λmax (sum of the products): 7.288
C.I. (Consistency Index): 0.048
C.R. (Consistency Ratio): 0.036
Consistency threshold: 0.1 (C.R. < 0.1, so the matrix is consistent; the analysis is reasonable and acceptable)

Because the consistency ratio is less than 0.1, the paired comparison matrix is consistent, and the obtained weights can be used in the next steps and entered into the supermatrix.
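The same consistency check can be sketched in Python (the column sums come from Matrix 2 and the normalized weights from the calculation above; R.I. = 1.32 is Saaty's random index for n = 7):

```python
# lambda_max is the dot product of the column sums with the normalized
# weights; C.I. = (lambda_max - n) / (n - 1); C.R. = C.I. / R.I.

col_sums = [2.352, 5.933, 17.000, 14.333, 23.000, 25.000, 5.952]
norm_w   = [0.383, 0.202, 0.055, 0.069, 0.046, 0.039, 0.207]

n = 7
lambda_max = sum(c * w for c, w in zip(col_sums, norm_w))
ci = (lambda_max - n) / (n - 1)
cr = ci / 1.32           # R.I. = 1.32, random index for a 7x7 matrix

consistent = cr < 0.1    # accepted consistency threshold
# lambda_max ≈ 7.288, ci ≈ 0.048, cr ≈ 0.036 -> consistent
```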


3.4 Performing the ANP methodology

To obtain the weights from the paired comparison matrices, the eigenvector of each matrix is placed into a matrix, and the ANP supermatrix of the communication network is established; it is shown in Matrix 3.

Matrix 3: Super matrix (eigenvectors of each matrix)

In this step, we normalize the supermatrix so that the criteria weights can be calculated, using the following linear normalization formula [15,16]:

r_ij = x_ij / Σ_{k=1}^{m} x_kj ,   i = 1, …, m;  j = 1, …, n.

Matrix4: Normalized super matrix using linear normalization.


Next, the supermatrix is raised to successive powers using MATLAB [17] until its entries converge. To do this, the matrix is raised to the 8th power and then to the 9th power, and the average of these two matrices is taken as the stable state of the iteration; this average matrix gives the final weights, shown to 3 decimal places as follows:
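The normalization and powering steps can also be sketched without MATLAB. The following Python helpers are illustrative (not the study's code) and use plain nested lists; a zero column is simply left as zeros:

```python
# Column-normalize the supermatrix (linear normalization), raise it to the
# 8th and 9th powers, and average the two results to approximate the stable
# limit whose columns hold the final weights.

def normalize_columns(m):
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, p):
    r = m
    for _ in range(p - 1):
        r = mat_mul(r, m)
    return r

def limit_weights(super_matrix):
    w = normalize_columns(super_matrix)
    p8, p9 = mat_pow(w, 8), mat_pow(w, 9)
    n = len(w)
    return [[(p8[i][j] + p9[i][j]) / 2 for j in range(n)] for i in range(n)]

# Tiny 2x2 example: after powering, all columns of the limit matrix agree,
# and any one column can be read off as the final weights.
final = limit_weights([[1.0, 1.0], [1.0, 3.0]])
```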

Matrix5: Normalized super matrix powered to 8.

Matrix6: Normalized super matrix powered to 9.

Matrix7: Final super matrix (average of matrix 8 and 9)

5. RESULTS
The criteria weights, taken from the first column of the final supermatrix (Matrix 7), are as follows:
Table 2: Criteria weights (based on first column of matrix 7)
Code  Criteria name  Weight
A1  The amount of state funds (mortgage) allocated to each applicant.  0.131
A2  Duration of preparing each apartment (days of the project divided by the number of apartments).  0.078
A3  Number of apartments in each flat.  0.036
A4  Measurement of each apartment.  0.037
A5  Condominium rate based on square meter.  0.026

A6  Number of built blocks.  0.020
A7  Final cost of each square meter of residential apartments (total payment divided by the measurement of each apartment).  0.112
B1  Number of cooperative members.  0.126
B2  Average of participation of cooperative members (number of meeting hours during one month).  0.068
B3  Number of replaced managers during project.  0.103
B4  Number of people in reservation list of each company.  0.036
C1  First charge of each member.  0.096
C2  Monthly carrying charges (without considering mortgage).  0.071
C3  Cooperative member education (language variable).  0.055

6. CONCLUSIONS
Other researchers who want to apply this method are encouraged to use data mining methods and meta-heuristic algorithms to predict future criteria that can affect cooperative housing projects, and to weight those criteria; they can then compare their results with ours, obtain more reliable results, and use them to prevent unexpected problems. With this research completed and the final list of Mehr housing cooperative criteria in Arak city identified and ranked, as described above, the factors involved in the success or failure of a number of these cooperative housing companies have been identified on the basis of the known criteria, so that other cooperative housing companies can use these principles to resolve their disadvantages and reinforce their advantages in order to operate more efficiently in the future.
One prerequisite for a successful cooperative is that members and directors receive adequate training. The process of developing and operating a cooperative can be complex. Finance (annual audits, monthly financial statements, finance mechanisms for housing), management (parliamentary procedure, personnel matters), and the philosophies of cooperation are but a few areas in which members should have some knowledge. Training programs must also make members aware of their rights, responsibilities, and obligations within the cooperative organization bylaws and house policies. The involvement of members does not end with the development process. Members have both a right and a responsibility to be informed about and involved in the operation of their cooperative. Although directors have authority to make many decisions on behalf of the members who elected them, they should not act autonomously. Directors should work with members in developing a consensus or vision on how the cooperative is run.

Once the cooperative is fully occupied and operational, it must begin accumulating sufficient reserves to take care of contingencies. Unexpected breakdown of equipment, uninsured property losses, or a sudden increase in the property tax bill all lead to expenses for which cash reserves are needed. Sound financial planning calls for adequate financial reserves to be built up year by year, so that as a building's plumbing, roof, or other systems wear out, the cooperative can afford to replace them.
A budget is a plan for the cooperative's expected resources and expenditures over a given period. Operating budgets are usually developed for a one-year period, while capital budgets are more long-range. A housing cooperative's budget is developed by its treasurer, the finance committee, the board and manager, and sometimes the entire membership. Approval of a cooperative's annual budget usually rests with the board of directors, although in some cooperatives, members may approve the budget based on a recommendation from the board.
While many issues surface in managing cooperative housing, some issues
may be recurring. To save time and promote consistency, clear policies
should be developed on how to deal with these matters. While some of these
rules may be included in the bylaws, usually they will appear as policies in
the house policy manual. The purpose of house rules and policies is not to
put unnecessary burdens on individual members. Although the cooperative
may decide what issues to include in its house policies, several important
areas should be covered either in the bylaws or house policies.

REFERENCES
[1] Saaty, Thomas L. (2005). Theory and Applications of the Analytic Network Process:
Decision Making with Benefits, Opportunities, Costs and Risks. Pittsburgh,
Pennsylvania: RWS Publications. ISBN 1-888603-06-2.
[2] Saaty, Thomas L.; Luis G. Vargas (2006). Decision Making with the Analytic Network
Process: Economic, Political, Social and Technological Applications with Benefits,
Opportunities, Costs and Risks. New York: Springer. ISBN 0-387-33859-4.
[3] Saaty, Thomas L.; Brady Cillo (2009). The Encyclicon, Volume 2: A Dictionary of
Complex Decisions using the Analytic Network Process. Pittsburgh, Pennsylvania:
RWS Publications. ISBN 1-888603-09-7.
[4] In 2005, one book cited examples from the United States, Brazil, Chile, Czech
Republic, Germany, India, Indonesia, Italy, Korea, Poland, Russia, Spain, Taiwan, and
Turkey.
[5] Saaty, Thomas L.; Müjgan S. Özdemir (2005). The Encyclicon: A Dictionary of Decisions with Dependence and Feedback Based on the Analytic Network Process. Pittsburgh, Pennsylvania: RWS Publications. ISBN 1-888603-05-4.
[6] Harker, P.T. and Vargas, L.G. (1990), Reply to remarks on the analytic hierarchy process, Management Science, No. 36, pp. 269-73.
[7] Saaty, T.L. (1999), Fundamentals of the analytical network process, Proceedings of
ISAHP 1999, Kobe, Japan, 12-14 August, pp. 48-63.

[8] Saaty, Thomas L. (1996). Decision Making with Dependence and Feedback: The
Analytic Network Process. Pittsburgh, Pennsylvania: RWS Publications. ISBN 0-
9620317-9-8.
[9] Saaty, T.L. (2001a), Decision Making in Complex Environments: The Analytic
Network Process for Decision Making with Dependence and Feedback, RWS
Publications, Pittsburgh, PA.
[10] No. 3870 | Domestic Economy | Page 4. Irandaily. Retrieved on 2012-07-16.
[11] http://www.turquoisepartners.com/iraninvestment/IIM-Aug10.pdf
[12] Presstv.com. Retrieved on 2012-07-16.
[13] "TPC-D Frequently Asked Questions (FAQ)". Transaction Processing Performance
Council. Retrieved 9 January 2012.
[14] Mitchell, Douglas W. (2004). "More on spreads and non-arithmetic means". The Mathematical Gazette 88: 142-144.
[15] Hwang, C. L., & Yoon, K. (1981). Multiple Attribute Decision Making. Berlin:
Springer-Verlag.http://dx.doi.org/10.1007/978-3-642-48318-9.
[16] Yoon, K. P., & Hwang, C. L. (1995). Multiple Attribute Decision Making: An
Introduction. London: Sage Pub.
[17] Hazewinkel, Michiel, ed. (2001), "Linear algebra software packages", Encyclopedia of
Mathematics, Springer, ISBN 978-1-55608-010-4.


Application of the Collaboration


Facets of the Reference Model in
Design Science Paradigm

Lukasz Ostrowski
Dublin City University
Glasnevin, Dublin 9, Ireland

Markus Helfert
Dublin City University
Glasnevin, Dublin 9, Ireland

ABSTRACT
Current challenges in design science research call for consistent and detailed phases to guide design science researchers in managing projects in the information systems field. Taking up this challenge, we present a reference model that serves as the foundation for structuring information in the construction of business process model artefacts in design science research. It contains activities responsible for literature review, collaboration with practitioners, and information modelling. In this paper we demonstrate the collaboration with practitioners facet of the model, to answer the question of how to construct a business process model artefact with practitioners from the field. The contribution of the paper is that applying the collaboration with practitioners activities in the context of design science supports the quality of design science artefacts and provides design science researchers with choices of techniques.
Keywords
Design Science, Collaboration, Business Processes.
1. INTRODUCTION
Design Science (DS) research methodology has received increased attention in computing and information systems (IS) research [1]. It has become an accepted approach for research in the IS discipline, with dramatic growth in related literature [2]. However, at its current stage it does not offer consistent and comprehensive phases that guide researchers in their choice of techniques [3]. Thus, in this paper we refer to the reference model [4] (also known as the process oriented reference model), which provides techniques for meta-design artefacts. We discuss and present its modelling step in the context of business process model artefacts.
This paper is organized as follows. The next section reviews the design
science research literature and proposes its challenges and potential ways of
further development. Based on that review, the subsequent sections present

the reference model that covers the phases of the meta-design step in DS. Then, we elaborate in depth and demonstrate one of its phases, the collaboration with practitioners activities, in the context of process oriented artefacts. Next, we evaluate the activities by means of the Satisfaction Attainment Theory (SAT) [5] and the elaborated solutions. This paper helps define future directions and phases of design science methodology within the full spectrum of information systems research approaches.
2. DESIGN SCIENCE
Design science focuses on the creation of artificial solutions. It addresses research through the building and evaluation of artefacts designed to meet identified business needs [6]. Understanding the nature and causes of these needs can be a great help in designing solutions [7]. The literature reflects a healthy discussion around the balance of rigor and relevance [8] in DS research, which shows that it is a still-shaping field [9].
Views and recommendations on the DS methodology vary among papers, e.g. [10,11]. DS methodological guidelines from the precursors Hevner [8] and Walls [12] are seldom applied, suggesting that the existing methodology is insufficiently clear or inadequately operationalized, still at too high a level of abstraction [11]. Descriptions of the activities (procedures, tools, techniques) needed to follow the methodology are only briefly indicated. Taking up this challenge, 3 main activities were identified as crucial in the development of DS artefacts [13]: literature review, collaboration with practitioners, and relevant modelling techniques [14]. The reference model [4] examines these activities in terms of the development of meta-design artefacts [15]. For a better overview of where it fits in design science methodology, we first introduce our understanding of the current state of the art of DS and its artefacts.
Researchers understand artefacts as things, i.e. entities that have some separate existence [16]. They can be in the form of a construct, a model, a method, or an instantiation [8]. In the construction of an artefact, researchers have observed two activity layers [17]: 1) design practice, which produces situational design knowledge and concrete artefacts, and 2) meta-design, which produces abstract design knowledge. Meta-design can be viewed as 2a) a preparatory activity before situational design is started, 2b) a continual activity partially integrated with the design practice, or 2c) a concluding theoretical activity summarizing, evaluating, and abstracting results for target groups outside the studied design and use practices [17]. The meta-design step concentrates on providing an optimal solution for the domain by trying to cover the whole spectrum. The design practice then refers to it, adjusting and applying it to a concrete business scenario (i.e. an instantiation).

ISSN: 1694-2108 | Vol. 6, No. 1. OCTOBER 2013 2


International Journal of Computer Science and Business Informatics

IJCSBI.ORG
As mentioned above, abstract and situational design knowledge can be treated as two individual outcomes of design science. Thus, it seems reasonable to consider a different evaluation method for each of them: artificial and naturalistic evaluation, respectively [18]. The meta-design step plays a crucial role in constructing the knowledge base for a final instantiation and its utility. Figure 1 illustrates its place in design science research and the general relationship among IS artefacts [19]. The aim of the reference model is to detail the activities [13] carried out in that step and then use them to guide design science researchers through it. The three main activities of the reference model were produced by comparing multiple plausible models of reality, which is essential for developing reliable scientific knowledge [20].

Figure 1 The Reference model in the Design Science Research Methodology - adapted
and updated from [11]
The next sections introduce the reference model and how all its activities cooperate to achieve the desired solution; they then elaborate and demonstrate the collaboration with practitioners activities.
3. THE REFERENCE MODEL
The idea behind the reference model was to deliver the knowledge base,
which combines information from two processes: literature review and
collaboration with practitioners. Their main roles are to 1) gather
information related to the investigated domain of interest, and 2) represent
the information in an understandable way to the stakeholders. Before
analysis and combination of solutions from these sources take place, each
process provides its own solution. Thus, to make the analysis and combination more effective, the same modelling techniques are introduced in both processes: ontology engineering and a domain specific modelling language. The former gives researchers the design rationale of a knowledge base, a kernel conceptualization of the world of interest, and semantic constraints of concepts together with sophisticated theories [21]. In the context of process oriented IS solutions, the latter introduces business process modelling notation (BPMN). For example, if a

ISSN: 1694-2108 | Vol. 6, No. 1. OCTOBER 2013 3


International Journal of Computer Science and Business Informatics

IJCSBI.ORG
researcher investigates a process of an employee engagement, the ontology
engineering technique will represent the gathered knowledge retrieved from
those two sources. Then, the BPMN will model it into the desired shape of a
process. Figure 2 illustrates the overview of the reference model.

[Figure 2 shows the reference model as three connected processes: Literature Review (define research scope, select studies for detailed evaluation, select activities from studies), Collaboration with Practitioners (define the collaboration, profile practitioners, focus group collaboration, update research scope), and Modelling (construct ontology, synthesise outcomes, construct a process, document the process).]

Figure 2 The Reference Model Overview [4]


We now introduce the collaboration with practitioners activities of the reference model, concentrating on the case where the investigated artefact is a business process model. While we acknowledge the iterative nature of the activities involved, we discuss and present the model as a linear sequence of steps to keep the description straightforward.
4. COLLABORATION
Practitioners' best practices and expertise constitute the second source of information for business process model artefacts in the reference model. This part of the reference model focuses on working along with practitioners to discover, and to reach agreement on, the general process activities emerging from various experiences. In line with the findings on the activities of the meta-design phase, the main goal of the literature review process is to provide information for the artefact coming from the literature, whereas the collaboration with practitioners provides information coming from industry. Like the literature review process, the collaboration with practitioners is represented in BPMN. Researchers may use knowledge gathered from the literature to prepare for the collaboration; however, it has been found that not disclosing the literature-based process to practitioners at early stages keeps the collaboration open minded. The key is to concentrate on the best practices without interference from other sources.
To build systematic development of transferable, reusable, and predictable collaboration with practitioners, the literature review outlined a collaboration engineering approach [22]. It focuses on designing purposeful interaction

ISSN: 1694-2108 | Vol. 6, No. 1. OCTOBER 2013 4


International Journal of Computer Science and Business Informatics

IJCSBI.ORG
within the context of a sequence of phases that helps a group to achieve its
goal. Collaboration engineering can be viewed as facilitation, design, and a
training approach that aims to create collaboration processes supported by
tools such as group support decision systems (GDSS) [23]. This approach
was revised and modified to the level presented in the Figure 3 and
demonstrated in the following case study.

Figure 3 Process of Collaboration with Practitioners


5. CASE STUDY
The following case study describes the application of the collaboration with practitioners process of the reference model to a business process model artefact in design science research. Between March 2012 and November 2012, a business process model artefact was constructed that guides senior management through an innovation process and indicates the points at which the value of an on-going innovation project can be measured. During the course of the design science research, the process oriented reference model artefact was applied.
The following first introduces the research motivation and problem, and briefly the findings of the literature review. Then, the course of the collaboration with practitioners is described in detail.
Problem identification for this research started during industrial meetings of senior managers. They were facing the challenge of measuring innovation, which, like everything that businesses do involving the investment of capital and time, has to be measured. Measuring innovation, however, presents problems for the very process that is to be measured: it was also noted that attempting to measure the wrong things at the wrong time puts the innovation process at risk. These senior managers, coming from various enterprises, decided to work together to design the desired business process model for measuring innovation. In doing so, they followed design science research and struggled with its execution. This was a good opportunity to show how the application of the reference model facilitates collaboration with practitioners from different industries and provides the desired business process model.
Following the model, the collaboration scope was narrowed down, and an analysis of the process model topic, the involved participants, and the resources was conducted. The task was formulated as a business process model capable of measuring the value of innovation realized by a firm. The deliverable was a representation of the process in BPMN. Overall, seven

ISSN: 1694-2108 | Vol. 6, No. 1. OCTOBER 2013 5


International Journal of Computer Science and Business Informatics

IJCSBI.ORG
participants from five companies were involved in the focus group
collaboration. Their participation was voluntary and motivated by the
opportunity to share experience and best practices among the parties involved.
Finally, the resource analysis concerned the available time. Each company
dedicated a 90-minute slot for individual interviews on their site, and 5 hours
for a group meeting. One of the companies provided software to facilitate
online meetings. In addition, mind map software was used to take notes
and visualize insights provided by participants. The participants' roles in
their organisations were closely linked either to the facilitation or to the
execution of innovation projects.
The focus group collaboration followed the activities listed in Table 1. In
step 0, questions for the individual interviews were prepared. The questions
were split into two sections. The first section was meant to understand and
determine each participant's connection to the innovation process and its
measurement. Thus, the questions were formed around their organizational
units, daily activities, main responsibilities, and personal understanding of the
innovation process. The second section contained questions that allowed
further elaboration on the participants' expertise regarding the desired
process. For example, the questions of the second section concerned a formal
measurement methodology in place at a particular organizational unit, the
people involved in innovation value measurement, the milestones and
activities of measurement, as well as the metrics used. These rather general
questions were later decomposed into more detailed sub-questions as the
interview progressed.
Table 1 Activities Decomposition
Activity of Collaboration
Step 0. Questions preparation
A1 Analyse findings from the literature review, participants' profiles, and the
scope.
Step 1. Getting individual participants' perspective
B1 Individual contextual interviews to understand participants' expertise
B2 Individual domain interviews to gain process-relevant activities from the
participants' context
B3 Transcript of the interview to summarize and authorize the information
Step 2. Initial analysis
C1 Group activities from domain interviews
Step 3. Focus group meetings
D1 Getting the participants to know each other
D2 Presenting findings from the interviews
D3 Grouping similar activities by participants
D4 Revision of all activities by participants
D5 Consolidation of the Process
Step 4. Conclusion
E1 Summary of the focus group achievements in relation to the scope of the
collaboration.

In step 1, the interviews with each participant of the focus group were
conducted. This phase was divided into two activities (B1-B2). First,
questions from the first section were asked to understand each participant's
expertise and perspective on the process. Here the researcher followed the
laddering interview method, and only the first section of questions was asked.
Answers were captured and visualized on a mind map. Forty minutes were
allocated for this part. On many occasions participants had prepared
presentations prior to the interview, and additional time was needed; these
presentations provided an overview of the organisation and the innovation
context it operated in. The last 50 minutes of the interview were dedicated to
the business process under investigation. As the interview progressed, a
sketch of the process was updated and displayed in the mind map software,
allowing the participants to verify that their statements were interpreted
correctly. For the B2 step, semi-structured interviews were chosen. In
addition, a transcript of each interview was sent for authorization, with a
request to clarify ambiguities discovered after the interview took place.
In step 2, all transcripts of the interviews were summarized and distributed to
all participants prior to the focus group meetings. One of the goals was to
provide all participants with the same amount of knowledge, so that more
insights could be delivered at the focus group meetings. The key finding at
this stage of the research was a clear distinction between measuring
innovation as a facilitator and as technical IT. Along with the summary of
transcripts, an overview of the agenda for the focus group meetings was
provided.
The following step 3 comprises the activities of the focus group meetings. An
ice breaker and focus group work methods were applied. Since some
participants could not attend the meeting in person, the meetings were carried
out through an online collaboration tool. All participants in the room had a
PC logged in to the tool, and all questions and summaries of answers were
entered through it. The online tool generated reports of everything typed in,
which enhanced the analysis of the meeting at a later stage. The meeting
began with an introduction of the meeting agenda, followed by 5 minutes
allocated to each participant to introduce their organisation,
roles, and relation to the innovation process. This was the result of a simple
ice breaker method to catch up with each other; the participants had known
each other since the focus group was established. The rest of the focus group
meeting was structured according to the focus group work method [24]. Each
participant was provided with the process of measuring innovation derived
from their interviews. Then, each participant presented and described the
process model to the rest of the group, so that everyone got an overview of
the possible perspectives on measuring the value of innovation projects.
Anyone was allowed to ask the presenter questions after each presentation. In
addition, after each presentation there was a 5-minute brainstorming session,
so that additional insights could be added to the model, e.g. metrics and
activities. Once all the business process models had been presented, a poll
was introduced. The most comprehensive process model was selected as a
core to which additional activities from the other process models were added.
The following activity required participants to work together to build the
business process model of measuring innovation value, based on the most
voted process model and the other ones presented. The most voted business
process model was displayed and participants could suggest what else should
be added. If the majority of participants did not raise any objections, the
suggestion was added. The mind map software was used to move activities of
the process towards the final consensus. The focus group meeting ended
roughly after 5 hours, including a 30-minute break. For step 4, a short
40-minute conclusion meeting was organized, at which the business process
model for measuring innovation value was presented.
6. EVALUATION OF THE COLLABORATION
The collaboration-with-practitioners activities were evaluated from three
different perspectives: perceived net goal attainment, satisfaction with the
meeting outcome, and satisfaction with the meeting process. These three
perspectives constitute the Satisfaction Attainment Theory, which was
applied to the participants who conducted these activities; they were asked to
elaborate on the business process model artefacts modelled. The participants
in these activities were stakeholders of a public organisation that provided IT
services for various departments. The 9 practitioners were between 23 and 40
years of age (M = 33, SD = 2.5). The gender split was 5 males and 2 females.
Their work experience in the organisation was between 3 and 9 years (M = 5,
SD = 1.3). Their roles were mainly business analysts from the fields of
information systems and computer science. Participants took part in these
activities willingly, and it was therefore assumed that their responses to the
questionnaire were genuine.
Table 2 summarizes the results of the evaluation of meeting satisfaction. We
used 11-point Likert questions (11 = best) relating to each of the elements of
the Satisfaction Attainment Theory.
Table 2 Evaluation of the collaboration with practitioners activities

Dimension                                    Mean   n
Perceived Net Goal Attainment (PGA)           8.7   9
Satisfaction with the Meeting Process (SP)    9.5   9
Satisfaction with the Meeting Outcome (SO)   10.1   9
The values of the means indicate a high satisfaction of the participants with
each of the three dimensions of the Satisfaction Attainment Theory. Each
element was measured by five questions in the questionnaire. All fifteen
questions can be found in Appendix A of [5].
Feedback received and observations made during this case study enabled a
further refinement of the reference model. Participants suggested that the
transcripts of the interviews should be in a narrative form and divided into
two documents. The first document summarizes the individual interviews and
is sent to the relevant interviewees for approval. The second one sums up the
approved content and is distributed among the other participants who will
attend the focus group collaboration meetings. In terms of agenda planning, it
was observed that the time from an interview taking place to its approval was
around 4 elapsed weeks; this has to be taken into account when drawing up
schedules. It was challenging to keep the focus group meetings within the
time constraints. Participants occasionally chose discussion topics that were
not strictly related to the scope of the meeting. These situations were handled
diplomatically, and the researcher's role was to keep the allotted time in mind
at all times. Finally, almost all participants had some slides prepared prior to
the interviews; the extra time for such unexpected circumstances has to be
included in the agenda of the reference model.
The business process artefacts built with the collaboration-with-practitioners
activities of the reference model scored highly, as did the process of
executing those activities. This confirms the usefulness of the model for its
main purpose, which was to provide researchers with a structured way to help
conduct the research and communicate its outcome to the stakeholders. We
claim that the collaboration activities of the reference model constitute a
consistent method for the meta-design phase in the design science research
methodology, guiding design science researchers in managing information
systems projects.
7. CONCLUSIONS
We observed challenges in structuring and standardizing the phases of the
design science research methodology that would guide design science
researchers in their choices of techniques appropriate at each stage of a
project, and also help them plan, manage, control and evaluate information
systems projects. We showed how to construct a business process model in
collaboration with practitioners from the field. The activities outlined are part
of a reference model that helps structure and model knowledge in design
science research. Our future work involves revising the model based on
users' feedback and concentrating on evaluation techniques for its outcome.
Hopefully, this will increase the efficiency and quality of artefacts, while
containing or further decreasing the cognitive effort involved.

8. ACKNOWLEDGMENTS
This research would not have been possible without the help of the Irish
Research Council (IRC) under the Enterprise Partnership Scheme, which
provided us with the resources to complete our research.

REFERENCES
1. Kuechler, B., Vaishnavi, V.: On Theory Development in Design Science Research:
Anatomy of a Research Project. European Journal of Information Systems 17(5), 489-
504 (2008)
2. Carlsson, S. A., Henningsson, S., Hrastinski, S., Keller, C.: Socio-technical IS design
science research: developing design theory for IS integration management. Information
Systems and E-Business Management 9(1), 109-131 (2011)
3. Alturki, A., Gable, G. G., Bandara, W.: A Design Science Research Roadmap. In Jain,
H., Sinha, A. P., Vitharana, P., eds. : DESRIST 2011, Heidelberg, vol. LNCS 6629,
pp.107-123 (2011)
4. Ostrowski, L., Helfert, M.: Reference Model in Design Science Research to Gather and
Model Information. In : 18th Americas Conference on Information Systems, Seattle
(2012)
5. Briggs, R. O., Reinig, B. A., de Vreede, G.-J.: Meeting satisfaction for tech-supported
groups: an empirical validation of a goal-attainment model. Small Group Research 36,
585-611 (2006)
6. Hevner, A. R., March, S. T., Park, J., Ram, S.: Design Science in Information Systems
Research. MIS Quarterly 28, 75-106 (2004)
7. Van Aken, J. E.: Management Research as a Design Science: Articulating the Research
Products of Mode 2 Knowledge Production in Management. British Journal of
Management 16(1), 19-36 (2005)
8. Hevner, A. R., March, S. T., Park, J., Ram, S.: Design Science in Information Systems
Research. MIS Quarterly 28, 75-106 (2004)
9. Iivari, J., Venable, J.: Action research and design science research – seemingly similar
but decisively dissimilar. In : 17th European Conference on Information Systems
(2009)
10. Baskerville, R., Pries-Heje, J., Venable, J.: Soft Design Science Methodology. In :
DESRIST 2009, Malvern (2009)
11. Peffers, K., Tuunanen, T., Rothenberger, M.: A Design Science Research Methodology.
Journal of Management Information Systems 24(3), 45-77 (2007)
12. Walls, J., Widmeyer, G., El Sawy, O.: Building an Information System Design Theory
for Vigilant EIS. Information Systems Research 3(1), 36-59 (1992)
13. Ostrowski, L., Helfert, M., Xie, S.: A Conceptual Framework to Construct an Artefact
for Meta-Abstract Design. In Sprague, R., ed. : 45th Hawaii International Conference
on Systems Sciences, Maui, pp.4074-4081 (2012)
14. Ostrowski, L., Helfert, M., Hossain, F.: A Conceptual Framework for Design Science
Research. In Gabris, J., Kirikova, M., eds. : Business Informatics Research, LNBIP90,
Riga, pp.345-354 (2011)
15. Walls, J., Widmeyer, G., El Sawy, O.: Building an Information System Design Theory
for Vigilant EIS. Information Systems Research 3(1), 36-59 (1992)
16. Goldkuhl, G.: Design Theories in Information Systems – A Need for Multi-Grounding.
Journal of Information Technology and Application 6(2), 59-72 (2004)
17. Goldkuhl, G., Lind, M.: A Multi-Grounded Design Research Process. In Winter, R.,
Shao, L., Aier, S., eds. : Global perspectives on design science research DESRIST
2010, Berlin, vol. 6105, pp.45-60 (2010)
18. Pries-Heje, J., Baskerville, R., Venable, J.: Strategies for Design Science Research
Evaluation. In : 16th European Conference on Information Systems, pp.255-266 (2008)
19. Gregor, S., Jones, D.: The Anatomy of a Design Theory. Journal of Assoc. Information
Systems 8, 312-335 (2007)
20. Azevedo, J.: Mapping Reality: An Evolutionary Realist Methodology for the Natural
and Social Sciences. Albany (1997)
21. Mizoguchi, R.: Tutorial on Ontological Engineering. New Generation Computing
21(4), 363-384 (2003)
22. Kolfschoten, G. L., de Vreede, G.-J.: A Design Approach for Collaboration Process: A
Multimethod Design Science Study in Collaboration Engineering. Journal of
Management Information Systems 26, 225-256 (2009)
23. Dennis, A. R., George, J. F., Jessup, L. M., Nunamaker Jr., J. F., Vogel, D. R.:
Information Technology to Support Electronic Meetings. MIS Quarterly 12(4), 591-624
(1988)
24. Yin, R.: Case Study Research: Design and Methods. Sage Publications, Thousand
Oaks (2009)
Personalizing Education News Articles Using Interest Term and Category
Based Recommender Approaches
S. Akhilan
Research Scholar, Department of Computer Science and Engineering
National Institute of Technology, Tiruchirappalli 620 015.
Tamil Nadu, INDIA.

S. R. Balasundaram
Associate Professor, Department of Computer Applications
National Institute of Technology, Tiruchirappalli 620 015.
Tamil Nadu, INDIA.

Abstract

With the growth of internet technologies, numerous ways and mechanisms
are being developed so that any information about any entity can be delivered
to any person, irrespective of time and place. Knowing about the happenings
around the world is a primary interest of almost every individual. This is
achieved with the help of various news service providers such as Yahoo,
Google, Your News, etc. While delivering news to their users, most news
service providers do not take into account the user's choices or interests.
Providing all news to everyone is not appropriate when there are different
classes of viewers. This problem can be solved by adopting personalization,
the key factor that aims at providing the appropriate data to the right person.
Personalization in the context of education has resulted in a lot of benefits for
learners. When recommending news, including education-related news, most
traditional approaches are based on TF-IDF, a term-based weighting method
widely used in information retrieval and text mining. However, many new
technologies have become available since the introduction of TF-IDF. This
paper proposes new methods for recommending educational news items
based on Category Term Based (CTF-IDF) and Weighted Category Term
Based (WCTF-IDF) approaches. CTF-IDF is built and tested in Athena, a
recommender extension to the Hermes Genesis News Platform (HGNP).
Experiments show that, compared to the term-based approach, the category
term and weighted category term based approaches perform better. Also, the
Athena-based recommender provides better results.
Keywords: Term based classification, user profile, educational news items,
personalization, category based classification.

1. INTRODUCTION

News is an entity through which individuals come to know what has
happened as well as what is happening around them. With the enhanced
technologies of the World Wide Web, the methods adopted to read news
content have changed dramatically, from the traditional model of reading a
physical newspaper to accessing millions of web sources via the internet.
News service providers such as Google News, Yahoo News, etc. collect news
from various sources and provide an aggregated view of news to users
around the world. The sheer number of available news documents makes
users feel overloaded with news content. Given this, it is a challenging task to
find out a user's interests when reading news articles. In response to this
challenge, information filtering is a technology that helps users retrieve what
they need. Based on a profile of user interests and preferences, systems
recommend items that may be of interest to the user [1]. In particular,
educational news corresponds to various items such as education articles,
universities/institutions, courses, reading materials, events, etc.
In the present web scenario, recommendation systems play a vital role in
delivering the required news to the required users [2]. Content-based
recommendation is one of the most often used recommendation methods.
Several content-based recommenders for news personalization deploy
TF-IDF and the cosine similarity measure. Often a single keyword (term) can
retrieve a large number of documents, and combined with the vector space
model this approach may recommend too many news items to a user. In order
to obtain news documents pertaining to the related terms of a keyword, an
enhancement to TF-IDF is suggested in this paper. We refer to CTF-IDF and
WCTF-IDF as classification methods that combine category concepts with
traditional term-based classification.
When employing user profiles that describe users' interests based on
previously browsed items, these can be translated into vectors of TF-IDF
weights [3]. The related terms for a certain term can be grouped based on
concepts or categories. User profiles are used to extract the required terms
from the data source based on interest terms. One of the strategies to obtain
user terms is through the browsing pattern [4]. In this approach the user may
search or click only specific terms related to his/her areas of interest. For
example, a user interested in knowing about conferences may browse the
keyword conference. The web portal may
provide results based on conference or a few related terms, thereby improving
the results for the user. The proposed method is tested under Athena, an
extension of the Hermes framework.
The structure of the paper is as follows. Related work is discussed in
Section 2, followed by the methodology in Sections 3 and 4. In Section 5, the
proposed classification methods are discussed. The Athena framework, an
implementation of the Hermes News Portal, is discussed in Section 6. In
Section 7 the results of the proposed methods are discussed, followed by the
conclusion in Section 8.

2. LITERATURE REVIEW

2.1 News Personalization

Information filtering plays a critical role in recommender systems and
thereby in news personalization. It prevents recommending information that
has not been rated and accommodates the individual differences between
users [5]. Apart from the news domain, information filtering is applied to
various fields such as email, e-commerce, etc. [6]. In the news domain, this
technology particularly aims at aggregating news articles according to user
interests and creating a personalized newspaper for each user. Recently,
personalized news recommendation has become a desirable feature for
websites to improve user satisfaction by tailoring content presentation to suit
individual users' needs [7].

2.2 Classification Methods

Personalization involves a process of gathering and storing user attributes,
managing content assets, and, based on an analysis of current and past users'
behaviour, delivering the individually best content to the user currently being
served. Personalization can be defined as the use of technology and user
information to tailor web news documents to the requirements of an
individual who wants to access news articles from different news website
providers. Each news provider delivers numerous dynamic news updates
collected from various sources. When getting news from service providers or
news portals, users are delivered news content that is unrelated to them. To
overcome this, Buckley et al. (2009) proposed different aspects of
personalization in their system so that users are delivered the required news
content. Cleverdon et al. (2009) proposed creating personalized sites where a
user who adds his/her own interests can view the most recent and popular
news. In personalized news classification, users can define their personalized
categories using a few keywords [8, 9]. As per Mills et al. (2009), the
personalization attempts made by the major search engines and portals
consider only the issue of viewing already categorized content according to
the user's interests.
Classification is a methodology to classify documents of varied domains
based on the particular interests of a user. There are numerous classification
methods, as shown in Figure 1.

[Figure 1: Various Classification Methods. Classification branches into Term
Based and Content Based methods. Term Based methods cover weighting
methods: a weighting scheme (IDF weighting and TF weighting) and
probabilistic weighting. Content Based methods cover traditional
content-based classification and semantics-based classification.]

2.3 Term Based Classification

With the advent of web technology it is possible to retrieve any information
irrespective of time and place. The content of the Web is vast and dynamic in
nature, which makes it uncomfortable for users to work with web documents.
For a user query, search engines deliver many documents, and not all of them
are useful to every user. For this reason, term-based classification methods
are adopted to reduce the amount of content delivered to a user, so that the
user receives only the required documents. In this regard, TF-IDF is the most
prominent method used for classifying the terms within documents.

TF-IDF, term frequency-inverse document frequency, is a numerical statistic
which reflects how important a word is to a document in a collection or
corpus. It is often used as a weighting factor in information retrieval and text
mining [6, 19, 20]. The TF-IDF value increases proportionally
to the number of times a word appears in the document, but it is offset by the
frequency of the word in the corpus, which helps to control for the fact that
some words are generally more common than others. TF-IDF is the product
of two statistics: term frequency and inverse document frequency. Various
ways of determining the exact values of both statistics exist. In the case of the
term frequency tf(t,d), the simplest choice is to use the raw frequency of a
term in a document, i.e. the number of times that term t occurs in document
d. If we denote the raw frequency of t by f(t,d), then the simple tf scheme is
tf(t,d) = f(t,d). Stop words are filtered from the documents before calculating
the TF-IDF values. The remaining words are stemmed by a stemmer. Finally
the term frequency is calculated, which indicates the importance of a term
within a document. The term based classification approach is shown in
Figure 2.

[Figure 2: Term based classification. A pipeline from keywords through stop
word removal and stemming to term frequency calculation.]
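The pipeline of Figure 2 and the raw-frequency tf scheme can be sketched as follows. The tiny stop-word list and suffix-stripping stemmer are simplified stand-ins for real components (such as a full stop-word list and the Porter stemmer), not the ones used by any particular system:

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "in", "to"}  # toy stop-word list

def simple_stem(word):
    # Crude suffix stripping; a stand-in for a real stemmer such as Porter's.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # Figure 2 pipeline: keywords -> stop word removal -> stemming.
    tokens = [t.lower() for t in text.split()]
    return [simple_stem(t) for t in tokens if t not in STOP_WORDS]

def tf(term, doc_terms):
    # Raw-frequency scheme: tf(t, d) = f(t, d).
    return Counter(doc_terms)[term]

def idf(term, corpus_terms):
    # log(|D| / number of documents containing the term).
    df = sum(1 for doc in corpus_terms if term in doc)
    return math.log(len(corpus_terms) / df) if df else 0.0

def tf_idf(term, doc_terms, corpus_terms):
    return tf(term, doc_terms) * idf(term, corpus_terms)
```

For instance, a term occurring twice in one document out of a three-document corpus, and appearing in two of the documents, gets the weight 2 x log(3/2).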

As per Schetwz et al. (2008) there exist different types of classification
approaches, and each method is distinct from the others. Term
frequency-inverse document frequency (TF-IDF) is one of the common
approaches used for document classification. Allen et al. (2009) used the
concept of decision trees for classifying web page content. When a decision
tree is used for classification, the results are interpreted as logical relations
for the viewer's understanding. The major drawback of this method is that
there is
no possibility of enhancing the results of the classification. However, when
documents are classified into a specific category, this issue can be addressed.
Yan Lee et al. (2010) proposed that a user can manually subscribe to a subset
of a large number of pre-defined text (news) categories. The set of
pre-defined categories is usually static and corresponds to the categories
assigned to the news-providing pages when they are first created. In other
words, the subscription-based personalization approach is rather
straightforward and does not require much classification effort. Most
websites achieve news personalization by adopting the subscription
approach, e.g. Newscan-online.

2.4 Recommender Systems

A good number of news recommender systems are available that act based
on content and semantics. News Dude [9] is a recommender system that
combines TF-IDF and the Nearest Neighbor algorithm. The system considers
the entire text in the recommendation process for both short-term and
long-term interests. In the case of Daily Learner [10, 11, 12], users specify
their items of interest. The vector representation of a news article is
processed with TF-IDF. Using cosine similarity the article is matched against
the user profile, and the Nearest Neighbor algorithm is used to analyze the
most recently rated news for short-term interests [13, 14, 15, 16]. A Naive
Bayes classifier is modeled for long-term interests. The Personalized
Recommender System (PRES) is based on content-based filtering, combining
TF-IDF and cosine similarity. User interests are updated whenever the user
browses a new item [17, 18, 21, 22].
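The cosine-similarity matching that these content-based recommenders share can be sketched as follows. The weight vectors below are illustrative stand-ins for TF-IDF vectors, not output of any of the cited systems:

```python
import math

def cosine_similarity(u, v):
    # sim(u, v) = (u . v) / (||u|| * ||v||); vectors are dicts term -> weight.
    common = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in common)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(profile, articles, top_n=2):
    # Rank article vectors by similarity to the user-profile vector.
    ranked = sorted(articles.items(),
                    key=lambda kv: cosine_similarity(profile, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_n]]
```

An article sharing no terms with the profile scores 0 and is ranked last, while articles whose weight distribution resembles the profile rank highest.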
3. EDUCATIONAL PORTALS
There are numerous education portals available, each intending to provide
education-related news articles. www.openequalfree.org, www.citylimits.org,
www.self.org, www.ngopost.org, www.reapchild.org, etc. are a few of the
most prominent education portals. Based on these web portals we have
considered a corpus of 4876 web documents pertaining to various categories.
The categories we have considered are academic events (workshop,
conference, seminar), job fair (online test, interview, evaluation), reading
material (journal, video, audio) and admissions (institutions, courses,
specialization).

4. USER PROFILE CONSTRUCTION

Constructing the profile plays a key role in identifying the interests of a user
so that documents can be recommended more accurately. There are two
methods of user profile construction, namely explicit and implicit. In the
explicit method, the user is asked to select the interest keywords of his/her
choice from a list of keywords provided. This enables the system to
recommend news based on the interest terms. In the implicit method, the
browsing pattern is used for extracting user interests.

4.1 Explicit Method

A user profile based on the explicit method considers users and the interest
terms they are explicitly asked for. Table 1 illustrates the details of a sample
set of users with their interests.

Table 1: User profile creation based on explicit method

User Number   Interest Terms
U1            I1, I3, I5
U2            I2, I4
U3            I3, I5, I1
U4            I2, I4
U5            I7, I10
U6            I8, I9, I11
U7            I12

In Table 1, I refers to the interest terms, where I1 = workshop;
I2 = conference; I3 = seminar; I4 = online test; I5 = interview;
I6 = evaluation; I7 = journal; I8 = audio; I9 = video; I10 = institutions;
I11 = courses; I12 = specialization. Figure 3 illustrates the recommendation
process based on the various approaches.
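The explicit profiles of Table 1 can be represented directly as sets of interest terms. A minimal sketch, in which the user and document data are illustrative examples rather than entries from the paper's corpus:

```python
# Explicit profiles: each user selects interest-term codes from a fixed list.
INTEREST_TERMS = {
    "I1": "workshop", "I2": "conference", "I3": "seminar",
    "I4": "online test", "I5": "interview", "I7": "journal",
}

profiles = {"U1": ["I1", "I3", "I5"], "U2": ["I2", "I4"]}

def expand_profile(user):
    # Translate a user's interest codes into keyword strings.
    return {INTEREST_TERMS[code] for code in profiles[user]}

def matching_documents(user, documents):
    # Recommend documents tagged with at least one of the user's keywords.
    keywords = expand_profile(user)
    return [doc for doc, tags in documents if keywords & set(tags)]
```

A recommender can then deliver only the documents whose tags intersect the selected keywords, which is the behaviour the explicit method describes.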

[Figure 3: Recommendation by various approaches. The user profile and the
web page documents are fed to the TF-IDF, CTF-IDF, and WCTF-IDF
recommenders, which deliver the personalized web pages.]

4.2 Implicit Method

Based on the browsing pattern, the interests of a user are identified. In the
implicit method of user profile identification, for every item of interest the
corresponding category is taken into consideration.
Table 2: User profile construction by implicit method

User Number   Interest terms
U1            C1i1, C2i3
U2            C1i3, C2i1, C2i2, C3i2
U3            C1i1, C2i1, C3i2
U4            C1i2, C3i6, C4i10
U5            C1i4, C2i12

Table 2 illustrates user profiles based on interests with their categories. The
interest terms are generated based on the users' long-term and short-term
interests.
Table 3: Categories and their interest terms

Category              Interest terms
C1 (Academic events)  i1-workshop; i2-conference; i3-seminar
C2 (Job fair)         i1-online test; i2-interview; i3-evaluation
C3 (Reading material) i1-journal; i2-audio; i3-video
C4 (Admissions)       i1-institutions; i2-courses; i3-specialization

Table 3 illustrates the categories and the possible interest terms in each category. In the implicit method, for User 1, documents belonging to his/her interest terms are delivered first, followed by the documents belonging to the rest of the interest terms. Likewise, documents are delivered for all users.
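The delivery rule just described (documents matching the profile first, the remainder afterwards) can be sketched as follows; the (category, interest) tags mirror Table 2, while the documents themselves are hypothetical examples.

```python
# Sketch of the implicit delivery order: for a user profile built from
# browsing as (category, interest) pairs, documents tagged with the user's
# terms are delivered first, followed by the remaining documents.

profile_u1 = {("C1", "i1"), ("C2", "i3")}   # U1 from Table 2

documents = [
    ("D1", ("C3", "i2")),
    ("D2", ("C1", "i1")),
    ("D3", ("C2", "i3")),
    ("D4", ("C4", "i2")),
]

def delivery_order(profile, docs):
    """Stable partition: matching documents first, the rest afterwards."""
    matched = [d for d, tag in docs if tag in profile]
    rest = [d for d, tag in docs if tag not in profile]
    return matched + rest

print(delivery_order(profile_u1, documents))
# D2 and D3 match U1's interests, so they come first
```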
5. PROPOSED CLASSIFICATION METHODS

5.1 Classification Method based on Category Terms (CTF-IDF)

The CTF-IDF recommender primarily uses a vector for each item, and
calculates weights for each category terms, instead of going through all the
terms. Then, it stores the calculated weights (together with the
corresponding terms) of a news item in a vector.

The user profile is also a vector of CTF-IDF weights, which can be compared with a news item vector by using cosine similarity. The weights of CTF-IDF are computed as shown below. First, we calculate the Category Frequency cf_{i,j}: the number of occurrences n_{i,j} of a category c_i in document d_j, divided by the total number of occurrences of all categories in the document:

cf_{i,j} = n_{i,j} / Σ_k n_{k,j}    (3)

Subsequently, we calculate the Inverse Document Frequency: we take the total number of documents, |D|, divide it by the number of documents in which the category c_i appears, and take the logarithm of this division, i.e.,

idf_i = log( |D| / |{d ∈ D : c_i ∈ d}| )    (4)

Finally, cf is multiplied with idf, forming the weight for category c_i of document d_j. Therefore,

w_{i,j} = cf_{i,j} × idf_i    (5)

This change causes the recommender to deal with category terms, making it more effective. Another advantage of this method is that the CTF-IDF recommender can process the news items much faster than the TF-IDF recommender.
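Equations (3)-(5) and the cosine comparison mentioned above can be sketched as follows; the per-document category counts are hypothetical examples, not data from the paper.

```python
import math

# Minimal sketch of CTF-IDF weighting: category frequency cf is normalised
# per document (eq. 3), idf is computed over the corpus (eq. 4), their
# product gives the weight (eq. 5), and profile/item vectors are compared
# with cosine similarity.

# docs[d][c] = occurrences of category c in document d (made-up counts)
docs = {
    "d1": {"C1": 3, "C2": 1},
    "d2": {"C2": 2, "C3": 2},
    "d3": {"C1": 1, "C3": 1},
}

def ctf_idf(doc, corpus):
    counts = corpus[doc]
    total = sum(counts.values())
    n_docs = len(corpus)
    weights = {}
    for c, n in counts.items():
        cf = n / total                                   # equation (3)
        df = sum(1 for d in corpus.values() if c in d)
        idf = math.log(n_docs / df)                      # equation (4)
        weights[c] = cf * idf                            # equation (5)
    return weights

def cosine(u, v):
    dot = sum(u.get(c, 0.0) * v.get(c, 0.0) for c in set(u) | set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

item = ctf_idf("d1", docs)
user_profile = ctf_idf("d3", docs)
print(round(cosine(item, user_profile), 3))  # → 0.671
```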

5.2 Classification Method based on Weighted Category Terms (WCTF-IDF)

By considering the time spent on and the frequency of viewing the terms, a weighted-category-term-based user profile is constructed. This method gives a high rank to an interest term that appears frequently in a document. In the proposed WCTF-IDF we have considered the distribution of feature terms over the various news documents in the education news category, as well as the interest terms that occur frequently in the documents. For example, the interest list (C1I1, C2I1, C1I3) of User 2 means that C1I1 has a high weightage based on the time spent on interest term 1 (I1) of category 1 (C1). The other entries have equal or lower weights.

The WCTF-IDF formulas are given as

w_{i,j} = t_i × cf_{i,j} × idf_i    (1)

idf_i = log( |D| / |{d ∈ D : c_i ∈ d}| )    (2)

where t_i is a behaviour weight derived from the time spent on and the frequency of viewing interest term i. Using equation (2), the classification of a word or term is carried out in the proposed WCTF-IDF approach. The logarithmic term scores a word by the number of documents in which it occurs: the fewer documents the term appears in, the higher the resulting weight for the documents considered for classification.
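The weighting idea can be sketched as follows; since the section does not fix an exact form for the behaviour weight, the `attention_weight` function below (log of time spent, multiplied by view count) is purely an illustrative assumption.

```python
import math

# Hedged sketch of the WCTF-IDF idea: the CTF-IDF weight of a category term
# is scaled by a user-specific factor derived from time spent and viewing
# frequency. The exact scaling rule here is an assumption for illustration.

def attention_weight(seconds_spent, views):
    """Toy behaviour signal: more time and more views -> larger weight."""
    return math.log1p(seconds_spent) * views

def wctf_idf(ctf_idf_weight, seconds_spent, views):
    return ctf_idf_weight * attention_weight(seconds_spent, views)

# A term the user lingered on outranks one that was skimmed,
# even when their CTF-IDF weights are equal.
engaged = wctf_idf(0.30, seconds_spent=120, views=3)
skimmed = wctf_idf(0.30, seconds_spent=5, views=1)
print(engaged > skimmed)  # True
```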


6. ATHENA

Athena is an extension of the Hermes framework that generates recommendations. In order to make effective recommendations, the Athena framework monitors the behavior patterns of users. Athena uses several recommender systems, especially traditional term-based recommender systems, in order to compare news items with the created profile. News items are recommended to users when there is a high similarity measure.

6.1 Hermes Genesis News Portal Platform (HGNPP)


The Hermes framework is the basis of Athena and is used to deliver personalized news documents. To retrieve news items from the data source, a semantic-based approach is followed for making recommendations. A category term is assigned to each news document, so that each news item is considered as input. These input news items are processed internally in the Hermes Genesis News Portal Platform (HGNPP), as shown in Figure 4, to provide a personalized news service based on the concepts selected by the user.

Figure 4: News Articles in HGNPP

6.2 Implementation in Athena

Athena, the extension of the Hermes framework, is used as a plug-in for the Hermes News Portal. The Athena user interface has three tabs: one to browse the news items, one to recommend news documents, and one to evaluate the news items recommended to the users. The browser in Athena is used by the user to browse the news items. After reading a news item, the user can specify whether he or she is interested in it. For recommendations, the user can click the recommendation tab in Athena; a user is allowed to select only one recommendation method at a time. After the user activity is complete, Athena analyses the browsing pattern of the user. In Athena, recommendation is also performed based on concept terms. All categories are listed in a category list created for each user; based on this list, news items are recommended in comparison with the constructed user profile.

7. RESULTS AND DISCUSSIONS

A performance comparison is done to identify whether recommendation based on the traditional term-based approach or recommendation based on our approaches yields better results. Table 4 shows the test results of TF-IDF, CTF-IDF and WCTF-IDF. The averages indicate that CTF-IDF performs better than the other recommenders on various performance measures. Table 5 shows the test results for TF-IDF and Athena. The difference between CTF-IDF and WCTF-IDF regarding recall and precision is exceptionally large. CTF-IDF has a good recall value, which means that it classifies news items better than the other recommenders. Based on these results, we conclude that the recommender system based on category terms achieves better precision, recall and accuracy. Performance comparisons of the recommenders are shown in Figures 5 and 6.

Table 4. Test Results for TF-IDF, CTF-IDF and WCTF-IDF recommenders

Performance   Traditional Method   Category Terms (CTF-IDF)   Weighted Category Term
Measure       (TF-IDF)             Method                     (WCTF-IDF) Method
Precision     0.45%                0.92%                      0.79%
Recall        0.19%                0.67%                      0.24%
Accuracy      0.99%                1.87%                      1.09%

Table 5. Test Results for TF-IDF and Athena

Performance   Traditional Method   Athena
Measure       (TF-IDF)             Recommendation
Precision     0.79%                0.67%
Recall        0.67%                0.56%
Accuracy      1.09%                0.87%

[Figure 5 is a bar chart titled "Performance Measure of TF-IDF, CTF-IDF and WCTF-IDF", plotting the precision, recall and accuracy of the traditional (TF-IDF), category-term (CTF-IDF) and weighted-category-term (WCTF-IDF) methods on education news articles.]

Figure 5. Performance comparisons of TF-IDF, CTF-IDF and WCTF-IDF recommenders


[Figure 6 is a bar chart titled "Performance of TF-IDF and Athena Recommendation", plotting the precision, recall and accuracy of the traditional method (TF-IDF) and the Athena recommendation on education news articles.]

Figure 6. Performance comparisons of TF-IDF and Athena Recommendation

8. CONCLUSIONS

This paper focuses on improving the TF-IDF recommendation approach by using interest terms, categories and weighted categories. By employing the new classification methods, better recommendation in terms of recall, precision and accuracy is achieved with the CTF-IDF approach. Experimental results show that the CTF-IDF recommender outperforms the WCTF-IDF approach and the other recommenders on several measures. The CTF-IDF recommender scores significantly higher than the WCTF-IDF recommender on accuracy, recall and precision. Based on the results, we conclude that there are benefits in using semantic techniques for a recommendation system.
