An Efficient Classification Mechanism for Network Intrusion Detection System Based on Data Mining Techniques: A Survey
Subaira A. S. and Anitha P.
Intrusion Detection Techniques for Mobile Ad Hoc and Wireless Sensor Networks
Rakesh Sharma, V. A. Athavale and Pinki Sharma
Performance Evaluation of Sentiment Mining Classifiers on Balanced and Imbalanced Dataset
G. Vinodhini and R. M. Chandrasekaran
Demosaicing and Super-resolution for Color Filter Array via Residual Image Reconstruction and Sparse Representation
Jie Yin, Guangling Sun and Xiaofei Zhou
Determining Weight of Known Evaluation Criteria in the Field of Mehr Housing using ANP Approach
Saeed Safari, Mohammad Shojaee, Mohammad Tavakolian and Majid Assarian
Application of the Collaboration Facets of the Reference Model in Design Science Paradigm
Lukasz Ostrowski and Markus Helfert
Personalizing Education News Articles Using Interest Term and Category Based Recommender Approaches
S. Akhilan and S. R. Balasundaram
International Journal of Computer Science and Business Informatics
IJCSBI.ORG
An Efficient Classification Mechanism for Network Intrusion Detection System Based on Data Mining Techniques: A Survey
Subaira A. S.
PG Scholar
Dr. N. G. P. Institute of Technology, Coimbatore, India
Anitha P.
Assistant Professor
Dr. N. G. P. Institute of Technology, Coimbatore, India
ABSTRACT
Despite the rapid growth of information systems, security remains a hard-hitting area for computers as well as networks. In information protection, an Intrusion Detection System (IDS) is used to safeguard data confidentiality, integrity and system availability from various types of attacks. Data mining is an efficient technique that can be applied to intrusion detection to discover new patterns in massive network data, and it reduces the strain of manually compiling normal and abnormal behaviour patterns. This work reviews the present state of data mining techniques and compares the various techniques used to implement an intrusion detection system, such as Support Vector Machine, Genetic Algorithm, Neural Network, Fuzzy Logic, Bayesian Classifier, K-Nearest Neighbour and Decision Tree algorithms, by highlighting the advantages and disadvantages of each.
Keywords
Classification, Clustering, Intrusion Detection System, Data mining, Anomaly detection,
Misuse Detection
1. INTRODUCTION
In the era of the information society, network-based computer systems play fundamental roles, and they have become targets for intrusions by attackers and criminals. Intrusion prevention techniques such as firewalls, user authentication, information protection and data encryption have failed to completely shield networks and systems from increasingly sophisticated attacks and malware. Intrusion Detection Systems (IDS) are designed to protect computers and networks from such cyber-attacks and viruses. An IDS is a mechanism that monitors network or
system actions for malicious activities and produces reports to a
management station [1].
Intrusion detection is a significant application area of data mining: data mining algorithms aim to solve the problem of analyzing enormous volumes of data [8]. Using data mining techniques, IDSs build efficient clustering and classification models to distinguish normal behaviour from abnormal behaviour. This study lays a foundation for research and exploration in this field and surveys intrusion detection models based on data mining technology.
2.5 Hybrid Intrusion Detection
A recent development in intrusion detection is to combine host-based and network-based IDS into hybrid systems. A hybrid intrusion detection system is flexible and increases the security level. It combines IDS sensor locations and reports attacks aimed at particular segments or at the entire network [28].
3. TYPES OF ATTACKS
3.1 DoS Attack
A denial-of-service (DoS) or distributed denial-of-service (DDoS) attack is an effort to make a computer resource unavailable to its intended users [32]. This type of attack slows down or shuts down the system, disrupting the service and denying it to legitimate authorized users. Such an attack also produces high network traffic [15].
3.2 User to Root Attack (U2R)
In this type of attack, the attacker starts with user-level access, for example by sniffing a password or mounting a dictionary attack, and eventually gains root access to the system.
3.3 Probing
In this type of attack, an attacker examines a network to gather information or discover well-known vulnerabilities. An attacker who has a record of which machines and services are accessible on a network can use this information to look for weak points.
3.4 Remote to User Attack (R2U)
In this type of attack, an attacker who can send packets to a machine over a network, but does not have an account on that machine, exploits some vulnerability to gain local access as a user of that machine.
3.5 Eavesdropping attack
Eavesdropping is a network layer attack that consists of capturing packets transmitted by other computers on the network and reading sensitive information such as passwords, session tokens, or other confidential data.
3.6 Man-In-The-Middle Attack
In this attack, the attacker makes independent connections with the victims and relays messages between them, making them believe they are talking directly to each other over a private connection, while in fact the entire conversation is controlled by the attacker.
4. DRAWBACKS OF IDS
This Intrusion Detection Systems (IDS) have become an important
component in security infrastructures as they permit networks
administrators to identify policy variations. These policy violations range
IJCSBI.ORG
from outside attackers trying to gain unconstitutional access to intruders
abusing their access. Current IDS have a number of considerable drawbacks.
4.1 False Positives
A major problem is the number of false positives an IDS will produce. Developing distinctive signatures is a complicated task: it is much harder to pick out a legitimate intrusion attempt if a signature also alerts regularly on valid network activity.
4.2 False Negatives
A false negative occurs when the IDS fails to generate an alert while an intrusion is actually taking place. Simply put, if a signature has not been written for a particular exploit, there is a very good chance that the IDS will not detect it.
Table 1: Performance Measure

                    Predicted Intrusion      Predicted Normal
Actual Intrusion    True Positive (TP)       False Negative (FN)
Actual Normal       False Positive (FP)      True Negative (TN)
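These error rates can be read off a confusion matrix of intrusion versus normal predictions. A minimal sketch follows; the counts are invented for illustration, not figures from any cited study:

```python
def ids_metrics(tp, fn, fp, tn):
    """Compute standard IDS performance measures from confusion-matrix counts."""
    detection_rate = tp / (tp + fn)        # fraction of intrusions caught
    false_positive_rate = fp / (fp + tn)   # fraction of normal traffic flagged
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return detection_rate, false_positive_rate, accuracy

# Illustrative counts: 90 intrusions caught, 10 missed, 5 false alarms, 95 normal passed
dr, fpr, acc = ids_metrics(tp=90, fn=10, fp=5, tn=95)
print(dr, fpr, acc)  # 0.9 0.05 0.925
```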
Giordana and Neri have proposed an intrusion detection algorithm called REGAL. The REGAL system is based on a distributed genetic algorithm. REGAL is a concept learning system that learns First Order Logic multi-modal concept descriptions. The learning examples are stored in a relational database and represented as relational tuples.
Gonzalez and Dasgupta [26] applied a genetic algorithm, though they examined host-based IDSs, not network-based ones. They used the algorithm only for the meta-learning step, instead of running it directly on the feature set, and applied statistical classifiers to labelled vectors. A 2-bit binary encoding is used to express the abnormality of a particular feature, ranging from normal to abnormal. Chittur [27] used a genetic algorithm with a decision tree, where the decision tree represents the data. The approach achieved a high detection rate while reducing the false positive rate; false positive occurrences were further minimized by utilizing human input in a feedback loop [10].
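The 2-bit encoding and a detection-oriented fitness function can be sketched as follows. Everything below (the alert rule, the feature levels, the toy records) is an invented illustration of the general setup a genetic algorithm would optimize, not a reconstruction of any cited system:

```python
# Each feature's tolerated abnormality is encoded with 2 bits: 0 = normal ... 3 = abnormal.
def decode(chromosome, n_features):
    """Split a bit string into 2-bit abnormality levels, one per feature."""
    return [int(chromosome[2 * i:2 * i + 2], 2) for i in range(n_features)]

def fitness(chromosome, records):
    """Reward detections and penalize false alarms: detection rate minus FP rate."""
    levels = decode(chromosome, len(records[0][0]))
    def alerts(features):  # alert when any feature exceeds its tolerated level
        return any(f > l for f, l in zip(features, levels))
    attacks = [f for f, is_attack in records if is_attack]
    normals = [f for f, is_attack in records if not is_attack]
    detection_rate = sum(alerts(f) for f in attacks) / len(attacks)
    false_positive_rate = sum(alerts(f) for f in normals) / len(normals)
    return detection_rate - false_positive_rate

# Toy records: (per-feature abnormality observations in 0-3, is_attack)
records = [([3, 2], True), ([2, 3], True), ([0, 1], False), ([1, 0], False)]
print(fitness("0101", records))  # 1.0 -> levels [1, 1] separate the toy data perfectly
```

A genetic algorithm would evolve a population of such bit strings, selecting and mutating the chromosomes with the highest fitness.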
5.3 K-nearest Neighbour
K-Nearest Neighbour (k-NN) is a type of lazy learning: it simply stores the training tuples and waits until it is given a test tuple. It is an instance-based learner that classifies objects based on the closest training examples in the feature space. For a given unknown tuple, k-NN searches the pattern space for the k training tuples that are closest to the unknown tuple. It is the simplest of the machine learning algorithms considered here. The object is classified by a majority vote of its neighbours; when k = 1, the object is simply assigned to the class of its single nearest neighbour. Rather than fitting an explicit target function, the algorithm retains all labelled training instances and answers queries with a similarity-based search. Combined with statistical schemes, it can be used to detect intrusions. The technique is computationally expensive and requires efficient storage, although parallel hardware implementations can reduce the cost.
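The majority-vote search can be sketched in a few lines; the two-feature connection records below are invented for illustration:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training tuples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy connection records: (duration, bytes) scaled to [0, 1], with class labels
train = [((0.10, 0.20), "normal"), ((0.20, 0.10), "normal"),
         ((0.90, 0.80), "intrusion"), ((0.80, 0.90), "intrusion"),
         ((0.15, 0.15), "normal")]
print(knn_classify(train, (0.85, 0.85), k=3))  # intrusion
```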
5.4 Neural Networks
The term neural network traditionally referred to a network of biological neurons. In [20], neural networks have been used for both anomaly and misuse intrusion detection. For anomaly detection, the networks were trained to recognize statistically significant variations from users' established behaviour and to identify the typical characteristics of system users. For misuse detection, the neural network collects data from the network stream and analyses it for instances of misuse [22]. Misuse detection with neural networks can be implemented in two ways. The first approach incorporates the neural network component into an existing expert system, using the network to filter incoming data for suspicious events and forward them to the expert system; this improves the efficiency of the detection system. The second approach uses a standalone misuse detection system that receives data from the network stream and analyses it for misuse intrusion. Such a system can learn the characteristics of misuse attacks and identify instances unlike any that have been observed before, and it recognizes known suspicious events with a high degree of accuracy. In general, neural networks are used to learn complex nonlinear input-output relationships [12].
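As an illustration of the learning phase, here is a minimal single-neuron sketch trained on invented, linearly separable traffic features. Real IDS neural networks are multilayer; this only shows the weight-update idea:

```python
def train_perceptron(data, epochs=20, lr=0.1):
    """Learn weights for a single neuron with a step activation."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - pred                   # 0 when correct; +/-1 otherwise
            w = [w[0] + lr * err * x[0], w[1] + lr * err * x[1]]
            b += lr * err
    return w, b

# Toy separable data: (packet rate, error rate) -> 1 = misuse, 0 = normal
data = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.1, 0.2), 0), ((0.2, 0.1), 0)]
w, b = train_perceptron(data)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
print([predict(x) for x, _ in data])  # [1, 1, 0, 0]
```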
5.5 Bayesian Classifier
A Bayesian classifier provides high accuracy and speed when handling large databases. In a Bayesian network model, the classifier encodes the probabilistic relationships among the variables of interest. In intrusion detection, this classifier is combined with statistical schemes to encode interdependencies between variables and to predict events. Learning operates over a graphical model of causal relationships defined by two components: a directed acyclic graph (DAG) and a set of conditional probability tables. Each node of the DAG represents a random variable, which may be discrete or continuous. The classifier maintains one conditional probability table (CPT) per variable, which requires considerable computational effort.
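The counting behind such a classifier can be sketched in its naive form (class-conditional independence, add-one smoothing); the protocol/flag records below are invented:

```python
from collections import defaultdict

def train_nb(records):
    """Count class priors and per-class feature-value frequencies (one CPT per feature)."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)   # (class, feature index, value) -> count
    for features, label in records:
        class_counts[label] += 1
        for i, v in enumerate(features):
            value_counts[(label, i, v)] += 1
    return class_counts, value_counts

def classify_nb(model, features):
    """Pick the class maximizing P(class) * prod_i P(feature_i | class)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for label, n in class_counts.items():
        p = n / total
        for i, v in enumerate(features):
            p *= (value_counts[(label, i, v)] + 1) / (n + 2)  # add-one smoothing
        if p > best_p:
            best, best_p = label, p
    return best

# Toy records: (protocol, TCP flag) -> class
records = [(("tcp", "SYN"), "attack"), (("tcp", "SYN"), "attack"),
           (("udp", "ACK"), "normal"), (("tcp", "ACK"), "normal")]
model = train_nb(records)
print(classify_nb(model, ("tcp", "SYN")))  # attack
```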
5.6 Decision Tree
A decision tree is a classification technique used in data mining for predictive models. It is a flowchart-like tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label. The model is learned inductively from a pre-classified data set in which each data item is described by its attribute values. The key step is to select the attributes that best divide the data items into their respective classes; the data items are then partitioned on those attributes [5].
This process is applied iteratively to each partitioned subset of the data items and terminates when all the data items in the current subset belong to the same class. Each node has a number of outgoing edges, labelled with the possible values of the attribute tested at that node, and each edge connects two nodes. Leaves are always labelled with a decision value for classifying the data [21]. To classify an unidentified object, the process starts at the root of the decision tree and follows the branches dictated by the object's attribute values. Decision trees can be used for misuse intrusion detection: a model learned from training data can predict which of the various types of attack future data belongs to. They work well with large data sets, can be used in rule-based techniques with minimal processing, and provide high generalization accuracy [9].
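The attribute-selection step can be sketched with an information-gain computation. The records below are invented, and a full tree-building algorithm would apply this choice recursively to each partition:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def best_attribute(records):
    """Pick the attribute index whose split yields the largest information gain."""
    base = entropy([label for _, label in records])
    gains = []
    for i in range(len(records[0][0])):
        by_value = {}
        for features, label in records:
            by_value.setdefault(features[i], []).append(label)
        remainder = sum(len(ls) / len(records) * entropy(ls) for ls in by_value.values())
        gains.append((base - remainder, i))
    return max(gains)[1]

# Toy pre-classified connections: (protocol, service) -> class
records = [(("tcp", "http"), "normal"), (("tcp", "telnet"), "attack"),
           (("udp", "http"), "normal"), (("udp", "telnet"), "attack")]
print(best_attribute(records))  # 1 -> "service" separates the classes perfectly
```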
IJCSBI.ORG
5.7 Fuzzy Logic
Fuzzy logic is derived from fuzzy set theory and uses rule-based systems for classification. It can be thought of as the application side of fuzzy set theory, dealing with well-reasoned real-world expert values for a complex problem [29]. Fuzzy data mining techniques are used to extract behaviour patterns: sets of fuzzy association rules are mined from network audit data to model normal behaviour, and new audit data is compared against the mined normal data to measure their similarity [30][31]. If the similarity value falls below a threshold, an alarm is raised [14].
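A fuzzy rule of this kind can be sketched with a membership function and a min-based AND; the membership shape, rule and thresholds below are all invented for illustration:

```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def anomaly_alarm(conn_rate, similarity, threshold=0.5):
    """Fire when traffic is 'high' AND similarity to the normal profile is 'low'."""
    high_rate = triangular(conn_rate, 50, 100, 150)   # connections per second
    low_similarity = 1.0 - similarity                 # similarity already in [0, 1]
    firing = min(high_rate, low_similarity)           # fuzzy AND = min
    return firing >= threshold

print(anomaly_alarm(conn_rate=100, similarity=0.2))  # True
print(anomaly_alarm(conn_rate=60, similarity=0.9))   # False
```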
6. A COMPARATIVE ANALYSIS OF DATA MINING TECHNIQUES FOR INTRUSION DETECTION SYSTEM
Table 2: General Classifier Comparison

Classifier: k-Nearest Neighbour
Method: The object is assigned to the class most common amongst its k nearest neighbours. If k = 1, the object is simply assigned to the class of its nearest neighbour.
Advantages: ... behaviour. 4. Easy for parallel implementations.
Disadvantages: ... dimensionality. 3. Slow in classifying and testing tuples.

Classifier: Neural Network
Method: A neural network is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.
Advantages: 1. Requires less formal statistical training. 2. Implicitly detects complex nonlinear relationships between dependent and independent variables. 3. Highly tolerant of noisy data. 4. Multiple training algorithms are available.
Disadvantages: 1. The process is a black box. 2. Greater computational burden. 3. Overfitting. 4. Requires long training time.

Classifier: Bayesian Method
Method: A Bayesian classifier is based on rules. It uses the joint probabilities of sample classes and observations, and tries to estimate the conditional probabilities of classes given an observation.
Advantages: 1. The naive Bayesian classifier simplifies the computations. 2. Exhibits high accuracy and speed when applied to large databases.
Disadvantages: 1. The assumption of class-conditional independence. 2. Lack of available probability data.

Classifier: Decision Tree
Method: The decision tree initially builds a tree for classification. Each node represents a binary predicate on one attribute; one branch represents the positive instances of the predicate and the other branch the negative instances.
Advantages: 1. Construction does not require any domain knowledge. 2. Can handle high dimensional data. 3. The representation is easy to understand. 4. Able to process both numerical and categorical data.
Disadvantages: 1. The output attribute must be categorical. 2. Limited to one output attribute. 3. Decision tree algorithms are unstable. 4. Trees created from numeric datasets can be complex.
7. CONCLUSIONS
In this paper, many data mining techniques proposed to improve the classification mechanism of network intrusion detection have been surveyed. Different classifiers bring different knowledge to the problem, so combining more than one data mining algorithm can remove the demerits of each, and a number of trained classifiers combined lead to better performance than any single classifier. Combining SVM with a genetic algorithm gains in accuracy rate and optimization, while combining the k-nearest neighbour approach with a decision tree produces faster classification and handles high dimensional data. Overall, these techniques provide better intrusion detection accuracy and faster running time. They also make it possible to fragment a complex problem into sub-problems whose solutions are simpler to realize, execute, supervise and update.
REFERENCES
[1] W. Lee, S.J. Stolfo, K.W. Mok, A data mining framework for building intrusion detection models, in: Proceedings of the IEEE Symposium on Security and Privacy, 1999, pp. 120-132.
[2] W. Feng, Q. Zhang, G. Hu, J. Xiangji Huang, Mining network data for intrusion detection through combining SVMs with ant colony networks, Future Generation Computer Systems, 2013.
[3] T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: an efficient data clustering method for very large databases, in: Proceedings of SIGMOD, ACM, 1996, pp. 103-114.
[4] L. Khan, M. Awad, B. Thuraisingham, A new intrusion detection system using support vector machines and hierarchical clustering, The VLDB Journal 16 (2007) 507-521.
[5] X. Xu, Adaptive intrusion detection based on machine learning: feature extraction, classifier construction and sequential pattern prediction, Information Assurance and Security 4 (2006) 237-246.
[6] J.X. Huang, J. Miao, B. He, High performance query expansion using adaptive co-training, Information Processing & Management 49 (2) (2013) 441-453.
[7] Y. Liu, X. Yu, J.X. Huang, A. An, Combining integrated sampling with SVM ensembles for learning from imbalanced datasets, Information Processing & Management 47 (4) (2011) 617-631.
[8] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 1999.
[9] F. Marcelloni, Combining supervised and unsupervised learning for data clustering, Neural Computing & Applications 15 (3-4) (2006) 289-297.
[10] C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin, W.-Y. Lin, Intrusion detection by machine learning: a review, Expert Systems with Applications 36 (2009) 11994-12000.
[11] S.X. Wu, W. Banzhaf, The use of computational intelligence in intrusion detection systems: a review, Applied Soft Computing 10 (2010) 1-35.
[12] H. Brahmi, I. Brahmi, S.B. Yahia, OMC-IDS: at the cross-roads of OLAP mining and intrusion detection, in: Advances in Knowledge Discovery and Data Mining, LNCS, vol. 7302, 2012, pp. 13-24.
[13] S.-J. Horng, M.-Y. Su, Y.-H. Chen, T.-W. Kao, R.-J. Chen, J.-L. Lai, C.D. Perkasa, A novel intrusion detection system based on hierarchical clustering and support vector machines, Expert Systems with Applications 38 (2011) 306-313.
[14] Q. Zhang, G. Hu, W. Feng, Design and performance evaluation of a machine learning-based method for intrusion detection, in: Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, Studies in Computational Intelligence, vol. 295, Springer, 2010, pp. 69-83.
[15] T.A. Longstaff, J.T. Ellis, S.V. Hernan, H.F. Lipson, R.D. McMillan, L.H. Pazente, D. Simmel, Security of the Internet, in: F. Froehlich, A. Kent (Eds.), The Froehlich/Kent Encyclopedia of Telecommunications, Vol. 15, Marcel Dekker, 1998, pp. 231-254.
[16] S. Axelsson, Research in intrusion detection systems: a survey, Tech. Rep. TR98-17, Chalmers University of Technology, Goteborg, Sweden, 2000.
[17] S. Freeman, J. Branch, Host-based intrusion detection using user signatures, in: Proceedings of the Research Conference, RPI, 2002.
[18] D. Marchette, A statistical method for profiling network traffic, in: Proceedings of the Workshop on Intrusion Detection and Network Monitoring, 1999, pp. 119-128.
[19] T.F. Lunt, A survey of intrusion detection techniques, Computers and Security 12 (4) (1993) 405-418.
[20] J. Ryan, M.-J. Lin, R. Miikkulainen, Intrusion detection with neural networks, in: Proceedings of the AAAI-97 Workshop on AI Approaches to Fraud Detection and Task Management, 1997, pp. 92-97.
[21] H. Teng, K. Chen, S. Lu, Security audit trail analysis using inductively generated predictive rules, in: Proceedings of the 6th Conference on Artificial Intelligence Applications, Vol. 1, 1990, pp. 24-29.
[22] D.E. Denning, An intrusion-detection model, IEEE Transactions on Software Engineering 13 (2) (1987) 222-232.
[23] F. Monrose, A. Rubin, Authentication via keystroke dynamics, in: Proceedings of the 4th ACM Conference on Computer and Communications Security, 1997.
[24] F. Neri, Comparing local search with respect to genetic evolution to detect intrusion in computer networks, in: Proc. of the 2000 Congress on Evolutionary Computation CEC00, La Jolla, CA, IEEE Press, 16-19 July 2000, pp. 238-243.
[25] F. Neri, Mining TCP/IP traffic for network intrusion detection, in: R.L. de Mantaras and E. Plaza (Eds.), Proc. of Machine Learning: ECML 2000, 11th European Conference on Machine Learning, Volume 1810 of Lecture Notes in Computer Science, Barcelona, Spain, Springer, May 31 - June 2, 2000, pp. 313-322.
[26] D. Dasgupta, F.A. Gonzalez, An intelligent decision support system for intrusion detection and response, in: Proc. of the International Workshop on Mathematical Methods, Models and Architectures for Computer Networks Security (MMM-ACNS), St. Petersburg, Springer-Verlag, 21-23 May 2001.
[27] A. Chittur, Model generation for an intrusion detection system using genetic algorithms, High School Honors Thesis, Ossining High School, in cooperation with Columbia Univ., 2001.
[28] M. Crosbie, E.H. Spafford, Active defense of a computer system using autonomous agents, Technical Report CSD-TR-95-008, Purdue Univ., West Lafayette, IN, 15 February 1995.
[29] G.J. Klir, Fuzzy arithmetic with requisite constraints, Fuzzy Sets and Systems 91 (1997) 165-175.
[30] http://wenke.gtisc.gatech.edu/project/image004.gif
[31] J. Luo, Integrating fuzzy logic with data mining methods for intrusion detection, Master's thesis, Mississippi State Univ., 1999.
[32] C. Douligeris, A. Mitrokotsa, DDoS attacks and defense mechanisms: classification and state-of-the-art, Computer Networks: The International Journal of Computer and Telecommunications Networking, Vol. 44, Issue 5, 2004, pp. 643-666.
Almas M. N. Siddiqui
Research student, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad.
ABSTRACT
In the world of computer science and information technology, security is an essential and important issue, and identification and authentication techniques play an important role in security and integrity. Human physical characteristics such as fingerprints, face, hand geometry, voice and iris are known as biometrics; these features are used to provide authentication for computer-based security systems. Biometric verification refers to the automatic verification of a person based on specific biometric features derived from his/her physiological and/or behavioral characteristics. Biometrics is the science and technology of measuring and analyzing biological data of the human body, extracting a feature set from the acquired data, and comparing this set against the template set in the database. The future of biometrics seems to belong to multimodal biometrics (biometric systems using more than one biometric feature), since a unimodal biometric system (one using a single biometric feature) has to contend with a number of problems. In this paper, a survey of some multimodal biometric systems is conducted.
Keywords
Biometrics, Unimodal Biometrics, Multimodal Biometrics, Verification, Identification,
Recognition.
1. INTRODUCTION
The term biometric comes from the Greek words bios, meaning life, and metrikos, meaning measure. It is well known that humans intuitively use body characteristics such as face, gait or voice to recognize each other. A wide variety of applications require reliable verification schemes to confirm the identity of an individual by recognizing humans on the basis of their characteristics [1]. These characteristics include voice, fingerprints, body contours, retina and iris, face, soft biometrics, etc.
Unimodal biometric systems suffer from noise in sensed data, intra-class variations, inter-class similarities, non-universality and spoof attacks [3]. Multibiometric (multimodal) systems seek to alleviate some of these drawbacks by consolidating the evidence presented by multiple biometric traits or sources [4]. A biometric system can be designed to recognize a person based on information acquired from multiple biometric sources; such a system is known as a multimodal biometric system. A multibiometric system can offer substantial improvement in matching accuracy, depending upon the information being combined and the fusion methodology adopted. It addresses the issue of non-universality, or insufficient population coverage, and effectively addresses the problem of noisy data. These systems also help in continuous monitoring and tracking of individuals in situations where a single trait is not sufficient. Fusion schemes are employed to combine the information presented by the multiple biometric sources; the combination can be performed at various levels, for example the feature level, score level and decision level [5].
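Score-level fusion by weighted sum can be sketched as follows; the matcher score ranges, weights and acceptance threshold below are invented for illustration:

```python
def min_max_normalize(score, lo, hi):
    """Map a raw matcher score into [0, 1] given its observed range."""
    return (score - lo) / (hi - lo)

def fuse_scores(scores, weights, threshold=0.5):
    """Weighted-sum score-level fusion: accept when the combined score clears the threshold."""
    combined = sum(w * s for w, s in zip(weights, scores))
    return ("accept" if combined >= threshold else "reject"), combined

# Hypothetical matchers: fingerprint score in [0, 100], face score already in [0, 1]
finger = min_max_normalize(82, lo=0, hi=100)   # 0.82
face = 0.40
decision, combined = fuse_scores([finger, face], weights=[0.6, 0.4])
print(decision, round(combined, 3))  # accept 0.652
```

Assigning larger weights to the more reliable matcher is what rank- and score-level fusion schemes discussed below tune.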
2.4 NON UNIVERSALITY
Some persons cannot provide the required standalone biometric, owing to
illness or disabilities [7].
2.5 SPOOFING
Unimodal biometrics is vulnerable to spoofing where the data can be
imitated.
3. LITERATURE SURVEY ON MULTIMODAL BIOMETRICS
[11] Proposed a multimodal biometric system using fingerprint and iris features. They use a hybrid approach based on: 1) fingerprint minutiae extraction and 2) iris template encoding through a mathematical representation of the extracted iris region. The approach is based on two recognition modalities, each providing its own decision; the final decision is taken by combining the unimodal decisions through an AND operator. No experimental results on recognition performance have been reported.
[12] Proposed a multimodal biometric system using fingerprint and face. They use the Scale Invariant Feature Transform (SIFT), fingerprint verification based on a minutiae matching technique, and feature-level fusion for recognition. The paper presents a multimodal biometric system based on the integration of face and fingerprint traits at the feature extraction level: both fingerprint and face images are processed with compatible feature extraction algorithms to obtain comparable features from the raw data. In the literature it is claimed that an ensemble of classifiers operating on uncorrelated features performs better than one operating on correlated features.
[13] Proposed a multimodal biometric system using ear and face. They use the Iterative Closest Point (ICP) algorithm, local 3D features and PCA. The approach is based on local 3D features, which are very fast to compute and robust to pose and scale variations and to occlusions due to hair and earrings. An expression-robust multimodal ear-face biometric recognition approach with fusion at the score level is proposed in this paper.
[14] Proposed a multimodal biometric system using fingerprint and iris. They use PCA (Principal Component Analysis) and FLD (Fisher Linear Discriminant) for biometric recognition. The paper compares the Borda count method with logistic regression. The comparison shows that rank-level fusion with the logistic regression approach provides better performance in terms of error rate and increases the recognition rate of multibiometric systems, because in this approach weights are assigned to the different matchers according to their performance.
[15] Proposed a multimodal biometric system using face. They use PCA (Principal Component Analysis) plus RMPM (Reduced Multivariate Polynomial Model) to develop the multimodal biometric system. The paper presents a stereo face recognition formulation which combines appearance and depth at the feature level; the RMPM is adopted to fuse the appearance and disparity images and is extended so that the problem of new user registration can be overcome. The face recognition approach is useful for some online applications such as visitors.
[16] Proposed a multimodal biometric system using face and finger veins. They use LDA for this system. The paper presents a multimodal low-resolution face and finger-vein recognition system with score-level fusion. The proposed system is very efficient, reducing the FAR to 0.000026 and increasing the GAR to 97.4%, but it is computationally demanding due to the extra processing required for the feature spaces.
[17] Proposed a multimodal biometric system using fingerprint and voice. They use the Leave-One-Out Cross Validation technique (LOOCV) and a Gaussian mixture model for score-level fusion. The proposed system implements an optimum-reliability-ratio-based integration weight optimization scheme for the fingerprint and voice modalities, and its performance is evaluated under different noise conditions. One drawback of this method is that under extreme noise conditions it gives attenuating fusion.
[18] Proposed a multimodal biometric system using fingerprint and finger vein. They use the MHD (modified Hausdorff distance) algorithm as well as minutiae extraction and matching based on a ternary vector. The paper proposes score-level fusion of fingerprint and finger vein, with recognition experiments conducted on a homologous biometrics database.
[19] Proposed a multimodal biometric system using palm print with rank-level fusion. The authors investigate rank-level combination for palm print matchers using four different approaches, i.e., Borda count, weighted Borda count, highest rank and product of ranks, and Bucklin majority voting, and also propose a new nonlinear approach for combining the ranks. The experimental results suggest that considerable improvement in recognition accuracy can be achieved with rank-level combinations as compared to individual palm print representations.
[20] Proposed a multimodal biometric system using fingerprint and face, with a normalization method and an adaptive method. The paper studies the performance of multimodal biometric authentication systems using state-of-the-art Commercial Off-the-Shelf (COTS) fingerprint and face biometric matchers on a comparatively large population approaching 1,000 individuals.
[21] Proposed a multimodal biometric system using face and signature with a score-level fusion technique. The performance of single-modality biometric recognition suffers from noisy data, non-universality of biometric data, and susceptibility to spoofing; a multimodal biometric system can improve performance. The paper shows that a face and signature based bimodal biometric system can improve the accuracy rate by about 10% over a single face or signature based biometric system.
4. MULTIMODAL BIOMETRICS
Problems such as noisy data, intra-class variation, inter-class similarities, non-universality and spoofing are imposed by unimodal biometric systems; they tend to increase the False Acceptance Rate (FAR) and False Rejection Rate (FRR), ultimately leading to poor system performance. Some of these limitations can be overcome by including multiple sources of information for establishing the identity of a person [7]. Multimodal biometrics refers to the use of a combination of two or more biometric modalities in a verification or identification system. Such systems address the problem of non-universality, since multiple traits ensure sufficient population coverage [8].
4.2 AUTHENTICATION PHASE
In the authentication phase, the traits of a user are captured again, and
the system uses them either to identify or to verify the person.
Identification is one-to-many matching, which compares the captured data
with the templates of all users in the database, while verification is
one-to-one matching, which compares the captured data with the template of
the claimed identity only [7].
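The two matching modes can be sketched in Python. The `distance` matcher, the threshold, and the templates below are toy illustrations, not part of any cited system:

```python
# Sketch of verification (1:1) vs identification (1:N) matching.
# `distance` is a stand-in for a real template matcher.

def distance(a, b):
    """Toy matcher: Euclidean distance between feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def verify(captured, claimed_template, threshold=1.0):
    """1:1 matching: compare the capture with the claimed identity only."""
    return distance(captured, claimed_template) <= threshold

def identify(captured, database, threshold=1.0):
    """1:N matching: compare the capture with every enrolled template."""
    user, template = min(database.items(),
                         key=lambda kv: distance(captured, kv[1]))
    return user if distance(captured, template) <= threshold else None

db = {"alice": [0.1, 0.2], "bob": [0.9, 0.8]}
print(verify([0.12, 0.22], db["alice"]))  # True
print(identify([0.88, 0.79], db))         # bob
```

Verification touches only one template, so it scales independently of database size, while identification cost grows with the number of enrolled users.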
machine, Bayesian inference, Dempster-Shafer theory, dynamic Bayesian
networks, neural networks and the maximum entropy model. Note that we can
further classify these methods as generative and discriminative models
from the machine learning perspective. For example, Bayesian inference and
dynamic Bayesian networks are generative models, while support vector
machines and neural networks are discriminative models.
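As a simple illustration of combining matcher outputs (a fixed-weight rule, not one of the learned models listed above), min-max normalization followed by a weighted sum can be sketched as follows; all scores, ranges and weights are hypothetical:

```python
def min_max_normalize(score, lo, hi):
    """Map a raw matcher score into [0, 1] given its observed range."""
    return (score - lo) / (hi - lo)

def fuse(scores, ranges, weights):
    """Weighted-sum fusion of normalized scores from several matchers."""
    total = 0.0
    for s, (lo, hi), w in zip(scores, ranges, weights):
        total += w * min_max_normalize(s, lo, hi)
    return total

# Face matcher scores in [0, 100], fingerprint in [0, 1] (illustrative ranges).
fused = fuse(scores=[72.0, 0.81], ranges=[(0, 100), (0, 1)], weights=[0.4, 0.6])
print(round(fused, 3))  # 0.774
```

Normalization matters because different matchers emit scores on incompatible scales; without it, one modality silently dominates the sum.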
6. CONCLUSIONS
Unimodal biometric systems fail when proper biometric data for a
particular trait is lacking; using multiple biometrics for authentication
is more robust. We have observed that multimodal biometrics overcomes the
problems associated with unimodal biometrics, such as noisy data,
inter-class similarities, intra-class variation, non-universality and
spoofing. Many multimodal biometric systems exist for authenticating a
person, but the selection of appropriate modalities, the choice of the
optimal fusion level and redundancy in the extracted features remain
challenges in designing a multimodal biometric system that need to be
solved.
7. ACKNOWLEDGMENTS
We are thankful to our Guide Dr. P. D. Deshmukh for providing valuable
guidance and technical support.
REFERENCES
[1] Neena Godbole, Information Security System, Wiley Publication.
[2] Prof. V. M. Mane and Prof. (Dr.) D. V. Jadhav, Review of Multimodal Biometrics:
Applications, challenges and Research Areas, International Journal of Biometrics
and Bioinformatics (IJBB), Volume 3, Issue 5.
[3] Arun A. Ross, Karthik Nandakumar, Anil K. Jain, Hand book of Multibiometrics,
Springer International Edition.
[4] Arun Ross and Rohin Govindarajan, Feature Level Fusion Using Hand and Face
Biometrics, Appeared in Proc. of SPIE Conference on Biometric Technology for
Human Identification II.
[5] Fortuna, J., Sivakumaran, P., Ariyaeeinia, A. and Malegaonkar, A., 2004. Relative
effectiveness of score normalization methods in open-set speaker identification.
Proc.IEEE Speaker and language Recognition Workshop (Odyssey'04), pp. 369-376.
[6] P. S. Sanjekar and J. B. Patil, An Overview of Multimodal Biometrics,
Signal & Image Processing: An International Journal (SIPIJ), Vol. 4, No. 1,
February 2013.
[7] M. Golfarelli, D. Maio and D. Maltoni,On the Error-reject Tradeoff in Biometric
Verification Systems, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.
19, no. 7, pp.786-796, July 1997.
[8] A. Ross and A. Jain, Information Fusion in Biometrics, Pattern
Recognition Letters, vol. 24, pp. 2115-2125, 2003.
[9] V. Mane and D. Jadhav, Review of Multimodal Biometrics: Applications, Challenges
and Research Areas, International Journal of Biometrics and Bioinformatics (IJBB),
vol. 3, no. 5, pp. 90-95, 2009.
[10] Dapinder Kaur and Gaganpreet Kaur, Level of Fusion in Multimodal Biometrics: a
Review, International Journal of Advanced Research in Computer Science and
Software Engineering, 3(2), February 2013.
[11] F. Besbes, H. Trichili, and B. Solaiman, Multimodal biometric system based on
Fingerprint identification and Iris recognition, in Proc. 3rd Int. IEEE Conf. Inf.
Commun. Technol.: From Theory to Applications (ICTTA 2008), pp. 15. DOI:
10.1109/ ICTTA.2008.4530129.
[12] A. Rattani, D. R. Kisku, M. Bicego, Member, IEEE and M. Tistarelli, Feature level
fusion of face and finger Biometric.
[13] S. M. S. Islam, M. Bennamoun, A. S. Mian, and R. Davies, Score Level Fusion of Ear
and Face Local 3D Features for Fast and Expression-Invariant Human Recognition,
ICIAR 2009, LNCS 5627, pp. 387-396, Springer-Verlag Berlin Heidelberg,
2009.
[14] N. Radha, A. Kavitha, Rank Level Fusion Using Fingerprint and Iris Biometric,
Indian Journal of Computer Science and Engineering (IJCSE) ISSN: 0976-5166 Vol.
2 No. 6 Dec 2011-Jan 2012.
[15] Jian-Gang Wang, Kar-Ann Toh, Eric Sung, Wei-Yun Yau, A Feature-level Fusion
of Appearance and Passive Depth Information for Face Recognition, in: Face
Recognition, edited by Kresimir Delac and Mislav Grgic, ISBN 978-3-902613-03-5,
pp. 558, I-Tech, Vienna, Austria, June 2007.
[16] Muhammad Imran Razzak, Muhammad Khurram Khan, Khaled Alghathbar, Rubiyah
Yusof, Multimodal Biometric Recognition Based on Fusion of Low Resolution Face
and Finger Veins, International Journal of Innovative Computing, Information and
Control, ICIC International, ISSN 1349-4198, Volume 7, Number 8, August 2011,
pp. 4679-4689.
[17] Anzar S. M. and Sathidevi P. S., Optimal Score Level Fusion using Modalities Reliability
and Separability Measures, International Journal of Computer Applications (0975-
8887), Volume 51, No. 16, August 2012.
[18] Feifei Cui and Gongping Yang, Score Level Fusion of Fingerprint and Finger Vein
Recognition, Journal of Computational Information Systems, 7:16
(2011), pp. 5723-5731.
[19] Ajay Kumar and Sumit Shekhar, Personal Identification Using
Multibiometrics Rank-Level Fusion, IEEE Transactions on Systems,
Man, and Cybernetics, Part C: Applications and Reviews.
[20] Robert Snelick, Umut Uludag, Alan Mink, Michael Indovina and Anil Jain, Large
Scale Evaluation of Multimodal Biometric Authentication Using State-of-the-Art
Systems IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27,
No. 3, Mar 2005, pp 450-455.
[21] Kazi M. M., Rode Y. S., Dabhade S. B., Al-dawla N. N. H., Mane A. V., Manza R.
R., Kale K. V., Multimodal Biometric System Using Face and Signature: A Score
Level Fusion Approach, Advances in Computational Research, ISSN: 0975-3273 &
E-ISSN: 0975-9085, Volume 4, Issue 1, 2012.
Maitanmi Olusola
Computer Science, Babcock University,
Ilisan Remo, Ogun State, Nigeria
Gregory Onwodi
School of Science & Technology,
National Open University of Nigeria
ABSTRACT
The intelligent car park system is a system designed to prevent the problems usually
associated with car parks. This study covers a wireless transmitter and receiver, which
form the sensor. The principle of the electromagnetic field is employed: an
inductor is wound and buried under the entry and exit posts of the parking garage as a
sensor to detect the metal under a vehicle; once the vehicle passes across the search
coil, the parking garage door opens automatically and closes after a few minutes.
This is aimed at solving congestion, indiscriminate parking and the problem of locating
empty parking lots. Other aims of the paper are to eliminate the need for manual
operations, to make life easier and more secure for car owners, and to eradicate human
inconsistencies. It also examines in detail how an automatic gate system works, so that
the concepts involved can be understood and incorporated into an
intelligent car parking system.
Keywords
Car parking systems, electromagnetic field.
1. INTRODUCTION
For over thirty years, traffic information has been provided to help motorists
make en-route decisions. The development of Intelligent Transportation
Systems (ITS) and Advanced Traffic Management Systems (ATMS) has
begun to improve transportation through the use of technology. Along the
same lines, systems like Intelligent Vehicle Highway Systems (IVHS)
acquire, analyse, communicate, and present information to assist surface
transportation travellers moving from a starting location to their
desired destination. Data from IVHS can now be utilized as information for
en-route assistance as well as for the collection of traffic data. Information
Technology is beginning to recognize the importance of post-trip
information dissemination by providing information on the location and
Research carried out by [2] discusses the development of a remote bar-code
reader-based management system for arriving and departing vehicles. The
system, designed for use at a substation or similar facility, scans a bar-code
sticker on the windshield of an approaching vehicle and sends signals to
open a motor-driven entrance gate, while at the same time automatically
recording the vehicle type, license plate number, vehicle owner's name,
time of arrival/departure and other relevant data.
2.3 Siespace Car Park Control and Information Systems
Siemens' car park control and information system, SIESPACE, guides the
motorist to vacant parking areas. SIESPACE is a modular system designed
for use in an urban network where a choice of car parks is available. A
computer-based instation collates information and determines the
appropriate messages to be displayed on guidance signs.
2.4 Saint Paul Advanced Parking Information System
Developments in Minnesota (1995) show that an Advanced Parking
Information System (APIS) was deployed in late 1995 and early 1996 as a test case
under the Minnesota Guidestar program. Minnesota Guidestar provides
overall direction for the Minnesota Department of Transportation
(MnDOT); it is a program that provides a focus for strategic planning,
project management and evaluation. The APIS itself was used in the
downtown Saint Paul area to inform drivers of parking location and
availability so that they could make advance decisions, hopefully
helping to reduce congestion and pollution. MnDOT's goal is to use the
system continuously; however, the APIS was only deployed during events in
which the Civic Center attracted more than 2,000 visitors, the Music
Center attracted more than 1,000 visitors, or a combination of events
occurred simultaneously.
The system operates using loop detectors, ticket splitters or cash registers as
vehicle counting equipment located at each garage or lot. A controller
interface is also required since the equipment is not capable of calculating a
space available number. This interface counts and calculates space
availability as each car enters or exits the lot in real time. This number is
then transmitted via modem to a central computer at the city. The central
computer then sends the required signal along to the variable message signs
with the help of MnDOT's Microsoft-based Ramp Management Software.
The operator controls the system from the central computer and, at any
time, has the ability to read and modify sign messages, correct parameters,
check the state of an entire mounted electronic sign mast or parking facility,
and take appropriate action as required [5].
3. METHODOLOGY
adjust. Once metal enters the field, the output of the comparator goes
LOW and triggers a monostable stage for a time (T), during which two
operations are performed.
1. A monostable multivibrator clocks the counter to count up on ENTRY;
on EXIT, the counter counts down. An independent counter is configured to
count UP/DOWN to monitor both entry and exit. This counter displays the
NET count, which is the number of cars still left inside a particular car
lot or garage.
2. The switching of the sliding door is achieved using a logic gate and a
timer to control the opening and closing of the door. Once the metal under
the car is detected, two monostables are triggered: one is designed to
activate the switching circuit responsible for opening the gate, while the
other later activates the switching circuit responsible for closing it.
Once the garage reaches full capacity, the garage gate automatically
denies entry to further cars; trucks and bikes whose weights fall outside
the set range are also disallowed.
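The UP/DOWN counting and gate-control logic described above can be sketched in software; the class and the capacity value are illustrative, not part of the hardware design:

```python
class CarParkCounter:
    """Software analogue of the UP/DOWN counter and gate logic above."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.count = 0  # NET count: cars currently inside

    def vehicle_at_entry(self):
        """Entry sensor triggered: open the gate only if space remains."""
        if self.count >= self.capacity:
            return False  # garage full, entry denied
        self.count += 1   # count UP
        return True       # open entry gate

    def vehicle_at_exit(self):
        """Exit sensor triggered: count DOWN and always open the gate."""
        if self.count > 0:
            self.count -= 1
        return True

park = CarParkCounter(capacity=2)
print(park.vehicle_at_entry())  # True
print(park.vehicle_at_entry())  # True
print(park.vehicle_at_entry())  # False (full)
park.vehicle_at_exit()
print(park.count)               # 1
```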
(Figure: oscillator stage. Components include C3 = 10nF, an IN4148 diode,
transistor TR1, and resistors of 2.2K, 4.7k (R2) and 1k (R4).)
fo = 1/(2 pi sqrt(LC)), with fo = 1kHz = 1,000Hz (the value used in the
substitution below) and
C = [C1C2 / (C1 + C2)]
C1 = 100nF = 100 x 10^-9 F = 1 x 10^-7 F
C2 = 220nF = 220 x 10^-9 F = 2.2 x 10^-7 F
L = ?
C = [C1C2 / (C1 + C2)] = [(1 x 10^-7) x (2.2 x 10^-7)] / [(1 x 10^-7) + (2.2 x 10^-7)]
Therefore,
C = 6.875 x 10^-8 F (or 68.75nF)
Transposing for L and substituting all the values into fo,
fo = 1 / (2 pi sqrt(LC))
L = 1 / [(2 pi fo)^2 x C]
L = 1 / [(2 x 3.14 x 1000)^2 x 6.875 x 10^-8]
L = 0.368814519H (or 368.8mH), the required inductance (L).
To get a uniform field, a toroid has to be designed to realize the inductor
of inductance L = 368.8mH.
Number of turns:
A = pi r^2
where d = 1cm = 2r, therefore r = 0.5cm
Thus r^2 = (0.5)^2
A = pi x (0.5)^2 = (22/7) x 0.25 = 0.79cm^2
From the equation
L = (u0 A N^2) / lo
Substituting all the values,
0.368814519 = [(4 pi x 10^-7) x 0.79 x N^2] / 0.15
Therefore, N^2 = (L x lo) / (u0 x A)
Therefore, N = 236 turns.
Hence winding 236 turns for the inductor will give the required inductance.
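The two calculations above can be checked with a short script that follows the text's own substitutions (pi approximated as 3.14, fo = 1 kHz, and A kept in cm^2 exactly as in the original working):

```python
import math

PI = 3.14                      # the approximation used in the text

# Oscillator: series combination of C1 and C2, then L from fo = 1/(2*pi*sqrt(LC))
C1, C2 = 100e-9, 220e-9        # farads
C = (C1 * C2) / (C1 + C2)      # ~6.875e-8 F (68.75 nF)
fo = 1000.0                    # hertz, the value substituted in the text
L = 1 / ((2 * PI * fo) ** 2 * C)
print(round(L, 4))             # 0.3688 (henries)

# Toroid turns, following the text's substitution (A kept in cm^2, lo = 0.15)
mu0 = 4 * PI * 1e-7
A = 0.79
lo = 0.15
N = math.sqrt((L * lo) / (mu0 * A))
print(round(N))                # 236
```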
Resistors R1, R2 and R4 are dc bias resistors for the transistor T1.
(Figure: comparator stage built around an LM393 (IC1), with LED2,
R8 and R15 of 1K, C4 = 100uF, C5, and preset VR2 = 2.2K.)
VR2 is set at 2.6V (because when metal is detected the voltage drops from
approximately 4V to 1.5V); any voltage below 2.6V at the inverting input
makes the output of the comparator go low and trigger the monostable
stage, since
Vout = A0 x Vin
where A0 = open-loop voltage gain.
(Figure: 555 timer (IC2) monostable stage with timing components R6 and C3.)
(Figure: switching circuit supplied from V+ (12V) through diode D5.)
IC = hfe x IB ------------------------------------------- equation (3.0)
RB = (Vin - VBE) / IB ------------------------------------------- equation (4.0)
Where,
IC = collector current
IB = base current
Vin = input voltage
Vt = supply voltage
VCE = collector-emitter voltage
VBE = base-emitter voltage
hfe = current gain.
From 1.0, 12 = IcRc + Vce
12 = Ic(400) + 0
and Ic = 30mA.
From 3.0, IB = 30mA/300
= 100uA.
From 2.0, 3.5 = (100uA x RB) + 0.6
RB = 2.9V/100uA
= 29K
= 30K (preferred value).
Hence R11 to R14 = 29K.
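The bias calculation can be reproduced as follows; Vcc, Rc, hfe, Vin and VBE are the values substituted in the text:

```python
# Transistor bias point, following the substitutions in the text
# (Vcc = 12 V, Rc = 400 ohm, hfe = 300, Vin = 3.5 V, VBE = 0.6 V).
Vcc, Rc, hfe = 12.0, 400.0, 300.0
Vin, Vbe = 3.5, 0.6

Ic = Vcc / Rc                  # saturation (Vce ~ 0): Ic = 12/400 = 30 mA
Ib = Ic / hfe                  # 100 uA
Rb = (Vin - Vbe) / Ib          # 29 kOhm
print(round(Ic * 1e3), round(Ib * 1e6), round(Rb / 1e3))  # 30 100 29
```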
(Figure: BCD-to-seven-segment decoding using a 7447 driver; inputs D, C,
B, A come from the counter output, and segment outputs a-g drive the
display through resistor Rx.)
Figure 6. Seven Segment Digital Display. Source: [7]
(Figure: three-digit counter and display stage. 7490 decade counters
(IC3-IC5) drive 7447 decoders (IC6-IC8), with resistors R7-R10 and
switch S1.)
(Figure: power supply stage. A 220VAC/15V transformer (T1) feeds a bridge
rectifier (D1-D4), a 1000uF filter capacitor (C1), a 47uF capacitor (C2)
and the +12V and +5V regulators (IC5 and IC4).)
The choice of the filter capacitor depends on the output current. Given
that:
Vr(rms) = 2.4 Il/CFI .. (1)
where Vr(rms) = rectified D.C. ripple voltage
Il = load current (mA)
CFI = filter capacitor (uF)
For a load current of 500mA and a ripple factor of 5%:
Vpeak = Vrms x sqrt(2)
= 15V x 1.414
= 21.2V
For a ripple factor of 5%,
Vr(rms) = (5/100) x 21.2
= 1.06V
From (1),
1.06V = 2.4 x 500mA / CFI
CFI = 2.4 x 500mA / 1.06V
= 1,132uF
= 1000uF (preferred value).
Hence, C1 = 1000uF, C2 = 47uF.
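The capacitor sizing can be verified with a short script following the same steps (the relation Vr(rms) = 2.4 Il/C takes Il in mA and C in uF, as in the text):

```python
import math

# Filter capacitor sizing from the ripple relation Vr(rms) = 2.4 * Il / C.
Vrms_secondary = 15.0                    # transformer secondary, V rms
Vpeak = Vrms_secondary * math.sqrt(2)    # ~21.2 V
Vr = 0.05 * Vpeak                        # 5% ripple, ~1.06 V
Il = 500.0                               # load current, mA
C = 2.4 * Il / Vr                        # ~1131 uF (text rounds Vr to 1.06 V
print(round(Vpeak, 1), round(Vr, 2))     #  and quotes 1132 uF)
```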
3.8 Logic Gate (XOR Gate)
The logic control circuit in this project controls the switching of the relay
responsible for opening and closing the gate. The XOR gate gives a HIGH
output only when one input is HIGH and the other is LOW (AB' + A'B);
hence the term "unequal comparator", as shown in Figure 9.
Figure 9. The exclusive-OR gate symbol (inputs A and B, output C). Source: [5]
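A minimal truth-table check of the XOR "unequal comparator" behaviour described above:

```python
# XOR: output HIGH only when the inputs differ (AB' + A'B).
def xor(a, b):
    return (a and not b) or (not a and b)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(xor(a, b)))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```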
The first Vero board contains the power supply, oscillator and
comparators; the second contains the XOR gate and the monostable stages;
the third contains the counter, decoder and display stages; and the fourth
contains the switching circuits, which could not be shown because of the
file size.
insulation and give aesthetic value, which we couldn't show because of its
large size.
ITEM QUANTITY
IN4007 rectifier diode 9
7806 5V DC linear regulator 1
7812 12V DC linear regulator 1
7-segment common-anode display 3
7447 decoder 3
7490 decade counter 3
LM393 voltage comparator 4
TIP41 buffer transistor 1
BC337 switching transistor 4
7411 logic XOR gate 1
NE555 timer 6
3300uF (25V) filtering capacitor 1
47uF (25V) capacitor 6
10uF (25V) capacitor 1
4.7K resistor 5
15K resistor 5
10K resistor 5
Auto-reverse DC motor 2
Light-emitting diode 1
Coil 4
IC socket (8-pin) 8, (14-pin) 2, (16-pin) 6
Electromagnetic switch relay 4
5. CONCLUSION
The project, the design and construction of an intelligent car parking
system, was designed with factors such as economic application, design
economy, availability of components and research materials, efficiency,
compatibility, portability and durability in mind. The performance of the
project after testing met the design specifications. However, the general
operation and performance of the project depend on the user, who is prone
to human error such as failure to perform or omitting a task, slips of
action, performing a task incorrectly, lapses of memory, knowledge-based
mistakes, etc. Operation also depends on how well the soldering is done
and on the positioning of the components on the Vero board. If poor solder
is used, the circuit might develop dry joints early, in which case the
project might fail. Also, if logic elements are soldered near components
that radiate heat, overheating might occur and affect the performance of
the entire system. Other factors that might affect performance include
transportation, packaging, ventilation, quality of components, handling
and usage.
REFERENCES
[1] Mehta,V. K. Principles of Electronics (117-205, Transistors, and General
References), published by S. Chand & Company Ltd (2003).
[2] Robert L. Boylestad and Louis Nashelsky, Electronic Devices and Circuit Theory
(eighth edition), published by Prentice-Hall (2002).
[3] Maddock, R. J. & Calcutt, D. M. Electronics a course for Engineers. (pages 341-
349, IC Timers, 249-263 counters, 290-293 decoder drivers), published by
Longman (1994).
[4] Tom Duncan, Success in Electronics (pages 44-75, other passive components,
107-119, op-to devices and transducers), Published by Longman (1983).
[5] George Loveday, Essential Electronics (pages 241-244 transistors, general
references). Published by Pitman (1984).
[6] Jacob Millman, Micro Electronics (General references), McGraw-Hill book
company (1979).
[7] Luecke G., J. P. Mize and W. N. Carr, Semiconductor Memory Design and
Applications (Chap. 3 & 4, General References), McGraw-Hill Book Company,
1973.
[8] Everyday Electronics Journal, May-1998 edition (Power Supplies and
Multivibrators), Wimborne Publishing.
[9] Clayton Hallmark, IC Cookbook (pin configurations of all the ICs), McGraw-
Hill Book Company, 1986.
[10] NTE data book 12th edition, (General Data Sheets for all the components).
V. A. Athavale
Department of Computer Science & Engineering, Gulzar Group of Institutions
Khanna, Punjab, India
Pinki Sharma
Department of Computer Science & Engineering, HCTM Technical Campus
Kaithal, Haryana, India
ABSTRACT
Mobile ad hoc networks and wireless sensor networks promise a wide variety of
applications. However, they are often deployed in potentially adverse or even hostile
environments, and therefore cannot be readily deployed without first addressing
security challenges. Intrusion detection systems provide a necessary layer of in-depth
protection for wired networks; however, relatively little research has been performed on
intrusion detection in the areas of mobile ad hoc networks and wireless sensor
networks. In this article, we first briefly introduce mobile ad hoc networks and
wireless sensor networks and their security concerns. Then we focus on their intrusion
detection capabilities. Specifically, we present the challenges of constructing intrusion
detection systems for mobile ad hoc networks and wireless sensor networks, survey the
existing intrusion detection techniques, and indicate important future research
directions.
Keywords
Mobile ad-hoc networks, wireless sensor networks, attacks, AODV, IDS, secure aggregation.
1. INTRODUCTION
range of applications in both military and civilian environments. For example,
a MANET could be deployed quickly for military communications in the
field. A MANET also could be deployed quickly in scenarios such as a
meeting room, a city transportation wireless network, fire fighting,
and so on. To form such a cooperative and self-configurable network, every
mobile host should be a friendly node, willing to relay messages for
others. In the original design of a MANET, global trust among the nodes in
the whole network is a basic security assumption. Recent progress
in wireless communications and Micro-Electro-Mechanical Systems (MEMS)
technology has made it feasible to build miniature wireless sensor nodes
that integrate sensing, processing, and communication capabilities. These
miniature wireless sensor nodes can be very small, as little as a cubic centimeter.
Compared with conventional computers, the low-cost, battery-powered sensor nodes
have a limited energy supply, constrained processing and communication
capabilities, and inadequate memory. The design and implementation of relevant
services for WSNs must keep these limitations in mind. Based on the collaborative
efforts of a large number of sensor nodes, WSNs have become good candidates
to provide economically viable solutions for a wide range of applications, such as
environmental monitoring, scientific data collection, health
monitoring, and military operations [1]. An example WSN is illustrated in Fig.
1; the WSN is deployed to detect targets.
usually are deployed in adverse or even hostile environments. Therefore,
they cannot be readily deployed without first addressing security
challenges. Because of the open transmission medium, the low
degree of physical protection of mobile nodes, the dynamic topology, a limited
power supply, and the absence of a central management point [2],
MANETs are more vulnerable to malicious attacks than traditional wired networks
are. In WSNs, the lack of physical security combined with unattended
operation leaves sensor nodes exposed to an elevated threat of being captured
and compromised, making WSNs vulnerable to a variety of attacks. So far, the study
of security solutions for MANETs and WSNs has mostly taken the
prevention point of view. For example, in both networks, there exist
many key distribution and management schemes that can be designed
on top of link-layer security architectures, prevention of denial-of-service attacks, and
secure routing protocols. There is also research targeted at
specific services and applications. For example, one of the most
important purposes of deploying WSNs is to collect relevant data. During a
data collection process, aggregation is needed to save energy,
thus prolonging the lifetime of a WSN. However, aggregation primitives are
vulnerable to node compromise attacks, which can result in incorrect
aggregated results from a compromised aggregator. Hence, effective techniques are
needed to verify the integrity of aggregated results. Prevention-based
approaches can significantly reduce potential attacks. However, they cannot
entirely eliminate intrusions. Once a node is compromised, all the secrets
associated with the node are open to attack. This renders prevention-based
techniques less useful for guarding against malicious insiders. In practice,
insiders can cause much greater damage. Therefore, Intrusion Detection
Systems (IDSs), serving as the next line of defense, are essential in
providing a highly secure system. By modeling the behaviors of legitimate
activities, an IDS can effectively identify potential intruders and thus offer
in-depth protection. In this article, we first give a brief introduction to
IDSs. Then we present the challenges of constructing IDSs for mobile ad hoc
networks and wireless sensor networks and analyze their existing intrusion
detection techniques. Finally, we point out important future research directions.
users who have legitimate access to the system but are abusing their
privileges. The system may be a host computer, network equipment, a firewall,
a router, a corporate network, or any system being monitored by an intrusion
detection system. An IDS dynamically monitors a system and the
users' actions within the system to detect intrusions. Because a
system can suffer from various kinds of security vulnerabilities, it is
both technically difficult and economically costly to build and maintain a
system that is not susceptible to attacks. Experience teaches us never to
rely on a single defensive technique. An IDS, by analyzing the system's and
users' operations in search of undesirable and suspicious behavior, can
effectively monitor and defend against threats. Generally, there are two kinds
of intrusion detection: misuse-based detection and anomaly-based detection [3]. A
misuse-based detection method encodes known attack signatures and system
vulnerabilities and stores them in a database. If the deployed IDS notices a
match between current activities and the signatures, an alarm is
generated. Misuse detection techniques are not effective at detecting
novel attacks because of the lack of corresponding signatures. An
anomaly-based detection technique creates normal profiles of
system states or user behaviors and compares them with current behavior. If a
significant deviation is observed, the IDS raises an alarm. Anomaly
detection can detect unknown attacks. However, normal profiles are
usually very difficult to build. For example, in a MANET, mobility-
induced dynamics make it challenging to distinguish between normalcy and
anomaly. It is, therefore, harder to distinguish between false alarms and real
intrusions. The ability to establish normal profiles is crucial in designing
efficient anomaly-based IDSs. As a promising alternative, specification-
based detection techniques combine the advantages of misuse detection and
anomaly detection by using manually developed specifications to
characterize legitimate system behaviors. Specification-based detection
approaches are similar to anomaly detection techniques in that both of them
detect attacks as deviations from a normal profile. However, specification-
based detection approaches rely on manually developed specifications,
thus avoiding a high rate of false alarms. The drawback is that the
development of detailed specifications can be time-consuming.
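A minimal sketch of the anomaly-based idea, assuming a simple mean/standard-deviation profile over one illustrative feature; the feature and threshold are not taken from any cited system:

```python
import statistics

def build_profile(training_values):
    """Build a 'normal profile' from clean training data."""
    return statistics.mean(training_values), statistics.stdev(training_values)

def is_anomalous(value, profile, k=3.0):
    """Flag observations that deviate more than k standard deviations."""
    mean, std = profile
    return abs(value - mean) > k * std

# e.g. number of route changes observed per time interval under normal load
normal_route_changes = [4, 5, 6, 5, 4, 6, 5]
profile = build_profile(normal_route_changes)
print(is_anomalous(5, profile))    # False
print(is_anomalous(40, profile))   # True
```

The sketch also illustrates the weakness noted above: if the training traces do not capture normal variation (e.g. mobility-induced dynamics), legitimate behavior will exceed the threshold and raise false alarms.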
IJCSBI.ORG
Source
A B
C D
A can overhear Bs
transmission to decide
Destination
whether B is
misbehaving
IJCSBI.ORG
In addition to these, attacks such as rushing, wormhole, and spoofing
attacks have also been discussed in the context of a MANET. Furthermore,
it is not difficult to fabricate intrusions based on combinations of the
attacks mentioned previously.
2.1.1 Existing research
The pioneering ID research in the context of a MANET appears in a series of
works in [26]. In the proposed system, an agent is attached
to each node, and each node can perform intrusion detection and response
functionality separately. One of the most important steps in IDS research is
to construct effective features. Focusing on MANET routing protocols, Zhang
et al. [2] use an automated method to construct a feature set
and select an essential set of features (e.g., distance to a destination, node
moving rate, the percentage of changed routes, the percentage of changes
in the sum of hops of all routes, etc.) that have high information gain.
Information gain is an important metric for measuring the effectiveness of
features; features with high information gain help the constructed IDS achieve
the desired performance. Different routing protocols may result in different
feature sets. Intrusion detection is formulated as a pattern classification
problem, in which classifiers are constructed to classify observed activities as
normal or intrusive. In [2], based on the identified feature set, Zhang et al.
apply two well-known classifiers, RIPPER and support vector machine (SVM)
Light, to construct a set of anomaly detection models. RIPPER is a
decision-tree-equivalent classifier for rule induction; by separating the
provided data into appropriate categories, RIPPER can compute rules for the
system. SVM Light can produce a more accurate classifier when the provided
data cannot be described by the given set of features. Because of the
locality of one intrusion session, post-processing is also introduced to filter
out false alarms: if there are more abnormal predictions than
normal predictions in a pre-defined period of time, the activities
observed in this period of time are deemed abnormal. In this
way, spurious errors that occur during normal sessions are
removed. Because of the importance of feature selection in IDS research,
Huang et al. [4] further introduce a new learning-based method that
uses cross-feature analysis to capture inter-feature correlation patterns.
Suppose that L features, f1, f2, ..., fL, are identified, where each fi denotes
one feature characterizing either topology or route activities. The classification
problem to be solved is to build a set of classification models Ci: {f1, ..., fi-1,
fi+1, ..., fL} -> fi from the training process. Here one feature fi is chosen
as the target to classify. Then, the classification model Ci is used to
establish the temporal correlation between one feature and all of the
other features. The prediction of Ci is highly probable in normal
situations; however, when there are malicious events, the prediction of Ci
becomes improbable. Based on this, normal events and abnormal events can be
distinguished. Local detection alone is not enough because of the distributed
nature of a MANET. Huang and Lee [5] further elaborate on mechanisms by
which one node can collaborate with its neighbors and initiate a detection
process over a wider range. This can provide not only more accurate detection
results, but also more information in terms of attack types and sources.
By fairly and periodically electing a monitoring node in a cluster of
neighboring MANET nodes, a cluster-based detection scheme is proposed. Each
node maintains a finite state machine, with possible states of Initial, Clique,
Done, and Lost. Based on the finite state machine, a set of protocols,
including a clique computation protocol, a cluster-head computation protocol, a
cluster-valid assertion protocol, and a cluster recovery protocol, are detailed.
The resource constraint problems faced by a MANET are addressed in the
design of these protocols. Based on a specification-based approach describing the
major functionality of Ad-hoc On-demand Distance Vector (AODV) routing
algorithms at the data and routing layers, Huang and
Lee [6] propose an extended finite state automaton (EFSA), where
transitions and states can carry a finite set of parameters. In this way,
the proposed EFSA can detect invalid state violations, incorrect transition
violations, and unexpected action violations. The construction of the EFSA leads
naturally to a specification-based approach. Based on a set of statistical
features, machine learning algorithms are then adapted to detect
abnormal patterns from abnormal basic events. Based on Dynamic Source
Routing (DSR) protocols, Marti et al. [7] propose to install additional
facilities, a watchdog and a pathrater, to identify and respond to routing
misbehavior in a MANET. In the data transmission process, a node may
misbehave by agreeing to forward packets and then failing to do so. Consider
the example illustrated in Figure 2 to understand the watchdog approach. Suppose a
path exists from a source node S to a destination node D through intermediate
nodes A, B, and C. Node A can overhear node B's transmissions. Node A cannot
transmit directly to node C and must go through node B. To detect whether node
B is misbehaving, node A maintains a buffer of packets it recently sent.
Node A then compares every overheard packet from node B with a
buffered packet to see if there is a match. A failure tally for node
B is increased if node A finds that node B is supposed to forward a packet
but fails to do so. If the tally is above a threshold, node B is
deemed to be misbehaving. Each node maintains a rating for every node it
knows about in the network. A path metric is then calculated by
averaging the node ratings along the path, and the pathrater [7] can then select the
path with the highest metric. Marti et al. [7] also discuss several
limitations of this approach, including limitations resulting from packet
collisions, false reports of node misbehavior, and potential watchdog evasion
mechanisms. Specializing in AODV routing protocols, Tseng et al. [8] propose
a specification-based ID technique. A finite state machine (FSM) is constructed to
specify the correct behaviors of AODV, that is, to maintain each branch of a route
request/route reply (RREQ/RREP) flow by monitoring all of the RREQ and
RREP messages from a source node to a destination node. The constructed
specification is then compared with the actual behaviors of monitored neighbors. The
distributed network monitor passively listens to AODV routing protocols,
captures RREQ and RREP messages, and detects run-time violations of the
specifications. A tree data structure and a node coloring scheme are also
proposed to detect most of the serious attacks. Sun et al. [9]
propose using a Markov chain (MC) to characterize the normal
behaviors of MANET routing tables. An MC-based local detection engine can
effectively capture the temporal characteristics of MANET routing behaviors. Because
of the distributed nature of a MANET, an individual alert raised by one node
should be aggregated with others to improve performance. Motivated by this, a
nonoverlapping zone-based intrusion detection system (ZBIDS) is proposed to
facilitate alert correlation and aggregation [9], as illustrated in Figure 3.
Specifically, the whole network is divided into nonoverlapping zones. Gateway nodes
(also referred to as interzone nodes, i.e., those nodes that have physical
connections to different zones) of each zone are responsible for
aggregating and correlating locally generated alerts within the zone. Intra-zone
nodes, upon detecting a local anomaly, generate an alert
and broadcast it within the zone. Only gateway nodes use these alerts to
generate alarms, which can effectively reduce false alarms. In ZBIDS, the
aggregation algorithm can reduce the false alarm ratio and improve the
detection ratio. An alert data model conforming to the intrusion detection
message exchange format (IDMEF) is also presented to facilitate the
interoperability of IDS agents. Based on this, gateway nodes can further provide
a wider view of attack scenarios. Considering that one of the main challenges in
building a MANET IDS is to integrate mobility with IDSs and to adjust IDS
behavior accordingly, Sun et al. [10] demonstrate that a node's moving speed, a
commonly used parameter in tuning MANET performance, is not an effective
metric for tuning IDS performance under different mobility models. Sun et al.
then propose an adaptive scheme, in which suitable normal profiles and the
corresponding proper thresholds are selected adaptively by each local IDS
through periodically measuring its local link change rate, a proposed
performance metric that can reflect mobility levels. The proposed scheme is
less dependent on the underlying mobility models and can further
improve performance.
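As a concrete illustration of the simplest mechanism surveyed above, the watchdog failure tally and the pathrater path metric of Marti et al. [7] can be sketched as follows. This is a minimal sketch assuming exact packet matching; the class and parameter names are ours, not from the original implementation.

```python
from collections import deque

class Watchdog:
    """Monitors one neighbor: buffers packets handed to it for forwarding and
    tallies packets that are never overheard being forwarded."""
    def __init__(self, threshold=3, buffer_size=64):
        self.sent = deque(maxlen=buffer_size)  # packets awaiting forwarding
        self.failures = 0                      # failure tally for the neighbor
        self.threshold = threshold

    def on_send(self, packet):
        self.sent.append(packet)               # packet handed to the neighbor

    def on_overhear(self, packet):
        if packet in self.sent:                # overheard forward matches buffer
            self.sent.remove(packet)

    def tick(self):
        """Periodic check: unmatched packets count as forwarding failures.
        Returns True once the neighbor is deemed to be misbehaving."""
        self.failures += len(self.sent)
        self.sent.clear()
        return self.failures > self.threshold

def path_metric(ratings, path):
    """Pathrater: average the ratings of the nodes along a path."""
    return sum(ratings[n] for n in path) / len(path)
```

The pathrater would then prefer the candidate path with the highest `path_metric` value.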
Figure 3. An example of a nonoverlapping zone-based intrusion detection system (ZBIDS) with numbered zones; gateway nodes of each zone act as alert concentration points.
The unique characteristics of sensor nodes pose challenges to the development
of a WSN IDS. Furthermore, the different applications and services supported by
WSNs demonstrate different characteristics. Therefore, it is necessary to
integrate ID approaches with the corresponding applications, because attacks
targeted at different applications and services have different manifestations.
In the following, we use two important services of a WSN, secure aggregation
and secure localization, to exemplify current WSN IDS research efforts.
2.2.1 Challenges
The unique characteristics of sensor nodes pose challenges to the development
of a WSN IDS. A WSN has a limited power supply, thus requiring energy-efficient
protocols and applications to maximize the lifetime of the sensor network.
Sensor nodes have tight system resources in terms of memory and processing
capabilities, making intensive computations impractical. Sensor nodes are prone
to failure, which leads to frequent configuration changes. Also, a WSN usually
is densely deployed, causing serious radio channel contention and scalability
problems. The design of an effective WSN IDS must take all of these challenges
into account.
2.2.2 Secure Localization in WSNs
Many WSN applications require that sensor nodes have location information.
Because of cost concerns, it is still not practical to equip every sensor node
with a global positioning system (GPS) receiver. Therefore, many localization
protocols have been proposed to help sensor nodes estimate their locations. To
utilize localization protocols, some special nodes, called beacon nodes,
usually are used. These beacon nodes are assumed to know their own locations
and transmit their locations to other non-beacon nodes through beacon packets.
Non-beacon nodes also obtain certain measurements (e.g., the received signal
strength indicator) based on received beacon packets. Such measurements and the
location information contained in beacon packets usually are referred to as
location references. After non-beacon nodes collect a sufficient number of
location references, these nodes can then estimate their locations.
Localization protocols may become vulnerable when a WSN is deployed in a
hostile environment. For example, beacon nodes may be compromised, thus
providing misinformation to mislead location estimation at non-beacon nodes.
Therefore, secure location discovery services are needed to ensure the normal
operation of a WSN. Utilizing deployment knowledge of a WSN and based on the
fact that the probability distribution functions of sensor locations usually
can be modeled before deployment, Du et al. [11] propose that each non-beacon
node can efficiently detect location anomalies by verifying whether estimated
locations are consistent with the deployment knowledge. For example, if a group
of sensor nodes is dropped out of an airplane sequentially as the plane flies
forward, normal distributions can be used to model the deployment distribution
of this group of sensor nodes. Each non-beacon node can compare its estimated
locations with the deployment knowledge. If the level of inconsistency is above
a predefined threshold, sensor nodes can decide that the received location
references are malicious. Liu et al. [12] also propose a set of approaches to
filter out malicious location references. The first approach is based on the
minimum mean square error. Based on the observation that malicious location
references and benign ones are usually inconsistent, non-beacon nodes can
compute an inconsistency level of the received location references,
characterized by the mean square error of estimation. If the mean square error
is larger than a threshold, non-beacon nodes may conclude that the received set
of location references is malicious. The second approach is the voting-based
location estimation method. Specifically, the deployment area is divided into a
grid of cells. The non-beacon node then has each received location reference
vote on the cells in which this node may reside, and so determines how likely
it is that the node lies in each cell. After the voting process, the center of
the cell with the highest number of votes may be used as the estimated
location.
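The voting-based method described above can be sketched as follows; the grid resolution, area size, and distance tolerance are illustrative parameters of ours, not values from Liu et al. [12].

```python
import math

def voting_estimate(references, area=10.0, cells=10, tolerance=0.5):
    """Each location reference (beacon_x, beacon_y, measured_distance) votes
    for every grid cell whose center is consistent with that distance; the
    center of the highest-voted cell becomes the estimated location."""
    step = area / cells
    votes = [[0] * cells for _ in range(cells)]
    for bx, by, dist in references:
        for i in range(cells):
            for j in range(cells):
                cx, cy = (i + 0.5) * step, (j + 0.5) * step
                # A reference votes for a cell when the cell center lies
                # within `tolerance` of the claimed distance to the beacon.
                if abs(math.hypot(cx - bx, cy - by) - dist) <= tolerance:
                    votes[i][j] += 1
    _, i, j = max((votes[i][j], i, j)
                  for i in range(cells) for j in range(cells))
    return (i + 0.5) * step, (j + 0.5) * step
```

A reference that lies about its position or distance votes for the wrong cells, so it is simply outvoted by the benign references.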
2.2.3 Secure Aggregation in WSNs
Aggregation has become one of the essential operations for a WSN to
save energy. One example of an aggregation tree is
illustrated in Figure 4. Nodes A, B, ..., N denote different sensor
nodes in the WSN, and f denotes an aggregation function
(average, sum, maximum, minimum, count, etc.). If node I is compromised, it
can send false reports to node J. However, many existing schemes are
designed without enough security in mind and cannot detect the above
malicious behavior. Preventing this malicious behavior is the secure
aggregation problem. Based on statistical estimation theory,
Wagner [13] proposes a theoretical framework to model and analyze the
resilient data aggregation problem. After concluding that commonly used
aggregation functions are insecure, Wagner proposed using robust
statistics for resilient aggregation. Finally, several general techniques, such as
truncation (to place upper and lower bounds on an acceptable range of a sensor
reading) and trimming (for instance, to ignore the highest 5 percent and
the lowest 5 percent of sensor readings), are used to help
improve the resilience of aggregation functions. Combining prevention-based
and detection-based approaches, Yang et al. [14] propose the Secure
Hop-by-Hop Data Aggregation Protocol (SDAP) for WSNs. The
design of SDAP is based on divide-and-conquer and commit-and-attest
principles. Specifically, a probabilistic grouping method is used to
dynamically divide nodes into multiple logical groups of comparable sizes. In
each logical group, a hop-by-hop aggregation is performed and one
aggregate is generated from each group. This hop-by-hop aggregation is
enhanced to ensure that each group cannot deny its committed aggregate.
After receiving all the group aggregates, the base station can apply
an approach based on the Grubbs test to identify
suspicious groups. This approach can help identify outliers among the received
aggregates. Finally, each group under suspicion must participate in the
attestation process and prove the correctness of its group aggregate. After the
attestation process, the base station calculates the final aggregate
over all the group aggregates that are either normal or have passed the
attestation process.
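Wagner's truncation and trimming techniques described above admit a compact sketch; the clamp bounds and the 5 percent trim fraction below are only the illustrative values mentioned in the text.

```python
def truncate(readings, lo, hi):
    """Truncation: clamp each sensor reading into the acceptable range [lo, hi]."""
    return [min(max(r, lo), hi) for r in readings]

def trimmed_mean(readings, trim=0.05):
    """Trimming: drop the highest and lowest `trim` fraction of readings,
    then aggregate (here, average) what remains."""
    s = sorted(readings)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k] if k else s
    return sum(kept) / len(kept)
```

A single compromised node reporting an extreme value can shift an ordinary average arbitrarily far, but it is simply discarded by the trimmed mean.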
Motivated by research in computer vision and automated cartography, Buttyán et al.
[15] propose a RANdom SAmple Consensus (RANSAC) paradigm for resilient
aggregation in a WSN. RANSAC is an outlier elimination
technique that can handle a high percentage of outliers in the measurement data.
Specifically, RANSAC uses as few non-attacked data as possible to
determine an initial model. Assuming that the non-attacked
data follow normal distributions, the RANSAC algorithm
uses maximum likelihood estimation (MLE) to estimate the parameters of
the initial model. After the initial model is determined, RANSAC tries to
enlarge the initial data set with consistent data. Outlier measurements
can then be filtered out, even if a large number of sensor nodes
are compromised.
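The RANSAC idea above can be sketched for scalar sensor readings. This is a deterministic, exhaustive toy variant of ours (it tries every minimal sample rather than random ones, and fits a sample mean instead of an MLE model); the sample size and tolerance are illustrative.

```python
from itertools import combinations

def consensus_aggregate(readings, sample_size=3, tolerance=1.0):
    """Fit an initial model (the mean) to each minimal sample, keep the
    largest set of readings consistent with it, then refit on that set."""
    best = []
    for sample in combinations(readings, sample_size):
        model = sum(sample) / sample_size
        inliers = [r for r in readings if abs(r - model) <= tolerance]
        if len(inliers) > len(best):
            best = inliers            # enlarged consistent data set
    return sum(best) / len(best)      # final model from the consensus set
```

Even when more than a third of the readings are forged, the forged cluster cannot outgrow the consensus set formed by the benign readings.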
Figure 4. An example aggregation tree in WSNs. Legend: wireless sensor nodes (A, B, ..., N), data transmissions toward the base station, aggregation function f, and sensor measurements v_i.
2.2.4 Future Research Directions
In this section, we discuss future research directions in constructing IDSs
for both MANETs and WSNs. From the system point of view, IDS research for both
MANETs and WSNs requires a distributed architecture and the collaboration
of a group of nodes to make accurate decisions. ID techniques also need
to be integrated with existing MANET and WSN applications. This requires
an understanding of the deployed applications and related attacks
in order to deploy suitable ID mechanisms. Attack models must be carefully
established to facilitate the deployment of ID schemes. Also, solutions must
consider resource constraints in terms of computation, energy,
communication, and memory. This is especially important in the context
of a WSN.
2.2.5 Extended Kalman Filter-Based Secure Aggregation for a WSN
In this section, we use the secure in-network aggregation problem in a
WSN as an example of how to devise a lightweight ID mechanism [16].
In a WSN, consecutive observations of sensor nodes usually are
highly correlated in the time domain. This correlation, together with the
collaborative nature of WSNs, makes it possible to predict future observed
values based on previous values. Therefore, it is a viable approach to estimate
aggregated in-network values based on the normal profiles that can be
constructed. In practice, however, because of high packet-loss rates, harsh
environments, sensing uncertainty, and other problems, it is challenging to provide
an accurate estimate of the actual aggregated value. Also, the lack
of time synchronization between child and parent nodes may cause
aggregation nodes to use different sets of values for aggregation. The
complexity of existing aggregation protocols also contributes to the
challenges of modeling in-network aggregated values. To construct normal
profiles for aggregated in-network values in the face of the previously
mentioned challenges, solutions based on statistical estimation theory can be
applied. Suitable models must consider the required service and
the application environment. For instance, suppose that we
are interested in estimating temperature values, which are scalar
variables. We may adopt an Extended Kalman Filter
(EKF), because an EKF can provide an accurate
and lightweight estimation [16]. By enabling neighbor-monitoring
mechanisms, each node can use an EKF to monitor the behavior
of one of its neighbors. An EKF-based mechanism is suitable
for WSN nodes, because this mechanism can address the incurred
uncertainties in a lightweight manner and compute relatively accurate
estimates of the aggregated values, based on which a normal range can be
approximated. Utilizing a threshold-based mechanism, a promiscuously
overheard value is then compared with a locally computed normal
range to decide whether they are significantly different.
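For a scalar, slowly varying aggregate, the EKF in [16] reduces to a one-dimensional linear Kalman filter, which is enough to sketch the predict/compare/update loop described above; the noise variances and the three-sigma threshold are our illustrative choices, not values from the paper.

```python
class NeighborMonitor:
    """Tracks one neighbor's reported aggregate with a 1-D Kalman filter and
    flags overheard values outside the locally computed normal range."""
    def __init__(self, x0, p0=1.0, q=0.01, r=0.5, k_sigma=3.0):
        self.x, self.p = x0, p0      # state estimate and its variance
        self.q, self.r = q, r        # process / measurement noise variances
        self.k_sigma = k_sigma       # normal-range width in standard deviations

    def check(self, overheard):
        self.p += self.q                        # predict (value assumed ~constant)
        sigma = (self.p + self.r) ** 0.5        # innovation standard deviation
        anomalous = abs(overheard - self.x) > self.k_sigma * sigma
        if not anomalous:                       # update only on plausible values
            k = self.p / (self.p + self.r)      # Kalman gain
            self.x += k * (overheard - self.x)
            self.p *= 1 - k
        return anomalous
```

Values that fall inside the normal range refine the estimate; values far outside it are flagged without corrupting the filter state.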
Furthermore, the monitored environment demonstrates spatial and temporal
characteristics. Therefore, it is promising to integrate these characteristics into
ID model construction. For example, there are existing works that model the
spatial and temporal properties of correlated data in a WSN. It is,
therefore, desirable to integrate these models into the construction of
normal profiles for in-network aggregated values. In this way,
an anomaly-based ID service can be provided for secure aggregation
in a WSN. A WSN often is deployed to monitor emergency phenomena
(such as the outbreak of a forest fire), for which good nodes may
trigger necessary events and generate unusual yet important data.
Node collaboration is essential for sensor networks to make accurate decisions
about abnormal events. Therefore, for WSNs, intrusion detection modules
(IDM) and system monitoring modules (SMM) must integrate with each other to
work effectively [16]. When node A raises an alert on node B because
of an event E, to decide whether E is malicious or an emergency,
node A may initiate a further investigation of E by collaborating with
existing SMMs. WSNs usually are densely deployed to collaboratively
monitor events. To save energy, some sensor nodes are
periodically scheduled to sleep. Based on this, node A can wake up the sensor
nodes around node B (denoted as co-detectors in Figure 5) and request from these
nodes their opinions on the behavior of node B regarding event E.
Figure 5. Collaboration between IDM and SMM to differentiate malicious events from emergency events. The figure shows a base station, normal nodes, co-detectors, and compromised nodes sending false reports and alert transmissions around a fire.
After node A collects the information from these nodes, if it finds that the
majority of sensor nodes believe that event E may happen, node A decides
that E is triggered by some emergency event. On the other hand, if
node A finds that the majority of sensor nodes believe that event E should not
happen, then node A concludes that E is triggered by either a malicious node or a
faulty yet benign node. To make a final judgment, node A can further wake
up the nodes around event E and request their opinions regarding event E. If
node A finds that the majority of sensor nodes believe that event E should not
happen, node A then suspects that node B is malicious.
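The two-round majority decision described above condenses into a short sketch; the labels and the strict-majority rule are our illustrative choices.

```python
def classify_event(opinions):
    """opinions: booleans from co-detectors, True meaning 'event E could
    plausibly happen here'. A strict majority decides emergency vs. suspect."""
    return "emergency" if sum(opinions) > len(opinions) / 2 else "suspicious"

def final_judgment(first_round, second_round):
    """Node B is suspected malicious only if both rounds of co-detector
    opinions say the event should not happen."""
    return (classify_event(first_round) == "suspicious"
            and classify_event(second_round) == "suspicious")
```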
The integration of mobility and intrusion detection is particularly important
in the context of MANETs, because most
dynamics in MANETs are caused by mobility. MANET IDSs that do not
properly consider mobility are prone to a high false positive
ratio, which renders them less effective. The link
change rate can be used to capture the impact of mobility on IDS engines.
Based on the link change rate, a properly trained normal profile is
selected adaptively at different mobility levels. Using different
mobility models, such as the random waypoint model, the random drunken model, and the
obstacle mobility model, the adaptive scheme is demonstrated to be
less dependent on the underlying mobility models and can further reduce
the false positive ratio [16]. However, the performance of the
proposed adaptive scheme at high mobility levels is still not as good
as expected. It is also very challenging to construct mobility-independent MANET
IDSs, because this requires the extraction of mobility-independent features.
Furthermore, how to systematically test the performance of MANET IDSs
is still ongoing work.
4. CONCLUSION
Intrusion detection systems can identify malicious activities and help
provide adequate protection. Therefore, an IDS has become an
indispensable component of defense-in-depth security mechanisms for both
MANETs and WSNs. In this article, we provided an
introduction to mobile ad hoc networks and wireless sensor networks and
presented the challenges in constructing IDSs for MANETs and WSNs. We then
surveyed existing intrusion detection techniques in the context of MANETs
and WSNs. Finally, using secure in-network aggregation for WSNs and
the integration of mobility and intrusion detection for MANETs as
examples, we discussed important future research directions.
REFERENCES
[1] I. F. Akyildiz et al., "Wireless Sensor Networks: A Survey," Elsevier Comp. Networks,
vol. 38, no. 2, 2002, pp. 393–422.
[2] Y. Zhang, W. Lee, and Y. Huang, "Intrusion Detection Techniques for Mobile
Wireless Networks," ACM Wireless Networks, vol. 9, no. 5, Sept. 2003, pp. 545–556.
[3] H. Debar, M. Dacier, and A. Wespi, "A Revised Taxonomy for Intrusion Detection
Systems," Annales des Télécommunications, vol. 55, 2000, pp. 361–378.
[4] Y. Huang et al., "Cross-Feature Analysis for Detecting Ad-hoc Routing Anomalies,"
Proc. IEEE ICDCS '03, Providence, RI, May 2003, pp. 478–487.
[5] Y. Huang and W. Lee, "A Cooperative Intrusion Detection System for Ad Hoc
Networks," ACM SASN '03, Fairfax, VA, 2003, pp. 135–147.
[6] Y. Huang and W. Lee, "Attack Analysis and Detection for Ad Hoc Routing
Protocols," Proc. RAID '04, French Riviera, France, Sept. 2004, pp. 125–145.
[7] S. Marti et al., "Mitigating Routing Misbehavior in Mobile Ad Hoc Networks," ACM
Mobicom 2000, Boston, MA, Aug. 2000, pp. 255–265.
[8] C.-Y. Tseng et al., "A Specification-based Intrusion Detection System for AODV,"
ACM SASN '03, Fairfax, VA, 2003, pp. 125–134.
[9] B. Sun, K. Wu, and U. Pooch, "Alert Aggregation in Mobile Ad-Hoc Networks,"
ACM WiSe '03 in conjunction with ACM Mobicom '03, San Diego, CA, 2003, pp. 69–78.
[10] B. Sun et al., "Integration of Mobility and Intrusion Detection for Wireless Ad Hoc
Networks," Wiley Intl. J. Commun. Sys., vol. 20, no. 6, June 2007, pp. 695–721.
[11] W. Du, L. Fang, and P. Ning, "LAD: Localization Anomaly Detection for Wireless
Sensor Networks," J. Parallel and Distrib. Comp., vol. 66, no. 7, July 2006, pp. 874–886.
[12] D. Liu, P. Ning, and W. Du, "Attack-Resistant Location Estimation in Sensor
Networks," ACM/IEEE IPSN '05, Los Angeles, CA, Apr. 2005, pp. 99–106.
[13] D. Wagner, "Resilient Aggregation in Sensor Networks," ACM SASN '04, Washington,
DC, 2004, pp. 78–87.
[14] Y. Yang et al., "SDAP: A Secure Hop-by-Hop Data Aggregation Protocol for Sensor
Networks," ACM Mobihoc '06, Florence, Italy, 2006, pp. 356–367.
[15] L. Buttyán, P. Schaffer, and I. Vajda, "RANBAR: RANSAC-Based Resilient
Aggregation in Sensor Networks," ACM SASN '06, Alexandria, VA, 2006, pp. 83–90.
[16] B. Sun et al., "Integration of Secure In-Network Aggregation and System Monitoring
for Wireless Sensor Networks," IEEE ICC '07, Glasgow, U.K., June 2007.
Performance Evaluation of Sentiment Mining Classifiers on Balanced and Imbalanced Dataset
G. Vinodhini and R. M. Chandrasekaran
Department of Computer Science and Engineering,
Annamalai University, Annamalai Nagar 608002.
ABSTRACT
The transition from Web 2.0 to Web 3.0 has resulted in the dissemination of social
communication without limits in space and time. Sentiment analysis has really come into its own
in the past couple of years. It has been a part of text mining technology for some time, but with the
rise in social media popularity, the amount of unstructured textual data that can be used as a
machine learning data source is enormous. Marketers use this data as an intelligent indicator of
customer preferences. This paper aims to evaluate the performance of sentiment mining classifiers
on unbalanced and balanced large data sets for three different products. The
classifiers used for sentiment mining in this paper are Support Vector Machine (SVM), Naïve
Bayes and C5. The results show that the performance of the classifiers depends on the class
distribution in the dataset. Also, balanced data sets achieve better results than unbalanced datasets
in terms of overall misclassification rate.
KEYWORDS
Sentiment, opinion, SVM, classifiers, balanced, imbalanced.
1. INTRODUCTION
Sentiment analysis is a part of text mining technology, but with the rise in social
media popularity, the amount of unstructured textual data that can be used as a
machine learning data source, is enormous. Sentiment analysis is the task of understanding
the meanings and feelings behind statements made in social media and other
forums (Pang & Lee, 2004; Kunpeng Zhang et al., 2010; X. Fu et al.,
2013). Public opinions and sentiments can have a major impact on our society.
They can affect the sales of products, the change of government policy, and even
people's votes in elections. Thus, it is of great significance to study sentiment
analysis, also known as opinion mining. In the age of the Web, more and more
people choose to express their opinions on a wide range of topics on the Web in
the forms of blogs, product/service reviews, and comments (A. Balahur et al,
2012). The amount of data exchanged over social media has witnessed major
growth in the last few years. Opinion mining at both the document level and
sentence level has been too coarse to determine precisely what users like or
dislike (Turney, P. D., 2002). In order to address this problem, sentiment mining at
the attribute level is aimed in this work at extracting opinions on products' specific attributes
from reviews (Magdalini et al., 2012). Various studies in different
domains investigated extracting sentiment information from this exchanged data.
Less attention has been directed toward studying the effect of the class imbalance
problem in sentiment mining. In recent years, the class imbalance problem has emerged as one
of the challenges in the data mining community. This situation is significant since it is
present in many real-world classification problems.
Previous studies have used a balanced dataset; however, in the product domain it is
commonly the case that the ratio of positive to negative reviews is unbalanced.
This paper therefore focuses on investigating the effects of the size and ratio of
a dataset. The proposed system architecture takes customer reviews as input to
each of the classifiers and outputs the dataset split into positive and negative
reviews.
In this work, we analyze the performance of three different classifiers, SVM,
Naïve Bayes (NB) and C5, for sentiment mining. The classification model uses
product attributes as features. The models are empirically validated using review
data sets of Nokia, iPod and Nikon camera products. To analyse the effect of class
distribution, two data models are developed: Model A uses a balanced class
distribution, i.e., an equal number of positive and negative classes; Model B uses an
unbalanced class distribution, i.e., an unequal number of positive and negative classes.
The results of the three classifiers are compared for both Model A and Model B.
This paper is outlined as follows. Section 2 discusses the related work.
Section 3 describes the proposed method. The various classification methods
used to model the prediction system are introduced in Section 4. The
experimental analysis is reported in Section 5. Section 6 summarizes the
results and Section 7 concludes our work.
2. RELATED WORK
The area of sentiment mining has seen a large increase in academic interest in the
last few years. Researchers in the areas of natural language processing, data
mining, machine learning, and others have tested a variety of methods of
automating the sentiment analysis process. A number of machine learning
techniques have been adopted to classify the reviews based on sentiment. Various
machine learning methods, such as Support Vector Machines (SVM), Naïve Bayes
(NB), Maximum Entropy (ME), K-Nearest Neighbour, ID3, C5 and centroid
classifiers, have already been applied to sentiment classification
(Songho Tan et al., 2008; Qingliang et al., 2009; Rui Xia et al., 2011; Hassan Saif
et al., 2012). Various comparative studies have been done to find the best choice
of machine learning method for sentiment classification. As the result of a
sentiment analysis varies according to the composition method of a domain and
feature and the type of learning algorithm, a need to perform comparative analysis
arises.
In addition to using various single classifiers, much work has been done in recent
years focusing on combinations of classifiers, such as hybrid and ensemble
methods, to improve classification accuracy (Rudy Prabowo et al., 2009;
Whitehead et al., 2008). From the literature review, it is also observed that
only very few studies have been conducted analysing the performance of
classifiers under class-imbalanced conditions. Most of the existing works are based on
product review datasets because a review usually focuses on a specific product
and contains little irrelevant information. These datasets have an even number of
positive and negative reviews; however, in the product domain it is typical that
there are substantially more positive reviews than negative reviews. Our
work will therefore compare the effects of balanced and unbalanced datasets.
The main objective of the work is to perform feature based sentiment mining to
decide whether the opinions are positive or negative. Moreover, the main focus is
on evaluating the performance of various classifiers in two different data
distributions, i.e., class balanced and class imbalanced.
3. METHOD
The following list describes the methodology of the proposed work:
i. Identify the data sources.
ii. Create two datasets, i.e., balanced and unbalanced, for each product.
iii. Preprocess the data to remove noise and redundancy.
iv. Identify the features for creating a word vector model.
v. Develop two word vector models:
a. Model A using balanced dataset with term presence method.
b. Model B using unbalanced dataset with term presence method.
vi. Develop the classification models
a. Nave Bayes
b. Support Vector machine
c. C5
vii. Predict the result for classification and compare with the actual results.
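Step (v) above, building a term-presence word vector model, can be sketched as follows; the attribute vocabulary and reviews are toy examples of ours, not data from the paper.

```python
def term_presence_vectors(reviews, vocabulary):
    """One binary feature vector per review: 1 if the attribute term is
    present in the review, 0 otherwise (term presence, not term frequency)."""
    vectors = []
    for review in reviews:
        words = set(review.lower().split())
        vectors.append([1 if term in words else 0 for term in vocabulary])
    return vectors
```

These vectors then serve as input to each of the three classifiers in step (vi).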
4. CLASSIFICATION METHODS
The following section describes the various classification methods used in
this work. Much of the literature indicates that SVM, Naïve Bayes and C5 are
well-suited methods for sentiment classification.
Naïve Bayes Classifier
Bayesian learning algorithms use probability theory as an approach to concept
classification. Bayesian classifiers produce probabilities for class assignments,
rather than a single definite classification. The Naïve Bayes classifier (NBC) is
perhaps the simplest and most widely studied probabilistic learning method. It
learns from the training data the conditional probability of each attribute Ai
given the class label C. The major assumption is that all attributes Ai are
independent given the value of the class C. Classification is therefore done by
applying Bayes' rule to compute the probability of C and then predicting the class
with the highest posterior probability. The assumption of conditional
independence of a collection of random attributes is very critical.
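The description above can be made concrete with a minimal Naïve Bayes sketch for binary term-presence features; the add-one (Laplace) smoothing and the helper names are our choices, not the paper's implementation.

```python
import math

def train_nb(X, y):
    """X: binary feature vectors; y: class labels. Returns class priors and
    smoothed conditional probabilities P(attribute = 1 | class)."""
    priors, cond = {}, {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        cond[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)  # Laplace
                   for j in range(len(X[0]))]
    return priors, cond

def predict_nb(priors, cond, x):
    """Apply Bayes' rule in log space and pick the class with the highest
    posterior probability (independence turns the product into a sum)."""
    def log_posterior(c):
        lp = math.log(priors[c])
        for j, v in enumerate(x):
            lp += math.log(cond[c][j] if v else 1 - cond[c][j])
        return lp
    return max(priors, key=log_posterior)
```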
Support Vector Machines
Support Vector Machines (SVMs) are pattern classifiers that can be expressed in
the form of hyperplanes that discriminate positive instances from negative
instances. SVMs have been applied successfully to numerous tasks, including
classification. They perform structural risk minimization and identify key
"support vectors". Risk minimization measures the expected error on an
arbitrarily large test set with the given training set, and SVMs non-linearly map
their n-dimensional input space into a high-dimensional feature space, in which a
non-linear classifier is constructed. Given a set of
points that belong to either of two classes, a linear SVM finds the hyperplane
leaving the largest possible fraction of points of the same class on the same side,
while maximizing the distance of either class from the hyperplane. The
hyperplane is determined by a subset of the points of the two classes, named support
vectors, and has a number of interesting theoretical properties.
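A linear SVM of the kind described above can be sketched with sub-gradient descent on the regularized hinge loss (a Pegasos-style simplification of ours; real implementations solve the dual problem or use kernels). Labels must be +1/-1, and the learning rate, regularization weight, and epoch count are illustrative.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2*||w||^2 + hinge loss by per-example sub-gradient steps."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                 # inside the margin: hinge is active
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:                          # only the regularizer shrinks w
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def svm_predict(w, b, x):
    """Classify by which side of the hyperplane w.x + b = 0 the point lies on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

The regularizer keeps the weight vector small, which is what pushes the separating hyperplane toward the maximal margin.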
C5
C5 is one of the simplest supervised learning algorithms. It has been
used extensively in areas such as statistics and machine learning for the
purposes of classification and prediction. C5 classifiers can generalize beyond
the training sample, so that unseen samples can be classified with as high
accuracy as possible. C5 trees are non-parametric and a useful means of representing
5. CONCLUSION
The major contribution of this paper has been the application of three different
machine learning algorithms to predict the sentiment orientation of review
sentences and to evaluate the effect of class distribution on classifier
performance. Three product review datasets were used for this task. The results
suggest that machine learning algorithms can be applied successfully to sentiment
mining under a balanced distribution of classes. Although all classifiers perform
better under a balanced distribution, among the three classifiers (C5, NB and
SVM), SVM performs best under both balanced and imbalanced conditions. While this
research continues, practitioners and researchers may apply various
under-sampling and over-sampling methods to construct a balanced model from an
imbalanced one. We plan to replicate our study to build models based on hybrid
machine learning algorithms under imbalanced data conditions.
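The over-sampling mentioned above can be as simple as replicating minority-class examples until the classes balance. A minimal sketch of random over-sampling (one of many possible schemes; the function name is ours):

```python
import random

def random_oversample(samples, labels, seed=0):
    """Replicate minority-class items until all classes have equal counts."""
    rng = random.Random(seed)
    by_class = {}
    for s, l in zip(samples, labels):
        by_class.setdefault(l, []).append(s)
    target = max(len(v) for v in by_class.values())
    out_s, out_l = [], []
    for l, items in by_class.items():
        # Draw extra copies at random from the existing items of this class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extra:
            out_s.append(s)
            out_l.append(l)
    return out_s, out_l

X, y = random_oversample(["good", "fine", "bad"], ["pos", "pos", "neg"])
print(y.count("pos"), y.count("neg"))  # 2 2
```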
Xiaofei Zhou
School of Communication and Information Engineering
Shanghai University, Shanghai, China
ABSTRACT
A novel approach to demosaicing and super-resolution for Color Filter Array (CFA)
images, based on residual image reconstruction and sparse representation, is
proposed. Given an intermediate image produced by some demosaicing and
super-resolution method, the residual image between the final reconstruction and
the intermediate image is reconstructed using sparse representation, so that
richer edges and details appear in the final reconstruction. Specifically, a
generic dictionary is learned from a large set of composite training data
consisting of intermediate data and residual data; the learned dictionary implies
a mapping between the two. A dictionary adapted to the input CFA image is then
learned. Using the adaptive dictionary, the sparse coefficients of the
intermediate data are computed and transformed to predict the residual image,
which is added back to the intermediate image to obtain the final reconstruction.
Experimental results confirm state-of-the-art performance in terms of PSNR and
subjective visual perception.
Keywords
Demosaicing; Super-resolution; Residual image reconstruction; Sparse representation
1. INTRODUCTION
A single sensor chip with a color filter array (CFA) is used in most
resource-constrained digital image/video capture devices [1]; the most popular
Bayer pattern is illustrated in Figure 1. Often, a full-color and enlarged image
must be produced from a low-spatial-resolution CFA image. Demosaicing is executed
to obtain a full-color image and super-resolution (SR) is executed to obtain an
image of enlarged spatial resolution. Generally, two categories of schemes
achieve this goal in the literature. The first demosaics the CFA and then
super-resolves the demosaiced image [2]; obviously, any approach for
general-image SR can be used in the SR step. The second super-resolves the CFA
and then demosaics the super-resolved CFA [3, 4]; its major drawback is that good
methods for super-resolving general images are not suitable for CFA images.
Moreover, it is very difficult to design an appropriate SR solution for a complex
CFA pattern, and a different SR solution must be designed for each CFA pattern.
This comparison indicates the greater feasibility and flexibility of the former
scheme.
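For concreteness, a Bayer CFA records only one color sample per pixel. A minimal numpy sketch (illustrative only, not part of the paper's pipeline) of sampling a full-color image onto an RGGB Bayer pattern:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an H x W x 3 image onto an RGGB Bayer pattern (one value per pixel)."""
    h, w, _ = rgb.shape
    cfa = np.zeros((h, w), dtype=rgb.dtype)
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at even rows, even cols
    cfa[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at even rows, odd cols
    cfa[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at odd rows, even cols
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at odd rows, odd cols
    return cfa

img = np.random.rand(4, 4, 3)
print(bayer_mosaic(img).shape)  # (4, 4)
```

Demosaicing is the inverse problem: re-estimating the two discarded color samples at every pixel.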
While plenty of demosaicing and SR techniques have been investigated and their
combinations have obtained satisfactory results [2, 5, 6], much room for
improvement remains. For instance, as stated in [2], if SR is implemented after
demosaicing in each spectral channel individually, the color artifacts introduced
by demosaicing worsen during SR. Since the chromaticity channels are much
smoother than the intensity channel [2], performing SR separately in the
intensity and chromaticity channels gives better performance. We also study the
problem along this direction.
In this work, a full-color and enlarged image is first obtained as an
intermediate result using some demosaicing and SR method; then, relying on it, a
residual image is found by means of sparse representation to supply the edges and
details lacking in the intermediate image. Two aspects distinguish this from our
previous work [7]: first, a dictionary adapted to the current image is further
learned to improve the quality of residual image reconstruction; second, the
sparse coefficients of the intermediate data are transformed to obtain the
residual image, instead of the simple strategy of a scaled residual image. In
essence, arbitrary demosaicing and SR techniques, or their combinations, may be
used to obtain the intermediate image; nevertheless, better demosaicing and SR
techniques can be expected to achieve more satisfactory results. Hence, in this
work the intermediate image is obtained using the methods proposed in [2].
The rest of the paper is structured as follows: Section 2 presents and discusses
the proposed method in detail; Section 3 provides experimental results; Section 4
concludes the paper.
procedure is performed for each training image. Third, from the intermediate
image and the corresponding original training image (in fact from their Green
channels), numerous image patch pairs, in which the two patches have the same
size and sit at the same location, are extracted; from two such patches $p$ and
$\hat{p}$, the residual patch $p_r$ is produced as $p_r = \hat{p} - p$, and all
such pairs form the training data. The dictionary and coefficients are learned by
solving

$$\{D, S\} = \arg\min_{D,S}\; \tfrac{1}{2}\,\lVert X - DS \rVert_F^2 + \lambda \lVert S \rVert_{1,1} \quad \text{s.t. } d_j^T d_j \le 1,\; j = 1, 2, \dots, K \qquad (1)$$
where D and S are the dictionary and the representation-coefficient matrix
respectively, X is the training data matrix, K is the number of dictionary atoms
and $\lambda$ is the regularization factor. An alternating scheme is used to
solve this problem with its $l_{1,1}$-norm regularization of the coefficient
matrix: given the dictionary, the sparse coefficients are calculated, and given
the sparse coefficients, the dictionary is learned. The algorithm used to obtain
the dictionary is the one introduced in [10], and the algorithm used to obtain
the sparse coefficients of each datum is the coordinate descent of [11].
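Equation (1) is a standard sparse dictionary-learning objective. A minimal sketch using scikit-learn's alternating solver as a stand-in for the algorithms of [10] and [11] (synthetic data, not the paper's patch pairs):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Rows are training vectors (e.g. vectorized patch data); purely synthetic here.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 36))

# Alternates between sparse coding (given D) and dictionary update (given S),
# with norm-bounded atoms, in the spirit of equation (1).
dl = DictionaryLearning(n_components=48, alpha=0.5, max_iter=20, random_state=0)
S = dl.fit_transform(X)          # sparse coefficients, one row per sample
D = dl.components_               # dictionary atoms, one row per atom
print(S.shape, D.shape)          # (200, 48) (48, 36)
```

Note scikit-learn uses the convention X ≈ S·D (codes times atoms-as-rows), the transpose of the X = DS layout in equation (1).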
2.2.2 Adaptive dictionary learning
To make the generic dictionary better adapted to the content of the input CFA
image, it is further modified according to that image. Specifically, the input
CFA is demosaiced first, and the filled Green channel is used to generate
training data in the same manner as in generic dictionary learning. The learning
process is also driven by the same methods as generic dictionary learning. The
only difference lies in the dictionary initialization: random training data form
the initial dictionary for generic dictionary learning, whereas the learned
generic dictionary forms the initial dictionary for adaptive dictionary learning.
2.2.3 Separated data representation
A composite datum $x$ is represented over the learned dictionary as

$$x = Ds \quad \text{s.t. } d_j^T d_j \le 1,\; j = 1, 2, \dots, K \qquad (2)$$

Equation (2) can also be rewritten in terms of the residual part and the edge
part:

$$x_r = D_r s, \qquad x_e = D_e s \qquad (3)$$

With the edge atoms normalized, $\tilde{d}_j^e = d_j^e / \lVert d_j^e \rVert_2$,
and the coefficients rescaled, $\tilde{s}_j = s_j \lVert d_j^e \rVert_2$, we have
$D_e s = \tilde{D}_e \tilde{s}$ with
$(\tilde{d}_j^e)^T \tilde{d}_j^e = 1,\; j = 1, 2, \dots, K$.

Given the edge component $y_e$ of an intermediate patch, the sparse coefficients
are found by

$$\tilde{s} = \arg\min_s\; \tfrac{1}{2}\,\lVert \tilde{D}_e s - y_e \rVert_2^2 + \lambda \lVert s \rVert_1 \qquad (4)$$

Similar to the problem contained in (1), this convex optimization regularized by
the $l_1$-norm is solved by the coordinate descent algorithm [11]. Since the
solution $\tilde{s}$ plays the same role as $\tilde{s}$ in equation (3), the
residual patch can be predicted as

$$y_r = D_r \left[ \tilde{s}_1 / \lVert d_1^e \rVert_2,\; \tilde{s}_2 / \lVert d_2^e \rVert_2,\; \dots,\; \tilde{s}_K / \lVert d_K^e \rVert_2 \right]^T \qquad (5)$$

All predicted residual patches are assembled into the residual Green channel
$G_r$, which is added back into the intermediate Green channel:

$$\hat{G} = G + G_r \qquad (6)$$

The Red and Blue channels are then restored from the color-difference channels:

$$\hat{R} = \hat{G} + (R - G), \qquad \hat{B} = \hat{G} + (B - G) \qquad (7)$$
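The prediction step amounts to l1-regularized coding against the edge sub-dictionary followed by reuse of the coefficients with the residual sub-dictionary. A minimal sketch with random stand-in dictionaries (the learned ones are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
K = 48
D_e = rng.standard_normal((144, K))     # edge part of the dictionary
D_e /= np.linalg.norm(D_e, axis=0)      # unit-norm edge atoms
D_r = rng.standard_normal((36, K))      # residual part of the dictionary
y_e = rng.standard_normal(144)          # edge component of one patch

# Eq. (4): l1-regularized fit of y_e by the edge atoms,
# solved internally by coordinate descent.
lasso = Lasso(alpha=0.05, max_iter=5000).fit(D_e, y_e)
s = lasso.coef_

# Eq. (5): reuse the sparse coefficients with the residual atoms.
# With unit-norm edge atoms the per-atom rescaling is trivial.
y_r = D_r @ s
print(y_r.shape)  # (36,)
```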
3. EXPERIMENTAL RESULTS
3.1. Training set and parameter setting
The image set provided by [9] is selected as the training set. A patch size of
6×6 is used, so the dimension of the edge component is 36×4 = 144 and the
dimension of the residual component is 36; thus the total dimension of a
composite dictionary atom is 180. The number of dictionary atoms is set to 1024
and the regularization factor to 0.5. To allow a comparison with [2], an
enlarging factor of 2 is tested.
3.2. Testing images and parameter setting
The Kodak database is chosen as the testing image set, see Figure 3. The overlap
between adjacent patches is 2 pixels.
Figure 2. First- and second-order gradient filters (e.g. [−1, 0, 1] and its
transpose) used to extract the edge component.
Figure 3. Twenty-four testing images from Kodak PhotoCD (referred to as image 1 to image
24, enumerated from left to right and top to bottom).
2.3 The residual Green channel is added back into the intermediate Green
channel, and the final Green channel is obtained accordingly.
3. Output: a high-spatial-resolution, full-color image, obtained by converting
the final Green channel and the R−G and B−G difference channels back into RGB.
Figure 4-1, Figure 4-2 and Figure 4-3, each comprising four panels (a)-(d).
Figure 4. Comparison of two methods for the purpose of subjective visual perception
4. CONCLUSION
In this paper, a novel scheme for demosaicing and SR of CFA images via residual
image reconstruction and sparse representation is presented. Using a training
image set, a mapping between the edge of the filled and super-resolved Green
channel and the corresponding residual image is obtained by dictionary learning.
Given an intermediate Green channel, its edges are extracted and sparse
coefficients are found using the edge part of the dictionary. The transformed
sparse coefficients then linearly combine the residual part of the dictionary to
generate the residual image. Finally, the residual image is added back into the
intermediate Green channel to produce the final reconstructed Green channel; the
intermediate R−G and B−G channels are retained. The proposed scheme is capable of
improving the reconstruction quality of arbitrary demosaicing and SR methods. The
experimental results demonstrate state-of-the-art performance in both PSNR and
subjective visual perception.
5. ACKNOWLEDGMENTS
The authors are grateful to Professor Shuozhong Wang for his assistance in
improving the language usage.
REFERENCES
[1] R. Lukac, K.N. Plataniotis. Color filter arrays: design and performance analysis, IEEE
Transactions on Consumer electronics, 51 (4) (2005) 12601267.
[2] L. Zhang, David Zhang. A joint demosaicingzooming scheme for single chip digital
color cameras, Computer Vision and Image Understanding 107 (2007) 1425.
[3] K.-L. Chung et al. New joint demosaicing and zooming algorithm for color filter array,
IEEE Transactions on Consumer electronics, 55 (3) (2009) 14771486.
[4] R. Lukac, K.N. Plataniotis, D. Hatzinakos. Color image zooming on Bayer pattern,
IEEE Transactions on Circuits and Systems for Video Technology 15 (2005)
14751492.
[5] X.Li, Bahadir Gunturk, L.Zhang. Image demosaicing: a systematic survey,
Proceedings of the SPIE, Vol 6822(2008) 68221J-68221J-15.
[6] G. Cristobal et al. Superresolution imaging: a survey of current techniques.
Proceedings of the SPIE, Vol 7074(2008) 70740C.
[7] G.Sun, Y.Chen, Z.Shen.Demosaicking and zooming for Color Filter Array via residual
image reconstruction. Proceedings of the Second International Conference on Internet
Multimedia Computing and Service, 2010, 139142.
[8] Michael Elad, Mario A.T. Figueiredo. On the role of sparse and redundant
representations in image processing, Invited Paper, Proceedings of the IEEE Special
Issue on Applications of Sparse Representation and Compressive Sensing,
2010,972982.
[9] J.Yang, J.Wright, T.Huang, Y.Ma. Image super-resolution via sparse representation.
IEEE Transactions on Image Processing, 19(11) (2010) 28612873.
[10] M. Yang, L. Zhang, J. Yang and D. Zhang. Metaface learning for sparse representation
based face recognition. In ICIP, 2010.
[11] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for
generalized linear models via coordinate descent. Journal of Statistical Software, 33(1)
(2010) 122.
Mohammad Shojaee
MS degree in governmental management, Arak, Iran.
Mohammad Tavakolian
MS degree in governmental management, Arak, Iran.
Majid Assarian
MS degree in governmental management, Arak, Iran.
ABSTRACT
Owing to the considerable increase in house prices, especially in recent years,
organizations and departments are trying to meet this need, at least for their
own employees. Accordingly, many organizations have established units, or rather
companies, called housing cooperatives, which try to meet this need for their
staff. Organizations, on the other hand, analyze and evaluate their performance
continuously; but such evaluation is unreliable given the relatively high
economic turbulence of recent years, so the decline or improvement of
organizations or companies is not assessed correctly. Moreover, to be more
reliable, the evaluation must be made in comparison with competitors. In this
paper, expert opinion and prior research in this field were first used to
identify evaluation criteria; then, by establishing a communication network among
these criteria using the ANP (Analytic Network Process) model, we evaluated and
weighted them. Finally, suggestions are proposed for improving the efficiency of
the evaluation criteria and of the Mehr housing cooperatives.
Keywords
Mehr Housing Cooperative, Evaluation Criteria, Matrix of Paired
Comparisons, ANP Model.
1. INTRODUCTION
In the modern era, with considerable developments in management science, an
evaluation network is unavoidable: the lack of an evaluation network covering the
different dimensions of an organization, including the use of resources and
facilities, staff, objectives and strategies, is considered a symptom of
organizational illness.
Every organization, in order to know the average utility and quality of its
activities, especially in a complicated and dynamic environment, urgently needs
an evaluation network. The lack of a control and evaluation network in a system
means a lack of communication with the internal and external environment, the
consequences of which are obsolescence and, finally, organizational death. It is
possible that the onset of organizational death is not felt by top managers,
since it does not occur suddenly. Studies show that the lack of a feedback
network makes the reforms necessary for growth and improvement of organizational
activities impossible; the consequence of this phenomenon is organizational
death. Performance evaluation has challenged researchers and practitioners for
many years. In the past, trade organizations used financial indicators as their
performance evaluation instrument, until Kaplan and Norton, after investigating
and evaluating management systems, revealed many inefficiencies of such
information for performance evaluation, inefficiencies resulting from increased
organizational complexity, environmental mobility and market competition (Kaplan
& Norton, 1992).
The current research, using the ANP method and the approach mentioned, identifies
the functional dimensions of active housing cooperative companies in Arak city
and determines the importance of each effective factor.
2. RELATED WORKS
relevant elements within a network or sub-network. For each control criterion,
the clusters of the system and their elements are determined. All interactions
and feedbacks within clusters are called inner dependencies, whereas interactions
and feedbacks between clusters are called outer dependencies (Saaty, 1999). Inner
and outer dependencies are the best way for decision-makers to capture and
represent the concepts of influencing and being influenced, between clusters and
between elements, with respect to a specific element. Pairwise comparisons are
then made systematically, covering all combinations of element/cluster
relationships. ANP uses the same fundamental comparison scale (1-9) as the AHP.
This comparison scale enables the decision-maker to incorporate experience and
knowledge intuitively (Harker and Vargas, 1990) and to indicate how many times
one element dominates another with respect to the criterion. It is a scale of
absolute numbers (not an ordinal, interval or ratio scale). The decision-maker
can express the preference between each pair of elements verbally as equally
important, moderately more important, strongly more important, very strongly more
important, or extremely more important. These descriptive preferences are then
translated into the numerical values 1, 3, 5, 7 and 9 respectively, with 2, 4, 6
and 8 as intermediate values for comparisons between two successive judgments.
Reciprocals of these values are used for the corresponding transposed judgments.
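Because of the reciprocal property, a pairwise comparison matrix is fully determined by its upper triangle. A minimal sketch (the helper function is ours) of building such a matrix:

```python
import numpy as np

def comparison_matrix(upper):
    """Build a reciprocal pairwise comparison matrix from upper-triangle judgments.

    `upper[(i, j)]` (i < j) holds the 1-9 judgment of element i over element j;
    the transposed entry is its reciprocal and the diagonal is 1.
    """
    n = max(j for _, j in upper) + 1
    A = np.eye(n)
    for (i, j), v in upper.items():
        A[i, j] = v
        A[j, i] = 1.0 / v
    return A

# Three elements: 0 over 1 "moderately more important" (3),
# 0 over 2 "strongly more important" (5), 1 over 2 (3).
A = comparison_matrix({(0, 1): 3, (0, 2): 5, (1, 2): 3})
print(A)
```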
3. METHODOLOGY
Criteria
Cluster1: Housing criteria
A1: The amount of state funds (mortgage) allocated to each
applicant.
A2: Duration of preparing each apartment (days of the project divided by the
number of apartments).
A3: Number of apartments in each flat.
A4: Size of each apartment.
A5: Condominium rate based on square meters.
A6: Number of built blocks.
A7: Final cost of each square meter of residential apartments (total payment
divided by the size of each apartment).
Cluster2: Company criteria
B1: Number of cooperative members.
B2: Average of participation of cooperative members (number of meeting
hours during one month).
B3: Number of replaced managers during project.
B4: Number of people in reservation list of each company.
Cluster3: Member criteria
C1: First charge of each member.
C2: Monthly carrying charges (without considering mortgage).
C3: Cooperative member education (language variable).
3.3 Example of a paired comparison matrix and calculation of the consistency
ratio
The paired comparison matrix of the housing criteria (cluster 1) is:

      A1     A2     A3     A4     A5     A6     A7
A1    1      3      5      5      7      7      3
A2    0.333  1      5      3      5      5      1
A3    0.200  0.200  1      1      1      1      0.333
A4    0.200  0.333  1      1      1      3      0.333
A5    0.143  0.200  1      1      1      1      0.143
A6    0.143  0.200  1      0.333  1      1      0.143
A7    0.333  1      3      3      7      7      1

The weight of each criterion is the geometric mean of its row, for example:
W7 = (0.333 × 1 × 3 × 3 × 7 × 7 × 1)^(1/7) = 2.040
and likewise W1 = 3.780, W2 = 1.993, W3 = 0.540, W4 = 0.679, W6 = 0.390. The sum
of all seven weights is 9.877.

The normalized weights are WNi = Wi / 9.877:
WN1 = 3.780 / 9.877 = 0.383, WN2 = 1.993 / 9.877 = 0.202,
WN3 = 0.540 / 9.877 = 0.055, WN4 = 0.679 / 9.877 = 0.069, ...,
WN7 = 2.040 / 9.877 = 0.207.

To check consistency, each column sum of the matrix is multiplied by the
corresponding normalized weight and the products are added (column 1 sums to
2.352, column 2 to 5.933, and so on):
λmax = 2.352 × 0.383 + 5.933 × 0.202 + ...
     = 0.900 + 1.197 + 0.929 + 0.986 + 1.061 + 0.986 + 1.229 = 7.288
C.I. (Consistency Index) = (λmax − n) / (n − 1) = (7.288 − 7) / 6 = 0.048
C.R. (Consistency Ratio) = C.I. / R.I. = 0.048 / 1.32 = 0.036
Since 0.036 is below the consistency threshold of 0.1, the matrix is acceptably
consistent.
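The weight and consistency computation above can be reproduced in a few lines (geometric-mean row weights and Saaty's consistency ratio; the random index R.I. = 1.32 for n = 7 is taken from Saaty's table):

```python
import numpy as np

A = np.array([
    [1,     3,     5, 5,     7, 7, 3],
    [0.333, 1,     5, 3,     5, 5, 1],
    [0.200, 0.200, 1, 1,     1, 1, 0.333],
    [0.200, 0.333, 1, 1,     1, 3, 0.333],
    [0.143, 0.200, 1, 1,     1, 1, 0.143],
    [0.143, 0.200, 1, 0.333, 1, 1, 0.143],
    [0.333, 1,     3, 3,     7, 7, 1],
])

n = A.shape[0]
w = A.prod(axis=1) ** (1 / n)        # geometric mean of each row
wn = w / w.sum()                     # normalized weights WN1 ... WN7

lam_max = (A.sum(axis=0) * wn).sum() # column sums weighted by WN
ci = (lam_max - n) / (n - 1)         # consistency index
cr = ci / 1.32                       # random index R.I. = 1.32 for n = 7
print(np.round(wn, 3), round(cr, 3))
```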
Matrix 6: The normalized supermatrix raised to the 9th power.
5. RESULTS
The criteria weights, obtained from the first column of the final supermatrix
(matrix 7), are:
Table 2: Criteria weights (based on the first column of matrix 7)

Criteria  Content                                                          Weight
A1        The amount of state funds (mortgage) allocated to each
          applicant.                                                       0.131
...
A6        Number of built blocks.                                          0.020
A7        Final cost of each square meter of residential apartments
          (total payment divided by the size of each apartment).           0.112
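The limiting criteria weights are read from a high power of the column-stochastic supermatrix, whose columns converge to a common vector. A minimal sketch with a hypothetical 3×3 supermatrix (not the study's actual matrix):

```python
import numpy as np

# Hypothetical column-stochastic supermatrix (each column sums to 1).
W = np.array([
    [0.2, 0.5, 0.3],
    [0.5, 0.2, 0.4],
    [0.3, 0.3, 0.3],
])

L = np.linalg.matrix_power(W, 9)   # powered supermatrix, as in matrix 6
print(np.round(L[:, 0], 3))        # limiting weights, read off the first column
```

By the 9th power the columns have essentially converged, which is why the first column alone yields the criteria weights.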
6. CONCLUSIONS
Other researchers who want to use this method are advised also to apply data
mining methods and meta-heuristic algorithms in order to predict the future
criteria that may affect cooperative housing projects, and to weight those
criteria; they can then compare their results with ours, obtain more reliable
results, and use them to prevent unexpected problems. Having completed this
research and obtained the final list of Mehr housing cooperative criteria and
their ranking in Arak city, as described above, the factors involved in the
success or failure of a number of these cooperative housing companies, based on
the criteria identified in this research, are known, so that other cooperative
housing companies can, on these principles, try to remedy their disadvantages and
reinforce their advantages in order to show higher efficiency in the future. One
prerequisite for a successful cooperative is that members and directors receive
adequate training. The process of developing and operating a cooperative can be
complex. Finance (annual audits, monthly financial statements, finance mechanisms
for housing), management (parliamentary procedure, personnel matters) and the
philosophies of cooperation are but a few areas in which members should have some
knowledge. Training programs must also make members aware of their rights,
responsibilities and obligations within the cooperative organization, its bylaws
and its house policies. The involvement of members does not end with the
development process: members have both a right and a responsibility to be
informed about, and involved in, the operation of their cooperative. Although
directors have the authority to make many decisions on behalf of the members who
elected them, they should not act autonomously; directors should work with
members to develop a consensus, or vision, on how the cooperative is run.
Once the cooperative is fully occupied and operational, it must begin
accumulating sufficient reserves to take care of contingencies. Unexpected
breakdown of equipment, uninsured property losses or a sudden increase in the
property tax bill all lead to expenses for which cash reserves are needed. Sound
financial planning calls for adequate financial reserves to be built up year by
year, so that as a building's plumbing, roof or other systems wear out, the
cooperative can afford to replace them.
A budget is a plan for the cooperative's expected resources and expenditures over
a given period. Operating budgets are usually developed for a one-year period,
while capital budgets are more long-range. A housing cooperative's budget is
developed by its treasurer, the finance committee, the board and the manager, and
sometimes the entire membership. Approval of a cooperative's annual budget
usually rests with the board of directors, although in some cooperatives members
may approve the budget based on a recommendation from the board.
While many issues surface in managing cooperative housing, some may be recurring.
To save time and promote consistency, clear policies should be developed on how
to deal with these matters. While some of these rules may be included in the
bylaws, they will usually appear as policies in the house policy manual. The
purpose of house rules and policies is not to put unnecessary burdens on
individual members. Although the cooperative may decide which issues to include
in its house policies, several important areas should be covered either in the
bylaws or in the house policies.
REFERENCES
[1] Saaty, Thomas L. (2005). Theory and Applications of the Analytic Network Process:
Decision Making with Benefits, Opportunities, Costs and Risks. Pittsburgh,
Pennsylvania: RWS Publications. ISBN 1-888603-06-2.
[2] Saaty, Thomas L.; Luis G. Vargas (2006). Decision Making with the Analytic Network
Process: Economic, Political, Social and Technological Applications with Benefits,
Opportunities, Costs and Risks. New York: Springer. ISBN 0-387-33859-4.
[3] Saaty, Thomas L.; Brady Cillo (2009). The Encyclicon, Volume 2: A Dictionary of
Complex Decisions using the Analytic Network Process. Pittsburgh, Pennsylvania:
RWS Publications. ISBN 1-888603-09-7.
[4] In 2005, one book cited examples from the United States, Brazil, Chile, Czech
Republic, Germany, India, Indonesia, Italy, Korea, Poland, Russia, Spain, Taiwan, and
Turkey.
[5] Saaty, Thomas L.; Müjgan S. Özdemir (2005). The Encyclicon: A Dictionary of
Decisions with Dependence and Feedback Based on the Analytic Network Process.
Pittsburgh, Pennsylvania: RWS Publications. ISBN 1-888603-05-4.
[6] Harker, P.T. and Vargas, L.G. (1990), Reply to remarks on the analytic
hierarchy process, Management Science, No. 36, pp. 269-273.
[7] Saaty, T.L. (1999), Fundamentals of the analytical network process, Proceedings of
ISAHP 1999, Kobe, Japan, 12-14 August, pp. 48-63.
[8] Saaty, Thomas L. (1996). Decision Making with Dependence and Feedback: The
Analytic Network Process. Pittsburgh, Pennsylvania: RWS Publications. ISBN 0-
9620317-9-8.
[9] Saaty, T.L. (2001a), Decision Making in Complex Environments: The Analytic
Network Process for Decision Making with Dependence and Feedback, RWS
Publications, Pittsburgh, PA.
[10] No. 3870 | Domestic Economy | Page 4. Irandaily. Retrieved on 2012-07-16.
[11] http://www.turquoisepartners.com/iraninvestment/IIM-Aug10.pdf
[12] Presstv.com. Retrieved on 2012-07-16.
[13] "TPC-D Frequently Asked Questions (FAQ)". Transaction Processing Performance
Council. Retrieved 9 January 2012.
[14] Mitchell, Douglas W. (2004). "More on spreads and non-arithmetic means".
The Mathematical Gazette 88: 142-144.
[15] Hwang, C. L., & Yoon, K. (1981). Multiple Attribute Decision Making. Berlin:
Springer-Verlag.http://dx.doi.org/10.1007/978-3-642-48318-9.
[16] Yoon, K. P., & Hwang, C. L. (1995). Multiple Attribute Decision Making: An
Introduction. London: Sage Pub.
[17] Hazewinkel, Michiel, ed. (2001), "Linear algebra software packages", Encyclopedia of
Mathematics, Springer, ISBN 978-1-55608-010-4.
Lukasz Ostrowski
Dublin City University
Glasnevin, Dublin 9, Ireland
Markus Helfert
Dublin City University
Glasnevin, Dublin 9, Ireland
ABSTRACT
Current challenges in design science research call for consistent and detailed
phases to guide design science researchers in managing projects in the
information systems field. Taking up this challenge, we present a reference
model that serves as the foundation for structuring information in the
construction of business process model artefacts in design science research. It
contains activities covering literature review, collaboration with
practitioners, and information modelling. In this paper we demonstrate the
collaboration-with-practitioners facet of the model, to answer the question of
how to construct a business process model artefact together with practitioners
from the field. The contribution of the paper is that applying the
collaboration-with-practitioners activities in the context of design science
supports the quality of design science artefacts and provides design science
researchers with a choice of techniques.
Keywords
Design Science, Collaboration, Business Processes.
1. INTRODUCTION
Design Science (DS) research methodology has received increased attention in
computing and information systems (IS) research [1]. It has become an accepted
approach for research in the IS discipline, with dramatic growth in related
literature [2]. However, in its current state it does not offer the consistent
and comprehensive phases that would guide researchers in their choice of
techniques [3]. Thus, in this paper we refer to the reference model [4] (also
known as the process-oriented reference model), which aims at techniques for
meta-design artefacts. We discuss and present its modelling step in the context
of business process model artefacts.
This paper is organized as follows. The next section reviews the design science
research literature and proposes its challenges and potential directions for
further development. Based on that review, the subsequent sections present the
reference model, which covers the phases of the meta-design step in DS. We then
elaborate in depth on, and demonstrate, one of its phases, the
collaboration-with-practitioners activities, in the context of process-oriented
artefacts. Next, we evaluate the activities by means of the Satisfaction
Attainment Theory (SAT) [5] and the elaborated solutions. This paper helps define
future directions and phases of design science methodology within the full
spectrum of information systems research approaches.
2. DESIGN SCIENCE
Design science focuses on the creation of artificial solutions. It addresses
research through the building and evaluation of artefacts designed to meet
identified business needs [6]. Understanding the nature and causes of these needs
can be a great help in designing solutions [7]. The literature reflects a healthy
discussion around the balance of rigor and relevance [8] in DS research, which
shows that it is a still-developing field [9].
Views and recommendations on DS methodology vary among papers, e.g. [10, 11]. The
DS methodological guidelines of the precursors Hevner [8] and Walls [12] are
seldom applied, suggesting that the existing methodology is insufficiently clear
or inadequately operationalized: still at too high a level of abstraction [11].
Descriptions of the activities (procedures, tools, techniques) needed to follow
the methodology are only briefly indicated. Taking up this challenge, three main
activities were identified as crucial in the development of DS artefacts [13]:
literature review, collaboration with practitioners, and relevant modelling
techniques [14]. The reference model [4] examines these activities in terms of
the development of meta-design artefacts [15]. For a better overview of where it
fits in design science methodology, we first introduce our understanding of the
current state of the art of DS and its artefacts.
Researchers understand artefacts as things, i.e. entities that have some separate
existence [16]. They can take the form of a construct, a model, a method or an
instantiation [8]. In the construction of an artefact, researchers have observed
two activity layers [17]: 1) design practice, which produces situational design
knowledge and concrete artefacts, and 2) meta-design, which produces abstract
design knowledge. Meta-design can be viewed as 2a) a preparatory activity before
situational design is started, 2b) a continual activity partially integrated with
the design practice, or 2c) a concluding theoretical activity summarizing,
evaluating and abstracting results for target groups outside the studied design
and use practices [17]. The meta-design step concentrates on providing an optimal
solution for the domain by trying to cover the whole spectrum; the design
practice then refers to it, adjusting and applying it to a concrete business
scenario (i.e. an instantiation).
As mentioned above, abstract and situational design knowledge can be
treated as two individual outcomes of design science. Thus, it seems
reasonable to consider a different evaluation method for each of them:
artificial and naturalistic, respectively [18]. The meta-design step plays a
crucial role in constructing the knowledge base for a final instantiation and
its utility. Figure 1 illustrates its place in design science research and the
general relationship among IS artefacts [19]. The aim of the reference model
was to detail the activities [13] carried out in that step and then use them
to guide design science researchers through it. The three main activities of
the reference model were produced by comparing multiple plausible models of
reality, which is essential for developing reliable scientific knowledge
[20].
Figure 1 The Reference model in the Design Science Research Methodology - adapted
and updated from [11]
The next sections introduce the reference model and how all its activities
cooperate to achieve a desired solution. They then elaborate and
demonstrate the collaboration-with-practitioners activities.
3. THE REFERENCE MODEL
The idea behind the reference model was to deliver the knowledge base,
which combines information from two processes: literature review and
collaboration with practitioners. Their main roles are to 1) gather
information related to the investigated domain of interest, and 2) represent
the information in an understandable way to the stakeholders. Before
analysis and combination of solutions from these sources take place, each
process provides its own solution. Thus, to make the analysis and
combination part more effective, the same modelling techniques are
introduced in both processes: ontology engineering and a domain-specific
modelling language. The former gives researchers the design rationale of a
knowledge base, a kernel conceptualization of the world of interest, and
semantic constraints of concepts together with sophisticated theories [21].
In the context of process-oriented IS solutions, the latter introduces the
Business Process Model and Notation (BPMN). For example, if a
researcher investigates a process of an employee engagement, the ontology
engineering technique will represent the gathered knowledge retrieved from
those two sources. Then, the BPMN will model it into the desired shape of a
process. Figure 2 illustrates the overview of the reference model.
Figure 2 Overview of the reference model (its activities include: define
research scope, literature review, select activities from studies, and
construct ontology)
within the context of a sequence of phases that helps a group to achieve its
goal. Collaboration engineering can be viewed as a facilitation, design, and
training approach that aims to create collaboration processes supported by
tools such as group decision support systems (GDSS) [23]. This approach
was revised and modified to the level presented in Figure 3 and
demonstrated in the following case study.
participants from five companies were involved in the focus group
collaboration. Their participation was voluntary and motivated by the
opportunity to share experience and best practices between the parties
involved. Finally, the resource analysis concerned the available time. Each
company dedicated a 90-minute slot for individual interviews on their site
and 5 hours for a group meeting. One of the companies provided software to
facilitate online meetings. In addition, mind-map software was used to take
notes and visualize insights provided by participants. The participants'
roles in their organisations were closely linked either to the facilitation
or to the execution of innovation projects.
The focus group collaboration followed the activities listed in Table 1. In
step 0, questions for the individual interviews were prepared. The questions
were split into two sections. The first section was to understand and
determine the participants' connection to the innovation process and its
measurements. Thus, the questions were formed around their organizational
units, daily activities, main responsibilities, and personal understanding of
the innovation process. The second section contained questions that allowed
further elaboration on the participants' expertise regarding the desired
process. For example, the questions of the second section concerned a formal
measurement methodology in place at a particular organizational unit, the
people involved in measuring innovation value, the milestones and activities
of measurement, as well as the metrics used. These rather general questions
were later decomposed into more detailed sub-questions as the interview
progressed.
Table 1 Activities Decomposition

Activity of Collaboration
Step 0. Questions preparation
A1 Analyse findings from the literature review, participants' profiles, and
the scope.
Step 1. Getting individual participants' perspectives
B1 Individual contextual interviews to understand participants' expertise
B2 Individual domain interviews to gain process-relevant activities from the
participants' context
B3 Transcript of the interview to summarize and authorize the information
Step 2. Initial analysis
C1 Group activities from domain interviews
Step 3. Focus group meetings
D1 Getting the participants to know each other
D2 Presenting findings from the interviews
D3 Grouping similar activities by participants
D4 Revision of all activities by participants
D5 Consolidation of the process
Step 4. Conclusion
E1 Summary of the focus group achievements in relation to the scope of the
collaboration.
In step 1, interviews with each participant of the focus group were
conducted. This phase was divided into two activities (B1-B2). First,
questions from the first section were asked to understand a participant's
expertise and perspective on the process. Here the researcher followed the
laddering interview method, and only the first section of questions was
asked. Answers were recorded and visualized on a mind map, with 40 minutes
allocated for this part. On many occasions participants had prepared
presentations prior to the interview and additional time was needed; these
presentations provided an overview of the organisation and the innovation
context they operated in. The last 50 minutes of the interview were
dedicated to the business process under investigation. As the interview
progressed, a sketch of the process was updated and displayed in the
mind-map software so that participants could verify that what they said was
interpreted correctly. For the B2 step, semi-structured interviews were
chosen. In addition, a transcript of each interview was sent for
authorization, with a request to clarify any ambiguities discovered after
the interview took place.
In step 2, all transcripts of the interviews were summarized and distributed
to all participants prior to the focus group meetings. One of the goals was
to provide all participants with the same amount of knowledge, so that more
insights could be delivered at the focus group meetings. The key finding at
this stage of the research was a clear distinction between measuring
innovation as a facilitator and as technical IT. Along with the summary of
transcripts, an overview of the agenda for the focus group meetings was
provided.
Step 3 comprises the activities of the focus group meetings, to which an
ice-breaker and focus group work methods were applied. Since some
participants could not attend in person, the meetings were carried out
through an online collaboration tool. All participants in the room had a PC
logged in to the tool, and all questions and summaries of answers were put
through it. The online tool generated reports of everything typed in, which
enhanced the analysis of the meeting at a later stage. The meeting began
with an introduction of the agenda, followed by the allocation of 5 minutes
for each participant to introduce their organisation,
roles, and relation to the innovation process. This was a simple ice-breaker
method to catch up with each other; the participants had known each other
since the focus group was established. The rest of the focus group meeting
was structured according to the focus group work method [24]. Each
participant was provided with the process of measuring innovation derived
from their interviews. Then each participant presented and described their
process model to the rest of the group, so that everyone got an overview of
possible perspectives on measuring the value of innovation projects. Anyone
was allowed to ask the presenter questions after each presentation. In
addition, each presentation was followed by 5 minutes of brainstorming, so
that additional insights (e.g. metrics, activities) could be added to the
model. Once all the business process models had been presented, a poll was
introduced. The most comprehensive process model was selected as a core to
which additional activities from the other process models were added. The
following activity required participants to work together to build the
business process model of measuring innovation value, based on the
most-voted process model and the other ones presented. The most-voted
business process model was displayed and participants could make suggestions
on what else should be added. If the majority of participants did not raise
any objections, the suggestion was added. The mind-map software was used to
move activities of the process towards the final consensus. The focus group
meeting ended after roughly 5 hours, including a 30-minute break. For step
4, a short 40-minute conclusion meeting was organized at which the business
process model for measuring innovation value was presented.
6. EVALUATION OF THE COLLABORATION
The collaboration-with-practitioners activities were evaluated from three
perspectives: perceived net goal attainment, satisfaction with the meeting
outcome, and satisfaction with the meeting process. These three perspectives
constitute the Satisfaction Attainment Theory, which was applied with the
participants who conducted these activities; they were asked to elaborate on
the business process model artefacts modelled. Participants in these
activities were stakeholders of a public organisation that provided IT
services for various departments. The nine practitioners were between 23 and
40 years of age (M = 33, SD = 2.5); the gender split was 5 males and 2
females. Their work experience in the organisation ranged from 3 to 9 years
(M = 5, SD = 1.3). They were mainly business analysts from the fields of
information systems and computer science. Participants took part in these
activities willingly, and it was therefore assumed that their responses to
the questionnaire were genuine.
Table 2 summarizes the results of the evaluation of meeting satisfaction. We
used 11-point Likert questions (11 = best), relating to each of the elements
of the Satisfaction Attainment Theory.
Table 2 Evaluation of the collaboration with practitioners activities

Dimension                                        Mean   n
Perceived Net Goal Attainment (PGA)              8.7    9
Satisfaction with the Meeting Process (SP)       9.5    9
Satisfaction with the Meeting Outcome (SO)       10.1   9
The values for the means indicate high participant satisfaction with each of
the three dimensions of the Satisfaction Attainment Theory. Each element was
measured by five questions in the questionnaire. All fifteen questions can
be found in Appendix A of [5].
Feedback received and observations made during this case study enabled a
further refinement of the reference model. Participants suggested that the
transcripts of the interviews should be in a narrative form and divided into
two documents. The first document summarizes the individual interviews and
is sent to the relevant interviewees for approval. The second sums up the
approved content and is distributed among the other participants who will
attend the focus group collaboration meetings. In terms of agenda planning,
it was observed that the time from an interview taking place to its approval
was around four elapsed weeks; this has to be taken into account when
drawing up schedules. It was challenging to keep the focus group meetings
within the time constraints. Participants occasionally chose a discussion
topic that was not strictly related to the scope of the meeting. These
situations were handled diplomatically, and the researcher's role was to
keep the allotted time in mind at all times. Finally, almost all
participants had slides already prepared prior to the interviews, so extra
time for such circumstances has to be included in the agenda of the
reference model.
The business process artefacts built with the collaboration-with-practitioners
activities of the reference model scored highly, as did the process of
executing the activities. This supports the use of the model for its main
purpose, which was to provide researchers with a structured way to conduct
the research and communicate its outcome to the stakeholders. We claim that
the collaboration activities of the reference model constitute a consistent
method for the meta-design phase of the design science research methodology,
guiding design science researchers in managing information systems projects.
7. CONCLUSIONS
We observed challenges in structuring and standardizing the phases of design
science research methodology that would guide design science researchers in
their choices of techniques appropriate at each stage of a project and also
help them plan, manage, control and evaluate information systems projects.
We introduced how to construct a business process model in collaboration
with practitioners from the field. The activities outlined are part of a
reference model that helps structure and model knowledge in design science
research. Our future work involves revising the model based on users'
feedback and concentrating on evaluation techniques for its outcome.
Hopefully, this will increase the efficiency and quality of artefacts while
containing or further decreasing the cognitive effort involved.
8. ACKNOWLEDGMENTS
This research would not have been possible without the help of the Irish
Research Council (IRC) under the Enterprise Partnership Scheme, which
provided us with the resources to complete our research.
REFERENCES
1. Kuechler, B., Vaishnavi, V.: On Theory Development in Design Science Research:
   Anatomy of a Research Project. European Journal of Information Systems 17(5),
   489-504 (2008)
2. Carlsson, S. A., Henningsson, S., Hrastinski, S., Keller, C.: Socio-technical IS design
science research: developing design theory for IS integration management. Information
Systems and E-Business Management 9(1), 109-131 (2011)
3. Alturki, A., Gable, G. G., Bandara, W.: A Design Science Research Roadmap. In Jain,
H., Sinha, A. P., Vitharana, P., eds. : DESRIST 2011, Heidelberg, vol. LNCS 6629,
pp.107-123 (2011)
4. Ostrowski, L., Helfert, M.: Reference Model in Design Science Research to Gather and
Model Information. In : 18th Americas Conference on Information Systems, Seattle
(2012)
5. Briggs, R. O., Reinig, B. A., de Vreede, G.-J.: Meeting satisfaction for tech-supported
groups: an empirical validation of a goal-attainment model. Small Group Research 36,
585-611 (2006)
6. Hevner, A. R., March, S. T., Park, J., Ram, S.: Design Science in Information Systems
Research. MIS Quarterly 28, 75-106 (2004)
7. Van Aken, J. E.: Management Research as a Design Science: Articulating the Research
Products of Mode 2 Knowledge Production in Management. British Journal of
Management 16(1), 19-36 (2005)
8. Hevner, A. R., March, S. T., Park, J., Ram, S.: Design Science in Information Systems
Research. MIS Quarterly 28, 75-106 (2004)
9. Iivari, J., Venable, J.: Action research and design science research - seemingly similar
but decisively dissimilar. In : 17th European Conference on Information Systems
(2009)
10. Baskerville, R., Pries-Heje, J., Venable, J.: Soft Design Science Methodology. In :
DESRIST 2009, Malvern (2009)
11. Peffers, K., Tuunanen, T., Rothenberger, M.: A Design Science Research Methodology.
Journal of Management Information Systems 24(3), 45-77 (2007)
12. Walls, J., Widmeyer, G., El Sawy, O.: Building an Information System Design Theory
for Vigilant EIS. Information Systems Research 3(1), 36-59 (1992)
13. Ostrowski, L., Helfert, M., Xie, S.: A Conceptual Framework to Construct an Artefact
for Meta-Abstract Design. In Sprague, R., ed. : 45th Hawaii International Conference
on Systems Sciences, Maui, pp.4074-4081 (2012)
14. Ostrowski, L., Helfert, M., Hossain, F.: A Conceptual Framework for Design Science
Research. In Gabris, J., Kirikova, M., eds. : Business Informatics Research, LNBIP90,
Riga, pp.345-354 (2011)
15. Walls, J., Widmeyer, G., El Sawy, O.: Building an Information System Design Theory
for Vigilant EIS. Information Systems Research 3(1), 36-59 (1992)
16. Goldkuhl, G.: Design Theories in Information Systems - A Need for Multi-Grounding.
    Journal of Information Technology and Application 6(2), 59-72 (2004)
17. Goldkuhl, G., Lind, M.: A Multi-Grounded Design Research Process. In Winter, R.,
Shao, L., Aier, S., eds. : Global perspectives on design science research DESRIST
2010, Berlin, vol. 6105, pp.45-60 (2010)
18. Pries-Heje, J., Baskerville, R., Venable, J.: Strategies for Design Science Research
Evaluation. In : 16th European Conference on Information Systems, pp.255-266 (2008)
19. Gregor, S., Jones, D.: The Anatomy of a Design Theory. Journal of Assoc. Information
Systems 8, 312-335 (2007)
20. Azevedo, J.: Mapping Reality: An Evolutionary Realist Methodology for the Natural
and Social Sciences., Albany (1997)
21. Mizoguchi, R.: Tutorial on Ontological Engineering. New Generation Computing
21(4), 363-384 (2003)
22. Kolfschoten, G. L., de Vreede, G.-J.: A Design Approach for Collaboration Process: A
Multimethod Design Science Study in Collaboration Engineering. Journal of
Management Information Systems 26, 225-256 (2009)
23. Dennis, A. R., George, J. F., Jessup, L. M., Nunamaker Jr., J. F., Vogel, D. R.:
Information Technology to Support Electronic Meetings. MIS Quarterly 12(4), 591-624
(1988)
24. Yin, R.: Case study research : design and methods. Thousand Oaks: Sage Publications,
California (2009)
S. R. Balasundaram
Associate Professor, Department of Computer Applications
National Institute of Technology, Tiruchirappalli 620 015.
Tamil Nadu, INDIA.
Abstract
1. INTRODUCTION
News is an entity through which individuals can learn what has happened as
well as what is happening around them. With the enhanced technologies of the
World Wide Web, the way news content is read has changed dramatically, from
the traditional model of reading a physical newspaper to accessing millions
of web sources via the Internet. News service providers such as Google News,
Yahoo News, etc. collect news from various sources and provide an aggregated
view of the news to users around the world. The sheer volume of available
news documents makes users feel overloaded with news content. To overcome
this issue, the challenge is to find out a user's interests in reading news
articles. In response, information filtering is a technology that helps
users retrieve what they need: based on a profile of user interests and
preferences, systems recommend items that may be of interest to the user
[1]. In particular, educational news corresponds to various items such as
education articles, universities/institutions, courses, reading materials,
events, etc.
In the present web scenario, recommendation systems play a vital role in
delivering the required news to the right users [2]. Content-based
recommendation is one of the most frequently used recommendation methods.
Several content-based recommenders for news personalization deploy TF-IDF
and the cosine similarity measure. Often a keyword (term) can be used to
extract a larger number of documents; combined with the vector space model,
this approach may recommend more news items to a user. In order to obtain
news documents pertaining to the related terms of a keyword, an enhancement
to TF-IDF is suggested in this paper. We refer to CTF-IDF and WCTF-IDF as
classification methods that combine category information with the key
concepts of traditional term-based classification.
When user profiles describe users' interests based on previously browsed
items, these can be translated into vectors of TF-IDF weights [3]. The
related terms of a certain term can be grouped based on concepts or
categories. User profiles are used to extract the required terms from the
data source based on interest terms. One strategy for obtaining user terms
is through the browsing pattern [4]. In this approach the user may search
for or click only specific terms related to his/her areas of interest; for
example, a user interested in knowing about conferences may browse the
keyword "conference". The web portal may
provide results based on "conference" or a few related terms, thereby
improving the results for the user. The proposed method is tested under
Athena, an extension of the Hermes framework.
The structure of the paper is as follows. Related work is discussed in
Section 2, followed by the methodology in Sections 3 and 4. In Section 5,
the proposed classification method is discussed. The Athena framework, which
is the implementation of the Hermes News Portal, is discussed in Section 6.
In Section 7 the results of the proposed method are discussed, followed by
the conclusion in Section 8.
2. LITERATURE REVIEW
made by the major search engines and portals considers only the issue of
viewing already-categorized content according to the users' interests.
Classification is a methodology to classify documents of varied domains
based on a particular interest of a user. There are numerous classification
methods, as shown in Figure 1.
Figure 1 Classification methods: traditional content-based classification
(with weighting schemes such as TF weighting, IDF weighting and
probabilistic weighting) and semantics-based classification
to the number of times a word appears in the document, but is offset by the
frequency of the word in the corpus, which helps to control for the fact
that some words are generally more common than others. TF-IDF is the product
of two statistics: term frequency and inverse document frequency. Various
ways exist for determining the exact values of both statistics. In the case
of the term frequency tf(t,d), the simplest choice is to use the raw
frequency of a term in a document, i.e. the number of times that term t
occurs in document d. If we denote the raw frequency of t by f(t,d), then
the simple tf scheme is tf(t,d) = f(t,d). Stop words are filtered from the
documents before calculating the TF-IDF values. The remaining words are
stemmed by a stemmer. Finally the term frequency is calculated, which
indicates the importance of a term within a document. The term-based
classification approach is shown in Figure 2.
Figure 2 Term-based classification approach (keywords are stemmed by a
stemmer, then term frequencies are computed)
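A minimal sketch of this raw-count TF-IDF scheme; the tiny corpus, the stop-word list and the crude suffix-stripping rule standing in for a real stemmer (e.g. Porter) are illustrative assumptions, not part of the paper.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "is"}  # illustrative stop-word list

def stem(word):
    # crude suffix stripper standing in for a real stemmer
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # lowercase, filter stop words, stem the remaining words
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf(docs):
    """Return one {term: tf(t,d) * idf(t)} dict per document, with tf(t,d) = f(t,d)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(docs)
    df = Counter()                       # number of documents containing each term
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)             # raw term frequency f(t,d)
        weights.append({t: f * math.log(n_docs / df[t]) for t, f in tf.items()})
    return weights

docs = ["the conference invites workshop papers",
        "a workshop in the conference",
        "a journal of reading materials"]
w = tf_idf(docs)                         # w[0]["workshop"] = 1 * log(3/2)
```

A term occurring in every document gets weight log(1) = 0, which is exactly the offsetting effect described above.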
no possibility of enhancement to improve the results of the classification.
But when the documents are classified into a specific category, this
existing issue can be addressed. Yan Lee et al. (2010) proposed that a user
can manually subscribe to a subset of a large number of pre-defined text
(news) documents. The set of pre-defined categories is usually static and
corresponds to the categories assigned to the news-providing pages when they
are first created. In other words, the subscription-based personalization
approach is rather straightforward and does not require much classification
effort. Most web sites achieve news personalization by adopting the
subscription approach, e.g. Newscan-online.
A good number of news recommender systems are available that act based on
content and semantics. News Dude [9] is a recommender system that combines
TF-IDF and the Nearest Neighbour algorithm; the system considers the entire
text in the recommendation process for both short-term and long-term
interests. In Daily Learner [10, 11, 12], users specify their items of
interest. The vector representation of a news article is processed with
TF-IDF; using cosine similarity the article is matched with the user
profile, and the Nearest Neighbour algorithm is used to analyze the most
recently rated news for short-term interests [13, 14, 15, 16]. A Naïve Bayes
classifier is modelled for long-term interests. The Personalized Recommender
System (PRES) is based on content-based filtering, combining TF-IDF and
cosine similarity; user interests are updated whenever the user browses a
new item [17, 18, 21, 22].
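The TF-IDF-plus-cosine-similarity matching these systems rely on can be sketched as follows; the sparse profile and article vectors are invented for illustration and do not come from any of the cited systems.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(profile, articles, top_n=2):
    """Rank articles by cosine similarity to the user-profile vector."""
    ranked = sorted(articles,
                    key=lambda a: cosine_similarity(profile, articles[a]),
                    reverse=True)
    return ranked[:top_n]

# hypothetical TF-IDF weight vectors
profile = {"conference": 0.8, "workshop": 0.5}
articles = {
    "a1": {"conference": 0.9, "deadline": 0.2},
    "a2": {"journal": 0.7, "audio": 0.4},
    "a3": {"workshop": 0.6, "conference": 0.3},
}
top = recommend(profile, articles)   # a3 edges out a1; a2 shares no terms
```

Because cosine similarity normalizes by vector length, an article matching both profile terms can outrank one with a single larger weight, as a3 does here.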
3. EDUCATIONAL PORTALS
There are numerous education portals available, each of which intends to
provide education-related news articles. www.openequalfree.org,
www.citylimits.org, www.self.org, www.ngopost.org, www.reapchild.org, etc.
are a few of the most prominent education portals. Based on these web
portals we have considered a corpus of 4876 web documents, pertaining to
various categories. The categories we have considered are academic events
(workshop, conference, seminar), job fair (online test, interview,
evaluation), reading material (journal, video, audio) and admissions
(institutions, courses, specialization).
Constructing the profile plays a key role in identifying the interests of a
user so that documents can be recommended more accurately. There are two
methods of user profile construction, namely explicit and implicit. In the
explicit method, the user is asked to select the interest keywords of his/her
choice from a list of keywords provided. This enables the system to
recommend news based on the interest terms. In the implicit method, the
browsing pattern is used for extracting user interests.
Figure: Overview of the recommendation flow (the user profile and the
documents are weighted with TF-IDF, CTF-IDF and WCTF-IDF to rank web pages
1-4)

Table 2: User profiles and their interest terms
U1: C1i1, C2i3
U4: C1i2, C3i6, C4i10
U5: C1i4, C2i12
Table 2 illustrates user-profile-based interests with their categories. The
interest terms are generated based on users' long-term and short-term
interests.

Table 3: Categories and their interest terms
C1 (Academic events): i1-workshop; i2-conference; i3-seminar
C2 (Job fair): i1-online test; i2-interview; i3-evaluation
C3 (Reading material): i1-journal; i2-audio; i3-video
C4 (Admissions): i1-institutions; i2-courses; i3-specialization

Table 3 illustrates the categories and the possible interest terms in each
category. In the implicit method, for User 1, documents belonging to his/her
interest terms are delivered first, followed by the documents belonging to
the rest of the interest terms. The documents are delivered likewise for all
users.
5. PROPOSED CLASSIFICATION METHODS
The CTF-IDF recommender primarily uses a vector for each item and calculates
weights for each category's terms instead of going through all the terms. It
then stores the calculated weights (together with the corresponding terms)
of a news item in a vector.
The user profile is also a vector of CTF-IDF weights, which can be compared
with a news item vector using cosine similarity. The CTF-IDF weights are
computed as shown below. First we calculate the category frequency cf(i,j):
the number of occurrences n(i,j) of a category c_i in document d_j, divided
by the total number of occurrences of all categories in the document.
cf(i,j) = n(i,j) / Σ_k n(k,j)    (3)

Subsequently, we calculate the inverse document frequency: we take the total
number of documents, |D|, divide it by the number of documents in which the
category c_i appears, and take the logarithm of this division, i.e.

idf(i) = log( |D| / |{d ∈ D : c_i ∈ d}| )    (4)

Therefore,

ctf-idf(i,j) = cf(i,j) × idf(i)    (5)
Using equation 2, the classification of a word or term is carried out with
the proposed WCTF-IDF approach. Its first term, log [a], is used to weight a
term by identifying the occurrences of that particular word in a document;
the denominator of this first term is low if the term appears many times in
the document considered for classification.
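The category-frequency-times-inverse-document-frequency weighting just described can be sketched as below; the per-document category annotations are illustrative assumptions, not data from the paper's corpus.

```python
import math
from collections import Counter

def ctf_idf(doc_categories):
    """doc_categories: one list of category occurrences per document.
    Returns per-document {category: cf * idf} weight dicts."""
    n_docs = len(doc_categories)
    df = Counter()                       # number of documents containing c_i
    for cats in doc_categories:
        df.update(set(cats))
    weights = []
    for cats in doc_categories:
        counts = Counter(cats)           # n(i,j): occurrences of c_i in d_j
        total = sum(counts.values())     # all category occurrences in d_j
        weights.append({c: (n / total) * math.log(n_docs / df[c])
                        for c, n in counts.items()})
    return weights

cat_docs = [["C1", "C1", "C2"],          # document 1: C1 twice, C2 once
            ["C2"],
            ["C3", "C1"]]
cw = ctf_idf(cat_docs)                   # cw[0]["C1"] = (2/3) * log(3/2)
```

Because documents are reduced to a handful of categories rather than thousands of terms, these vectors are far shorter than plain TF-IDF vectors, which is the efficiency gain the recommender exploits.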
6. ATHENA
6.2 Implementation in Athena
7. RESULTS AND DISCUSSIONS
Table 5. Test Results for TF-IDF and Athena

Figure: Precision, recall and accuracy of the recommendation approaches on
education news articles - the traditional method, the category terms
(CTF-IDF) method, and the weighted category term (WCTF-IDF) method.
8. CONCLUSIONS
REFERENCES
[1] Chen, C. C., Chen, M. C., Sun, Y., PVA: A self-adaptive personal view agent system,
Proceedings of the seventh ACM SIGKDD international conference on Knowledge
discovery and data mining, 2001.
[3] Jiahui Liu et al., Personalized News Recommendation Based on Click
Behaviour, In the proceedings of ACM IUI'10, February 7-10, 2010, China.
[4] Deng-Yiv Chiu, Chi-Chung Lee and Ya-Chen Pan, A classification approach of news
web pages from multi-media sources at Chinese entry website-Taiwan Yahoo! as an
example, IEEE proceedings of the Fourth International Conference on Innovative
Computing, Information and Control, pp 1156-1159, 2009.
[5] Carreira, R., Crato, J. M., Gonçalves, D., Jorge, J. A., Evaluating adaptive user profiles
for news classification, Proceedings of the 9th international conference on Intelligent user
interfaces, 2004.
[6] Chen, Y-S., Shahabi, C.: Automatically improving the accuracy of user profiles with
genetic algorithm. In: Proceedings of IASTED International Conference on Artificial
Intelligence and Soft Computing, 2001.
[7] Katakis, I., Tsoumakas, G., Banos, E., Bassiliades, N., Vlahavas, I., An adaptive
personalized news dissemination system, In Journal of Intelligent Information Systems,
Volume 32 , Issue 2. 2009.
[8] Chee-Hong Chan et al., Automated Online News Classification with
Personalization, In the proceedings of WWW-2009, Italy.
[9] Dipa Dixit and Jayant Gadge, Automatic Recommendation for Online Users Using
Web Usage Mining, International Journal of Managing Information Technology
(IJMIT), Vol. 2, No. 3, August 2010.
[10] Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl, Analysis
of recommendation algorithms for e-commerce, in Electronic Commerce, 2000.
[12] H. Suo, Y. Liu, and S. Cao, A keyword selection method based on lexical chains,
Journal of Chinese Information Processing, 20(6): 25-30, 2006.
[13] Toine Bogers and Antal van den Bosch, Comparing and Evaluating Information
Retrieval Algorithms for News Recommendation, In ACM Conference on Recommender
Systems 2007 (RecSys 2007), pages 141-144. ACM, (2007).
[14] Linyuan Yan and Chunping Li, A Novel Semantic-based Text Representation Method
for Improving Text Clustering, In 3rd Indian International Conference on Artificial
Intelligence (IICAI 2007), pages 1738-1750, (2007).
[15] Flavius Frasincar, Jethro Borsje, and Leonard Levering, A Semantic Web-Based
Approach for Building Personalized News Services, International Journal of E-Business
Research (IJEBR), 5(3):35-53, (2009).
[16] Tsoumakas, G., Katakis, I., Vlahavas, I, Effective and Efficient Multilabel
Classification in Domains with Large Number of Labels, In: Proceedings ECML/PKDD
2008 Workshop on Mining Multidimensional Data (MMD'08), Antwerp, Belgium (2008).
[18] Noy, N. F., McGuinness, D. L., Ontology Development 101: A Guide to Creating
Your First Ontology, Knowledge Systems, AI Laboratory, Stanford University, No. KSL-
01-05 (2001).
[19] Sure, Y., Angele J., Staab, S, OntoEdit: Guiding Ontology Development by
Methodology and Inferencing, In: Proceedings of the Confederated International
Conferences on the Move to Meaningful Internet Systems CoopIS DOA and ODBASE
2002, Lecture Notes in Computer Science, Vol. 2519. Springer-Verlag, 1205-1222 (2002).
[20] Antonellis, I., Bouras, C. and Poulopoulos, V., Personalized news categorization
through scalable text classification, 8th Asia Pacific Web Conference (APWeb 06),
(2006).