
International Journal of Computer Science and Business Informatics (IJCSBI.ORG)

ISSN: 1694-2507 (Print) | ISSN: 1694-2108 (Online)

Vol. 7, No. 1, November 2013

Table of Contents

A Hybrid Approach for Supervised Twitter Sentiment Classification
K. Revathy and Dr. B. Sathiyabhama

A Survey of Dynamic Duty Cycle Scheduling Scheme at Media Access Control Layer for Energy Conservation
Prof. M. V. Nimbalkar and Sampada Khandare

A Survey on Privacy Preserving Data Mining Techniques
A. K. Ilavarasi, B. Sathiyabhama and S. Poorani

An Ontology Based System for Predicting Disease using SWRL Rules
Mythili Thirugnanam, Tamizharasi Thirugnanam and R. Mangayarkarasi

Performance Evaluation of Web Services in C#, JAVA, and PHP
Dr. S. Sagayaraj and M. Santhosh Kumar

Semi-Automated Polyhouse Cultivation Using LabVIEW
Prathiba Jonnala and Sivaji Satrasupalli

Performance of Biometric Palm Print Personal Identification Security System Using Ordinal Measures
V. K. Narendira Kumar and Dr. B. Srinivasan

MIMO System for Next Generation Wireless Communication
Sharif, Mohammad Emdadul Haq and Md. Arif Rana

A Hybrid Approach for Supervised Twitter Sentiment Classification
K. Revathy
PG Scholar, Department of Computer Science and Engineering,
Sona College of Technology,
Salem, India.

Dr. B. Sathiyabhama
Professor & Head, Department of Computer Science and Engineering,
Sona College of Technology,
Salem, India.

ABSTRACT
Microblogging websites such as Twitter and Facebook have become a rich source of
opinions. This information can be leveraged by different communities to perform
sentiment analysis. There is a need to automatically detect the polarity of Twitter
messages. A semantic sentiment mining system is proposed to determine the contextual
polarity of a sentence. This hybrid approach uses three different machine learning
models to classify sentiment as positive or negative. The system treats the contextual
information in a document more thoroughly than existing systems for determining
contextual polarity, which is one of their drawbacks. The first model uses rule-based
classification built on compositional semantic rules to identify expression-level
polarity. The second performs sense-based classification, feeding WordNet senses as
features to a Support Vector Machine classifier. Further, to provide a meaningful
classification, semantics are incorporated as additional features into the training data
by an interpolation method; the third model thus performs entity-level analysis based
on the concepts obtained. The outputs of the three models are handled by a knowledge
inference system to predict the polarity of a sentence. This system is expected to
produce better results than the baseline system. It aims to predict consumer moods and
attitudes in real time, which firms can use to increase productivity and revenue.
Keywords
WordNet, Twitter, Support Vector Machine, Interpolation, machine learning models,
features, polarity.

1. INTRODUCTION
The rapid proliferation of social networking websites presents a new set of
challenges in mining and acquiring knowledge. Traditionally, the Internet
was perceived as an information corpus in which users are passive. Social
networking sites such as Twitter, Facebook and Tumblr paved the way for a Web where
users can collaborate, form communities and share opinions on almost all
aspects of everyday life.
Among social networking sites, Twitter has recently attracted researchers due
to its sudden growth. Twitter was created in March 2006 as an online
microblogging service that allows users to create status messages called tweets.
A user can view other users' tweets by following them and can
forward tweets to their own followers as retweets. The user-generated content on
Twitter covers various topics such as products, events, people and political
affairs, and can be useful in the decision-making processes of business entities and
other communities. Twitter messages are considered a rich source
for sentiment analysis [17] for the following reasons:
1. Tweets are at most 140 characters long and are more abstract in nature.
2. Real-time analysis can be performed.
3. A large number of tweets is available for analysis.
Sentiment analysis aims to identify and extract opinions from user-generated
content. The field has progressed from review sites to microblogs. Performing
sentiment analysis on Twitter is challenging due to its unique features [6]:
1. The length of a tweet is limited to 140 characters.
2. Tweets contain many misspelled words.
3. Tweets use Internet slang and emoticons.
There is a need for automated techniques that tag a given piece of text as
positive or negative. The Twitter mining approaches available in the literature
for classifying tweets can be broadly divided into lexicon-based and machine-learning-based
approaches [1][2][3]. Lexicon-based approaches use a general bag-of-words model
for classification [8]: the polarity of a document is identified by calculating a
score based on the semantic orientation of the words in a dictionary. This technique
provides high precision but low recall. The lexicon-based approach is not well suited
to Twitter because a lexicon does not contain jargon, idioms or Twitter slang.
The machine learning method works by training a classifier with labeled examples,
and the model produces better classification accuracy when trained on a proper,
evenly distributed dataset. Rule-based sentiment classification over Twitter [12]
using compositional semantic rules classifies better than a bag-of-words model.
Sentiment analysis is beneficial to organizations for understanding consumer
moods in real time.
2. LITERATURE SURVEY
Sentiment detection is a task within sentiment analysis that aims to
automatically tag text as positive or negative. Many approaches classify
sentiment using machine learning algorithms [1][2][3][4][5][6][7]. An opinion
search engine was developed [8] to retrieve reviews about products. This approach
gives more importance to adjectives, which directly imply sentiment, for example
'good', 'bad', 'worst'; adjectives that directly imply polarity are treated as
opinion words. Based on opinion words, reviews are classified and the semantic
orientations of specific features are obtained. One of the major problems in
applying these techniques to Twitter messages is data sparseness, which forces
systems to deal with noisy and unstructured data [6]. Twitter is a noisy medium
with specific features such as hashtags, emoticons, slang, abbreviations, links,
target users and retweets [17].
Sentiment analysis on Twitter messages has been performed using distant
supervision, where emoticons serve as noisy labels [6]; emoticons are removed
from the training data so that the classifier learns from other features.
Subjectivity detection and polarity detection have been based on meta-words and
tweet syntax features [10]. Tweets are noisy, unstructured text, on which
POS tagging and parsing may not produce the desired results. Discourse
relations such as conjunctions, connectives, modals and conditionals alter the
polarity of a sentence [9], and incorporating discourse relations along with a
bag-of-words model produces better accuracy in Web-based applications.
The task of sentiment detection needs more than bag-of-words and
machine learning approaches. A rule-based approach has been used to classify
sentiments based on compositional semantics [12]; the authors used a set of seven
rules and a compose function to assign sentiments. The sentiment elicitation
system of [13] uses a compositional-semantic-rule algorithm, a numeric sentiment
identification algorithm and a bag-of-words with rule-based algorithm to train a
machine learning model for classifying tweets. Semantic features have also been
used to classify the sentiment of a document: words are replaced by senses with
the help of WordNet [15], and unknown concepts in the test dataset are replaced
by similar concepts from the training dataset [11]. Similarity metrics such as
LIN [18], Lesk [19] and LCH [20] are used to identify similar concepts. Another
significant approach adds semantic concepts as additional features to the training
dataset by an interpolation method, which improves the accuracy of the classifier
[14]. An unsupervised approach to sentiment classification [22] proposed a
framework for word polarity detection based on unsupervised WSD using WordNet
and a sentiment sense inventory built from SentiWordNet. Once all the words are
disambiguated, a rule-based classifier detects the polarity of the sentence;
no training process is involved in the classification.
3. SEMANTIC SENTIMENT MINER (SSM)
In this paper, a hybrid approach is proposed that uses three different
machine learning models, shown in Figure 1. The system performs
semantic analysis on Twitter posts to analyze and classify them as positive
or negative. One of the important tasks is to identify subjective content
based on contextual information. The polarity of a sentence is determined
from the outputs of the multiple classifiers.

[Figure 1. Architecture of the Semantic Sentiment Miner. Preprocessed tweets are fed to three models (a Random Forest trained on compositional semantic rules, an SVM classifier over WordNet senses from coarse WSD, and a Naive Bayesian classifier with concepts from a sentiment entity extractor), whose outputs are combined by a knowledge inference system.]

In the first step, tweets are preprocessed and a POS tagger [21] is used for
lemmatization. The contextual word polarity is then identified using three
models. The first model is a Random Forest trained on compositional semantic
rules [13]. The second model is a support vector machine that uses senses as
classification features [11]; it performs coarse word sense disambiguation based
on WordNet [15]. The third model is a naive Bayesian classifier with semantic
concepts as additional features [14]; an entity extractor is used to extract
concepts and entities. The knowledge inference system determines the polarity of
the sentence as positive or negative.

3.1 DATA PREPROCESSING


A Twitter message is limited to 140 characters [6] and contains slang,
abbreviations, hyperlinks, emoticons and hashtags. The data is preprocessed by
removing hyperlinks, target users and stopwords, and by replacing emoticons with
words using an emoticon dictionary built from the Wikipedia list of emoticons
[23]. A hashtag such as '#sad' is replaced by 'sad'. Social media content
contains misspelled words, so spelling correction is performed. The POS tagger
[21] is then used to tag words as nouns, verbs, adjectives and adverbs.
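As a rough illustration of this pipeline, the sketch below (Python) implements the cleaning steps described above; the emoticon dictionary and stopword list are tiny stand-ins for the Wikipedia emoticon list [23] and a full stopword lexicon, and the tokenization is deliberately crude.

```python
import re

# Tiny stand-ins for the Wikipedia emoticon dictionary [23] and a stopword list.
EMOTICONS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad"}
STOPWORDS = {"is", "a", "the", "to", "and"}

def preprocess(tweet):
    tweet = re.sub(r"http\S+", " ", tweet)           # drop hyperlinks
    tweet = re.sub(r"@\w+", " ", tweet)              # drop target users (@mentions)
    for emo, word in EMOTICONS.items():              # replace emoticons with words
        tweet = tweet.replace(emo, " " + word + " ")
    tweet = re.sub(r"#(\w+)", r"\1", tweet)          # '#sad' -> 'sad'
    tokens = re.findall(r"[a-z']+", tweet.lower())   # crude tokenization
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("this week is not going as I had hoped :( #sad http://t.co/x @bob"))
# -> ['this', 'week', 'not', 'going', 'as', 'i', 'had', 'hoped', 'sad', 'sad']
```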

3.2 SEMANTIC SENTIMENT MINING SYSTEM DESCRIPTION
A Semantic Sentiment Mining system is proposed that combines different
machine learning models to detect sentence-level sentiment polarity. Initially,
the first model identifies expression-level polarity by incorporating
compositional semantics: words interact with each other to predict the
expression-level polarity [12]. The principle of compositionality states that the
meaning of a compound expression is a function of the meanings of its parts
and of the syntactic rules by which they are combined [12]. Negation words
play a significant role in flipping the polarity of a sentence, and they
are identified as content-word negators and function-word negators. Consider
the sentence 'this week is not going as I had hoped'. The word 'hoped'
conveys positive sentiment, and the negator 'not' flips the polarity of the
sentence. The learning-based approach incorporates structural inference through
compositional semantics in two steps. In the first step, the polarity of the
constituents of the expression is detected with the help of lexicons. The next
step is to detect the polarity of the sentence by applying the rules recursively.
The second model uses semantic features for polarity detection. Words are
replaced by senses from WordNet [15], either by manual annotation or by a Word
Sense Disambiguation (WSD) engine [11]. For example, 'apple' is replaced
by its synset ID from WordNet. Consider the following sentences:
1. He has a feel for animals.
2. He felt for his wallet.
In the first sentence, 'feel' conveys that the person has an intuition for
animals; in the second, 'felt' carries a different sense of the same verb.
Thus, a word has different senses depending on the context in which it appears.
Manual annotation performs better than a WSD engine. The SVM classifier [16] is
trained with these senses as features.
The third model performs feature engineering [14]: semantics are added as extra
features to the training dataset, and the correlation with the concepts is
measured. Consider the tweet 'Dr. A.P.J. Abdul Kalam returns to India'. The
entity 'Dr. A.P.J. Abdul Kalam' receives the semantic concept 'person', and
'India' receives the semantic concept 'country'. These semantic concepts, as
additional features, help in determining the sentiments of similar entities. A
naive Bayesian classifier is trained and tested for classification. In summary,
the first model identifies expression-level polarity by the principle of
compositional semantics, the second model takes into account the sense of every
word, and the third model adds world knowledge as additional features. The
knowledge inference system then detects the polarity of the sentences.

3.3 RANDOM FOREST MODEL
Compositional semantic rules help the Random Forest learn the meaning of
contextual information. Each rule identifies the meaning of a class of sentences,
and the compose function provides the polarity of a compound expression. For
example, in 'this book is not informative', the word 'informative' conveys
positive sentiment but the preceding word 'not' flips the sentiment of the
sentence. This is addressed by the rule polarity(not [arg1]) = ¬polarity(arg1).
This work is based on an algorithm in the sentiment elicitation system proposed
by Zhang et al. [13]. The Random Forest model is trained to classify tweets based
on the compositional semantic rules listed in Table 1; the compose function used
to detect polarity is given in Table 2.

Table 1. Compositional Semantic Rules

1. polarity(not [arg1]) = ¬polarity(arg1). Example: not [good]{arg1}.
2. polarity([VP1] [NP1]) = compose([VP1], [NP1]). Example: [destroyed]{VP1} the [terrorism]{NP1}.
3. polarity([VP1] to [VP2]) = compose([VP1], [VP2]). Example: [Refused]{VP1} to [deceive]{VP2} the man.
4. polarity([ADJ] to [VP1]) = compose([ADJ], [VP1]). Example: [Unlikely]{ADJ} to [destroy]{VP1} the planet.
5. polarity([NP1] of [NP2]) = compose([NP1], [NP2]). Example: [lack]{NP1} of [crime]{NP2} in rural areas.
6. polarity([NP1] [VP1]) = compose([VP1], [NP1]). Example: [Crime]{NP1} has [decreased]{VP1}.
7. polarity([NP1] be [ADJ]) = compose([ADJ], [NP1]). Example: [damage]{NP1} is [minimal]{ADJ}.
8. polarity([NP1] in [VP1]) = compose([NP1], [VP1]). Example: [lack]{NP1} of [killing]{VP1} in rural areas.
9. polarity(as [ADJ] as [NP]) = polarity(NP) if polarity(NP) != 0, else polarity(ADJ). Example: as [ugly]{ADJ} as a [rock]{NP}.
10. polarity(not as [ADJ] as [NP]) = -polarity(ADJ). Example: That was not as [bad]{ADJ} as the [original]{NP}.
11. If the sentence contains "but", disregard all previous sentiment and take only the sentiment of the part after "but". Example: And I have never liked that director, [but] I loved this movie.
12. If the sentence contains "despite", only the sentiment in the part of the sentence before "despite" is counted. Example: I love that movie, despite the fact that I hate the director.

The compose function used to calculate the polarity of an expression is
given in Table 2. Its output ranges from -2 to 2. The sentiment of the
sentence is tagged as positive for values greater than zero and negative for
values less than zero.

Table 2. Compose Function

compose(arg1, arg2) =
    if arg1 is negative:
        if arg2 is not neutral: return -polarity(arg2)
        else: return -1
    else if arg1 is positive and arg2 is not neutral:
        return polarity(arg2)
    else if polarity(arg1) equals polarity(arg2):
        return 2 * polarity(arg1)
    else if (arg1 is positive and arg2 is neutral) or (arg2 is positive and arg1 is neutral):
        return polarity(arg1) + polarity(arg2)
    else:
        return 0
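A minimal executable rendering of Table 2 in Python, assuming polarities are encoded as integers (positive > 0, negative < 0, neutral = 0); the flipped sign in the first branch follows the reconstruction above and is not guaranteed to match the authors' exact implementation.

```python
def compose(arg1, arg2):
    """Combine two constituent polarities per Table 2; result lies in [-2, 2]."""
    if arg1 < 0:                                    # arg1 is negative
        return -arg2 if arg2 != 0 else -1           # flip arg2, or default to -1
    if arg1 > 0 and arg2 != 0:                      # arg1 positive, arg2 not neutral
        return arg2
    if arg1 == arg2:                                # equal polarities reinforce
        return 2 * arg1
    if (arg1 > 0 and arg2 == 0) or (arg2 > 0 and arg1 == 0):
        return arg1 + arg2
    return 0

# Rule 2 example: [destroyed]{VP1} the [terrorism]{NP1} -> compose(neg, neg) = +1
print(compose(-1, -1))
```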

3.4 SVM CLASSIFIER


In general, work on supervised sentiment analysis has mainly focused on
lexeme-based features for sentiment classification. WordNet [15] is a large
lexical database that provides the different senses of a word. Replacing a word
by its sense improves the accuracy of a sentiment classifier: WordNet senses are
better features than raw words. Every word is replaced by its corresponding
synset ID, in which the first part refers to the part of speech and the remaining
digits identify the meaning. The SVM classifier [16] is then trained with these
senses as features.
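A hedged sketch of sense-based features using NLTK's WordNet interface and scikit-learn; the first listed synset is used as a crude stand-in for the coarse WSD step, the synset ID is formed from each synset's POS letter and offset, and the two training tweets are illustrative only.

```python
# Requires: pip install nltk scikit-learn, plus nltk.download('wordnet')
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def to_senses(tokens):
    """Replace each word by a synset ID; the first sense is a crude WSD stand-in."""
    out = []
    for tok in tokens:
        synsets = wn.synsets(tok)
        # ID = POS letter + offset, e.g. 'v01234567'; keep the raw token if unknown.
        out.append(f"{synsets[0].pos()}{synsets[0].offset():08d}" if synsets else tok)
    return " ".join(out)

train_docs = [to_senses(t.split()) for t in ["i loved this movie", "worst film ever"]]
labels = ["pos", "neg"]  # toy labels, illustrative only

clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(train_docs, labels)
print(clf.predict([to_senses("loved it".split())]))  # likely ['pos'] on this toy data
```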

3.5 NAIVE BAYESIAN CLASSIFIER


Using semantic concepts as features for a supervised sentiment classifier can
provide better classification [14]. An entity extractor such as AlchemyAPI or
Zemanta can be used to extract entities and concepts, and the concepts are
inserted as additional features into the training data. A multinomial naive
Bayesian classifier, a simple probabilistic classifier, performs the
classification. The semantic concepts are incorporated into the training set by
the interpolation method.

The language model with the interpolation component is given by

P(w|c) = λ P_u(w|c) + (1 − λ) P_s(w, c, F)    (1)

where P_u(w|c) is the original unigram model calculated via maximum likelihood
estimation, λ is the interpolation coefficient, and P_s(w, c, F) is the
interpolation component, which can be decomposed into

P_s(w, c, F) = Σ_j P(w|f_ij) P(f_ij|c)    (2)

where f_ij is the j-th feature of type i, P(f_ij|c) is the distribution of f_ij
in the training data given the class c, and P(w|f_ij) is the distribution of
words in the training data given the feature f_ij. Both distributions are
computed via maximum likelihood estimation.
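As a toy rendering of Eqs. (1) and (2), the code below estimates both distributions by maximum likelihood over a two-tweet corpus; the mixing weight, the concept labels and the data are illustrative assumptions rather than values from [14].

```python
from collections import Counter

# Toy training data: (tokens, class, semantic concepts from an entity extractor).
train = [
    (["kalam", "returns", "india"], "pos", ["person", "country"]),
    (["flood", "hits", "india"], "neg", ["country"]),
]

LAM = 0.8  # interpolation coefficient lambda (assumed value)

def p_unigram(w, c):
    """Maximum-likelihood unigram estimate P_u(w|c)."""
    words = [t for toks, cls, _ in train if cls == c for t in toks]
    return words.count(w) / len(words) if words else 0.0

def p_semantic(w, c):
    """Eq. (2): sum over features f of P(w|f) P(f|c), both estimated by MLE."""
    feats_c = [f for _, cls, fs in train if cls == c for f in fs]
    total = 0.0
    for f, n in Counter(feats_c).items():
        words_f = [t for toks, _, fs in train if f in fs for t in toks]
        total += (words_f.count(w) / len(words_f)) * (n / len(feats_c))
    return total

def p(w, c):
    """Eq. (1): interpolated class-conditional word probability."""
    return LAM * p_unigram(w, c) + (1 - LAM) * p_semantic(w, c)

print(p("india", "pos"), p("india", "neg"))
```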

3.6 KNOWLEDGE INFERENCE SYSTEM


The knowledge inference system detects the polarity of a sentence by majority
vote over the three models. The outputs of the three models have the form: a
score in [-2, 2], pos/neg, and pos/neg. If all models classify a tweet as
positive, the inference system declares the tweet positive. If two models predict
positive sentiment and one predicts negative, the inference system follows the
majority and declares the tweet positive.
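A minimal sketch of this vote: the rule-based score in [-2, 2] is mapped to pos/neg and counted together with the two classifier labels; mapping a zero score to negative is an assumption, since neutral tweets are deferred to future work.

```python
def infer(rule_score, svm_label, nb_label):
    """Majority vote over the three model outputs described above."""
    rule_label = "pos" if rule_score > 0 else "neg"  # zero treated as neg (assumed)
    votes = [rule_label, svm_label, nb_label]
    return "pos" if votes.count("pos") >= 2 else "neg"

print(infer(2, "pos", "neg"))   # two of three positive -> 'pos'
print(infer(-1, "neg", "pos"))  # two of three negative -> 'neg'
```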

4. RESULTS AND DISCUSSION


The experiment was conducted on a Pentium(R) Dual Core processor with 4.00 GB of
installed RAM. We trained the Semantic Sentiment Miner on the Twitter dataset and
tested it to obtain the average accuracy of the system. Performance is evaluated
by four measures: precision, recall, F-measure and accuracy. The values obtained
for these measures are shown in Table 3.
Table 3. Comparison of Results

Feature: senses
  Positive sentiment: Precision 93.61, Recall 88.30, F-measure 90.87
  Negative sentiment: Precision 88.40, Recall 93.64, F-measure 90.94

The Semantic Sentiment Miner performs comparatively better than the baseline
system. The baseline system [9] uses unigram, bigram, unigram with bigram, and
unigram with POS features for different classifiers: Naive Bayesian (NB),
Support Vector Machine (SVM) and Maximum Entropy (ME). Figure 2 shows the
classifier accuracy with the various features;

among all features, senses achieve the maximum accuracy. The contextual
information in the document is given due importance through senses as features,
which predict the polarity of the sentence.

[Figure 2. Accuracy of different classifiers (NB, Maximum Entropy, SVM) with various features: unigram, bigram, unigram+bigram, unigram+POS, and senses.]

The graph shows that senses as features predict sentiment with higher
accuracy; the Semantic Sentiment Miner exploits this sense feature for effective
classification. The Twitter dataset [6] has a training set of 800,000 tweets with
positive emoticons and 800,000 tweets with negative emoticons, a total of
1,600,000 tweets; the test data comprises 177 negative and 182 positive tweets.
The STS dataset [14] has 30,000 positive and 30,000 negative tweets, a total of
60,000 tweets; its test data has 470 positive and 530 negative tweets. Rule-based
classification [12] with compositional semantics incorporated achieved 90.7%
accuracy in classifying documents. The Semantic Sentiment Mining system,
combining both rule-based and sense-based classification, classifies documents
with higher accuracy.
The comparison of different features across classifiers is shown in Figure 2.
Bigram and POS features are not useful and reduce accuracy. When both unigram and
bigram features are used, accuracy increases for both NB and Maximum Entropy,
while the SVM classifier shows a marginal decrease. With senses as features, SVM
shows 85.48% accuracy [11] and the NB classifier shows 83.90% accuracy [14]. The
Semantic Sentiment Miner achieves a higher accuracy of 88.2%. It outperforms the
other systems because it identifies expression-level polarity and word polarity,
and also performs entity-level analysis of the document. In this system, word
sense disambiguation performed by manual annotators performs better than a WSD
engine.


5. CONCLUSIONS
The Semantic Sentiment Mining system detects the polarity of a sentence at the
expression level, by replacing words with their corresponding senses, and by
providing knowledge to the system. The system combines a rule-based approach with
machine learning algorithms (a Random Forest model, SVM and naive Bayesian) to
classify tweets. It detects polarity with high accuracy because the learning
models understand contextual information better. A lexicon can be used to detect
word polarity but fails to handle unknown words, and dictionaries of content-word
negators are difficult to construct; a content-word negator flips the polarity of
a sentence depending on the specific context. Manual annotators are required to
perform sense annotation to achieve better accuracy, and manual labeling is one
of the major drawbacks; a disambiguation engine could be designed to perform
sense annotation comparably to manual annotation. In future work, neutral tweets
will be handled, since proper attention to neutral tweets will further improve
classification. Neutral tweets, such as tweets that reproduce newspaper
headlines, can be considered objective sentences: a neutral tweet states a fact
without any sentiment and also helps in identifying subjective sentences.

REFERENCES
[1] B. Pang, L. Lee, S. Vaithyanathan, (2002), Thumbs up? Sentiment classification using
machine learning techniques, in: Proceedings of the ACL-02 Conference on Empirical
Methods in Natural Language Processing, Volume 10, pg. 79-86.
[2] K. Dave, S. Lawrence, D. M. Pennock, (2003), Mining the peanut gallery: opinion
extraction and semantic classification of product reviews, in: Proceedings of the 12th
International Conference on World Wide Web, pg. 519-528.
[3] T. Kudo, Y. Matsumoto, (2004), A boosting algorithm for classification of
semi-structured text, in: Proceedings of EMNLP.
[4] C. Chen, F. Ibekwe-SanJuan, E. SanJuan, C. Weaver, (2006), Visual analysis of
conflicting opinions, in: Visual Analytics Science and Technology, IEEE Symposium On,
pg. 59-66.
[5] M. Annett, G. Kondrak, (2008), A comparison of sentiment analysis techniques:
polarizing movie blogs, in: Advances in Artificial Intelligence, pg. 25-35.
[6] A. Go, R. Bhayani, L. Huang, (2009), Twitter sentiment classification using distant
supervision, in: CS224N Project Report, Stanford, pg. 1-12.
[7] D. Davidov, O. Tsur, A. Rappoport, (2010), Enhanced sentiment learning using Twitter
hashtags and smileys, in: Proceedings of the 23rd International Conference on
Computational Linguistics, Posters, pg. 241-249.
[8] Magdalini Eirinaki, Shamita Pisal, Japinder Singh, (2012), Feature-based opinion
mining and ranking, Journal of Computer and System Sciences 78 (2012), pg. 1175-1184.
[9] Karan Chawla, Ankit Ramteke, Pushpak Bhattacharyya, IITB-Sentiment-Analysts:
Participation in Sentiment Analysis in Twitter SemEval 2013 Task, Second Joint
Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the
Seventh International Workshop on Semantic Evaluation (SemEval 2013), Association for
Computational Linguistics, pg. 495-500.
[10] L. Barbosa, J. Feng, Robust sentiment detection on Twitter from biased and noisy
data, COLING 2010: Poster Volume, pp. 36-44.
[11] Balamurali A. R., Aditya Joshi, Pushpak Bhattacharyya, Robust sense-based sentiment
classification, Proceedings of the 2nd Workshop on Computational Approaches to
Subjectivity and Sentiment Analysis, ACL-HLT 2011, pages 132-138, 24 June 2011,
Portland, Oregon, USA, Association for Computational Linguistics.
[12] Y. Choi and C. Cardie, Learning with compositional semantics as structural inference
for subsentential sentiment analysis, Proceedings of the Conference on Empirical
Methods in Natural Language Processing, pages 793-801, 2008.
[13] Kunpeng Zhang, Yu Cheng, Yusheng Xie, Daniel Honbo, Ankit Agrawal, Diana Palsetia,
Kathy Lee, Wei-keng Liao, and Alok Choudhary, SES: Sentiment Elicitation System for
Social Media Data, 2011 11th IEEE International Conference on Data Mining Workshops.
[14] Saif, Hassan; He, Yulan and Alani, Harith, (2012), Semantic sentiment analysis of
Twitter, in: The 11th International Semantic Web Conference (ISWC 2012), 2012,
pg. 508-524.
[15] George A. Miller, 1995, WordNet: A lexical database for English, Communications of
the ACM, 38:39-41.
[16] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and
Other Kernel-based Learning Methods, Cambridge University Press, March 2000.
[17] Pak, A. and Paroubek, P., 2010, Twitter as a corpus for sentiment analysis and
opinion mining, in: Proceedings of the Seventh Conference on International Language
Resources and Evaluation (LREC'10), European Language Resources Association (ELRA),
Valletta, Malta.
[18] Dekang Lin, 1998, An information-theoretic definition of similarity, in Proc. of the
15th International Conference on Machine Learning, pages 296-304.
[19] Satanjeev Banerjee and Ted Pedersen, 2002, An adapted Lesk algorithm for word sense
disambiguation using WordNet, in Proc. of CICLing-02, pages 136-145, London, UK.
[20] Claudia Leacock and Martin Chodorow, 1998, Combining local context with WordNet
similarity for word sense identification, in WordNet: A Lexical Reference System and
its Application.
[21] Helmut Schmid, 1994, Probabilistic part-of-speech tagging using decision trees, in
Proceedings of the International Conference on New Methods in Language Processing.
[22] Reynier Ortega, Adrian Fonseca, Yoan Gutierrez and Andres Montoyo, 2013, SSA-UO:
Unsupervised Twitter sentiment analysis, Second Joint Conference on Lexical and
Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International
Workshop on Semantic Evaluation (SemEval 2013), Association for Computational
Linguistics, pages 501-507.
[23] http://en.wikipedia.org/wiki/List_of_emoticons


A Survey of Dynamic Duty Cycle Scheduling Scheme at Media Access Control Layer for Energy Conservation

Prof. M. V. Nimbalkar
Assistant Professor,
Sinhgad College of Engineering,
Vadgaon, Pune, India.

Sampada Khandare
Sinhgad College of Engineering,
Vadgaon, Pune, India.

ABSTRACT
Over the last few years, wireless sensor networks (WSNs) have gained increasing
attention from both researchers and actual users. Sensor nodes are generally
battery powered, so a critical design goal is to reduce their energy consumption
and thereby extend the network lifetime to a reasonable duration. A WSN also
needs to provide good performance by reducing sleep latency while balancing
energy consumption among the sensor nodes. Duty-cycle media access control (MAC)
schemes have been proposed in WSNs mainly to reduce the energy consumption of
sensor nodes, and many schemes have been developed to reduce energy consumption
and increase the lifetime of sensor networks. The Dynamic Duty-cycle and Dynamic
Schedule Assignment (DDDSA) scheme reduces the number of RTS and CTS packets by
dynamically updating the duty-cycle value to achieve energy efficiency. The
Duty-cycle Scheduling based on Residual energy (DSR) scheme reduces sleep latency
while balancing energy consumption among sensor nodes. A further scheme,
Duty-cycle Scheduling based on Prospective increase in residual energy (DSP),
uses energy harvesting together with DSR to increase the residual energy of
nodes; DSP reduces a sensor node's duty cycle to increase the lifetime of the
network through the harvesting technique.
Keywords
Medium Access Control (MAC), Duty cycle, Scheduling, Power consumption, Energy
efficiency.
1. INTRODUCTION
Wireless sensor networking is an emerging technology with a wide range of
potential applications, including environment monitoring, smart spaces, medical
systems and robotic exploration. A network consists of a large number of
distributed nodes that organize themselves into multi-hop wireless routes to
perform a task. Each node consists of one or more sensors, embedded processors
and a low-power radio. The sensing, processing and wireless communication
subsystems form the wireless sensor node, and each subsystem has a different
function: the sensing subsystem senses environmental changes or data required by
the application, the processing subsystem processes the acquired data and stores
it in a file or database, and the wireless communication subsystem is used for
data transmission over the network. Sensor nodes are normally battery operated;
the power source supplies the energy the device needs to perform its specified
task. It is often difficult to recharge or change the batteries of nodes, because
nodes may be deployed where human beings cannot reach, for example inside a
furnace to sense the temperature. The network lifetime can be extended long
enough to fulfill the application requirements by minimizing the energy
consumption of the nodes [2].
Energy consumption is greatly affected by the communication between nodes, so
communication protocols at different layers are designed with energy conservation
in mind. Medium access control (MAC) layer schemes have been proposed in wireless
sensor networks mainly to reduce the energy consumption of sensor nodes. The MAC
layer plays a vital role in the successful operation of the network: it is
responsible for arbitrating access to the wireless communication channel and for
allocating the limited communication resources among nodes. MAC protocols must
satisfy some essential requirements, such as energy efficiency, effective
collision avoidance, scalability and adaptivity, minimum latency and efficient
channel throughput; bandwidth is secondary in sensor networks. The major sources
of energy waste are collisions, overhearing and control packet overhead. A
collision occurs when a transmitted packet is corrupted and has to be discarded;
the follow-on retransmission increases energy consumption, and collisions
increase latency as well. Overhearing occurs when a node picks up packets that
are destined for other nodes. Control packet overhead arises because RTS and CTS
requests consume energy too. The major source of energy consumption, however, is
idle listening: if nothing is sensed, nodes remain in idle mode most of the time,
and idle listening consumes approximately the same power as the transmitting and
receiving modes [3].
Most sensor nodes are designed to operate for a long time, and a node will be in
the idle state for long periods, consuming energy; idle listening is thus a
central source of energy waste in such cases. The problems caused by idle
listening can be addressed by an efficient technique called duty cycling. The
duty cycle can be defined as the ratio of a node's active period to its full
operating period. The most effective energy conservation measure is putting the
radio transceiver into sleep or low-power mode whenever communication is not
required: ideally, the radio should be off when there is no data to send or
receive and should resume as soon as data or packets become available. Depending
on network activity, nodes alternate between active and sleep periods; this
behavior is referred to as duty cycling. The duty cycle and a node's radio-on
time are directly proportional to each other, so lowering the duty cycle
significantly saves energy. The wireless communication subsystem can transmit and
receive data during a node's active state, in which more energy is consumed
because the radio is on continuously. In the sleeping state nodes do not perform
any activity, so energy consumption is minimal because the radio is kept off [5].
This paper is organized as follows. Section 2 surveys duty cycling approaches,
Section 3 describes duty-cycle scheduling, Section 4 discusses different dynamic
duty-cycle scheduling schemes for energy harvesting in WSNs, and Section 5
concludes the paper.

2. DUTY CYCLING
Duty cycling can be achieved through two different approaches: topology control
and power management [3]. Selecting an optimal subset of nodes that still
guarantees connectivity is referred to as topology control (TC); the basic idea
behind topology control is to exploit network redundancy to prolong network
longevity, typically increasing the network lifetime with respect to a network
with all nodes always on. Even active nodes, however, do not need to keep their
radio continuously on: they can switch the radio off when there is no network
activity. Duty cycling operated on active nodes, by coordinating the sleep
periods of neighboring nodes, is power management (PM).
Nodes operating at a low duty cycle consume less energy, which helps to increase
the lifespan of the sensor nodes; TC and PM are therefore complementary
techniques. PM can be implemented either at the MAC layer, by integrating a
duty-cycle scheme into the MAC protocol, or by implementing a protocol on top of
the MAC layer.
PM techniques are classified into sleep/wakeup protocols and MAC protocols with a
low duty cycle. Sleep/wakeup schemes are implemented as independent protocols on
top of the MAC protocol, and can be further classified into on-demand, scheduled
rendezvous and asynchronous schemes.
The basic idea behind the on-demand scheme is that a node wakes up only when it
wants to establish communication over the network to send or receive data. The
main difficulty of this scheme is how to establish communication between a
sleeping node and the sender.
The scheduled rendezvous approach solves this problem by having each node wake up
at the same time as its neighbors. Nodes wake up according to a wakeup schedule,
remain active for a short duration, and go back to sleep until the next
rendezvous time. The major advantage of this scheme is that when a node is awake
its neighbors are guaranteed to be awake as well, so it can broadcast messages to
all of them. The corresponding drawback is that nodes tend to transmit
simultaneously, causing a large number of collisions. In this scheme, the sizes
of the wakeup and active periods are fixed and do not adapt to variations in the
traffic pattern or network topology.
[Figure 1. Taxonomy of approaches to duty cycling in sensor networks: Topology Control (location driven, connection driven) and Power Management, the latter split into sleep/wakeup protocols (on-demand, scheduled rendezvous, asynchronous) and MAC protocols with low duty cycle (TDMA, contention based with synchronous and asynchronous variants, and hybrid).]


Finally, an asynchronous sleep/wakeup protocol can be used. With an asynchronous
protocol, a node can wake up when it wants and still be able to communicate with
its neighbors. For an asynchronous sender to find the receiver active when it
wakes up, the receiver is forced to listen periodically: it wakes up periodically
and listens for a short time to discover any potential asynchronous sender. If it
does not detect any activity on the channel it returns to sleep, otherwise it
remains active to send or receive packets. Exploiting cross-layer information is
a factor often neglected in the design of asynchronous protocols.
Several MAC protocols are available; here, power management protocols with a low
duty cycle are considered. They are classified into TDMA-based, contention-based
and hybrid protocols. In TDMA-based MAC protocols, time is divided into frames
and each frame consists of a certain number of time slots. Every node is assigned
one or more slots for transmitting or receiving packets to or from other nodes.
With a suitable slot assignment algorithm and correct sizing of the protocol
parameters, it is possible to minimize energy consumption. TDMA-based MAC
protocols are
generally used to solve the problem of interference among nodes. They have
drawbacks, however, such as limited flexibility and scalability: in real
scenarios there may be frequent topology changes due to many factors, slot
allocation may be problematic, and a centralized approach is usually adopted.
TDMA also needs tight synchronization, is very sensitive to interference, and
performs poorly under low traffic conditions.
Contention-based MAC protocols perform better than TDMA-based protocols in terms
of scalability and robustness; they introduce lower delay and can easily adapt to
traffic conditions. Their energy consumption is higher than TDMA because of
contention and collisions, but a duty-cycle mechanism can help reduce energy
consumption while keeping sleep latency low.
Contention-based protocols achieve duty cycling by tightly integrating channel
access functionality with a sleep/wakeup scheme; the scheme is not independent of
the MAC protocol but tightly coupled to it. Contention-based duty-cycle MAC
protocols can be classified into two categories: synchronous and asynchronous. In
synchronous approaches such as S-MAC [1] and T-MAC [8], all nodes listen at the
same time for sender-receiver synchronization: each node broadcasts its next
wakeup time to its neighbors, and tight time synchronization is required.
Neighboring nodes exchange packets only within the common active time. These
approaches greatly reduce idle listening, but synchronization adds extra overhead
and complexity, and nodes need to wake up multiple times if their neighbors are
on different schedules.
Asynchronous protocols such as X-MAC [1], RI-MAC [4], RC-MAC [8] and AS-MAC [8]
are not tightly time synchronized; each node can operate on its own duty-cycle
schedule. A receiver wakes up and, if it hears a preamble on the channel, remains
in the active state until communication between the sender and the receiver ends.
Asynchronous protocols achieve high energy efficiency and remove the
synchronization overhead; using long preambles, they also aim to optimize the
packet delivery ratio (PDR), minimize sleep latency, minimize end-to-end delay
(E2ED) and improve energy conservation in the network.
Finally, hybrid protocols try to combine the strengths of both families while
offsetting their weaknesses, switching the protocol behavior between TDMA and
CSMA depending on the traffic in the network. These techniques, however, tend to
be complex in deployments with a high number of nodes.

3. DUTY-CYCLE SCHEDULING
The energy consumption problem caused by idle listening is solved by duty-cycle
scheduling in wireless sensor networks. Tight synchronization can be achieved by
defining a constant duty cycle over the network; a constant duty cycle is
referred to as static duty-cycle scheduling. Alternatively, adapting the duty
cycle to the traffic conditions is beneficial when packet traffic is unevenly
distributed; this is referred to as dynamic duty-cycle scheduling.
3.1 Static Duty Cycle Scheduling (SDCS)
SDCS is based on a periodic wakeup schedule for data exchange consisting of a
sleep period and an active period. Sensor nodes can communicate with each other
only in active mode; in sleep mode the radio is off, so they cannot communicate.
This approach reduces the idle listening time of the sensor nodes, which reduces
their energy consumption and increases the lifespan of the network. The scheme
thus coordinates the sleep schedules of all nodes to maintain network-level
connectivity. The drawback of SDCS is that it increases sleep latency in
multi-hop networks: when the traffic load is high, sleep latency increases the
number of queued packets, many packets are dropped due to buffer overflow, and
the end-to-end latency of nodes increases; end-to-end latency is inversely
proportional to throughput. In a WSN, nodes deployed at different locations
generally require different duty cycles to minimize energy consumption and
increase the network lifespan. For example, consider a parking monitoring system
in a mall: it continuously monitors the parked vehicles, and data is continuously
sent from the leaf nodes to the sink or control system to observe the occupancy
of the parking area. Nodes deployed near the sink must transfer more data than
leaf nodes, so they require a different duty cycle according to the traffic
conditions. Dynamic duty-cycle scheduling schemes have been proposed to overcome
these problems and to improve energy conservation over the network.
3.2 Dynamic Duty Cycle Scheduling (DDCS)
DDCS is used to solve the problems that occur in SDCS. It requires no prior
knowledge of global or local timing information or schedules of the nodes in the
network, and nodes do not need to remember the schedules of their neighbors. A
DDCS scheme dynamically adjusts the listening (active) period according to the
traffic load, thereby avoiding buffer overflow. Its ability to quickly adapt the
sleep/wakeup period of each individual node to the actual operating conditions
yields a longer lifespan for the sensor network. In DDCS, each node's schedule is
independent. The following table describes the pros and cons of SDCS and DDCS.

Table 1. Comparison of Static and Dynamic Duty-cycle Scheduling

Static Duty-cycle Scheduling (SDCS):
- Requires a fixed duty cycle for energy conservation.
- Cannot adapt to network condition changes, such as interference incurred by additional workload at runtime.
- Saves less energy than dynamic duty-cycle scheduling.

Dynamic Duty-cycle Scheduling (DDCS):
- Requires a variable duty cycle for energy conservation.
- Dynamically adjusts the sleep interval of each node based on the delay constraint and network condition changes.
- Saves more energy while meeting end-to-end delay requirements.

4. DYNAMIC DUTY-CYCLE SCHEDULING SCHEMES
Power consumption is one of the most important issues in WSNs. Dynamic duty-cycle
scheduling schemes reduce energy consumption by decreasing the number of idle
nodes through sleep/wakeup scheduling. To adapt the duty cycle to traffic
conditions, Zhang et al. [6] proposed a novel duty-cycle scheme called Traffic
adaptive Distance-based Duty Cycle Assignment (TDDCA). It assigns a dynamic
duty-cycle value according to traffic and distance, which helps to reduce energy
consumption. In overlapping areas, however, as the number of nodes increases, the
parameters for the overlapping area must be recalculated again and again, so it
does not give better performance there. To overcome the problem associated with
TDDCA, Lee and Kim proposed the Dynamic Phase Shift (DPS) approach for reducing
energy consumption [6]. It allows the receiver and sender to exchange information
with each other, and analyzes uplink and downlink relationships to select the
most suitable method, sender-driven or receiver-driven. It can avoid the
collisions as well as the delays that occur in asynchronous schemes, but it uses
a fixed duty-cycle value for the network.
4.1 Dynamic Duty-cycle and Dynamic Schedule Assignment (DDDSA) scheme
To reduce the problems associated with TDDCA and DPS, a new scheme was developed:
the DDDSA scheme not only reduces RTS (request to send)/CTS (clear to send)
packets but also updates the duty-cycle value according to the traffic conditions
in the network [7]. When there is a large number of sensor nodes in the network,
there are more idle listening nodes and more duty-cycle assignments; the DDDSA
scheme is used to solve these problems [7]. First, a node calculates its counter
value and broadcasts it to the other nodes in the network; at the same time, it
receives the counter values from neighboring nodes and stores them in a queue.
The duty cycle is increased when the difference between the currently received
RTS packets and the originally received RTS packets grows; this value is also
used as a priority with respect to other nodes. The duty-cycle value is likewise
increased if collisions occur frequently in an overlapping area. Depending on the
priority, the scheme decides which node should be served first, reducing the
traffic load. Further schemes have been developed to reduce the energy
consumption of nodes; one of them is DSR, which helps to reduce the energy
consumption of the sensor nodes.
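Reading this description loosely, a node-local update might look like the following sketch; the counter semantics, baseline handling, threshold and step size are assumptions for illustration, not the published constants of DDDSA [7].

```python
class DDDSANode:
    """Toy node-local duty-cycle update in the spirit of DDDSA [7] (assumed logic)."""

    def __init__(self, duty_cycle=0.1, step=0.05, max_dc=1.0):
        self.duty_cycle = duty_cycle
        self.step = step                 # increment per update (assumed)
        self.max_dc = max_dc
        self.baseline_rts = 0            # 'originally received' RTS count
        self.neighbor_counters = []      # queue of counter values heard from neighbors

    def on_counter(self, value):
        """Store a counter value broadcast by a neighboring node."""
        self.neighbor_counters.append(value)

    def update(self, current_rts):
        """Raise the duty cycle when RTS traffic exceeds the stored baseline."""
        if current_rts - self.baseline_rts > 0:
            self.duty_cycle = min(self.max_dc, self.duty_cycle + self.step)
        self.baseline_rts = current_rts

node = DDDSANode()
for rts in [3, 7, 7]:             # rising, then steady RTS traffic
    node.update(rts)
print(round(node.duty_cycle, 2))  # 0.2: grew twice, then held steady
```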
4.2 Duty-cycle scheduling based on residual energy (DSR) scheme
[Figure 2. Decision graph of the DSP scheme]


Duty-cycle scheduling based on residual energy is used to reduce sleep latency
while balancing energy consumption among sensor nodes [1]. Sleep latency arises
because a sensor node that has a packet to transmit usually has to wait for a
long period before transmitting it. Minimizing sleep latency reduces the energy
utilization of sensor nodes and so helps to increase the lifetime of the sensor
network. DSR allows each sensor node to determine its duty cycle from its
residual energy alone, every time the node wakes up. The duty cycle is inversely
related to residual energy: letting i denote the node number, DC_i the duty cycle
of node i, and E_r(i) its residual energy, in general DC_i ∝ 1/E_r(i). In DSR,
the maximum duty cycle is denoted DC_max, an application-specific parameter:
whenever the duty cycle computed from E_r(i) would exceed DC_max, DC_i is set to
DC_max; otherwise DC_i is calculated in inverse relation to E_r(i), scaled with
respect to the maximum residual energy. The outcomes of the mechanism are that
when DC_i becomes equal to DC_max, node i stays awake all the time, and that as
the residual energy E_r(i) of a node decreases, its duty cycle DC_i increases.
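Under the reconstruction above (duty cycle inversely related to residual energy and clamped at DC_max), a node-local computation could be sketched as follows; the linear normalization by E_max is an assumed form, and the exact formula in [1] may differ.

```python
DC_MAX = 1.0   # application-specific maximum duty cycle (assumed)
E_MAX = 100.0  # maximum (initial) residual energy, arbitrary units (assumed)

def dsr_duty_cycle(residual_energy):
    """Duty cycle grows as residual energy shrinks, clamped to DC_MAX."""
    if residual_energy <= 0:
        return DC_MAX                                # exhausted node: stays awake
    dc = DC_MAX * (1.0 - residual_energy / E_MAX)    # assumed inverse relation
    return min(DC_MAX, max(dc, 0.0))

for e in (100.0, 50.0, 10.0):
    print(e, dsr_duty_cycle(e))   # duty cycle increases as energy decreases
```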

4.3 Duty-cycle Scheduling based on Prospective increase in residual energy (DSP) scheme
In DSP, the residual energy of a node can increase over time depending on its
harvesting opportunity, regardless of the ongoing energy consumption [1]. If a
node takes this prospective increase into account whenever it wakes up, it is
allowed to use a duty cycle DC_i smaller than the one its current residual energy
would dictate. Consider a scenario in which all nodes are equipped with a solar
panel to harvest energy, and one node A is covered by shade at all times and has
no chance of harvesting, while another node B has a continuous harvesting
opportunity.

[Figure 3. Decision graph of the DSP scheme [1]]


At the start, nodes A and B have the same residual energy, E_r(A) = E_r(B), so
under DSR both nodes have the same duty cycle. In DSP, node B can have a duty
cycle DC_B lower than the duty cycle DC_A of node A because of the harvesting
opportunity available to B: the residual energy of node B increases over time due
to harvesting, in spite of the energy consumption. When a node wakes up, it
calculates its prospective increase in residual energy over a time interval T as

ΔE_r(i) = (ρ_h(i) − ρ_c(i)) · T,

where ρ_h(i) is the energy harvesting rate and ρ_c(i) is the energy consumption
rate of node i. DSP allows node i to reduce its duty cycle DC_i more aggressively
than DSR. The residual energy increases continuously only if ρ_h(i) is greater
than ρ_c(i); otherwise DSP works the same as DSR. Suppose node i wakes at time c
with residual energy E_r(i)(c); over the time interval T ahead, at time c + T,
its residual energy becomes

E_r(i)(c + T) = E_r(i)(c) + (ρ_h(i) − ρ_c(i)) · T.

The node therefore selects the smallest duty cycle that minimizes sleep latency
without depleting the current amount of residual energy within the next T
seconds. In Figure 3, with e1 and e2 denoting E_r(i)(c) and E_r(i)(c + T), DC_i
is set to I2, not I1. Energy harvesting opportunities depend on spatio-temporal
variations in energy availability, so predicting the value of T is difficult, and
an improper T leads to energy depletion. To minimize energy depletion, DSP
requires ΔE_r(i) to be recomputed every time the node wakes up. The value of T
should be large enough to allow node i to reduce DC_i aggressively, yet small
enough, since the prospective increase may decrease over time.
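A sketch of the DSP adjustment implied above: when harvesting outpaces consumption, the node projects its residual energy T seconds ahead and schedules on the projection, which yields a smaller duty cycle than plain DSR; the DSR baseline and all parameter values are assumptions carried over from the previous sketch.

```python
DC_MAX, E_MAX = 1.0, 100.0  # assumed parameters, as in the DSR sketch above

def dsr_duty_cycle(e):
    """DSR baseline: duty cycle grows as residual energy shrinks (assumed form)."""
    return min(DC_MAX, max(DC_MAX * (1.0 - e / E_MAX), 0.0)) if e > 0 else DC_MAX

def dsp_duty_cycle(e_now, rho_h, rho_c, t):
    """Schedule on the residual energy projected T seconds ahead, per E_r(c+T)."""
    delta_e = (rho_h - rho_c) * t           # prospective increase in residual energy
    if delta_e <= 0:                        # no net harvesting: behave like DSR
        return dsr_duty_cycle(e_now)
    return dsr_duty_cycle(e_now + delta_e)  # projected energy allows a smaller DC

print(dsp_duty_cycle(50.0, rho_h=2.0, rho_c=1.0, t=20.0))  # 0.3 vs DSR's 0.5
```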

5. CONCLUSIONS
Many schemes have been developed to reduce energy consumption and increase the
lifetime of sensor networks. The Dynamic Duty-cycle and Dynamic Schedule
Assignment (DDDSA) scheme reduces the number of RTS and CTS packets by
dynamically updating the duty-cycle value to achieve energy efficiency. Both DSP
and DSR perform well: DSP gives a higher packet delivery ratio and lower
end-to-end delay than static duty-cycle scheduling schemes, and it has been shown
to outperform DDDSA and DSR regardless of the performance metric because of its
aggressive behavior. In the future, we expect more efficient schemes for energy
conservation obtained by modifying the DSP scheme.

6. ACKNOWLEDGMENTS
The authors would like to thank Manish Satyavijay Oswal from TU Ilmenau, Germany,
and Prof. Pranav M. Pawar from Pune University, India, for their in-depth
discussions and feedback. We would also like to thank the anonymous reviewers for
their valuable feedback, which helped to improve this paper.

REFERENCES
[1] Hongseok Yoo, Moonjoo Shim and Dongkyun Kim, Dynamic Duty-Cycle Scheduling
Schemes for Energy-Harvesting Wireless Sensor Networks, IEEE Communications
Letters, Vol. 16, No. 2, February 2012, pages 202-204.
[2] Jennifer Yick, Biswanath Mukherjee, Dipak Ghosal, Wireless sensor network survey,
Elsevier, Computer Networks 52 (2008), pages 2292-2330.
[3] Giuseppe Anastasi, Marco Conti, Mario Di Francesco, Andrea Passarella, Energy
conservation in wireless sensor networks: A survey, Elsevier, Ad Hoc Networks 7
(2009), pages 537-568.
[4] Yanjun Sun, Omer Gurewitz, David B. Johnson, RI-MAC: A Receiver-Initiated
Asynchronous Duty Cycle MAC Protocol for Dynamic Traffic Loads in Wireless Sensor
Networks, ACM SenSys, 2008.
[5] Yan-Xiao Li, Hao-Shan Shi, Shui-Ping Zhang, An energy-efficient MAC protocol for
wireless sensor networks, 3rd International Conference on Advanced Computer Theory
and Engineering (ICACTE), 2010.
[6] Giuseppe Anastasi, Marco Conti, Mario Di Francesco, Extending the Lifetime of
Wireless Sensor Networks Through Adaptive Sleep, IEEE Transactions on Industrial
Informatics, Vol. 5, No. 3, August 2009.
[7] Hsin-Hung Cho, Jian-Ming Chang, An energy efficient dynamic duty cycle and dynamic
schedule assignment scheme for WSNs, 2011 IEEE Asia-Pacific Services Computing
Conference.
[8] Pei Huang, Li Xiao, Soroor Soltani, Matt W. Mutka and Ning Xi, The Evolution of
MAC Protocols in Wireless Sensor Networks: A Survey, IEEE Communications Surveys
and Tutorials, Vol. 15, No. 1, First Quarter 2013.


A Survey on Privacy Preserving Data Mining Techniques
A. K. Ilavarasi
Assistant Professor, Department of Computer Science and Engineering,
Sona College of Technology,
Salem, India

B. Sathiyabhama
Head of the Department,
Department of Computer Science and Engineering,
Sona College of Technology,
Salem, India

S. Poorani
PG Scholar, Department of Computer Science and Engineering,
Sona College of Technology,
Salem, India.

ABSTRACT
Many kinds of anonymization techniques have been the subject of research. This
paper presents a detailed review of several anonymization techniques in the area
known as Privacy Preserving Data Mining. Recent experiments have shown that some
anonymization techniques, such as generalization and bucketization, do not ensure
privacy preservation, while slicing has been experimentally shown to provide a
significant level of utility and also to prevent membership disclosure. A
detailed analysis is therefore given of post-anonymization techniques, and the
necessity of privacy preservation is also reviewed in detail.
Keywords:
Anonymization, privacy preservation, data mining, k-anonymity, l-diversity.
1. INTRODUCTION
Data mining is the process of analysing data from various perspectives and
extracting useful information from it; knowledge discovery is its ultimate goal.
Nowadays, data from the Internet and other social media are plentiful, so privacy
preservation deserves serious attention. Privacy Preservation in Data Mining
(PPDM) is a novel field in data mining in which privacy mechanisms are
incorporated into the mining algorithms. The significance of PPDM can be seen
from different perspectives: when data is published, an individual's identity and
other details should not be disclosed, yet the information loss incurred by
privacy preservation strongly affects data utility. PPDM balances the trade-off
between utility and privacy preservation using various anonymization techniques.


2. TAXONOMY
In general, personal identifiers are removed before publishing data for
mining purposes. Privacy preservation is a serious issue, and it can be
achieved through different techniques. Figure 1 describes the taxonomy
of privacy preservation in data mining; its three main approaches are
perturbation, anonymization and cryptography.
2.1 Perturbation
The perturbation method for categorical data can be used by organizations
to prevent or limit the disclosure of confidential data for identifiable records
when the data are provided to analysts for classification. The perturbation
approach preserves the statistical properties of the data while meeting the
needs of privacy protection. Because medical datasets have a high probability
of linking attacks, perturbation can be applied effectively in that field.

Figure 1. Taxonomy of Privacy Preservation in Data Mining (PPDM):
  Perturbation approach: additive, multiplicative, matrix multiplicative,
    categorical data, data swapping, resampling, micro aggregation, data shuffling
  Anonymization approach: k-anonymity, l-diversity, t-closeness, slicing
  Cryptographic methods: symmetric-key, public-key, cryptanalysis, cryptosystems

There are different kinds of data perturbation methods available for data
protection. The methods include additive, multiplicative, matrix
multiplicative, micro aggregation, categorical, resampling, data swapping
and data shuffling perturbation, as well as the probability distribution
approach and the value distortion approach.
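
To make the additive method concrete, the sketch below adds zero-mean Gaussian
noise to a numeric attribute. It is a minimal illustration of the idea under
stated assumptions, not code from any of the surveyed systems, and all names
in it are hypothetical.

    import java.util.Random;

    public class AdditivePerturbation {
        // Adds zero-mean Gaussian noise to each value: individual records are
        // masked while aggregate statistics (e.g., the mean) are roughly preserved.
        public static double[] perturb(double[] values, double stdDev, long seed) {
            Random rng = new Random(seed);
            double[] out = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                out[i] = values[i] + rng.nextGaussian() * stdDev;
            }
            return out;
        }

        public static void main(String[] args) {
            double[] ages = {23, 27, 35, 59, 61, 65, 65, 70};
            for (double v : perturb(ages, 2.0, 42L)) {
                System.out.printf("%.1f%n", v);
            }
        }
    }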

2.2 Data Anonymization


Anonymization reduces the risk of identity disclosure while the data remain
realistic. Micro data contain information about an individual, a household or
an enterprise. Each such dataset has: (i) personal identifiers such as name,
address or Social Security Number (SSN), which uniquely identify an
individual; (ii) Sensitive Attributes (SAs) such as salary and disease; and
(iii) Quasi-Identifier (QI) values such as gender, age and zip code, which can
lead to identity disclosure when taken together. Two main privacy-preserving
approaches are k-anonymity and l-diversity: k-anonymity prevents the
identification of individual records in the data, and l-diversity prevents the
association of an individual record with a sensitive attribute value.
k-anonymity has the limitations of revealing sensitive attributes under
background-knowledge attacks, and it cannot be applied to high-dimensional
data without a complete loss of utility.
2.2.1 Generalization
Generalization is one of the conventional anonymization techniques. It is a
widely used technique that replaces QI values with less-specific but
semantically consistent values. Due to the high dimensionality of the QI,
generalization can cause high information loss, so records in an equivalence
class should be close to each other to limit that loss. Another defect is
over-generalization, which makes the data useless, and effective analysis of
attribute correlations is also lost when each attribute is generalized
separately. From an l-diverse generalized table, an adversary can learn an
individual's sensitive value with probability up to 1/l.
Table 1. A 2-diverse generalized table
Age      Sex  Zip code        Disease
[21,60]  M    [10001,60000]   Pneumonia
[21,60]  M    [10001,60000]   Dyspepsia
[21,60]  M    [10001,60000]   Dyspepsia
[21,60]  M    [10001,60000]   Pneumonia
[61,80]  F    [10001,60000]   Flu
[61,80]  F    [10001,60000]   Pneumonia
[61,80]  F    [10001,60000]   Dyspepsia
[61,80]  F    [10001,60000]   Pneumonia

l-diversity builds groups of k different records that all share a particular
quasi-identifier. A QI-group with m tuples is l-diverse if each sensitive
value appears no more than m/l times in the QI-group, and a table is l-diverse
if all of its QI-groups are l-diverse.
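
This definition translates directly into a per-group check: a QI-group of m
tuples is l-diverse when no sensitive value occurs more than m/l times. A
minimal sketch of that check (hypothetical names, not from any cited
implementation):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class LDiversityCheck {
        // A QI-group of m tuples is l-diverse if each sensitive value
        // appears no more than m / l times in the group.
        public static boolean isLDiverse(List<String> sensitiveValues, int l) {
            Map<String, Integer> counts = new HashMap<>();
            for (String s : sensitiveValues) {
                counts.merge(s, 1, Integer::sum);
            }
            double limit = (double) sensitiveValues.size() / l;
            return counts.values().stream().allMatch(c -> c <= limit);
        }

        public static void main(String[] args) {
            // First QI-group of Table 1: each of the two diseases occurs
            // twice among 4 tuples, so the group is 2-diverse (2 <= 4/2).
            List<String> group = List.of("Pneumonia", "Dyspepsia", "Dyspepsia", "Pneumonia");
            System.out.println(isLDiverse(group, 2));
        }
    }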
Table 2. Published voter's list
Name    Age  Sex  Zip code  Disease
John    23   M    10000     Pneumonia
Peter   35   M    13000     Flu
Martin  61   F    54000     Pneumonia

The defect of generalization shows up for a query like:
SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age IN [0, 30]
AND Zipcode IN [10001, 20000]
Only the two generalized tuples below may match the query, so the query can
only be answered probabilistically; the estimated answer is 2 * p = 0.1, where
p is the probability that such a tuple falls in the queried Age and Zipcode
ranges.
Age      Sex  Zip code        Disease
[21,60]  M    [10001,60000]   Pneumonia
[21,60]  M    [10001,60000]   Pneumonia

2.2.2 Bucketization
Anatomy is one of the newer techniques for publishing sensitive data.
Anatomy protects privacy by releasing all the quasi-identifier values and the
sensitive values in two separate tables. This technique provides more
effective data analysis than generalization, and it also achieves
privacy-preserving publication by capturing the exact QI distribution.
Experimental results yield highly accurate aggregate information, with an
average error below 10% compared with generalization.

The quasi-identifier table (QIT) has the schema
$(A_1^{qi}, A_2^{qi}, \ldots, A_d^{qi},\ \text{Group-ID})$
and the sensitive table (ST) has the schema
$(\text{Group-ID}, A^{s}, \text{Count})$, where $A^{s}$ is the sensitive attribute.
Table 3. Quasi-identifier Table (QIT)
Age  Sex  Zip code  Group-ID
23   M    11000     1
27   M    13000     1
35   M    59000     1
59   M    12000     1
61   F    54000     2
65   F    25000     2
65   F    25000     2
70   F    30000     2

Table 4. Sensitive Table (ST)
Group-ID  Disease     Count
1         Dyspepsia   2
1         Pneumonia   2
2         Bronchitis  1
2         Flu         2
2         Gastritis   1
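
The Anatomy release in Tables 3 and 4 can be produced mechanically: each
tuple's QI values are emitted together with its group-ID, while the sensitive
values are aggregated per group into (Group-ID, value, count) rows. The sketch
below (hypothetical names; requires Java 16+ for records) illustrates the
split:

    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class Anatomy {
        record Tuple(int age, String sex, String zip, String disease) {}

        // Splits micro data into a QI table (QI values + group-ID) and a
        // sensitive table (group-ID, sensitive value, count), as in Tables 3-4.
        public static void anatomize(Map<Integer, List<Tuple>> groups) {
            Map<Integer, Map<String, Integer>> st = new TreeMap<>();
            for (Map.Entry<Integer, List<Tuple>> e : groups.entrySet()) {
                for (Tuple t : e.getValue()) {
                    System.out.printf("QIT: %d %s %s group=%d%n",
                            t.age(), t.sex(), t.zip(), e.getKey());
                    st.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                      .merge(t.disease(), 1, Integer::sum);
                }
            }
            st.forEach((g, m) -> m.forEach((d, c) ->
                    System.out.printf("ST: group=%d %s count=%d%n", g, d, c)));
        }

        public static void main(String[] args) {
            Map<Integer, List<Tuple>> groups = new TreeMap<>();
            groups.put(1, List.of(new Tuple(23, "M", "11000", "Dyspepsia"),
                                  new Tuple(27, "M", "13000", "Pneumonia")));
            anatomize(groups);
        }
    }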

David J. Martin et al. [3] describe the necessity of considering the
attacker's background knowledge when discussing privacy in data publishing. A
polynomial-time algorithm was proposed to measure the worst-case sensitive
information disclosure; this worst-case background knowledge helps to analyse
the knowledge the attacker possesses. The background knowledge can be
sanitized by two methods, bucketization and full-domain generalization.
Bucketization partitions the set of tuples T into buckets, and the sensitive
attribute is randomly permuted within each bucket; the buckets then provide
the sanitized data with permuted values. Bucketization has better utility than
generalization, but it does not prevent membership disclosure, and in many
datasets there is no clear separation between QIs and SAs, which bucketization
requires.
Aggregate queries cannot be answered well with the presently available
generalization-based anonymization approaches. This problem is addressed in
[4], which provides a framework for accurate aggregate queries under
permutation-based anonymization, more accurate than the generalization-based
approach. Permutation-based anonymization is carried out by data-swapping
techniques, where privacy is achieved by exchanging the sensitive attributes,
and it provides high micro-data utility.
Anonymization through permutation is motivated as follows. An individual's
identity can be recovered in three ways: (1) through the link between the
identifier and the quasi-identifiers in the public database P; (2) through the
link between the QIs in P and those in the de-identified micro data D; and
(3) through the link between the QIs and the sensitive values in D. Breaking
the associations of these links ensures privacy.
Domain generalization weakens only the second and third links. Instead of
using domain generalization, we can permute the association between the
quasi-identifiers and the sensitive attributes: even if an attacker can link an
individual's identifier with a tuple's QI, he will not be able to know with
certainty the exact value of the individual's sensitive attribute.
Table 5. 3-anonymous table after generalization (satisfies 3-diversity)
Group-ID  Tuple-ID  Age      Zip code  Gender  Salary (sensitive)
1         1         [31-40]  271*      *       $56,000
1         2         [31-40]  271*      *       $54,000
1         3         [31-40]  271*      *       $55,000
2         4         [41-50]  272*      *       $65,000
2         5         [41-50]  272*      *       $75,000
2         6         [41-50]  272*      *       $70,000
3         7         [51-60]  276*      *       $80,000
3         8         [51-60]  276*      *       $75,000
3         9         [51-60]  276*      *       $85,000

Table 6. 3-anonymous table after permutation
Group-ID  Tuple-ID  Age  Zip code  Gender  Salary (sensitive)
1         1         40   27130     M       $54,000
1         2         38   27120     M       $55,000
1         3         35   27101     M       $56,000
2         4         41   27229     F       $65,000
2         5         43   27269     F       $70,000
2         6         47   27243     M       $75,000
3         7         52   27656     M       $75,000
3         8         53   27686     F       $80,000
3         9         58   27635     M       $85,000
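
Table 6 is obtained from Table 5 by keeping the exact QI values and shuffling
the sensitive column inside each group. A minimal sketch of that permutation
step (hypothetical names, shown only to illustrate the idea):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;
    import java.util.TreeMap;

    public class GroupPermutation {
        // Breaks the QI -> sensitive-value link by randomly permuting the
        // sensitive column within each group: an attacker who links an
        // identifier to a tuple's QI can no longer be certain of its
        // sensitive value.
        public static void permuteWithinGroups(Map<Integer, List<String>> sensitive, long seed) {
            Random rng = new Random(seed);
            for (List<String> column : sensitive.values()) {
                Collections.shuffle(column, rng);
            }
        }

        public static void main(String[] args) {
            Map<Integer, List<String>> salaries = new TreeMap<>();
            salaries.put(1, new ArrayList<>(List.of("$56,000", "$54,000", "$55,000")));
            salaries.put(2, new ArrayList<>(List.of("$65,000", "$75,000", "$70,000")));
            permuteWithinGroups(salaries, 7L);
            System.out.println(salaries);
        }
    }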
Slicing, discussed in [5], is a novel technique that preserves data utility
better than generalization and protects against membership disclosure better
than bucketization. High-dimensional data can be handled better by
slicing-based anonymization. Privacy is ensured by partitioning the attributes
into columns, which breaks the association of uncorrelated attributes, while
data utility is preserved by keeping highly correlated attributes together.
Slicing partitions the dataset both horizontally and vertically. The objective
of slicing is to break the associations between poorly correlated attributes
across columns while preserving the associations within each column. Multiple
matching buckets ensure privacy, and randomly permuting the values within each
bucket breaks the linking between different columns.
The law of total probability gives the probability $p(t,s)$ that tuple $t$
takes sensitive value $s$:

$p(t,s) = \sum_{B} p(t,B)\, p(s \mid t,B)$    (1)

A tuple $t$ may have many matching buckets; its matching degree in the whole
data $D$ is $f(t) = \sum_{B} f(t,B)$, and the probability that $t$ is in
bucket $B$ is:

$p(t,B) = \dfrac{f(t,B)}{f(t)}$    (2)
l-diverse Slicing: A tuple $t$ satisfies $l$-diversity iff for any sensitive
value $s$,

$p(t,s) \le \dfrac{1}{l}$    (3)

A sliced table satisfies $l$-diversity iff every tuple in it satisfies
$l$-diversity.
FACT: For any tuple $t \in D$, $\sum_{s} p(t,s) = 1$.
PROOF:
$\sum_{s} p(t,s) = \sum_{s} \sum_{B} p(t,B)\, p(s \mid t,B)
= \sum_{B} p(t,B) \sum_{s} p(s \mid t,B)
= \sum_{B} p(t,B) = 1$
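
Equations (1)-(3) can be evaluated directly once the matching degree $f(t,B)$
and the bucket-conditional probability $p(s \mid t,B)$ are known for each
matching bucket. The sketch below (hypothetical names; requires Java 16+ for
records) computes $p(t,s)$ and tests the l-diversity condition:

    import java.util.List;

    public class SlicingLDiversity {
        // One matching bucket B for tuple t: matchingDegree is f(t,B), and
        // pSGivenTB is p(s | t, B), the fraction of t's matches in B that
        // would take sensitive value s.
        record Bucket(double matchingDegree, double pSGivenTB) {}

        // Equations (1)-(2): p(t,s) = sum_B p(t,B) * p(s|t,B),
        // where p(t,B) = f(t,B) / f(t) and f(t) = sum_B f(t,B).
        public static double pTS(List<Bucket> buckets) {
            double fT = buckets.stream().mapToDouble(Bucket::matchingDegree).sum();
            return buckets.stream()
                    .mapToDouble(b -> (b.matchingDegree() / fT) * b.pSGivenTB())
                    .sum();
        }

        // Equation (3): t satisfies l-diversity iff p(t,s) <= 1/l for every s.
        public static boolean satisfiesLDiversity(double pts, int l) {
            return pts <= 1.0 / l;
        }

        public static void main(String[] args) {
            List<Bucket> buckets = List.of(new Bucket(2.0, 0.5), new Bucket(1.0, 0.0));
            double p = pTS(buckets); // (2/3)*0.5 + (1/3)*0.0 = 1/3
            System.out.println(p + " -> 2-diverse? " + satisfiesLDiversity(p, 2));
        }
    }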
The chi-square measure of correlation between two attributes $A_1$ and $A_2$
with $d_1$ and $d_2$ values is used as follows:

$\phi^2(A_1, A_2) = \dfrac{1}{\min(d_1, d_2) - 1} \sum_{i=1}^{d_1} \sum_{j=1}^{d_2} \dfrac{(f_{ij} - f_{i\cdot} f_{\cdot j})^2}{f_{i\cdot} f_{\cdot j}}$    (4)

Advantages of slicing over generalization:
[1] Generalization fails on high-dimensional data due to the curse of
dimensionality.
[2] Generalization also causes too much information loss because of the
uniform-distribution assumption within each generalized group.


Advantages of slicing over bucketization:
[1] Slicing prevents membership disclosure, which bucketization fails to do.
[2] Slicing better preserves the attribute correlations between sensitive
attributes and the QI attributes.
Tiancheng Li and Ninghui Li explain that there is no proper trade-off between
privacy and utility, and present a systematic methodology for measuring
privacy loss and utility loss. [6] also provides quantitative interpretations
of the trade-off, which guide data publishers in choosing the right
privacy-utility trade-off.
COROLLARY: Privacy should be measured against the trivially anonymized data,
whereas utility should be measured using the original data as the baseline.
Utility can be measured using utility loss instead of utility gain; a
well-designed privacy-preserving method should result in zero privacy loss and
zero utility loss.
Privacy loss can be measured using the Jensen-Shannon (JS) divergence:

$P_{loss}(t) = JS(Q, P(t)) = \dfrac{1}{2}\big[ KL(Q, M) + KL(P(t), M) \big]$

where $M = \dfrac{1}{2}(Q + P(t))$ and $KL(\cdot,\cdot)$ is the KL-divergence

$KL(Q, P) = \sum_{i} q_i \log \dfrac{q_i}{p_i}$

The worst-case privacy loss is measured as the maximum privacy loss over all
tuples in the data:

$P_{loss} = \max_{t} P_{loss}(t)$    (5)

Utility loss can be measured again using the JS divergence:

$U_{loss}(y) = JS(P_y, \hat{P}_y)$    (6)

Because utility is an aggregate concept, utility loss is measured by averaging
$U_{loss}(y)$ over all large populations $y$; maximum utility is achieved when
$U_{loss} = 0$:

$U_{loss} = \dfrac{1}{|Y|} \sum_{y \in Y} U_{loss}(y)$    (7)

where $Y$ is the set of all large populations.
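
The privacy-loss measure above reduces to two small numeric routines: a
KL-divergence and the symmetrized JS distance built from it. A minimal sketch
under the stated definitions (distributions are plain probability vectors;
all names are hypothetical):

    public class PrivacyLoss {
        // KL(Q, P) = sum_i q_i * log(q_i / p_i); terms with q_i = 0 contribute 0.
        static double kl(double[] q, double[] p) {
            double d = 0;
            for (int i = 0; i < q.length; i++) {
                if (q[i] > 0) d += q[i] * Math.log(q[i] / p[i]);
            }
            return d;
        }

        // JS(Q, P) = (KL(Q, M) + KL(P, M)) / 2 with M = (Q + P) / 2, used as
        // P_loss(t): the distance between the baseline belief Q and the
        // posterior P(t) an adversary derives for tuple t from the release.
        static double js(double[] q, double[] p) {
            double[] m = new double[q.length];
            for (int i = 0; i < q.length; i++) m[i] = (q[i] + p[i]) / 2;
            return (kl(q, m) + kl(p, m)) / 2;
        }

        public static void main(String[] args) {
            double[] baseline = {0.5, 0.5};   // belief from trivially-anonymized data
            double[] posterior = {0.9, 0.1};  // belief after seeing the published table
            System.out.printf("P_loss(t) = %.4f%n", js(baseline, posterior));
        }
    }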


In [7], two k-anonymity algorithms were proposed that provide the most
accurate classification models based on the mutual information obtained. It is
also argued that data generalization should be based on the classification
capability of the data rather than on the privacy requirement, to ensure good
anonymization.
Mutual information is used to measure which generalization level is best for
classification. The uncertainty associated with the set of class labels is
described as:
$H(C) = -\sum_{k=1}^{p} freq(C_k) \log_2 freq(C_k)$    (8)

where $H(C)$ indicates the classification uncertainty without using other
attribute information.
Mutual information is biased towards attributes with many values. Such bias
should be avoided, which can be achieved by normalising the mutual
information; $I_N$ denotes the normalized mutual information of attribute
$A_i$ at generalization level $l$:

$I_N(A_i^{(l)}; C) = \dfrac{I(A_i^{(l)}; C)}{H(A_i^{(l)})}$    (9)
The normalised mutual information of all possible generalization levels is
compared, and the one with the highest normalized mutual information is the
best for classification. The algorithm maximises the classification capability
through generalization; suppression is driven by the privacy requirement k
(IACk) or by distributional constraints (IACc). The proposed method IACk
supports anonymization with a better classification model than the
utility-aware method.
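
Equations (8) and (9) can be computed from the joint distribution of a
candidate generalized attribute and the class label; the generalization level
with the highest normalized mutual information is preferred. A minimal sketch
with made-up joint distributions (hypothetical names, not the algorithm
from [7]):

    public class NormalizedMI {
        static double log2(double x) { return Math.log(x) / Math.log(2); }

        // Equation (8): H = -sum_k p_k * log2(p_k) over a probability vector.
        static double entropy(double[] p) {
            double h = 0;
            for (double v : p) if (v > 0) h -= v * log2(v);
            return h;
        }

        // Equation (9): I_N(A; C) = I(A; C) / H(A), computed from the joint
        // distribution joint[i][j] = p(A = a_i, C = c_j).
        static double normalizedMI(double[][] joint) {
            int rows = joint.length, cols = joint[0].length;
            double[] pa = new double[rows];
            double[] pc = new double[cols];
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++) { pa[i] += joint[i][j]; pc[j] += joint[i][j]; }
            double mi = 0;
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++)
                    if (joint[i][j] > 0) mi += joint[i][j] * log2(joint[i][j] / (pa[i] * pc[j]));
            return mi / entropy(pa);
        }

        public static void main(String[] args) {
            // Two candidate generalization levels; the higher I_N wins.
            double[][] fine = {{0.25, 0.05}, {0.05, 0.25}, {0.2, 0.2}};
            double[][] coarse = {{0.3, 0.3}, {0.2, 0.2}};
            System.out.println(normalizedMI(fine) + " vs " + normalizedMI(coarse));
        }
    }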
In [8] both global recoding and local recoding are discussed. The global
recoding method maps the domain of the QI attributes to generalized or changed
values, but it does not achieve effective anonymization in terms of
discernibility and query-answering accuracy. Local recoding covers both
numerical and categorical data, which global recoding fails to do. Two
algorithms, a bottom-up algorithm and a top-down greedy search method, are
used to perform local recoding. The bottom-up algorithm reduces the weighted
certainty penalty, which reflects the utility of the anonymized data. The
top-down approach partitions the table iteratively using binary partitioning;
the number of groups smaller than k is much less than in the worst case, so
the top-down method is comparatively faster than the bottom-up method.
Raymond Chi-Wing Wong et al. describe the minimality attack, in which
knowledge of the mechanism or algorithm used to anonymize data for publication
can lead to a privacy breach: a mechanism that tries to minimize the
information loss invites exactly such an attack. The minimality attack is
addressed by m-confidentiality, which can prevent these attacks with little
information loss.
The main objective of privacy preservation is to limit the probability of
linkage from any individual to any sensitive value set s in the sensitive
attribute. The probability, or credibility, can be defined as follows.
Let T* be the published table generated from T. Consider an individual
$o \in O$ and a sensitive value set s in the sensitive attribute.
Credibility(o, s, K_ad) is the probability that the adversary can infer from
T* and background knowledge K_ad that o is associated with s.
A table T is said to satisfy m-confidentiality if, for any individual o and
any sensitive value set s, Credibility(o, s, K_ad) does not exceed 1/m.
The information loss is then measured as the average information loss
$IL(t^*)$ over the tuples $t^*$ in the published table $T^*$:

$Dist(T, T^*) = \dfrac{\sum_{t^* \in T^*} IL(t^*)}{|T^*|}$    (10)
The technique in [10] states that the quality of anonymized data is better
measured with respect to the purpose for which the data will be used. This can
be done with a series of techniques, such as queries, classification and
regression models, which provide high-quality data; hence large-scale datasets
can be anonymized based on their measure of usage. Two techniques, a scalable
decision tree and sampling, are developed, which allow the anonymization
algorithm to be applied to large datasets.
2.3 Cryptographic Methods
Cryptography is the technique that focuses mainly on securing information from
third parties. Information security has various aspects, such as data
confidentiality, authentication and data integrity. Cryptographic methods such
as symmetric-key cryptography, public-key cryptography, cryptanalysis and
cryptosystems are widely used privacy preservation methods.
3. CONCLUSION
A detailed survey of various anonymization methods has been carried out. Every
anonymization technique has its own significance. Generalization causes too
much information loss, and bucketization fails at privacy preservation due to
identity disclosure. Slicing performs better than generalization,
bucketization and many other anonymization methods: it handles high-dimensional
data by partitioning highly correlated attributes into columns and breaking
the association of uncorrelated attributes. Thus slicing, in combination with
correlation analysis, offers high data utility and ensures privacy in PPDM.


REFERENCES
[1] Raymond Chi-Wing Wong et al., (alpha, k)-Anonymity: An Enhanced k-Anonymity
Model for Privacy Preserving Data Publishing, ACM, 2006.
[2] Xiaokui Xiao and Yufei Tao, Anatomy: Simple and Effective Privacy
Preservation, ACM, 2006.
[3] David J. Martin et al., Worst-Case Background Knowledge for
Privacy-Preserving Data Publishing, National Science Foundation grants.
[4] Q. Zhang et al., Aggregate Query Answering on Anonymized Tables, in ICDE,
2007.
[5] Tiancheng Li et al., Slicing: A New Approach to Privacy Preserving Data
Publishing, ACM.
[6] Tiancheng Li and Ninghui Li, On the Trade-off between Privacy and Utility
in Data Publishing, ACM, 2009.
[7] Jiuyong Li et al., Information Based Data Anonymization for Classification
Utility, Elsevier, 2011.
[8] Jian Xu et al., Utility-Based Anonymization Using Local Recoding, ACM,
2006.
[9] Raymond Chi-Wing Wong et al., Minimality Attack in Privacy Preserving Data
Publishing, ACM, 2007.
[10] Kristen LeFevre and Raghu Ramakrishnan, Workload-Aware Anonymization
Techniques for Large-Scale Datasets, ACM, 2008.
[11] Rakesh Agrawal et al., Privacy-Preserving Data Mining, ACM SIGMOD
Conference on Management of Data, 2000.
[12] G. Sai Chaitanya Kumar et al., Suppression of Multidimensional Data Using
K-anonymity, International Journal of Computer Science and Communication
Networks, Vol. 2(4), pp. 501-505.
[13] Ali Inan, Murat Kantarcioglu and Elisa Bertino, Using Anonymized Data for
Classification, AFOSR.
[14] Benjamin C. M. Fung et al., Anonymizing Classification Data for Privacy
Preservation, IEEE.
[15] Patrick Sharkey et al., Privacy-Preserving Data Mining through Knowledge
Model Sharing, NSF.
[16] Aris Gkoulalas-Divanis and Grigorios Loukides, PCTA: Privacy-Constrained
Clustering-Based Transaction Data Anonymization, ACM, 2011.
[17] L. Sweeney, Guaranteeing Anonymity When Sharing Medical Data, the Datafly
System, Proceedings of the AMIA Annual Fall Symposium, pp. 51-55, 1997.
[18] Neha V. Mogre, Girish Agarawal and Pragati Patil, A Review on Data
Anonymization Techniques for Data Publishing, IJERT, 2012.
[19] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to
Cluster Analysis, John Wiley & Sons, 1990.
[20] G. Ghinita et al., On the Anonymization of Sparse High-Dimensional Data,
in ICDE, pp. 205-216, 2005.
[21] K. LeFevre et al., Mondrian Multidimensional k-Anonymity, in ICDE, p. 25,
2006.
[22] Y. Xu, K. Wang, A. W. C. Fu and P. S. Yu, Anonymizing Transaction
Databases for Publication, in KDD, pp. 767-775, 2008.
[23] H. Wang and R. Liu, Privacy-Preserving Publishing Microdata with Full
Functional Dependencies, Data & Knowledge Engineering, 2011.
[24] L. Sweeney, k-Anonymity: A Model for Protecting Privacy, International
Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, 2002.
[25] K. Wang and B. Fung, Anonymizing Sequential Releases, in SIGKDD, 2006.
[26] X. Xiao and Y. Tao, Personalized Privacy Preservation, in SIGMOD, 2006.
[27] G. Aggarwal et al., Anonymizing Tables, in ICDT, pp. 246-258, 2005.
[28] Agrawal et al., A Framework for High-Accuracy Privacy-Preserving Mining,
IEEE, 2005.
[29] Srikant et al., Limiting Privacy Breaches in Privacy Preserving Data
Mining, ACM, 2003.
[30] V. S. Iyengar, Transforming Data to Satisfy Privacy Constraints, in KDD,
2002.
[31] J. Li, R. Wong, A. Fu and J. Pei, Achieving Anonymity by Clustering in
Attribute Hierarchical Structures, in DaWaK, pp. 405-416, 2006.
[32] B. C. M. Fung, K. Wang and P. S. Yu, Top-Down Specialization for
Information and Privacy Preservation, in ICDE, 2005.
[33] K. Wang et al., Bottom-Up Generalization: A Data Mining Solution to
Privacy Protection, in ICDM, 2004.
[34] P. Samarati, Protecting Respondents' Identities in Microdata Release,
IEEE, 2001.
[35] K. LeFevre et al., Incognito: Efficient Full-Domain k-Anonymity, ACM,
2005.


An Ontology Based System for Predicting Disease using SWRL Rules
Mythili Thirugnanam
Assistant Professor (Senior), School of Computing Science and Engineering
VIT University, Vellore, Tamilnadu, India

Tamizharasi Thirugnanam
Assistant Professor, School of Computing Science and Engineering
VIT University, Vellore, Tamilnadu, India

R. Mangayarkarasi
Assistant Professor (Senior), School of Information Technology and Engineering
VIT University, Vellore, Tamilnadu, India

ABSTRACT
This paper provides information about various diseases to the user with the
help of an ontology. The developed ontology consists of diseases and their
relationships with symptoms, and SWRL (Semantic Web Rule Language) rules are
designed for predicting diseases. The developed ontology system contains two
stages: the first defines the class hierarchy and the object and data
properties; the second executes rules that extract the disease details with
symptoms based on the rule specified. Finally, the inferred axioms are
reflected in the ontology. Various tests show the successful execution of the
ontology, and the analysis and discussion of the results obtained give the
final risk value to the user of the system.
Keywords
Ontology, Disease, Semantic Web Rule Language (SWRL)
1. INTRODUCTION
In today's world, new diseases are identified and diagnosed every day, and
every day normal people become patients. Generally, diseases are identified by
the symptoms our bodies produce. The causes of diseases differ from one
another, but a common cause of most diseases is ignorance, which is itself
caused by a lack of knowledge about the symptoms indicated by the human body.
To address this problem we need a common knowledge-sharing tool that can help
us share knowledge about diseases through their symptoms. Ontology is a
technology created for sharing knowledge in a machine-understandable format.
With the help of ontology, an information system about diseases and symptoms
is created, relationships between diseases and symptoms are identified, and
rules are written using those relationships in order to diagnose a particular
disease from given symptoms. The rules are written in SWRL (Semantic Web Rule
Language) format. The choice of an information system over a database is

to provide precise and relevant information for user queries. Similar disease
information systems using ontology are presently available, but they do not
deal with human diseases [4]. The motivation behind the development of this
system is to spread knowledge about diseases and their symptoms to all kinds
of people, since lack of knowledge about diseases is itself a cause of
disease; this emphasizes the need for a disease information system. Another
important point is that this ontology can be reused to create similar systems,
which makes it very efficient.
2. BACKGROUND STUDY
Maja Hadzic et al. [9] proposed an ontology-based information system for the
biomedical field: an ontology-based grid middleware developed for human
disease research, to resolve medical issues and disease factors. That system
did not concentrate on improving usability. Maja Hadzic et al. [10] introduced
an ontology-based multi-agent system to support human disease study and
control. A new ontology called the Generic Human Disease Ontology (GHDO) was
designed to represent knowledge about human disease; the knowledge
representation includes four dimensions: disease types, symptoms, causes and
treatments. The security and interaction of the developed system were not
appreciable. Maja Hadzic et al. [11] enhanced their study by implementing an
ontology-based disease system. They developed the GHDO to represent disease
knowledge, with the disease information again organized into the four
dimensions of disease types, symptoms, causes and treatments. In addition, the
system aimed to support the study of complex disorders caused by many
different factors simultaneously. Illhoi Yoo et al. [7] studied various
document clustering approaches with the MeSH ontology to improve clustering
quality. The results obtained with the biomedical ontology MeSH enhanced the
clustering quality on biomedical documents and improved the performance of
document clustering approaches. Hongyi Zhang et al. [6] introduced a method
for obtaining the biological functions of genes using the GENE ontology. They
were able to differentiate the relations between parent genes and their child
genes for all types of genes, and from the results it was possible to
construct probabilistic gene regulatory networks with the coefficient of
determination (CoD) method. Akifumi Tokosumi et al. [1] evaluated existing
medical ontologies and proposed future directions for medical knowledge
repository systems around three notions: the localized nature of knowledge,
the collective acquisition of knowledge and usable knowledge repositories; the
locality of knowledge was suggested for use in medical ontologies. Tran Quoc
Dung et al. [15] developed the ontology-based health care information
extraction system VnHIES. An element-extracting algorithm and new semantic
elements were used to extract health care semantic words.

A document weighting algorithm was applied to get a summary of health
information, and the results of the proposed system are more accurate. Tharam
S. Dillon et al. [14] carried out work on ontologies in biomedical information
storage and processing: the importance of ontologies in representing
biomedical knowledge models is discussed, as are the uses of ontologies in
semi-automatic and automatic tasks. Dahua Xu et al. [4] implemented a pest and
disease information system based on WEBGIS, named the Diseases and Pest
Information System (DIPS). DIPS is a pest and disease control and warning
system designed for crop pests. The system architecture consists of different
components and sub-systems, such as the WEBGIS component, the interface
component, the DIPS component used for data access, and the component model
management sub-system. The component model management sub-system manages the
centralization between DIPS instances and manages components such as
mathematical models, pest components and the interface component. The COM+
component technology was used to incorporate object-oriented technology, which
improved the scalability and reusability of the proposed system and enabled
distributed computing. Shastri L. Nimmagadda et al. [13] proposed
ontology-based data warehouse modeling, managing the ecology of the human body
for disease and drug prescription management. The system focused on
introducing the concept of ontology-based warehouse modeling and on
representing the human body system in an ontological representation; the
proposed system was yet to be put into practice. Antonio J. Jara et al. [3]
implemented an ontology- and rule-based intelligent information system to
detect and predict myocardial diseases. The developed system was used in
pre-hospital health emergencies, remote monitoring of patients with chronic
conditions and medical collaboration through the sharing of health-related
information resources. A rule-based system was designed to predict illness by
applying a chronobiology algorithm, and ontology trees were constructed to
provide the knowledge base of diseases. To avoid observation periods in the
hospital, the system was used to send information about the detected symptom
or disease. Though the system was useful, the chronobiology algorithm was not
based on diagnosis, and improvements in the artificial intelligence layer were
required. Ali Adeli et al. [2] developed a fuzzy expert system for heart
disease diagnosis based on the V.A. Medical Center, Long Beach and Cleveland
Clinic Foundation databases. It had 13 input fields and one output field; the
input fields were attributes of heart disease, such as chest pain and resting
electrocardiography (ECG), and the output field was an integer value ranging
from 0 to 4 to denote different levels of heart disease. The system showed 94%
accuracy in classifying heart disease. Ersin Kaya et al. [5] developed a
diagnostic fuzzy rule-based system for congenital heart disease. They
retrieved a medical

dataset of patients from the Pediatric Cardiology Department at Selcuk
University for the years 2000 to 2003. They classified the medical dataset
into 4 groups for fuzzy classification, and fuzzy rules were then created
based on various attributes in the data set, including 8 conditional
attributes and 4 decision attributes. After classifying the fuzzy rules, they
weighted them with two different methods, the weighted vote method and the
single winner method, compared the results, and increased the accuracy of the
classification of congenital heart diseases. Yip Chi Kiong et al. [16]
developed a Health Ontology System to store clinical databases in a shared
cumulative ontology so that they can be processed by machines. The system was
built upon their previous work, the Ontology Generator, Ontology Distiller and
Ontology Accumulator, software tools used in the system to generate
ontologies: the Ontology Generator generates an ontology from a database, the
Ontology Distiller does the reverse process by storing an ontology into a
database, and the Ontology Accumulator integrates similar types of ontologies.
The integration of these tools helped convert small databases into complex
database tables. Lynn Marie Schriml et al. [8] created a disease ontology that
provides a backbone for disease semantic integration. The developed disease
ontology contains a knowledge base of 8043 human diseases. It was designed
with a web interface built for high speed, efficiency and robustness through
the use of a graph database, and it supports querying of disease names,
synonyms and definitions of diseases; this work was not extended to relations
among symptoms, causes and diseases. The above literature clearly shows that
no specific domain ontology has yet been developed for human diseases. Hence,
there is a need for a domain ontology focused only on human diseases, to help
people obtain this information without difficulty. Therefore, this work
focuses on developing an ontology-based system for predicting human diseases.
3. PROCEDURES ADOPTED FOR DEVELOPED SYSTEM
Mythili et al. [12] proposed the ontology-based disease information system
from which the developed system adopts its procedure. The developed system
consists of three phases: the knowledge acquisition phase, the rule engine
phase and the query processing phase. The ontology is created in the knowledge
acquisition phase, the SWRL rules are written in the rule engine phase, and
the query processing phase gets the query from the user and responds to it.


Figure 1. Components of the developed ontology based disease information system


3.1 Knowledge Acquisition Phase
The ontology of a domain is useful for sharing and reusing the explicit
knowledge of that domain, and medical ontology concepts are very much needed.
Creating an ontology and identifying the relationships in it can be done with
the help of a tool called Protégé. Building the ontology consists of different
modules: creating the ontology, creating classes and properties, identifying
relationships and adding records. A sample ontology class construction is
shown in Table 1, and the Protégé implementation is shown in Figure 2.
Table 1. Sample ontology classes and their properties
                 Class 1        Class 2          Class 3
Class name       Heart disease  Chest_Pain       Fever
Object property  has_symptom    is_a_symptom_of  is_a_symptom_of
Data property    has_name       has_id           Has_Temperature


Figure 2. Class Hierarchy (using Protégé).

Here the required knowledge for the system is collected and organized in the
required format. The knowledge is collected from various resources, such as
the World Wide Web, physicians, etc., and the detailed descriptions of
diseases and symptoms are stored in a datasheet. As already mentioned, the
medical sciences field is very vast, so incorporating every disease is not
possible.
3.1.1 Creating Ontology
Creating the ontology is done with the help of the Protégé tool. The required
ontology is created and saved in local storage. The actual code for the
ontology is created by Protégé in XML format, and the extension of the created
ontology is .owl, where OWL stands for the Web Ontology Language. With this
file we can create the desired classes and their properties. Creating an
ontology is a simple step using Protégé: it simply creates the required XML
file and stores it in OWL format so that the file can be accessed as an
ontology. The implementation is shown in Figure 3.


Figure 3. Creating ontology using Protégé.

3.1.2 Creating Classes and Properties
The system classifies diseases by human body system, so each disease system is
considered a subclass of a superclass called diseases. A symptoms class is
also created under the superclass so that it serves as a repository of the
symptoms related to any disease system; the disease system classes and the
symptoms class are siblings. Finally, a class called people is created in
order to relate diseases and symptoms; this people class is disjoint from the
symptoms class. The implemented class architecture is shown in Figure 2. As it
is strongly recommended to identify and add the properties of a class or
subclass in order to make the classes more understandable, properties are
created. The main data properties of all classes are hasName and hasID, and
the main object properties of the diseases, symptoms and person classes are
hasSymptom, isSymptomOf and hasDisease respectively. The creation of data and
object properties using Protégé is shown in Figure 4.


Figure 4. Data Properties and Object Properties

3.1.3 Identifying the Relationships between Classes
The relationship between a system disease and a symptom is the hasSymptom
relationship; for example, stroke hasSymptom dizziness. We can create the
inverse relationship isSymptomOf; for example, dizziness isSymptomOf stroke.
Finally, the person class has two important relations with these two classes:
a relationship with diseases as hasDisease and with the symptoms class as
hasSymptom. For example: Person-1 hasSymptom Dizziness, Fainting, etc. A
sample snapshot of the relationships between classes is shown in Figure 5.

Figure 5. Relationship between classes.

3.2 Rule Engine Phase
The rule engine phase of the system consists of a semantic reasoner and SWRL
rules. The semantic reasoner is used to check the consistency of the
relationships among the classes and their properties (Figure 6). This is done
in order to validate the SWRL rules, which are created from valid
relationships to detect a disease from the given symptoms. For example, if a
person has the symptoms chest pain, shortness of breath, pain in the arms,
dizziness, red eyes and a rapid heartbeat, the system should conclude that the
person has coronary heart disease. Sample SWRL rules were created using
Protégé and the SWRLTab, and the following figures show some of the rules in
graphical format. There is another format for writing these rules, known as
SQWRL, which provides SQL-like querying functionality; those rules are also
shown in graphical format.
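
For illustration, a rule of the kind just described can be written in SWRL's
human-readable syntax roughly as follows. This is a reconstruction based on
the class and property names defined in Section 3.1 (Person, hasSymptom,
hasDisease) and on the coronary-heart-disease example above; the individual
names are hypothetical, and the exact rules used in the system appear in the
figures:

    Person(?p) ^ hasSymptom(?p, Chest_Pain) ^ hasSymptom(?p, Shortness_Of_Breath) ^
    hasSymptom(?p, Pain_In_Arms) ^ hasSymptom(?p, Dizziness) ^
    hasSymptom(?p, Fast_Heartbeat)
        -> hasDisease(?p, Coronary_Heart_Disease)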

Figure 6. SWRL Reasoner Dialogue - checking consistency


Figure 7. SQWRL and SWRL Rules in graphical format.


3.3 Query Processing Phase
In the query processing phase, user interaction is managed. When the user
enters symptoms he or she has, the system calls the query processor, which
checks the SWRL rules for relations to diseases and returns the diseases
associated with the symptoms entered. The query processor then displays the
output to the user. In Protégé this querying is done with the help of the
JessTab.
4. RESULT & DISCUSSION
The SWRL and SQWRL rules are the main form of output for retrieving
information from the ontology. After checking the consistency of

the class hierarchy, the SWRL rules are executed in the Jess inference engine.
The output of the Jess inference tab is shown in Figure 8.

Figure 8. SWRL output (Jess Inference process)

Figure 9. SWRL output (Inferred Axioms).


4.1 Result Analysis
The results are successfully obtained from the Jess inference engine and the
SQWRL Tab. Part of the inferred axioms from the ontology is shown in
Figure 9. The Jess result is of the form "person-x hasDisease Name", and the
SQWRL output is of the form "Person, Disease". Both results are shown in
Figure 10 and Figure 11.


Figure 10. Output of SQWRL


These results are stored in the properties of each instance of the class. An
instance of the person class who has, for example, the symptoms palpitations
and shortness of breath may have the disease hypertensive cardiopathy. This is
defined in the SWRL rules, and the output is stored into the ontology.

Figure 11. Replication of output in ontology after using Jess.


These results can be viewed using visualization tools. Jambalaya is a
visualization tool that provides options to query the ontology; we customized
the tool so that it works as per this project's requirements. Some of the
visualizations are shown in Figure 12.


Figure 12. Visualization using Jambalaya.


4.2 Discussion
This system depicts the use of ontology as a knowledge-acquisition method and
successfully demonstrates it in the disease-symptoms domain. It is an

initiation, an example of how powerful semantic web technologies can be in
knowledge sharing, and it envisages the reusability of ontologies.

5. CONCLUSION AND FUTURE WORK


This paper provides an approach for creating a disease information system
about diseases and symptoms with the help of an ontology. The developed
disease information system will help users to be aware of diseases and their
symptoms and help them take viable actions. This work thus points out the
importance of creating a disease information system and demonstrates a
successful one. Future developments of the system include the inclusion of
other diseases, of cause-disease relationships and of treatment-disease
relationships, and presenting the ontology along with an easily accessible
user interface.

REFERENCES

[1] Akifumi Tokosumi, Naoko Matsumoto and Hajime Murai, Medical Ontologies as
a Knowledge Repository, Proceedings of the IEEE/ICME International Conference
on Complex Medical Engineering, 2007, pp. 487-490.
[2] Ali Adeli and Mehdi Neshat, A Fuzzy Expert System for Heart Disease
Diagnosis, Proceedings of the International MultiConference of Engineers and
Computer Scientists, Vol. I, IMECS 2010, Hong Kong, March 17-19, 2010.
[3] Antonio J. Jara, Francisco J. Blaya, Miguel A. Zamora and Antonio F. G.
Skarmeta, An Ontology and Rule Based Intelligent Information System to Detect
and Predict Myocardial Diseases, Proceedings of the 9th International
Conference on Information Technology and Applications in Biomedicine, 2009.
[4] Dahua Xu, Yinsheng Zhang and Huilin Shan, Development of a Diseases and
Pest Information System Based on WEBGIS, Proceedings of the International
Seminar on Future BioMedical Information Engineering, pp. 461-464.
[5] Ersin Kaya, Bulent Oran and Ahmet Arslan, A Diagnostic Fuzzy Rule-Based
System for Congenital Heart Disease, World Academy of Science, Engineering and
Technology, Vol. 54, 2010, pp. 253-256.
[6] Hongyi Zhang, Jiexin Pu and Junying Zhang, Construction of Gene Regulatory
Networks Based on Gene Ontology and Multivariable Regression, Proceedings of
the International Conference on Mechatronics and Automation, 2006,
pp. 1324-1328.
[7] Illhoi Yoo and Xiaohua Hu, Biomedical Ontology MeSH Improves Document
Clustering Quality on MEDLINE Articles: A Comparison Study, Proceedings of the
19th IEEE Symposium on Computer-Based Medical Systems, 2006, pp. 577-582.
[8] Lynn Marie Schriml, Cesar Arze, Suvarna Nadendla, Yu-Wei Wayne Chang, Mark
Mazaitis, Victor Felix, Gang Feng and Warren Alden Kibbe, Disease Ontology: A
Backbone for Disease Semantic Integration, Nucleic Acids Research, Vol. 40,
2011, pp. 947-956.
[9] Maja Hadzic, Elizabeth Chang, Pronpit Wongthongtham and Robert Meersman,
Disease Ontology Based Grid Middleware for Human Disease Research Study,
Proceedings of the 30th Annual Conference of the IEEE Industrial Electronics
Society, Vol. 1, 2004, pp. 480-486.
[10] Maja Hadzic and Elizabeth Chang, Ontology-Based Multi-Agent System to
Support Human Disease Study and Control, Proceedings of the Conference on
Self-Organization and Autonomic Informatics, 2005, pp. 129-141.
[11] Maja Hadzic and Elizabeth Chang, Ontology-Based Support for Human Disease
Study, Proceedings of the 38th Hawaii International Conference on System
Sciences, 2005, pp. 1-7.
[12] Mythili Thirugnanam, Mangayarkarasi, Pattabiraman and Sivakumar, Ontology
Based Disease Information System, Proceedings of the International Conference
on Modeling Optimization and Computing, 2012, pp. 3235-3241.
[13] Shastri L. Nimmagadda, Sashi K. Nimmagadda and Heinz Dreher, Ontology
Based Data Warehouse Modeling and Managing Ecology of Human Body for Disease
and Drug Prescription Management, Proceedings of the Second IEEE International
Conference on Digital Ecosystems and Technologies, 2008, pp. 212-220.
[14] Tharam S. Dillon, Elizabeth Chang and Maja Hadzic, Ontology Support for
Biomedical Information Resources, Proceedings of the 21st IEEE International
Symposium on Computer-Based Medical Systems, 2008, pp. 7-16.
[15] Tran Quoc Dung and Wataru Kameyama, A Proposal of Ontology-Based Health
Care Information Extraction System: VnHIES, Proceedings of the IEEE
International Conference on Research, Innovation and Vision for the Future,
2007, pp. 1-7.
[16] Yip Chi Kiong, Sellappan Palaniappan and Nor Adnan Yahaya, Health
Ontology System, Proceedings of the 7th International Conference on IT in Asia
(CITA), 2011.


Performance Evaluation of Web Services in C#, JAVA, and PHP
Dr. S. Sagayaraj
Assistant Professor, Department of Computer Science
Sacred Heart College (Autonomous), Tirupattur - 635 601
Tamil Nadu, India

M. Santhosh Kumar
Research Scholar, Department of Computer Science
Sacred Heart College (Autonomous), Tirupattur - 635 601
Tamil Nadu, India

ABSTRACT
Web Services integrate Internet and Web technologies. Individuals and
industries place their core business processes on the Internet as collections
of Web Services. The Command Line Interface (CLI) and the Graphical User
Interface (GUI) are the two approaches used to develop Web Services, and the
performance of Web Service development languages and tools varies from one to
another. This paper presents a performance evaluation of GUI-based Web
Services developed in C#, JAVA, and PHP, with the aim of improving Web Service
usability. Quantitative and qualitative methodologies are applied for the
evaluation. The languages are compared using a Calculator Web Service
developed in C#, JAVA, and PHP, and the value of each criterion is analyzed
manually to identify the better language for Web Service development.
Keywords
Web Service, Performance, Evaluation, Methodologies.
1. INTRODUCTION
The Internet provides plenty of services for the information society. Its
power lies in the fact that it holds and encourages everyone to contribute
their creative ideas, knowledge and works and to make them available to others
interactively on the Internet [1]. The Internet is a distributed environment
in which services can be accessed everywhere and at every time. Extensible
Markup Language (XML) is used to represent data on the Internet, and any
application written in any language running on any platform can process XML
data [2]. Web services communicate using XML, a text-based protocol that all
applications can understand. Before the arrival of Web services, other
technologies and architectures met the functionality of today's web service;
Microsoft's versions of these technologies are the Component Object Model
(COM) and, consequently, Distributed COM (DCOM) and WinDNA.
Software industries work with various vendors, suppliers, contractors and
other entities who have developed software systems on homogeneous platforms
with huge investments, and it is almost impossible for any of them to change
their systems for compatibility with heterogeneous platforms. Software
developers and the software industry therefore need an independent,
network-accessible application, which is called a web service [3]. A Web
Service provides functionality to a consumer through the Internet or an
intranet via a programmable Uniform Resource Locator (URL) and functions
called over the Internet. CLI and GUI are the two approaches used to develop
Web Services with different languages and tools.
Since web service performance can vary from one language to another, users
look for the best service among those available and must go through the
various web services to find the one best suited to their domain. It is
therefore useful to compare the performance of web services developed in
different languages and tools. This paper takes a few GUI languages, designs a
specific service, and evaluates the web service performance.
The earlier research work in this area is discussed in section 2. The proposed
framework is presented in section 3, and section 4 describes the evaluation of
web services. The comparison between the calculator application and the
calculator Web Service is presented in section 5. The analysis and results are
briefed in section 6. Section 7 provides the conclusion and future work.

2. RELATED WORKS
The development and usability of web services have grown greatly. The various
research works on the performance comparison of Web services are summarized in
the following section.
In 2005, Sanjay P. Ahuja and Raquel Clark focused on evaluating the
performance of several web service technologies [4]. They considered
quantitative elements such as time to market, availability of plug-in
software, maintainability, language support, portability, scalability and
cost.
Chen Wu and Elizabeth Chang discussed Web Service architectures based on
quality properties in 2005. Two architectural styles were considered,
broker-based and peer-to-peer, with qualitative and quantitative elements such
as loose coupling, interoperability, scalability, simplicity, extensibility,
performance, security, reliability, visibility and compatibility. Finally,
they evaluated the web service architecture performance [5].
In 2008, Toyotaro Suzumura et al. explored web services development in a few
GUI and CLI languages. They compared the web service languages PHP, JAVA and C
and reported that PHP performs better than the other two languages based on
qualitative and quantitative elements. They incorporated GUI-based languages
for the development of web services in a domain [6].
Tommi Aihkisalo and Tuomas Paaso focused on a performance comparison of web
service object marshalling and unmarshalling solutions in 2012. They described
the performance of the underlying service or business logic, which is usually
stable, and evaluated the web service object performance [7].
3. CJP FRAMEWORK
The C-sharp Java Php (CJP) framework is developed for heterogeneous-platform
web services using the GUI languages C#, JAVA and PHP, as shown in Figure 1.
GUI languages reduce designing time and increase end-user interaction.
Generally, the performance of web service development languages and tools
varies from one to another [8]. To test web service performance, the
researcher has taken a simple calculator application and compared it with
calculator web services designed using the GUI languages of this framework.
Under the CJP process framework [9-10], three web services were developed
separately in the above-mentioned languages. The C# calculator web service is
developed in the .NET framework and published in the IIS server. The C# web
service is not only a standalone application for the client; it can also be
accessed through a URL by generating a proxy.
The Java calculator web service is developed in the Eclipse IDE and published
in the Apache Tomcat server. The service location is created, and the Java web
service is compiled and run with the Ant tool. This tool compiles the Java
code, locates the service directory, adds the manually built Java Archive
(JAR) files and creates a deployable web service package, which is then
deployed in the Apache Tomcat server. The web service uses a URL to verify
that it is running properly and generates its output in SOAP message format.
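
For illustration, a minimal Java calculator service of the kind described here
could be written with standard JAX-WS annotations as sketched below. This is
not the paper's actual source code (which is built with Ant and deployed to
Tomcat); the class name and publish URL are taken from the paper's examples
only for concreteness:

    import javax.jws.WebMethod;
    import javax.jws.WebService;
    import javax.xml.ws.Endpoint;

    @WebService
    public class Wjs {
        @WebMethod public int add(int a, int b)      { return a + b; }
        @WebMethod public int subtract(int a, int b) { return a - b; }
        @WebMethod public int multiply(int a, int b) { return a * b; }
        @WebMethod public int divide(int a, int b)   { return a / b; }
        @WebMethod public int modulus(int a, int b)  { return a % b; }

        public static void main(String[] args) {
            // Publishes the service with the JDK's built-in lightweight server;
            // the WSDL then becomes available at this URL + "?wsdl".
            Endpoint.publish("http://localhost:8081/Wj/services/Wjs", new Wjs());
        }
    }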
The PHP calculator web service is developed with HTML and published in the
Wamp server. The Wamp server provides the www workspace folder, and the
developer copies the .php file into the workspace to run it with the server;
the service is then ready to be accessed by the client through its URL.
The building and deployment of the C# and PHP web services are very easy
compared to the JAVA web service. These web services are similar in
functionality, but their performance varies from one to another. The CJP
process framework evaluates the performance of the above-mentioned web
services.
The performance of the web services is evaluated based on quantitative and
qualitative elements. The quantitative measured data are time utilization,
memory footprint, SOAP message comparison and lines of code. The qualitative
measured data are testing methodology, scalability, security and
maintainability. These element values are measured manually with the
calculator web service.

Figure 1. CJP web service process framework architecture

4. EVALUATION OF WEB SERVICES


Web services are evaluated based on quantitative and qualitative elements,
depicted in Figure 2. Each element value varies from one service to another,
and the evaluation compares how the web services vary across these languages.
4.1 Quantitative Elements
Quantitative evaluation is the representation of measures; a measure presents
a collection of data or numbers [11]. The data calculated from measurements of
different fields to analyze a particular field constitute the evaluation. The
quantitative elements are evaluated in the following section.
Time Utilization [12] - The time each web service takes to run and deploy in
its environment is measured, and the web service development languages are
compared on it.
Memory Footprint [13] - The memory footprint measures each web service's
memory usage; this element is compared across the web services developed in
these languages.


Figure 2. Comparisons of Web Services in C#, JAVA, PHP

SOAP Message Comparison [14] - This comparison is based on the input and
output SOAP message tags; SOAP messages provide the communication between web
services.
Line of Code (LOC) [15] - Compares the number of code lines taken by the
calculator web services.
4.2 Qualitative Elements
Qualitative evaluation is the representation of excellence; it presents the
merit of different fields in order to analyze a particular field [16]. The
qualitative elements are evaluated in the following section.
Testing [17] - Validation and verification testing is performed and the
results are compared across the web services.
Scalability [18] - The developer adds one or more methods to the calculator
web service and compares scalability based on CPU utilization.
Security [19] - After publishing in the server, web service security is
checked and compared.
Maintainability [20] - The web service development languages use syntax
derived from Java and C; developers can compare the maintainability of these
web services.
5. IMPLEMENTATION
Computers perform complex calculations and series of calculations; before
commercial computers, calculators dominated the world of calculation and
business. To test performance, the developer has designed simple calculator
web services providing addition, subtraction, multiplication, division and
modulus operations. As a proof of concept, the researcher compares the
performance of the simple calculator application and the calculator web
service; their user interfaces are discussed in the following sub-sections.
5.1 Calculator Application
The calculator application allows the user to perform five basic mathematical
operations. It consists of 17 buttons and one text box, as shown in Figure 3.

Figure 3. Calculator Interface


5.2 Calculator Web Services
Calculator web services are developed in C#, JAVA, and PHP that allow
the user to perform the five mathematical operations, as per the following
steps:
Projects are created and named Cal.aspx, Wsj.JAVA, and ws.PHP.
Addition, subtraction, division, multiplication and modulus methods
are created. Each web service is deployed and published at the
following URLs:
http://localhost:2773/cal/Service.asmx
http://localhost:8081/Wj/services/Wjs
http://localhost:8080/ws.PHP
The user enters the input in the text boxes and clicks the calculate button.
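The paper does not reproduce the service source code. As a hedged sketch only, the JAVA calculator service could be declared with JAX-WS roughly as follows; the method names are assumptions, while the class name and URL follow the listing above:

import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

@WebService
public class Wjs {
    @WebMethod public int add(int a, int b)      { return a + b; }
    @WebMethod public int subtract(int a, int b) { return a - b; }
    @WebMethod public int multiply(int a, int b) { return a * b; }
    @WebMethod public int divide(int a, int b)   { return a / b; }
    @WebMethod public int modulus(int a, int b)  { return a % b; }

    public static void main(String[] args) {
        // Publish the service at the JAVA URL listed above.
        Endpoint.publish("http://localhost:8081/Wj/services/Wjs", new Wjs());
    }
}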
5.3 Comparison between User Interfaces
The three web services and the calculator application are compared based
on their user interfaces. The C# web service interface is shown in Figure 4
and the JAVA web service interface in Figure 5; both have one invoke
button, two text boxes and one submit button. The PHP web service
interface includes two text boxes, five option buttons and one submit
button, as shown in Figure 6. The output of these web services is XML or
text based. The three web services have fewer user-interface components
compared to the
calculator application. The interfaces of the calculator application and the
three web services are compared in Table 1.
Table 1. Interface components of the calculator application and the calculator web services

Sl. No | Service                 | Input Interface                | Output Format
1      | Calculator Application  | 17 buttons                     | 1 text box
2      | C# Web Service          | 1 invoke button, 2 text boxes  | XML format
3      | JAVA Web Service        | 1 invoke button, 2 text boxes  | XML format
4      | PHP Web Service         | 2 text boxes, 5 option buttons | XML format / text

Figure 4. C# Calculator Web Service

Figure 5. JAVA Calculator Web Service


Figure 6. PHP Calculator Web Service

6. ANALYSIS AND RESULTS


The analysis compares the performance of the three web services and
identifies the better language for web service application development. The
quantitative and qualitative measured data are obtained manually from the
calculator web service implementations. Web service performance is
compared based on the following measured data.
6.1 Time Utilization
The deployment times of the calculator web services are obtained and
compared. The time utilization of the web services is depicted in Figure 7.

Figure 7. Time utilization

The deployment time of the calculator web service developed in C# is
0.0037 sec, in JAVA 0.0034 sec and in PHP 0.0030 sec. Figure 7 shows
that PHP is the better web service language in terms of time utilization.
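The paper records these times manually. As a rough illustration only, the figure for the JAVA service might be reproduced by timing the publish call, reusing the hypothetical Wjs class sketched in Section 5.2:

import javax.xml.ws.Endpoint;

public class DeployTimer {
    public static void main(String[] args) {
        long start = System.nanoTime();
        // Wjs is the hypothetical calculator service class from Section 5.2.
        Endpoint endpoint = Endpoint.publish("http://localhost:8081/Wj/services/Wjs", new Wjs());
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Deployment time: %.4f sec%n", seconds);
        endpoint.stop(); // release the port once the measurement is done
    }
}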
6.2 Memory Footprint
The memory utilized by each calculator web service is obtained and
compared. The memory footprints of the web services are shown in
Figure 8. The memory footprint of the calculator web service developed
in C# is 400 MB, in JAVA 375 MB and in PHP 380.2 MB. From these data,
the best language in terms of memory usage is JAVA.
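The MB figures above were read manually. One rough way to sample memory from inside a Java service is sketched below; note it reports only the JVM heap, not the full process footprint the paper compares:

public class FootprintProbe {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // request a collection so the reading is less noisy
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        System.out.printf("Heap in use: %.1f MB%n", usedBytes / (1024.0 * 1024.0));
    }
}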


Figure 8. Memory Footprint


6.3 SOAP Messages
The input and output SOAP message tags are identified for the calculator
web services and compared. The C# and JAVA calculator web services each
use two input and one output SOAP message tags, while the PHP calculator
web service uses three input and one output SOAP message tags. The C#
and JAVA languages are therefore better than PHP in SOAP message tag
utilization.
6.4 Line of Code
The LOC for the calculator web services is estimated and compared. The
calculator web service developed in C# takes 10 lines, JAVA 10 lines and
PHP 28 lines. From these, the C# and JAVA web services are better in
terms of LOC.
6.5 Testing
The three web services are tested and their results compared. The C# web
service is developed in the .NET IDE, which provides validation and
verification, so the developer needs to write only a little code for testing.
The JAVA web service is developed in the Eclipse IDE and the PHP web
service in HTML, which do not provide validation and verification, so the
developer writes test methods and spends much more time testing these web
services. The C# calculator web service requires the least testing time
compared to the other services.
6.6 Scalability
The scalability of the web services is estimated based on CPU utilization
after adding methods to the calculator web services. The scalability of the
web services is shown in Figure 9. The C# and PHP calculator web services
show equal CPU utilization, lower than that of the JAVA calculator web
service, and hence better scalability.
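The CPU utilization behind Figure 9 is observed manually. A minimal sketch of sampling the system load from Java, which could be run while methods are added to a service, is:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class CpuProbe {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // System load averaged over the last minute; returns -1.0 where unsupported.
        System.out.println("System load average: " + os.getSystemLoadAverage());
    }
}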


Figure 9. Scalability
6.7 Security
After publishing the web services on the server, the security aspects are
identified and compared. Table 2 compares the web services' security. In
the C# and JAVA web services, the code is not editable on the server; the
PHP web service code is editable. So C# and JAVA are the languages that
provide better security.
Table 2. Security on Calculator Web Services

Sl. No | Calculator Web Service | Security
1      | C#                     | After the service is published, the code is not editable in the server (Good)
2      | JAVA                   | After the service is published, the code is not editable in the server (Good)
3      | PHP                    | After the service is published, the code is editable in the server (Average)

6.8 Maintainability
Maintenance of a web service depends on the syntax of its language. C#
includes syntax from C++ and Java, Java derives its syntax from C and
C++, and PHP includes C and C++ syntax. C#, Java and PHP all support
client and server programming and are easy to maintain by developers who
know these base languages. Hence maintainability is good for all three
languages.

The comparison of quantitative and qualitative elements is presented in
Table 3. The C# calculator web service fulfills six of the eight attribute
values, the JAVA calculator web service five of eight, and the PHP
calculator web service three of eight. The ratio [21] of the analysis is
6:5:3 and the percentages [22] are 75% : 62.5% : 37.5%. The outcome of
the comparison shows that C# is the better language among the three.
Table 3. Analysis of Quantitative and Qualitative elements

Sl. No | Measured Data    | C#         | JAVA       | PHP
1      | Time Utilization | 0.0037 Sec | 0.0034 Sec | 0.0030 Sec
2      | Memory Footprint | 400 MB     | 375 MB     | 380.2 MB
3      | SOAP Messages    | 3 Tags     | 3 Tags     | 4 Tags
4      | Line of Code     | 10 Lines   | 10 Lines   | 23 Lines
5      | Testing          | Good       | Average    | Average
6      | Scalability      | Good       | Average    | Good
7      | Security         | Good       | Good       | Average
8      | Maintainability  | Good       | Good       | Good
Total fulfilled attributes | 6 | 5 | 3

7. CONCLUSION
Web services manage complexity and maximize the reuse of code.
Constructing software systems from reusable web services brings many
advantages to industry and individuals: cost, effort and time spent on
testing, security, scalability and maintainability are reduced by reusing
web services. These web services are easy to handle on all

platforms and reduce the number of components compared to the existing
calculator application. This paper focused on quantitative and qualitative
measured data for the performance evaluation of web services. The
measured data helped the developer to reason out how web services
developed in C#, JAVA, and PHP differ and to identify the better web
service. After analyzing the results, C# is the better language for
implementing simple web services. This paper covers only the performance
evaluation of the development and usability of web services in C#, JAVA,
and PHP. Future research can integrate the web services into applications
and add more modules in different domains to obtain the quantitative and
qualitative elements; the collection of data for these elements, done
manually here, can be automated.

REFERENCES
[1] Leonard Kleinrock, UCLA, "History of the Internet and Its Flexible Future", IEEE, 2008.
[2] S. Arroyo, R. Lara, J. Gomez, D. Berka, Y. Ding and D. Fensel, "Semantic Aspects of Web Services", Practical Handbook of Internet Computing, Chapman & Hall and CRC Press, 2004.
[3] Why develop Web services, information available at URL: http://www.vkinfotek.com/webservice/whycreatewebservice.aspx.
[4] Sanjay P. Ahuja and Raquel Clark, "Comparison of Web Services Technologies from a Developer's Perspective", IEEE, 2005.
[5] Chen Wu and Elizabeth Chang, "Comparison of Web Service Architectures Based on Architecture Quality Properties", IEEE, 2005.
[6] Toyotaro Suzumura, Scott Trent, Michiaki Tatsubori, Akihiko Tozawa and Tamiya Onodera, "Performance Comparison of Web Service Engines in PHP, Java, and C", IEEE, 2008.
[7] Tommi Aihkisalo and Tuomas Paaso, "A Performance Comparison of Web Service Object Marshalling and Unmarshalling Solutions", IEEE, 2011.
[8] Framework definition, available at URL: http://whatis.techtarget.com/definition/framework
[9] Roger S. Pressman, Software Engineering: A Practitioner's Approach, 6th Edition, Tata McGraw-Hill, 2010.
[10] Sommerville, Software Engineering, 8th Edition, Pearson Education, 2011.
[11] C. R. Kothari, Research Methodology: Methods and Techniques, New Age International (P) Limited, 2004.
[12] Dranidis, D., Ramollari, E. and Kourtesis, D., "Run-time Verification of Behavioural Conformance for Conversational Web Services", IEEE, 2009.
[13] Q. Hu, A. Vandecappelle, P. G. Kjeldsberg, F. Catthoor and M. Palkovic, "Fast Memory Footprint Estimation Based on Maximal Dependency Vector Calculation", IEEE, 2007.
[14] Al-Shammary, D. and Khalil, I., "Dynamic Fractal Clustering Technique for SOAP Web Messages", IEEE, 2011.
[15] Morozoff, E. P., "Using a Line of Code Metric to Understand Software", IEEE, 2010.

[16] Y. K. Singh, Fundamental of Research Methodology and Statistics, New Age International (P) Limited, 2006.
[17] Minzhi Yan, Hailong Sun, Xu Wang and Xudong Liu, "WS-TaaS: A Testing as a Service Platform for Web Service Load Testing", IEEE, 2012.
[18] Litoiu, M., "Migrating to Web Services - Latency and Scalability", IEEE, 2002.
[19] Bertino, E. and Martino, L., "Tutorial 6: Security in SOA and Web Services", IEEE, 2006.
[20] Jinbo Huang and Anqing Liu, "Research on the Maintainability Analysis and Verification Methods of Armored Equipment Based on Virtual Reality Technology", IEEE, 2011.
[21] S. C. Gupta and V. K. Kapoor, Fundamentals of Applied Statistics, 4th Edition, Sultan Chand & Sons, 2010.
[22] R. Panneerselvam, Research Methodology, PHI, New Delhi, 2005.


Semi-Automated Polyhouse
Cultivation Using LabVIEW
Prathiba Jonnala
School of Electronics, Vignan University
Vadlamudi, Andhra Pradesh, India

Sivaji Satrasupalli
School of Electronics, Vignan University
Vadlamudi, Andhra Pradesh, India

ABSTRACT
An optimum solution for polyhouse maintenance with minimum hardware and human
effort is developed. Using the proposed model, the temperature inside the polyhouse can
be measured and controlled. An LM35 sensor is used to measure the temperature inside the
polyhouse, and a cooling fan is used to maintain a constant temperature. These hardware
components are interfaced using the National Instruments ELVIS-II board, and the overall
implementation is done with LabVIEW programming. The proposed model provides a
cost-effective solution and has the advantage of easy installation. The model operates on a
given threshold value of temperature: whenever the temperature rises beyond the
programmed value, the cooling fan starts working to lower the temperature without any
further manual instruction. This automatic cooling function reduces human effort to a
great extent, and cultivation can be performed fruitfully by providing optimum conditions
for plant growth. For this proposed model the threshold temperature is taken as 35°C. The
status of the environmental conditions inside the polyhouse can be observed on a computer
with the help of LabVIEW. The GUI provided by LabVIEW shows the temperature value,
its conditional parameters and the status of the cooling fan.
Keywords
DAQ, ELVIS II, LabVIEW, LM35, Polyhouse cultivation, Sensor.
1. INTRODUCTION
Polyhouses are constructed from polythene sheets to provide a secured
and controlled environment for proper plant growth. Growing plants in the
controlled environment of a polyhouse gives the advantage of high yield
irrespective of environmental changes, climatic changes and location. It
also provides a suitable environment for plant growth and protects the
plants inside from abnormal weather conditions and from various plant
diseases. The environment required for plant growth and increased
productivity can be met by adopting the polyhouse cultivation method.
Automation is crucial for controlling the environment inside the polyhouse.
For proper plant growth and high yield, the different climatic parameters
need to be monitored and controlled continuously. A few of the parameters
that can be monitored and controlled are temperature, humidity,


soil moisture and light intensity inside the polyhouse. For good plant
health and increased crop yield, the above-mentioned parameters are to be
monitored at fixed intervals.
Various papers have been published on the automation of polyhouses over
the past few years, with different controlling modules and monitoring
stations using various technologies. In Table 1 the existing mechanisms
are compared based on various controlling parameters. Purna Prakash
Dondapati proposed an Automated Multi Sensor Green House Management
system [1], which explains how to overcome the effects and disadvantages
observed in normal cultivation without human observation. It also explains
the effective working of the sensors that make the project automated and
yield more useful results in cultivation. The Greenhouse Automation
System [2] proposed by Uday A. Waykole used WSN technology, in which
Zigbee is used for node-to-node communication. In this system [2], a
wireless sensor network collects information point to point; the
environmental parameters inside the greenhouse are measured by sensors,
and the collected data is stored in a database and further transferred to the
receiving station. Finally, the information received by the receiving station
is displayed on an LCD display unit and further monitored.
The system [3], an automated greenhouse management system proposed
by Sumit A. Khandelwal, used GSM for node-to-node communication.
Compared to the previously stated systems [1] and [2], this system monitors
more environmental parameters, such as fire, absence of light and rain, in
addition to shade and light, and also controls a motor pump. As it monitors
a greater number of crucial parameters, it has the advantage of increasing
crop productivity, and it provides remote access and control through GSM.
Indu Gautam proposed a system [4] titled Innovative GSM Bluetooth based
Remote Controlled Embedded System for Irrigation, which has the
advantage of using both GSM and Bluetooth based on the position of the
farmer or end user. When the user is within 10 meters of the controlling
unit, the information is transmitted through the Bluetooth module; whenever
the user is beyond 10 meters, the information is transmitted using the GSM
module. This system provides information about electricity consumption,
temperature, humidity, water level and fire accidents, which can be
monitored with the help of the MQ2 smoke sensor. The main advantages of
this system are that it monitors more parameters and offers a cost-effective
solution for communicating with the user/farmer.
K. Rangan and T. Vigneswaran proposed an Embedded Systems Approach
to Monitor Green House [5]. This system [5] monitors parameters like
humidity, water pH, soil wetness, light intensity and temperature with


the help of respective sensor units placed in various locations; the data is
collected by the main controlling unit, processed and analysed according to
the program, and the information is sent to the receiving end using a GSM
modem. Kiran Sahu and Mrs. Susmita Ghosh Mazumdar proposed a Digital
Greenhouse Monitoring and Controlling System based on Embedded
System [6], a wired technique for monitoring parameters such as
temperature, humidity, soil moisture and sunlight of the environment inside
the greenhouse, for achieving proper plant growth and high crop yield.
Rajeev G Vishwakarma and Vijay Choudhary proposed a system called
Wireless Solution for Irrigation in Agriculture [7], which helps the farmer
control up to eight devices from a remote location using GSM with specific
commands for controlling water motors and the like, thereby utilizing less
manpower and human effort. Vandana Pandya and Deepali Shukla proposed
a GSM Modem Based Data Acquisition System [8] that collects information
about temperature, rainfall, humidity etc.; the acquired data is processed by
an ATmega 644P and the collected information is transmitted to the end
user through SMS over GSM. G. K. Banerjee and Rahul Singhal proposed a
Microcontroller based Polyhouse Automation Controller [9], which
measures temperature and humidity inside the polyhouse and uses an LCD
display for monitoring; the whole system is based on wired communication.
Table 1. Comparison of Existing Remote Monitoring and Control Systems [11]

Reference | Technology           | Processor   | Monitoring Station
[1]       | Wired                | AT89C52     | LCD Display
[2]       | Zigbee, WSN          | PIC         | LCD Display
[3]       | GSM Modem            | (-)         | PC
[4]       | GSM Modem, Bluetooth | PIC16F877A  | Mobile
[5]       | GSM Modem            | PIC16F877A  | Mobile, LCD
[6]       | Wired                | AT89C51     | LCD Display
[7]       | GSM                  | AT89C51     | Mobile, LCD
[8]       | GSM                  | ATmega 644P | Mobile, PC
[9]       | Wired                | PIC16F877A  | LCD Display

Both the proposed systems [1] and [6] are based on wired technology and
use an LCD display at the monitoring station for monitoring the parameter
information; the controllers inside these two systems are the AT89C52 and
AT89C51. As PIC microcontrollers have the advantage of an inbuilt ADC,
the system [9] mentioned above used a PIC16F877A, along with the same
wired technology and LCD display for the monitoring station as the


systems proposed in [1] and [6]. The system [7] is implemented with
wireless technology, using GSM for communication, an AT89C51
microcontroller for processing and analysing, and an LCD for monitoring.
The models [4] and [5] used the PIC16F877A controller with GSM for
communication and a mobile phone as the monitoring station. In contrast to
[5], the system [4] also used Bluetooth for communicating information to
the end user, provided the end user's mobile device supports Bluetooth. The
system [8] uses a PC for monitoring, GSM for communication, and an
ATmega 644P as the main controlling module.

The problems and complexities of the above-mentioned techniques are
overcome with the proposed system of polyhouse cultivation using
LabVIEW. Here the external hardware requirement is minimal. The
hardware modules used in the proposed system are a temperature sensor
for monitoring the temperature inside the polyhouse and a cooling system
(a fan) to maintain the required temperature. One LM35 temperature sensor
is used for monitoring the temperature, and a cooling fan is used for
maintaining a constant temperature. The overall monitoring of the proposed
system is done using the LabVIEW software. In this proposed model a PC
is the central monitoring unit.

2. HARDWARE DESIGN OF PROPOSED POLYHOUSE AUTOMATION SYSTEM
2.1 Hardware architecture

Figure 1. Block level representation of the Proposed Model (LM35 temperature sensor → ELVIS II → LabVIEW with ELVISmx 4.4 → cooling fan)


The LM35 is used to read the ambient temperature at the target place, i.e.
inside the polyhouse. The output of the LM35 is an analog voltage in the
range 0-5 V: zero volts represents 0°C and the output increases by 10 mV
for every 1°C rise in temperature. The output of the LM35 is interfaced to
the ELVIS-II board through its input and output pins. The data acquisition
system of LabVIEW is used to read

the analog voltage from the ELVIS-II board. After the required data is
acquired from the DAQ, it is compared with the preset temperature, so that
if the ambient temperature is greater than the preset temperature the DC
motor is switched ON, which in turn switches on the fan; once on,
irrespective of the ambient temperature, the fan stays in the ON state for
three minutes. The proposed system is designed so that the same process
repeats after every three minutes.
2.2 ELVIS-II
The ELVIS-II was developed by National Instruments (NI). The ELVIS-II
series combines hardware and software into one complete laboratory suite.
The ELVIS-II series prototyping board connects to the workstation and
supports all the signal terminals of the NI ELVIS-II series through the
distribution strips on either side of the breadboard area. Eight different AI
channels are available; in the proposed system the LM35 output is
connected to AI0. As the ELVIS-II has a variable power supply, the power
required for the LM35 is also drawn from the ELVIS-II itself.
The NI ELVISmx software includes SFP instruments, LabVIEW Express
VIs, and SignalExpress blocks for programming the ELVIS-II hardware. To
access the NI ELVISmx Express VIs, open the LabVIEW block diagram
and select Measurement I/O. The board has 24 digital I/O pins; of these,
one is used for an LED and another for switching the temperature
controlling element, the fan, ON/OFF.
2.3 Temperature sensor

Figure 2. LM35
The LM35 is a temperature sensor with an output range of 0°C to 100°C
and a sensitivity of 10 mV/°C. It has very good linear characteristics and
various advantages suitable for industrial applications. The output of the
device is an analog voltage with inbuilt signal conditioning, and it requires
a +5 V DC supply. For this proposed model it is operated in the range of
30°C to 50°C, and it showed suitably linear characteristics over 10 repeated
runs with little hysteresis. Among the various temperature sensors
available, the LM35 is used for implementing the proposed system because
of its advantages: it is more accurate than a thermistor, its sealed package
prevents oxidation, and it produces higher output voltages than
thermocouples, so no further amplification is needed. It also does not
require any external calibration or trimming to provide typical


accuracies of ±1/4°C at room temperature and ±3/4°C over the full -55°C
to +150°C temperature range. It has very low self-heating, less than 0.1°C
in still air, as it draws only 60 µA from its supply. The LM35 is rated to
operate over a -55°C to +150°C temperature range.

3. FLOW CHART
Figure 3. Flow chart for the proposed model

The flow chart proceeds as follows: the LM35 temperature sensor gives an
output voltage proportional to the temperature; the AI0 pin of ELVIS II
reads this analog voltage; the DAQ in LabVIEW acquires it; if the acquired
voltage is greater than the preset value, a 6 volt output is applied to the
digital IO pin of ELVIS II and the fan turns on; after the hold interval the
DAQ voltage is read again and the cycle repeats.


The operation of the proposed system is shown above. The LM35 senses
the ambient temperature and its internal signal conditioning circuit
conditions the output voltage to 0-5 volts, proportional to the ambient
temperature with a sensitivity of 10 mV/°C, i.e., for every 1°C rise in
ambient temperature the output voltage of the LM35 increases by 10 mV.
The analog output of the LM35 is interfaced to the ELVIS II workstation
through AI0 (analog input port 0). LabVIEW is used to acquire the data
from the ELVIS II

board using the DAQ Assistant. One DAQ task is configured for analog
input voltage through pin AI0, so the output of the DAQ is the voltage
representing the ambient temperature; another DAQ task is configured for
digital output. The analog voltage from the DAQ is compared with the
preset voltage of 0.35 V, so that if the temperature is greater than 35°C a
6 V digital output is written to the digital IO pin of ELVIS II, which is
connected to the cooling fan, and the fan goes to the ON state; otherwise
0 V is written to the digital IO pin so that the cooling fan stays in the OFF
state. The system is programmed in such a way that once the cooling fan is
powered it continues in the same state for 3 minutes, to avoid frequent
switching between the fan ON/OFF states.
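The actual implementation is a LabVIEW block diagram; purely to make the decision logic concrete, the same loop is sketched in Java below, with readLm35Volts and setFan as hypothetical stand-ins for the DAQ read and digital-output calls:

public class PolyhouseController {
    static final double PRESET_VOLTS = 0.35;          // 35°C at 10 mV/°C
    static final long HOLD_MILLIS = 3 * 60 * 1000;    // hold the fan state for 3 minutes

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            double volts = readLm35Volts();           // stands in for the DAQ read on AI0
            double tempC = volts / 0.010;             // LM35: 10 mV per degree Celsius
            boolean fanOn = volts > PRESET_VOLTS;
            setFan(fanOn);                            // stands in for the digital output write
            System.out.printf("T = %.1f degC, fan %s%n", tempC, fanOn ? "ON" : "OFF");
            Thread.sleep(HOLD_MILLIS);                // avoid frequent ON/OFF switching
        }
    }

    static double readLm35Volts() { return 0.30; }    // placeholder: real value comes from ELVIS II
    static void setFan(boolean on) { /* placeholder: drive the digital IO pin */ }
}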

4. SOFTWARE DESCRIPTION

Figure 4. LabVIEW
LabVIEW is a graphical programming environment developed by National
Instruments (NI). LabVIEW 2013 is used for implementing the proposed
model. It has two windows: the front panel and the block diagram. The
front panel shows the control elements and indicators; the block diagram is
used to connect different blocks to realize the application. In this package,
the NI ELVISmx software must be installed. In the block diagram, go to
Functions, select Express VIs, then pick the DAQ Assistant for acquiring
data from the ELVIS-II; the Express VI can easily be configured according
to the requirements of the application.

5. RESULTS
Table 2 shows some readings of the LM35 sensor and the corresponding
computed values. When the LM35 reading is 0.30 V, the actual ambient
temperature inside the polyhouse is 30°C; similarly, a reading of 0.60 V
corresponds to 60°C. The status of the cooling fan depends on the
temperature sensor output: for an LM35 reading of 0.30 V the cooling fan
is in OFF mode, as the programmed threshold value is 0.35 V, while for
readings of 0.40, 0.50 and 0.60 V the fan is in the ON state.


Table 2. LM35 sensor readings and corresponding temperature values

S.No. | LM35 Reading (V) | Actual Value (°C)
1     | 0.30             | 30
2     | 0.35             | 35
3     | 0.40             | 40
4     | 0.50             | 50
5     | 0.60             | 60
Figure 5 shows the ELVIS II board interfaced with the temperature sensor
and the cooling fan.

Figure 5. ELVIS II Board


Figure 6. Simulation setup of the proposed model


The overall setup of the proposed model can be seen in Figure 6.

Figure 7. Block level implementation in LabVIEW


The overall software implementation of the proposed model can be
observed in Figure 7. It shows the blocks of the logic level indicators and
the functional loops based on the DAQ and ELVISmx.


Figure 8. Variation of temperature readings of LM35 and position of fan when temperature is below 35°C

The variation of the temperature curve when the temperature is below
35°C can be observed in Figure 8. It is the final output screen, which
shows the temperature level, the logical indicators and the status of the fan.


Figure 9. Variation of temperature readings of LM35 and position of fan when temperature is greater than 35°C

When the temperature is greater than the threshold value of 0.35 V, the
temperature curve appears as shown in Figure 9. Similarly, around 0.35 V
the waveform of the temperature is shown in Figure 10.


Figure 10. Variation of temperature readings of LM35 and position of fan when temperature is equal to 35°C

6. CONCLUSIONS
The problem of providing a constant temperature for plant growth, which
in turn increases crop production, is solved with minimum hardware and
little human effort by using this proposed model. Other climatic
parameters, like humidity, soil moisture content, light intensity and carbon
dioxide level inside the polyhouse, can be measured, monitored and
controlled by interfacing the corresponding sensors and processing the
acquired data, which in turn performs the required actions and results in
proper plant growth and increased crop yield. As the proposed system
includes the cooling system, the temperature inside the polyhouse is
monitored and controlled at regular intervals. It is evident that the proposed
system reduces human effort to a minimum. It also has the advantage of
cost effectiveness: it needs only the monitoring sensor and a cooling fan,
with the National Instruments ELVIS-II board as the additional controlling
module, all controlled using the LabVIEW software.


REFERENCES
[1] Purna Prakash Dondapati and K. Govinda Rajulu. An Automated Multi Sensored Green House Management. International Journal of Technological Exploration and Learning (IJTEL), Vol. 1, Issue 1, August 2012, 21-24.
[2] Waykole, U. A. Greenhouse Automation System. In proceedings of the 1st International Conference on Recent Trends in Engineering & Technology, Special Issue of International Journal of Electronics, Communication & Soft Computing Science & Engineering, 2012, 161-166.
[3] Sumit A. Khandelwal. Automated Green House Management Using GSM Modem. International Journal of Computer Science and Information Technologies, Vol. 3(1), 2012, 3099-3102.
[4] Indu Gautam and S. R. N. Reddy. Innovative GSM Bluetooth based Remote Controlled Embedded System for Irrigation. International Journal of Computer Applications, Vol. 47, No. 13, June 2012, 1-7.
[5] K. Rangan and T. Vigneswaran. An Embedded Systems Approach to Monitor Green House. Recent Advances in Space Technology Services and Climate Change (RSTSCC), 2010, 61-65.
[6] Kiran Sahu and Susmita Ghosh Mazumdar. Digitally Greenhouse Monitoring and Controlling of System based on Embedded System. International Journal of Scientific & Engineering Research, Vol. 3, Issue 1, January 2012, 1-4.
[7] Vishwakarma, R. G. and Choudhary, V. Wireless Solution for Irrigation in Agriculture. In the proceedings of the International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN), July 2011, 61-63.
[8] Vandana Pandya and Deepali Shukla. GSM Modem Based Data Acquisition System. International Journal of Computational Engineering Research (IJCER), Vol. 2, Issue 5, September 2012, 1662-1667.
[9] Banerjee, G. K. and Singhal, R. Microcontroller based Polyhouse Automation Controller. International Symposium on Electronic System Design (ISED), December 2010, 158-162.
[10] Sengunthar Gayatri R. and Ekata Mehul. A Survey on Greenhouse Automation Systems. International Journal of Engineering & Science Research (IJESR), Vol. 3, Issue 2, February 2013, 2321-2327.
[11] National Semiconductor. Precision Centigrade Temperature Sensors, LM35, data sheet, November 2000.
[12] J. Travis and J. Kring. LabVIEW for Everyone: Graphical Programming Made Easy and Fun, 3rd edition. Prentice Hall, 2006.


Performance of Biometric Palm Print Personal Identification Security System Using Ordinal Measures
V. K. Narendira Kumar
Assistant Professor, Department of Information Technology,
Gobi Arts & Science College (Autonomous),
Gobichettipalayam 638 453, Erode District, Tamil Nadu, India.

Dr. B. Srinivasan
Associate Professor, PG & Research Department of Computer Science,
Gobi Arts & Science College (Autonomous),
Gobichettipalayam 638 453, Erode District, Tamil Nadu, India.

Abstract
Personal recognition is one of the most important requirements in e-commerce and crime
detection applications. In this work, a novel palm print representation method, namely
orthogonal line ordinal features, is proposed. The palm print registration, feature
extraction, palm print verification and palm print recognition modules are designed to
process the palm prints, and the palm print database module is designed to store the palm
prints and the person details in the database. The feature extraction module extracts the
ordinal measurements for the palm prints. The verification module verifies a palm print
against an individual's stored record, and the recognition module finds the relevant person
associated with a given palm print image. The proposed palm print identification method
uses the intensity and brightness to determine the ordinal measure. The ordinal measures
are estimated for the 4 x 4 regions of the palm print images.

Index Terms: Biometrics, Palm Print, Recognition, Ordinal Measures, Feature Extraction.

1. Introduction
Several identification and verification schemes exist today, but the most
accurate identification lies in the area of biometrics. Some examples of
identifying biometric characteristics are fingerprints, hand geometry, retina
and iris patterns, facial geometry, and signature and voice recognition.
Biometric identification may be preferred over traditional methods (e.g.
passwords, smart cards) because its information is virtually impossible to
steal, although in some cases it may become possible to impersonate a
legitimate user [2].

Two interesting properties of biometric identification are:
1. The person to be identified is required to be physically present at the
point of identification.
2. Identification is based on a biometric technique that does not depend
on the user remembering a password or carrying a token.
There are two distinct functions for biometric devices:
1. To prove you are who you say you are.
2. To prove you are not who you say you are not.
The personal identification system using palm prints is designed to carry
out palm print recognition and verification operations. The system uses the
ordinal measurement for the recognition and verification tasks, and the
color intensity and brightness values are used in the system. Region-based
ordinal measurement estimation is used in this system [1]. The system does
not consider orientation factors for the palm print comparison.
The remaining sections are organized as follows: a brief outline of palm
print recognition is presented in Section 2. The system methodology is
presented in Section 3. The system implementation, covering the database,
feature extraction, verification and recognition of palm print images, is
explained in Section 4. The testing of the biometric system is explained in
Section 5. Experimental results are given in Section 6. Finally, Section 7
presents the concluding remarks.

2. Background of the Study


A number of palm print recognition studies have been reported in the
literature, and most of them address the efficiency of the feature extraction
algorithms. The proposed palm print representation schemes include Eigen
palms (C. Harold and M. Charles, 1943), Fisher palms (X. Wu et al., 2003),
Gabor code (D. Zhang et al., 2003), Competitive Code (W. K. Kong and D.
Zhang, 2004), ordinal features (Z. Sun et al., 2005), line features (J. Fonda
et al., 1998), and feature points (D. Zhang and W. Shoo, 1999). However,
not much detail of the palm print acquisition method was provided,
although the acquisition process is one of the key considerations in
developing a fast and robust online recognition system. In earlier studies,
ink-based palm print images (J. Fonda et al., 1998) (D. Zhang and W.
Shoo, 1999) were used: the palm prints were inked onto paper and digitized
using a scanner. This two-step process was slow and is not suitable for an
online system. Recently, various input sensor technologies like the flatbed
scanner, CCD camera, CMOS camera, and infrared sensor have been
introduced for more straightforward palm print acquisition. Among these
technologies, the scanner and CCD camera are

the commonly used input devices (C. Harold and M. Charles, 1943) (X. Wu
et al., 2003). Scanners and CCD cameras are able to provide very high
quality images with little loss of information. However, scanning a palm
image requires some time (a few seconds) and the delay cannot cope with
the requirements of an online system. Zhang et al. (D. Zhang et al., 2003)
proposed the use of a CCD camera in a semi-closed environment for online
palm print acquisition, and good results have been reported using this
approach. In this paper, we explore the use of a low-resolution web-cam
for palm print acquisition and recognition in a real-time system.
2.1 Objectives of the System
The person identification system is designed to perform person
identification using palm prints. Palm print recognition and verification are
the main objectives of the system. The ordinal measures can be used to
rank and order the palm print images. The system should perform the
recognition and verification operations independently of orientation, and a
region-based comparison technique is applied. The ordinal features and
edge features are the main features for the palm print recognition process.
Orientation-independent personal identification is the main motive for the
system, and person identification with palm print recognition and
verification is its design goal.
3. System Methodology
Biometrics makes use of physiological or behavioral characteristics of
people, such as fingerprint, iris, face, palm print, gait, and voice, for
personal identification, which provides advantages over non-biometric
methods such as passwords, Personal Identification Numbers, and ID cards
[3]. The palm print is the unique inner surface pattern of the human hand,
including a number of discriminating features such as principal lines,
wrinkles, ridges, minutiae points, singular points and texture. Compared
with other biometric traits, the advantages of the palm print are the
availability of a large palm area for feature extraction, the simplicity of
data collection and high user acceptability. Various palm print
representations have been proposed for recognition, such as line features,
feature points, Fourier spectrum, Eigen palm features, Sobel and
morphological features, texture energy, wavelet signatures, Gabor phase,
fusion code, competitive code, etc. As in iris biometrics, this work attempts
to explore a convincing solution, that of using ordinal measures, as an
answer to the representation problem. The representation of ordinal
measures unifies several state-of-the-art palm print recognition algorithms
of David Zhang; the internal representations of those algorithms can be
seen as special cases of ordinal measures. This forms a framework which
may help to understand the discriminative power of the palm print pattern,
guide further research and enlighten new ideas (see Figure 1).


Figure 1: Palm Print System Architecture. The personal identification system comprises four modules: the palm print DB (palm print register, palm print list, palm print view), feature extraction (feature identification, ordinal measurement), verification (image selection, feature comparison) and recognition (image selection, result details, ordinal list, ordinal measures).


3.1 Ordinal Measures
Ordinal measures come from a simple and straightforward concept. For
example, one can easily rank or order the heights or weights of two
persons, but it is hard to state their precise differences [7]. For computer
vision, the absolute intensity information associated with an object can vary
because it changes under different illumination settings. However, ordinal
relationships among neighborhood image pixels or regions remain stable
under such changes and reflect the intrinsic nature of the object. Inequality
symbols denote the relation between the average intensities of two image
regions; the inequality represents an ordinal relationship between the two
regions and yields a symbolic representation of the relation. For digital
encoding of the ordinal relationship, only a single bit is used: for example,
1 denotes A > B and 0 denotes A < B, and the equality case can be
assigned to either.
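A minimal sketch of this single-bit encoding, assuming the two regions have already been reduced to average intensities:

public class OrdinalBit {
    // Encode the ordinal relationship between two region intensities with one bit:
    // 1 denotes A > B and 0 denotes A < B (ties may be assigned to either side).
    static int encode(double meanIntensityA, double meanIntensityB) {
        return meanIntensityA > meanIntensityB ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(encode(143.2, 127.8)); // region A is brighter, so prints 1
    }
}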
3.2 Unified Framework for Palm print Recognition
A general framework for palm print recognition based on the ordinal
representation is proposed. Given an input palm print image, the central
sub-image in the aligned coordinate system is cropped from it for feature
extraction. To obtain the measurements for ordinal comparison, the
normalized palm image is transformed into a feature image. The ordinal
measures are then obtained by qualitatively comparing several quantities in
the feature image. In practice, the transformation and ordinal comparison
can be combined into one step via differential filtering. The result of an
ordinal comparison may be the sign of an inequality, the rank order of all
measurements involved in the comparison, the index associated with the
maximum or minimum value, and so on. After ordinal comparison, all
results are coarsely quantized into binary bits so as to strengthen the
robustness of the palm feature and facilitate the matching step. All binary
codes are concatenated to generate a palm print feature, which is the input
to the matching engine. Finally, the dissimilarity between the input palm
print's ordinal feature and the template stored in the database is measured
by their Hamming distance.
The framework has some desirable properties for palm print recognition.
The ordinal measure renders the palm print representation robust against
various intra-class variations such as illumination settings, dirt or sweat on
the palm, signal noise, pose change, misalignment and nonlinear
deformations [4].
Each bit of the palm print feature code represents an ordinal relationship
among several image regions, which is rich in information. Because a palm
print code has equal probability of being 1 or 0 for an arbitrary pattern, its
entropy is maximized. Although the discriminability of a single palm code
is limited, a composite palm template formed by numerous ordinal feature
codes has sufficiently high degrees of freedom to differentiate all
individuals in the world; thus the randomness of the palm print pattern is
well encoded. The dissimilarity between two palm prints can be measured
by the bitwise XOR operator, which can be computed on the fly [8].
3.3 Gabor Based Representations
Based on the proposed framework, the Gabor based representations
proposed by Zhang, which reported the best recognition performance in the
literature, are special cases of ordinal measures. The Gabor based encoding
filters used in the palm code are essentially ordinal operators. For odd
Gabor filtering of a local palm print region, the image regions covered by
the two excitatory lobes are compared with the image regions covered by
the two inhibitory lobes, and the filtered result is qualitatively encoded as 1
or 0 based on the sign of this inequality. Similarly, the even Gabor
generated palm code is mainly determined by the ordinal relationship
between the region covered by one excitatory lobe and the regions covered
by two small inhibitory lobes. Because the sum of the original even Gabor
filter's coefficients is not equal to 0, the average coefficient value is
subtracted from the filter to maximize the information content of the
corresponding palm code.

However, ordinal relationships are not restricted to intensity measurements.
As a byproduct of the Gabor phase measure, the orientation energy or
magnitude is also obtained by orthogonal Gabor filtering; thus it is possible
to combine ordinal intensity measures and ordinal energy measures
together. Here, the local energies along four different orientations are
compared with each other to obtain the maximum, and the palm print is
then represented using the Gabor filtered ordinal intensity measures whose
basic lobes lie along the maximum energy orientation. Fusion code was
demonstrated to be a more discriminative representation than a single type
of ordinal measure based representation. (a) Odd Gabor filter (orientation
= 45°). (b) Ordinal comparison of image regions using the odd Gabor
filter: + denotes an excitatory lobe covered image region and - an
inhibitory lobe covered image region. (c) Even Gabor filter (orientation =
45°). (d) Ordinal comparison of image regions using the even Gabor filter.
In the field of palm print recognition, the competitive code proposed by
Kong performs best in terms of accuracy. There, each palm print image
region has a dominant line segment whose orientation is regarded as the
palm print feature. Because the even Gabor filter is well suited to model a
line segment, it is used to filter the local image region along six different
orientations, obtaining the corresponding contrast magnitudes. Based on the
winner-take-all competitive rule, the index of the minimum contrast
magnitude is represented by three bits, namely the competitive code. The
success of this method also depends on ordinal measures, because the
competition for the winner is essentially based on ordinal comparison.
With ordinal measures, these algorithms all perform well in large-scale
testing, in terms of both accuracy and efficiency. Therefore, ordinal
measures are perhaps the most suitable representation for a palm print
based identification system.
3.4 Orthogonal Line Ordinal Features
The ultimate purpose of the proposed framework is to guide the
development of new algorithms. Following the framework, a possible
improvement can be made by choosing well-designed ordinal measures as
the palm print representation, into which the characteristics of the palm
print pattern are incorporated. A novel palm print representation, namely
Orthogonal Line Ordinal Features (OLOF), is proposed, where the
normalized sub-image is referenced by the finger gaps using an algorithm
similar to Zhang's. OLOF is so called because the two regions involved in
the ordinal comparison are elongated, or line-like, and geometrically
orthogonal to each other.


Figure 2: Orthogonal line ordinal features for palm print recognition.


The idea is motivated by the most stable and robust ordinal measures
available in the palm print pattern, i.e. randomly distributed negative line
segments versus their orthogonal regions. In low-resolution palm print
images, the line patterns are mainly constituted by principal lines and
wrinkles, whose intensity is much lower than that of their orthogonal
regions. Of course, detection of all line segments in a palm print is
impossible in real-time applications. Nevertheless, if the system applies
thousands of ordinal operators to a palm print image, most of them
correspond to robust ordinal measures; this assumption is verified in the
following experiments. A 2D Gaussian filter is used to obtain the weighted
average intensity of a line-like region. Its expression is as follows:
f(x, y, θ) = exp[ -(x cos θ + y sin θ)² / δx² - (-x sin θ + y cos θ)² / δy² ]        (1)

where θ denotes the orientation of the 2D Gaussian filter, δx denotes the filter's horizontal scale and δy denotes the filter's vertical scale. The scale ratio δx/δy is controlled to be higher than 3 to make the filter's shape line-like. The orthogonal line ordinal filter, comparing two orthogonal line-like palm print image regions, is specially designed as follows:

OF(θ) = f(x, y, θ) - f(x, y, θ + π/2)        (2)

For each local region in the normalized palm print image, three ordinal
filters, OF(0), OF(π/6) and OF(π/3), are applied to it to obtain three
ordinal code bits based on the sign of the filtering results. Finally, three
ordinal templates, named the ordinal code, are obtained as the feature of
the input palm print image (see Figure 2). The matching metric is also
based on the Hamming distance.
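A hedged sketch of equations (1) and (2) in Java, building one orthogonal line ordinal filter and encoding the sign of its response; the kernel size and scale values are illustrative choices, not values from the paper:

public class OrthogonalLineOrdinalFilter {
    // Line-like 2D Gaussian of equation (1): orientation theta, scales dx >> dy.
    static double f(double x, double y, double theta, double dx, double dy) {
        double u = x * Math.cos(theta) + y * Math.sin(theta);
        double v = -x * Math.sin(theta) + y * Math.cos(theta);
        return Math.exp(-(u * u) / (dx * dx) - (v * v) / (dy * dy));
    }

    // Equation (2): the difference of two orthogonal line-like lobes.
    static double[][] kernel(int size, double theta, double dx, double dy) {
        double[][] k = new double[size][size];
        int c = size / 2;
        for (int row = 0; row < size; row++)
            for (int col = 0; col < size; col++)
                k[row][col] = f(col - c, row - c, theta, dx, dy)
                            - f(col - c, row - c, theta + Math.PI / 2, dx, dy);
        return k;
    }

    // One ordinal code bit: the sign of the filter response on an image patch.
    static int codeBit(double[][] patch, double[][] k) {
        double response = 0;
        for (int row = 0; row < k.length; row++)
            for (int col = 0; col < k.length; col++)
                response += patch[row][col] * k[row][col];
        return response > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        double[][] k = kernel(15, 0.0, 9.0, 3.0);  // scale ratio 3, as the text requires
        double[][] flatPatch = new double[15][15]; // a uniform patch gives a zero response
        System.out.println(codeBit(flatPatch, k));
    }
}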
4. Palm Print System Implementation
The palm print verification and recognition system is implemented to
perform person identification operations. The palm print is one of the
important biometric features that identify a person through unique
properties. The palm prints and the relevant person details are maintained
in the database, along with palm print features such as line, edge, ridge and
orientation features. This system uses a new feature, the ordinal feature,
extracted with reference to the color and texture properties. The person
identification task is divided into two types: biometric verification and
recognition. The palm print is used as the person's biometric feature. The
verification process performs a similarity check for a given person id and
palm print image: the palm print database image for the specified person is
compared with the input image, so verification requires only one
comparison. Recognition is the process of identifying an unknown person
with reference to the biometric features: the palm print database images are
compared with the given input image, and the most relevant image is taken
as the matching palm print for the person.
The person identification system using palm prints is developed as a
graphical user interface based application. The Java language is used as the
front end and the Oracle relational database as the back end. The system is
divided into four major modules: palm print database, feature extraction,
verification and recognition. The palm print database module administers
the palm print and person details. The feature extraction module fetches the
ordinal and line features for the palm prints. The palm print verification for
a given person is performed in the verification module.
4.1 System Development
The personal identification using palm prints system is developed as a
graphical user interface based tool using the J2EE language and the Oracle
relational database. Palm print features and person details are maintained in
the database. The palm print images are maintained with their ids, and a
separate folder is allocated to store the palm print image files.

An online system captures palm print images using a palm print capture
sensor directly connected to a computer for real-time processing. The palm
print images are captured in two ways: image scanners and thermal
scanners. Thermal scanners capture the palm print images directly from the
hand, while image scanners capture them from pictures. The images are
maintained in two formats: Joint Photographic Experts Group (JPEG) and
Graphics Interchange Format (GIF). The palm print images are divided into
4 x 4 regions, and the 16 regions are used for the feature extraction
process. The color brightness and intensity values are used for feature
extraction, and the feature values are maintained for each image in the
database. The verification process is done with the person id and the palm
print image, whereas the recognition process is carried out with the palm
print image only. The verification and recognition operations use a
threshold value for the similarity rate estimation.
4.2 Palm Print Database
The palm print database module is the initial module of the application.
The personal details and palm print data are maintained in the database;
palm prints are maintained as image files. The palm print database stores
the palm print image name, and also maintains the color and ordinal
features: the ordinal values for the 16 regions are kept in the database. The
system also maintains the rank or order of the palm prints with reference to
their ordinal values; the ranking values are used in the recognition process.
The person details and ordinal features are maintained in two different
tables [5].
The palm print database module is divided into three sub-modules: palm
print registration, palm print list and palm print view. The palm print
registration sub-module registers a person's details with the palm print
information. The person's name, address, city, state, country and e-mail
details are collected and updated into the database, and the system assigns
a unique person identification number to each registered person. The palm
print image path is also registered during the palm print registration
process [9]. The palm print is copied into the images folder maintained in
the application path and renamed with reference to the person identification
number assigned by the system.
A file open dialog window supports the palm print path selection process,
so the user can easily select the path for the palm print image. The palm
print list sub-module shows the list of registered person details with the
palm print details, prepared from the palm print database. The palm print
image view sub-module displays the palm print for the person selected
from the palm print list.

4.3 Feature Extraction
The feature extraction module is designed to extract and update the palm
print image features. The recently updated palm print image features are
extracted and updated under the database. The color and texture feature are
the main feature that considered in this system. The color and texture
feature is referred as ordinal features. The line features are used for the
comparison process. The palm print image pixel values are used for the
feature extraction process. The color brightness and intensity values are
used for the feature extraction process.
The feature extraction module is divided into two sub modules: feature identification and ordinal measurement. Feature identification is carried out over regions of the palm print image: the image is divided into 4 x 4 regions, and all 16 regions are passed into the feature extraction process. The color intensity and brightness of each block are compared with the corresponding block of another palm print image, and the ordinal values are estimated with reference to these brightness and intensity values. The ordinal values are represented as 0s and 1s. The ordinal values of a block are summed and compared with those of the corresponding block in the other palm print image, and the comparison results are again maintained as 0s and 1s. The total ordinal measure for the image is then calculated and assigned to the image. The ordinal measurement sub module displays all the ordinal measurements and the ratio level for each image.
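As an illustration of this block-wise ordinal comparison, the following minimal Python sketch derives one ordinal bit per region of a 4 x 4 grid and compares two codes. The function names, the use of the image-wide mean brightness as the reference, and the NumPy-based layout are our own illustrative assumptions, not the system's actual implementation.

    import numpy as np

    def ordinal_code(image, grid=4):
        # One ordinal bit per block: 1 if the block's mean brightness
        # exceeds the image-wide mean brightness, 0 otherwise.
        h, w = image.shape
        bh, bw = h // grid, w // grid
        ref = image.mean()
        bits = np.zeros(grid * grid, dtype=np.uint8)
        for i in range(grid):
            for j in range(grid):
                block = image[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
                bits[i * grid + j] = block.mean() > ref
        return bits

    def similarity_ratio(code_a, code_b):
        # Fraction of the 16 block-level ordinal bits that agree.
        return float(np.mean(code_a == code_b))

    # Toy usage: a registered image and a noisy capture of the same palm.
    rng = np.random.default_rng(0)
    enrolled = rng.integers(0, 256, (128, 128)).astype(float)
    query = enrolled + rng.normal(0, 10, size=(128, 128))
    print(similarity_ratio(ordinal_code(enrolled), ordinal_code(query)))

A decision stage would then accept the pair when the similarity ratio exceeds the system's threshold, mirroring the threshold-based verification described above.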
4.4 Palm Print Verification
The verification process performs the palm print comparison in a single iteration [1]. The person ID and the input palm print image are collected from the user; the palm print selection module handles the palm print selection process. The features of the input palm print are extracted and compared with the palm print image features maintained for the relevant person in the database. The feature comparison module carries out this comparison [6], using both the ordinal features and the line features. The false acceptance ratio and the false rejection ratio are used to evaluate the comparison. The final result of the verification process is the palm print that is similar to the referred person.
5. Testing of Biometric System
Biometric systems can be fooled by presenting an artificial or simulated palm print sample. Protecting against these attacks requires liveness testing: an automated test that determines whether the sample presented to the system comes from a live human being. Consider the following palm print examples [7]:
The Sony FIU-500 palm print scanner is an optical sensor; however, it tests for liveness by measuring the capacitance of the skin, and if the capacitance is not within norms the sample is rejected. A method developed at West Virginia University captures the perspiration pattern using a capacitive palm print scanner: a live palm sweats, its pores fill with moisture and the capacitance changes, with the measurement completed in 5 seconds. Properties other than capacitance can also be measured: properties of a living body, whether mechanical (weight, density, elasticity), electrical (resistance, impedance, dielectric constant) or visual (color, opacity, appearance); signals generated by a living body (pulse, blood pressure, heat, thermal gradients, electrical signals generated by the heart, etc.); and responses to a stimulus, either voluntary (tactile, visual or auditory responses to feeling, seeing or hearing something) or involuntary (electromyography, pupil dilation, reflexes).
6. Experimental Results
Palm print images were collected from 284 individuals using the palm print capture device described earlier. In this data set 108 subjects are male, and the age distribution is as follows: about 89% are younger than 30, about 10% are aged between 30 and 50, and about 1% are older than 50. The palm print images were collected on two separate occasions, at an interval of around two months; on each occasion the subject was asked to provide about 10 images each of the left palm and the right palm. The size of all test images used in the following experiments was 384 x 284 pixels at a resolution of 75 dpi. The database is divided into a training set and a testing set: the testing set contained 9,599 palm print images from 488 different palms, and the training set contained the rest.
The training set is used only to adjust the parameters of the Gabor filters; all the experiments were conducted on the testing set. We should emphasize that matchings between palm prints from the same session were not counted in the following experiments; in other words, palm prints from the first session were only matched against palm prints from the second session. The numbers of genuine and imposter matchings are 48,276 and 22,667,422, respectively.
A feature-level coding scheme for palm print identification is presented. On top of Palm Code, a number of improvements are made to develop Fusion Code: 1) the circular Gabor filter of Palm Code is replaced by a bank of elliptical Gabor filters; 2) a feature-level fusion scheme is proposed to select a filter output for feature coding; and 3) the static threshold of Palm Code is replaced by a dynamic threshold. A series of experiments has been conducted to verify the usefulness of each improvement.
Ordinal measures are robust against illumination, contrast and misalignment variations. The experimental results proved that ordinal measures are the most suitable representation for palm print features. On the other hand, the results also proved the statistical richness of ordinal information in palm prints. Although these palm codes are all based on ordinal measures, their recognition performance can differ to a large extent due to the use of different ordinal operators. In terms of accuracy, the ordinal code performs best, followed by the competitive code, the fusion code and the palm code. Another compelling advantage of our ordinal code is that its processing speed is nearly twice as fast as that of the competitive code.
6.1 Performance of Palm Print Biometrics
In this experiment, different numbers of elliptical and circular Gabor filters are examined; for the circular Gabor filters, I use the previous parameters for these comparisons. Figure 3(a) shows the four ROC curves obtained from the elliptical Gabor filters, where each ROC curve corresponds to a different number of Gabor filters used in the fusion rule, and Figure 3(b) shows the corresponding results for the circular Gabor filters. In this test the static threshold is used rather than the dynamic threshold. From Figure 3 we make two observations: 1) the elliptical Gabor filters perform better than the circular Gabor filters, and 2) using two filters for fusion is the best choice in both cases. The first observation is easily understood: the elliptical Gabor filters have more parameters, so they can be tuned better to palm print features. The reason for the second observation is not obvious, so another set of experiments is conducted.
In this set of experiments, only the elliptical case is considered. First of all, the imposter distributions without translated matching are plotted in Figure 4(a). We can see that the imposter distributions for two to four filters are very similar: their means, μ, are around 0.497 and their standard deviations, σ, are around 0.0258. However, the imposter distribution for a single filter has a relatively large variance. If a binomial distribution is used to model the imposter distributions, the distributions for two to four filters have around 370 degrees of freedom, whereas the distribution for the single filter has only about 250; the degrees of freedom are estimated as μ(1 − μ)/σ². These values demonstrate that using more than two filters does not improve the imposter distributions, while increasing the number of filters from one to two yields a great improvement. Although increasing the number of filters reduces the variances of the imposter distributions, it adversely influences the genuine distributions: given two patches of palm prints from the same palm and the same location, if the number of filters is increased, the fusion rule has a higher probability of selecting different filters for coding. To demonstrate this phenomenon, all the palm prints from the same hand are matched. If the fusion rule selects the same filter, the matching distance of the local patches is zero; otherwise it is one. The local matching distances are then summed into a global matching distance for comparing two palm prints, and the global matching distance is normalized by the matching area. In other words, the matching function is still a hamming distance.
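The degrees-of-freedom estimate can be checked directly from the reported statistics; the short Python sketch below is only a numerical sanity check, not part of the original experiments:

    # Degrees of freedom of a binomially modelled imposter distribution:
    # dof = mu * (1 - mu) / sigma**2, using the statistics reported for
    # the fused two-to-four filter case.
    mu, sigma = 0.497, 0.0258
    dof = mu * (1 - mu) / sigma ** 2
    print(round(dof))   # -> 376, consistent with the quoted "around 370"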
Figure 4(b) shows the cumulative distributions of the genuine hamming distances. We see that the fusion rule using four filters is the most likely to select different filters. When the hamming distance is shorter than 0.3, the fusion rule using three filters performs better than the one using two filters, which contradicts our expectation; the reason is that the direction of one of the three filters is close to one of the principal lines, which provides extra robustness to the filter selection. Nevertheless, when the hamming distance is longer than 0.3 the fusion rule using two filters performs better, and this range is more important since false acceptance tends to happen in that region. Combining the influences on the imposter and genuine distributions, the best choice is to employ two filters for fusion. In the following experiments, I study only the two-elliptical-filter case.
Figure 3: Comparison between different numbers of filters used in fusion, (a1) elliptical Gabor filters and (b1) circular Gabor filters.
Figure 4: Examination of different numbers of filters for fusion, (a) imposter distributions and (b) cumulative distributions of genuine hamming distances.
The proposed dynamic threshold and the original static threshold are compared. For convenience of graphical presentation, I dynamically scale the hamming distances rather than the threshold; the two have the same effect. Figure 5 shows the resulting ROC curves: the dynamic threshold effectively improves the accuracy. Combining all the proposed improvements, including the elliptical Gabor filters, the fusion rule and the dynamic threshold, the proposed method obtains around a 15% improvement in genuine acceptance rate at a false acceptance rate of 10⁻⁶%.
Figure 5: Comparison between dynamic and static thresholds
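The paper does not spell out the scaling rule behind the dynamic threshold. One plausible reading, sketched below in Python purely as an assumption, is to compute the hamming distance over the valid overlapping area only and to tighten the acceptance threshold when little valid area remains; the parameter alpha and the linear rule are hypothetical, not the paper's formulation:

    import numpy as np

    def dynamic_decision(code_a, code_b, valid, base_threshold=0.32, alpha=0.02):
        # Hamming distance restricted to the valid (overlapping) bits.
        area = int(valid.sum())
        dist = np.count_nonzero(code_a[valid] != code_b[valid]) / max(area, 1)
        # Assumed rule: shrink the threshold as the valid area shrinks.
        threshold = base_threshold - alpha * (1 - area / valid.size)
        return dist <= threshold

    rng = np.random.default_rng(1)
    a = rng.integers(0, 2, 1024, dtype=np.uint8)
    b = a.copy(); b[:80] ^= 1                 # 80 disagreeing bits
    valid = rng.random(1024) > 0.1            # ~90% of the area overlaps
    print(dynamic_decision(a, b, valid))      # accepted at this distance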
Table 1: Genuine and false acceptance rates with different threshold values, (a) verification results and (b) 1-to-488 identification results

(a)
Threshold   False Acceptance Rate   False Rejection Rate
0.317       1.2×10⁻⁵%               7.77%
0.324       1.3×10⁻⁴%               6.07%
0.334       1.0×10⁻³%               4.15%
0.350       1.0×10⁻²%               2.32%

(b)
Threshold   False Acceptance Rate   False Rejection Rate
0.309       6.91×10⁻³%              4.56%
0.315       1.38×10⁻²%              3.67%
0.323       1.24×10⁻¹%              2.61%
0.333       9.68×10⁻¹%              1.74%
Table 1(a) lists several false acceptance rates and false rejection rates together with the corresponding thresholds. The results demonstrate that the proposed method is comparable with previous palm print approaches and with other hand-based biometric technologies, including hand geometry and fingerprint verification [7]; it is also comparable with other fusion approaches [5]. A detailed comparison between Fusion Code and other palm print algorithms can be found in the Competitive Coding Scheme for Palm print Identification.
7. Conclusion
The person identification system uses the palm print biometric feature to identify a person uniquely. The palm print images are passed through the transformation and ordinal measurement evaluation process, and the ordinal measures are used to rank and order the palm print images. The system performs a region-based comparison mechanism to carry out the recognition and verification operations. The palm print database is populated with a set of personal details and a palm print image collection. The feature extraction process is verified and the ranking process is validated. The palm print recognition and verification operations are tested with a set of sample image collections, and the ordinal measures and similarity ratios are estimated and verified for each analysis. Similar palm prints are identified with a high similarity ratio, while irrelevant palm print images are not recognized by the system. The system provides a palm print recognition mechanism that is independent of orientation-related factors. The ordinal-measure-based person identification mechanism is a feasible solution for personal identification requirements.
References
[1] A. K. Jain, Ruud Bolle and Sharath Pankanti, Palm print Biometrics: Personal Identification in Networked Society, Springer, 2nd printing edition, 2005, pp. 56-64.
[2] D. D. Zhang, Palm print Authentication, Springer, 1st edition, 2004, pp. 25-36.
[3] Harold L. Alexander, Classifying Palm prints: A Complete System of Coding, Filing and Searching Palm prints, Charles C Thomas Pub Ltd, 2009, pp. 98-108.
[4] John D. Woodward Jr. and Nicholas M. Orlans, Biometrics, McGraw-Hill Osborne Media, 1st edition, 2002, pp. 12-19.
[5] John R. Vacca, Biometric Technologies and Verification Systems, Butterworth-Heinemann, 1st edition, 2007, pp. 88-93.
[6] K. S. Thyagarajan, Palm print Image Processing with Applications, Focal Press, 2005, pp. 256-272.
[7] Kenneth R. Castleman, Palm print Image Processing, Prentice Hall, 2010, pp. 72-77.
[8] Patricia Anne Kolb, H.I.T: A Manual for the Classification, Filing, and Retrieval of Palm prints, Thomas, 2011, pp. 122-129.
[9] Samir Nanavati, Michael Thieme and Raj Nanavati, Palm print Biometrics: Identity Verification in a Networked World, Wiley, 1st edition, 2002, pp. 32-44.
MIMO System for Next Generation Wireless Communication

Sharif
Lab Demonstrator, Dept. of EEE, University of Information Technology and Sciences (UITS), Baridhara, Dhaka-1212, Bangladesh

Mohammad Emdadul Haq
Lab Demonstrator, Dept. of EEE, University of Information Technology and Sciences (UITS), Baridhara, Dhaka-1212, Bangladesh

Md. Arif Rana
Lab Demonstrator, Dept. of ECE, University of Information Technology and Sciences (UITS), Baridhara, Dhaka-1212, Bangladesh
ABSTRACT
It is well known that Multiple Input-Multiple Output (MIMO) systems significantly enhance the spectral efficiency of wireless communication systems. However, their considerable hardware and computational burden hinders the wide deployment of such architectures in modern systems, and reducing hardware complexity while flexibly managing power, capacity, data rate and bit error rate remains a major challenge for MIMO technology. In this paper we give an overview of several important theoretical concepts related to MIMO. The main part of this work can be considered a search for the most efficient design achieving the highest capacity with reasonable BER performance. We investigate techniques that can be utilized to increase capacity (bit/s/Hz) and hence lower the BER of a MIMO system. For this system we use the Hybrid Selection/Maximal-Ratio Transmission (HS/MRT) technique over a correlated quasi-static MIMO fading channel: the correlated fading channel can carry high-capacity data, and the HS/MRT technique minimizes the BER of the system. We also use the MSK modulation technique, which reduces complexity at the transmission side; in practice this technique makes the up-converter, amplifier and filter stages unnecessary, which also reduces cost. We first survey the most important techniques used in MIMO together with their advantages and drawbacks, then review MIMO systems from previous works and compare the performance of these techniques. From this comparison, and using the best technique, we suggest future work expected to give the best performance. Finally, we describe the technique used in our suggested work.
Keywords
Multiple Input-Multiple Output, BER Performance, Wireless Communication, Capacity Improvement of MIMO, SNR Improvement with BER.
1. INTRODUCTION
Traditional wireless communication systems use multiple antennas for transmission and reception. The technology has evolved through SISO, SIMO and MISO. SISO is established with a single transmitter and a single receiver. In a SIMO system there is a single antenna at the transmitter and multiple antennas at the receiver [1]. In a MISO system multiple antennas are used for transmission but a single antenna for reception. Recently, multiple antennas have been used at both the transmitter and the receiver, and as a result system performance has improved; such a system is called a Multiple Input Multiple Output (MIMO) system. This is an antenna technology in which multiple antennas are used at both ends of the wireless communication link [2]. The latest IEEE MIMO standard is 802.11n. MIMO has become popular because signal fading is low and the signal reaches its destination in any environment, urban or rural, where it bounces off trees and buildings in different directions. MIMO also provides a significantly higher data rate and link range without additional bandwidth or increased transmit power, saving bandwidth and transmission power by spreading the same total power over the transmit antennas. At the receiving end, special signal processing is used to sort out the originally transmitted signal. MIMO also has performance drawbacks: additional implementation cost and high power consumption are the main challenges. Competitive consumer pressures constrain the tolerable cost (area), while battery lifetime and thermal effects constrain the power consumption of portable wireless devices. Imperfect implementation can cause cross-talk and inter-channel interference (ICI). A modern MIMO design should therefore be cost efficient to implement, an ultra-low-cost solution that also increases battery life for mobile devices.
2. TECHNIQUES USED IN MIMO
i. Space-time coding
ii. Space-time code design criteria
iii. Theory of space-time coding
iv. Optimal designs for space-time linear pre-coders and decoders
v. Hierarchical diversity-embedding space-time block coding
Figure 1: A typical space-time coder: binary data (K_bin) is passed through a space-time encoder, the space-time codewords (K_sym) are mapped across the n_T transmit antennas, and the result is modulated.
3. MODULATION
Modulation is the process in which the amplitude, frequency or phase of a radio-frequency or light wave is changed in order to transmit intelligence; the characteristics of the carrier wave are instantaneously varied by another, modulating waveform. The carrier frequency is dictated by the physical properties of the medium, and the data frequency band and the transmission frequency band must not be the same [3]. The data spectrum therefore needs to be shifted to the transmission frequency band so that the signal can pass through the medium without being absorbed or aliased; this shift is the modulation process. The choice of modulation type depends on the channel bandwidth, the efficiency, the signal-to-noise ratio and the power requirements. Improving the S/N ratio requires increasing the bandwidth, and in a coded system the error rate is determined by transmitting the maximum amount of data within the given bandwidth.
i. PSK (Phase Shift Keying) modulation
ii. BPSK (Binary Phase Shift Keying) modulation
iii. MIMO M-ary code-selected DS-BPSK communication system
iv. QPSK (Quadrature Phase Shift Keying) modulation
v. Modern MIMO with PSK
vi. MSK (Minimum Shift Keying)
vii. ACM (Adaptive Coded Modulation)
Figure 2: Binary phase shift keying, BPSK

Figure 3: Constellation points for QPSK (quadrature component vs. in-phase component).
Figure 4: Signal using MSK modulation
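Because MSK is the modulation adopted later in this paper for its low transmitter complexity, a minimal Python sketch of an MSK baseband generator is given below; the function name and sampling parameters are illustrative assumptions. MSK is continuous-phase FSK with modulation index 0.5: the phase advances linearly by +pi/2 over a bit interval for a 1 and by -pi/2 for a 0, producing the constant-envelope waveform of Figure 4.

    import numpy as np

    def msk_baseband(bits, samples_per_bit=16):
        # Phase ramps +/- pi/2 per bit; the envelope stays constant,
        # which is why MSK tolerates simple, nonlinear amplifiers.
        phase = 0.0
        ramp = np.arange(1, samples_per_bit + 1) / samples_per_bit
        chunks = []
        for b in bits:
            step = np.pi / 2 if b else -np.pi / 2
            chunks.append(np.exp(1j * (phase + step * ramp)))
            phase += step
        return np.concatenate(chunks)

    signal = msk_baseband([1, 0, 1, 1, 0])
    print(np.allclose(np.abs(signal), 1.0))   # constant envelope -> True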
Figure 5: Adaptive coded modulation: information bits pass through an adaptive encoder/modulator with pilot symbol insertion into a frequency-flat fading channel; the receiver performs MRC combining, coherent detection, channel prediction with optimal Wiener estimators and adaptive decoding, with a zero-error return channel back to the transmitter.

4. MULTIPLEXING
4. MULTIPLEXING
i. TDM (Time Division Multiplexing)
ii. An architecture for TSI-Free Non-blocking Optical TDM Switches
iii. FDM (Frequency Division Multiplexing)
iv. Low-complexity and frequency-scalable analog real-time FDM receiver
v. OFDM (Orthogonal Frequency Division Multiplexing)
Figure 6: Time division multiplexing: low bit-rate inputs from different sources are combined by a multiplexer into a high bit-rate stream, transmitted across a single high-capacity channel, and separated by a demultiplexer into low bit-rate outputs for different users.

Figure 7: Time division multiplexing (TDM)
Figure 8: Frequency division multiplexing (sub-channels within the channel magnitude spectrum).
Figure 9: A system using frequency division multiplexing: three sources transmit data with center frequencies of 92.1, 92.3 and 92.5 MHz, and all data passes through a channel capable of passing the 92.0-92.6 MHz band.
Figure 10: Spectral occupancy of signals in an FDM system: signals from sources A, B and C occupy 92.1, 92.3 and 92.5 MHz, respectively, within a common band.
Figure 11: A transmitter architecture for an OFDM system (FFT-based, with parallel-to-serial conversion).
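To make the FFT-based architecture concrete, here is a small Python sketch of OFDM symbol generation under illustrative assumptions; the 64 subcarriers, the 16-sample cyclic prefix and the QPSK subcarrier symbols are our own choices, not parameters from the paper:

    import numpy as np

    def ofdm_symbol(subcarrier_symbols, cp_len=16):
        # Inverse FFT turns one complex symbol per subcarrier into a
        # time-domain waveform; a cyclic prefix guards against multipath.
        n_fft = len(subcarrier_symbols)
        time_domain = np.fft.ifft(subcarrier_symbols) * np.sqrt(n_fft)
        return np.concatenate([time_domain[-cp_len:], time_domain])

    rng = np.random.default_rng(2)
    qpsk = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
    tx = ofdm_symbol(qpsk)
    print(tx.shape)   # (80,) = 64 samples plus the 16-sample cyclic prefix

The receiver of Figure 12 reverses these steps: it strips the prefix and applies a forward FFT before symbol detection.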
Figure 12: A receiver architecture for an OFDM system (serial-to-parallel conversion, FFT, constellation mapping and symbol detection).
Figure 13: A block diagram of spatial multiplexing.
5. DIVERSITY
In wireless communications, diversity is a method by which reliability is improved by using more than one communication channel. It improves the reliability of data by reducing fading and co-channel interference and by minimizing error bursts [4]. The gain obtained by exploiting multipath propagation in this way is called diversity gain. In a channel with multiple, sufficiently spaced transmit or receive antennas, diversity is such an important resource that a wireless system typically uses several types of it:
i. Time diversity
ii. Frequency diversity
iii. Space diversity
iv. Polarization diversity
v. Multiuser diversity
Figure 14: Time diversity.
Figure 15: Frequency diversity in various antenna techniques (two transmitters feeding receivers with IF combining and a hitless switch).
Figure 16: Signal faded or reflected by various objects along the multipath between transmitter and receiver.
Figure 17: A simple block coding scheme as used in the Alamouti scheme.
Figure 18: BER performance of Alamouti BPSK with MRRC STBC schemes under flat Rayleigh fading for various antenna configurations.
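The Alamouti scheme of Figure 17 is simple enough to state in a few lines of Python; the sketch below, with our own function names and a noiseless toy example, shows the two-slot transmission of (s1, s2) followed by (-s2*, s1*) over two transmit antennas, and the linear combining that recovers both symbols with diversity order two:

    import numpy as np

    def alamouti_2x1(s1, s2, h1, h2, n1=0j, n2=0j):
        # Two receive samples over two symbol slots (channel constant).
        r1 = h1 * s1 + h2 * s2 + n1
        r2 = -h1 * np.conj(s2) + h2 * np.conj(s1) + n2
        # Linear combining; the gain is |h1|^2 + |h2|^2 for both symbols.
        gain = abs(h1) ** 2 + abs(h2) ** 2
        s1_hat = (np.conj(h1) * r1 + h2 * np.conj(r2)) / gain
        s2_hat = (np.conj(h2) * r1 - h1 * np.conj(r2)) / gain
        return s1_hat, s2_hat

    rng = np.random.default_rng(3)
    h1, h2 = (rng.normal(size=2) + 1j * rng.normal(size=2)) / np.sqrt(2)
    print(np.round(alamouti_2x1(1 + 0j, -1 + 0j, h1, h2), 6))
    # the noiseless case returns the transmitted symbols (1, -1) exactly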
Figure 19: Receive diversity (one transmitter, multiple receivers).
6. MIMO CHANNEL
The MIMO channel describes the connection between the transmitter (Tx) and the receiver (Rx). In the following, only two antennas at the Tx and two antennas at the Rx are considered, i.e. a 2x2 MIMO system. Figure 20 illustrates a 2x2 MIMO system with the channel matrix H and the surrounding scattering medium.
Figure 20: MIMO channel representation, where M = N = 2 are the numbers of antennas at the Tx and Rx, respectively: Tx processing feeds the M-antenna array, the scattering medium yields the channel coefficients h11, h12, h21, h22 collected in H, and Rx processing operates on the N-antenna array with additive noise n.
For the above 2x2 MIMO channel, the input-output relationship can be expressed as

    y(t) = H(t) * s(t) + n(t)    (1)

where s(t) is the transmitted signal, y(t) is the received signal, n(t) is additive white Gaussian noise (AWGN), H(t) is an N x M channel impulse response matrix and (*) denotes convolution. The channel matrix H fully describes the propagation channel between all transmit and receive antennas. Without noise, the channel can be represented through the channel impulse response matrix as

    H(τ) = Σ_{l=1}^{L} H_l δ(τ − τ_l)

where L is the number of taps (time bins) of the channel model and H_l is the M x N matrix of channel impulse responses at delay τ_l. For the 2x2 case,

    H_l = [ h11  h12 ]
          [ h21  h22 ]

is a complex matrix that describes the linear transformation between the two considered antenna arrays at delay l, where h_MN is the complex transmission coefficient from antenna M at the transmitter to antenna N at the receiver. The complex transmission coefficients are assumed to be zero-mean complex Gaussian.
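For a flat-fading channel the convolution in Eq. (1) collapses to a single matrix product, which the following minimal Python sketch simulates; the zero-forcing detector at the end is only an illustrative choice, not a method proposed in this paper:

    import numpy as np

    rng = np.random.default_rng(4)
    M, N = 2, 2                                    # Tx and Rx antennas

    # Rayleigh channel: i.i.d. zero-mean complex Gaussian entries.
    H = (rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))) / np.sqrt(2)
    s = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)   # one QPSK symbol per antenna
    n = 0.1 * (rng.normal(size=N) + 1j * rng.normal(size=N))
    y = H @ s + n                                  # flat-fading y = H s + n

    s_hat = np.linalg.pinv(H) @ y                  # zero-forcing detection
    print(np.round(s_hat, 3))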
Figure 21: The MIMO channel.
7. STUDY ON MIMO SYSTEM
The study on MIMO is focused on the following two strategies:
i. Study on capacity improvement of MIMO
ii. Study on SNR improvement with BER
7.1 Study on Capacity Improvement of MIMO
The capacity of a MIMO system depends highly on selecting an appropriate antenna array, the number of RF chains, the channel conditions and the beamforming ability. We show results from various schemes of recent research on MIMO capacity and try to find the best way to obtain higher capacity. The correlated channel has a larger capacity than the i.i.d. channel at SNRs below 2 dB, so it is generally clear that rich scattering environments are needed for optimal use of multiple antennas [5]. The capacity is 20 bit/s/Hz at 20 dB SNR when uniform linear antenna arrays are used at both ends, and the capacity of the correlated channel can be approached by using beamforming inputs at low SNRs. It has been shown that in the low-SNR regime the proposed limited feedback technique gains more than 2 dB in resulting SNR over open-loop BS-MIMO systems. From the above discussion we draw a comparison table and observe the capacity of MIMO systems using different schemes.
Table 1: Comparison of capacity for different schemes in MIMO

Different systems                                              Capacity (bit/s/Hz) at 20 dB SNR
Uniform linear arrays of antennas at both ends                 20
Correlated Rician fading channel                               22.5
Single RF at both ends                                         11.8
Closed-loop BS-MIMO                                            11.5
Using HS-MRT                                                   10.75
Optimum signaling scheme                                       12.3
One link end to all available antennas, H-S at receiving side  22
Low-complexity linear multiuser beamforming system             15
Figure 22: Comparison of capacity for different schemes in MIMO.
7.2 Study on SNR Improvement with BER
Studying SNR improvement with BER means measuring the quantity of data bits that the receiver correctly receives over a channel distorted by noise and interference. Modern research tries to reduce the BER with respect to the SNR. Using the HS/MRT approach at one link end and choosing 2 out of 8 receive antennas, a BER of 10⁻⁵ occurs at 1 dB SNR; selecting a higher number of antennas is not efficient, as the performance does not differ much, with only an additional 1.5 dB of SNR gained by 8/8 antenna selection [6]. A new MU-MIMO scheme allowing more users in the same frequency and time slot offers a good signal-to-leakage ratio and achieves a BER of 10⁻⁵ at 6.9 dB SNR. Another paper applies the concept of spatial diversity at the transmitter and receiver; under this condition the BER is 10⁻⁵ at 7 dB SNR. The Alamouti coding method for systems over frequency-selective fading channels offers a BER of 10⁻⁵ at 23.6 dB SNR. Switched-diversity MIMO for flat and quasi-static relay fading is also a new idea in MIMO systems; this technology provides the optimum BER even in highly faded channels at 11.8 dB SNR.
Table 2: Comparison of the SNR required for a BER of 10⁻⁵ using various techniques in MIMO

Different systems (target BER 10⁻⁵)                                  SNR (dB)
Using HS-MRT                                                         1
MU-MIMO allowing more users                                          6.9
Applying spatial diversity to the MIMO system                        7
Alamouti coding in an asynchronous cooperative communication system  23.6
We assume that a BER of 10⁻⁵ is reasonable for a highly faded environment. Table 2 compares the bit error rate (BER) of different MIMO systems. Since SNR is inversely proportional to noise, the same BER at a lower SNR is preferable. The HS/MRT approach at one link end, choosing 2 out of 8 receive antennas, offers the best BER at a low SNR of 1 dB. The MU-MIMO scheme allowing more users in the same frequency and time slot also offers a near-optimum BER at a low SNR. Switched-diversity MIMO and the spatial diversity concept likewise give near-optimal BER in highly noisy environments.
Figure 23: A new system with HS/MR transmission over the correlated Rician MIMO flat fading wireless channel: a data source feeds a space-time encoder and MSK modulation over N_t transmit antennas; after the correlated Rician MIMO channel, the receiver selects the best L_r of the available N_r antenna elements, then performs space-time decoding and demodulation of the received data.
8. TECHNIQUES USED TO ENHANCE THE PERFORMANCE
i. One link end connected to all available antennas, with hybrid selection at the receiving side
ii. Correlated Rician MIMO flat-fading wireless channels, where the receiver has knowledge of the channel properties and the transmitter has the channel statistics
8.1 Hybrid Selection/Maximal-Ratio Transmission Methods
In this system a bit stream is sent through a space-time encoder whose outputs are forwarded to the Nt transmit antennas. In a real system the signals are subsequently up-converted to passband, amplified by a power amplifier and filtered; for our model we omit these stages, as well as their corresponding stages at the receiver, and treat the whole problem in equivalent baseband, since these stages are expensive and make the use of reduced-complexity systems desirable. The received signal, written as y = Hs + n = x + n, is picked up by Nr antenna elements, where s is the transmit signal vector and n is the noise vector [7]. A control algorithm then selects the best Lr of the available Nr antenna elements and down-converts their signals for further processing (note that only Lr receiver chains are required). The space-time encoder and decoder are assumed to be ideal so that the capacity can be achieved. We assume ideal knowledge of the channel at the receiver, so that it is always possible to select the best antennas; however, we do not assume any knowledge of the channel at the transmitter. This implies that no water-filling can be used and that the available transmitter power is equally distributed among the transmit antennas.
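The selection-plus-combining step can be sketched in a few lines of Python; the sketch below, with illustrative function names and a single-symbol toy setup rather than the paper's space-time-coded system, keeps the Lr strongest of Nr receive branches (so only Lr RF chains are needed) and combines them with maximal-ratio weights:

    import numpy as np

    def hybrid_select_mrc(h, r, lr, noise_var=1.0):
        # Keep the lr antennas with the largest channel magnitudes,
        # then maximal-ratio combine their received samples.
        best = np.argsort(np.abs(h))[-lr:]
        combined = np.sum(np.conj(h[best]) * r[best]) / noise_var
        post_snr = np.sum(np.abs(h[best]) ** 2) / noise_var
        return combined, post_snr

    rng = np.random.default_rng(5)
    Nr, Lr, s = 8, 2, 1 + 0j              # 2 out of 8, as studied above
    h = (rng.normal(size=Nr) + 1j * rng.normal(size=Nr)) / np.sqrt(2)
    r = h * s + 0.1 * (rng.normal(size=Nr) + 1j * rng.normal(size=Nr))
    print(hybrid_select_mrc(h, r, Lr))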
8.2 Correlated Rician MIMO Flat-Fading Wireless Channels
Some recent papers have investigated the capacity, and the corresponding optimal input distributions, of correlated proper-complex Gaussian MIMO channel models. The outcome of these papers is the product-form correlation assumption, in which the correlation between the fading of two distinct antenna pairs is the product of the corresponding transmit correlation and receive correlation; this correlation model is referred to as the Kronecker model in the literature. Unfortunately, such a correlation structure is still quite restrictive and can only be justified in scenarios where the scattering is locally rich at either the transmitter or the receiver.
Figure 24: Physical model of a MIMO channel.
The amplitude gain of Rician fading follows the Rician distribution [8]. Transmission is over a flat-fading Rician channel with t antennas at the transmitter and r antennas at the receiver. The vector of received symbols can be expressed as

    y = Hx + n

In Rician fading the elements of H are non-zero-mean complex Gaussians, so H can be expressed as the sum of a deterministic part and a scattered part,

    H = H̄ + H_w

where H̄ is a matrix of unit entries, denoted H1. The Rician K-factor, the ratio of the deterministic (line-of-sight) power to the scattered power, is defined in dB as

    K = 10 log10( ν² / 2σ² )

The capacity of the Rician fading channel is defined as

    C = log2 det( I_r + (ρ/t) H H^H )

where t is the number of transmit antennas and ρ is the SNR at the transmitter side. In correlated Rician fading the scattered channel component is correlated and can be modeled as

    H_w' = R_r^{1/2} H_w R_t^{1/2}

where R_r and R_t are the correlation matrices at the receiver and the transmitter side, respectively. The channel matrix H can then be written as

    H = sqrt( K/(K+1) ) H1 + sqrt( 1/(K+1) ) R_r^{1/2} H_w R_t^{1/2}

The capacity equation for the correlated Rician fading channel is

    C = E[ Σ_k log2( 1 + (ρ/t) λ_k ) ]

where λ_k are the positive eigenvalues of H H^H and E[·] denotes expectation over the channel realizations.
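A Monte Carlo estimate of this ergodic capacity is easy to sketch in Python; the Kronecker construction follows the equations above, while K = 3, the exponential correlation coefficient rho_c = 0.4 and the trial count are purely illustrative assumptions:

    import numpy as np

    def ergodic_capacity(nt=4, nr=4, snr_db=20, K=3.0, rho_c=0.4, trials=2000):
        rng = np.random.default_rng(6)
        snr = 10 ** (snr_db / 10)
        # Exponential correlation matrices and (Cholesky) square roots.
        Rt = rho_c ** np.abs(np.subtract.outer(np.arange(nt), np.arange(nt)))
        Rr = rho_c ** np.abs(np.subtract.outer(np.arange(nr), np.arange(nr)))
        Lt, Lr = np.linalg.cholesky(Rt), np.linalg.cholesky(Rr)
        H1 = np.ones((nr, nt))                    # unit-entry LOS component
        caps = []
        for _ in range(trials):
            Hw = (rng.normal(size=(nr, nt)) +
                  1j * rng.normal(size=(nr, nt))) / np.sqrt(2)
            H = (np.sqrt(K / (K + 1)) * H1 +
                 np.sqrt(1 / (K + 1)) * (Lr @ Hw @ Lt.conj().T))
            lam = np.linalg.eigvalsh(H @ H.conj().T)   # eigenvalues of HH^H
            caps.append(np.sum(np.log2(1 + (snr / nt) * lam.clip(min=0))))
        return float(np.mean(caps))

    print(ergodic_capacity())   # ergodic capacity in bit/s/Hz at 20 dB SNR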
Figure 25: Capacity vs. SNR for the correlated Rician fading channel at n_r = n_t = 4.
CONCLUSION
This paper has discussed the most important aspects of MIMO wireless communication systems in an analytical review. It has covered different modulation techniques (PSK, QPSK, BPSK, MSK and ACM), diversity techniques (time, frequency, space, polarization and multiuser diversity), channel capacity and techniques to increase channel capacity.
The important open issues of MIMO wireless systems, and ways around some of their limitations, are shown in this paper.
Hybrid selection techniques open a new research direction for reducing data loss in highly scattered and faded environments, because the environment becomes more complex for data transmission day by day.
For high data rate communication, the correlated quasi-static channel can maintain current data rates, but it is still not possible to implement it practically.
REFERENCES
[1] Bell Labs., Lucent Technol., Smart antenna technologies for future wireless systems: trends and challenges, IEEE Communications Magazine, vol. 42, no. 9, Sept. 2004.
[2] Cheolkyu Shin, Hyounkuk Kim, Kyeong Jin Kim and Hyuncheol Park, High-Throughput Low-Complexity Link Adaptation for MIMO BIC-OFDM Systems, IEEE Transactions, vol. 59, April 2011.
[3] H. Liao, H. Wang and X.-G. Xia, Some designs and normalized diversity product upper bounds for lattice-based diagonal and full-rate space-time block codes, IEEE Trans. Inform. Theory, vol. 55, no. 2, pp. 569-583, Feb. 2009.
[4] Vlasis Barousis, Athanasios G. Kanatas and George Efthymoglou, A Complete MIMO System Built on a Single RF Communication End, PIERS Online, vol. 6, no. 6, 2010.
[5] R. Bains and R. R. Muller, Using Parasitic Elements for Implementing the Rotating Antenna for MIMO Receivers, IEEE Transactions on Wireless Communications, vol. 7, pp. 4522-4533, December 2008.
[6] V. Barousis, A. G. Kanatas, A. Kalis and C. Papadias, Closed-Loop Beamspace MIMO Systems with Low Hardware Complexity, IEEE 69th Vehicular Technology Conference (VTC Spring 2009), pp. 1-5, June 2009.
[7] Deepa, Baskaran, Pratheek and Aswath, Study of Spatial Diversity Schemes in Multiple Antenna Systems, Journal of Theoretical and Applied Information Technology, 2005-2009.
[8] Youngpo Lee, Youngje Kim, Sun Yong Kim, Gyu-In Jee, Jin-Mo Yang and Seokho Yoon, An Alamouti Coding Scheme for Relay-Based Cooperative Communication Systems, The Third International Conference on Emerging Network Intelligence, 2011.