
International Journal of Computer Science

and Business Informatics


(IJCSBI.ORG)

ISSN: 1694-2507 (Print)
ISSN: 1694-2108 (Online)

VOL 12, NO 1, APRIL 2014

Table of Contents: VOL 12, NO 1, APRIL 2014

Improving Ranking Web Documents using Users Feedbacks ............................................................... 1


Fatemeh Ehsanifar and Hasan Naderi

A Survey on Sparse Representation based Image Restoration ............................................................... 11


Dr. S. Sakthivel and M. Parameswari

Simultaneous Use of CPU and GPU to Real Time Inverted Index Updating in Microblogs
.................................................................................................................................................................... 25
Sajad Bolhasani and Hasan Naderi

A Survey on Prioritization Methodologies to Prioritize Non-Functional Requirements ........................ 32


Saranya. B., Subha. R and Dr. Palaniswami. S.

A Review on Various Visual Cryptography Schemes ................................................................................ 45


Nagesh Soradge and Prof. K. S. Thakare

Web Page Access Prediction based on an Integrated Approach ............................................................. 55


Phyu Thwe

A Survey on Bi-Clustering and its Applications .................................................................................. 65


K. Sathish Kumar, M. Ramalingam and Dr. V. Thiagarasu

Pixel Level Image Fusion: A Neuro-Fuzzy Approach ................................................................................ 71


Swathy Nair, Bindu Elias and VPS Naidu

A Comparative Analysis on Visualization of Microarray Gene Expression Data ...................................... 87


Poornima. S and Dr. J. Jeba Emilyn
A New Current-Mode Geometric-Mean Structure for SRD Filters ......................................................... 100
Ebrahim Farshidi and Saman Kaiedi

Improving Ranking Web Documents using Users Feedbacks
Fatemeh Ehsanifar
Department of CSE,
Lorestan Science and Research College,
Lorestan, Iran

Hasan Naderi
Assistant Professor
Department of CSE,
Iran University of Science and Technology,
Tehran, Iran

ABSTRACT
Nowadays, the World Wide Web is the foremost environment for developing, distributing, and acquiring knowledge. The most important tools for reaching this infinite ocean of information are search engines, in which ranking is one of the main components. Given the shortcomings of text-based and link-based methods, approaches based on users' behavior on the web have been considered. Users' behavior contains valuable information that can be used to improve the quality of web ranking results. In this research, a model is proposed in which, for each query, users' positive and negative feedback on the displayed list of web pages is collected, including how many times a user has accessed a site, the time spent on a site, the number of successful downloads from a site, and the number of positive and negative clicks on a site. The model then computes the rank of each page using a Multiple Attribute Decision Making method and presents a new ranking of the sites, which is updated regularly according to subsequent user feedback.
Keywords
Users feedback, Multiple Attribute Decision Making, Users Behavior, Search Engine.

1. INTRODUCTION
Due to the vastness of the web and its continuous growth, there is a persistent need for methods that rank web pages according to their significance and their relevance to the topic. Ranking is the main component of an information retrieval system. In search engines, which are among information retrieval systems, the role of ranking is even more evident because of the particular characteristics of their users. Search engines commonly return myriads of pages for a query, while the web user does not have enough time to examine all of the results to find the desired pages; most web users do not look beyond the first page of results. It is therefore important for a search engine to place the most relevant results at the top of the list; otherwise, it cannot be considered effective. The role of a ranking algorithm is thus to identify the most authoritative pages among the numerous pages on the web and assign them higher ranks.
The remainder of this paper is organized as follows. The next section reviews the related literature. Section 3 describes the TOPSIS algorithm and its characteristics. The proposed method is presented in Section 4, and the simulation is described in Section 5. Finally, Section 6 presents the conclusion and future work.

2. RELATED WORKS
Ranking is one of the main parts of a search engine: it is the process of estimating the quality of a page. Since there may be thousands of relevant pages for every query, it is imperative to prioritize them and present only the first 10 or 20 results to the user. Ranking methods can generally be divided into five classes. The first class is text-based ranking, whose most important models are the vector space and probabilistic models. In the vector space model, both the document and the query are vectors whose dimension equals the number of words; each vector is turned into a weighted vector, and the cosine of the angle between the two weighted vectors is computed as their degree of similarity. The most significant weighting scheme is TF-IDF, introduced by Salton [1]. The other text-based approach is the probabilistic model, in which the purpose of a retrieval system is to estimate the probability that each document is relevant to the user's query. Thus, contrary to the vector space model, this model does not compute an explicit degree of similarity between query and document [2].
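As a simple illustration of the vector space model with TF-IDF weighting and cosine similarity described above, the following sketch ranks a toy corpus against a toy query; the documents, query, and weighting details are illustrative assumptions, not part of the original work.

```python
# Minimal sketch of the vector space model: TF-IDF weighting plus cosine
# similarity; the toy corpus and query are illustrative only.
import math
from collections import Counter

docs = ["web page ranking with user feedback",
        "ranking search results by link analysis",
        "user behavior and implicit feedback in web search"]
query = "web ranking feedback"

def tf_idf_vectors(texts):
    tokenized = [t.split() for t in texts]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, idf, vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocab, idf, doc_vecs = tf_idf_vectors(docs)
tf_q = Counter(query.split())
q_vec = [tf_q[t] * idf[t] for t in vocab]           # weight the query with the same idf
scores = sorted(((cosine(q_vec, d), i) for i, d in enumerate(doc_vecs)), reverse=True)
print(scores)                                        # documents ranked by similarity
```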
The second class is link-based (connection-based) ranking. Contrary to traditional information retrieval collections, the web has a highly heterogeneous structure in which documents are linked together and form a huge graph. Web links carry valuable information, so new ranking algorithms have been built on top of them. Broadly, link-based algorithms are divided into query-independent and query-dependent models [3]. In query-independent models, such as PageRank, ranking is performed offline over the whole web graph, so every page receives a fixed score regardless of the query. In query-dependent (topic-sensitive) models, such as HITS, ranking is performed on a graph built from the collection of pages related to the user's query.
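A query-independent link-based score such as PageRank can be computed offline by power iteration; the sketch below runs it on a toy link graph. The graph, damping factor, and iteration count are assumptions made for illustration.

```python
# Minimal PageRank power-iteration sketch on a toy link graph; the damping
# factor 0.85 follows the classic formulation.
def pagerank(links, damping=0.85, iters=50):
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:                          # dangling page: spread evenly
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new_rank[q] += damping * rank[p] / len(outs)
        rank = new_rank
    return rank

toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(toy_graph))
```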
The third class is the combined approach, which uses both links and content for ranking [4]. The fourth class is learning-based ranking, which has drawn a lot of attention in recent years. Learning-to-rank methods fall into three main groups: pointwise, pairwise, and listwise. In pointwise methods, a score is assigned to each document-query pair that represents the degree of relevance between them [5]. Pairwise methods take pairs of objects (their features and their relative order) and try to assign each object a rank close to its true rank; objects are eventually divided into correctly ranked and incorrectly ranked. Most existing learning-based ranking methods are of this type. Listwise methods use lists of ordered objects as training data to predict the order of objects. The fifth class is ranking based on users' behavior. Because of the problems of text-based and link-based methods, methods based on the behavior and judgment of users have been widely considered as a way of bringing fairness and democracy to the web; in other words, the most suitable pages for the qualitative and quantitative development of the web are determined by its users [6]. There are two ways of collecting such data from users: explicit (direct) feedback and implicit feedback [7].
In explicit feedback methods the user is asked to judge the proposed results, which is burdensome. In implicit methods, the user's behavior during the search process, as registered in the search engine's logs, is used instead, so the data can be collected at very low cost. This behavior includes the text of the query, how the user clicks on the ordered list of results [8], the content of the clicked pages, the dwell time on each page [9], and other information about events registered during the search. These registered events contain valuable information that can be used for analyzing, assessing, and modeling users' behavior in order to improve the quality of the results.

3. TOPSIS ALGORITHM
TOPSIS is a compensatory method among Multiple Attribute Decision Making (MADM) models. Being compensatory means that trade-offs between criteria are allowed; for instance, a weak value on one criterion may be compensated by a good score on another. In this method, M options are assessed against N criteria. In addition to the distance of option Ai from the positive ideal point, its distance from the negative ideal point is also considered: the selected option must have the shortest distance from the ideal solution and, at the same time, the greatest distance from the negative ideal solution [10].

4. METHODOLOGY
In this section we present a model that collects five kinds of user feedback (positive and negative) on the list of search results for a given query, possibly submitted by a large number of users, computes the rank of each document using the TOPSIS method, and finally assigns a ranking to the documents. At specified intervals, these rankings are updated using newly collected feedback. The five kinds of feedback, regarded as five criteria (the first four are benefit criteria and the last one is a cost criterion), are used to assess the web pages as follows:
Open click: the number of times a site is opened (clicked) for a given query.
Dwell time: the time, measured in hours, that users spend on a site for a given query.
Download: the number of downloads that occur on a page for a given query.
Plus click: positive clicks indicating user satisfaction with the selected page, such as left or right clicks on links within the page.
Negative click: clicks indicating user dissatisfaction with the selected document, such as clicking close.
Implementation steps of the proposed method:
First step: first, the decision-making matrix D = [r_{i,j}]_{m \times 5} is formed, where A_1, A_2, \ldots, A_m stand for the m sites to be ranked according to the criteria; X_{OC}, X_{DT}, X_{D}, X_{PC}, X_{NC} represent the criteria used to assess the suitability of each site; and the components r_{i,j} hold the value of the jth criterion for the ith site. This matrix is then normalized using the norm-based scaling method described in the appendix, which yields the normalized matrix N_D.
Second step: the relative importance of the criteria is computed using the entropy method and balanced with subjective weights \lambda_j. The \lambda values were set to 0.2 for the number of open clicks, 0.3 for dwell time (which is considered more significant than the other criteria), 0.1 for the number of downloads, 0.2 for the number of positive clicks, and 0.2 for the number of negative clicks, forming the weight vector
W = \{W_{OC}, W_{DT}, W_{D}, W_{PC}, W_{NC}\}
Now, according to relation (1), the weighted matrix is formed, taking the vector W as an input of the algorithm.

V = N_D \cdot W_{5 \times 5} \qquad (1)

where N_D is the matrix of normalized criterion scores and W_{5 \times 5} is a diagonal matrix in which only the elements of the main diagonal are non-zero.
Third step: determination of the ideal solution according to relation (2) and of the negative ideal solution according to relation (3). The positive ideal option A^+ is the best site from the users' viewpoint, and the negative ideal option A^- is the worst site from the users' viewpoint:

A^+ = \{(\max_i v_{ij} \mid j \in J),\ (\min_i v_{ij} \mid j \in J')\} = \{v^+_{OC}, v^+_{DT}, v^+_{D}, v^+_{PC}, v^+_{NC}\} \qquad (2)

A^- = \{(\min_i v_{ij} \mid j \in J),\ (\max_i v_{ij} \mid j \in J')\} = \{v^-_{OC}, v^-_{DT}, v^-_{D}, v^-_{PC}, v^-_{NC}\} \qquad (3)

where J = \{1, 2, 3, 4\} is the set of benefit criteria and J' = \{5\} contains the cost criterion (negative clicks).

Fourth step: according to relation (4), the distance of the ith site from the positive ideal site is

d_i^+ = \left( \sum_{j=1}^{5} (v_{ij} - v_j^+)^2 \right)^{1/2}, \quad i = 1, 2, \ldots, m \qquad (4)

and according to relation (5), the distance of the ith site from the negative ideal site is

d_i^- = \left( \sum_{j=1}^{5} (v_{ij} - v_j^-)^2 \right)^{1/2}, \quad i = 1, 2, \ldots, m \qquad (5)

Fifth step: calculation of the relative closeness of each site A_i to the ideal site, defined by relation (6):

cl_i^+ = \frac{d_i^-}{d_i^+ + d_i^-}, \qquad 0 \le cl_i^+ \le 1, \qquad i = 1, 2, \ldots, m \qquad (6)


Note that if A_i = A^+, then d_i^+ = 0 and cl_i^+ = 1, while if A_i = A^-, then d_i^- = 0 and cl_i^+ = 0. Therefore, the closer the option A_i is to the ideal solution A^+, the higher the value of cl_i^+.
Sixth step: the sites are ranked by sorting them in descending order of cl_i^+.
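To make the six steps concrete, the following NumPy sketch ranks three hypothetical sites; the feedback matrix is invented, and the weight vector simply reuses the balanced values from step 2 instead of entropy-derived weights.

```python
# Sketch of the TOPSIS steps described above. Rows are sites, columns are the
# five criteria (open clicks, dwell time, downloads, positive clicks, negative
# clicks); the sample matrix and weights are illustrative only.
import numpy as np

R = np.array([[120, 2.5, 10, 15, 3],      # site 1
              [ 80, 1.0,  4,  6, 9],      # site 2
              [200, 0.5,  2, 12, 1]],     # site 3
             dtype=float)
w = np.array([0.2, 0.3, 0.1, 0.2, 0.2])   # balanced weights (section 4, step 2)
benefit = np.array([True, True, True, True, False])  # negative clicks are a cost

ND = R / np.linalg.norm(R, axis=0)        # step 1: vector (norm) normalization
V = ND * w                                # step 2: weighted normalized matrix
A_pos = np.where(benefit, V.max(axis=0), V.min(axis=0))   # step 3: ideal
A_neg = np.where(benefit, V.min(axis=0), V.max(axis=0))   #         anti-ideal
d_pos = np.sqrt(((V - A_pos) ** 2).sum(axis=1))           # step 4: distances
d_neg = np.sqrt(((V - A_neg) ** 2).sum(axis=1))
cl = d_neg / (d_pos + d_neg)              # step 5: relative closeness
ranking = np.argsort(-cl)                 # step 6: descending order of cl
print(cl, ranking)
```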

5. SIMULATION OF PROPOSED MODEL


The proposed model was simulated in MATLAB (version 2012). The simulation first receives three inputs from the user, namely:
the number of users who submit a single query;
the number of times feedback is re-collected from users and a new ranking is computed, i.e. how often the ranking of pages is updated;
the number of pages or sites to be ranked for each query from the users' feedback using the TOPSIS method.
The following bounds were assumed for the feedback values: the maximum number of positive clicks per user is 20 (and at least 0), the maximum number of downloads per visit is 15, and the time a user spends on a site is between 5 minutes and 3 hours.

5.1 TYPICAL SIMULATION OF PROPOSED MODEL


The following shows an example run of the program, in which the above feedback was received from 1000 users for a given query over the first 10 sites; the sites were ranked, and the rankings were then updated at the specified interval using the subsequent feedback.


Figure 1. Typical Simulation of Proposed Model in three steps

5.2 ADVANTAGES AND DISADVANTAGES OF THE PROPOSED METHOD
One of the main advantages of this method is that the criteria used in the comparison can have different measurement units and can be either positive or negative; in other words, positive and negative criteria can be combined in this technique. According to this method, the best option is the one closest to the ideal solution; briefly, the ideal solution is composed of the maximum values of each criterion, while the non-ideal solution is composed of the minimum values of each criterion.
A considerable number of criteria can be taken into account.
The method is simple and fast to apply and, because of the small volume of calculations involved in the assessment, it can handle a large number of options.

6. CONCLUSION
Search engines provide search results regardless of users' interests and history, so users often encounter results that are not interesting to them. Moreover, most search engines rely on algorithms, such as PageRank, that consider only the number of incoming and outgoing links of a website; users' behavior patterns are therefore of great significance for ranking websites. In this work we presented a method for ranking web documents that simultaneously uses five kinds of positive and negative user feedback, employing TOPSIS, one of the Multiple Attribute Decision Making models.
This method appears well suited to prioritizing pages because it simultaneously considers the distances from the positive and negative ideal options and produces a ranking over the documents. Another contribution of this model is that it uses several kinds of user feedback for ranking at once; among these, dwell time deserves particular attention, since it is one of the best-known implicit feedback signals, and researchers believe that the more time a user spends reading a document, the more important the document is to him or her.

FUTURE WORKS: Our study shows that, although the use of users' behavior in ranking has become considerably important in recent years, there is still ample room for further research. Personalization based on users' previous behavior is one of the issues that has not been fully solved and requires more investigation. As suggestions for future work, the proposed method should be evaluated on real data, and other combined methods should be examined.

REFERENCES
[1] Salton, G. Buckley, C. 1988. Term-weighting approaches in automatic text retrieval.
Inf. Process. Manage. 24, 5 (August 1988), pp. 513-523.
[2] Robertson, S. E. Walker, S. 1994. Some simple effective approximations to the 2-
Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th annual
international ACM SIGIR conference on Research and development in information
retrieval (SIGIR '94), W. Bruce Croft and C. J. van Rijsbergen (Eds.). Springer-Verlag New
York, Inc., New York, NY, USA, pp. 232-241.
[3] Jain, R. and Purohit, G. N. 2011. Page ranking algorithms for web mining. International Journal of Computer Applications (0975-8887), Vol. 13, No. 5, January.
[4] Shakery, A. Zhai, C. 2003. Relevance propagation for topic distillation uiuc trec 2003 web track experiments. In Proceedings of the TREC Conference.
[5] Yeh, J. Y., Lin, J. Y., Ke, H. R. and Yang, W. P. 2007. Learning to Rank for Information Retrieval Using Genetic Programming, Presented in SIGIR 2007 Workshop, Amsterdam.
[6] Zhao, D., Zhang, M. and Zhang, D. 2012. A Search Ranking Algorithm Based on User Preferences, Journal of Computational Information Systems, pp. 8969-8976.
[7] Attenberg, J. Pandey, S. Suel, T. 2009. Modeling and predicting user behavior in
sponsored search. In Proceedings of the 15th ACM SIGKDD international conference on
Knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, pp.1067-
1076.
[8] Dupret, G. Liao, C. 2010. A model to estimate intrinsic document relevance from the
clickthrough logs of a web search engine. In Proceedings of the third ACM international
conference on Web search and data mining (WSDM '10). ACM, New York, NY, USA,
pp.181-190.
[9] Liu, C. White, R.W. Dumais, S. 2010. Understanding web browsing behaviors through
Weibull analysis of dwell time. In Proceedings of the 33rd international ACM SIGIR
conference on Research and development in information retrieval (SIGIR '10). ACM, New
York, NY, USA, pp.379-386.
[10] Yurdakul M. 2008. Development of a performance measurement model for
manufacturing companies using the AHP and TOPSIS approaches, International Journal of
production research, pp. 4609-4641.

APPENDIX
Normalization (scale-up) using norms: In order to compare criteria measured in different units, a normalization method must be used so that the transformed elements n_{i,j} can be compared without regard to their dimensions. There are several normalization methods (norm-based, linear, fuzzy); here we use the norm-based one. Each element r_{i,j} of the decision matrix is divided by the norm of the jth column (of criterion x_j), that is,

n_{i,j} = \frac{r_{i,j}}{\left( \sum_{i=1}^{m} r_{i,j}^2 \right)^{1/2}}

In this way, all columns of the matrix have the same unit length, and comparing them becomes simple.
Weighting criteria using the entropy method:
In most MADM problems we need to know the relative importance of the criteria, normalized so that they sum to one; this relative importance estimates the degree of preference of each criterion over the others for decision making, and we use the entropy method for this purpose. In information theory, entropy expresses the uncertainty of a discrete probability distribution P_i. Given a decision matrix with m options and n criteria, the information content of the matrix is computed as P_{ij}:

P_{i,j} = \frac{r_{i,j}}{\sum_{i=1}^{m} r_{i,j}} \qquad (2)

The entropy E_j of the P_{ij} values for each criterion is

E_j = -k \sum_{i=1}^{m} P_{i,j} \ln P_{i,j}, \qquad \forall j \qquad (3)

where k = 1/\ln m, and the degree of deviation d_j of the information produced for the jth criterion is

d_j = 1 - E_j, \qquad \forall j \qquad (4)

Finally, the weights W_j of the criteria are

w_j = \frac{d_j}{\sum_{j=1}^{n} d_j}, \qquad \forall j \qquad (5)

Since the weight vector W_{n \times 1} cannot be used directly to weight the normalized decision matrix, it is first transformed into a diagonal matrix W_{n \times n} with the weights on the main diagonal. Furthermore, if the decision maker has a prior subjective judgment \lambda_j about the relative importance of the jth criterion, the weights w_j computed by the entropy method can be balanced as follows:

w'_j = \frac{\lambda_j \, w_j}{\sum_{j=1}^{n} \lambda_j \, w_j} \qquad (6)
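The entropy weighting described in this appendix can be condensed into a few NumPy lines; the feedback matrix and the subjective importances below are illustrative assumptions, not values from the paper.

```python
# Entropy weighting as outlined in the appendix: normalize columns to a
# probability distribution, compute entropy E_j, deviation d_j = 1 - E_j, and
# weights w_j; lam holds optional subjective importances (illustrative values).
import numpy as np

R = np.array([[120, 2.5, 10, 15, 3],
              [ 80, 1.0,  4,  6, 9],
              [200, 0.5,  2, 12, 1]], dtype=float)

P = R / R.sum(axis=0)                           # P_ij, eq. (2)
k = 1.0 / np.log(R.shape[0])
E = -k * np.nansum(P * np.log(P), axis=0)       # E_j, eq. (3); 0*log(0) treated as 0
d = 1.0 - E                                     # d_j, eq. (4)
w = d / d.sum()                                 # w_j, eq. (5)

lam = np.array([0.2, 0.3, 0.1, 0.2, 0.2])       # subjective importances (assumed)
w_adj = lam * w / (lam * w).sum()               # balanced weights, eq. (6)
print(w, w_adj)
```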

This paper may be cited as:


Ehsanifar, F. and Naderi, H., 2014. Improving Ranking Web Documents
using Users Feedbacks. International Journal of Computer Science and
Business Informatics, Vol. 12, No. 1, pp. 1-10.


A Survey on Sparse Representation based Image Restoration
Dr. S. Sakthivel
Professor, Department of Information Technology
Sona College of Technology,
Salem, India.

M. Parameswari
PG Scholar, Department of Information Technology
Sona College of Technology,
Salem, India.

ABSTRACT
Digital images are gaining popularity in many engineering fields, such as satellite imaging, medical imaging, astronomical imaging, and the restoration of poor-quality family portraits, so image quality matters in these fields. There are many ways to improve the quality of images, and image restoration is one of the emerging methodologies among them. Image restoration deals with methods used to recover an original scene from degraded observations; its primary goal is to recover the original image from a degraded or blurred one. The aim of this survey is to present different restoration methodologies that provide state-of-the-art results. The surveyed literature covers filtering concepts, iterative methods, and sparse representations. The filter-based restoration methods are evaluated with the signal-to-noise ratio (SNR) as the performance metric. These ideas can serve as a good reference for research in image restoration.
Keywords: Image Denoising, Image Deblurring, Sparse Representation, Restoration.

1. INTRODUCTION
Image restoration aims to recover a high-resolution image from a low-resolution one. Blurring is a process that reduces the bandwidth of an ideal image and results in imperfect image formation. It occurs because of relative motion between the camera and the original scene, or because of atmospheric turbulence and relative motion between the camera and the ground. Image restoration is concerned with the estimation or reconstruction of the uncorrupted image from a blurred or noisy one.

In addition to these blurring effects, noise also corrupts any recorded image. Image restoration can be modeled as in equation (1),

y = Hx + v \qquad (1)

where x \in R^N is the unknown high-quality original image, H \in R^{M \times N} is the degradation matrix, v \in R^N is the additive noise, and y is the observed measurement. When H is specified by a convolution kernel, image reconstruction becomes an image deblurring problem.

The solution of the deblurring problem can be obtained by solving the optimization problem in equation (2),

\hat{x} = \arg\min_x \|y - Hx\|_2^2 + \lambda \, J(x) \qquad (2)

where J(x) is a regularization term. Over the past decades, different methods and filters have been used for image restoration, but they have not proven able to restore images corrupted by additive white and Gaussian noise. Sparse representations approximate an input vector by a sparse linear combination of atoms from an overcomplete dictionary. Sparsity-based methods have been verified to perform well in terms of the mean square error (MSE) as well as the peak signal-to-noise ratio (PSNR), and sparse models are used in various image processing tasks such as image denoising, image deblurring, and super-resolution.
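As a baseline for the methods surveyed below, the degradation model of equation (1) and a simple Tikhonov-regularized deconvolution (taking J(x) = ||x||_2^2 in equation (2)) can be sketched as follows; the Gaussian kernel, noise level, and regularization weight are assumptions, and the sparse methods discussed later do not reduce to this closed form.

```python
# Sketch of the degradation model y = Hx + v and a Tikhonov-regularized
# deconvolution (J(x) = ||x||^2), solved in the Fourier domain because H is a
# convolution. Baseline illustration only; kernel, sigma and lambda are assumed.
import numpy as np

def gaussian_kernel(size=9, sigma=1.6):
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def degrade(x, kernel, noise_sigma=2.0, seed=0):
    H = np.fft.fft2(kernel, s=x.shape)              # kernel in the Fourier domain
    y = np.real(np.fft.ifft2(np.fft.fft2(x) * H))   # blur = circular convolution
    rng = np.random.default_rng(seed)
    return y + noise_sigma * rng.standard_normal(x.shape), H

def tikhonov_deblur(y, H, lam=0.01):
    Y = np.fft.fft2(y)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + lam)     # closed-form minimizer
    return np.real(np.fft.ifft2(X))

x = np.zeros((64, 64)); x[16:48, 16:48] = 255.0     # toy "image"
y, H = degrade(x, gaussian_kernel())
x_hat = tikhonov_deblur(y, H)
print(np.mean((x_hat - x) ** 2))                    # reconstruction MSE
```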

2. IMAGE DENOISING AND DEBLURRING TECHNIQUES


Reginald L. Lagendijk and Jan Biemond [9] describe the basic methods and filters for image restoration, based on linear spatially invariant restoration filters. The blurring function acts as a convolution kernel, or point-spread function d(n1, n2), that does not vary spatially, and it is also assumed that the statistical properties (mean and correlation function) of the image and the noise do not change spatially. Denoting by f(n1, n2) the spatially discrete image that contains no blur or noise, the recorded image g(n1, n2) is modeled as in equation (3),

g(n_1, n_2) = d(n_1, n_2) * f(n_1, n_2) + w(n_1, n_2) \qquad (3)

where * denotes two-dimensional convolution and w(n1, n2) is the additive noise. Two blur models are used: linear motion blur and uniform out-of-focus blur. In linear motion blur, the relative motion between the recording device and the scene results in several distinguishable forms of motion blur. In uniform out-of-focus blur, when the camera projects a 3D scene onto a 2D image, some parts of the scene are out of focus; this can be modeled by a spatially continuous point-spread function. Yusuf Abu Sa'dah et al. [14] discuss image enhancement: low-pass filters blur the images, which results in noise reduction, whereas high-pass filters are used to sharpen the images. Butterworth and Gaussian filters can be used to sharpen images, and the high-pass behavior resides in the shape of the filter curve; therefore any of the high-pass filters can be used to sharpen the images in a restoration algorithm.

Jan Biemond et al. [1] discuss iterative restoration algorithms for removing linear blur from images that are degraded by pointwise nonlinearities such as additive noise and film saturation. Regularization is proposed to prevent the excessive noise magnification associated with ill-conditioned inverse problems such as deblurring. There are several basic solutions: the inverse filter, least-squares solutions, the Wiener filter, the constrained least-squares filter, and the Kalman filter. The inverse filter is a linear filter whose point-spread function is the inverse of the blurring function, and it requires only the blur point-spread function. Least-squares filters are used to overcome noise sensitivity, and the Wiener filter is a linear partial inverse filter that minimizes the mean-squared error for a chosen point-spread function. The power spectrum, a measure of the average signal power per spatial frequency carried by the image, is estimated for the ideal image. The constrained least-squares filter overcomes some of the difficulties of the inverse and Wiener filters and also estimates the power spectrum. The regularization methods are associated with the names of Tikhonov and Miller, and both non-iterative and iterative restorations based on Tikhonov-Miller regularization are analysed using eigenvector expansions.

Michael Elad and Michal Aharon [5] address the image denoising problem in which zero-mean, white, homogeneous Gaussian additive noise has to be removed from a given image. Content-adapted dictionaries are obtained via sparse and redundant representations over dictionaries trained with the K-SVD algorithm, where training is performed either on the corrupted image itself or on a database of high-quality images. K-SVD had so far been used to handle small image patches, and the authors extend it to larger images. Earlier work considered the sparsity of unitary wavelet coefficients, leading to shrinkage algorithms; since one-dimensional wavelets are inappropriate for handling images, several multiscale and directional redundant transforms were introduced, including curvelets, contourlets, wedgelets, bandlets, and steerable wavelets. Matching pursuit and basis pursuit denoising make it possible to address image denoising as a direct sparse decomposition over redundant dictionaries. In the Sparseland model, a Bayesian reconstruction framework is employed to extend the local treatment of local patches to a global treatment. K-SVD cannot be deployed directly on larger blocks, even though it provides good denoising results.

Priyam Chatterjee and Peyman Milanfar [10] proposed K-LLD, a patch-based, locally adaptive denoising method that clusters the given noisy image into regions of similar geometric structure. Clustering uses features of the local weight function derived from steering kernel regression, and a dictionary is employed to estimate the underlying pixel values using kernel regression. The local patch size for each cluster can be chosen with Stein's unbiased risk estimator (SURE). The kernel regression framework relates to methods such as the bilateral filter, non-local means, and optimal spatial adaptation. Denoising can be learned with a suitable basis that describes the geometric structure of image patches: the image is first explicitly segmented based on local image structure and then denoised through an efficient data representation. Clustering-based denoising with locally learned dictionaries (K-LLD) involves three steps:

1. Clustering: the image is clustered using features that capture the local structure of the image data.
2. Dictionary selection: an optimized dictionary is formed that adapts to the geometric structure of the image patches in each cluster.
3. Coefficient calculation: the coefficients of the linear combination of dictionary atoms are estimated with respect to the steering kernel weights.
Fig 2.1: Block diagram of the iterative version of the K-LLD algorithm: the noisy image passes through steering-weight calculation, a clustering stage (classes 1 to K), a dictionary selection stage, and a coefficient calculation stage to produce the denoised image.


K. Dabov et al. [6] proposed a novel image denoising strategy based on an enhanced sparse representation in the transform domain. Sparsity is achieved by grouping similar 2D image fragments into 3D data arrays called groups. A collaborative filtering procedure is developed to deal with these 3D groups; it involves three steps: 3D transformation of a group, shrinkage of the transform spectrum, and inverse 3D transformation. Related methods include transform-domain denoising and sliding-window transform-domain image denoising, where a sliding window is used to apply shrinkage in a local transform domain. A sharp adaptive transform can achieve a very sparse representation of the true signal in adaptive neighborhoods. The collaborative filtering algorithm for image denoising involves two steps:
1. Basic estimate: block-wise estimates are obtained by grouping and hard thresholding, followed by aggregation.
2. Final estimate: block-wise estimates are obtained by grouping and Wiener filtering, again followed by aggregation.

Fig 2.2: Flowchart of the proposed image denoising algorithm. Step 1: grouping by block matching, 3D transform, hard thresholding, inverse 3D transform, and aggregation of the block-wise estimates to form the basic estimate. Step 2: grouping by block matching on the basic estimate, 3D transform, Wiener filtering, inverse 3D transform, and aggregation to form the final estimate of the true image.
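The grouping-and-shrinkage idea of step 1 can be sketched in simplified form: patches similar to a reference patch are collected by block matching and hard-thresholded in a transform domain. The sketch below uses a per-patch 2D DCT instead of the full 3D transform and omits the Wiener second step; the patch size, number of matches, and threshold are assumptions, so this is not the authors' implementation.

```python
# Simplified sketch of grouping by block matching followed by transform-domain
# hard thresholding (the spirit of step 1 above); the real method uses a full
# 3D transform and a Wiener second step. Parameters are illustrative.
import numpy as np
from scipy.fft import dctn, idctn

def extract_patches(img, size=8, stride=4):
    coords = [(i, j) for i in range(0, img.shape[0] - size + 1, stride)
                     for j in range(0, img.shape[1] - size + 1, stride)]
    return coords, np.stack([img[i:i + size, j:j + size] for i, j in coords])

def group_and_threshold(patches, ref_idx, n_similar=8, thr=30.0):
    ref = patches[ref_idx]
    dists = ((patches - ref) ** 2).sum(axis=(1, 2))       # block matching
    group = patches[np.argsort(dists)[:n_similar]]
    coeffs = dctn(group, axes=(1, 2), norm="ortho")       # 2D transform per patch
    coeffs[np.abs(coeffs) < thr] = 0.0                    # collaborative shrinkage
    return idctn(coeffs, axes=(1, 2), norm="ortho")

rng = np.random.default_rng(1)
noisy = 128 + 25 * rng.standard_normal((64, 64))
coords, patches = extract_patches(noisy)
denoised_group = group_and_threshold(patches, ref_idx=0)
print(denoised_group.shape)    # filtered estimates, to be aggregated back
```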

The proposed approach can be adapted to various noise models, such as additive colored noise and non-Gaussian noise, by modifying the calculation of the coefficient variances in the basic and Wiener parts of the algorithm. The method can also be modified for denoising 1-D signals and video, for image restoration, and for other problems that benefit from highly sparse signal representations.

Julien Mairal, Michael Elad, and Guillermo Sapiro [8] extended the K-SVD algorithm, previously used for grayscale image processing, to color image restoration. Other techniques used in color image restoration include Markov Random Fields (MRF) and Principal Component Analysis (PCA). An iterative method that incorporates the K-SVD algorithm is used to handle non-homogeneous noise and missing information; this extension of the denoising algorithm handles non-homogeneous noise properly and exploits the correlation between the RGB channels, which the K-SVD algorithm can capture. The algorithm uses orthogonal matching pursuit (OMP) or basis pursuit (BP) within its iterative dictionary-learning procedure. At each iteration, the atom that maximizes the inner product with the residual (thereby minimizing the error metric) is selected from the dictionary, and the residual is updated by an orthogonal projection. For denoising, the color image is represented as a column vector with white Gaussian noise added to each channel; color spaces are often used to handle the chroma and luma layers differently. The proposed method gives good results in color image denoising, demosaicing, and inpainting.
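The greedy atom selection and residual update described above can be written compactly; the following OMP sketch uses a random dictionary and a synthetic sparse signal purely for illustration and is not tied to any particular paper's implementation.

```python
# Minimal orthogonal matching pursuit sketch: at each iteration pick the atom
# with the largest correlation with the residual, then re-project the signal
# on all selected atoms. The dictionary and signal are toy assumptions.
import numpy as np

def omp(D, y, n_nonzero=5):
    """D: dictionary with unit-norm columns, y: signal to code."""
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        k = np.argmax(np.abs(D.T @ residual))             # best-matching atom
        support.append(k)
        sub = D[:, support]
        coef_s, *_ = np.linalg.lstsq(sub, y, rcond=None)  # orthogonal projection
        residual = y - sub @ coef_s
    coef[support] = coef_s
    return coef

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
true = np.zeros(256); true[[3, 40, 100]] = [1.5, -2.0, 0.7]
y = D @ true
print(np.nonzero(omp(D, y, n_nonzero=3))[0])              # indices of recovered atoms
```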

Noise in images is unavoidable. To estimate the true signal in noise, the most frequently used methods are based on least-squares criteria, but a more appropriate norm for images is the total variation (TV) norm. Closed-form linear solutions are easily computed, while nonlinear ones are computationally complex. Rudin, Osher, and Fatemi [2] formulate denoising as a constrained minimization problem solved as a time-dependent nonlinear PDE, where the constraints are determined by the noise statistics; this TV (L1) philosophy has also been used to design hybrid algorithms based on nonlinear partial differential equations. A related image enhancement technique, the shock filter, yields more detail in the denoised solution.

David S. C. Biggs [3] proposed a new method, termed automatic acceleration, for accelerating the convergence of iterative restoration algorithms. It yields faster processing and allows iterative techniques to be used in applications where they would otherwise be too slow.

Four iterative methods are considered for the acceleration algorithm. Richardson-Lucy (R-L) is an iterative technique used for restoring astronomical imagery in the presence of Poisson noise. Maximum entropy (ME) deconvolution is a means of deconvolving the truth from an image and a point-spread function (PSF); even in a perfectly focused, noiseless image there is still a warping caused by the PSF, which results from atmospheric effects, the instrument optics, and anything else that lies between the scene being captured and the CCD array. The Gerchberg-Saxton (G-S) algorithm is a popular method for Fourier magnitude or phase retrieval; it can be painfully slow to converge and is a good candidate for acceleration. For these phase retrieval algorithms, the new method is stable, and an estimated acceleration factor has been derived and confirmed experimentally. The acceleration technique has been successfully applied to the Richardson-Lucy, maximum entropy, and Gerchberg-Saxton restoration algorithms and can be integrated with other iterative techniques. There is considerable scope for achieving higher levels of acceleration when more information is used in the acceleration process.

Sparse representation of image signals admits a sparse decomposition over a redundant dictionary. Elad and Aharon (2006) described the K-SVD-based grayscale image denoising algorithm, and Mairal et al. [7] extend this work to learning dictionaries for color images and to handling non-homogeneous noise and missing information in applications such as color image denoising, demosaicking, and inpainting. The Sparseland model suggests that suitable dictionaries exist for various classes of signals and that sparse signal decomposition is a powerful model. For the removal of additive white Gaussian noise from grayscale images, K-SVD learns the dictionary directly from the noisy image. The extension to color can be performed by simply concatenating the RGB values into a single vector and training on those vectors directly, which gives better results than denoising each channel separately. The steps of the K-SVD algorithm are sparse coding, dictionary update, and reconstruction.

Sparse coding step: the orthogonal matching pursuit (OMP) greedy algorithm computes the coefficients patch by patch. Dictionary update: a set of patches is selected and the residual of each patch is computed; the sets are arranged as matrices whose columns are the patch vectors, and the dictionary atoms and the sparse representations that use them are updated one at a time. Reconstruction: a simple averaging between the patch approximations and the noisy image.
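The sparse coding / dictionary update / reconstruction loop can be approximated with off-the-shelf tools; the sketch below uses scikit-learn's mini-batch dictionary learning with OMP coding as a stand-in for the K-SVD update, on a synthetic noisy image. Patch size, number of atoms, and sparsity level are assumptions, not values from the cited papers.

```python
# Sketch of the sparse-coding / dictionary-update / reconstruction loop using
# scikit-learn's mini-batch dictionary learning and OMP coding as a stand-in
# for K-SVD; all parameters are illustrative.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

rng = np.random.default_rng(0)
clean = np.kron(rng.integers(0, 2, (8, 8)) * 255.0, np.ones((8, 8)))  # toy image
noisy = clean + 20 * rng.standard_normal(clean.shape)

patches = extract_patches_2d(noisy, (6, 6))
X = patches.reshape(len(patches), -1)
mean = X.mean(axis=1, keepdims=True)
X_c = X - mean                                       # code the texture, keep the DC

dico = MiniBatchDictionaryLearning(n_components=64, batch_size=128,
                                   transform_algorithm="omp",
                                   transform_n_nonzero_coefs=4, random_state=0)
codes = dico.fit(X_c).transform(X_c)                 # sparse coding step
denoised_patches = (codes @ dico.components_ + mean).reshape(patches.shape)
denoised = reconstruct_from_patches_2d(denoised_patches, noisy.shape)  # averaging
print(np.mean((denoised - clean) ** 2) < np.mean((noisy - clean) ** 2))
```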

To extend the K-SVD algorithm to color images, one option is to denoise each channel separately, possibly with different dictionaries; a better option is to exploit the learning capability of K-SVD to capture the correlation between the color channels. For denoising RGB color images, the image is represented as a column vector to which white Gaussian noise has been added in each channel. In the sparse coding stage, the greedy algorithm selects the best atom from the dictionary at each iteration and then updates the residual by orthogonal projection. A multiscale framework focuses on using atoms of different sizes simultaneously: a large patch of n pixels is divided along a tree into sub-patches of fixed size, and a dictionary with a multiscale structure is built over all atoms.

Shengyang Dai and Yihong Gong [11] characterize soft edge smoothness with a novel softcut metric that generalizes the geocuts method. The proposed soft edge smoothness measure approximates the average length of all level lines in an intensity image, so that the total length of all level lines can be minimized. The work combines this soft edge smoothness prior with alpha-matting techniques for color image super-resolution (SR) by adaptively normalizing image edges according to their alpha-channel description. To minimize the reconstruction error, the original high-resolution (HR) image is recovered from low-resolution (LR) inputs; this inverse process can be optimized iteratively by back-projection. Interpolation methods such as bilinear or bicubic interpolation also produce HR images. To measure and quantify edge smoothness, the geocuts method is employed, which approximates the length of a hard edge with a cut metric on the image grid; in the softcuts method, a soft edge cut metric measures the smoothness of soft edges. To handle the various edges in color images, an alpha-matting technique from the computer graphics literature is used, and edge-directed interpolation infers sub-pixel edges to prevent interpolation across edges. For color image SR, an adaptive softcuts method based on a novel alpha-channel image description enables a unified treatment of edges with different contrasts on each channel. The algorithm obtains promising results for a large variety of images.

In an overcomplete dictionary, signals are described by sparse linear combinations of atoms, and pursuit algorithms are used to decompose signals with respect to a given dictionary. The dictionary can be chosen from a prespecified set of linear transforms or adapted to a set of training signals. Aharon et al. [4] proposed K-SVD, an algorithm for adapting dictionaries to achieve sparse representations. K-SVD is an iterative algorithm that alternates between sparse coding based on the current dictionary and updating the dictionary atoms to better fit the data; it can be accelerated by combining the dictionary update with the update of the sparse representations. For the sparse representation problem, Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP) are adopted to select dictionary atoms sequentially. Dictionary design is related to k-means clustering: in clustering, a set of descriptive vectors is learned and each sample is represented by one of those vectors, as in the gain-shape vector quantization (VQ) coding method, where the coding coefficient is allowed to vary. Each k-means iteration involves two steps:
1. Given {dk}, assign the training examples to their nearest neighbors.
2. Given the assignment, update {dk}.
The first step finds the coefficients given the dictionary, which is called sparse coding; the dictionary is then updated assuming known and fixed coefficients. K-means is generalized to derive K-SVD, an effective sparse coding and Gauss-Seidel-like accelerated dictionary update method. The algorithm finds the best coefficient matrix using a pursuit method, and the calculation of the coefficients supplies the solution. It achieves better results in fewer iterations than other methods and is used in applications such as filling in missing pixels and compression.

To reconstruct the degraded image, the sparse coding coefficients should be as close as possible to those of the unknown original image under the given dictionary. If only the local sparsity of the image is considered, the sparse coding coefficients are often not accurate enough; to make sparse coding more accurate, both local and nonlocal sparsity constraints are considered. In the centralized sparse representation model [12], the sparse coding noise (SCN) is defined as \nu_\alpha = \alpha_y - \alpha_x, where the sparse codes of the observation y and of the original image x over the dictionary \Phi are given by equations (4) and (5),

\alpha_y = \arg\min_\alpha \|y - H\Phi\alpha\|_2^2 + \lambda\|\alpha\|_1 \qquad (4)

\alpha_x = \arg\min_\alpha \|x - \Phi\alpha\|_2^2 + \lambda\|\alpha\|_1 \qquad (5)

In the experiments, the original image is first blurred by a Gaussian blur kernel with standard deviation 1.6, and Gaussian white noise of standard deviation 2 is added to obtain a noisy and blurred image. Each patch is coded individually, and the patches nonlocally similar to a given patch are clustered and coded with a PCA dictionary; the PCA dictionaries are used iteratively to code the patches of each cluster, and the dictionaries are updated along with the regularization parameters. The centralized sparse representation model is given by equation (6),

\alpha_y = \arg\min_\alpha \|y - H\Phi\alpha\|_2^2 + \lambda\sum_i \|\alpha_i\|_1 + \gamma\sum_i \|\alpha_i - \beta_i\|_{\ell_p} \qquad (6)

where \gamma is a constant and the \ell_p norm is used to measure the distance between \alpha_i and \beta_i.

Sparse representation model (SRM) based image deblurring approaches have shown promising results, but because SRMs do not exploit the spatial correlations between the non-zero sparse coefficients, SRM-based deblurring methods often fail to recover sharp edges. Weisheng Dong, Guangming Shi, and Xin Li [13] proposed a structured sparse representation model that exploits the local and nonlocal spatial correlation between sparse codes, together with an image deblurring algorithm based on this patch-based structured SRM. In regularization-based deblurring, the construction of an effective regularizer is important, and sparsity-based regularization can be solved by iterative shrinkage algorithms; for high-dimensional data modeling, low-rank approximation is used. The algorithms used in the structured SRM are patch-based low-rank approximation structured sparse coding (LASSC), principal component analysis (PCA), and an iterative thresholding algorithm. The intrinsic connection between structured sparse coding and low-rank approximation is exploited to develop an efficient singular value thresholding algorithm for structured sparse coding. In the CSR model, each patch is coded individually with a PCA dictionary; instead, simultaneous sparse coding techniques code a set of patches together so that their sparse codes are aligned. Since the grouped patches share similar edge structures, an overcomplete dictionary is not needed and a compact PCA dictionary suffices. In deblurring with the patch-based structured sparse coding model, structured sparsity is enforced over the groups of nonlocally similar patches, and the patch clustering is updated across iterations. The resulting patch-based LASSC deblurring method produces state-of-the-art image deblurring results.
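The "iterative shrinkage" referred to above can be illustrated with a minimal ISTA loop for an l1-regularized problem; the dictionary, signal, and parameters below are synthetic assumptions and not part of any of the surveyed methods.

```python
# Minimal ISTA (iterative soft-thresholding) sketch for the l1-regularized
# problem min_a 0.5*||y - A a||_2^2 + lam*||a||_1, the kind of iterative
# shrinkage referred to above; A, y, and the parameters are toy assumptions.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.1, iters=200):
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of A^T A
    a = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ a - y)                # gradient of the data-fit term
        a = soft_threshold(a - step * grad, step * lam)
    return a

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 120)); A /= np.linalg.norm(A, axis=0)
a_true = np.zeros(120); a_true[[5, 30, 77]] = [2.0, -1.5, 1.0]
y = A @ a_true + 0.01 * rng.standard_normal(50)
print(np.nonzero(np.abs(ista(A, y)) > 0.1)[0])   # indices of the large coefficients
```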

3. FILTERS AND RESULTS

A mean filter smooths an image by reducing the intensity variation between adjacent pixels. It is a simple sliding-window spatial filter that replaces the value at the centre of the window with the average of its neighbours, and it works on the shift-multiply-sum principle.


An adaptive filter does a better job of de-noising images than the averaging filter. The fundamental difference between the mean filter and the adaptive filter is that the weight matrix varies between iterations in the adaptive filter, while it remains constant throughout the iterations in the mean filter. Adaptive filters are capable of de-noising non-stationary images, that is, images that have abrupt changes in intensity. Such filters are known for their ability to automatically track unknown conditions, or a signal that varies with little a priori knowledge about it. In general, an adaptive filter iteratively adjusts its parameters while scanning the image to match the image-generating mechanism. This is more significant for practical images, which tend to be non-stationary.

Compared to other adaptive filters, the least mean square (LMS) adaptive filter is known for its simplicity of computation and implementation. Its basic model is a linear combination of a stationary low-pass image and a non-stationary high-pass component through a weighting function; the function thus provides a compromise between the resolution of genuine features and the suppression of noise. A median filter belongs to the class of nonlinear filters and follows the same moving-window principle as the mean filter: the median of the pixel values in the window is computed, and the centre pixel of the window is replaced with that median. Median filtering is done by first sorting all the pixel values from the surrounding neighborhood into numerical order and then replacing the pixel being considered with the middle value. The median values must be written to a separate array or buffer so that the results are not corrupted as the process is performed.

The selection of a denoising technique is application dependent, so it is necessary to study and compare de-noising techniques in order to select the one appropriate for the application of interest. A technique for approximately calculating the signal-to-noise ratio of an image has been proposed; it assumes that the discontinuities in an image are due only to noise, so the experiments are done on an image with very little variation in intensity. Table 1 shows the signal-to-noise ratio (SNR) values of the input and output images for the filtering approaches.
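The filtering comparison reported below in Table 1 can be reproduced in spirit with a few lines of SciPy; the sketch applies mean and median filters to a synthetic noisy image and reports a reference-based SNR in dB (unlike the no-reference estimate described above), with the test image, noise level, and window size chosen arbitrarily. The LMS adaptive filter is omitted.

```python
# Sketch comparing mean and median filtering on a noisy image and reporting a
# reference-based SNR in dB; the clean image, sigma, and 3x3 window are
# illustrative assumptions.
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

def snr_db(clean, test):
    noise = test - clean
    return 10 * np.log10((clean ** 2).sum() / (noise ** 2).sum())

rng = np.random.default_rng(0)
clean = np.kron(rng.integers(0, 2, (16, 16)) * 1.0, np.ones((8, 8)))  # toy image
noisy = np.clip(clean + rng.normal(0, 0.05, clean.shape), 0, 1)       # sigma = 0.05

results = {
    "noisy input": noisy,
    "mean filter": uniform_filter(noisy, size=3),
    "median filter": median_filter(noisy, size=3),
}
for name, img in results.items():
    print(f"{name:14s} SNR = {snr_db(clean, img):.2f} dB")
```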

The following Figure 3.1 shows the result images from the Mean filter,
LMS Adaptive Filter and Median Filter when the Gaussian noise is added to
the original image.



Fig 3.1: De-noising performance comparison for the photograph image with standard deviation σ = 0.05 when Gaussian noise is added. (a) Original image with noise, (b) result image using the mean filter, (c) result image using the LMS adaptive filter, (d) result image using the median filter.

Table 1: SNR Results with Gaussian Noise and Standard Deviation σ = 0.05

Method                 SNR value of input image    SNR value of output image
Mean Filter            13.39                       21.24
LMS Adaptive Filter    13.39                       22.40
Median Filter          13.39                       22.79

Table 1 shows that the median filter gives a higher output SNR (22.79) than the mean and adaptive filters, so the median filter can be applied for higher denoising performance when restoring the degraded original image.

4. CONCLUSION
Image denoising and deblurring have been major problems in image restoration. Different algorithms for deblurring and denoising degraded images have been studied, and different types of filters have been analyzed. Sparse representations have been found to provide better image restoration results than other representations; therefore, based on sparse representation, local and non-local methods can be used to restore degraded images effectively. The experimental results on filters show that the median filter performs better than the other types. Consolidating the review and the filter concepts, median and Gaussian filters can be applied together with sparse-representation-based image denoising.

REFERENCES
[1] Jan Biemond, Reginald L. Lagendijk, and Russell M. Mersereau, Iterative methods for image deblurring, Proceedings of the IEEE, Vol. 78, No. 5, pp. 856-883, May 1990.
[2] Leonid I. Rudin, Stanley Osher, and Emad Fatemi, Nonlinear total variation based noise removal algorithms, Phys. D, Vol. 60, pp. 259-268, November 1992.
[3] David S. C. Biggs and Mark Andrews, Acceleration of Iterative Image Restoration Algorithms, Applied Optics, Vol. 36, No. 8, pp. 1766-1775, 10 March 1997.
[4] Michal Aharon, Michael Elad, and Alfred Bruckstein, K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation, IEEE Trans. on Signal Processing, Vol. 54, No. 11, November 2006.
[5] Michael Elad and Michal Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Process., Vol. 15, No. 12, pp. 3736-3745, Dec. 2006.
[6] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Trans. Image Process., Vol. 16, No. 8, pp. 2080-2095, Aug. 2007.
[7] Julien Mairal, Michael Elad, and Guillermo Sapiro, Sparse learned representations for image restoration, in Proc. of the 4th World Conf. of the Int. Assoc. for Statistical Computing (IASC), Yokohama, Japan, 2008.
[8] Julien Mairal, Michael Elad, and Guillermo Sapiro, Sparse representation for color image restoration, IEEE Trans. on Image Processing, Vol. 17, No. 1, pp. 53-69, Jan. 2008.
[9] Reginald L. Lagendijk and Jan Biemond, Basic Methods for Image Restoration and Identification, in A. C. Bovik (Ed.), The Essential Guide to Image Processing, Academic Press, United States of America, pp. 326-330, 2009.
[10] Priyam Chatterjee and Peyman Milanfar, Clustering-based denoising with locally learned dictionaries, IEEE Trans. Image Processing, Vol. 18, No. 7, pp. 1438-1451, July 2009.
[11] Shengyang Dai, Mei Han, Wei Xu, Ying Wu, Yihong Gong, and Aggelos K. Katsaggelos, Softcuts: a soft edge smoothness prior for color image super-resolution, IEEE Trans. Image Process., Vol. 18, No. 5, pp. 969-981, May 2009.
[12] Weisheng Dong, Lei Zhang and Guangming Shi, Centralized Sparse Representation for Image Restoration, in Proc. IEEE Int. Conf. on Computer Vision (ICCV), 2011.
[13] Weisheng Dong, Guangming Shi, and Xin Li, Image Deblurring with Low-rank Approximation Structured Sparse Representation, Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), Asia-Pacific, 2012.
[14] Sa'dah, Y., Al-Najdawi, N. and Tedmori, S., Exploiting Hybrid Methods for Enhancing Digital X-Ray Images, The International Arab Journal of Information Technology, Vol. 10, No. 1, January 2013.

This paper may be cited as:


Sakthivel, S. and Parameswari, M., 2014. A Survey on Sparse
Representation based Image Restoration. International Journal of Computer
Science and Business Informatics, Vol. 12, No. 1, pp. 11-24.


Simultaneous Use of CPU and GPU to Real Time Inverted Index Updating in Microblogs
Sajad Bolhasani
Department of CSE,
Lorestan Science and Research College,
Lorestan, Iran

Hasan Naderi
Assistant Professor
Department of CSE,
Iran University of Science and Technology,
Tehran, Iran

ABSTRACT
Nowadays, with the development of data networks, huge masses of data are continually being produced and updated, and managing such large volumes of data is one of the fundamental challenges in data mining. One of the main issues in this context is how to search these masses of data; producing a powerful, scalable, and efficient index of documents for use in search engines is therefore necessary. In this study, after surveying the prior work, we implement an inverted index with real-time updating capability for the dynamic and short documents of microblogs. Using the multicore processing capability of the graphics processing unit (GPU), an approach is presented that builds the index file at a suitable speed, in a scalable way and without reducing accuracy, so that the index file is usable by the query unit. The method tries to feed the updating unit continuously by dividing the work between the system's central processing unit (CPU) and the CUDA cores, making suitable use of their parallel processing capability. In addition, to increase quality, a hint method is presented for employing idle cores, together with a compactor function for reducing the size of the index file. The results indicate that, given the necessary hardware, the presented method lives up to its real-time updating goal and builds the inverted index of microblogs faster than existing approaches.
Keywords
Inverted index, Microblog, GPU, Update.

1. INTRODUCTION
In data mining and content management, the inverted index is the key structure behind every search process: with this file, search engines can answer a search without repeatedly scanning the content of the documents. The structure of an inverted index is generally based on a hash table and consists of a dictionary of words together with associated posting values. While building the index, the indexer scans the words of each document, analyzes and stems them, and then adds them to the dictionary.

In this platform each term is a unique key in the dictionary of words, and every keyword in the dictionary refers to a list of IDs of the documents containing that keyword. When a document changes, the ID lists have to be updated, and this updating process has a cost. The ultimate goal of a dynamic inverted index is to reduce the update latency to near zero, i.e. real time [1, 2, 3]. The approach introduced in this article to reach that goal is to divide and parallelize the construction of the inverted index; the multi-core, multi-thread capability of GPUs helps us approach this goal [4]. CUDA cores can execute many tasks simultaneously, which gives us the opportunity to divide the instructions into small blocks for parallel execution. Microblogs serve as the data source for the inverted index in this article. With the approach introduced above, the index is built from the microblog documents recognized by the crawler at the lowest possible cost and is usable with real-time updates.

2. BACKGROUND STUDY
The time needed to update the inverted index is an important measurement characteristic of search engines. The insert time in a multi-barrel hierarchical index [5] is composed of contributions from sub-indexes of different sizes, which are combined with the following functions.
T1 = (I(n) / n) * (log n / log k)     (1)

T2 = log I(n) / log k                 (2)

In function (1), T1 is the average time for inserting n new documents of different sizes, and I(n) is the time needed to build a static inverted index of size n. Writing the factor as log n / log k (that is, log_k n) shows how the merge factor k affects the system. As k increases, the average insert time in (1) approaches the ideal value I(n)/n, but at the cost of more sub-indexes that must be consulted at search time. Function (2) reflects the search side, where increasing k reduces the number of index levels and therefore improves search speed.
2.1 Inverted Index
Search engines consist of three parts: a crawler that finds web pages, an indexer that builds the inverted index from the crawled pages, and a ranker that answers queries by using the index. Dynamic documents are documents that change and are updated continually. A static inverted index [6] has a dictionary structure: the dictionary is built from the words split out of the text, whose stems are found using the Porter stemmer algorithm and prepared for indexing. The reason for saving stems is to reduce memory size and to index more documents per search result.


Figure 1. Structure of index file [7]: TREC document files are parsed (SimpleStreamTokenizer with a stop-word list) into text objects, and the index builder turns them into the posting-list nodes of the inverted index.

All of the information gathered by the search engine is used as input to the inverted index after being saved in barrels.
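As a concrete illustration of this dictionary-of-postings structure, the sketch below builds a toy in-memory inverted index in Python; the crude suffix-stripping function stands in for the Porter stemmer and the tiny stop-word list is an assumption for the example, not part of the system described here.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "of", "in", "to"}   # assumed tiny stop-word list

def crude_stem(term: str) -> str:
    # Stand-in for the Porter stemmer: strip a few common suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Dictionary of terms -> set of document IDs containing the term.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token in STOP_WORDS:
                continue
            index[crude_stem(token)].add(doc_id)
    return index

docs = {1: "updating the inverted index", 2: "index updates in microblogs"}
index = build_inverted_index(docs)
print(index["updat"])   # -> {1, 2}
```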
2.2 GPU structure and capability of paralleling non graphical tasks
The graphics processor consists of many simple processing units called threads, built only for simple calculations such as addition or subtraction. With the introduction of CUDA by NVIDIA, the restriction to graphical tasks was removed from graphics units. In a graphics card the elements have their own separate memory, so designers treat the card as an independent device or even as a computer of its own: each such computer has a processor that works with a dedicated memory. In a GPU, a shared memory is assigned to each block, and each streaming unit also has two specific memories, local and constant (fixed). Local memory is used for global-memory or shared-memory data; it is similar to a computer's hard drive and acts as a kind of main memory for the graphics unit. In this structure, commands are processed simultaneously in sets of threads [8, 9, 10].

3. RELATED WORKS
Real-time indexing of microblogs is discussed in [11]. Microblogs update their contents many times. The main core of this structure (LSII) consists of a sequence of inverted indexes of exponentially increasing size; new microblogs are placed in the smallest index, and these small indexes are gradually merged into larger ones. This hierarchy results in lazy updating. The results reported in that study rely on multi-threading capabilities.
Qiuying Bai and coworkers [12] apply a method for real-time updating in subject-oriented search engines. They design an inverted index with three parts: a primary inverted index, an additional inverted index and a list of deleted files. This kind of real-time updating is useful for subject-oriented search engines.
A different study [13] introduces a way of building and updating the index file on a single computer system, without any distribution, by using a graphics processor and multi-core processing.

One advantage of this approach is that the graphics unit performs all of the processing, so the central processing unit remains free for other tasks and the capability of the system is not degraded.

4. METHODOLOGY
In this study we present a shared method in which, by using two different processors, the inverted index is both built and updated in real time. Incoming documents are taken from microblogs in a preset frame; the reason for this is the inherent characteristic of the design, where data entering the graphics processor are divided into smaller units (blocks), whose sizes in the GPU are very restricted. Each block runs through a CUDA thread and processes the command, or part of the command, that it carries. In this processing structure the CUDA cores act like the warp and the blocks like the woof, and since each block finishes its processing task in one clock, the incoming documents taken from the crawler are treated as small graphics-unit blocks. First, each document is tagged with an I or D sticker, depending on whether it is to be added to or deleted from the index, and is placed in a queue (Q) in the system's main memory before entering the central processing unit. From then on, the CPU cores are divided into two sets: half splitters and half updaters. Documents enter the splitters, are turned into terms and are prepared for the next step. After each split, the separated sets of documents are passed on to the graphics unit to continue the work in parallel. As documents from the microblogs enter the GPU they are placed into blocks, and the graphics unit carries out the processing phase by phase according to the number of blocks and CUDA cores. In each block of the graphics processor, two different indexing operations can be performed for a document: building an entry in the insertion index barrel or building an entry in the deletion index barrel. The kind of operation is indicated in the header of the document.
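A minimal, CPU-only sketch of this flow is shown below: documents tagged I (insert) or D (delete) are queued, tokenized by a splitter function and applied to the index by an updater function. The GPU/CUDA stage is replaced here by a plain Python function, and all names and data are illustrative assumptions rather than the paper's actual implementation.

```python
from collections import deque, defaultdict

inverted_index: dict[str, set[int]] = defaultdict(set)

def split(doc_text: str) -> list[str]:
    # "Splitter" role: break a microblog post into terms.
    return doc_text.lower().split()

def update(doc_id: int, op: str, terms: list[str]) -> None:
    # "Updater" role: apply the tagged operation to the index
    # (in the real system this work is dispatched to CUDA blocks).
    for term in terms:
        if op == "I":
            inverted_index[term].add(doc_id)
        elif op == "D":
            inverted_index[term].discard(doc_id)

queue = deque([(1, "I", "breaking news today"),
               (2, "I", "news update"),
               (1, "D", "breaking news today")])
while queue:
    doc_id, op, text = queue.popleft()
    update(doc_id, op, split(text))

print(dict(inverted_index))   # doc 1 inserted then removed; doc 2 remains under "news"/"update"
```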

Figure 2. Tasks of real time inverted index by sharing processors.

In the first part of the GPU processing, the inserted blocks record how many times each word is repeated in each document, together with the number of deletion blocks, and save this in the global memory of the graphics unit. The threshold processing is then performed and the threshold inverted index is built. Hint is a notification mechanism designed for the times when CPU cores are idle: it identifies the free time of each core and temporarily lends idle cores to the other role, so that the processing power available to the very busy units increases.
HintArray[a] holds the current role (S = splitter, U = updater) of each core; a copy HintArrayC[a] is re-evaluated and reset at each time step T0, T1, ..., Tn, so the mix of S and U entries shifts as the load changes.
Figure 3. Change routine in GPU cores by using hint function.

In the figure above, the cores of the system's main processor are divided into two sets, splitters (S) and updaters (U). Depending on how busy the input of each core is, the hint function changes free cores to S or U in real time in order to relieve the busy points.
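The sketch below mimics this role reassignment: each core starts as a splitter (S) or updater (U), and at every time step a hint routine flips one idle core toward whichever queue has the larger backlog. The backlog figures are invented illustration values, not measurements.

```python
def hint(roles: list[str], split_backlog: int, update_backlog: int) -> list[str]:
    # Reassign one idle core per step toward the busier stage.
    roles = roles.copy()
    if split_backlog > update_backlog and "U" in roles:
        roles[roles.index("U")] = "S"      # borrow an updater core for splitting
    elif update_backlog > split_backlog and "S" in roles:
        roles[roles.index("S")] = "U"      # borrow a splitter core for updating
    return roles

roles = ["S", "S", "S", "U", "U", "U"]     # initial half/half division of CPU cores
for t, (sb, ub) in enumerate([(10, 2), (12, 1), (3, 9), (5, 5)]):
    roles = hint(roles, sb, ub)
    print(f"T{t}: {roles}")
```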

5. EVALUATION
Since the ultimate goal of this research is to build the inverted index and update it in real time, the measurement criterion is execution time. The chart below compares two runs of the task, one using the graphics unit and one without it. In both cases the time grows with the input size, but the execution time decreases when the multi-core capability of the CUDA graphics processor is used, which brings us closer to real-time execution.

Figure 4. Evaluation algorithm by CPU and CPU&GPU.

In this chart the horizontal axis is the number of documents and the vertical axis is the time needed to build the file. The two upper lines show the time required to build the index using the CPU alone, and the two lower lines show the time required when the GPU and CPU are used together.

6. CONCLUSIONS
In this article we have tried to use the processing power of the graphics unit together with the central processing unit to build a real-time updating index system from microblog contents. We also presented a way to make strong use of idle cores. Finally, we presented an algorithm that builds the inverted index from microblog documents and updates it in real time. Future work aims to complete a static structure and to introduce a unit called the index manager that distributes the processing streams between the processor units.

7. ACKNOWLEDGMENTS
We would like to thank Muhammad Saleh Mousavi for his exceptionally useful reviews.

REFERENCES
[1] P. Mudgil, A. K. Sharma, and P. Gupta, An Improved Indexing Mechanism to Index
Web Documents, Computational Intelligence and Communication Networks (CICN),
2013 5th International Conference on, 27-29 Sept. 2013, pp. 460 - 464.
[2] R.Konow, G.Navarro, and C. L. A. Clarke, Faster and Smaller Inverted Indices with
Treaps, partially funded by Fondecyt grant 1-110066, by the Conicyt PhD Scholarship
Program, Chile and by the Emerging Leaders in the Americas Program, Government
of Canada ACM, 2013.
[3] S. Brin and L. Page, Reprint of: The anatomy of a large-scale hypertextual web search
engine, The International Journal of Computer and Telecommunications Networking,
2012, 3825-3833.
[4] Z. Wei and J. JaJa, A fast algorithm for constructing inverted files on heterogeneous
platforms, J. Parallel Distrib. Comput, 2012.
[5] N. Grimsmo, Dynamic indexes vs. static hierarchies for substring search, Trondheim,
2005.
[6] R. A. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-
Wesley Longman Publishing Co, Inc., 1999.
[7] C. D. Manning, P. Raghavan, and H. Schtze, Introduction to Information Retrieval,
Book, ISBN:0521865719 9780521865715 , 2008.
[8] NVIDIA CUDA, NVIDIA CUDA C Programming Guide, Book, www.nvidia.com,
2012
[9] W. Di, Z. Fan, A. Naiyong, W. Fang, L. Jing, and W. Gang, A Batched GPU
Algorithm for Set Intersection, Pervasive Systems, Algorithms, and Networks (ISPAN),

2009 10th International Symposium on, 978-1-4244-5403-7, 14-16 Dec. 2009, pp. 752
- 756.
[10] Z. Wei and J. JaJa, A fast algorithm for constructing inverted files on heterogeneous
platforms, J. Parallel Distrib. Comput. 2012.
[11] W. Lingkun, L. Wenqing, X. Xiaokui, and X. Yabo, LSII: An indexing structure for
exact real-time search on microblogs, in Data Engineering (ICDE), IEEE 29th
International Conference, 2013.
[12] Q. Bai, C. Ma, and X. Chen, A new index model based on inverted index, Software
Engineering and Service Science (ICSESS), 2012 IEEE 3rd International Conference
on, 978-1-4673-2007-8, 22-24 June 2012, pp. 157 - 160.
[13] N. N. Sophoclis, M. Abdeen, E. S. M. El-Horbaty, and M. Yagoub, A novel approach
for indexing Arabic documents through GPU computing, Electrical & Computer
Engineering (CCECE), 2012 25th IEEE Canadian Conference on, 978-1-4673-1431-2,
April 29 2012-May 2 2012, pp. 1- 4.

This paper may be cited as:


Bolhasani, S. and Naderi, H., 2014. Simultaneous Use of CPU and GPU to
Real Time Inverted Index Updating in Microblogs. International Journal of
Computer Science and Business Informatics, Vol. 12, No. 1, pp. 25-31.


A Survey on Prioritization Methodologies


to Prioritize Non-Functional Requirements
Saranya. B.
Department of Computer Science and Engineering
Sri Krishna College of Technology, Coimbatore, Tamilnadu, India

Subha. R.
Department of Computer Science and Engineering
Sri Krishna College of Technology, Coimbatore, Tamilnadu, India

Dr. Palaniswami. S.
Principal, Government College of Engineering,
Bodinayakanur, India

ABSTRACT
Non-functional requirements (NFRs) are as important as functional requirements, but they are often neglected, poorly understood and not considered adequately in the software development process. If the NFRs are not met properly, customers will be dissatisfied. NFRs may even be more critical than functional requirements, because mutual dependencies among NFRs may affect the completion of the project. Hence it is necessary to prioritize the NFRs effectively, yet prioritizing NFRs is a challenging task in software development. Many techniques are used to prioritize requirements along various dimensions, and it is important to choose an appropriate prioritization technique for a particular development process, based on factors such as the stakeholders involved, the available resources and the product being developed. The goal of this paper is to increase awareness of the importance of NFRs and to analyze the various techniques that are used to prioritize them.

Keywords
Requirements Engineering, Non Functional Requirements, Prioritization of NFRs,
Prioritization techniques, Quality requirements, NFR algorithm

1. INTRODUCTION
Requirements Engineering (RE) is a subfield of software engineering that involves formulating, documenting and maintaining software requirements [16]. Requirements generally describe what the system is required to do, along with the environment it is intended to operate in. Requirements provide the description of the system, its behavior, application domain information, system constraints, specifications and attributes [7].


Requirements may be Functional or Nonfunctional.


Functional requirements (FRs) describe system services or
function.
Nonfunctional requirements (NFRs) are a constraint on the system
or on the development process.

The purpose of identifying non-functional requirements is to get a handle on those absolutely necessary requirements that are not functional. Some NFRs may conflict with each other, and, like functional requirements, NFRs will also vary during the development process, so it is essential to keep track of them throughout development. NFRs are of three types: product requirements, organisational requirements and external requirements [12].
Product requirements: Requirements which specify that the
delivered product must behave in a particular way, e.g. execution
speed, reliability, etc.
Organisational requirements: Requirements which are a
consequence of organizational policies and procedures, e.g. process
standards used implementation requirements, etc.
External requirements: Requirements which arise from factors
which are external to the system and its development process, e.g.
interoperability requirements, legislative requirements, etc.

The software development market can be divided into two major types, namely market-driven development and bespoke development. In market-driven development the product is developed for an open market, whereas in bespoke development the product is developed for a particular customer based on their wishes. If there were only one customer in bespoke development there would be no problem, but in practice many customers and developers are involved and everyone has different views and opinions. In such situations requirement prioritization plays a major role in software development.

Requirement prioritization is done in order to determine which NFRs are to be implemented in the current release of a software product, or in order to understand the mandatory NFRs that must be implemented to satisfy the customers. During a project, decision makers need to make many different decisions regarding the release plan, and requirement prioritization plays a major role in this decision making. It helps decision makers decide when to release a product, how to develop the project in time and how to reach milestones, and it helps them resolve the issues and risks that arise during software development.


The main reason for prioritizing is that not all NFRs can be implemented in the given time or with the given resources.
In this paper, we analyze the prioritization techniques. The paper is structured as follows: after the introduction, Sect. 2 provides a review of related work, Sect. 3 explains requirement prioritization, Sects. 4 and 5 describe the various techniques to prioritize NFRs, and Sect. 6 concludes the paper.

2. RELATED WORK
Quality requirements are defined during the early stages of development, but they are not incorporated properly during software development. NFRs are prioritized for various reasons: to determine whether a particular requirement is mandatory, to eliminate requirements that are unnecessary to the software product, and to schedule requirements for implementation [5]. When requirements are properly prioritized, they bring significant benefits such as improved customer satisfaction and a lower risk of cancellation, and prioritization also helps to identify hidden requirements. It helps to estimate the benefits of the project, and the priorities of requirements can help determine how to utilize the limited project resources. The factors involved in prioritizing requirements include cost, risk, value, benefits, dependency constraints, effort and business value dimensions [1].

In web development, NFRs are not given the same importance as functional requirements and are not discussed properly in the early stages, but web developers have found that NFRs become an issue during the later stages of development, which leads to frequent changes in the system design. Paying more attention to NFRs therefore increases the quality of the web system and also reduces the development time. In many techniques, NFRs are converted into functional requirements while being prioritized; for example, a security requirement is operationalized as login requirements for cost estimation [6]. So when NFRs are elicited properly, this also leads to the discovery of new functional requirements (FRs) [11].



Many challenges arise while prioritizing requirements. For example, some stakeholders believe that all requirements have high priority; even though they accept in theory that different requirements have different priorities, they always try to push for having most requirements classified as high priority, so we have to make them understand which requirements are not necessary for the time being [5]. The main difficulties in prioritizing quality requirements are, first, that it is difficult to determine in advance the cost and effort required to implement NFRs and, second, that quality requirement values are continuous, so it is hard to determine the right required value [3].

3. REQUIREMENT PRIORITIZATION
Requirement prioritization is an important activity in software development; it helps to manage the relative importance of requirements and to manage development when resources are limited [13]. All stakeholders should collaborate to prioritize the requirements efficiently. Requirement prioritization supports the following activities:
To estimate expected customer satisfaction.
To decide the core requirements.
To schedule the implementation of requirements.
To handle the dependencies between requirements.
To establish the relative importance of each requirement.

Requirements can be prioritized along various dimensions [1]:
Cost: The cost required to implement the requirement successfully. Developers often prioritize requirements in terms of money.
Value: The importance of each requirement compared to the other requirements. The requirements with the highest importance are implemented first. As every stakeholder has a different view of a requirement, it is difficult to determine the value of a requirement.
Risk: The risk that arises while implementing a requirement. Requirements with high risk should be implemented first, otherwise they would extend the implementation time.
Dependency Constraints: How two requirements depend on one another.
Precedence Constraints: This condition occurs when the implementation of one requirement requires the completion of other requirements.
Coupling Constraints: Represents requirements that can be implemented in parallel.
Business Values: Represent the mission of an organization; examples of business values are continuous improvement and customer satisfaction.


Effort: The effort required to implement the requirement during the development process.
Resources: The budget, staff and schedule. This is a crucial factor that is not considered in most prioritization techniques.
Approach: Requirements prioritization involves two approaches.
Absolute Assessment: Every requirement is assessed against each criterion.
Relative Assessment: Each requirement is compared with every other requirement.

As the requirements are analyzed from various perspectives, requirement prioritization also provides many other benefits; for example, it helps to find hidden requirements and defects in requirements, and by implementing high-priority requirements before lower-priority ones it is possible to reduce the time and cost of the project. The requirement prioritization process can be divided into the following sub-processes [5]:
Convince stakeholders: Convince the stakeholders and make them understand the requirements.
Train stakeholders: Train the stakeholders to participate in the prioritization process.
Categorize raw potential requirements: The raw potential requirements are categorized into actual requirements, useful capabilities and desirable capabilities.
Prioritize the actual requirements: Prioritize the identified requirements; this involves negotiating with stakeholders and validating the results. To determine the priorities, prioritization should be done incrementally. For effective prioritization, representatives from all stakeholder groups should be involved in the requirements team, and it should be led by a professional requirements engineer.
Publish the priorities: After prioritization, the requirements team should publish the results and let the stakeholders know about them.
Estimate effort: After finding the priorities, calculate the effort required to implement the requirements.
Schedule development: Schedule the development process according to the calculated priorities. As requirements are allocated incrementally, the requirements team, the development team and the management team should work together to allocate the requirements properly.
Maintain priorities: During the development process the requirements and their priorities may vary, so store the priorities in a repository and maintain them when changes occur.
More simply, requirement prioritization can be carried out in three stages [14]. In the preparation stage, the team and the team leader prepare the requirements according to the principles of the prioritization technique to be used.


The second stage is the execution stage, where the decision makers actually prioritize the requirements, and the final stage is the presentation stage, where the results of the prioritization are presented. However, numerous risks and challenges arise while prioritizing the requirements [5].

The major challenges in prioritization are:
When the number of requirements is large, it is difficult to prioritize them efficiently.
As different stakeholders have different views, finding the mandatory requirements is difficult.
When resources are limited, prioritization is very challenging.
As requirements may vary in later stages, the priorities change as well.
Some requirements may be incompatible with one another, so implementing one requirement may rule out the implementation of others.
The customer may misunderstand the prioritization process.
The customer may try to pull all requirements into the mandatory group.

4. PRIORITIZATION METHODOLOGIES
4.1 Numerical Assignment
Numerical assignment involves grouping the requirements. First the requirements are divided into different groups; then the stakeholders assign each requirement a scale value from 1 to 5 based on its importance. Finally, the average of the values given by all stakeholders is taken as the ranking of that requirement [9]. The main disadvantage of this method is that, as different users have different opinions, the information obtained is relative and it is difficult to derive absolute information [1]. Moreover, when stakeholders prioritize the requirements themselves, they tend to pull about 85% of the requirements into the high-priority group.
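A minimal sketch of this averaging step, with invented stakeholder scores on the 1-5 scale, is shown below.

```python
# Each stakeholder rates every requirement on a 1-5 scale; the average is the rank value.
ratings = {
    "R1": [5, 4, 5],    # scores from three stakeholders (illustrative values)
    "R2": [2, 3, 2],
    "R3": [4, 4, 3],
}
averages = {req: sum(scores) / len(scores) for req, scores in ratings.items()}
for req, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(req, round(avg, 2))
```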
4.2 Analytical Hierarchy Process (AHP)
AHP [1] is a decision-making technique based on pairwise comparison. With the help of multiple objectives or criteria, AHP allows decision makers to choose the best requirements from several decision alternatives. AHP involves three steps:
Making pairwise comparisons.
Calculating the priorities of the requirements and making the decision.
Checking consistency.
In this technique each requirement is compared with every other requirement to determine to what extent one requirement is more important than the others.


For n requirements, n(n - 1)/2 pairwise comparisons are required [9].
The fundamental scale used in AHP is given below [1]:
1 - Equal importance.
3 - Moderate difference in importance.
5 - Essential difference in importance.
7 - Major difference in importance.
9 - Extreme difference in importance.
Reciprocals - If requirement i has one of the above numbers assigned to it when compared with requirement j, then j has the reciprocal value when compared with i.
The redundant pairwise comparisons make this methodology very trustworthy for decision making, but when the number of requirements becomes large, the number of comparisons grows quickly and the process becomes complex.
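The sketch below illustrates the pairwise-comparison step: the priority vector is obtained by normalising the columns of the comparison matrix and averaging the rows, a common approximation of the principal eigenvector; the matrix entries are invented for illustration.

```python
# AHP sketch: 3 requirements compared pairwise on Saaty's scale.
# A[i][j] > 1 means requirement i is more important than requirement j.
A = [
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
]
n = len(A)
col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
# Normalise each column, then average across each row to get the priorities.
priorities = [sum(A[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]
print([round(p, 3) for p in priorities])   # -> approximately [0.633, 0.26, 0.106]
```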
4.3 Value-Oriented Prioritization (VOP)
VOP involves two steps. The first is establishing the framework: the business core values are identified, and the organizational executives provide the importance of those values to the organization; to assign weights, a simple scale ranging from 0 (not important) to 10 (critical) is used, and VOP also supports weighting using business risk categories. The second step is applying the framework: using the identified core values and risk weight categories, the requirements are prioritized by constructing a prioritization matrix. VOP is the only technique that considers the business core values. In VOP, decision making is transparent to all the stakeholders involved, which avoids lengthy arguments about particular requirements [8].
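A minimal sketch of the prioritization matrix follows: executive-assigned core-value weights (0-10) multiply each requirement's ratings, and the weighted sums give the ordering; all numbers are invented.

```python
# VOP sketch: business core values with executive-assigned weights (0-10 scale).
core_value_weights = {"customer satisfaction": 9, "time to market": 6, "compliance": 4}
# Each requirement is rated against every core value.
ratings = {
    "NFR-security":    {"customer satisfaction": 7, "time to market": 2, "compliance": 9},
    "NFR-performance": {"customer satisfaction": 8, "time to market": 6, "compliance": 1},
}
scores = {
    req: sum(core_value_weights[cv] * val for cv, val in r.items())
    for req, r in ratings.items()
}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```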
4.4 Cumulative Voting
Cumulative voting (CV), or the 100-dollar method, is a prioritization technique in which each stakeholder is given 100 imaginary units to vote for the requirements they consider important. A stakeholder can spend all 100 units on a single requirement or distribute them across all requirements. In this way the units assigned to a requirement represent the respondent's relative preference for that requirement in relation to the others, which is why cumulative voting is also known as proportional voting. The term proportional also reflects that if the number of units assigned to a requirement is divided by the constant number of units available to each stakeholder, the result is a proportion between zero and one.

In CV, only one round is given to each stakeholder, because with more rounds they might spend all their units on their favourite requirements to push them to the highest priority [15].


Some stakeholders will also put all their units on a requirement that is not highly preferred by anyone else, which leads to conflicts in the prioritization process.
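The sketch below illustrates the counting step: each stakeholder distributes 100 units, the units per requirement are summed, and dividing by the grand total gives the proportional priority; the allocations are invented.

```python
# Cumulative (100-dollar) voting: each stakeholder's allocations must sum to 100.
votes = {
    "alice": {"R1": 60, "R2": 30, "R3": 10},
    "bob":   {"R1": 20, "R2": 50, "R3": 30},
}
assert all(sum(allocation.values()) == 100 for allocation in votes.values())

totals: dict[str, int] = {}
for allocation in votes.values():
    for req, units in allocation.items():
        totals[req] = totals.get(req, 0) + units

grand_total = sum(totals.values())            # 100 units per stakeholder
proportions = {req: units / grand_total for req, units in totals.items()}
print(proportions)   # -> R1: 0.40, R2: 0.40, R3: 0.20
```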

4.5 Binary Search Tree


The binary search tree (BST) technique uses the basic BST data structure: the algorithm stores the information so that it can be retrieved when necessary. A BST is a special case of a binary tree in which each node has at most two children; child nodes to the right have greater value/importance than the root node, and child nodes to the left have less value/importance. If the BST is traversed in order, the requirements come out in sorted order [14]. For n requirements the tree consists of n nodes, and about n log n comparisons are needed until all requirements are inserted.

The three stages of the BST technique are as follows. In the preparation step the requirements are gathered. In the execution step the requirements are inserted one by one: the first requirement becomes the root node, and each subsequent requirement is compared with the existing nodes and added as a right child if it has higher priority or as a left child if it has lower priority. Finally, in the presentation stage, the tree is traversed in order to obtain the prioritized requirements.
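A compact sketch of the BST-based ordering is given below: each requirement is inserted by comparing its importance with the existing nodes, and an in-order traversal lists the requirements from lowest to highest priority; the importance values are invented.

```python
class Node:
    def __init__(self, name: str, importance: int):
        self.name, self.importance = name, importance
        self.left = self.right = None

def insert(root, name, importance):
    if root is None:
        return Node(name, importance)
    if importance < root.importance:
        root.left = insert(root.left, name, importance)    # less important -> left
    else:
        root.right = insert(root.right, name, importance)  # more important -> right
    return root

def in_order(root):
    # Yields requirements from lowest to highest priority.
    if root:
        yield from in_order(root.left)
        yield (root.name, root.importance)
        yield from in_order(root.right)

root = None
for name, importance in [("R1", 5), ("R2", 9), ("R3", 2), ("R4", 7)]:
    root = insert(root, name, importance)
print(list(in_order(root)))   # [('R3', 2), ('R1', 5), ('R4', 7), ('R2', 9)]
```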

5. PRIORITIZATION METHODOLOGIES PARTICULARLY


DEFINED FOR NFRS
Along with the methodologies mentioned above, many other methodologies such as Top-Ten Requirements, Ranking and Planning Game have been used to prioritize NFRs [8]. However, all of these techniques were designed for FRs and merely can be applied to NFRs, so they do not give efficient results when used to prioritize NFRs. Some methodologies are defined particularly to prioritize NFRs; they are discussed below.
5.1 Prioritization Using Architecture Feedback
This approach starts the architectural design together with the initial requirements engineering activities, since an initial understanding of the NFRs is required. The steps are shown in Fig. 1.
The steps involved in architectural feedback are:
Identify the NFRs relevant to the software development.
Specify the NFRs.
Prioritize the NFRs based on the views of the stakeholders involved.
Based on the initially prioritized NFRs, design the initial architecture and create the architecture model with the quality annotations required for evaluation.
Evaluate the software architecture and explore the design space using a design space exploration tool such as PerOpteryx.


Figure 1. Prioritization using architectural feedback: requirements engineering activities (identify relevant quality attributes, specify quality requirements, prioritize quality requirements) feed the software architecture evaluation activities (design software architecture, evaluate software architecture and explore design space, analyze trade-offs and dependencies), whose feedback flows back to the requirements and on into the further development process.

Based on the design space exploration, analyze trade-offs (the above three steps are done by software architects).
Based on the results obtained, the stakeholders discuss and negotiate the required quality levels and reprioritize the NFRs.
The software architect updates the architecture accordingly.
The architectural design is used to implement the system and should be updated continuously.
If the NFRs change, the above steps are repeated.
This process helps the stakeholders and architects to understand the NFRs effectively, but the method is applicable only to quality properties that can be evaluated quantitatively.

5.2 Prioritizing By Goal-Decomposition


This approach uses the goal decomposition method to prioritize NFRs: non-operational specifications are translated into operational definitions, which are evaluated after the system is developed [4]. Requirements are first prioritized while they are elicited; here the stakeholders group the requirements they consider critical, and these are implemented first, with developers having no influence on this step. The second step is executed during the requirement negotiation meeting, in which the requirement conflicts are analyzed in the presence of all stakeholders. To support this step, pairs of requirements are linked with a sign that indicates a positive (+), negative (-) or neutral interaction.

The rules for assigning the link pairs are:
The value - is assigned to a pair of NFRs when one NFR in the pair has a negative effect on the other for the same functionality.

The value + is assigned to a pair of NFRs when one NFR in the pair has a positive effect on the other.
The neutral value is assigned to a pair of NFRs that do not interact.
The value ? is assigned to a pair of NFRs when the influence is unknown.
The conflicts among the NFRs can be resolved using scale values, for example very important, important, average and don't care.
Then the stakeholder goals that are important to the success of the system are identified; hard goals are given higher priority than soft goals. Finally, the Goal Question Metric approach is applied to the NFRs to prioritize them.

5.3 NFR Algorithm


The NFR algorithm is based on the mutual dependencies among the NFRs and the preferences of the stakeholders. The priority of the NFRs is determined with the involvement of respondents from the project and the business organization [9]. It uses a simple heuristic to prioritize the NFRs. The steps of the NFR algorithm are as follows:
Business organization representatives identify the NFRs that concern the particular project.
Based on the NFRs, a business process hierarchy is created; this scenario defines the final product that should be delivered to the customer.
The identified NFRs are given to the stakeholders, who provide ratings between 0 and 9 along various business value dimensions (value, penalty, cost, risk).

Figure 2. NFR Algorithm: identification of NFRs, construction of the business process hierarchy and provision of ratings (guided by stakeholder and document preferences), followed by assessment of relative importance, adjustment of the NFR scores using the association matrix and the Net % Change values, and the heuristics for prioritization that yield the prioritized NFRs.

The ratings are then combined to calculate the relative importance of the NFRs.

The relative importance is calculated using the formula

Relative Importance i = (Value + Penalty)% i / (Cost% i + Risk% i)               (1)

where

(Value + Penalty)% i = 100 * (Value i + Penalty i) / Σj (Value j + Penalty j)    (2)

Cost% i = 100 * Cost i / Σj Cost j                                               (3)

Risk% i = 100 * Risk i / Σj Risk j                                               (4)

and i denotes the ith NFR considered for implementation.

The associations among the NFRs are then identified. An association between two NFRs may be positive, negative or neutral, and it can be expressed clearly in the form of an association matrix, so an NFR association matrix is constructed for the scenario.

Figure 3. Association matrix: rows and columns are the NFRs X1-X4; an entry +m or -m marks a positive or negative association between the corresponding pair of NFRs, and the bottom row gives the Net % Change (NC1-NC4) of each NFR.


The indicators m i in the matrix show the associations between NFRs: a + sign denotes a positive association and a - sign a negative one. The Net % Change row indicates the aggregate percentage improvement or degradation of the capability identified by each NFR. The adjusted importance is calculated using the formula

Adjusted Importance i = Relative Importance i * (1 + Net % Change i / 100)       (5)

where i denotes the ith NFR considered for implementation.

The NFRs are dropped using the following heuristics:
Calculate the mean of the adjusted importance values of all the NFRs.
Drop the NFRs with a negative adjusted importance and those whose value is less than the mean adjusted importance.
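A sketch of the complete calculation under formulas (1)-(5) and the drop heuristic is given below; the rating values and Net % Change figures are invented for illustration, and the association matrix is reduced to its aggregated Net % Change row.

```python
# NFR algorithm sketch: relative importance from value/penalty/cost/risk ratings,
# adjustment by Net % Change from the association matrix, then the drop heuristic.
ratings = {   # stakeholder ratings on a 0-9 scale (illustrative)
    "usability":   {"value": 8, "penalty": 6, "cost": 4, "risk": 3},
    "security":    {"value": 9, "penalty": 8, "cost": 7, "risk": 6},
    "portability": {"value": 3, "penalty": 2, "cost": 5, "risk": 4},
}
net_pct_change = {"usability": 10, "security": 5, "portability": -20}   # from the matrix

tot_vp   = sum(r["value"] + r["penalty"] for r in ratings.values())
tot_cost = sum(r["cost"] for r in ratings.values())
tot_risk = sum(r["risk"] for r in ratings.values())

adjusted = {}
for nfr, r in ratings.items():
    vp_pct   = 100 * (r["value"] + r["penalty"]) / tot_vp       # formula (2)
    cost_pct = 100 * r["cost"] / tot_cost                        # formula (3)
    risk_pct = 100 * r["risk"] / tot_risk                        # formula (4)
    rel_imp  = vp_pct / (cost_pct + risk_pct)                    # formula (1)
    adjusted[nfr] = rel_imp * (1 + net_pct_change[nfr] / 100)    # formula (5)

mean_adj = sum(adjusted.values()) / len(adjusted)
# Drop heuristic: discard NFRs with negative adjusted importance or below the mean.
kept = {nfr: round(a, 3) for nfr, a in adjusted.items() if a > 0 and a >= mean_adj}
print(kept)   # -> usability and security survive; portability is dropped
```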

6. CONCLUSION
In a competitive business environment, the quality of a product plays a crucial role in its success, so the NFRs have to be prioritized efficiently. Among the techniques discussed above, we found the NFR algorithm to be the most suitable methodology for prioritizing NFRs, because the algorithm is designed specifically for NFR prioritization. Since a business process hierarchy is created, all the NFRs can be identified easily, and the algorithm prioritizes the NFRs along various dimensions and from various stakeholder views. The heuristics involved are very simple to calculate, and the algorithm also considers the mutual dependencies among the NFRs. The NFR algorithm can therefore prioritize NFRs efficiently and in a cost-effective manner.

REFERENCES
[1]. Aaqib Iqbal, Farhan, M. Khan, Shahbaz, A. Khan, 2009. A Critical Analysis of
Techniques for Requirement Prioritization and Open Research Issues. International Journal
of Reviews in Computing. Vol. 1.
[2]. Anne Koziolek, 2012. Research Preview: Prioritizing Quality Requirements Based On
Software Architecture Evaluation Feedback. REFSQ 2012. pp. 52-58.
[3]. Berntsson Svensson, R., Gorschek, T., Regnell, B., Torkar, R., Shahrokni, A., Feldt, R.,
and Aurum, A. 2011. Prioritization of quality requirements state of practice in eleven
companies. RE'11 IEEE. pp. 69.
[4]. Daneva, M., Kassab, M., Ponisio, M.L., Wieringa, R. J., Ormandjieva, O., 2007.
Exploiting A Goal-Decomposition Technique to Prioritize Non-Functional Requirements.
Proceedings of the 10th Workshop on Requirements Engineering WER.
[5]. Donald Firesmith, 2004 Prioritizing Requirements, Journal of Object Technology,
Vol 3, No 8.


[6]. Herrmann, A., Daneva, M., 2008. Requirements Prioritization Based On Benefit And
Cost Prediction: An Agenda For Future Research. In Proceedings of The 16th IEEE
International Requirements Engineering Conference. pp. 125-134.
[7]. Kotonya, G., Sommerville, I., 1998. Requirements engineering: Processes and
techniques, Chichester. UK: John Wiley & Sons.
[8]. Manju Khari, Nikunj Kumar, 2013. Comparison of six prioritization techniques for software requirements. Journal of Global Research in Computer Science, Vol. 4, No. 1.
[9]. Muhammad Ramzan, ArfanJaffar, M., Arshad Ali Shahid, 2011. Value Based
Intelligent Requirement Prioritization (Virp): Expert Driven Fuzzy Logic Based
Prioritization Technique. International Journal Of Innovative Computing, Information And
Control.Vol. 7, No. 3.
[10]. Rahul Thakurta, 2013. A Framework for Prioritization of Quality Requirements for
Inclusion In A Software Project. Springer Software Quality Journal. Vol. 21, No 4, pp. 573-
597.
[11]. Yusop, N., Zowghi, D., Lowe, D., 2008. The Impacts of Non-Functional
Requirements in Web System Projects. International Journal of Value Chain Management,
Vol. 2, pp. 18-32.
[12]. www.inf.ed.ac.uk/teaching/courses/cs2/LectureNotes /CS2Ah/SoftEng/se02.pdf.
[13].http://www.corpedgroup.com/resources/ba/ReqsPrioritization.asp
[14]. Joachim Karlssona, Claes Wohlinb, Bjorn Regnell, 1998. An evaluation of methods
for prioritizing software requirements. Information and Software Technology. Vol. 39, pp. 939-947.
[15]. Patrik Berander Anneliese Andrews, 2005. Requirements Prioritization,
Engineering and Managing Software Requirements, pp. 69-94.
[16]. http://en.wikipedia.org/wiki/Requirement_engineering.

This paper may be cited as:

Saranya. B., Subha. R. and Palaniswami. S., 2014. A Survey on


Prioritization Methodologies to Prioritize Non-Functional Requirements.
International Journal of Computer Science and Business Informatics, Vol.
12, No. 1, pp. 32-44.


A Review on Various Visual


Cryptography Schemes
Nagesh Soradge
Sinhgad College of Engineering,
Vadgaon. Pune, India.

Prof. K. S. Thakare
Associate Professor
Sinhgad College of Engineering,
Vadgaon. Pune, India.

ABSTRACT
Cryptography is the study of transforming information in order to make it secure from unintended recipients or uses. A Visual Cryptography Scheme (VCS) is a cryptographic method that encrypts visual information (pictures, printed text, handwritten notes) in such a way that decryption can be performed by the human visual system. The idea is to convert the visual information into an image and encipher this image into n different shares (known as sheets); deciphering only requires selecting some of the n shares. The intent of this review paper is to give readers an overview of the basic visual cryptography scheme constructions as well as continued work in the area. In addition, we review some applications that take advantage of such secure systems.

Keywords
Visual cryptography scheme (VCS), Pixel expansion, Contrast, Security, Accuracy,
Computational complexity

1. INTRODUCTION
Various kinds of sensitive data, such as credit card information, personal health information, military maps and personally identifiable information, are transmitted over the Internet, and with the advancement of technology multimedia information is also transferred conveniently over the Internet. The protection of such secret information has therefore become a critical research topic. When secret images are used, attackers may exploit weak links in the network to obtain the information. To solve the problem of protecting secret images, many image secret sharing schemes have been devised. A new information security technique called the visual cryptography scheme was invented by Naor et al in 1994 [1]: the human visual system decodes the secret (handwritten notes, printed text, pictures, etc.) directly without performing any computation. This scheme avoids complex computation in decryption, and the secret images can be recovered by a stacking operation. This property of visual cryptography makes it useful where only a small computation load is acceptable.
Visual cryptography was introduced for the problem of secret sharing, one of the earliest problems considered in cryptography. Suppose, for instance, that 4 smart robbers have deposited their loot in a bank account. The robbers do not trust each other, and they do not want any single robber to withdraw the loot and escape; however, they agree that a withdrawal made by at least two robbers together counts as loyal. They therefore decide to encrypt the bank code (with a trusted machine) into 4 partitions, distributed among themselves, such that any two or more partitions can reconstruct the code. Since the robbers will not have a machine with them to decrypt the bank code when they want to withdraw the loot, they want to be able to decrypt it visually. A single partition should not yield any information about the code; nonetheless, by taking any two or more partitions, stacking them together and aligning them, the code should be reconstructed. The solution to this problem is given by the visual cryptography scheme.
The simplest visual cryptography scheme has the following structure. A secret image is made up of a collection of black and white pixels, where each pixel is handled independently [1]. To encrypt the image, we split it into n modified versions (shares) such that each pixel of a share is subdivided into m black and white sub-pixels [1]. To decipher the image, we pick a subgroup S of those n shares; if S is a qualified subset, then stacking these shares allows recovery of the image.
This paper introduces the construction of the (k, n)-threshold VCS along with the parameters used to describe the model, and then provides an overview of various visual cryptography schemes. To meet the demands of multimedia information, gray and color image formats should also be handled by such schemes. Performance measures such as security and computational complexity that affect the efficiency of visual cryptography are also discussed. The rest of the paper is structured as follows. Section II describes the model for the construction of the (k, n)-threshold VCS, Section III provides an overview of black and white VCS, Section IV elaborates on color VCS, applications of VCS are included in Section V, the performance of visual cryptography schemes is analyzed in Section VI, and Section VII concludes the paper.

2. MODEL FOR ENCRYPTION


The VCS model and the (k, n)-threshold VCS scheme proposed by Naor and Shamir [1] are formally defined as follows:

Definition 1 - Hamming weight: the number of non-zero symbols in a sequence of symbols [1]. In the binary representation of a number, the Hamming weight is the number of 1 bits in the binary sequence.

Definition 2 - OR-ed k-vector: given a j×k matrix, it is the k-vector in which each component is the result of performing the Boolean OR operation on the corresponding j×1 column vector [1].

A VCS scheme is a 6-tuple (d, α, V, S, m, n). It assumes that each pixel appears in n versions called shares, one for each transparency. Each share is a collection of m black and white sub-pixels. This structure can be described by an n×m Boolean matrix S = [Sij], where Sij = 1 iff the jth sub-pixel in the ith share is black. Hence the grey level of the combined share, obtained by stacking the transparencies, is proportional to the Hamming weight H(V) of the OR-ed m-vector V [1]. This grey level is interpreted by the visual system as black if H(V) ≥ d and as white if H(V) < d − αm, for some fixed threshold 1 ≤ d ≤ m and relative difference α > 0 [1]. The quantity αm, the difference between the minimum H(V) value of a black pixel and the maximum allowed H(V) value of a white pixel, is called the contrast of the VCS scheme [1].

VCS schemes in which a subgroup is qualified if and only if its cardinality is at least k are called (k, n)-threshold visual cryptography schemes [1]. A construction of a (k, n)-threshold VCS consists of two collections of n×m Boolean matrices, C0 and C1, each of size r [1]. To produce a white pixel, we randomly choose one of the matrices in C0, and to produce a black pixel, we randomly choose a matrix in C1 [1]. The chosen matrix defines the colour of the m sub-pixels in each of the n transparencies [1]. The solution is correct if the following three conditions are satisfied:

1) For any matrix S in C0, the OR vector V of any k out of the n rows satisfies H(V) ≤ d − αm.
2) For any matrix S in C1, the OR vector V of any k out of the n rows satisfies H(V) ≥ d.
3) For any subset {i1, i2, ..., iq} of {1, 2, ..., n} with q < k, the two collections of q×m matrices Bt, obtained by restricting each n×m matrix in Ct (where t ∈ {0, 1}) to rows i1, i2, ..., iq, are indistinguishable in the sense that they contain exactly the same matrices with the same frequencies [1]. In other words, any q×m matrices S0 ∈ B0 and S1 ∈ B1 are identical up to a column permutation.

Conditions (1) and (2) define the contrast of a VCS, and condition (3) states the security property of the (k, n)-threshold VCS.

Let us consider an instance of the (3, 3)-threshold VCS construction, where each pixel is divided into 4 sub-pixels (m = 4). According to the definition, C0 and C1 are defined as follows:

C0 = { all matrices obtained by permuting the columns of
       [ 0 0 1 1 ]
       [ 0 1 0 1 ]
       [ 0 1 1 0 ] }

C1 = { all matrices obtained by permuting the columns of
       [ 1 1 0 0 ]
       [ 1 0 1 0 ]
       [ 1 0 0 1 ] }
To encode a white pixel, the encoder randomly chooses one matrix from C0 to form the sub-pixels of the three shares; to encode a black pixel, it randomly picks one matrix from C1. It is not hard to check that this construction yields a relative contrast of 0.25: the encoding of a black pixel results in 4 black sub-pixels when all three shares are stacked, whereas a white pixel results in 3 black sub-pixels and 1 white sub-pixel. Consequently, when the three shares are stacked together, the result is either dark grey, which is interpreted as white, or completely black, which is interpreted as black. Readers can verify the security property of the (3, 3)-threshold VCS by taking any two rows from any S0 ∈ B0 and S1 ∈ B1 and convincing themselves that the superposition of any two transparencies always results in 3 black sub-pixels and 1 white sub-pixel.

The construction of arbitrary (k, k) and (k, n)-threshold VCS is out of the scope
of our paper. Therefore we only state the result of such construction.

Theorem 1: In any (k, k)-threshold VCS construction, m ≥ 2^(k-1) and α = 1/2^(k-1).

Theorem 2: There exists a (k, n)-threshold VCS scheme with m = n^k · 2^(k-1) and α = (2e)^(-k)/√2.

Notice that the first theorem states the optimality of the (k, k) scheme, whereas the second theorem only states the existence of a (k, n) VCS with the given parameters. H. C. Hsu et al [2] showed a more optimal (k, n) VCS construction with a smaller m.


3. GREY SCALE VISUAL CRYPTOGRAPHY


A. Sharing Only One Secret
If the pixel is white, one of the upper two rows of Table 1 is selected to produce Share1 and Share2; similarly, if the pixel is black, one of the lower two rows of Table 1 is selected. Here the pixel p is encoded in each share as two white and two black sub-pixels. A share on its own, whichever pattern it carries, gives no hint about the pixel p; the secret image is revealed only when both shares are overlapped. This scheme for encrypting a binary image into two shares, Share1 and Share2, was suggested by Naor et al [1].

Table 1. Scheme for encoding a binary pixel into two shares: for each secret pixel (white or black) one of two equally likely (50%) sub-pixel patterns is chosen, and the columns give the pixel, its probability, Share1, Share2 and the superposition Share1 XOR Share2.
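A sketch of this 2-out-of-2 encoding follows: for each secret pixel a random 2x2 pattern with two black sub-pixels is chosen for Share1, and Share2 receives the identical pattern for a white pixel or the complementary pattern for a black pixel, so that stacking yields a half-black block (read as white) or a fully black block. The table above uses two such patterns; the sketch allows all six, which is an assumption of the general 2-out-of-2 construction rather than a copy of the table.

```python
import random

# The six 2x2 patterns with exactly two black (1) sub-pixels.
PATTERNS = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0),
            (0, 1, 0, 1), (1, 0, 0, 1), (0, 1, 1, 0)]

def encode_pixel(bit: int):
    share1 = random.choice(PATTERNS)
    # White pixel: identical patterns; black pixel: complementary patterns.
    share2 = share1 if bit == 0 else tuple(1 - b for b in share1)
    return share1, share2

def stack(s1, s2):
    return tuple(a | b for a, b in zip(s1, s2))

s1, s2 = encode_pixel(1)        # black secret pixel
print(stack(s1, s2))            # -> (1, 1, 1, 1): fully black block
s1, s2 = encode_pixel(0)        # white secret pixel
print(sum(stack(s1, s2)))       # -> 2: half-black block, perceived as grey/white
```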

To decrypt the concealed messages, the embedding images can be overlapped. Balancing the trade-off between pixel expansion and contrast, Liguo Fang et al [3] proposed a (2, n) scheme based on combinations. To conceal a binary image into two meaningful shares, Chin-Chen Chang et al [4] recommended spatial-domain image hiding schemes, in which the two secret shares are embedded into two gray-level cover images [4]. Threshold visual secret sharing schemes that mix XOR and OR operations with reversing, based on binary linear error-correcting codes, were suggested by Xiao-Qing and Tan [5]. The above schemes have the disadvantage that only one set of sensitive messages can be concealed, so a large number of shares have to be produced in order to share large amounts of sensitive messages.

B. Sharing Many Secrets

C. C. Wu et al [6] were the first to present a visual cryptography scheme for sharing two secret images in two shares. They concealed two secret binary images into two arbitrary shares A and B [6]. The first secret is obtained by stacking the two shares, and the second secret is revealed by first rotating share A anticlockwise. They designed the rotation angle to be 90°, but it is easy to see that it can also be 180° or 270°. To remove the angle restriction of C. C. Wu's scheme, Hsu et al [2] proposed a scheme that conceals two secret images in two rectangular share images with unrestricted rotation angles.
S. J. Shyu et al [7] were the first to propose sharing two or more secrets with visual cryptography. Their scheme encrypts a set of n ≥ 2 secrets into two circular shares; the n secrets can be recovered one after another by stacking the first share onto the second share rotated by n distinct rotation angles. To encode unrestricted image shapes and to remove the requirement that the transparencies be circular, a reversible visual cryptography scheme was recommended by Fang [8]. Jen-Bang Feng et al [9] proposed a visual secret sharing scheme for hiding multiple secret images in two shares.
Tzung-Her Chen et al [10] proposed multiple-image encryption schemes based on rotating random grids, without any pixel expansion. Jonathan Weir et al [11] suggested sharing multiple secrets using visual cryptography: a master key is produced for all the secrets, the secrets are shared using the master key, and multiple shares are obtained. To provide more randomness in producing the shares, a secret sharing scheme based on the rotation of shares was advised by Mustafa Ulutas et al [12]. This scheme produces rectangular shares that are designed randomly; stacking the two shares reproduces the first secret, and rotating the first share by 90° anticlockwise and stacking it with the second share regenerates the second secret.

A non-expansion reversible visual secret sharing method that does not need a predefined lookup table was presented by Fang [13]. Zhengxin Fu et al [14] designed a rotation visual cryptography scheme that encrypts four secrets into two shares and recovers the reconstructed images without distortion; the construction is based on correlative matrix sets and random permutation. All of the schemes mentioned above share black and white secret images; researchers have also worked on sharing colorful images.

4. COLORFUL VISUAL CRYPTOGRAPHY SCHEME
A. Sharing Only One Secret

Visual cryptography schemes were applied only to black and white images until
the year 1997. Verheul and Van Tilborg [15] first developed a colored VCS.
With the concept of arcs, colored secret images can be shared. In a c-color VCS
a single pixel is translated into m sub pixels, and each sub pixel is split into c
color regions. In each sub pixel, only one color region is colored, and all the
other color regions are kept black. The color of one pixel depends on the
combination of the stacked sub pixels. For a colored VCS with c colors,
the pixel expansion m is c × 3. Yang and Laih [16] improved the pixel
expansion to c × 2.
To share and transmit a secret color image and also to generate meaningful
shares, Chang and Tsai proposed a color VCS [17]. For a secret color image, two
effective color images are chosen as cover images which are exactly the same
size as the secret color image. Then, according to a predefined Color Index
Table, the secret color image is concealed into the two disguise images. One
drawback of this scheme is that extra space is required to store the Color Index
Table.
To deal with this limitation, Chin-Chen Chang et al [18] constructed a secret
color image sharing scheme based on modified visual cryptography. In this
scheme the size of the shares is fixed; it does not change when the number of
colors appearing in the secret image differs [18]. Although the pixel expansion is
fixed in this scheme, it is not suitable for true-color secret images. To share true-
color images, Lukac and Plataniotis [19] proposed a bit-level based scheme
operating directly on the bit-planes of a secret image.
S. J. Shyu [20] suggested a colour VCS for reducing pixel expansion, a
more efficient coloured visual secret sharing scheme with pixel expansion of
⌈log2 c⌉ · m, where m is the pixel expansion of the exploited binary scheme and c
is the number of colour regions [20]. A cost effective VCS was developed by
Mohsen Heidarinejad et al [21] by considering colour image transmission over
bandwidth constrained channels. The solution offers perfect reconstruction while
producing shares with a size smaller than that of the input image using maximum
distance separable codes.
F. Liu et al [22] developed a colour visual cryptography scheme under the
visual cryptography model of Naor et al [1] without pixel expansion. In this
scheme, an increase in the number of colours of the recovered secret image does
not increase the pixel expansion. To increase the speed of encoding, Haibo Zhang et
al [23] presented a multi-pixel encoding which can encode a variable number of
pixels in each run.
B. Sharing Many Secrets

A multi-secret visual cryptography scheme was proposed by Tzung-Her Chen et al
[24], extended from traditional visual secret sharing. The scheme can be
used for multiple binary, gray and color secret images with a pixel expansion of
4. The codebook of traditional visual secret sharing is applied to generate share
images macro block by macro block, so that multiple secret images are
converted into only two share images, and all the secrets are decoded one by one
by superimposing the two share images with shifting [24].
5. APPLICATIONS OF VCS
A secret-ballot receipts system based on a (2, 2)-threshold binary VCS
was proposed by Chaum [26]. It produces an encrypted receipt for each voter
which allows verification of the election result. The VCS principle can also be
applied to transferring important financial documents over the Internet. VCRYPT,
an example of this type of system, was presented by Hawkes et al [27].
VCRYPT encrypts the original document with a certain (k, n) VCS,
then sends each of the n encrypted shares independently through e-mail to the
recipient.
6. PERFORMANCE OF VISUAL CRYPTOGRAPHY SCHEMES
Several parameters have been constructed by researchers to evaluate the performance
of VCS. Naor et al [1] suggested two main parameters: pixel expansion m and
contrast α. Pixel expansion m is the number of sub pixels in the generated
shares that represent a pixel of the original input image; it represents the
impairment in resolution from the original picture to the shared one. Contrast
refers to the relative difference in weight between combined shares that come
from a white pixel and a black pixel in the original image.
Jung-San Lee et al [25] suggested security, accuracy and computational
complexity as performance measures. Security means that each individual share
gives no information about the original image. Accuracy is the quality of the
reconstructed secret image and is evaluated by the peak signal-to-noise ratio
(PSNR) measure. Computational complexity is the total number of operators
required both to produce the set of n shares and to reconstruct the original
secret image.
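As an illustration of the accuracy measure mentioned above, the following minimal sketch (not taken from the reviewed papers) computes the PSNR between an original secret image and a reconstructed one, assuming both are 8-bit NumPy arrays of equal shape.

```python
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio between two equally sized 8-bit images."""
    original = original.astype(np.float64)
    reconstructed = reconstructed.astype(np.float64)
    mse = np.mean((original - reconstructed) ** 2)   # mean squared error
    if mse == 0:
        return float('inf')                          # identical images
    return 10.0 * np.log10((max_value ** 2) / mse)
```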

7. CONCLUSION
In this paper, we briefly reviewed research on visual cryptography schemes as
special cases of secret sharing methods among participants. Both grey-scale VCS
and colorful VCS are studied according to the number of shares generated, and
interesting applications are also discussed. Further, the performance parameters
formulated for various visual cryptography schemes are examined.

REFERENCES
[1] Moni Naor and Adi Shamir, Visual Cryptography, advances in cryptology Eurocrypt,
1995, pp 1-12.
[2] H. C. Hsu, T.-S. Chen, Y.-H. Lin, The Ring Shadow Image Technology Of Visual
Cryptography By Applying Diverse Rotating Angles To Hide The Secret Sharing, In
Proceedings of the 2004 IEEE International Conference on Networking, Sensing &
Control, Taipei, Taiwan, March 2004, pp. 996-1001.
[3] Liguo Fang, BinYu, Research On Pixel Expansion Of (2, n) Visual Threshold Scheme,
1st International Symposium on Pervasive Computing and Applications, IEEE, 2006, pp.
856-860.
[4] Chin-Chen Chang, Jun-Chou Chuang, Pei-Yu Lin, Sharing A Secret Two-Tone Image In
Two Gray-Level Images, Proceedings of the 11th International Conference on Parallel
and Distributed Systems (ICPADS'05), 2005, pp. 300-304.
[5] Xiao-qing Tan, Two Kinds of Ideal Contrast Visual Cryptography Schemes,
International Conference on Signal Processing Systems, 2009, pp. 450-453.
[6] C.C. Wu, L.H. Chen, A Study On Visual Cryptography, Master Thesis, Institute of
Computer and Information Science, National Chiao Tung University, Taiwan, R.O.C.,
1998.
[7] S. J. Shyu, S. Y. Huanga,Y. K. Lee, R. Z. Wang, and K. Chen, Sharing multiple secrets
in visual cryptography, Pattern Recognition, Vol. 40, Issue 12 , 2007, pp. 3633 - 3651.
[8] Wen-Pinn Fang, Visual Cryptography In Reversible Style, IEEE Proceeding on the
Third International Conference on Intelligent Information Hiding and Multimedia Signal
Processing (IIHMSP2007), Kaohsiung, Taiwan, R.O.C, 2007.
[9] Jen-Bang Feng, Hsien-Chu Wu, Chwei-Shyong Tsai, Ya-Fen Chang, Yen Ping Chu,
Visual Secret Sharing For Multiple Secrets, Pattern Recognition, Vol. 41, 2008, pp.
3572-3581.
[10] Tzung-Her Chen, Kai-Hsiang Tsao, and Kuo-Chen Wei, Multiple Image Encryption By
Rotating Random Grids, Eighth International Conference on Intelligent Systems Design
and Applications, 2008, pp. 252-256.
[11] Jonathan Weir, WeiQi Yan, Sharing Multiple Secrets Using Visual Cryptography, IEEE,
2009, pp 509-512,.
[12] Mustafa Ulutas, Rifat Yazici, Vasif V. Nabiyev, Guzin Ulutas, (2, 2) - Secret Sharing
Scheme With Improved Share Randomness, IEEE, 2008.
[13] Wen-Pinn Fang, Non-Expansion Visual Secret Sharing in Reversible Style, IJCSNS
International Journal of Computer Science and Network Security, VOL.9 No.2, February
2009, pp.204-208.

[14] Zhengxin Fu, Bin Yu, Research on Rotation Visual Cryptography Scheme,
International Symposium on Information Engineering and Electronic Commerce, 2009,
pp 533-536.
[15] E. Verheul and H. V. Tilborg, Constructions And Properties Of K Out Of N Visual Secret
Sharing Schemes, Designs, Codes and Cryptography, 11(2), 1997, pp. 179-196.
[16] C. Yang and C. Laih, New Colored Visual Secret Sharing Schemes, Designs, Codes and
cryptography, 20, 2000, pp. 325-335.
[17] C. Chang, C. Tsai, and T. Chen. A New Scheme For Sharing Secret Color Images In
Computer Network, Proceedings of International Conference on Parallel and Distributed
Systems, July 2000, pp. 21-27.
[18] Chin-Chen Chang, Tai-Xing Yu, Sharing A Secret Gray Image In Multiple Images,
Proceedings of the First International Symposium on Cyber Worlds (CW.02), 2002.
[19] R. Lukac, K.N. Plataniotis, Bit-Level Based Secret Sharing For Image Encryption,
Pattern Recognition 38 (5), 2005, pp. 767-772.
[20] S.J. Shyu, Efficient Visual Secret Sharing Scheme For Color Images, Pattern
Recognition 39 (5), pp. 866-880, 2006.
[21] Mohsen Heidarinejad, Amirhossein Alamdar Yazdi and Konstantinos N, Plataniotis
Algebraic Visual Cryptography Scheme For Color Images, ICASSP, 2008, pp. 1761-
1764.
[22] F. Liu, C.K. Wu, X.J. Lin, Colour Visual Cryptography Schemes, IET Information
Security, vol. 2, No. 4, 2008, pp. 151-165.
[23] Haibo Zhang, Xiaofei Wang, Wanhua Cao, Youpeng Huang , Visual Cryptography For
General Access Structure By Multi-Pixel Encoding With Variable Block Size,
International Symposium on Knowledge Acquisition and Modeling, 2008, pp. 340-344.
[24] Tzung-Her Chen, Kai-Hsiang Tsao, and Kuo-Chen Wei, Multi-Secrets Visual Secret
Sharing, Proceedings of APCC2008, IEICE, 2008.
[25] Jung-San Lee, T. Hoang Ngan Le, Hybrid (2, N) Visual Secret Sharing Scheme For
Color Images, 978-1- 4244-4568-4/09, IEEE, 2009.
[26] D Chaum, Secret-ballot receipts: True voter-verifiable elections, IEEE Security and
Privacy, 2004, pp.38-47.
[27] W. Hawkes, A. Yasinsac, C. Cline, An Application of Visual Cryptography to Financial
Documents, technical report TR001001, Florida State University (2000).

This paper may be cited as:


Soradge, N. and Thakare, K. S., 2014. A Review on Various Visual
Cryptography Schemes. International Journal of Computer Science and
Business Informatics, Vol. 12, No. 1, pp. 45-54.


Web Page Access Prediction based


on an Integrated Approach
Phyu Thwe
Faculty of Information and Communication Technology,
University of Technology (Yatanarpon Cyber City),
Pyin Oo Lwin, Myanmar

ABSTRACT
Predicting a user's web page access is a challenging task that continues to gain
importance as the web grows. Understanding users' next page accesses helps in formulating
guidelines for web site personalization. Server side log files provide information that
enables rebuilding the user sessions within the web site, where a user session consists
of a sequence of web pages viewed within a given time. Web navigation behavior is helpful in
understanding what information online users demand. In this paper, we present a
system that focuses on improving the prediction of web page accesses. We propose to
use clustering techniques to cluster the web log data sets. As a result, a more accurate
Markov model is built on each group rather than on the whole data set. Markov models
are commonly used for predicting the next page access based on the previously
accessed pages. We then use a popularity and similarity based page rank algorithm to make
the prediction when ambiguous results are found. Page Rank represents how important a
page is on the web: when one page links to another page, it is a vote for the other page, and the
more votes a page has, the more important the page must be.
Keywords
Web Log Mining, Web Page Access Prediction, K-means Clustering, Markov Model, Page
Rank Algorithm.

1. INTRODUCTION
As the Internet becomes an important part of our lives, more attention is paid to the quality
of the information and to how it is displayed to the user. The research area of this
work is web data analysis and methods for processing these data. This knowledge
can be extracted from the log files collected by web servers, where all users'
navigational browsing patterns are recorded. Server side log files provide
information that enables rebuilding the user sessions within a particular web site,
where a user session consists of a sequence of web pages viewed within a given
time. Web navigation behavior is helpful in understanding what information
online users demand. The analyzed results can then be used as
knowledge in intelligent online applications, for refining web site maps, in
web based personalization systems, and for improving search accuracy when seeking
information. However, online navigation behavior grows with each passing day, and
thus extracting information intelligently from it is a difficult issue. Web Usage
Mining (WUM) is the process of extracting knowledge from Web users' access
data by using Data Mining technologies. It can be used for different purposes such
as personalization, business intelligence, system improvement and site
modification.

In this paper, we present a system that focuses on improving the prediction of
web page accesses. Data preprocessing is the process of converting the raw data into the
data abstraction necessary for applying the data mining algorithms. We
propose to use clustering techniques to cluster the data sets so that homogeneous
sessions are grouped together. As a result, a more accurate Markov model is built
on each group rather than on the whole data set. The proposed Markov model
is a low order Markov model, so that the state space complexity is kept to a
minimum. Since the accuracy of a low order Markov model is normally not satisfactory,
we use the popularity and similarity based page rank algorithm to make
the prediction when ambiguous results are found.
The rest of this paper is organized as follows: Section 2 describes the theory
background about preprocessing technique, Markov Model and Page Rank
Algorithm. In section 3, we review some researches that advance in web page
access prediction. Section 4 describes the proposed method for the predicting of
web page access in web log file. Results of an experimental evaluation are reported
in section 5. Finally, section 6 summarizes the paper.

2. BACKGROUND STUDY
2.1 Preprocessing Technique
2.1.1 Data Cleaning
This step removes all the data that is useless for analysis and mining, e.g.
requests for graphical page content (e.g., jpg and gif images), requests for any other
file which might be included in a web page, or navigation sessions
performed by robots and web spiders. The quality of the final results strongly
depends on the cleaning process. Appropriate cleaning of the data set has profound
effects on the performance of web usage mining. The discovered associations or
reported statistics are only useful if the data represented in the server log gives an
accurate picture of the user accesses to the Web site.

The procedures of general data cleaning are as follows:

Firstly, entries that have an error or failure status should be removed; this
removes the noisy data from the data set and is quite easy to accomplish.
Secondly, access records generated by automatic search engine agents
should be identified and removed from the access log. Primarily, this means identifying
log entries created by so-called crawlers or spiders that are widely used by Web
search engine tools. Such data offer nothing to the analysis of user navigation
behaviors [6].
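As a rough illustration of these two cleaning steps, the sketch below filters a list of parsed log records; the record fields 'url', 'status' and 'agent' are illustrative assumptions, not names taken from the paper.

```python
# Illustrative cleaning step: drop image/style requests, failed requests
# and accesses made by known crawlers. Field names are assumptions.
IGNORED_SUFFIXES = ('.jpg', '.jpeg', '.gif', '.png', '.css', '.js')
BOT_KEYWORDS = ('bot', 'crawler', 'spider')

def clean_log(records):
    cleaned = []
    for rec in records:
        if rec['url'].lower().endswith(IGNORED_SUFFIXES):
            continue                        # non-page content
        if not (200 <= rec['status'] < 400):
            continue                        # error or failure status
        agent = rec.get('agent', '').lower()
        if any(word in agent for word in BOT_KEYWORDS):
            continue                        # crawler / spider traffic
        cleaned.append(rec)
    return cleaned
```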

2.1.2 User Identification


There are some heuristics for user identification [6]. The first heuristic states that two
accesses having the same IP address but a different browser or operating system, both
recorded in the agent field, originate from two different users. The rationale
is that a user, when navigating the web site, rarely employs more than one browser,
let alone more than one operating system. However, this method causes confusion when a visitor
actually does so. The second heuristic states that when a requested web page
is not reachable by a hyperlink from any previously visited page, there is another
user with the same IP address. Such a method introduces similar confusion
when a user types a URL directly or uses a bookmark to reach pages not connected via
links.

2.1.3 Session Identification


To define user session, two criteria are usually considered [6]:
1. Upper limit of the session duration as a whole;
2. Upper limit on the time spent visiting a page.
Generally the second criterion is more practical and has been widely used in Web Usage
Mining. It is achieved through a timeout: if the time between page requests exceeds a
certain limit, it is assumed that the user is starting a new session. Many commercial
products use 30 minutes as a default timeout. Once a site log has been analyzed and
usage statistics obtained, a timeout that is appropriate for the specific Web site can
be fed back into the session identification algorithm.
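A minimal sketch of timeout-based session identification along these lines is shown below, assuming the cleaned records carry a user identifier and a timestamp in seconds; the 30-minute default mirrors the value mentioned above.

```python
def sessionize(records, timeout=30 * 60):
    """Split each user's requests into sessions using a 30-minute timeout."""
    sessions = []
    by_user = {}
    for rec in sorted(records, key=lambda r: (r['user'], r['timestamp'])):
        last = by_user.get(rec['user'])
        if last is None or rec['timestamp'] - last['end'] > timeout:
            last = {'end': rec['timestamp'], 'pages': []}
            by_user[rec['user']] = last
            sessions.append(last['pages'])   # start a new session
        last['pages'].append(rec['url'])
        last['end'] = rec['timestamp']
    return sessions
```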

2.2 Markov Model


The 1st-order Markov models (Markov Chains) provide a simple way to capture
sequential dependence [14, 15, 16], but they do not take into consideration the
long-term memory aspects of web surfing behaviour since they are based on the
assumption that the next state to be visited is only a function of the current one.
Higher-order Markov models are more accurate for predicting navigational paths.
But, there exists a trade-off between improved coverage and exponential increase
in state space complexity as the order increases. Moreover, such complex models
often require inordinate amounts of training data, and the increase in the number of
states may even have worse prediction accuracy and can significantly limit their
applicability for applications requiring fast predictions, such as web
personalization. Some mixture models that combine Markov models of different
orders have also been proposed. However, such models require much more
resources in terms of preprocessing and training. Therefore, the final choice of
model depends on the trade-off between the required prediction accuracy and the
model's complexity and size.
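To make the 1st-order model concrete, the sketch below (an illustration, not the paper's implementation) counts page-to-page transitions over the sessions and predicts the most likely next pages.

```python
from collections import defaultdict

def build_first_order_model(sessions):
    """Count transitions p -> q over all sessions (1st-order Markov model)."""
    counts = defaultdict(lambda: defaultdict(int))
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    return counts

def predict_next(counts, current_page, top_k=3):
    """Return the top_k most frequent next pages given the current page."""
    candidates = counts.get(current_page, {})
    return sorted(candidates, key=candidates.get, reverse=True)[:top_k]
```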
2.3 Page Rank Algorithm
Page Rank is used to determine the importance of a page on the web. Sergey Brin
and Larry Page [7] proposed a ranking algorithm named Page Rank (PR) that uses
the link structure of the web to determine the importance of web pages. According
to this algorithm, if a page has important links to it, then its links to other pages
also become important. Therefore, it takes back links into account and propagates
the ranking through links. In Page Rank, the rank score of a page is equally divided
among its outgoing links and that values of outgoing links are in turn used to
calculate the ranks of pages pointed by that page.

Page Rank [11] is the most popular link analysis algorithm, used broadly for
assigning numerical weightings to web documents and utilized by web
search engines in order to rank the retrieved results. The algorithm models
the behaviour of a random surfer, who either chooses an outgoing link from
the page he is currently at, or jumps to a random page after a few clicks.
The Web is treated as a directed graph G = (V, E), where V is the set of
vertices or nodes, i.e., the set of all pages, and E is the set of directed edges
in the graph, i.e., hyperlinks. In page rank calculation, especially for larger
systems, an iterative calculation method is used. In this method, the calculation
is implemented in cycles. In the first cycle all rank values may be
assigned a constant value such as 1, and with each iteration of the calculation
the rank values converge, typically within approximately 50 iterations with a
damping factor of 0.85.
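The iterative calculation just described can be sketched as follows. This is an illustrative Python sketch, not code from the paper; it assumes the web graph is given as a dict mapping each page to the list of pages it links to, uses the damping factor 0.85 and a fixed number of iterations as mentioned above, and simply ignores the rank mass of dangling pages.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Plain iterative PageRank over a dict {page: [pages it links to]}."""
    pages = set(out_links) | {q for links in out_links.values() for q in links}
    rank = {p: 1.0 for p in pages}           # start every page at rank 1
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) for p in pages}
        for p, links in out_links.items():
            if not links:
                continue                     # dangling page: mass is dropped
            share = damping * rank[p] / len(links)
            for q in links:                  # distribute rank over out-links
                new_rank[q] += share
        rank = new_rank
    return rank
```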
3. RELATED WORKS
In recent years, an increasing number of research works have been done with
regard to web usage mining. The authors of [1] describe a prediction system based
on fuzzy logic to predict the future occurrence of an event. A
subtractive clustering based fuzzy system identification method is used to
successfully model a general prediction system that can predict future events by
taking samples of past events. Historical data is obtained and is used to train the
prediction system. Recent data are given as input to the prediction system. All data
are specific to the application at hand. The system, that is developed using Java, is
tested in one of the many areas where prediction plays an important role, the stock
market. Prices of previous sessions of the market are taken as the potential inputs.
When recent data are given to the trained system, it predicts the possibility of a rise
or a fall along with the next possible value of data.

The prediction models in [2] are based on web log data that corresponds to
users' behavior. They are used to make predictions for the general user and are not
based on the data of a particular client. This prediction requires the discovery of
web users' sequential access patterns and the use of these patterns to make predictions
of users' future accesses. These predictions are then incorporated into a web
prefetching system in an attempt to enhance performance.

In [3], it is proposed to integrate Markov model based sequential pattern mining


with clustering. A variant of Markov model called dynamic support pruned all kth
order Markov model is proposed in order to reduce state space complexity. Mining
the web access log of users of similar interest provides good recommendation
accuracy. Hence, the proposed model provides accurate recommendations with
reduced state space complexity.

An Efficient Hybrid Successive Markov Prediction Model (HSMP) is introduced in
[4]. The HSMP model initially predicts the possible wanted categories using a
relevance factor, which can be used to infer the user's browsing behavior between
web categories. It then predicts the pages in the predicted categories using techniques for
intelligently combining different order Markov models, so that the resulting model
has low state complexity, improved prediction accuracy, and retains the coverage of
the all higher order Markov model.

In [5], the authors propose the use of CRFs (Conditional Random Fields) in the field of
Web page prediction. They treat previous Web users' access sessions as
observation sequences and label these observation sequences to get the
corresponding label sequences; they then use CRFs to train a prediction model
based on these observation and label sequences and predict the probable
subsequent Web pages for the current users.

4. PROPOSED SYSTEM ARCHITECTURE


The processing of the system has three main phases. Preprocessing is
performed in the first phase. The second phase is clustering the web sessions using K-
means clustering. In the final phase, a Markov model is used to predict the next page
access based on the resulting web sessions. The popularity and similarity-based page
rank algorithm is used to decide the most relevant answer if an ambiguous result is
found in the Markov model prediction. The input of the proposed system is a web log
file: a file to which the web server writes information each time a user
requests a resource from that particular site.
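As a rough sketch of the second phase (not the paper's implementation), each session can be encoded as a binary page-visit vector and grouped with scikit-learn's KMeans; the encoding and the number of clusters below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_sessions(sessions, n_clusters=5):
    """Encode each session as a binary page-visit vector and run K-means."""
    pages = sorted({p for s in sessions for p in s})
    index = {p: i for i, p in enumerate(pages)}
    matrix = np.zeros((len(sessions), len(pages)))
    for row, session in enumerate(sessions):
        for page in session:
            matrix[row, index[page]] = 1.0   # page was visited in this session
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(matrix)
    return labels                            # one cluster label per session
```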

Figure 1. Proposed System Architecture


3.1 Web Server Log
A Web log file [12] records activity information when a Web user submits a
request to a Web server. The main source of raw data is the web access log, which
we shall refer to as the log file; log files are originally meant for debugging
purposes. A log file can be located in three different places: i) Web servers, ii)
Web proxy servers, and iii) Client browsers. Server-side logs generally
supply the most complete and accurate usage data, but they have two major
drawbacks:
These logs contain sensitive, personal information, therefore the server
owners usually keep them closed.
The logs do not record visits to cached pages. Cached pages are
served from the local storage of browsers or proxy servers, not from web
servers.

The NASA web server log file is considered for the purpose of analysis. Web server
logs are plain text (ASCII) files that are independent of the server platform.
There are some distinctions between server software, but traditionally there are
four types of server logs: Transfer Log, Agent Log, Error Log and Referrer Log.
The first two types of log files are standard. The Referrer and Agent Logs may or
may not be turned on at the server, or may be added to the Transfer Log file to
create an Extended Log File format. Most logs use the Common Log Format.
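For illustration, one Common Log Format line can be parsed with a regular expression as sketched below; the grouping follows the standard CLF layout rather than anything specific to the NASA logs.

```python
import re

CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_clf_line(line):
    """Parse a Common Log Format line into a dict, or return None."""
    match = CLF_PATTERN.match(line)
    if not match:
        return None
    record = match.groupdict()
    record['status'] = int(record['status'])
    return record
```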

3.2 Popularity and Similarity based Page Rank Algorithm


Popularity and Similarity-based Page Rank (PSPR) calculation depends on
the duration values of pages and transitions, the frequency values of pages and
transitions, the web page file size, and the similarity of web pages [19]. The popularity
value of page rank was discussed in [17]. Popularity is defined in two dimensions:
the page dimension and the transition dimension. For both dimensions,
popularity is defined in terms of the time a user spends on a page, the size of the page,
and the visit frequency of the page. Page popularity is needed for modeling the random-
surfer jumping behaviour of the user, and transition popularity is needed for modeling
the normal navigating behaviour of the user.

Web page similarity is important for predicting the next page access because millions of
users generally access similar web pages within a particular Web site. The
similarity calculation is based on the web page URL: the content of pages is not
considered, and the calculation does not require building a tree structure of the
Web site. For example, suppose /shuttle/missions/sts-73/mission-sts-73.html and
/shuttle/missions/sts-71/mission-sts-71.html are two requested pages in the web log.
These two URLs are stored in string arrays by splitting on the "/" character. We then
compute the lengths of the two arrays and weight the longer array: the last
element of the array is given weight 1, the second-to-last element weight 2, the
third-to-last weight 3, and so on, until the first element is given a weight equal to the
length of the array. The similarity between two URLs is defined as the sum of the
weights of the matching elements divided by the sum of the total weights.
This similarity measurement includes:
(1) 0 <= SURLji <= 1, i.e. the similarity of any pair of web pages is between 0.0
and 1.0;

(2) SURLji = 0 when the two web pages are totally different;
(3) SURLji = 1 when the two web pages are exactly the same.
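The weighting procedure described above can be sketched as follows. This is one illustrative reading of the description (path components are aligned from the start and weighted from the end of the longer array); it is not the paper's code.

```python
def url_similarity(url_a, url_b):
    """Illustrative SURL computation: weighted match of URL path components."""
    parts_a = [p for p in url_a.split('/') if p]
    parts_b = [p for p in url_b.split('/') if p]
    longer = max(len(parts_a), len(parts_b))
    # Last component gets weight 1, the one before it 2, ..., first gets `longer`.
    weights = [longer - i for i in range(longer)]
    total = sum(weights)
    matched = sum(w for a, b, w in zip(parts_a, parts_b, weights) if a == b)
    return matched / total if total else 0.0

# Example: the two shuttle mission URLs above share 'shuttle' and 'missions',
# so the similarity is (4 + 3) / (4 + 3 + 2 + 1) = 0.7 under this reading.
```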


PSPR_i = \varepsilon \sum_{P_j \in In(P_i)} PSPR_j \cdot \frac{w_{ji}\,(d_{ji}/s_i)\, SURL_{ji}}{\sum_{P_k \in Out(P_j)} w_{jk}\, \max(d_{mn}/s_n) \cdot \sum_{P_k \in Out(P_j)} SURL_{jk}} + (1-\varepsilon)\, \frac{w_i\,(d_i/s_i)}{\sum_{P_j \in WS} w_j\, \max(d_m/s_m)}    (1)

In equation (1), \varepsilon is a damping factor, usually set to 0.85. In(P_i) is the set of
pages that link to page P_i, and Out(P_j) is the set of pages that P_j points to. w_{ji} is the
number of times pages j and i appear consecutively in all user sessions, d_{ji} is the
duration of the transition, and s_i is the size of the transition's result page. WS is the set
of pages appearing in the web sessions, and SURL_{ji} is the similarity of web page j to page i.
The term w_{ji}(d_{ji}/s_i) / \sum_{P_k \in Out(P_j)} w_{jk} \max(d_{mn}/s_n) is the transition
popularity based on transition frequency and duration, and SURL_{ji} / \sum_{P_k \in Out(P_j)} SURL_{jk}
is the similarity calculation between web pages. In the second term, w_i is the frequency of
page i and (d_i/s_i) is its average duration, so the popularity of a page is calculated from
the page frequency and the average duration of the page.

By using this equation, the popularity and similarity-based page rank (PSPR) of
every page can be calculated. In order to make rank calculations faster, the required
intermediate values are stored in the database. These values are the average duration
of pages, the average duration of transitions, page sizes, page frequencies, transition
frequencies, and the page similarity values. The result can be used to make the correct
decision when an ambiguous result is found in the Markov model.

5. EXPERIMENTAL EVALUATION
This paper introduces a method that integrates k-means clustering, Markov model
and popularity and similarity-based page rank algorithm in order to improve the
Web page prediction accuracy. In this section, we present experimental results to
evaluate the performance of our system. Overall our experiment has verified the
effectiveness of our proposed techniques in web page access prediction based on a
particular website.


For our experiments, we used NASA web server data sets. We obtained the web
logs in August, 1995 and used the web logs from 01/Aug/1995 to 15/Aug/1995 as
the training data set. For the first testing data set (D1), the web logs from
16/Aug/1995 are used. For the second testing data set (D2), the web logs from
16/Aug/1995 to 17/Aug/1995 are used. For the third testing data set (D3), the web
logs from 16/Aug/1995 to 18/Aug/1995 are used. For the fourth testing data set
(D4), the web logs from 16/Aug/1995 to 19/Aug/1995 are used. For the fifth
testing data set (D5), the web logs from 16/Aug/1995 to 20/Aug/1995 are used. For
the sixth testing data set (D6), the web logs from 16/Aug/1995 to 21/Aug/1995 are
used. We filtered the records (such as *.jpg, *.gif, *.jpeg) and only reserved the hits
requesting web pages. When identifying user sessions, we set the session timeout
to 30 minutes, with a minimum of 10 pageviews per session. After filtering out the
web session data by preprocessing, the training data set contained 94307 records
and 5574 sessions. Table 1 shows the data after the preprocessing phase.
Table 1. Testing data set after preprocessing
                              D1      D2      D3      D4      D5      D6
Records after preprocessing   7965    17804   27400   33054   39006   50019
Sessions                      346     736     1124    1376    1617    2072

In comparing the predictions with the real page visits, we use two similarity
algorithms that are commonly preferred for finding the similarity of two sets. The
first one is the OSim algorithm [11, 17, 18], which calculates the similarity of
two sets A and B without considering the ordering of their elements, and is
defined as:
OSim(A, B) = \frac{|A \cap B|}{n}    (2)
As the second similarity metric we use the KSim similarity algorithm, which is based on
the Kendall Tau distance [11, 17, 18] for measuring the similarity between the next-page
prediction set produced from the training data set and the real page visit set from the test
data. The Kendall Tau distance is the number of pairwise disagreements between two lists.

KSim(r_1, r_2) = \frac{|\{(u, v) : r_1', r_2' \text{ have the same ordering of } (u, v),\ u \neq v\}|}{|A \cup B| \cdot (|A \cup B| - 1)}    (3)
Here, r1' is an extension of r1, containing all elements included in r2 but not in r1
appended at the end of the list (r2' is defined analogously). In our experimental setup,
we perform a top-3 comparison measured by the KSim and OSim methods.
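For illustration, the two metrics can be sketched as below; the KSim sketch counts unordered pairs in both numerator and denominator, which gives the same ratio as equation (3). This is an illustrative implementation, not the one used in the experiments.

```python
from itertools import combinations

def osim(pred, actual):
    """OSim: overlap of two top-n sets, ignoring order."""
    n = len(pred)
    return len(set(pred) & set(actual)) / n if n else 0.0

def ksim(r1, r2):
    """KSim: share of pairs from the union ordered the same way in r1' and r2'."""
    r1_ext = list(r1) + [x for x in r2 if x not in r1]   # r1'
    r2_ext = list(r2) + [x for x in r1 if x not in r2]   # r2'
    pos1 = {x: i for i, x in enumerate(r1_ext)}
    pos2 = {x: i for i, x in enumerate(r2_ext)}
    union = list(pos1)
    agree = sum(
        1 for u, v in combinations(union, 2)
        if (pos1[u] - pos1[v]) * (pos2[u] - pos2[v]) > 0
    )
    pairs = len(union) * (len(union) - 1) / 2
    return agree / pairs if pairs else 1.0
```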
The results of the experiment for the next page prediction accuracy for popularity
and similarity-based page ranking algorithm under KSim and OSim similarity are
given in Table 2.
As depicted in Table 2, PSPR based on the 2nd order Markov model significantly
outperforms PSPR based on the 1st order Markov model in all OSim and KSim values
for the top-3 prediction. Therefore, we can confirm that popularity and similarity-based
page rank based on the 2nd order Markov model can improve the accuracy of Web page
prediction.
Table 2. Top 3 prediction based on Average OSim and KSim value
        OSim, 1st order     KSim, 1st order     OSim, 2nd order     KSim, 2nd order
        Markov Model (%)    Markov Model (%)    Markov Model (%)    Markov Model (%)
D1      35.59               53.04               41.46               54.49
D2      38.68               55.92               48.22               59.23
D3      38.08               55.46               49.6                61.1
D4      38.39               55.66               50.22               61.94
D5      39.5                56.6                50.84               62.44
D6      40.53               57.33               51.53               63.2

6. CONCLUSIONS
The method presented in this paper improves Web page access prediction
accuracy by integrating three algorithms: K-means clustering, the Markov model
and the popularity and similarity-based page rank algorithm. The OSim and KSim
algorithms are used to evaluate the similarity of our predictions. In our experiments,
we observed that PSPR based on the 2nd order Markov model outperforms PSPR
based on the 1st order Markov model in terms of accuracy (both OSim and KSim).
Higher order Markov models result in better prediction accuracy since they look
further into the previous browsing history. We used the idea of the Page Rank
algorithm to improve the prediction accuracy and modified this algorithm in order to
analyze user behavior.
7. ACKNOWLEDGMENTS
My Sincere thanks to my supervisor Dr. Ei Ei Chaw, Faculty of Information and
Communication Technology, University of Technology (Yatanarpon Cyber City),
Myanmar for providing me an opportunity to do my research work. I express my
thanks to my Institution namely University of Technology (Yatanarpon Cyber
City) for providing me with a good environment and facilities like internet, books,
computers and all that as my source to complete this research. My heart-felt thanks
to my family, friends and colleagues who have helped me for the completion of
this work.

REFERENCES
[1] Vaidehi .V, Monica .S, Mohamed Sheik Safeer .S, Deepika .M, Sangeetha .S, "A
Prediction System Based on Fuzzy Logic", Proceedings of the World Congress on
Engineering and Computer Science 2008, WCECS 2008, October 22 - 24, 2008, San
Francisco, USA.
[2] Siriporn Chimphlee, Naomie Salim, Mohd Salihin Bin Ngadiman, Witcha Chimphlee,
Surat Srinoy, "Rough Sets Clustering and Markov model for Web Access Prediction",
Proceedings of the Postgraduate Annual Research Seminar 2006, pp. 470-475.

[3] A. Anitha, "A New Web Usage Mining Approach for Next Page Access Prediction",
International Journal of Computer Applications .Vol. 8, No.11, October 2010.
[4] V.V.R.Maheswara Rao, Dr. V. Valli Kumari, "An Efficient Hybrid Successive Markov
Model for Predicting Web User Usage Behavior using Web Usage Mining".
[5] Yong Zhen Guo and Kotagiri Ramamohanarao and Laurence A. F. Park, "Web Page
Prediction Based on Conditional Random Fields.
[6] Ke Yiping, "A Survey on Preprocessing Techniques in Web Usage Mining", The Hong
Kong University of Science and Technology, Dec 2003.
[7] S. Brin, L. Page, 1998. "The anatomy of a large-scale hypertextual Web search
engine", Computer Networks, Vol. 30, No. 1-7, pp. 107-117, Proc. of WWW7
Conference.
[8] F.Khalil, J. Li and H. Wang, 2007. "Integrating markov model with clustering for
predicting web page accesses". Proceedings of the 13th Australasian World Wide Web
Conference (AusWeb 2007), June 30-July 4, Coffs Harbor, Australia, pp. 1-26.
[9] M. Deshpande and G. Karypis. May 2004. Selective markov models for predicting web
page accesses. ACM Trans. Internet Technol., Vol. 4, pp. 163-184.
[10] M. Eirinaki, M. Vazirgiannis, D. Kapogiannis, Web Path Recommendations based on
Page Ranking and Markov Models, WIDM05, November 5, 2005, Bremen, Germany.
[11] M. Eirinaki and M. Vazirgiannis. Nov. 2005. "Usage-based pagerank for web
personalization". In Data Mining, Fifth IEEE International Conference on, pp. 8.
[12] K. R. Suneetha, Dr. R. Krishnamoorthi, "Identifying User Behavior by Analyzing Web
Server Access Log File", IJCSNS International Journal of Computer Science and
Network Security, Vol. 9 No. 4, April 2009.
[13] J. Zhu, "Using Markov Chains for Structural Link Prediction in Adaptive Web Sites"
[14] M. Vazirgiannis, D. Drosos, P. Senellart, A. Vlachou, "Web Page Rank Prediction with
Markov Models", April 21-25, 2008 Beijing, China.
[15] M. Eirinaki, M. Vazirgiannis, D. Kapogiannis, "Web Path Recommendations based on
Page Ranking and Markov Models", WIDM05, November 5, 2005, Bremen, Germany
[16] R. Khanchana, Dr. M. Punithavalli, "Web Page Prediction for Web Personalization: A
Review", Global Journal of Computer Science and Technology, Vol. 11, No. 7, 2011.
[17] Y. Z. Guo, K. Ramamohanarao, and L. Park. Nov. 2007. "Personalized pagerank for
web page prediction based on access time-length and frequency". In Web Intelligence,
IEEE/WIC/ACM International Conference, pp. 687-690.
[18] B. D. Gunel, P. Senkul, "Investigating the Effect of Duration, Page Size and Frequency
on Next Page Recommendation with Page Rank Algorithm", ACM, 2011.
[19] P. Thwe, "Proposed Approach for Web Page Access Prediction Using Popularity and
Similarity based Page Rank Algorithm", International Journal of Science and
Technology Research, Vol. 2, No. 3, March 2013.

This paper may be cited as:


Thwe, P. 2014. Web Page Access Prediction based on an Integrated
Approach. International Journal of Computer Science and Business
Informatics, Vol. 12, No. 1, pp. 55-64.


A Survey on Bi-Clustering and its


Applications
K. Sathish Kumar
Assistant Professor in Information Technology
Gobi Arts & Science College (Autonomous)
Gobichettipalayam

M. Ramalingam
Assistant Professor in Information Technology
Gobi Arts & Science College (Autonomous)
Gobichettipalayam

Dr. V. Thiagarasu
Associate Professor of Computer Science
Gobi Arts & Science College (Autonomous)
Gobichettipalayam

ABSTRACT
Biclustering is the process of simultaneously partitioning the set of samples and the set of
their attributes into classes. Samples and attributes are classified together when they are
believed to have a high relevance to each other. However, the outcome of applying classic
clustering methods to genes is limited. These limited results are caused by the existence of a
number of experimental conditions under which the activity of genes is not correlated. For this
purpose, a number of algorithms that perform real-time clustering of the expression matrix
have been proposed. This survey analyzes the most widely used biclustering
techniques and their associated applications in various fields. It presents
a study of several biclustering algorithms proposed by various authors to deal with the
gene expression data efficiently. The existing algorithms are analyzed thoroughly to
identify their advantages and limitations. A performance evaluation of the existing
algorithms is carried out to determine the best approach. Then, in order to improve the
performance of the best approach, a novel approach is proposed in this paper.
Keywords
Biclustering, simultaneous clustering, co-clustering, Data Mining, Gene Expression Data,
Gene Selection.

1. INTRODUCTION
Analyzing variations in the expression levels of genes under different
conditions (samples) is significant for recognizing the basic complex biological
processes that the genes take part in. In gene expression data analysis,
the expression levels of genes in each sample are characterized by a real-valued
data matrix with rows and columns representing the genes and the samples,
respectively. The objective is to identify genes that have correlated
expression values in a variety of samples [1].
Gene expression matrices have been widely investigated in two dimensions,
that is, the gene dimension and the condition dimension. This corresponds to [2]:
Investigation of expression patterns of genes by comparing rows in
the matrix.
Investigation of expression patterns of samples by comparing
columns in the matrix.
However, applying clustering algorithms to gene expression data runs into
an important difficulty: numerous activation patterns are common to a
group of genes only under specific experimental conditions [3]. It is therefore
highly desirable to move beyond the clustering model [4].
The rest of this paper is organized as follows: Section 2 surveys existing biclustering
algorithms, Section 3 discusses problems and directions, Section 4 presents the
motivation, and Section 5 concludes the paper.
2. SURVEY
Biclustering, which has been applied intensively in molecular biology
research in recent times, provides a framework for identifying hidden
substructures in large high dimensional matrices. Tanay et al. [5] define a
bicluster as a subset of genes that jointly respond under a subset of conditions.
Biclustering algorithms may have two different objectives: to find
one bicluster or to identify a given number of biclusters.
Cheng and Church's Algorithm (CC) [6] defines a bicluster (a δ-bicluster) as a
submatrix whose mean squared residue score lies underneath a user-defined
threshold δ. In order to identify the largest δ-bicluster in the data,
they recommend a two-phase approach: first, rows and columns are removed
from the original expression matrix until the above constraint is satisfied.
Afterwards, previously deleted rows and columns are added back to the resulting
submatrix as long as the bicluster score does not exceed δ. This process is
iterated numerous times, with previously found biclusters covered with random values.
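To make the CC score concrete, the following sketch computes the mean squared residue of a candidate submatrix (rows I, columns J) of an expression matrix, which is the quantity compared against the threshold δ; it is an illustration assuming a NumPy array, not Cheng and Church's full node-deletion algorithm.

```python
import numpy as np

def mean_squared_residue(data, rows, cols):
    """Mean squared residue H(I, J) of the submatrix data[rows][:, cols]."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    overall = sub.mean()
    residue = sub - row_means - col_means + overall
    return float(np.mean(residue ** 2))

# A submatrix is accepted as a delta-bicluster when the score is <= delta.
```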
In [7], an improved form of the CC algorithm was proposed which avoids the
problem of random interference caused by covered biclusters. Samba [8]
introduces a graph-theoretic methodology to biclustering in combination with a
statistical data model. In this framework, the expression matrix is modeled as
a bipartite graph, a bicluster is defined as a subgraph, and a likelihood score
is used in order to assess the significance of observed subgraphs. A related
heuristic algorithm, also called Samba, aims at identifying highly significant and
distinctive biclusters. In a recent investigation, this approach has been
extended to integrate multiple types of experimental data.
In the Order Preserving Submatrix Algorithm (OPSM) [9], a bicluster is defined
as a submatrix that preserves the order of a set of chosen columns for all of the
selected rows. In other words, the expression values of the genes
inside a bicluster induce a common linear ordering across the selected
samples. Based on a stochastic model, the authors implemented a
deterministic algorithm to discover large and statistically significant biclusters.
This idea has been taken up in a more recent investigation [10].
Tang et al. [11] proposed the Interrelated Two-Way Clustering (ITWC)
algorithm, which combines the results of one-way clustering on both dimensions of
the data matrix to generate biclusters. After normalizing the rows of the data matrix,
they calculate the vector-angle cosine between each row and a pre-defined stable pattern
to test whether the row values vary much among the columns, and remove the ones
with little variation. After that, they use a correlation coefficient as a
similarity measure for the strength of the linear relationship between
two rows or two columns to carry out two-way clustering. As this similarity
measure is based on the pattern and not on the absolute magnitude of the
spatial vector, it also allows the identification of biclusters with
coherent values represented by the multiplicative model.
The worst-case running-time complexity of BiMax for matrices comprising
disjoint biclusters is O(nmb), and for arbitrary matrices it is of order
O(nmb min{n, m}) [12]. The main goal in [13] is to find market segments of
tourists who are similar to each other, therefore allowing a targeted marketing
mix to be developed. In general, the data used to segment tourists are illustrated.
Small samples and many questions give rise to serious methodological problems
that have usually been addressed by means of factor-cluster analysis to reduce
the dimensionality of the data.
The technique in [14] depends on a force-directed graph where biclusters
are represented as possibly overlapping groups of genes and conditions. In
[15], an expression-pattern based biclustering approach, CoBi, was introduced for
grouping both positively and negatively regulated genes from microarray
expression data. The regulation pattern and the similarity in degree of fluctuation
are taken into account when computing the likeness between two genes. Unlike
conventional biclustering approaches, which use greedy iterative
approaches, it uses a BiClust tree that requires a single pass over the entire
dataset to find a set of biologically relevant biclusters. Biclusters determined
from various gene expression datasets by this technique show highly
enriched functional categories. In the MSBE biclustering algorithm [16], the
threshold of the average relationship score is a user input factor that allows the
user to control the quality of the biclustering results.

ISSN: 1694-2108 | Vol. 12, No. 1. APRIL 2014 67


International Journal of Computer Science and Business Informatics

IJCSBI.ORG
In [17], a fuzzy biclustering technique is introduced. It is based on
formulating the one-way clustering along the row and column dimensions as
a normalized graph cut problem. The graph cut problem is then solved
by a spectral decomposition, followed by K-means clustering of the
eigenvectors. The biclustering of the row and column dimensions is
accomplished by a three-stage procedure. Initially, the original data matrix
undergoes one-way clustering in the row dimension to obtain k clusters.
After that, a new pattern matrix is calculated, in which each row is given by the
average of the rows that belong to the same cluster in the original data matrix.
The new data matrix then undergoes the same one-way
clustering in the column dimension to obtain k clusters. Lastly, a table of
fuzzy relation coefficients that relate each of the k row clusters to each of the
k column clusters is computed. By calculating the new data matrix from the
result of the initial clustering stage, the fuzzy biclustering
algorithm achieves a biclustering of the original data matrix.

3. PROBLEMS AND DIRECTIONS

Clustering the microarray data is based on a user-defined threshold value, which
influences the quality of the biclusters produced. This value affects the
quality of the biclusters by missing some genes or by including a number
of unnecessary genes. Once a bicluster is produced, its entries are
replaced by random numbers, preventing the identification of overlapping
biclusters. The problem of finding the minimum set of biclusters that covers all the
elements of a data matrix is extremely hard.
4. MOTIVATION
Biclustering is a significant approach in microarray data analysis. The
primary reasons for using biclustering in the analysis of gene expression data
are: (1) similar genes might show similar behaviors only under some conditions,
not all conditions; (2) genes may participate in more than one function, resulting
in more than one regulation model. With biclustering algorithms, one may obtain
sets of genes that are co-regulated under subsets of conditions. Some biclustering
algorithms are based on a user-defined threshold value, which influences the
quality of the biclusters produced. To overcome these problems, we aim to build an
algorithm that discovers an optimal bicluster with an automatically determined
threshold value rather than a user-defined threshold.

5. CONCLUSIONS
A comprehensive survey of the models, methods, and applications developed in
the field of biclustering algorithms has been presented and analyzed. The list of
applications presented is by no means exhaustive, and an all-inclusive list
of potential applications would be prohibitively long. The list of
available algorithms is also very large, and many combinations of
ideas can be adapted to obtain new algorithms potentially more
effective in particular applications. The modification and validation of
biclustering methods by comparison with known biological data is surely
one of the most important open issues. Another interesting direction is the
application of robust biclustering techniques to new and existing application
domains.

REFERENCES
[1] Doruk Bozdag, Ashwin S. Kumar and Umit V. Catalyurek, Comparative Analysis of
Biclustering Algorithms, 2010.
[2] Sara C. Madeira and Arlindo L. Oliveira, Biclustering Algorithms for Biological Data
Analysis: A Survey, INESC-ID TEC. REP. 1/2004, JAN 2004.
[3] Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulus, and Prabhakar Raghavan.
Automatic subspace clustering of high dimensional data for data mining applications.
In Proceedings of the ACM/SIGMOD International Conference on Management of
Data, pp. 94-105, 1998.
[4] Amir Ben-Dor, Benny Chor, Richard Karp, and Zohar Yakhini. Discovering local
structure in gene expression data: The orderpreserving submatrix problem. In
Proceedings of the 6th International Conference on Computacional Biology
(RECOMB'02), pp. 49-57, 2002.
[5] A. Tanay, R. Sharan and R. Shamir,Discovering statistically significant biclusters in
gene expression data. Bioinformatics, Vol. 18, pp. 136-144, 2002.
[6] Y. Cheng and G. M. Church. Biclustering of expression data. In Proc. of the
International Conference on Intelligent Systems for Molecular Biology, pp. 93-103,
2000.
[7] Yang, J., Wang, H., Wang, W., Yu, P.S., (2003) Enhanced Biclustering on Expression
Data. BIBE 2003, pp. 321-327.
[8] Tanay, A., Sharan, R., Kupiec, M., Shamir, R., (2004) Revealing Modularity and
Organization in the Yeast Molecular Network by Integrated Analysis of Highly
Heterogeneous Genomewide Data, PNAS, pp. 101-9, 2981-2986.
[9] Ben-Dor, A., Chor, B., Karp, R., Yakhini, Z., (2002) Discovering Local Structure in
Gene Expression Data: The Order-Preserving Sub-Matrix Problem, Proceedings of the
6th Annual International Conference on Computational Biology, pp. 49-57.
[10] Liu, J., Wang, W., (2003) OP-Clusters: Clustering by tendency in high dimensional
space, Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM),
pp. 187-194.
[11] Chun Tang, Li Zhang, Idon Zhang, and Murali Ramanathan. Interrelated two-way
clustering: an unsupervised approach for gene expression data analysis. In Proceedings
of the 2nd IEEE International Symposium on Bioinformatics and Bioengineering, pp.
41-48, 2001.
[12] Noureen, N., Qadir, M.A., BiSim: A Simple and Efficient Biclustering Algorithm,
Soft Computing and Pattern Recognition, SOCPAR '09. International Conference of
2009, pp. 1-6.
[13] Sara Dolnicar, Sebastian Kaiser, Katie Lazarevski, Friedrich Leisch, Biclustering
Overcoming Data Dimensionality Problems in Market Segmentation, Journal of Travel
Research, Vol. 51, No. 1, pp. 41-49, 2012.

[14] Rodrigo Santamaria, Roberto Theron and Luis Quintales, BicOverlapper: A tool
for bicluster visualization, Vol. 24, No. 9, 2008, pp. 1212-1213.
[15] Swarup Roy, Dhruba K Bhattacharyya, Jugal K Kalita, CoBi: Pattern Based Co-
Regulated Biclustering of Gene Expression Data, preprint submitted to Elsevier, March
9, 2013.
[16] Liu X, Wang L: Computing the maximum similarity bi-clusters of gene expression
data. Bioinformatics 2007, Vol. 23, No. 1, pp. 50-56.
[17] Koutsonikola VA, Vakali A. A Fuzzy Bi-clustering Approach to Correlate Web Users
and Pages. Int. J. Knowledge and Web Intelligence 2009, Vol. 1 No.1-2, pp. 3-23.

This paper may be cited as:


Sathish Kumar, K., Ramalingam, M. and Thiagarasu, V. 2014. A Survey on
Bi-Clustering and its Applications. International Journal of Computer
Science and Business Informatics, Vol. 12, No. 1, pp. 65-70.


Pixel Level Image Fusion:


A Neuro-Fuzzy Approach
Swathy Nair
Dept. of Electrical and Electronics Engineering
MA College of Engineering
Kothamangalam, Kerala, India

Bindu Elias
Dept. of Electrical and Electronics Engineering
MA College of Engineering
Kothamangalam, Kerala, India

VPS Naidu
Multi Sensor Data Fusion Lab
CSIR National Aerospace Laboratories
Bangalore-17, India

ABSTRACT
Image fusion integrates images obtained from different sensors and outputs a single image
containing all relevant data from the source images. Five different image
fusion algorithms, SWT, Fuzzy, Neuro-Fuzzy, Fuzzylet and Neuro-Fuzzylet, have
been discussed and tested with two datasets (mono-spectral and multi-spectral). The results
are compared using fusion quality performance evaluation metrics. It was observed that
Neuro-Fuzzy gives better results than Fuzzy and SWT. Fuzzylet and Neuro-Fuzzylet were
obtained by combining Fuzzy and Neuro-Fuzzy, respectively, with SWT. It was observed
that Fuzzylet gives better results for mono-spectral images, while Neuro-Fuzzylet gives
better results for multi-spectral images at the cost of execution time.
Keywords
Image fusion, Fuzzy logic, image processing, Neuro-fuzzy.

1. INTRODUCTION
For intelligent systems, the integration of information from different sensors
plays a great role. Image fusion integrates images obtained from
different sensors and outputs a single image containing all relevant data
from the source images, providing a human/machine perceivable result
with more useful and complete information. Image fusion has great
importance in many applications such as object detection, automatic target
recognition, remote sensing, computer vision, flight vision and robotics.
This paper deals with a comparison of certain pixel level image fusion
techniques based on SWT, Fuzzy and Neuro-Fuzzy.

Many methods have been proposed and implemented for image fusion [1].
Wavelet transform based image fusion has the merits of multi-scale and
multi-resolution. In [2], an approach of multi-sensor image fusion using
wavelet transform and principal component analysis (PCA) was proposed
and comparison of image fusion with different techniques based on fusion
quality performance metrics was carried out. Wavelets have the disadvantage of shift
variance, which results in loss of edge information in the fused image [3]. The
Stationary Wavelet Transform (SWT), which is shift invariant, solves this problem [4].
Since the concept of image fusion is not crisp and certain,
Fuzzy logic and Neuro-Fuzzy logic are implemented for image fusion in
order to incorporate uncertainty into the images [5]. Sensor fusion can be achieved
with the help of Neuro-fuzzy or fuzzy systems. The major difference between
neuro-fuzzy and fuzzy systems is that a neuro-fuzzy system can be trained
using the input data obtained from the sensors. The basic concept is to
associate the given sensory inputs with some decision outputs. After
developing the system, another group of input data is used to evaluate the
performance of the system. Algorithms for image fusion using Fuzzy and
Neuro-Fuzzy approaches are introduced in [6]. In [7], SWT with higher
level of decomposition is introduced and Fuzzy logic is incorporated into it
to form a novel algorithm called Fuzzylet.

This work is done as an extension to the work done in [7]. In this paper
Neuro-fuzzy based image fusion is tested and compared with SWT and
Fuzzy logic. An algorithm is formed in which Neuro-fuzzy is incorporated
into SWT which is named as Neuro-Fuzzylet and compared with Fuzzylet.
All the comparisons are done by evaluating Fusion Quality Performance
Metrics and results are verified with different sets of images. In this paper, it
is assumed that images to be fused are already registered.

2. IMAGE FUSION TECHNIQUES


Pixel level image fusion technique using SWT, Fuzzy and Fuzzylet are
explained in [7]. Matlab code for SWT based image fusion is given in [8].
In [7] it is proved that SWT with higher decomposition levels and Fuzzy
logic with greater number of membership functions gives the better result.
Fuzzylet algorithm is formed by combining SWT with 4 levels of
decomposition and Fuzzy with 5 membership functions. In this paper,
Neuro-Fuzzy logic is also tested and compared with the results obtained in
[7].

2.1 Neuro-Fuzzy Approach to Image Fusion
Neural Network (NN) is a network which stores the experimental
knowledge and uses it for test data. Neuro- Fuzzy is a combination of
Artificial Neural Network (ANN) and Fuzzy logic. Using this method we
can train the system with input dataset and desired output. After training the
system, this system can be used for any other set of input data. A Neuro-
fuzzy system is a fuzzy system which is trained by any of neural network
learning algorithms and according to the training data system parameters are
modified automatically. Implementation of Neuro-Fuzzy system is done
using ANFIS. ANFIS stands for Adaptive Neural Fuzzy Inference System.
The Fuzzy Inference System (FIS) is a model that does the following
mappings:
A set of input characteristics to input membership functions
Input membership functions to rules
Rules to a set of output characteristics
Output characteristics to output membership functions and
The output membership function to a single-valued output

A FIS has the following limitations:


Membership functions are fixed and somewhat arbitrarily chosen
Fuzzy Inference is applied for modeling systems in which the rules
are predetermined strictly based on the viewpoint of user to the
model.
The shape of the membership functions can be changed by changing the
membership function parameters as it is dependent on these parameters. In
an ordinary FIS, these parameters are selected arbitrarily on a trial-and-error
basis, just by looking at the available data. For applying fuzzy logic to a
system in which a collection of input-output data is available, a
predetermined parameter set will not be available. In some situations
arbitrary selection of parameters will not be sufficient to model a system in
a desired way. Instead of choosing membership function parameters
arbitrarily, it would be more effective if the parameters adjusted
themselves based on the variation of the input data. In such cases, Neuro-
adaptive learning techniques can be incorporated into the FIS.

Using the given input-output data, ANFIS constructs a FIS whose
membership function parameters are tuned using a neural network
learning algorithm. This allows the FIS to learn from the data that it is
given. There is an ANFIS editor toolbox in Matlab which does all this
learning. A Neuro-Fuzzy system can be schematically represented as in Fig
1.


Figure 1. Schematic Diagram of the Neuro-Fuzzy system: the registered input images I1 and I2 are fed to an ANFIS-trained Fuzzy Inference System, which produces the fused image If

The ANFIS training structure obtained from Matlab ANFIS editor toolbox
for two inputs and three membership functions is as shown in Fig. 2.

Figure 2. ANFIS training structure obtained for two inputs and three membership
functions

In the ANFIS training structure shown in Fig. 2, the leftmost nodes
represent the inputs and the rightmost node represents the output. The
branches are colour-coded to indicate the logical operation used in rule
formation, that is, whether AND, OR or NOT is used to combine
antecedents into consequents.

For image fusion, the pixel values of input images and reference (desired)
image are given to the ANFIS for training the FIS, so that the system will
produce a fused image which is closer to the reference image from the input
images. The algorithm for image fusion using Neuro-Fuzzy logic (abbreviated
as NF(I1, I2)) is as follows:
Step 1: Read the images (I1 and I2) to be fused into two variables
Step 2: Obtain the training data, which is a matrix with three columns (2
columns of input data and one column of output data)
Step 3: Obtain the check data, which is a matrix of the pixel values of the two
input images in column format
Step 4: Decide the number and type of membership functions for both
input images
Step 5: Generate a FIS structure from the training data and train the FIS
Step 6: Provide the check data to the trained FIS for processing and obtain
the output image in column format
Step 7: Convert the column form into matrix form to get the fused
image If
In the case of a dataset without a reference output, the 3rd column (output) of
the training data is given as the maximum of the absolute pixel values of the
input images.
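As an illustration of Steps 1-3, 6 and 7 above, the following is a minimal numpy sketch of how the training and check matrices can be assembled and how the fused output column is reshaped back into an image. It is not the Matlab implementation used in the paper; the FIS generation and ANFIS training of Steps 4-5 are assumed to be performed separately (for example in the Matlab ANFIS editor), and all function names here are illustrative.

```python
import numpy as np

def build_anfis_data(img1, img2, reference=None):
    """Steps 1-3: assemble the ANFIS training and check matrices.

    img1, img2 : 2-D arrays holding the registered images to be fused.
    reference  : optional 2-D reference (desired) image; when it is absent,
                 the target column is the pixel-wise maximum of the absolute
                 input values, as described above.
    """
    col1 = img1.astype(float).ravel(order="F")[:, None]   # I1 pixels as a column
    col2 = img2.astype(float).ravel(order="F")[:, None]   # I2 pixels as a column
    if reference is not None:
        target = reference.astype(float).ravel(order="F")[:, None]
    else:
        target = np.maximum(np.abs(col1), np.abs(col2))
    train_data = np.hstack([col1, col2, target])   # two input columns + one output column
    check_data = np.hstack([col1, col2])           # inputs only (Step 3)
    return train_data, check_data

def column_to_image(fused_column, shape):
    """Step 7: reshape the fused output column back into an image."""
    return np.asarray(fused_column, dtype=float).reshape(shape, order="F")
```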

2.2 Neuro-Fuzzylet Algorithm for Image Fusion


In [7], the Fuzzylet image fusion algorithm was developed. In the Fuzzylet
algorithm, Fuzzy logic is used to combine the approximation and detail
coefficients of the SWT of the input images. In Neuro-Fuzzylet, the
Neuro-Fuzzy algorithm discussed in Section 2.1 is used instead of Fuzzy to
combine the SWT coefficients. The information flow diagram of Neuro-Fuzzylet
image fusion is shown in Fig. 3.

Figure 3. Schematic diagram of the Neuro-Fuzzylet Image Fusion Algorithm: the registered input images I1 and I2 are decomposed by SWT, their wavelet coefficient maps are fused by the ANFIS-based Fuzzy Inference System into a fused wavelet coefficient map, and the ISWT gives the fused image If

The images to be fused, $I_1$ and $I_2$, are decomposed into K levels ($k = 1,2,\ldots,K$) using SWT. The resultant approximation and detail coefficients from $I_1$ are $I_1 \rightarrow \{\,{}^{1}A_K,\; {}^{1}H_k,\; {}^{1}V_k,\; {}^{1}D_k\,\}_{k=1,2,\ldots,K}$. Similarly, from $I_2$ the resultant approximation and detail coefficients are $I_2 \rightarrow \{\,{}^{2}A_K,\; {}^{2}H_k,\; {}^{2}V_k,\; {}^{2}D_k\,\}_{k=1,2,\ldots,K}$. The fused image $I_f$ can be obtained by applying the inverse SWT to the fused coefficient set:

$I_f = \mathrm{ISWT}\{\,{}^{f}A_K,\; {}^{f}H_k,\; {}^{f}V_k,\; {}^{f}D_k\,\}_{k=1,2,\ldots,K}$   (1)

where

${}^{f}A_K = \dfrac{{}^{1}A_K + {}^{2}A_K}{2}$   (2)

${}^{f}H_k = NF({}^{1}H_k,\, {}^{2}H_k),\quad k = 1,2,\ldots,K$   (3)

${}^{f}V_k = NF({}^{1}V_k,\, {}^{2}V_k),\quad k = 1,2,\ldots,K$   (4)

${}^{f}D_k = NF({}^{1}D_k,\, {}^{2}D_k),\quad k = 1,2,\ldots,K$   (5)

where the function $NF(a,b)$ is the Neuro-Fuzzy logic based image fusion
algorithm described in Section 2.1.
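The fusion rule in eqns (2)-(5) can be summarised by the following minimal Python sketch. It assumes the SWT decomposition and inverse SWT are performed elsewhere, and the placeholder nf_fuse callable stands in for the Neuro-Fuzzy coefficient fusion NF(a, b) of Section 2.1 (the maximum-magnitude rule shown is only an illustrative stand-in).

```python
import numpy as np

def neuro_fuzzylet_fuse(A1_K, A2_K, details1, details2, nf_fuse):
    """Fuse SWT coefficients according to eqns (2)-(5).

    A1_K, A2_K         : level-K approximation coefficients of I1 and I2.
    details1, details2 : lists of (H_k, V_k, D_k) detail coefficients per level.
    nf_fuse            : callable implementing NF(a, b) (placeholder for the
                         Neuro-Fuzzy fusion of Section 2.1).
    Returns the fused approximation and detail coefficients for the inverse SWT.
    """
    A_f = 0.5 * (A1_K + A2_K)                                    # eqn (2)
    fused_details = [
        (nf_fuse(H1, H2), nf_fuse(V1, V2), nf_fuse(D1, D2))      # eqns (3)-(5)
        for (H1, V1, D1), (H2, V2, D2) in zip(details1, details2)
    ]
    return A_f, fused_details

def nf_fuse_stub(a, b):
    """Illustrative stand-in for NF(a, b): keep the larger-magnitude coefficient."""
    return np.where(np.abs(a) >= np.abs(b), a, b)
```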

3. IMAGE FUSION QUALITY EVALUATION INDICES


The quality of fused images obtained from different algorithms (SWT,
Fuzzy, Neuro-Fuzzy, Fuzzylet and Neuro-Fuzzylet) is compared using
Fusion Quality Performance Evaluation Indices. In this paper, two datasets
are used for the evaluation of algorithms. One among the datasets has a
reference image to which the fused image is compared while the other is not
having a reference image. So for the two datasets different evaluation
indices are used. Evaluation indices are calculated for all algorithms and
compared to find out the best algorithm.
A. With Reference Image
For datasets having reference image, fusion quality could be evaluated using
the following evaluation indices:
1. Root Mean Square Error(RMSE)
RMSE is computed as the root mean square error of the
corresponding pixels in the reference image I r and the fused image
I f . The RMSE between a reference image and the fused image is
given by:

$\mathrm{RMSE} = \sqrt{\dfrac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(I_r(i,j)-I_f(i,j)\bigr)^{2}}$   (6)

where $I_f(i,j)$ and $I_r(i,j)$ are the gray values of the fused image and the
reference image, respectively, at index $(i,j)$. For better quality
images, the root mean square error should be small.

2. Peak Signal to Noise Ratio (PSNR)


Peak signal to noise ratio (PSNR) value will be high when the fused
and the ground truth images are comparable. Higher value implies
better fusion. PSNR can be calculated as:
$\mathrm{PSNR} = 20\log_{10}\!\left(\dfrac{L^{2}}{\mathrm{RMSE}}\right)$   (7)
Where, RMSE is the root mean square error and L is the number of
gray levels in the image.

3. Relative dimensionless global error in synthesis(ERGAS)


Relative dimensionless global error in synthesis (ERGAS) calculates
the amount of spectral distortion in the image; it is given by:

$\mathrm{ERGAS} = 100\,\dfrac{h}{l}\sqrt{\dfrac{1}{B}\sum_{b=1}^{B}\left(\dfrac{\mathrm{RMSE}(b)}{m(b)}\right)^{2}}$   (8)

where $\dfrac{h}{l}$ is the resolution ratio, $m(b)$ is the mean of the $b$-th band and $B$ is
the number of bands.

4. Structural Content (SC)


Structural content can be calculated by using the equation:
$\mathrm{SC} = \dfrac{\sum_{i=1}^{M}\sum_{j=1}^{N} I_f(i,j)^{2}}{\sum_{i=1}^{M}\sum_{j=1}^{N} I_r(i,j)^{2}}$   (9)

Structural content should be 1 when the fused image is identical to the
reference image.

5. Error Image (EI)


The error image is computed as the difference between
corresponding pixels of the reference and fused images. An image of better
fusion quality has less error, and an ideal fusion results in a
completely black error image:
$EI = I_r - I_f$   (10)
(A short numerical sketch of these reference-based indices is given after this list.)
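A minimal numpy sketch of the reference-based indices follows (ERGAS is omitted because it additionally requires per-band means; the squared terms in SC follow the standard definition assumed in the reconstruction above):

```python
import numpy as np

def rmse(ref, fused):
    """Eqn (6): root mean square error between the reference and fused images."""
    ref, fused = ref.astype(float), fused.astype(float)
    return np.sqrt(np.mean((ref - fused) ** 2))

def psnr(ref, fused, gray_levels=256):
    """Eqn (7): peak signal to noise ratio, with L the number of gray levels."""
    return 20.0 * np.log10(gray_levels ** 2 / rmse(ref, fused))

def structural_content(ref, fused):
    """Eqn (9): ratio of the total fused energy to the total reference energy."""
    return np.sum(fused.astype(float) ** 2) / np.sum(ref.astype(float) ** 2)

def error_image(ref, fused):
    """Eqn (10): pixel-wise difference; an ideal fusion gives an all-zero image."""
    return ref.astype(float) - fused.astype(float)
```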
B. Without Reference Image
Evaluation indices that are used for datasets without reference image
are:

1. Entropy (H)
Entropy is used to measure the information content of an image.
Entropy is sensitive to noise and other unwanted rapid fluctuations.
An image with high information content would have high entropy.
Entropy is defined as:
$H = -\sum p \log_{2} p$   (11)
where $p$ is the normalized histogram of the image (obtained, for example, from the
histogram counts returned by the Matlab function imhist).

2. Mean (m)
Mean gives the mean pixel value, which is formulated as:
$m = \dfrac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} I_f(i,j)$   (12)
where $I_f(i,j)$ is the gray value of the fused image at index $(i,j)$ and $M\times N$
is the size of the image.

3. Standard Deviation (SD)


It is known that standard deviation is composed of the signal and
noise parts. This metric would be more efficient in the absence of
noise. It measures the contrast in the fused image. An image with
high contrast would have a high standard deviation. SD is given by:

$\mathrm{SD} = \sqrt{\dfrac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(I_f(i,j)-m\bigr)^{2}}$   (13)
where $m$ is the mean pixel value of the fused image.

4. Spatial Frequency (SF)


This frequency in spatial domain indicates the overall activity level
in the image. Image with high spatial frequency offers better quality.
It can be calculated from the
Row Frequency (RF):
$\mathrm{RF} = \sqrt{\dfrac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=1}^{N-1}\bigl(I_f(i,j)-I_f(i,j-1)\bigr)^{2}}$   (14)
and the Column Frequency (CF):
$\mathrm{CF} = \sqrt{\dfrac{1}{MN}\sum_{j=0}^{N-1}\sum_{i=1}^{M-1}\bigl(I_f(i,j)-I_f(i-1,j)\bigr)^{2}}$   (15)
as
$\mathrm{SF} = \sqrt{\mathrm{RF}^{2} + \mathrm{CF}^{2}}$.   (16)

5. Cross Entropy (CE)
Cross-entropy evaluates the similarity in information content
between input images ( I 1 & I 2 ) and fused image. Better fusion
result would have low cross entropy. Cross entropy can be calculated
as:


$CE(I_1, I_2; I_f) = \dfrac{CE(I_1; I_f) + CE(I_2; I_f)}{2}$   (17)

where
$CE(I_1; I_f) = \sum_i p_{i_1}\log_{2}\dfrac{p_{i_1}}{p_{i_f}}$, $\quad CE(I_2; I_f) = \sum_i p_{i_2}\log_{2}\dfrac{p_{i_2}}{p_{i_f}}$,
and $p_i$ is the normalized histogram of the image $I$.
6. Fusion Factor(FF)
Fusion factor of two input images ( I 1 & I 2 ) and fused image ( I f ) is
given by:
$FF = I_{1f} + I_{2f}$   (18)
where
$I_{1f} = \sum P_{i_1 i_f}\log\dfrac{P_{i_1 i_f}}{P_{i_1}P_{i_f}}$, $\quad I_{2f} = \sum P_{i_2 i_f}\log\dfrac{P_{i_2 i_f}}{P_{i_2}P_{i_f}}$,
$P_{i_1}$, $P_{i_2}$ and $P_{i_f}$ are the probability density functions of the
individual images, and $P_{i_1 i_f}$, $P_{i_2 i_f}$ are the joint probability density
functions of each input image with the fused image.
FF indicates the amount information present in fused image from
both the images. Hence, higher value of FF indicates good fusion
quality. But it does not give the indication that the information are
fused symmetrically. For that another metrics called fusion
symmetry is used.

7. Fusion Symmetry(FS)
Fusion symmetry indicates how symmetrically the information from
input images is fused to obtain the fused image. It is given by:
$FS = \left|\dfrac{I_{1f}}{I_{1f}+I_{2f}} - 0.5\right|$   (19)

Since this metric is a symmetry factor, from the equation it is clear
that its value should be as low as possible so that the fused image
would contain the features of both input images. Fusion quality
depends on degree of Fusion symmetry.

8. Fusion Quality Index(FQI)


Fusion Quality Index is given by:

$FQI = \sum_{w} c(w)\Bigl(\lambda(w)\,QI(I_1,I_f\mid w) + \bigl(1-\lambda(w)\bigr)\,QI(I_2,I_f\mid w)\Bigr)$   (20)

where $\lambda(w) = \dfrac{\sigma_{I_1}^{2}}{\sigma_{I_1}^{2}+\sigma_{I_2}^{2}}$ is computed over a window $w$,
$c(w) = \max(\sigma_{I_1}^{2},\sigma_{I_2}^{2})$ over the window, and $QI(I_1,I_f\mid w)$ is the
quality index over a window for a given source image and fused
image.
The range of this metric is 0 to 1. One indicates the fused image
contains all the information from the source images. FQI of a better
fusion would have maximum value in between 0 & 1.
9. Execution Time (Et)
It gives the time taken to execute the algorithm. (A short computational
sketch of the main no-reference indices is given below.)
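A minimal numpy sketch of the first few no-reference indices follows (cross entropy, fusion factor, fusion symmetry and FQI additionally require joint histograms or windowed quality indices and are omitted; the 1/MN normalisation in SF is approximated by a mean over the difference arrays):

```python
import numpy as np

def entropy(img, bins=256):
    """Eqn (11): Shannon entropy of the normalized gray-level histogram."""
    hist, _ = np.histogram(img.ravel(), bins=bins)
    p = hist.astype(float) / hist.sum()
    p = p[p > 0]                         # drop empty bins so log2 is defined
    return -np.sum(p * np.log2(p))

def mean_and_sd(img):
    """Eqns (12)-(13): mean pixel value and standard deviation (contrast)."""
    img = img.astype(float)
    m = img.mean()
    return m, np.sqrt(np.mean((img - m) ** 2))

def spatial_frequency(img):
    """Eqns (14)-(16): row, column and overall spatial frequency."""
    img = img.astype(float)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))   # horizontal differences
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))   # vertical differences
    return np.sqrt(rf ** 2 + cf ** 2)
```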

4. RESULTS AND DISCUSSIONS


The results obtained in [7] are taken and compared with Neuro-Fuzzy and
Neuro-Fuzzylet fusion results. For experimentation, two datasets are taken.
Dataset-1 is of CSIR-NAL indigenously developed SARAS images (mono-
spectral), which consists of a reference image, as shown in Fig. 4, and input
images, which are obtained by blurring the reference image, as shown in Fig.
5. The fusion techniques are further tested using another dataset, Dataset-2,
which is a multispectral dataset consisting of a Low Light TV (LLTV) image
and a Forward Looking IR (FLIR) image as inputs. A reference image is not
available for this dataset. Different fusion techniques are compared using
the fusion quality performance evaluation metrics described in section 3.
A. Dataset-1
As mentioned before, Dataset-1 consists of one reference image ( I r ) and 2
input images ( I 1 and I 2 ) of SARAS as shown in Fig. 4 and 5.


Fig. 4 Reference image of SARAS ( I r )

Fig. 5 Input images of SARAS ( I 1 and I 2 )

The fusion techniques are tested one by one on Dataset-1 in Matlab. In SWT
algorithm, it is observed that fusion quality increases with the increase in
levels of decomposition at the cost of execution time and it is found out that
fusion results with 4 decomposition levels of SWT gives the better results
[7]. In Fuzzy logic based algorithm, Sugeno FIS with 5 membership
function had given better results. Fuzzylet algorithm is formed by
combining SWT with 4 decomposition levels and Fuzzy with 5 membership
functions [7]. ANFIS training is done to the FIS to get Neuro-fuzzy
algorithm. Here also number of membership functions can be varied.
Performance of image fusion using 3 and 5 membership functions with
ANFIS is tabulated in Table-1. From the table, it is observed that there is no
improvement in evaluation indices by increasing the number of membership
function and execution time increases with increase in membership
functions. So, ANFIS with 3 membership functions is selected for
evaluation. For formulating Neuro-Fuzzylet, the fuzzy function is replaced
with Neuro-fuzzy function in the Fuzzylet algorithm. The performance
metrics obtained for different methods is tabulated in Table-2 for
comparison.


Table-1 Comparison of the Performance metrics obtained using different


membership functions of ANFIS
No:of Performance evaluation metrics
MFs Entropy RMSE PSNR SD ERGAS SF SC CE FF FQI FS Et(sec.)
3 3.578 0.016 66.542 0.199 1.828 0.067 1.003 4.682 3.377 0.816 0.018 0.292
5 3.548 0.016 65.993 0.198 1.844 0.062 0.986 4.714 3.376 0.814 0.012 0.473

Table-2 Comparison of the Performance metrics obtained from five image fusion
techniques for Dataset-1
Algorithm Performance evaluation metrics
Entropy RMSE PSNR SD ERGAS SF SC CE FF FQI FS Et(sec.)
SWT 3.89 0.007 69.944 0.195 0.744 0.066 1.002 4.215 3.378 0.811 0.016 0.826
Fuzzy 3.578 0.031 63.142 0.195 3.560 0.046 0.981 5.228 3.358 0.771 0.009 0.455
Neurofuzzy 3.578 0.016 66.542 0.199 1.828 0.067 1.003 4.682 3.376 0.816 0.015 0.292
Fuzzylet 4.061 0.005 71.062 0.198 0.224 0.068 1.000 3.255 3.389 0.882 0.013 3.324
Neurofuzzylet 3.912 0.006 70.165 0.199 0.771 0.066 1.002 3.626 3.379 0.848 0.017 3.212

From the table it is clear that Neuro-Fuzzy gives better results than fuzzy
(see values shown in red). But when it is combined with SWT, fuzzy gives
better results. So out of the five algorithms, Fuzzylet gives best fusion
results (see bold values) for Dataset-1. The fused and error images for all the
algorithms are given from Fig. 6 and 7.

Fig. 6 Fused image using SWT, Fuzzy, Neuro-Fuzzy, Fuzzylet and Neuro-Fuzzylet
respectively for Dataset-1


Fig. 7 Error image using SWT, Fuzzy, Neuro-Fuzzy, Fuzzylet and Neuro-Fuzzylet
respectively for Dataset-1

B. Dataset-2
Dataset-2 is a multispectral data set consists of LLTV ( I1 ) and FLIR ( I 2 )
images as inputs as shown in Fig. 8. Reference image is not available for
this dataset, hence evaluation metrics explained in section 3.B is used for
the comparison.
Human eye is sensitive to a limited range of the electromagnetic spectrum as
well as to low light intensity. To obtain data that cannot be sensed by the
eye, one can use sensor data such as IR sensors or image intensifier night
time sensors. The human observer may use data from multiple sensors. For
example, using the visual channel as well as the IR channel can substantially
improve the ability to detect a target. This can be observed in the input
images shown in Fig.8. In the LLTV image, the bushes, trees etc are more
visible while in FLIR image, the roads are more visible. The fused image
should render the necessary features of both images.


Fig. 8 Images to be fused (LLTV image and FLIR image)

Fused images using all the five algorithms are shown in Fig. 9. It is
observed that in SWT result, all the features of both input images are visible
but with poor clarity. Rendering of land texture, visual quality of image, etc
are poor. In Fuzzy and Neuro-Fuzzy, it is observed that IR features are
prominent. Its rendering quality is poor with dark texture and over enhanced
view of elements like bushes, trees, etc.

The training data for ANFIS training is selected as mentioned in Section
2.1. It is observed that, with the use of the Fuzzylet and Neuro-Fuzzylet
algorithm, both Visible and IR features are equally rendered maintaining the
quality of both input image. So visually, Fuzzylet and Neuro-Fuzzylet
provides better result. This can be further evaluated by evaluating fusion
quality metrics tabulated in Table-3 for all the five algorithms.
Table-3 Comparison of the Performance metrics obtained from five image fusion
techniques for Dataset-2
Algorithm Performance evaluation metrics
Entropy SD CE FF FS FQI Et(sec.)
SWT 7.241 0.187 4.538 2.184 0.023 0.597 0.379
Fuzzy 7.095 0.217 0.959 2.185 0.023 0.496 0.428
Neuro-Fuzzy 7.301 0.283 2.308 2.228 0.033 0.437 0.314
Fuzzylet 7.296 0.288 4.535 2.346 0.043 0.698 1.225
Neuro-Fuzzylet 7.321 0.296 4.355 2.419 0.045 0.698 0.922

From the table it is clear that Neuro-Fuzzy gives better results than Fuzzy
and SWT (see values shown in red). When it is combined with SWT, Neuro-
Fuzzy gives better results. So out of the five algorithms, Neuro-Fuzzylet
gives best fusion results (see bold values) for multispectral images.


Fig. 9 Fused image using SWT, Fuzzy, Neuro-Fuzzy, Fuzzylet and Neuro-Fuzzylet
respectively for Dataset-2

5. CONCLUSION
Five different image fusion algorithms, SWT, fuzzy, Neuro-Fuzzy, Fuzzylet
and Neuro-Fuzzylet algorithms were discussed and tested with two datasets
(monospectral and multispectral). The results were compared using fusion
quality performance evaluation metrics. It was observed that Neuro-Fuzzy
gives better results than Fuzzy and SWT. Fuzzylet and Neuro-Fuzzylet were
obtained by combining Fuzzy and Neuro-Fuzzy respectively with SWT. It
was observed that Fuzzylet gives better results for monospectral images and
on the other hand, Neuro-Fuzzylet had given better results for multispectral
images at the cost of execution time. It is hoped that the proposed algorithm
can be extended for real time and color images.

6. REFERENCES
[1] Yanfen Guo, Mingyuan Xie, Ling Yang, An Adaptive Image Fusion Method
Based on Local Statistical Feature of Wavelet Coefficients 978-1-4244-5273-6/9
2009 IEEE.
[2] VPS Naidu and J.R. Raol, Pixel-level Image Fusion using Wavelets and Principal
Component Analysis, In defence journal, Vol. 58, No.3, May 2008, pp. 338-352.
[3] Andrew. P. Bradley, Shift-invariance in the Discrete Wavelet Transform, in Proc.
VIIth Digital Image Computing: Techniques and Applications, Dec. 2003, Sydney.
[4] Pusit Borwonwatanadelok, Wirat Rattanapitakand Somkait Udomhunsakul, Multi-
Focus Image Fusion based on Stationary Wavelet Transform and extended Spatial
Frequency Measurement, International Conference on Electronic Computer
Technology, 2009, pp. 77-81.
[5] R. Maruthi and K. Sankarasubramanian, Pixel Level Multifocus Image Fusion Based
on Fuzzy Logic Approach, Asian Journal of Information Technology 7(4): 168-171,
2008 ISSN: 1682-3915.

[6] Harpreet Singh, Jyoti Raj and Gulsheen Kaur, Image Fusion using Fuzzy Logic and
Applications, Budapest Hungary, 25-29 July. 2004.
[7] Swathy Nair, Bindu Elias, VPS Naidu, Pixel Level Image Fusion using Fuzzylet
Fusion Algorithm, in International Journal Of Advanced Research in Electrical
Electronics And Instrumentation Engineering, ISSN 2278-8875, Dec 2013.
[8] www.mathworks.in/matlabcentral/fileexchange/authors/104729. Accessed on
21/2/2014.

This paper may be cited as:


Nair, S., Elias, B. and Naidu V., 2014. Pixel Level Image Fusion: A Neuro-
Fuzzy Approach. International Journal of Computer Science and Business
Informatics, Vol. 12, No. 1, pp. 71-86.


A Comparative Analysis on
Visualization of Microarray Gene
Expression Data
Poornima. S
PG Scholar, Department of Information Technology
Sona College of Technology
Salem,India

Dr. J. Jeba Emilyn


Associate Professor, Department of Information Technology
Sona College of Technology
Salem, India

ABSTRACT
Visualization techniques help in the easy analysis of data. However, visualization of
biological data is one of the most challenging processes, and visualization of the
computed clustered and biclustered data still remains an open issue. Clustering and
biclustering techniques are the popular methods for classifying gene
expression data. There is no standard visualization technique for biclustered
data, and visualization of multiple biclusters is much harder to implement because of the
overlapping property. Here, we analyze the merits and demerits of various
visualization techniques and visualization tools. We compare and provide a
detailed study of each technique. Finally, a conclusion on how to overcome the common
challenges in the visualization of microarray gene expression data is provided.
Keywords
Visualization, microarray gene expression data, clusters, biclusters.

1. INTRODUCTION
Visualization technique is the study of the visual representation of data.
Visualizing gene expression data is a challenging task. The most common
and efficient method for analyzing gene expression data is clustering that
groups together genes with similar expression profiles. We have many
standard visualization techniques for gene clustering. But it is not the case
for the gene biclustering. In gene expression data, a bicluster is a subset of
the genes exhibiting consistent patterns over a subset of the conditions.
Biclustering techniques group the genes under a certain subset of conditions.
At the same time, a gene or condition can be in more than one bicluster
called overlapping, but in clustering a gene or condition is usually assigned

to a unique cluster. The outputs of biclustering algorithms provide the basis
for better understanding of the biological process underlying the data.
However, to provide a clear knowledge about the biclustered results a better
and efficient visualization technique is required. Here, we have analyzed
some visualization techniques of both clustering tools and biclustering tools.
This paper proceeds as follows. Section 2 reviews existing visualization
techniques, Section 3 surveys existing visualization tools, and Section 4 concludes.

2. Study of Existing Visualization Techniques and Tools

2.1 Existing Visualization Techniques

2.1.1 Visualization of Multilayered Clustered Results

Leishi Zhang et al. proposed in [1] that 3D visualization of gene
clusters can be performed effectively using a force-directed placement
spring model. Genes within a cluster are detected and allocated to a local
area (an infocube), and all the infocubes are then allocated within a global
area. Here, nodes are considered as physical bodies and edges as spring
forces between the genes (nodes). Genes are clustered within an infocube in
the local area and the infocubes are then clustered in the global area, so two
layers are obtained; this can be extended to display a multilayered graph. The
spring force and repulsive force of each node are calculated and the node is
then positioned inside its infocube. Similarly, the spring and repulsive forces
of all the infocubes are calculated and each infocube is positioned in the
global area.

Figure 1. Representing clusters in two layers in [1]
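The spring model described above can be illustrated with a minimal numpy sketch. This is not the implementation of [1]; the infocube allocation and the two-layer step are omitted, and the force constants are arbitrary:

```python
import numpy as np

def force_directed_layout(adj, n_iter=200, spring_len=1.0, step=0.05, seed=0):
    """Minimal force-directed placement: edges act as springs pulling connected
    genes together while every pair of nodes repels (inverse-square force)."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    pos = rng.standard_normal((n, 3))                  # random 3-D start positions
    for _ in range(n_iter):
        delta = pos[:, None, :] - pos[None, :, :]      # pairwise difference vectors
        dist = np.linalg.norm(delta, axis=-1) + 1e-9
        repulse = (delta / dist[..., None] ** 3).sum(axis=1)          # ~ 1 / d^2
        stretch = (dist - spring_len) * adj                           # Hooke term on edges
        attract = -(stretch[..., None] * delta / dist[..., None]).sum(axis=1)
        pos += step * (attract + repulse)
    return pos
```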

2.1.2 Integration of Clustering and Visualization

Gunther H. Weber et al proposed this paper [12] where both clustering and
visualization methods are involved for the analysis of gene expression.
Brushing and linking is the visualization technique used. In physical views,
colour and height are used for visualizing spatial gene expression pattern
clusters. In abstract views, positions are ignored and expression levels for
multiple genes in the cluster are plotted with respect to each other using
scatter plots or parallel coordinates. Colour mapping plays the major part. It
combines different visualization methods to overcome the shortcomings of
single techniques. It provides very high Interactivity.

2.1.3 Parallel Coordinate Plot Visualization For Biclusters

K. O. Cheng et al. proposed in [4] the parallel coordinate
(PC) plot visualization technique for high dimensional data. A
bicluster is a subset of rows which exhibit coherent patterns across a subset
of columns. Generally there are two types of biclusters, namely additive-
related biclusters and multiplicative-related biclusters. Here, all the axes
are ordered in parallel on a plane, so the orthogonality between axes is
lost, but the geometric structure of the data remains consistent. An additive-
related bicluster shows a number of lines with the same slope across the
conditions [4]. Bicluster detection is done using a split-and-merge
mechanism.
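As an illustration (not the implementation of [4]), a bicluster can be highlighted in a parallel-coordinate plot with a short matplotlib sketch such as the following:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_bicluster_pc(expr, bic_rows, bic_cols, condition_names=None):
    """Parallel-coordinate view of an expression matrix (genes x conditions),
    with the bicluster genes drawn on top and its conditions marked."""
    n_genes, n_cond = expr.shape
    x = np.arange(n_cond)
    bic_rows = set(bic_rows)
    for g in range(n_genes):                      # background genes in light grey
        if g not in bic_rows:
            plt.plot(x, expr[g], color="0.8", linewidth=0.5)
    for g in bic_rows:                            # bicluster genes highlighted
        plt.plot(x, expr[g], color="tab:red", linewidth=1.2)
    for c in bic_cols:                            # mark the bicluster conditions
        plt.axvline(c, color="tab:blue", linestyle="--", linewidth=0.7)
    labels = condition_names if condition_names is not None else [str(i) for i in x]
    plt.xticks(x, labels)
    plt.xlabel("conditions")
    plt.ylabel("expression level")
    plt.show()
```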

2.1.4 R Package Biclust

Sebastian Kaiser and Friedrich Leisch introduced the R package biclust in


[7], which consists of multiple biclustering algorithms and multiple
visualization techniques for different purposes. Bubbleplot is a graphical
representation of biclusters which is shown in a 2 dimensional format.
Bicluster within a bicluster is represented as small and big circles. Parallel
coordinates is used to represent the similarity of rows over columns inside
the bicluster. Heatmap is used to provide a representation of difference
between the bicluster with the rows and columns.

2.1.5 Visualization of Gene Clusters and Gene Expressions


Ashraf S. Hussein (2008) proposed a web based framework [14] for the
analysis and visualization of gene expression and protein structures. Two
types of visualizations are used. They are sequence visualization and

clustering visualization. In Clustering visualization we have hierarchical
clustering where N*N matrix is the input and D is the pairwise distance.
Depth first algorithm is implemented for displaying the results of
hierarchical clustering, in Self organising maps the clusters are represented
like Cartesian graph. In k-algorithm, clusters are represented as group of
spheres with distinct colors.

3. Existing Visualization Tools

3.1.1 Gene 3D Miner

Yonggao Yang, Jim X. Chen and Woosung Kim proposed in [9] a tool in
which clustering and 3D visualization take place using self-organizing
maps, principal component analysis and 3D plotting techniques. Clusters are
visualized in 3D space. Here, the genes are grouped into clusters using the SOM
algorithm, reduced to 3-dimensional data using the PCA
algorithm, and then placed in the 2D and 3D space. Since all the genes
are plotted in the same 3D space, the distance between the gene pairs
represents the dissimilarities between the genes. The screen is divided into
many panes to avoid overcrowding of genes for the comparison. Time serial
values are provided in each pane. The size of the pane is directly
proportional to the number of genes in the cluster. The entire 3D space is
divided into smaller cubes. And each cube represents a cluster. Finally, the
full 3D space will look like a big cube which encloses clusters as smaller
cubes.

Figure 2. Representation of genes as cubes in gene 3D miner in [9]

3.1.2 Bicluster Viewer

Julian Heinrich deals with Bicluster Viewer in [2]. It supports contiguous


representation of selected biclusters. To achieve this Bicluster viewer [2]
allow row and column reordering as well as duplication. Here the
representation of biclusters is done with several unique coloured rectangles
inside the heat map. In Parallel-coordinate plots, each polyline represents
the expression of a gene over all conditions [2]. Genes belonging to a
bicluster are visualized using the same color as the corresponding bicluster
in the heatmap [2]. The axes of the parallel-coordinates plot are arranged in
the same order as the columns in the heatmap representation [2]. In order to
visualize the conditions of genes belonging to a bicluster, it computes the
average vertical position of all lines of a bicluster halfway between adjacent
conditions if at least one of the conditions is part of the bicluster [2]. Then,
the corresponding lines are forced to cross this point, which we call the
centroid [2]. This uses the linking and brushing techniques to link heatmap
and parallel coordinate plots.

3.1.3 Bicoverlapper

Rodrigo Santamaria proposed a tool Bicoverlapper [3] to visualize biclusters


from gene expression matrices. It is used to compare the biclusters, and to
highlight relevant genes and conditions. Biclusters are visualized using the
technique called Force directed graph. Each pair of nodes are held by
two types of forces called spring force and expansive force. A glyph is used
to convey multiple data values, and glyphs on gene and condition nodes
improve our understanding of the instances of overlapping when the
representation becomes complex. Edge cluttering is avoided by wrapping
the bicluster in a rounded shape (transparent hull) built by splines that take
the positions of the outermost nodes in the bicluster as anchor points. A
gene or a condition node can be in more than one bicluster, reflecting
overlapping between biclusters [3], which can usually affect more than one
node.


Figure 4. Representation of Biclusters in Bicoverlapper [3]

3.1.4 Biolayout Express 3D

Athanasios Theocharidis et al. designed a tool [13] for various analysis and
visualization of biological processes. It supports both unweighted and
weighted graphs together with edge annotation of pairwise relationships
[13]. The Fruchterman-Reingold layout algorithm is used for the visualization.
Various colour schemes make the network more informative and the clusters
can be easily visualized. The Markov chain clustering algorithm is designed
specifically for the clustering of simple or weighted graphs [13]. It has a
very high user interaction. The UI functions supported here were zooming,
scaling, rotation, translation, and selection of the graph. Landscape plot is
used as an alternative representation for dendrogram. The subtrees at a given
internal node are modified by reordering it, so that the larger tree is obtained
in the left. And finally a histogram is created based on the joining events
corresponding with the subsequent gene pairs.

Figure 5. Representation of Clusters in Biolayout express 3D in [13]

3.1.5 Bicat

Simon Barkow et al. proposed in [5] a software tool where both clustering and
biclustering techniques are used for the visualization of gene expression
data. Many biclustering and clustering algorithms are used. Higher user
interaction is provided in this tool. The expression matrix is displayed as a
heatmap. Biclusters can be visualized as heatmap and gene expression
profiles. The heatmap is rearranged in such a way that those genes and
conditions that define the bicluster under consideration appear in the upper
left corner of the map. Alternatively, the expression view of a bicluster,
displays the profiles of those genes that are grouped within a bicluster. For
different conditions, different colour lines are represented between the gene
pairs.

Figure 6. Representation of Biclusters in Heatmap in [5]


3.1.6 Bivisu
K.O. Cheng, N.F. Law, W.C. Siu and T.H. Lau propose in [6] a tool called
BiVisu, an open-source tool for detecting and visualizing biclusters. The
biclustering results are visualized in a two-dimensional set-up for
easy analysis. In parallel co-ordinates all the axes in the plane are drawn
parallel to each other. The axes are arranged properly to visualize the
biclusters clearly. The tool allows the navigation of biclusters individually.
The genes inside and/or outside a bicluster can be drawn for comparison
purposes. A heat map is also provided. Detection of biclusters of constant,
constant-row, constant-column, additive and multiplicative types is also
possible. A split-and-merge algorithm is used for bicluster detection.


3.1.7 Jexpress

B. Dysvik and I. Jonassen designed Jexpress [10], a Java tool with
multidimensional scaling, clustering and visualization methods to visualize
gene clusters. The gene graph viewer is interactive: when the internal
nodes are clicked, it shows the expression profiles of the genes in the subtree.
Clusters are visualized in the form of gene graphs as an output. To get a 2D
or 3D representation of clusters, the multidimensional scaling method is
employed. Visualization of clusters is not up to the expected level but is
acceptable, since this is one of the earliest tools in this field and it led to the
development of more powerful visualization techniques.

3.1.8 clusterMaker

John H Morris et al designed [11] clusterMaker, a plugin for the software


cytoscape, which provides a variety of clustering algorithms and
visualization techniques for the analysis of gene expression data. The three
types of visualizations are heatmap, treeview and network view. Heatmaps
are used for visualizing clusters from k-means, k-mediod and AutoSome
clustering algorithms and also all the numeric attributes within the network
are also visualized using heatmap. The clusters obtained from hierarchical
clustering algorithm are visualized as dendrograms. Heat map and Treeview
(dendrogram) visualizations are used from Java Treeview software.
The network view visualization is provided using a force-directed layout. It
represents only the intra-cluster edges; all the inter-cluster edges are
dropped.

3.1.9 EUREKA-DMA

Sagi Abelson in [16] designed a software called EUREKA-DMA. It is a


GUI used for the fast classification and analysis of microarray gene
expression data. It integrates many primary tools and forms a common
interface. It provides a very efficient framework for the visualization and
interpretation of gene expression data. Many visualization techniques are
used for various purposes: volcano plot, heat map, gene ontology
biograph, bar plots and box plots [16]. The volcano plot is used to show the
genes with differential expression [16]. The heatmap is used for the representation of
hierarchical clustering analysis, and the other techniques serve other
purposes such as finding protein (KEGG) pathways [16] and analysing the

difference between the computed results. This heat map representation is
helpful to the users with high degree of knowledge about the microarray
gene expression data.

Table 1. Comparison of Visualization techniques


# | Technique | Cluster/Bicluster | Methods | Merits | Demerits

1 | Multilayered clusters in [1] | Clusters in 3D | Force-directed placement spring model | Cluster visualization is simple, effective, easily adapted and extended | No optimal representation and very high running time

2 | Integration of clustering and visualization in [12] | Clusters in 2D | Scatter plots and parallel coordinates | Clear visual representation of the results | Possibility of missing data when there is a large and complex pattern of gene expression data

3 | Gene cluster and gene expression visualization | Clusters in 2D | DFS tree, Cartesian graphs, spheres | Visually attractive patterns are obtained | Works well only for small datasets

4 | Parallel coordinate plots in [4] | Biclusters in 2D | Parallel coordinates | Biclusters are detected and visualized; user interaction is possible | Contiguous biclusters are not obtained

5 | Heatmap in [8] | Biclusters in 2D | Rearranging rows and columns in the 2D matrix | Visually highlights similarities and differences between the biclusters | Only one occurrence of each bicluster can be highlighted

6 | R package biclust in [7] | Biclusters in 2D | Bubble plot, heatmap, parallel coordinates | Better analysis and understanding of both clusters and biclusters | Simultaneous visualization is not possible


Table 2. Comparison of Visualization Tools


# | Tool | Cluster/Bicluster | Methods | User interaction | Merits | Demerits

1 | Gene 3D miner | Cluster | PCA to reduce the dimension | Low | Overcrowding of genes is avoided in this tool | User interaction is poor

2 | clusterMaker in [11] | Cluster | Java TreeView software for tree view and heatmap; force-directed layout for network view | Low | Increased efficiency | Bicluster visualization is not available

3 | Jexpress in [10] | Cluster | PCA, tree view | Medium | First visualization tool | Very basic visualization

4 | Biolayout Express 3D in [13] | Cluster | Fruchterman-Reingold layout algorithm | High: zooming, scaling, rotation, translation and selection of the graph | High-end visualization technologies; parallelization allows all available cores to be utilized simultaneously, speeding up the operating time | Possibility of producing poor-quality images because of rendering large graphs

5 | BicAT in [5] | Cluster | Heatmap, parallel coordinates | High: selection of the clustering algorithm is possible | GUI level is very high | Selection and comparison of genes/conditions inside the heatmap and expression profiles are not available

6 | BiVisu in [6] | Bicluster | Heatmap and parallel coordinates | High: zooming, selection and comparison features are available | Selection and comparison features are available | Simultaneous navigation is not possible

7 | Bicluster Viewer in [2] | Bicluster | Heatmap and parallel coordinates | Medium | Avoids visual clutter; zooming is possible | Continuous duplication and reordering of rows and columns may lead to the loss of originality

8 | BicOverlapper in [3] | Bicluster in 2D | Force-directed graph for the overlapper, heatmap, parallel coordinates | Very high | Simultaneous visualization of all the biclusters in a single window; high level of user interaction is possible | Heatmap and parallel coordinates are not good enough in the UI

9 | EUREKA-DMA | Cluster in 2D | Heat map | High | Apart from clustering, visualization techniques are available for other purposes | Very basic tool for cluster visualization

4. Conclusion
According to this study, visualization of biclustered gene expression data in
3D provides an efficient way of analyzing the genes and conditions present in
all the biclusters, but it is a challenging process. Heatmaps and parallel
coordinates are efficient and often used for the visualization of gene
clusters, but for gene biclusters there is no standard visualization technique,
because the biclusters present in microarray gene expression data cannot be
visualized simultaneously and can only be analysed one by one. BicOverlapper
provides an efficient way of visualizing gene biclusters. Therefore it is
better to visualize gene biclusters in 3D so that all the biclusters can be
visualized and analyzed in a single plane with a higher level of user interaction.

REFERENCES
[1] Leishi Zhang, Xiaohui Liu and Weiguo Sheng.:3D Visualization of gene clusters,
Computer Vision and Graphics Computational Imaging and Vision Volume 32, pp
349-354,2006.
[2] Julian Heinrich, Robert Seifert, Michael Burch, Daniel Weiskopf: BiCluster Viewer: A
Visualization Tool for Analyzing Gene Expression Data, Advances in Visual
Computing Lecture Notes in Computer Science Volume 6938, pp 641-652, 2011.
[3] Rodrigo Santamaria, Roberto Theron and Luis Quintales: A Visual analytics approach
for understanding biclustering results from microarray data, Bioinformatics 9 (247)
2008.
[4] K.O. Cheng1, N.F. Law, W.C. Siu and A.W.C. Liew, Biclusters Visualization and
Detection Using Parallel Coordinate Plots, Proceedings of the International
Symposium on Computational Models for Life Sciences, American Institute of
Physics, 2007.
[5] Simon Barkow1, Stefan Bleuler1, Amela Prelic, Philip Zimmerman and Eckart Zitzle,
BicAT: a biclustering analysis toolbox, Bioinformatics, Vol. 22, No. 10: 1282-
1283 March 21, 2006.
[6] K.O. Cheng*, N.F. Law, W.C. Siu and T. H, BiVisu: Software Tool for Bicluster
Detection and Visualization, Bioinformatics, Vol. 23, No.17: 2342- 2344 June 22,
2007.
[7] Kaiser Sebastian, Leisch Friedrich A Toolbox for Bicluster Analysis in R. In
Compstat 2008: Proceedings in Computational Statistics, 2008.
[8] Gregory A Grothau, Adeel Mufti1 and TM Murali, Automatic layout and
visualization of biclusters, Algorithms for Molecular Biology September 2006.
[9] Yonggao Yang, Prairie View A & M, Niversity, Jim X. Chen gene expression
clustering and 3d visualization, Computing in science and engineering, volume 5,
issue 5, September 2003.
[10] B. Dysvik and I. Jonnasen, Jexpress: exploring gene expression data using java,
Bioinformatics, volume 17, issue 4, pp. 369-370, 2001.
[11] John H Morris, Leonard Apeltsin, Aaron M Newman, Jan Baumbach, Tobias
Wittkop4, Gang Su, Gary D Bader and Thomas E Ferrin, clusterMaker: a multi-
algorithm clustering plugin for Cytoscape, BMC Bioinformatics Nov 9 2011.
[12] Oliver Ru bel, Gunther H. Weber, Min-Yu Huang, E. Wes Bethel, Mark D. Biggin,
Charless C. Fowlkes, Cris L. Luengo Hendriks, Soile V.E. Kera nen, Michael B.
Eisen, David W. Knowles, Jitendra Malik, Hans Hagen, and Bernd Hamann
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression
Data, IEEE/ACM Trans Comput Biol Bioinform, pp.64-79. 2010
[13] Athanasios Theocharidis, Stjin van Dongen, Anton J Enright & Tom C Freeman
Network visualization and analysis of gene expression data using BioLayout Express
3D, Nature Protocols 4, pp.1535 - 1550, Oct 2009.
[14] Ashraf S. Hussein, Faculty of Computer and Information Sciences, Ain Shams
University, Cairo, 11566, Egypt Analysis and Visualization of Gene Expressions and
Protein Structures, 2008.
[15] O. Rubel1, G.H. Weber3, M.-Y. Huang, E. W. Bethel, S. V. E. Keranen, C .C.
Fowlkes5, C. L. Luengo Hendriks, Angela H. De Pace, L. Simirenko, M. B. Eisen4,
M.D. Biggin, H. Hagen, J. Malik, D. W. Knowles and B. Hamann, PointCloudXplore
2: Visual Exploration of 3D Gene Expression, Journal software, Vol. 3, 2008.
[16] Sagi Abelson, Eureka-DMA: an easy-to-operate graphical user interface for fast
comprehensive investigation and analysis of DNA microarray data, BMC
Binformatics, Feb 24 2014.

[17] Georgios A Pavlopoulos, Anna-Lynn Wegener and Reinhard Schneider, A survey of
visualization tools for biological network analysis, Bio Datamining, 28 November
2008.
[18] Tangirala Venkateswara Prasad* and Syed Ismail Ahson, A survey of Visualization of
microarray gene expression data, Bioinformation, pp. 141-145. May 3 2006.

This paper may be cited as:


Poornima, S. and Emilyn, J. J., 2014. A Comparative Analysis on
Visualization of Microarray Gene Expression Data. International Journal of
Computer Science and Business Informatics, Vol. 12, No. 1, pp. 87-99.


A New Current-Mode Geometric-


Mean Structure for SRD Filters
Ebrahim Farshidi and Saman Kaiedi
Electrical Department, Faculty of engineering,
Shahid Chaman University of technology,
Ahvaz, Iran

ABSTRACT
A current-mode square-root domain filter based on a new geometric-mean circuit is
presented. The circuit is a tunable first-order low pass filter, working in low voltage, and
featuring low nonlinearity. It employs MOSFETs that operate in both strong inverted
saturation and triode regions. Simulation results by HSPICE confirm the validity of the
proposed design technique and show high performance of the proposed circuit.

Keywords
Current-mode, Geometric-mean, squarer/divider, Companding, Filter.

1. INTRODUCTION
The companding process (compressing and expanding), as an attractive
technique in analog circuit design, has drawn the attention of many
researchers. In a filter employing the companding technique, the circuit is
internally nonlinear and the dynamic range (DR) of its signal is different at
various points of signal path, and still, the resulting input-output current
relationship is linear. The main advantage of these filters is their large
dynamic ranges in low voltages, caused by voltage swing reduction at
internal nodes. The first implementation of companding systems employed
the exponential I-V characteristic of bipolar transistors that led to the log-
domain structures [1, 2]. Developments in CMOS circuits, together with the
similarity in I-V characteristics, caused the bipolar transistors to be
substituted by MOS transistors operating in the weak inversion region [3].
However, the effects of limited speed and transistor mismatches restricted
their applications. Afterwards, companding systems employed MOS
transistors in saturation region based on voltage translinear principle and
class-AB linear transconductors that led to Square-Root-Domain (SRD)
structures [4-14]. The main drawback of these topologies is that for correct
operation, all MOS transistors of the circuit must work in saturation region.
If, in some conditions, the transistors are forced to enter in triode region, it
will invalidate the MOS translinear or transconductance principle. This

restricts the input range and affects the linearity. In this work to overcome
the above problems in SRD filter, a new approach in which MOS transistors
operate both in saturation and triode regions is proposed [15]. The
simulation results of the proposed circuit show less nonlinearity and less
total harmonic distortion (THD) compared to those reported before [4]-[8].
This paper is organized as follows. In section 2, the basic principle of
current-mode SRD filter operation is presented. In section 3 a divider-square
root-multiplier (DSM), as the basic building block of the filter, is presented.
This DSM has been realized by using of a new geometric-mean circuit. In
section 4 a current-mode first-order LPF is designed. The simulation results
of the filter show that the circuit has favorable characteristics.

2. BACKGROUND STUDY
In a current-mode first-order low pass filter and in Laplace domain, the
output current I out and input current I in are related as:
$\dfrac{I_{out}(s)}{I_{in}(s)} = \dfrac{A}{1 + \dfrac{s}{\omega_c}}$   (1)
in which A is the DC gain and $\omega_c$ is the cutoff frequency of the filter.

In time domain eqn. (1) can be written as:
$\dfrac{1}{\omega_c}\dfrac{dI_{out}}{dt} + I_{out} = A I_{in}$.   (2)
Also, in the circuit shown in Figure 1, the I-V relation of MOSFET MF in the
saturation region can be expressed as $I_{out} = \dfrac{\beta}{2}(V_{cap}-V_{th})^{2}$, so that:
$\dfrac{dI_{out}}{dt} = \sqrt{2\beta I_{out}}\,\dfrac{dV_{cap}}{dt}$   (3)
in which $V_{cap} = V_{gs}$ is the voltage of capacitor C, $\beta = \mu C_{ox}\dfrac{W}{L}$ is the
transconductance parameter and $V_{th}$ is the threshold voltage of the MOS
transistor.

Figure 1: SRD filter principle


Using (3) in (2), together with the equation $I_{cap} = C\,\dfrac{dV_{cap}}{dt}$, results in:

$I_{cap} = \sqrt{\dfrac{I_{tune2}}{I_{out}}}\,I_{in} - \sqrt{\dfrac{I_{tune1}}{I_{out}}}\,I_{out}$   (4)
in which the tuning currents $I_{tune1}$ and $I_{tune2}$ are given by:
$I_{tune1} = \dfrac{(C\omega_c)^{2}}{2\beta},\qquad I_{tune2} = \dfrac{(AC\omega_c)^{2}}{2\beta}$.   (5)

Eqn. (4) describes the internal nonlinear dynamic operation of the filter;
nevertheless, the $I_{out}$-$I_{in}$ relationship remains linear, as described by (2).
The cutoff frequency and DC gain can be written as:
$\omega_c = \dfrac{\sqrt{2\beta I_{tune1}}}{C},\qquad A = \sqrt{\dfrac{I_{tune2}}{I_{tune1}}}$.   (6)
The divider-square root-multiplier (DSM) operator is defined as:
$I_z = \sqrt{\dfrac{I_{tune}}{I_y}}\,I_x$.   (7)
Therefore the right hand side of eqn. (4) is subtraction of two DSM
operators with different inputs and outputs.
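As a numerical illustration (not the HSPICE simulation reported in Section 4), the nonlinear state equation (4) can be integrated directly with a forward-Euler step. The transconductance value beta below is an assumed figure; C and the tuning currents follow the values used later in the paper:

```python
import numpy as np

beta = 1e-4        # A/V^2, assumed transconductance parameter
C = 1e-9           # F, capacitor value used in the paper's simulations
I_tune1 = 15e-6    # A
I_tune2 = 15e-6    # A -> DC gain A = sqrt(I_tune2/I_tune1) = 1, eqn (6)
w_c = np.sqrt(2 * beta * I_tune1) / C          # cutoff frequency, eqn (6)

def simulate_lpf(i_in, dt):
    """Forward-Euler integration of eqn (4), with Iout = (beta/2)*Vcap^2
    (the threshold voltage cancels out of the dynamics and is set to zero)."""
    v_cap = np.sqrt(2 * i_in[0] / beta)        # start near the DC operating point
    i_out = np.empty_like(i_in)
    for n, iin in enumerate(i_in):
        iout = 0.5 * beta * v_cap ** 2
        i_out[n] = iout
        i_cap = np.sqrt(I_tune2 / iout) * iin - np.sqrt(I_tune1 / iout) * iout
        v_cap += dt * i_cap / C                # I_cap = C dVcap/dt
    return i_out

dt = 1e-7
t = np.arange(0.0, 5e-3, dt)
i_in = 10e-6 + 9.5e-6 * np.sin(2 * np.pi * 2e3 * t)   # 2 kHz input around a 10 uA bias
i_out = simulate_lpf(i_in, dt)
print("expected cutoff frequency:", w_c / (2 * np.pi), "Hz")
```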
Figure. 2 shows the symbol of the DSM operator and Figure. 3 shows the
block diagram of the current-mode first-order LPF which consists of two
DSM, one capacitor, two MOS transistors (MF1, MF2) and two current
mirrors (CM1,CM2).

Figure 2: DSM operator symbol

Figure 3: Block diagram of the LPF filter

3. Circuit Design
3.1 Geometric mean
Figure. 4 shows the proposed geometric-mean circuit that consists of two
input current mirrors (CM1, CM2), two Vth level shifters, two isolating
MOS transistors (M7, M8), and an output current mirror (CM3). $I_x$ and $I_y$ are
the inputs and $I_z$ is the output of the geometric-mean circuit.
In this circuit, transistors M1, M2, M3 and M4 are identical
($\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta$) and also $\beta_5 = \beta_6$. As the figure shows, transistors M1
and M2 of the input current mirrors are always in the saturation region and
their I-V relationships are represented respectively by:
$V_{g1} = V_{th} + \sqrt{\dfrac{2I_x}{\beta}}$   (8)
$V_{g2} = V_{th} + \sqrt{\dfrac{2I_y}{\beta}}$.   (9)
The drain voltages $V_{d3}$ and $V_{d4}$ are determined by:
$V_{d3} = V_{g2} - V_{th}$   (10)
$V_{d4} = V_{g1} - V_{th}$.   (11)
Vd 4 Vg1 Vth . (11)

Figure 4: Geometric-mean circuit

The operation regions of transistors M3 and M4 are explained as follows. It
can be shown that when input current $I_x$ is higher than input current
$I_y$ ($I_x > I_y$), or equivalently $V_{g1} > V_{g2}$, transistor M3 operates in the triode
region and transistor M4 operates in the saturation region. Similarly, when input
current $I_x$ is less than input current $I_y$ ($I_x < I_y$), or equivalently $V_{g1} < V_{g2}$,
M4 operates in the triode region and transistor M3 operates in the saturation
region. So the following two cases exist:

Case 1:
For $I_x > I_y$, transistor M3 operates in the triode region and transistor M4
operates in the saturation region, so we can write:
$I_{d3} = \dfrac{\beta}{2}\bigl[2(V_{g1}-V_{th})V_{d3} - V_{d3}^{2}\bigr]$   (12)
$I_{d4} = \dfrac{\beta}{2}(V_{g2}-V_{th})^{2}$.   (13)
Substituting (10) into (12) results in:
$I_{d3} = \dfrac{\beta}{2}\bigl[2(V_{g1}-V_{th})(V_{g2}-V_{th}) - (V_{g2}-V_{th})^{2}\bigr]$.   (14)
The drain current of transistors M5 and M6 can be written as:
$I_z = I_{d3} + I_{d4}$.   (15)
Substituting (13) and (14) into (15) gives:
$I_z = \beta(V_{g1}-V_{th})(V_{g2}-V_{th})$   (16)
and substituting (8) and (9) into (16) results in:
$I_z = 2\sqrt{I_x I_y}$.   (17)
Case 2:
For $I_x < I_y$, transistor M3 operates in the saturation region and transistor M4
operates in the triode region, so we have:
$I_{d3} = \dfrac{\beta}{2}(V_{g1}-V_{th})^{2}$   (18)
$I_{d4} = \dfrac{\beta}{2}\bigl[2(V_{g2}-V_{th})V_{d4} - V_{d4}^{2}\bigr]$.   (19)
Substituting eqn. (11) into (19) results in:
$I_{d4} = \dfrac{\beta}{2}\bigl[2(V_{g2}-V_{th})(V_{g1}-V_{th}) - (V_{g1}-V_{th})^{2}\bigr]$.   (20)
The drain current of transistors M5 and M6 is expressed as:
$I_z = I_{d3} + I_{d4}$.   (21)
Substituting (18) and (20) into (21) gives:
$I_z = \beta(V_{g1}-V_{th})(V_{g2}-V_{th})$   (22)
and substituting (8) and (9) into (22) results in:
$I_z = 2\sqrt{I_x I_y}$.   (23)
As (17) and (23) show in both cases the circuit acts as a geometric-mean.
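The two-case analysis can be checked numerically; the following short sketch evaluates the drain-current expressions for assumed values of beta and Vth and confirms that both cases reproduce eqn (17)/(23):

```python
import numpy as np

beta, Vth = 2e-4, 0.7        # assumed device parameters for the check

def geometric_mean_core(Ix, Iy):
    """Evaluate the drain currents of M3 and M4 in whichever region applies."""
    Vg1 = Vth + np.sqrt(2 * Ix / beta)                           # eqn (8)
    Vg2 = Vth + np.sqrt(2 * Iy / beta)                           # eqn (9)
    if Ix >= Iy:                              # case 1: M3 triode, M4 saturation
        Vd3 = Vg2 - Vth                                          # eqn (10)
        Id3 = 0.5 * beta * (2 * (Vg1 - Vth) * Vd3 - Vd3 ** 2)    # eqn (12)
        Id4 = 0.5 * beta * (Vg2 - Vth) ** 2                      # eqn (13)
    else:                                     # case 2: M3 saturation, M4 triode
        Vd4 = Vg1 - Vth                                          # eqn (11)
        Id3 = 0.5 * beta * (Vg1 - Vth) ** 2                      # eqn (18)
        Id4 = 0.5 * beta * (2 * (Vg2 - Vth) * Vd4 - Vd4 ** 2)    # eqn (19)
    return Id3 + Id4                                             # eqns (15)/(21)

for Ix, Iy in [(40e-6, 10e-6), (5e-6, 20e-6)]:
    print(geometric_mean_core(Ix, Iy), 2 * np.sqrt(Ix * Iy))     # both values match
```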
The squarer/divider is obtained after few modifications by taking the I y as
the output, I z and I x as the inputs, and removing the connection between
gate and drain of transistor M2 and instead biasing transistor M2 and M4 by
connecting their gates to the drain of transistor M8. This results in:
$I_y = \dfrac{I_z^{2}}{4 I_x}$   (24)

3.2 Vth level shifter


As shown in Figure. 5, the Vth level shifter is realized using MOS
transistors ML1, ML2, ML3 and four equal current mirrors $I_{b1}$, $I_{b2}$, $I_{b3}$ and
$I_{b4}$ ($I_{b1} = I_{b2} = I_{b3} = I_{b4} = I_b$).

Figure 5: Vth level shifter

The voltage Vab in the figure can be written as:


$V_{ab} = \left(V_{th} + \sqrt{\dfrac{I_{b1}}{K_1}}\right) - \left(V_{th} + \sqrt{\dfrac{I_{b2}}{K_2}}\right) + \left(V_{th} - \sqrt{\dfrac{I_{b3}}{K_3}}\right)$.   (25)
Assuming that $K_2 = K_3 = 4K_1$, eqn. (25) can be rewritten as:
$V_{ab} = V_{th}$.   (26)
This circuit is used as the Vth level shifter of Figure 4.
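A one-line numerical check of eqns (25)-(26), using the sign pattern assumed in the reconstruction above and arbitrary values of Vth, K1 and Ib:

```python
import numpy as np

# Check of eqns (25)-(26): with K2 = K3 = 4*K1 and equal bias currents the
# square-root terms cancel and Vab reduces to Vth.
Vth, K1, Ib = 0.7, 1e-4, 10e-6          # arbitrary values for the check
K2 = K3 = 4 * K1
Vab = (Vth + np.sqrt(Ib / K1)) - (Vth + np.sqrt(Ib / K2)) + (Vth - np.sqrt(Ib / K3))
print(Vab, Vth)                          # both print approximately 0.7
```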

3.3 Divider-Square root Multiplier


The operation of the divider-square root-multiplier (DSM) as defined in
eqn. (7) can be rewritten as:
$I_z = \sqrt{\dfrac{I_{tune}}{I_y}}\,I_x \;\Longrightarrow\; I_z = 2\sqrt{\left(\dfrac{I_x^{2}}{4I_y}\right) I_{tune}}$.   (27)
Eqn. (27) shows that the DSM can be implemented by cascading the
previously designed squarer/divider circuit, with input currents $I_x$, $I_y$ and
output current $\dfrac{I_x^{2}}{4I_y}$, with the geometric-mean circuit, with input currents
$I_{tune}$ and $\dfrac{I_x^{2}}{4I_y}$ and output current $I_z$.
Figure. 6 shows the block diagram of the DSM. The advantage of cascading
a squarer/divider and a geometric-mean with inverse functions is its ability
to compensate the non-linearity.
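A behavioural check of the cascade in eqn (27), using ideal current-mode blocks rather than the transistor-level circuit:

```python
import numpy as np

def squarer_divider(Ix, Iy):
    return Ix ** 2 / (4.0 * Iy)            # squarer/divider output used in eqn (27)

def geometric_mean(Ia, Ib):
    return 2.0 * np.sqrt(Ia * Ib)          # eqns (17)/(23)

def dsm(Ix, Iy, Itune):
    """Cascade of eqn (27): squarer/divider followed by the geometric mean."""
    return geometric_mean(squarer_divider(Ix, Iy), Itune)

Ix, Iy, Itune = 12e-6, 30e-6, 15e-6
print(dsm(Ix, Iy, Itune), np.sqrt(Itune / Iy) * Ix)   # both equal the DSM of eqn (7)
```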

Figure 6: DSM building block

4. Simulation Results
Simulation of the current-mode first order low pass filter of Figure 3 is
performed by using HSPICE and with 0.6um AMS CMOS process
parameters. A supply voltage Vdd = 2.2 V and an external capacitor C = 1 nF were employed. The
aspect ratios of the NMOS and PMOS transistors (except for the Vth level
shifter transistors ML1, ML2, ML3) were chosen 61.2um/4.8um and
122.4um/4.8um, respectively. The aspect ratio of transistor ML1 in Vth
level shifter was chosen 12um/2.4um and for ML2 and ML3 it was
24um/4.8u. As mentioned in section 2, the frequency response of the
proposed filter is tunable. Using 15uA for the tune currents I tune1 and I tune2 , a
transient simulation was carried out. Figure 7 shows the time response of the
filter for a sinusoidal input current with 19uA peak-to-peak amplitude, DC
bias current 10uA and frequency of 2 kHz.

As the figure shows, the distortion of the output current is negligible and the input-output current relationship is linear. This is in spite of the fact that the capacitance voltage in the lower half of the signal is strongly distorted, meaning that the system is internally nonlinear. Figure 8 shows the simulation results of the frequency response for different tuning currents, ranging from I_tune1 = I_tune2 = 0.4uA to 40uA (from left to right). The results are in good agreement with those expected from eqn. (6). Figure 9 shows the nonlinear behaviour of the output current, evaluated as Total Harmonic Distortion (THD) using a 4096-point Fast Fourier Transform (FFT). The worst-case THD of the output current is less than -51dB (0.28%) for input amplitudes close to the DC bias current. Compared with the results reported in other works [4]-[8], the proposed circuit shows about 10dB lower THD and a wider tuning-current range.
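For reference, a THD figure of this kind can be obtained from sampled output data roughly as follows (a sketch only; the waveform below is a synthetic 2 kHz tone with a small artificial second harmonic standing in for the HSPICE output samples).

```python
# Sketch: computing THD from a sampled output current with a 4096-point FFT,
# as done for Figure 9. The waveform is synthetic, not simulation output.
import numpy as np

fs, n, f0 = 1.0e6, 4096, 2.0e3                  # sample rate, FFT length, tone
t = np.arange(n) / fs
i_out = 9.5e-6 * np.sin(2 * np.pi * f0 * t) \
        + 20e-9 * np.sin(2 * np.pi * 2 * f0 * t) + 10e-6  # assumed DC bias

spectrum = np.abs(np.fft.rfft(i_out * np.hanning(n)))
k0 = int(round(f0 * n / fs))                    # bin of the fundamental
fund = spectrum[k0 - 2:k0 + 3].max()
harmonics = [spectrum[h * k0 - 2:h * k0 + 3].max() for h in range(2, 6)]
thd = np.sqrt(sum(h ** 2 for h in harmonics)) / fund
print(20 * np.log10(thd))                       # THD in dB
```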

5. CONCLUSIONS
Using MOS transistors operating in both the saturation and triode regions, a new current-mode geometric-mean structure for low-voltage square-root domain filters has been presented. The proposed circuit has low nonlinearity and high dynamic range. Simulation results of a first-order LPF show that the proposed technique is applicable to the design of filters with very low voltage requirements. Since the time-dependent state-space equation of a higher-order linear filter can be reduced to a set of first-order differential equations, the technique can be readily extended to higher-order configurations.

Figure 7: Time response of the LPF; (a) input and output currents, (b) capacitance voltage


Figure 8: Frequency response of the LPF (for I_tune1 from 0.4uA to 40uA, from left to right)

Figure 9: Nonlinear performance of the LPF

6. REFERENCES
[1] D. R. Frey, (1996) Exponential state space filters: A generic current mode design
strategy, IEEE Trans. Circuits Syst. I, Vol. 43, pp. 34-42.
[2] E. M. Drakakis, A. J. Payne, and C. Toumazou, (1999) Log-domain state-space: A
systematic transistor-level approach for log-domain filtering, IEEE. Trans. Circuits Syst.
II, Vol. 46, pp. 290-305.
[3] E. Morsy and J. Wu, (2000) Low voltage micropower log-domain filters, Analog
Integr. Circuits Signal Processing , Vol. 22, pp. 103-114.
[4] C. A. D. L. C. Blas, A. J. Lopez-Martin and A. Carlosena, (2004) 1.5v tunable square-root domain filter, Electron. Lett., Vol. 40, No. 4, pp. 133-147.
[5] A. J. Lopez-Martin and A. Carlosena, (2002) A 1.5v CMOS companding filter,
Electron. Lett. , Vol. 38, No. 22, pp. 1299-1300.
[6] J. Mulder, A. C. V. D. Woerd, W. A. Serdijn and A. H. M. V. Roermund, (1996)
Current-mode companding x-domain integrator, Electron. Lett. , Vol. 32, No.3, pp.198-
199.
[7] R. G. Carvajal, J. Ramirez-Angulo, A. J. Lopez-Martin, A. Torralba, A. G. Galan, A. Carlosena and F. M. Chavero, (2005) The flipped voltage follower: A useful cell for low-voltage low-power circuit design, IEEE Trans. Circuits Syst. I: Regular Papers, Vol. 52, No. 7, pp. 1276-1291.
[8] C. A. D. L. C. Blas, A. J. Lopez-Martin and A. Carlosena, (2003) 1.5v MOS
translinear loops with improved dynamic range and their application to current-mode signal
processing, IEEE. Trans. Circuits Syst. II, Analog Digit. Signal process, Vol. 50, No. 12,
pp. 918-927.
[9] C. Psychalinos and S. Vlassis, (2002) A systematic design procedure for square-root domain circuits based on the signal flow graph approach, IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., Vol. 49, No. 12, pp. 1702-1712.
[10] M. Eskiyerli and Payne, (2000) Square root domain filter design and performance,
Analog Integr. Circuits Signal Processing, Vol. 22, pp. 231-243.
[11] E. Seevinck, (1990) Companding current-mode integrator: A new circuit principle for continuous-time monolithic filters, Electron. Lett., Vol. 26, pp. 2046-2047.
[12] C. A. D. L. C. Blas, A. J. Lopez-Martin and A. Carlosena, (2004) Low voltage
CMOS nonlinear transconductors and their application to companding current-mode
filters, Analog Integr. Circuits Signal Processing, Vol. 38, pp. 137-147.
[13] J. Mulder, W. A. Serdijn, A. C. V. D. Woerd, and A. H. M. Roermund, (2000) Dynamic translinear circuits - an overview, Analog Integr. Circuits Signal Processing, Vol. 22, pp. 111-126.
[14] C. Psychalinos and S. Vlassis, (2001) A high performance square-root domain integrator, Analog Integr. Circuits Signal Processing, Vol. 28, No. 3, pp. 97-101.
[15] E. Farshidi and S. M. Sayedi, (2007) A Square-Root Domain Filter Based on a New
Geometric-mean Circuit, 15th Iranian Conference on Electrical Engineering ICEE2007,
pp. 6-11.

This paper may be cited as:


Farshidi, E. and Kaiedi, S., 2014. A New Current-Mode Geometric-Mean
Structure for SRD Filters. International Journal of Computer Science and
Business Informatics, Vol. 12, No. 1, pp. 100-109.
