IJCSBI.ORG
Hasan Naderi
Assistant Professor
Department of CSE,
Iran University of Science and Technology,
Tehran, Iran
ABSTRACT
Nowadays, the World Wide Web is the foremost environment for developing, distributing, and acquiring knowledge. The most important tools for accessing this vast ocean of information are search engines, and ranking is one of their main components. Because of the shortcomings of purely text-based and link-based approaches, methods based on users' behavior on the web have received attention. User behavior contains valuable information that can be used to improve the quality of web ranking results. This paper proposes a model that, for each query, collects users' positive and negative feedback on the displayed result list, including how many times users accessed a site, the time spent on a site, the number of successful downloads from a site, and the number of positive and negative clicks on a site. It then computes a ranking score for each page with a Multiple Attribute Decision Making method, and finally produces a new ranking of the sites that is updated regularly as further user feedback arrives.
Keywords
Users' feedback, Multiple Attribute Decision Making, user behavior, search engine.
1. INTRODUCTION
Because of the vastness of the web and its continuous growth, there is a
persistent need for methods that rank web pages according to their
importance and their relevance to a topic. Ranking is the main component
of an information retrieval system, and in search engines, which are a
prominent class of such systems, the particular characteristics of users
make the role of ranking even more visible. A search engine commonly
returns myriads of pages for a query, while the user does not have
enough time to inspect all of the results to find the desired pages.
Most web users do not look beyond the first page of results. It is
therefore essential for a search engine to place the most relevant
results at the top of the list; otherwise it cannot be considered
effective. The role of a ranking algorithm is thus to identify the more
authoritative pages among the numerous pages on the web and assign them
higher ranks.
The rest of this paper is organized as follows: Section 2 reviews the
related literature. Section 3 describes the TOPSIS algorithm and its
characteristics. The proposed method is presented in Section 4, and the
simulation is described in Section 5. Finally, Section 6 concludes and
outlines future work.
2. RELATED WORKS
Ranking is one of the main parts of a search engine: a process in which
the quality of a page is estimated. Since thousands of pages may be
relevant to any query, it is imperative to prioritize them and present
only the first 10 or 20 results to the user. Ranking methods can
generally be divided into five classes. The first class is text-based
ranking, whose most important models are the vector space model and the
probabilistic model. In the vector space model, both the document and
the query are vectors with as many dimensions as there are words. Each
vector is turned into a weight vector, and the cosine of the angle
between the two weighted vectors is computed as their degree of
similarity. The most common weighting scheme is TF-IDF, introduced by
Salton [1]. The other text-based approach is the probabilistic model,
in which the purpose of the retrieval system is to estimate the
probability that each document is relevant to the user's query; unlike
the vector space model, it does not directly compute a degree of
similarity between query and document [2].
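As an aside, the vector space model described above can be illustrated with a small Python sketch (our own toy example, not from the paper): it builds TF-IDF weight vectors for a tiny corpus and compares them by the cosine of the angle between them.

```python
import math

def tf_idf_vectors(docs):
    """Build TF-IDF weight vectors for a small corpus.

    TF is the raw term count in a document; IDF is log(N / df(t)),
    in the spirit of the classic Salton-Buckley weighting."""
    n = len(docs)
    vocab = sorted({t for d in docs for t in d.split()})
    df = {t: sum(t in d.split() for d in docs) for t in vocab}
    vectors = []
    for d in docs:
        counts = {}
        for t in d.split():
            counts[t] = counts.get(t, 0) + 1
        vectors.append([counts.get(t, 0) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Documents sharing weighted terms with the query score a cosine close to 1, while documents with disjoint vocabulary score 0.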
The second class is link-based (connection-based) ranking. Unlike
traditional information retrieval collections, the web has a highly
heterogeneous structure in which documents are linked together, forming
a huge graph. Web links carry valuable information, so new ranking
algorithms have been built on them. Broadly, link-based algorithms fall
into two classes: query-independent and query-dependent models [3]. In
query-independent models such as PageRank, ranking is computed offline
over the whole web graph, so each page receives a fixed score regardless
of the query. In query-dependent (topic-sensitive) models such as HITS,
ranking is performed on the graph of the pages related to the user's
query.
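A minimal, illustrative sketch of query-independent link-based ranking in the PageRank style (a toy power iteration written for this survey, not the implementation of any cited system) might look like:

```python
def pagerank(links, damping=0.85, iters=50):
    """Query-independent ranking by power iteration over a link graph.

    `links[i]` lists the pages that page i links to; a dangling page
    (no out-links) spreads its rank uniformly over all pages."""
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i, outs in enumerate(links):
            if outs:
                share = damping * rank[i] / len(outs)
                for j in outs:
                    new[j] += share
            else:  # dangling node: distribute its rank uniformly
                for j in range(n):
                    new[j] += damping * rank[i] / n
        rank = new
    return rank
```

Because the scores depend only on the graph, they can be computed offline once and reused for every query, exactly the trade-off described above.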
The third class is the combined approach, which uses both links and
content for ranking [4]. The fourth class is learning-based ranking,
which has drawn much attention in recent years. Learning-to-rank methods
are divided into three main families: pointwise, pairwise, and listwise.
In the pointwise approach, a number is assigned to each document-query
pair representing the degree of relevance between them [5]. In the
pairwise approach, given pairs of objects (their features and their
relative order), the goal is to assign each object a rank close to its
true rank, so that pairs are eventually classified as correctly ranked
or incorrectly ranked; most existing learning-based ranking methods are
of this type. Listwise methods use lists of ordered objects as training
data to predict the order of objects. The fifth class is based on user
behavior. Given the shortcomings of text-based and link-based methods,
approaches based on users' behavior and judgment have been studied
extensively as a way of letting users themselves determine the
best-fitted pages, thereby improving the web in both quality and
quantity [6]. There are two ways to collect such data from users:
explicit (direct) feedback and implicit feedback [7].
In explicit feedback methods the user is asked to judge the proposed
results, which is burdensome. In implicit methods, users' behavior
during the search process, as registered in the search engine's logs, is
exploited, so the data can be collected at minimal cost. This behavior
includes the query text, how the user clicks on the ordered result list
[8], the content of the clicked pages, the dwell time on each page [9],
and other information about events registered during the search. These
logged events contain invaluable information that can be used to
analyze, assess, and model user behavior in order to improve the quality
of the results.
3. TOPSIS ALGORITHM
TOPSIS is a compensatory method among the Multiple Attribute Decision
Making (MADM) models; "compensatory" means that trade-offs between the
criteria are allowed, i.e. a weakness in one criterion may be
compensated by a good score in another. In this method, M options are
assessed against N criteria. In addition to the distance of option Ai
from the positive ideal point, its distance from the negative ideal
point is considered too: the selected option must have the shortest
distance from the ideal solution and, at the same time, the greatest
distance from the negative ideal solution [10].
IJCSBI.ORG
4. METHODOLOGY
In this section we present a model that collects five kinds of user
feedback (positive and negative) on the list of search results for each
query, possibly contributed by a large number of users, computes a
ranking score for each document using the TOPSIS method, and finally
ranks the documents. At specified intervals these rankings are updated
using newly collected feedback. The five kinds of feedback, regarded as
five criteria (the first four are positive characteristics and the last
one is negative), are used to assess web pages as follows:

- Open click: the number of times a site is opened (clicked) for a given query.
- Dwell time: the time, measured in hours, that users spend on a site for a given query.
- Download: the number of downloads that occur on a page for a given query.
- Plus click: positive clicks indicating user satisfaction with the selected page, such as left- or right-clicking links within the page.
- Negative click: clicks indicating user dissatisfaction with the selected document, such as clicking close.
The proposed method proceeds in the following steps.

First step: The decision-making matrix D is formulated, where A1, A2,
..., Am stand for the m sites to be ranked against the criteria; XOC,
XDT, XD, XPC, XNC denote the criteria for assessing the suitability of
each site; and the components ri,j hold the value of the jth criterion
for the ith site. The matrix is then normalized using the vector-norm
scale-up method (see the Appendix), yielding the normalized matrix ND.

Second step: The relative significance of the criteria is computed with
the entropy method and balanced with preset weights. The weights
assigned are: number of open clicks 0.2; dwell time, which can be more
significant than the other criteria, 0.3; number of downloads 0.1;
number of positive clicks 0.2; and number of negative clicks 0.2. These
form the vector W:

W = {WOC, WDT, WD, WPC, WNC} = {0.2, 0.3, 0.1, 0.2, 0.2}

Now, taking vector W as an input to the algorithm, the weighted matrix
is formed according to relation (1):
V = ND · W5×5    (1)

where V is the weighted normalized matrix, ND is the matrix whose
criterion scores have been scale-up normalized, and W5×5 is a diagonal
matrix in which only the elements of its main diagonal (the weights) are
non-zero.
Third step: Determination of the ideal solution according to relation
(2), and of the negative ideal solution according to relation (3). The
positive ideal option (A+) is defined as the best site from the users'
viewpoint, and the negative ideal option (A−) as the worst:

A+ = { (max_i v_i,j | j ∈ J), (min_i v_i,j | j ∈ J′) } = { v1+, v2+, v3+, v4+, v5+ }    (2)

A− = { (min_i v_i,j | j ∈ J), (max_i v_i,j | j ∈ J′) } = { v1−, v2−, v3−, v4−, v5− }    (3)

where J = {1, 2, 3, 4} indexes the positive (benefit) criteria, J′ = {5}
is the negative-click (cost) criterion, and i = 1, ..., m ranges over
the sites.
Fourth step: According to relation (4), the distance of the ith site
from the positive ideal site is:

d_i+ = ( Σ j=1..5 (v_i,j − v_j+)² )^0.5 ;  i = 1, 2, ..., m    (4)

and according to relation (5), its distance from the negative ideal site
is:

d_i− = ( Σ j=1..5 (v_i,j − v_j−)² )^0.5 ;  i = 1, 2, ..., m    (5)
Fifth step: Calculation of the relative closeness of each site Ai to the
ideal site, defined by relation (6):

cl_i+ = d_i− / (d_i+ + d_i−) ;  0 ≤ cl_i+ ≤ 1 ;  i = 1, 2, ..., m    (6)

If Ai = A+, then d_i+ = 0 and cl_i+ = 1; and if Ai = A−, then d_i− = 0
and cl_i+ = 0. Hence, the closer the option Ai is to the ideal solution
A+, the higher the value of cl_i+.

Sixth step: The sites are ranked by sorting in descending order of
cl_i+.
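The six steps above can be sketched in Python as follows. This is an illustrative toy implementation rather than the authors' code; the function and variable names are our own, and for simplicity every column is treated as a benefit criterion, so a cost criterion such as negative clicks is assumed to be entered negated.

```python
import math

def topsis(matrix, weights):
    """Score m options on n criteria with TOPSIS (steps 1-6).

    `matrix[i][j]` is criterion j's raw score for site i; `weights`
    should sum to 1.  All criteria are treated as benefit criteria, so
    a cost criterion (e.g. negative clicks) must be entered negated."""
    m, n = len(matrix), len(matrix[0])
    # Step 1: vector-norm normalization of each criterion column.
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(n)]
    nd = [[matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    # Step 2: weight the normalized scores (V = ND * diag(W)).
    v = [[nd[i][j] * weights[j] for j in range(n)] for i in range(m)]
    # Step 3: positive and negative ideal solutions.
    a_pos = [max(v[i][j] for i in range(m)) for j in range(n)]
    a_neg = [min(v[i][j] for i in range(m)) for j in range(n)]
    # Steps 4-5: Euclidean distances and relative closeness.
    scores = []
    for i in range(m):
        d_pos = math.sqrt(sum((v[i][j] - a_pos[j]) ** 2 for j in range(n)))
        d_neg = math.sqrt(sum((v[i][j] - a_neg[j]) ** 2 for j in range(n)))
        scores.append(d_neg / (d_pos + d_neg))
    # Step 6: sorting sites by descending score gives the ranking.
    return scores
```

A site that attains the best value on every criterion coincides with the positive ideal and scores 1; one that attains the worst value everywhere scores 0.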
The preferred option is the one closest to the ideal solution. Briefly,
the ideal solution is formed from the maximum value of each criterion,
while the negative ideal solution is formed from the minimum value of
each criterion. Advantages of this method include:

- A considerable number of criteria can be taken into account.
- The method is simple and fast to apply, and thanks to the reduced
volume of calculations in the assessment, it can handle a great number
of options.
6. CONCLUSION
Search engines typically provide results regardless of users' interests
and history, so users often encounter results that are not interesting
to them. Moreover, most search engines rely on algorithms, such as
PageRank, that look only at the number of incoming and outgoing links of
a website. User behavior patterns are therefore of utmost significance
in ranking websites. In this work we presented a method for ranking web
documents that simultaneously uses five kinds of positive and negative
user feedback, modeled with TOPSIS, one of the Multiple Attribute
Decision Making methods. The method appears well suited to prioritizing
pages because it simultaneously considers the distances from the
positive and the negative ideal options, and it ultimately imposes a
ranking on the documents. Another contribution of the model is that it
exploits many kinds of user feedback for ranking at once; among these,
dwell time deserves particular attention, as it is one of the best-known
implicit feedback signals: researchers believe that the more time a user
spends reading a document, the more important the document is to him or
her.
REFERENCES
[1] Salton, G. Buckley, C. 1988. Term-weighting approaches in automatic text retrieval.
Inf. Process. Manage. 24, 5 (August 1988), pp. 513-523.
[2] Robertson, S. E. Walker, S. 1994. Some simple effective approximations to the 2-
Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th annual
international ACM SIGIR conference on Research and development in information
retrieval (SIGIR '94), W. Bruce Croft and C. J. van Rijsbergen (Eds.). Springer-Verlag New
York, Inc., New York, NY, USA, pp. 232-241.
[3] Jain, R. and Purohit, G. N. 2011. Page ranking algorithms for web mining. International Journal of Computer Applications (0975-8887), Vol. 13, No. 5, January.
[4] Shakery, A. and Zhai, C. 2003. Relevance propagation for topic distillation: UIUC TREC 2003 web track experiments. In Proceedings of the TREC Conference.
[5] Yeh, J. Y., Lin, J. Y., Ke, H. R., and Yang, W. P. 2007. Learning to rank for information retrieval using genetic programming. Presented at the SIGIR 2007 Workshop, Amsterdam.
[6] Zhao, D., Zhang, M., and Zhang, D. 2012. A search ranking algorithm based on user preferences. Journal of Computational Information Systems, pp. 8969-8976.
[7] Attenberg, J. Pandey, S. Suel, T. 2009. Modeling and predicting user behavior in
sponsored search. In Proceedings of the 15th ACM SIGKDD international conference on
Knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, pp.1067-
1076.
[8] Dupret, G. Liao, C. 2010. A model to estimate intrinsic document relevance from the
clickthrough logs of a web search engine. In Proceedings of the third ACM international
conference on Web search and data mining (WSDM '10). ACM, New York, NY, USA,
pp.181-190.
[9] Liu, C. White, R.W. Dumais, S. 2010. Understanding web browsing behaviors through
Weibull analysis of dwell time. In Proceedings of the 33rd international ACM SIGIR
conference on Research and development in information retrieval (SIGIR '10). ACM, New
York, NY, USA, pp.379-386.
[10] Yurdakul M. 2008. Development of a performance measurement model for
manufacturing companies using the AHP and TOPSIS approaches, International Journal of
production research, pp. 4609-4641.
APPENDIX
Implementing scale-up using norms: In order to compare criteria measured
on different scales, we must use a scale-up (normalization) method,
which makes the transformed elements ni,j dimensionless. Among the
various scale-up methods (norm-based, linear, fuzzy), we use scale-up by
norm here: each element ri,j of the assumed decision-making matrix is
divided by the Euclidean norm of its column j (for criterion xj). That
is,

n_i,j = r_i,j / ( Σ i=1..m r²_i,j )^(1/2)
In this way, all columns of the matrix have the same unit length, and
comparing them overall becomes simple.
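The column-wise norm scale-up can be sketched in a few lines of Python (an illustrative helper of our own, not from the paper):

```python
import math

def normalize_by_norm(matrix):
    """Divide each column of a decision matrix by its Euclidean norm,
    so every criterion column ends up with unit length."""
    m, n = len(matrix), len(matrix[0])
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(n)]
    return [[matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
```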
Weighting criteria using Entropy Method:
In most MADM problems we need to know the relative importance of the
criteria, normalized so that the weights sum to one; this relative
importance estimates the degree of preference of each criterion over the
others for decision making, and we use the entropy method for this
purpose. In information theory, entropy expresses the uncertainty of a
discrete probability distribution Pi. Given the decision-making matrix
with m options and n criteria, the information content of the matrix is
computed as

P_i,j = r_i,j / Σ i=1..m r_i,j    (2)

The entropy of criterion j is E_j = −k Σ i=1..m P_i,j ln P_i,j with
k = 1/ln(m), its degree of diversification is d_j = 1 − E_j, and the
weights are obtained by normalizing:

w_j = d_j / Σ j=1..n d_j    (6)
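The entropy weighting just described can be sketched as follows (an illustrative Python helper, our own names, assuming all r_i,j are positive):

```python
import math

def entropy_weights(matrix):
    """Entropy weights for an m x n decision matrix with r_ij > 0.

    E_j = -k * sum_i P_ij ln P_ij with k = 1/ln(m); the diversification
    degree d_j = 1 - E_j is normalized so the weights sum to one."""
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)
    diversification = []
    for j in range(n):
        col = [matrix[i][j] for i in range(m)]
        total = sum(col)
        p = [x / total for x in col]
        e_j = -k * sum(x * math.log(x) for x in p if x > 0)
        diversification.append(1.0 - e_j)
    s = sum(diversification)
    return [d / s for d in diversification]
```

A criterion whose values are nearly uniform across the options carries little information (entropy close to 1) and receives a weight close to zero.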
M. Parameswari
PG Scholar, Department of Information Technology
Sona College of Technology,
Salem, India.
ABSTRACT
In recent engineering practice, digital images are gaining popularity
owing to growing requirements in many fields such as satellite imaging,
medical imaging, astronomical imaging, and the restoration of
poor-quality family portraits; in such fields, image quality matters.
Among the many ways image quality can be improved, image restoration is
one of the emerging methodologies. Image restoration deals with methods
used to recover an original scene from degraded observations; its
primary goal is to recover the original image from a degraded or blurred
one. The aim of this survey is to present different restoration
methodologies that provide state-of-the-art results. The surveyed
literature draws on filter concepts, iterative methods, and sparse
representations. The filter-based restoration methods are evaluated with
the help of the signal-to-noise ratio (SNR) performance metric. These
ideas can serve as a good reference for research in image restoration.
Keywords: Image Denoising, Image Deblurring, Sparse Representation, Restoration.
1. INTRODUCTION
Image restoration aims to recover a high-resolution image from a
low-resolution one. Blurring is a process of bandwidth reduction of an
ideal image that results in imperfect image formation. It happens due to
relative motion between the camera and the original scene, or due to
atmospheric turbulence and relative motion between camera and ground.
Image restoration is concerned with the estimation or reconstruction of
the uncorrupted image from a blurred or noisy one.
In addition to these blurring effects, noise also corrupts any recorded
image. Image restoration can be modeled as shown in equation (1),

y = Hx + v    (1)

where x ∈ R^N is the unknown high-quality original image, H ∈ R^(M×N)
is the degradation matrix, v ∈ R^M is additive noise, and y is the
observed measurement. When H is specified by a blur kernel, image
reconstruction becomes an image deblurring problem.

The solution to the deblurring problem can be obtained by solving the
optimization problem shown in equation (2),

x̂ = arg min_x ‖y − Hx‖²₂ + λ·J(x)    (2)

where J(x) is a regularization term and λ balances data fidelity against
regularization.
Over the past decades, different methods and filters have been used for
image restoration, but many of them have not proven able to restore
images corrupted by additive white and Gaussian noise. Sparse
representations approximate an input vector by a sparse linear
combination of atoms from an overcomplete dictionary. Sparsity-based
methods have been verified to perform well in terms of both the mean
square error (MSE) measure and the peak signal-to-noise ratio (PSNR),
and sparse models are used in various image processing tasks such as
image denoising, image deblurring, and super-resolution.
In the spatial domain, the degradation can equivalently be written as
equation (3),

g(n₁, n₂) = h(n₁, n₂) * f(n₁, n₂) + η(n₁, n₂)    (3)

where f is the ideal image, h is the blur point spread function, *
denotes convolution, and η is the additive noise.
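The degradation model y = Hx + v can be simulated on a 1-D signal by convolving with a blur kernel and adding Gaussian noise; the following toy Python sketch (our own illustration, not from the paper) shows the idea with zero-padded borders:

```python
import random

def degrade(signal, kernel, sigma):
    """Simulate y = H x + v on a 1-D signal: convolve with a blur
    kernel (odd length, centred, zero-padded at the borders) and add
    zero-mean Gaussian noise of standard deviation sigma."""
    half = len(kernel) // 2
    blurred = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        blurred.append(acc)
    return [b + random.gauss(0.0, sigma) for b in blurred]
```

With sigma = 0 the output is the pure blur Hx, which is convenient for checking deconvolution code against a known ground truth.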
Two blur models were used: linear motion blur and uniform out-of-focus
blur. In linear motion blur, relative motion between the recording
device and the scene produces several distinguishable forms of motion
blur. In uniform out-of-focus blur, when the camera projects the 3-D
scene onto a 2-D image, some parts are out of focus; this defocus can be
described by a spatially continuous point spread function. Yusuf Abu
Sa'dah et al. [14] discussed image enhancement, observing that low-pass
filters blur images, which results in noise reduction, whereas high-pass
filters are used to sharpen them. Butterworth and Gaussian filters can
be used for sharpening, with the high-pass behavior residing in the
shape of the filter curve; therefore any of these high-pass filters can
be used to sharpen the images in a restoration algorithm.
Jan Biemond et al. [1] discuss iterative restoration algorithms for
removing linear blur from images degraded by pointwise nonlinearities
such as additive noise and film saturation. Regularization is proposed
to prevent the excessive noise magnification associated with
ill-conditioned inverse problems such as deblurring. There are various
basic solutions: the inverse filter, least squares solutions, the Wiener
solution, the constrained least squares solution, and the Kalman filter
solution. The inverse filter is a linear filter whose point spread
function is the inverse of the blurring function; it requires only the
blur point spread function. Least squares filters are used to overcome
noise sensitivity, and the Wiener filter is a linear partial inverse
filter that minimizes the mean-squared error for a chosen point spread
function. The power spectrum, a measure of the average signal power per
spatial frequency carried by the image, is estimated for the ideal
image. The constrained least squares filter overcomes some of the
difficulties of the inverse and Wiener filters and also estimates the
power spectrum. Regularization methods are associated with the names of
Tikhonov and Miller, and both non-iterative and iterative restorations
based on Tikhonov-Miller regularization are analysed using eigenvector
expansions.
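The flavor of such basic iterative restoration schemes can be conveyed by a simple Landweber-style iteration, x_{k+1} = x_k + β Hᵀ(y − H x_k), shown below as a toy 1-D Python sketch of our own (not the authors' implementation):

```python
def landweber(y, conv, conv_t, beta=1.0, iters=200):
    """Basic iterative restoration x_{k+1} = x_k + beta * H^T (y - H x_k).

    `conv` applies the blur H and `conv_t` its adjoint H^T; the
    iterates approach the least-squares solution when beta is below
    2 / (largest eigenvalue of H^T H)."""
    x = [0.0] * len(y)
    for _ in range(iters):
        residual = [a - b for a, b in zip(y, conv(x))]
        correction = conv_t(residual)
        x = [a + beta * b for a, b in zip(x, correction)]
    return x
```

Stopping the iteration early acts as a crude regularizer, which is precisely why regularized variants such as Tikhonov-Miller are studied for ill-conditioned blurs.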
Michael Elad and Michal Aharon [5] address the image denoising problem
in which zero-mean white homogeneous Gaussian additive noise is to be
removed from a given image. Dictionaries describing the image content
are obtained via sparse and redundant representations over dictionaries
trained with the K-SVD algorithm; training is done either on the
corrupted image itself or on a database of high-quality images. Whereas
K-SVD had so far been used on small image patches, they extend it to
handle larger ones. Earlier work considered the sparsity of unitary
wavelet coefficients, leading to shrinkage algorithms; since
one-dimensional wavelets are inappropriate for images, several new
multiscale and directional redundant transforms were introduced,
including curvelets, contourlets, wedgelets, bandlets, and steerable
wavelets. Matching pursuit and basis pursuit denoising make it possible
to address image denoising as a direct sparse decomposition over
redundant dictionaries. In the Sparseland model, a Bayesian
reconstruction framework is employed to extend local treatment of local
patches to the global image. K-SVD cannot be deployed directly on larger
blocks, even though it provides good denoising results.
[Figure: Two-step block diagram. Step 1: the noisy image is processed
block-wise by grouping via block matching, 3-D transform, hard
thresholding, inverse 3-D transform, and aggregation, giving a basic
estimate of the true image. Step 2: grouping by block matching on the
basic estimate, 3-D transform, Wiener filtering, inverse 3-D transform,
and aggregation give the final estimate.]
The approach can be adapted to various noise models, such as additive
colored noise and non-Gaussian noise, by modifying the calculation of
the coefficient variances in the basic and Wiener parts of the
algorithm. The method can also be modified for denoising 1-D signals and
video, for image restoration, and for other problems that benefit from
highly sparse signal representations.
Julien Mairal, Michael Elad, and Guillermo Sapiro [8] extended the K-SVD
algorithm, which uses prior knowledge for grayscale image processing, to
color image restoration. Techniques used in color image restoration
include Markov Random Fields (MRF) and Principal Component Analysis
(PCA). They use an iterative method incorporating the K-SVD algorithm to
handle non-homogeneous noise and missing information. Extending the
denoising algorithm to properly handle non-homogeneous noise yields
better results through the correlation between the RGB channels, which
the K-SVD algorithm is adapted to capture. The algorithm uses orthogonal
matching pursuit (OMP) or basis pursuit (BP) as part of its iterative
dictionary-learning procedure: at each iteration, the atom that
maximizes its inner product with the residual (minimizing the error
metric) is selected from the dictionary, and the residual is updated by
an orthogonal projection. For denoising, the color image is represented
as a column vector with white Gaussian noise added to each channel, and
color spaces are often used to handle the chroma and luma layers
differently. The proposed method gives better results in color image
denoising, demosaicing, and inpainting.
David S. C. Biggs [3] proposed a new method, termed automatic
acceleration, for speeding up the convergence of iterative restoration
algorithms. It means faster processing and allows iterative techniques
to be used in applications where they would otherwise seem too slow.

Four different iterative methods are used with the acceleration
algorithm. Richardson-Lucy (R-L) is an iterative technique used for
restoring astronomical imagery in the presence of Poisson noise.
Maximum entropy (ME) deconvolution is a means of deconvolving the truth
from an image and a point spread function (PSF): even in a perfectly
focused, noiseless image there is still a warping caused by the PSF,
which results from atmospheric effects, the instrument optics, and
anything else that lies between the scene being captured and the CCD
array.
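The Richardson-Lucy iteration mentioned above has a compact multiplicative form, x ← x ⊙ Hᵀ(y / Hx); the following 1-D Python sketch is our own illustration (not the paper's code), with the PSF passed in as a pair of convolution callbacks:

```python
def richardson_lucy(y, conv, conv_t, iters=100):
    """Richardson-Lucy deconvolution for nonnegative data under a
    Poisson noise model: x <- x * H^T(y / (H x)), starting from a flat
    estimate.  `conv` applies the PSF H, `conv_t` its adjoint H^T."""
    eps = 1e-12  # guards against division by zero
    x = [1.0] * len(y)
    for _ in range(iters):
        hx = conv(x)
        ratio = [a / (b + eps) for a, b in zip(y, hx)]
        correction = conv_t(ratio)
        x = [a * b for a, b in zip(x, correction)]
    return x
```

With a normalized, symmetric PSF applied circularly, each iteration preserves the total flux of the estimate, one of the properties that makes R-L attractive for photon-limited astronomical data.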
Reconstruction is then a simple averaging between the patch
approximations and the noisy image.
A dictionary can be obtained by selecting one from a prespecified set of
linear transforms or by adapting the dictionary to a set of training
signals. Aharon et al. [4] proposed K-SVD, an algorithm for adapting
dictionaries to achieve sparse representations. K-SVD is an iterative
algorithm that alternates between sparse coding based on the current
dictionary and updating the dictionary atoms to better fit the data; it
can be accelerated by combining the dictionary update with the update of
the sparse representations. In sparse signal representation, the
Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP) algorithms
are adopted to select the dictionary atoms sequentially. Dictionary
design builds on k-means clustering: in clustering, a set of descriptive
vectors is learned and each sample is represented by one of those
vectors. The related vector quantization (VQ) coding method, gain-shape
VQ, allows the coding coefficient to vary. Each k-means iteration
involves two steps:

1. Given {dk}, assign the training examples to their nearest neighbors.
2. Given the assignment, update {dk}.

Step one finds the coefficients given the dictionary, which is called
sparse coding; then the dictionary is updated assuming known and fixed
coefficients. From k-means the authors derive K-SVD, an effective sparse
coding and Gauss-Seidel-like accelerated dictionary update method. The
algorithm finds the best coefficient matrix using a pursuit method, and
the calculation of the coefficients supplies the solution. It attains
better results in fewer iterations than other methods and is used in
various applications such as filling in missing pixels and compression.
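The sequential atom selection performed by MP (which OMP refines with a least-squares re-fit of all chosen atoms) can be sketched as follows; this toy Python implementation is our own and assumes the dictionary atoms are unit-norm:

```python
def matching_pursuit(atoms, y, k):
    """Matching Pursuit sparse coding over a dictionary of unit-norm
    atoms: at each of k steps, pick the atom whose inner product with
    the residual is largest in magnitude, accumulate its coefficient,
    and subtract its contribution from the residual."""
    residual = list(y)
    coefficients = {}
    for _ in range(k):
        best, best_dot = None, 0.0
        for idx, atom in enumerate(atoms):
            d = sum(a * r for a, r in zip(atom, residual))
            if abs(d) > abs(best_dot):
                best, best_dot = idx, d
        if best is None:  # residual is orthogonal to every atom
            break
        coefficients[best] = coefficients.get(best, 0.0) + best_dot
        residual = [r - best_dot * a for r, a in zip(residual, atoms[best])]
    return coefficients
```

For an orthonormal dictionary the greedy choices are exact; for a general overcomplete dictionary MP may revisit atoms, which is what motivates OMP's orthogonal projection step.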
The sparse coding objective can be written as equation (5),

x̂ = arg min_x ‖y − Dx‖²₂ + λ‖x‖₁    (5)

where D is the dictionary and λ controls the sparsity of the solution.
The original image is first blurred by a Gaussian blur kernel with
standard deviation 1.6, and Gaussian white noise of standard deviation 2
is added to obtain a noisy and blurred image. Each patch is coded
individually, and nonlocal patches similar to the given patch are
clustered using a PCA dictionary. Iteratively, the PCA dictionary is
used to code the patches of each cluster, and the dictionaries are
updated along with the regularization parameters. The centralized sparse
representation model is given by equation (6),

α̂_y = arg min_α ‖y − HΦα‖²₂ + λ‖α‖₁ + γ Σ_i ‖α_i − β_i‖_p    (6)

where γ is a constant and the lp norm measures the distance between α_i
and its centralized estimate β_i.
Compared to other adaptive filters, the least mean square (LMS) adaptive
filter is known for its computational and implementational simplicity.
Its basic model is a linear combination of a stationary low-pass image
and a non-stationary high-pass component through a weighting function;
the function thus provides a compromise between resolving genuine
features and suppressing noise. A median filter belongs to the class of
nonlinear filters and follows the same moving-window principle as the
mean filter: the median of the pixel values in the window is computed,
and the center pixel of the window is replaced with it. Median filtering
is done by first sorting all the pixel values from the surrounding
neighborhood into numerical order and then replacing the pixel under
consideration with the middle value. The median values must be written
to a separate array or buffer so that the results are not corrupted as
the process proceeds.
Fig 3.1: Denoising performance comparison for the photograph image with
standard deviation σ = 0.05 when Gaussian noise is added. (a) Original
image with noise, (b) result using the mean filter, (c) result using the
LMS adaptive filter, (d) result using the median filter.
Table 1: SNR results with Gaussian noise and standard deviation σ = 0.05
Table 1 shows that the median filter provides an increased SNR value of
22.79 compared with the mean and adaptive filters. The median filter can
therefore be applied for higher denoising performance when restoring a
degraded original image.
4. CONCLUSION
Image denoising and deblurring have been major problems addressed by
image restoration methodologies. Different types of algorithms for
deblurring and denoising degraded images were studied, and different
types of filters were analyzed. Sparse representations have been found
to provide better image restoration results than other representations;
therefore, sparsity-based local and non-local methods can be used to
restore degraded images effectively. Experimental results on filters
show that the median filter performs better than the other types. By
consolidating the review and the filter concepts, median and Gaussian
filters can be applied to sparsity-based image denoising.
REFERENCES
[1] Jan Biemond, Reginald L. Lagendijk, and Russell M. Mersereau, Iterative methods for image deblurring, Proceedings of the IEEE, Vol. 78, No. 5, pp. 856-883, May 1990.
[2] Leonid I. Rudin, Stanley Osher, and Emad Fatemi, Nonlinear total variation based noise removal algorithms, Phys. D, Vol. 60, pp. 259-268, November 1992.
[3] David S. C. Biggs and Mark Andrews, Acceleration of iterative image restoration algorithms, Applied Optics, Vol. 36, No. 8, pp. 1766-1775, 10 March 1997.
[4] Michal Aharon, Michael Elad, and Alfred Bruckstein, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation, IEEE Trans. on Signal Processing, Vol. 54, No. 11, November 2006.
[5] Michael Elad and Michal Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Process., Vol. 15, No. 12, pp. 3736-3745, Dec. 2006.
[6] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Trans. Image Process., Vol. 16, No. 8, pp. 2080-2095, Aug. 2007.
[7] Julien Mairal, Michael Elad, and Guillermo Sapiro, Sparse learned representations for image restoration, in Proc. of the 4th World Conf. of the Int. Assoc. for Statistical Computing (IASC), Yokohama, Japan, 2008.
[8] Julien Mairal, Michael Elad, and Guillermo Sapiro, Sparse representation for color image restoration, IEEE Trans. on Image Processing, Vol. 17, No. 1, pp. 53-69, Jan. 2008.
[9] Reginald L. Lagendijk and Jan Biemond, Basic methods for image restoration and identification, in: A. C. Bovik, The Essential Guide to Image Processing, Academic Press, United States of America, pp. 326-330, 2009.
[10] Priyam Chatterjee and Peyman Milanfar, Clustering-based denoising with locally learned dictionaries, IEEE Trans. Image Processing, Vol. 18, No. 7, pp. 1438-1451, July 2009.
[11] Shengyang Dai, Mei Han, Wei Xu, Ying Wu, Yihong Gong, and Aggelos K. Katsaggelos, Softcuts: a soft edge smoothness prior for color image super-resolution, IEEE Trans. Image Process., Vol. 18, No. 5, pp. 969-981, May 2009.
[12] Weisheng Dong, Lei Zhang, and Guangming Shi, Centralized sparse representation for image restoration, in Proc. IEEE Int. Conf. on Computer Vision (ICCV), 2011.
[13] Weisheng Dong, Guangming Shi, and Xin Li, Image deblurring with low-rank approximation structured sparse representation, Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), Asia-Pacific, 2012.
IJCSBI.ORG
[14] Sa'dah Y, Nijad Al-Najdawi and Tedmori S, Exploiting Hybrid Methods for
Enhancing Digital X-Ray images, The International Arab Journal of Information
Technology, Vol. 10, No. 1, January 2013.
IJCSBI.ORG
Hasan Naderi
Assistant Professor
Department of CSE,
Iran University of Science and Technology,
Tehran, Iran
ABSTRACT
Nowadays, with attention to developing the different data networks, the wide masses of
data are producing and updating continually. Managing the great data enumerate the
fundamental challenges in data mining. One of the considered main subjects in this context
is how searching among the wide masses of data. Therefore, require to producing the
typical powerful, expansible and efficient file of documents and data for using in search
motors is necessary. In this study, with surveying the done prior works, implementing the
inverted index with the immediate updating capability from the dynamic and little data of
microblogs is targeted. With utilization from processing multicore facility, the approach of
the graphical processing unit (GPU) is presented that as expansible and without decreasing
the attention, the index file is prepared with suitable speed, as the mentioned file is usable
in inquiry unit. This method tries to feed the updating unit continually with separating the
operation for the system Central Processing Unit (CPU) and suitable utilization of parallel
processing capability of CUDA core. Also, in parallel to increasing the quality, one Hint
method is presented for employing the vacant cores and compactor function for decreasing
the index file mass. The results indicate that the presence of necessary hardware, the
presented method in identity to immediate updating slogan, have the upper speed for
making the inverted index of microblogs than to available samples.
Keywords
Inverted index, Microblog, GPU, Update.
1. INTRODUCTION
In data mining and content management, the inverted index is the key to every
search process: with this file, search engines can answer a query without
repeatedly scanning the content of every document. The structure of an inverted
index is generally based on a hash table and consists of a dictionary of words
and associated values. The index builder scans the words of each document,
analyzes and stems them, and then adds them to the dictionary. In this structure
each term is a unique key in the dictionary of words, and each keyword refers to
a list of IDs of the documents containing it. When a document changes, the ID
lists must be updated, and this updating process has a cost. The ultimate goal
for any dynamic inverted index is to reduce the update latency to near zero,
that is, to real time [1, 2, 3]. The approach we introduce in this article to
reach that goal is to divide and parallelize the construction of the inverted
index: the multi-core, multi-thread capability of GPUs helps us approach this
goal [4]. CUDA cores can execute many tasks simultaneously, which gives us the
opportunity to divide the instructions into small blocks for parallel
processing. Microblogs serve as the data source for the inverted index in this
article. With the approach introduced above, the index is built from the
microblog documents found by the crawler at the lowest possible cost and can be
used with real-time updates.
2. BACKGROUND STUDY
Time in updating inverted index is an important characteristic of
measurement on search engines. Insert time in multi barrel hierarchy index
[5], consist of different index size that will finally contribute each other with
this functions.
T1 = (T(n)/n) · (log n / log k)    (1)

T2 = log n / log k    (2)

In Eq. (1), T1 is the average time to insert n new documents of different
sizes, and T(n) is the time to build a static inverted index of size n. The
factor log n / log k (that is, log_k n) shows how much the choice of k
improves the ability of the system: as k increases, the average insert time
in Eq. (1) approaches the ideal T(n)/n. By Eq. (2), increasing k also reduces
the number of index levels a search must consult, so search speed improves.
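As an illustration, a minimal sketch, assuming the insert and search costs take the forms T1 = (T(n)/n) · log_k n and T2 = log_k n and assuming a hypothetical linear static build cost `t_static`:

```python
import math

def avg_insert_time(n, k, t_static):
    """Eq. (1): average per-document insert cost in a k-ary
    multi-barrel hierarchy, (T(n)/n) * log_k(n)."""
    return (t_static(n) / n) * (math.log(n) / math.log(k))

def search_levels(n, k):
    """Eq. (2): number of index levels a query must consult, log_k(n)."""
    return math.log(n) / math.log(k)

# Hypothetical linear static build cost (seconds), for illustration only.
t_static = lambda n: 2.0 * n
n = 1_000_000
costs = [avg_insert_time(n, k, t_static) for k in (2, 4, 16)]
# A larger fan-out k drives the average insert time toward the ideal T(n)/n.
assert costs[0] > costs[1] > costs[2]
```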
2.1 Inverted Index
Search engines consist of three parts: a crawler that finds web pages, an
indexer that builds the inverted index from the crawled pages, and a ranker
that answers queries using the index. Dynamic documents are documents that
change and update continually. A static inverted index [6] has a dictionary
structure. The dictionary is built by splitting the text into words and
finding their stems with the Porter stemmer algorithm, which prepares them
for indexing. Stems are stored to reduce memory size and to index more
documents in the search results.
Figure: inverted index construction. Documents from a TREC file are parsed
(SimpleStreamTokenizer) against a stop-word list, and the Index Builder
produces the inverted index of PostingListNode entries.
All information gathered by the search engine is used as input to the
inverted index after being saved in a barrel.
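A minimal sketch of such an index builder (illustrative only: lowercasing stands in for Porter stemming, and the stop-word list and documents are made up):

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "of", "to", "and"}  # toy stop-word list

def build_inverted_index(docs):
    """Map each term to the set of document IDs containing it.
    docs: {doc_id: text}. Lowercasing stands in for real stemming."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index[word].add(doc_id)  # posting list as a set of IDs
    return index

docs = {1: "the quick brown fox", 2: "the lazy brown dog"}
index = build_inverted_index(docs)
assert index["brown"] == {1, 2}
assert index["fox"] == {1}
assert "the" not in index  # stop words are excluded from the dictionary
```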
2.2 GPU structure and the capability of parallelizing non-graphical tasks
The structure of a graphics processor consists of many simple processing
units called threads, built only for simple calculations such as addition or
subtraction. With the introduction of CUDA by NVIDIA, the restriction of
graphics units to graphical tasks was removed. In a graphics card the
elements have their own separate memory, so designers treat the card as an
independent device, or even a computer: it has a processor that works with
its own memory. In a GPU, a shared memory is assigned to each block; in
addition there are two specific memories, local and constant (fixed). Local
memory holds data spilled from global or shared memory; it is similar to a
computer's hard drive and serves as a kind of main memory in the graphics
unit. In this structure, commands are processed simultaneously in warps
(sets of threads) [8, 9, 10].
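As a rough CPU-side analogy of this block-wise parallelism (a sketch only; real CUDA code would launch a kernel over a grid of blocks, whereas here a thread pool stands in for the CUDA cores and a word count stands in for the per-block task):

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def process_block(block):
    """Work done by one 'block': count term occurrences in its slice."""
    return Counter(" ".join(block).lower().split())

def parallel_count(docs, block_size=2, workers=4):
    # Split the document stream into small fixed-size blocks,
    # as documents are split across GPU blocks in the paper.
    blocks = [docs[i:i + block_size] for i in range(0, len(docs), block_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_block, blocks):
            total += partial  # merge per-block results (reduction step)
    return total

counts = parallel_count(["gpu gpu index", "cpu index", "index update"])
assert counts["index"] == 3
assert counts["gpu"] == 2
```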
3. RELATED WORKS
Real-time search over microblogs is discussed in [11]. Microblogs update
their contents many times. The main core of that structure consists of a
sequence of inverted indexes whose sizes grow exponentially (I_i = I_0 · 2^i):
new microblogs are placed in the smallest index, and these indexes are
gradually merged into larger ones. This hierarchy makes updating passive.
The results obtained in that study rely on multi-threading capabilities.
Qiuying Bai and coworkers [12] apply a method for real-time updating in
subject-oriented search engines. They design an inverted index with three
parts: a primary inverted index, an additional inverted index, and a list of
deleted files. This method of real-time updating is useful for
subject-oriented search engines.
A different study [13] introduces a method in which the index file is built
and updated using a graphics processor and multi-core processing on a single
computer system, without any distribution. One advantage of this method is
that the graphics unit does all the processing, so the central processing
unit remains free for other tasks and overall system performance does not
degrade.
4. METHODOLOGY
In this study we present a hybrid method in which the inverted index is both
built and updated in real time using two different processors. We take
incoming documents from microblogs in a preset frame; the reason is the
innate characteristic of the planned method, in which data enters the
graphics processor and is divided into smaller units (blocks). These units
are very small, matching the restricted block sizes of the GPU. Each block
runs on a CUDA thread and processes a command, or part of one. In this
processing structure the CUDA cores act like the warp and the blocks like
the woof, and since each block completes its processing task in one clock,
the incoming documents taken from the crawler are treated as small
graphics-unit blocks. First, depending on whether they are to be added to
the index or deleted from it, documents are labeled I or D and placed in a
queue (Q) in the system's main memory, from which they enter the central
processing unit. From this point on, the CPU cores are divided into two
sets: half splitters and half updaters. Documents enter the splitters, are
broken into terms, and are prepared for the next step. After each splitting
task, the separated sets of documents enter the graphics unit over time to
continue the work in parallel. As documents from the microblogs enter the
GPU they are placed into blocks, and the graphics unit carries out the
processing phase by phase according to the number of blocks and CUDA cores.
In each block of the graphics processing unit, two different operations
build the document index: building in the insertion index barrel and
building in the deletion index barrel. The kind of operation is indicated in
the document header.
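The labeling, queueing and splitter/updater division described above can be sketched as a single-process pipeline (a simplification of the paper's design: here the queue, the splitter and the updater are plain Python functions rather than dedicated CPU cores feeding a GPU):

```python
from queue import Queue

def splitter(doc):
    """Split a labeled document into (op, term) pairs."""
    op, text = doc  # op is 'I' (insert) or 'D' (delete)
    return [(op, term) for term in text.lower().split()]

def updater(index, op, term, doc_id):
    """Apply one term operation to the inverted index."""
    postings = index.setdefault(term, set())
    postings.add(doc_id) if op == "I" else postings.discard(doc_id)

q = Queue()
for doc_id, doc in enumerate([("I", "real time index"), ("D", "index")]):
    for op, term in splitter(doc):      # splitter half of the cores
        q.put((op, term, doc_id))

index = {}
while not q.empty():                    # updater half of the cores
    updater(index, *q.get())

assert index["real"] == {0}
assert index["index"] == {0}  # doc 1's delete removes only doc 1's posting
```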
In the first part of GPU processing, the inserted blocks determine how many
times each word is repeated in each document, along with the number of
deletion blocks, and save the counts in the global memory of the graphics
unit. A threshold processing step is then performed and the threshold
inverted index is built. Hint is a notification mechanism designed for the
times when CPU cores are idle. This function identifies the free time of
each core and shares those cores temporarily, so the processing power
available to the busiest units increases.
HintArray[a]:  S S S U U U   (copied to HintArrayC[a], then reset)
HintArrayC[a]: T0: S S S U U U
               T1: S S U U U U
               T2: S U U U U U
               T3: S S S U U U
               ...
               Tn: S S S S U U
Figure 3. Change routine in GPU cores using the Hint function.
In the figure above, the cores of the system's main processor are divided
into two sets: splitters (S) and updaters (U). Depending on how busy a
core's input is, the Hint function reassigns free cores to S or U in real
time to relieve busy points.
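A toy version of the Hint idea, assuming a hypothetical per-role backlog measure (the actual mechanism monitors core occupancy):

```python
def hint(roles, split_backlog, update_backlog):
    """Flip one idle core to whichever role ('S' or 'U') is busier.
    roles: list of 'S'/'U' assignments, one per CPU core."""
    if split_backlog > update_backlog and "U" in roles:
        roles[roles.index("U")] = "S"   # borrow an updater core
    elif update_backlog > split_backlog and "S" in roles:
        roles[roles.index("S")] = "U"   # borrow a splitter core
    return roles

roles = ["S", "S", "S", "U", "U", "U"]
roles = hint(roles, split_backlog=10, update_backlog=2)
assert roles.count("S") == 4  # one updater was retasked as a splitter
```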
5. EVALUATION
Considering that the ultimate goal of this research is to build an inverted
index and update it in real time, the measurement criterion is elapsed time.
The chart below presents two runs, one using the graphics unit and one
without it. In both, size and time are directly related, but execution time
decreases when we use the multi-core capability of the CUDA graphics
processor, which guides us toward real-time execution.
In this chart the horizontal axis is the number of documents and the
vertical axis is the time to build the file. The two upper lines show the
time required to build the index on the CPU alone, and the two lower lines
the time required when the GPU and CPU are used together.
6. CONCLUSIONS
In this article, an effort is made to use the processing power of the
graphics unit together with the central processing unit to build a
real-time-updating index system from microblog contents. A method is also
presented to make strong use of idle cores. Finally, an algorithm is
presented that builds the inverted index from microblog documents and
updates it in real time.
The future aim of this research is to create a static structure and to
present a unit called the index manager to distribute processing streams
between the processor units.
7. ACKNOWLEDGMENTS
We would like to thank Muhammad Saleh Mousavi for their exceptionally
useful reviews.
REFERENCES
[1] P. Mudgil, A. K. Sharma, and P. Gupta, An Improved Indexing Mechanism to Index
Web Documents, Computational Intelligence and Communication Networks (CICN),
2013 5th International Conference on, 27-29 Sept. 2013, pp. 460 - 464.
[2] R. Konow, G. Navarro, and C. L. A. Clarke, Faster and Smaller Inverted
Indices with Treaps, ACM, 2013.
[3] S. Brin and L. Page, Reprint of: The anatomy of a large-scale hypertextual web search
engine, The International Journal of Computer and Telecommunications Networking,
2012, 3825-3833.
[4] Z. Wei and J. JaJa, A fast algorithm for constructing inverted files on heterogeneous
platforms, J. Parallel Distrib. Comput, 2012.
[5] N. Grimsmo, Dynamic indexes vs. static hierarchies for substring search, Trondheim,
2005.
[6] R. A. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-
Wesley Longman Publishing Co, Inc., 1999.
[7] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information
Retrieval, Cambridge University Press, 2008.
[8] NVIDIA CUDA, NVIDIA CUDA C Programming Guide, Book, www.nvidia.com,
2012
[9] W. Di, Z. Fan, A. Naiyong, W. Fang, L. Jing, and W. Gang, A Batched GPU
Algorithm for Set Intersection, Pervasive Systems, Algorithms, and Networks
(ISPAN), 2009 10th International Symposium on, 978-1-4244-5403-7, 14-16 Dec.
2009, pp. 752-756.
[10] Z. Wei and J. JaJa, A fast algorithm for constructing inverted files on heterogeneous
platforms, J. Parallel Distrib. Comput. 2012.
[11] W. Lingkun, L. Wenqing, X. Xiaokui, and X. Yabo, LSII: An indexing structure for
exact real-time search on microblogs, in Data Engineering (ICDE), IEEE 29th
International Conference, 2013.
[12] Q. Bai, C. Ma, and X. Chen, A new index model based on inverted index, Software
Engineering and Service Science (ICSESS), 2012 IEEE 3rd International Conference
on, 978-1-4673-2007-8, 22-24 June 2012, pp. 157 - 160.
[13] N. N. Sophoclis, M. Abdeen, E. S. M. El-Horbaty, and M. Yagoub, A novel approach
for indexing Arabic documents through GPU computing, Electrical & Computer
Engineering (CCECE), 2012 25th IEEE Canadian Conference on, 978-1-4673-1431-2,
April 29 2012-May 2 2012, pp. 1- 4.
Subha. R.
Department of Computer Science and Engineering
Sri Krishna College of Technology, Coimbatore, Tamilnadu, India
Dr. Palaniswami. S.
Principal, Government College of Engineering,
Bodinayakanur, India
ABSTRACT
Nonfunctional Requirements are as important as functional requirements. But they have
been often neglected, poorly understood and not considered adequately in software
development process. If the NFRs are not met properly, it will lead to the dissatisfaction of
customers. NFRs may be more critical than functional requirements as there can be mutual
dependencies among the NFR, which may affect the completion of the project. Hence it is
necessary to prioritize the NFRs effectively. But prioritizing such NFR is a challenging task
in Software development. Many techniques are used to prioritize the requirements in
various dimensions. It is important to choose the appropriate requirement prioritization
technique for a particular software development process. One can select the appropriate
techniques based on the various factors such as, the stakeholders involved, available
resources, and the product he develop and so on. The goal of this paper is to increase the
awareness about the importance of NFRs and to analyze the various techniques that are
used to prioritize the NFRs.
Keywords
Requirements Engineering, Non Functional Requirements, Prioritization of NFRs,
Prioritization techniques, Quality requirements, NFR algorithm
1. INTRODUCTION
Requirements Engineering (RE) is a subfield of software engineering; it
involves formulating, documenting and maintaining software requirements
[16]. Requirements generally describe what the system is required to do,
along with the environment it is intended to operate in. Requirements
provide the description of the system, its behavior, application-domain
information, system constraints, specifications and attributes [7].
The software development market can be divided into two major types:
market-driven development and bespoke development. In market-driven
development the product is developed for an open market, whereas in bespoke
development the product is developed for a particular customer based on
their wishes. In bespoke development, if there is only one customer there is
no problem; but in practice many customers and developers are involved in
the software development, and everyone has different views and opinions. In
such situations requirement prioritization plays a major role in software
development.
software development. The main reason for prioritizing is that all NFRs
cannot be implemented in the given time or with the given resources.
In this paper, we analyze the prioritization techniques. The paper is
structured as follows: after the introduction, Sect. 2 provides a review of
related work, Sect. 3 explains requirement prioritization, Sects. 4 and 5
describe the various techniques to prioritize NFRs, and Sect. 6 concludes
the paper.
2. RELATED WORK
Quality requirements are defined during the early stages of development, but
they are not incorporated properly during software development. NFRs are
prioritized for various reasons: to determine whether a particular
requirement is mandatory, to eliminate requirements that are unnecessary to
the software product, and to schedule requirements for implementation [5].
Properly prioritized requirements provide significant benefits such as
improved customer satisfaction and a lower risk of cancellation, and
prioritization also helps to identify hidden requirements. It helps to
estimate the benefits of the project, and requirement priorities can help
determine how to utilize the limited project resources. The various factors
involved in prioritizing requirements are cost, risk, value, benefits,
dependency constraints, effort, business-value dimensions, etc. [1].
In web development NFRs are not given the same importance as functional
requirements and are not discussed properly in the early stages. But web
developers have found that NFRs become an issue during the later stages of
development, which leads to frequent changes in the system design. So paying
more attention to NFRs increases the quality of the web system and also
reduces development time. In many techniques, NFRs are mostly converted into
functional requirements before prioritization; for example, the security
requirement is operationalized as login requirements for cost estimation
[6]. When NFRs are elicited properly, this also leads to the discovery of
new functional requirements (FRs) [11].
3. REQUIREMENT PRIORITIZATION
Requirement prioritization is an important activity in software development;
it helps to manage the relative importance of requirements and to manage
development when resources are limited [13]. All the stakeholders should
collaborate to prioritize the requirements efficiently. Requirement
prioritization supports the following activities:
To estimate expected customer satisfaction.
To decide the core requirements.
To schedule the implementation of requirements.
To handle the dependencies between requirements.
To establish the relative importance of each requirement.
stage where the decision makers actually prioritize the requirements. The
final stage is a presentation stage where the results of the prioritization
are presented. But while prioritizing the requirements, numerous risks and
challenges arise [5].
4. PRIORITIZATION METHODOLOGIES
4.1 Numerical Assignment
Numerical assignment involves grouping the requirements. First the
requirements are divided into different groups; then the stakeholders assign
each requirement a scale value from 1 to 5 based on its importance. Finally,
the average of the values given by all stakeholders is taken as the ranking
for that requirement [9]. The main disadvantage of this method [1] is that,
since different users have different opinions, the information obtained is
relative and it is difficult to determine absolute information. Left to
themselves, stakeholders will pull 85% of the requirements into the
high-priority group.
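The averaging step can be sketched as follows (the requirement names and scores are made up):

```python
def rank_by_numerical_assignment(scores):
    """scores: {requirement: [scale values 1-5, one per stakeholder]}.
    Rank requirements by their average stakeholder score."""
    averages = {req: sum(vals) / len(vals) for req, vals in scores.items()}
    return sorted(averages, key=averages.get, reverse=True)

scores = {"R1": [5, 4, 5], "R2": [2, 3, 2], "R3": [4, 4, 3]}
# R1 averages 4.67, R3 3.67, R2 2.33, so R1 is ranked first.
assert rank_by_numerical_assignment(scores) == ["R1", "R3", "R2"]
```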
4.2 Analytical Hierarchy Process (AHP)
AHP [1] is a decision-making technique based on pairwise comparison. With
the help of multiple objectives or criteria, AHP allows decision makers to
choose the best requirements from several decision alternatives. AHP
involves three steps:
Making pairwise comparisons.
Calculating the priorities of the requirements and making the decision.
Checking consistency.
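A compact sketch of the comparison and priority-calculation steps, using the row geometric-mean approximation to the principal eigenvector (the consistency check is omitted, and the comparison matrix is made up):

```python
import math

def ahp_priorities(M):
    """M[i][j]: how strongly requirement i is preferred over j
    (a reciprocal matrix). Returns normalized priority weights via
    the row geometric-mean approximation."""
    gms = [math.prod(row) ** (1 / len(row)) for row in M]
    total = sum(gms)
    return [g / total for g in gms]

# 3 requirements: A is moderately preferred over B, strongly over C.
M = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w = ahp_priorities(M)
assert abs(sum(w) - 1.0) < 1e-9  # weights are normalized
assert w[0] > w[1] > w[2]        # A ranked highest, C lowest
```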
In this technique each requirement is compared with every other requirement
to determine to what extent one requirement is more important
units into a requirement that is not highly preferred by any of the others,
so it will lead to conflicts in the prioritization process.
The three stages involved in the BST technique are as follows. In the
preparation step the requirements are gathered. In the execution step the
requirements are inserted one by one: the first requirement is placed at the
root node, and each subsequent requirement is compared during insertion and,
according to its priority, added as a left or right child. Finally, at the
presentation stage, the tree is traversed in order to read off the
prioritized requirements.
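The three stages can be sketched with a plain binary search tree keyed on a numeric priority (a simplification: in practice each comparison is a stakeholder judgment rather than a stored number):

```python
class Node:
    def __init__(self, req, priority):
        self.req, self.priority = req, priority
        self.left = self.right = None

def insert(root, req, priority):
    """Execution step: higher-priority requirements go to the right."""
    if root is None:
        return Node(req, priority)
    if priority > root.priority:
        root.right = insert(root.right, req, priority)
    else:
        root.left = insert(root.left, req, priority)
    return root

def in_order(root):
    """Presentation step: in-order traversal yields ascending priority."""
    return [] if root is None else in_order(root.left) + [root.req] + in_order(root.right)

root = None
for req, p in [("R1", 3), ("R2", 5), ("R3", 1), ("R4", 4)]:  # preparation step
    root = insert(root, req, p)
assert in_order(root) == ["R3", "R1", "R4", "R2"]  # lowest to highest priority
```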
Figure: feedback loop between requirements engineering activities (identify
relevant quality attributes, specify quality requirements, prioritize
quality requirements) and software architecture evaluation activities
(design software architecture, evaluate software architecture and explore
the design space further, analyze trade-offs and dependencies) during the
development process.
This process helps the stakeholders and architects to understand the NFRs
effectively, but the method is applicable only to quantitatively evaluated
quality properties.
Figure: overview of the NFR algorithm. Stakeholder preferences feed the
identification of NFRs, the construction of the business process hierarchy,
and the provision of ratings; heuristics for NFR prioritization, the
assessment of relative importance (via the association matrix and Net %
change), and the adjustment of NFR scores then yield the prioritized NFRs.
International Journal of Computer Science and Business Informatics
Net % Change i = (+% i) + (-% i)    (1)

where +% i and -% i (Eqs. (2)-(4)) are, respectively, the aggregate
percentage improvement contributed by positively associated NFRs and the
(negative) percentage degradation contributed by negatively associated NFRs,
computed from the association-matrix indicators m.
X1 X2 X3 X4
X1 +m2 +m6
X2 +m3
X3 -m4 -m7
X4 -m1
Net % change NC1 NC2 NC3 NC4
The indicators m in the table show the associations between the NFRs: a +
sign denotes a positive association and a - sign a negative association.
The Net % Change row indicates the aggregate percentage improvement or
degradation of capability identified by the NFRs.
The adjusted importance is calculated using the formula:
Adjusted Importance i = Relative Importance i * (1 + Net % Change i / 100)
(5)
where i is the ith NFR considered for implementation.
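Eq. (5) can be applied directly; the sketch below uses hypothetical relative-importance and Net % Change values:

```python
def adjusted_importance(relative_importance, net_pct_change):
    """Eq. (5): scale each NFR's relative importance by its
    aggregate Net % Change (positive = improvement)."""
    return {nfr: ri * (1 + net_pct_change[nfr] / 100)
            for nfr, ri in relative_importance.items()}

# Hypothetical inputs for three NFRs.
rel = {"security": 40.0, "usability": 35.0, "performance": 25.0}
net = {"security": 10.0, "usability": -20.0, "performance": 0.0}
adj = adjusted_importance(rel, net)
assert abs(adj["security"] - 44.0) < 1e-9     # boosted by positive change
assert abs(adj["usability"] - 28.0) < 1e-9    # degraded by negative change
assert abs(adj["performance"] - 25.0) < 1e-9  # unchanged
```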
6. CONCLUSION
In a competitive business environment, the quality of a product plays a
crucial role in its success, so we have to prioritize the NFRs efficiently.
Among the techniques above, we found the NFR algorithm to be the most
suitable methodology for prioritizing NFRs, because the algorithm is
designed specifically for NFR prioritization. As the business process
hierarchy is created, it can identify all the NFRs easily; it prioritizes
the NFRs along various dimensions and from various stakeholder views. The
heuristics involved are very simple to calculate, and the algorithm also
considers the mutual dependencies among the NFRs. So the NFR algorithm can
prioritize NFRs efficiently and cost-effectively.
REFERENCES
[1]. Aaqib Iqbal, Farhan M. Khan, Shahbaz A. Khan, 2009. A Critical Analysis of
Techniques for Requirement Prioritization and Open Research Issues. International
Journal of Reviews in Computing. Vol. 1.
[2]. Anne Koziolek, 2012. Research Preview: Prioritizing Quality Requirements Based On
Software Architecture Evaluation Feedback. REFSQ 2012. pp. 52-58.
[3]. Berntsson Svensson, R., Gorschek, T., Regnell, B., Torkar, R., Shahrokni, A., Feldt, R.,
and Aurum, A. 2011. Prioritization of quality requirements state of practice in eleven
companies. RE'11 IEEE. pp. 69.
[4]. Daneva, M., Kassab, M., Ponisio, M.L., Wieringa, R. J., Ormandjieva, O., 2007.
Exploiting A Goal-Decomposition Technique to Prioritize Non-Functional Requirements.
Proceedings of the 10th Workshop on Requirements Engineering WER.
[5]. Donald Firesmith, 2004 Prioritizing Requirements, Journal of Object Technology,
Vol 3, No 8.
[6]. Herrmann, A., Daneva, M., 2008. Requirements Prioritization Based On Benefit And
Cost Prediction: An Agenda For Future Research. In Proceedings of The 16th IEEE
International Requirements Engineering Conference. pp. 125-134.
[7]. Kotonya, G., Sommerville, I., 1998. Requirements engineering: Processes and
techniques, Chichester. UK: John Wiley & Sons.
[8]. Manju Khari, Nikunj Kumar, 2013. Comparison of six prioritization techniques for
software requirements. Journal of Global Research in Computer Science, Vol. 4, No. 1.
[9]. Muhammad Ramzan, M. Arfan Jaffar, Arshad Ali Shahid, 2011. Value Based
Intelligent Requirement Prioritization (VIRP): Expert Driven Fuzzy Logic Based
Prioritization Technique. International Journal of Innovative Computing, Information
and Control. Vol. 7, No. 3.
[10]. Rahul Thakurta, 2013. A Framework for Prioritization of Quality Requirements for
Inclusion In A Software Project. Springer Software Quality Journal. Vol. 21, No 4, pp. 573-
597.
[11]. Yusop, N., Zowghi, D., Lowe, D., 2008. The Impacts of Non-Functional
Requirements in Web System Projects. International Journal of Value Chain Management,
Vol. 2, pp. 18-32.
[12]. www.inf.ed.ac.uk/teaching/courses/cs2/LectureNotes /CS2Ah/SoftEng/se02.pdf.
[13].http://www.corpedgroup.com/resources/ba/ReqsPrioritization.asp
[14]. Joachim Karlsson, Claes Wohlin, Björn Regnell, 1998. An evaluation of methods
for prioritizing software requirements. Information and Software Technology. Vol. 39,
pp. 939-947.
[15]. Patrik Berander Anneliese Andrews, 2005. Requirements Prioritization,
Engineering and Managing Software Requirements, pp. 69-94.
[16]. http://en.wikipedia.org/wiki/Requirement_engineering.
Prof. K. S. Thakare
Associate Professor
Sinhgad College of Engineering,
Vadgaon. Pune, India.
ABSTRACT
Cryptography is study of transforming information in order to make it secure from unintended
recipients or use. Visual Cryptography Scheme (VCS) is a cryptography method that encrypts
visual information (picture, printed text, handwritten notes) such that decryption can be
performed using human visual system. The idea is to convert this visual information into an
image and encypher this image into n different shares (known as sheets). The deciphering only
requires selecting some shares out of n shares. The intent of this review paper is to contribute
the readers an overview of the basic visual cryptography scheme constructions as well as
continued work in the area. In inclusion, we also review some applications that take advantage
of such secure system.
Keywords
Visual cryptography scheme (VCS), Pixel expansion, Contrast, Security, Accuracy,
Computational complexity
1. INTRODUCTION
Various kinds of sensitive data, such as credit card information, personal
health information, military maps and personally identifiable information,
are transmitted over the Internet. With the advancement of technology,
multimedia information is also transferred over the Internet conveniently.
The protection of secret information has therefore become a critical
research issue. When secret images are used, attackers may exploit weak
links in the network to intercept the information. To solve the problem of
protecting secret images, many image secret-sharing schemes have been
devised. A new information security technique called the visual cryptography
scheme was invented by Naor et al. in 1994 [1]. The human visual system
decodes the secret (handwritten notes, printed text, pictures, etc.)
directly, without performing any computations. This scheme eliminates the
complex computation problem in decryption, and the secret images
can be restored by a stacking operation. This property of visual cryptography is
useful where a low computational load is required.
Visual cryptography was presented for the problem of secret sharing, one of
the earliest issues to be considered in cryptography. In particular, suppose
4 smart robbers have deposited their loot in a bank account. These robbers
do not trust each other, and they do not want a single robber among them to
withdraw the loot and escape; however, they regard a withdrawal made by at
least two robbers as loyal. Therefore, they decide to encrypt the bank code
(with a trusted machine) into 4 partitions so that at least two partitions
can reconstruct the code, and the partitions are distributed among them.
Since the robbers will not have a machine with them to decrypt the bank code
when they want to withdraw the loot, they want to be able to decrypt it
visually. A single partition should not yield any information about the
code; nonetheless, by taking any two or more partitions, stacking them
together and aligning them, the code should be reconstructed. The solution
to this problem is given by the visual cryptography scheme.
The simplest visual cryptography scheme has the following structure. A
secret image is made up of a collection of black and white pixels, where
each pixel is handled independently [1]. To encrypt the image, we split it
into n modified versions (shares) such that each pixel in a share is
subdivided into m black and white sub-pixels [1]. To decipher the image, we
pick a subgroup S of those n shares. If S is a qualified subset, then
stacking all these shares allows recovery of the image.
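For the simplest (2, 2) case, the construction can be sketched as follows (each secret pixel expands into m = 2 sub-pixels per share, and stacking transparencies corresponds to a Boolean OR):

```python
import random

def make_shares(secret_bits, rng=random.Random(42)):
    """(2,2) VCS: each secret pixel (0=white, 1=black) expands to two
    sub-pixels per share. White: both shares get the same pattern;
    black: complementary patterns, so OR-stacking yields all-black."""
    s1, s2 = [], []
    for bit in secret_bits:
        pattern = rng.choice([(0, 1), (1, 0)])
        s1.append(pattern)
        s2.append(pattern if bit == 0 else (1 - pattern[0], 1 - pattern[1]))
    return s1, s2

def stack(s1, s2):
    """Overlay transparencies: a sub-pixel is black if black in either share."""
    return [(a0 | b0, a1 | b1) for (a0, a1), (b0, b1) in zip(s1, s2)]

secret = [1, 0, 1, 0]
s1, s2 = make_shares(secret)
for bit, sub in zip(secret, stack(s1, s2)):
    # Black pixels recover as fully black (H(V)=2); white pixels stay
    # half-black (H(V)=1), which the eye perceives as gray/white contrast.
    assert sum(sub) == (2 if bit else 1)
```

Each share on its own is a uniformly random pattern, so a single transparency reveals nothing about the secret.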
This paper introduces the construction of (k, n)-threshold VCS along with
some parameters used to describe the model. It then provides an overview of
various visual cryptography schemes. To meet the demands of multimedia
information, gray-scale and color image formats should also be supported by
the schemes. Performance measures such as security and computational
complexity that affect the efficiency of visual cryptography are also
discussed.
The rest of the paper is structured as follows. Section II describes the
model for the construction of (k, n)-threshold VCS. Section III provides an
overview of black-and-white VCS. Section IV elaborates on color VCS.
Applications of VCS are included in Section V. The performance of visual
cryptography schemes is analyzed in Section VI, and Section VII concludes
the paper.
IJCSBI.ORG
Definition 1 (Hamming weight): The number of non-zero symbols in a series
of symbols [1]. In the binary representation of a number, the Hamming weight is the
number of 1 bits in the binary series.
A VCS scheme is a 6-tuple (d, α, V, S, m, n). It supposes that each pixel appears
in n versions called shares, each share belonging to its corresponding
transparency. Each share is a group of m black and white sub-pixels. This
structure can be described by an n × m Boolean matrix S = [Sij], where
Sij = 1 iff the jth sub-pixel in the ith share is black. Hence, the grey level of
the combined share obtained by overlapping the transparencies is
proportional to the Hamming weight H(V) of the OR-ed m-vector V [1]. This
grey level is interpreted by the visual system as black if H(V) ≥ d, and as
white if H(V) < d − αm, for some fixed threshold 1 ≤ d ≤ m and relative
difference α > 0 [1]. The quantity αm, the difference between the minimum H(V) value
of a black pixel and the maximum permitted H(V) value of a white pixel, is
called the contrast of the VCS scheme [1].
1) For any matrix S in C0, the "OR" operation V on any k out of the n rows
satisfies H(V) ≤ d − αm.
2) For any matrix S in C1, the "OR" operation V on any k out of the n rows
satisfies H(V) ≥ d.
3) For any subset {i1, i2, ..., iq} of {1, 2, ..., n} with q < k, the two
collections of q × m matrices Bt, obtained by restricting each n × m matrix
in Ct (where t ∈ {0, 1}) to rows i1, i2, ..., iq, are indistinguishable in the
sense that they contain exactly the same matrices with the same
frequencies [1]. In other words, any q × m matrices S0 ∈ B0 and S1 ∈ B1
are identical up to a column permutation.
Conditions (1) and (2) define the contrast of a VCS, and condition (3) states the
security property of a (k, n)-threshold VCS.
The construction of arbitrary (k, k)- and (k, n)-threshold VCS is beyond the scope
of this paper; we therefore only state the result of such a construction.
Theorem 2: There exists a (k, n)-threshold VCS scheme with m = n^k · 2^(k−1) and
α = (2e)^(−k) / √(2πk).
Notice that the first theorem states the optimality of the (k, k) scheme, whereas
the second theorem only states the existence of a (k, n) VCS with the given parameters.
H. C. Hsu et al. [2] presented an improved (k, n) VCS construction with a
smaller m.
B. Sharing Many Secrets
A non-expansion reversible visual secret sharing method that does not need to
define the lookup table was presented by Fang [13]. Zhengxin Fu et al [14]
intended a rotation visual cryptography scheme for encryption of four secrets
into two shares and recovering the reconstructed images without distortions.
Rotation visual cryptography scheme construction was depending correlative
matrices set and random permutation. Above mentioned all the schemes used to
share the black and white secret images. To deal with colorful images
researchers have been worked to share the colorful images.
4. COLORFUL VISUAL CRYPTOGRAPHY SCHEME
A. Sharing Only One Secret
Visual cryptography schemes were applied only to black and white images until
1997, when Verheul and Van Tilborg [15] first developed a colored VCS.
Using the concept of arcs, colored secret images can be shared. In a c-color VCS,
a single pixel is translated into m sub-pixels, and each sub-pixel is split into c
color regions. In each sub-pixel, only one color region is colored, and all the
other color regions are kept black. The color of one pixel depends on the
combination of the stacked sub-pixels. For a colored VCS with c colors,
the pixel expansion m is c × 3. Yang and Laih [16] improved the pixel
expansion to c × 2.
To share and transmit a secret color image and also to generate meaningful
shares, Chang and Tsai proposed a color VCS [17]. For a secret color image, two
effective color images are chosen as cover images of exactly the same
size as the secret color image. Then, according to a predefined Color Index
Table, the secret color image is concealed into the two disguise images. One
drawback of this scheme is that extra space is required to store the Color Index
Table.
To deal with this limitation, Chin-Chen Chang et al. [18] constructed a secret
color image sharing scheme based on modified visual cryptography. In this
scheme the size of the shares is fixed; it does not change as the number of
colors appearing in the secret image varies [18]. Although the pixel expansion is
fixed in this scheme, it is not suitable for true-color secret images. To share true-
color images, Lukac and Plataniotis [19] proposed a bit-level based scheme
operating directly on the bit-planes of a secret image.
S. J. Shyu [20] suggested a color VCS with reduced pixel expansion, a more
efficient colored visual secret sharing scheme with pixel expansion
⌈log2 c⌉ · m, where m is the pixel expansion of the exploited binary scheme and c
is the number of color regions [20]. A cost-effective VCS was developed by
Mohsen Heidarinejad et al. [21] for color image transmission over
bandwidth-constrained channels. Their solution offers perfect reconstruction while
producing shares smaller than the input image using maximum
distance separable codes.
F. Liu et al. [22] developed a color visual cryptography scheme under the
visual cryptography model of Naor et al. [1] without pixel expansion. In this
scheme, an increase in the number of colors of the recovered secret image does
not increase the pixel expansion. To increase the speed of encoding, Haibo Zhang et
al. [23] presented a multi-pixel encoding which can encode an unfixed number of
pixels in each run.
B. Sharing Many Secrets
7. CONCLUSION
In this paper, we briefly review the research of visual cryptography schemes as
special cases of secret sharing methods among participants. In visual
cryptography schemes the grey-scale VCS and colorful VCS both are studied
according to the number of shares generated. Interesting applications are also
studied. Further, formulated performance parameters of various visual
cryptography schemes are evaluated.
REFERENCES
[1] Moni Naor and Adi Shamir, Visual Cryptography, advances in cryptology Eurocrypt,
1995, pp 1-12.
[2] H. C. Hsu, T.-S. Chen, Y.-H. Lin, The Ring Shadow Image Technology Of Visual
Cryptography By Applying Diverse Rotating Angles To Hide The Secret Sharing, In
Proceedings of the 2004 IEEE International Conference on Networking, Sensing &
Control, Taipei, Taiwan, March 2004, pp. 996-1001.
[3] Liguo Fang, BinYu, Research On Pixel Expansion Of (2, n) Visual Threshold Scheme,
1st International Symposium on Pervasive Computing and Applications, IEEE, 2006, pp.
856-860.
[4] Chin-Chen Chang, Jun-Chou Chuang, Pei-Yu Lin, Sharing A Secret Two-Tone Image In
Two Gray-Level Images, Proceedings of the 11th International Conference on Parallel
and Distributed Systems (ICPADS'05), 2005, pp. 300-304.
[5] Xiao-qing Tan, Two Kinds of Ideal Contrast Visual Cryptography Schemes,
International Conference on Signal Processing Systems, 2009, pp. 450-453.
[6] C.C. Wu, L.H. Chen, A Study On Visual Cryptography, Master Thesis, Institute of
Computer and Information Science, National Chiao Tung University, Taiwan, R.O.C.,
1998.
[7] S. J. Shyu, S. Y. Huanga,Y. K. Lee, R. Z. Wang, and K. Chen, Sharing multiple secrets
in visual cryptography, Pattern Recognition, Vol. 40, Issue 12 , 2007, pp. 3633 - 3651.
[8] Wen-Pinn Fang, Visual Cryptography In Reversible Style, IEEE Proceeding on the
Third International Conference on Intelligent Information Hiding and Multimedia Signal
Processing (IIHMSP2007), Kaohsiung, Taiwan, R.O.C, 2007.
[9] Jen-Bang Feng, Hsien-Chu Wu, Chwei-Shyong Tsai, Ya-Fen Chang, Yen Ping Chu,
Visual Secret Sharing For Multiple Secrets, Pattern Recognition Vol. 41, 2008, pp.
3572-3581.
[10] Tzung-Her Chen, Kai-Hsiang Tsao, and Kuo-Chen Wei, Multiple Image Encryption By
Rotating Random Grids, Eighth International Conference on Intelligent Systems Design
and Applications, 2008, pp. 252-256.
[11] Jonathan Weir, WeiQi Yan, Sharing Multiple Secrets Using Visual Cryptography, IEEE,
2009, pp. 509-512.
[12] Mustafa Ulutaş, Rıfat Yazıcı, Vasif V. Nabiyev, Güzin Ulutaş, (2, 2) - Secret Sharing
Scheme With Improved Share Randomness, IEEE, 2008.
[13] Wen-Pinn Fang, Non-Expansion Visual Secret Sharing in Reversible Style, IJCSNS
International Journal of Computer Science and Network Security, VOL.9 No.2, February
2009, pp.204-208.
[14] Zhengxin Fu, Bin Yu, Research on Rotation Visual Cryptography Scheme,
International Symposium on Information Engineering and Electronic Commerce, 2009,
pp 533-536.
[15] E. Verheul and H. V. Tilborg, Constructions And Properties Of K Out Of N Visual Secret
Sharing Schemes, Designs, Codes and Cryptography, 11(2), 1997, pp. 179-196.
[16] C. Yang and C. Laih, New Colored Visual Secret Sharing Schemes, Designs, Codes and
Cryptography, 20, 2000, pp. 325-335.
[17] C. Chang, C. Tsai, and T. Chen. A New Scheme For Sharing Secret Color Images In
Computer Network, Proceedings of International Conference on Parallel and Distributed
Systems, July 2000, pp. 21-27.
[18] Chin-Chen Chang, Tai-Xing Yu, Sharing A Secret Gray Image In Multiple Images,
Proceedings of the First International Symposium on Cyber Worlds (CW.02), 2002.
[19] R. Lukac, K.N. Plataniotis, Bit-Level Based Secret Sharing For Image Encryption,
Pattern Recognition 38 (5), 2005, pp. 767-772.
[20] S.J. Shyu, Efficient Visual Secret Sharing Scheme For Color Images, Pattern
Recognition 39 (5), pp. 866-880, 2006.
[21] Mohsen Heidarinejad, Amirhossein Alamdar Yazdi and Konstantinos N, Plataniotis
Algebraic Visual Cryptography Scheme For Color Images, ICASSP, 2008, pp. 1761-
1764.
[22] F. Liu, C.K. Wu, X.J. Lin, Colour Visual Cryptography Schemes, IET Information
Security, vol. 2, No. 4, 2008, pp. 151-165.
[23] Haibo Zhang, Xiaofei Wang, Wanhua Cao, Youpeng Huang , Visual Cryptography For
General Access Structure By Multi-Pixel Encoding With Variable Block Size,
International Symposium on Knowledge Acquisition and Modeling, 2008, pp. 340-344.
[24] Tzung-Her Chen, Kai-Hsiang Tsao, and Kuo-Chen Wei, Multi-Secrets Visual Secret
Sharing, Proceedings of APCC2008, IEICE, 2008.
[25] Jung-San Lee, T. Hoang Ngan Le, Hybrid (2, N) Visual Secret Sharing Scheme For
Color Images, 978-1- 4244-4568-4/09, IEEE, 2009.
[26] D Chaum, Secret-ballot receipts: True voter-verifiable elections, IEEE Security and
Privacy, 2004, pp.38-47.
[27] W. Hawkes, A. Yasinsac, C. Cline, An Application of Visual Cryptography to Financial
Documents, technical report TR001001, Florida State University (2000).
ABSTRACT
Predicting a user's web page access is a challenging task that continues to gain
importance as the web grows. Understanding users' next page accesses helps in formulating
guidelines for web site personalization. Server-side log files provide information that
enables building the user sessions within the web site, where a session consists
of a sequence of web pages viewed by a user within a given time. Web navigation behavior is helpful in
understanding what information online users demand. In this paper, we present a
system that focuses on improving the prediction of web page access. We propose to
use clustering techniques to cluster the web log data sets. As a result, a more accurate
Markov model is built on each group rather than on the whole data set. Markov models
are commonly used for predicting the next page access based on the previously
accessed pages. We then use a popularity and similarity based page rank algorithm to make
predictions when ambiguous results are found. PageRank represents how important a
page is on the web: when one page links to another page, it is a vote for the other page, and the
more votes a page receives, the more important the page must be.
Keywords
Web Log Mining, Web Page Access Prediction, K-means Clustering, Markov Model, Page
Rank Algorithm.
1. INTRODUCTION
As the Internet becomes an important part of our life, more attention is paid to
the quality of information and how it is displayed to the user. The research area of this
work is web data analysis and methods for processing such data. This knowledge
can be extracted from web server log files, where all users'
navigational browsing patterns are recorded. Server-side log files provide
information that enables rebuilding the user sessions within a particular web site,
where a session consists of a sequence of web pages viewed by a user within a given
time. Web navigation behavior is helpful in understanding what information
online users demand. Following that, the analyzed results can be used as
knowledge in intelligent online applications, for refining web site maps, in
web-based personalization systems, and for improving search accuracy when seeking
information. However, online navigation behavior grows with each passing day, and
thus extracting information from it intelligently is a difficult problem. Web Usage
Mining (WUM) is the process of extracting knowledge from Web users' access
data by using Data Mining technologies. It can be used for different purposes such
as personalization, business intelligence, system improvement and site
modification.
In this paper, we present a system that focuses on improving the prediction of
web page access. Data preprocessing converts the raw data into the
abstraction necessary for applying the data mining algorithms. We
propose to use clustering techniques to cluster the data sets so that homogeneous
sessions are grouped together. As a result, a more accurate Markov model is built
on each group rather than on the whole data set. The proposed Markov model
is a low-order Markov model, so the state space complexity is kept to a
minimum. Since the accuracy of a low-order Markov model is normally not satisfactory,
we use a popularity and similarity based page rank algorithm to make
predictions when ambiguous results are found.
The rest of this paper is organized as follows: Section 2 describes the theoretical
background on preprocessing techniques, the Markov model and the PageRank
algorithm. Section 3 reviews research advances in web page
access prediction. Section 4 describes the proposed method for predicting
web page accesses from a web log file. Results of an experimental evaluation are reported
in section 5. Finally, section 6 summarizes the paper.
2. BACKGROUND STUDY
2.1 Preprocessing Technique
2.1.1 Data Cleaning
This step removes all data useless for analysis and mining, e.g.,
requests for graphical page content (e.g., jpg and gif images), requests for any other
file that might be embedded in a web page, or even navigation sessions
performed by robots and web spiders. The quality of the final results strongly
depends on the cleaning process: appropriate cleaning of the data set has profound
effects on the performance of web usage mining. The discovered associations or
reported statistics are only useful if the data represented in the server log gives an
accurate picture of the user accesses to the Web site.
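A minimal sketch of such a cleaning filter, not taken from the paper: the file suffixes and the robot hints matched against the user agent are illustrative assumptions.

```python
# Drop requests for embedded graphics and hits from known robots before
# building sessions.
IGNORED_SUFFIXES = (".jpg", ".jpeg", ".gif", ".png", ".css", ".js")
ROBOT_HINTS = ("bot", "spider", "crawler")  # matched against the user agent

def keep_record(url, user_agent=""):
    """Return True if the log record should survive cleaning."""
    url = url.split("?", 1)[0].lower()      # ignore any query string
    if url.endswith(IGNORED_SUFFIXES):
        return False
    return not any(hint in user_agent.lower() for hint in ROBOT_HINTS)

assert keep_record("/shuttle/missions/missions.html")
assert not keep_record("/images/ksclogo-medium.gif")
assert not keep_record("/index.html", "Googlebot/2.1")
```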
is that a user, when navigating the web site, rarely employs more than one browser,
much less more than one OS. This method causes confusion when a visitor
actually does so. The second heuristic states that when a requested web page
is not reachable by a hyperlink from any previously visited page, there is another
user with the same IP address. Such a method introduces similar confusion
when a user types a URL directly or uses a bookmark to reach pages not connected via
links.
PageRank [11] is the most popular link analysis algorithm, used broadly for
assigning numerical weightings to web documents and utilized by web
search engines to rank the retrieved results. The algorithm models
the behaviour of a random surfer, who either chooses an outgoing link from
the page he is currently at, or jumps to a random page after a few clicks.
The Web is treated as a directed graph G = (V, E), where V is the set of
vertices or nodes, i.e., the set of all pages, and E is the set of directed edges
in the graph, i.e., hyperlinks. In PageRank calculation, especially for larger
systems, an iterative method is used; the calculation is implemented in cycles.
In the first cycle, all rank values may be set to a constant value such as 1,
and with each iteration of the calculation the rank values become normalized,
within approximately 50 iterations under a damping factor d = 0.85.
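The iterative calculation described above can be sketched as follows. The graph is a toy example of our own, not data from the paper; ranks start at the constant value 1 and are refined over 50 cycles with damping factor d = 0.85.

```python
def pagerank(graph, d=0.85, iterations=50):
    """graph: {page: [pages it links to]}.  Returns {page: rank}."""
    pages = list(graph)
    rank = {p: 1.0 for p in pages}          # cycle 0: constant start value
    in_links = {p: [q for q in pages if p in graph[q]] for p in pages}
    for _ in range(iterations):
        # each page splits its rank evenly over its outgoing links
        rank = {
            p: (1 - d) + d * sum(rank[q] / len(graph[q]) for q in in_links[p])
            for p in pages
        }
    return rank

r = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
assert r["c"] > r["b"]   # c receives votes from both a and b
```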
3. RELATED WORKS
In recent years, an increasing number of research works have addressed web
usage mining. The authors of [1] describe a prediction system based on fuzzy logic
for forecasting the future occurrence of an event. A
subtractive-clustering based fuzzy system identification method is used to
model a general prediction system that can predict future events by
taking samples of past events. Historical data is used to train the
prediction system, and recent data is given as its input. All data
is specific to the application at hand. The system, developed in Java, was
tested in the stock market, one of the many areas where prediction plays an
important role. Prices of previous market sessions are taken as the potential inputs.
When recent data is given to the trained system, it predicts the possibility of a rise
or a fall along with the next possible data value.
The prediction models in [2] are based on web log data corresponding to
users' behavior. They are used to make predictions for the general user and are not
based on the data of a particular client. This prediction requires discovering a
web user's sequential access patterns and using these patterns to predict the
user's future accesses. The authors then incorporate these predictions into a web
prefetching system in an attempt to enhance performance.
has low state complexity, improved prediction accuracy, and retains the coverage of
all higher-order Markov models.
In [5], the authors propose the use of CRFs (Conditional Random Fields) in the field of
Web page prediction. They treat previous Web users' access sessions as
observation sequences and label these observation sequences to get the
corresponding label sequences; they then use CRFs to train a prediction model
based on these observation and label sequences and to predict the probable
subsequent Web pages for the current users.
supply the most complete and accurate usage data, but their two major drawbacks
are:
- These logs contain sensitive, personal information, so the server
owners usually keep them closed.
- The logs do not record visits to cached pages. Cached pages are
served from the local storage of browsers or proxy servers, not from web
servers.
The NASA web server log file is considered for the purpose of analysis. Web server
logs are plain text (ASCII) files, independent of the server platform.
There are some distinctions between server software, but traditionally there are
four types of server logs: Transfer Log, Agent Log, Error Log and Referrer Log.
The first two types of log files are standard. The Referrer and Agent Logs may or
may not be turned on at the server, or may be added to the Transfer log file to
create an Extended Log File format. A Web log is a file to which the Web server
writes information each time a user requests a resource from that particular site.
Most logs use the Common Log Format.
Similarity of web pages is important for predicting the next page access because
millions of users generally access similar web pages in a particular Web site. The
calculation of the similarity is based on the web page URL. The content of pages is not
considered, and the calculation does not need a tree structure of the
Web site. For example, suppose /shuttle/missions/sts-73/mission-sts-73.html and
/shuttle/missions/sts-71/mission-sts-71.html are two requested pages in the web log.
Each of these two URLs is split on the "/" character and stored in a string array. We then
compute the lengths of the two arrays and assign weights over the longer array: the last
element of the array is given weight 1, the second-to-last element is
given weight 2, the third-to-last weight 3, and so forth, until the first
element of the array is given the weight equal to the length of the array. The similarity between two
URLs is defined as the sum of the weights of the matching substrings divided by
the sum of the total weights.
This similarity measurement satisfies:
(1) 0 <= SURLji <= 1, i.e. the similarity of any pair of web pages is between 0.0
and 1.0;
(2) SURLji = 0 when the two web pages are totally different;
(3) SURLji = 1 when the two web pages are exactly the same.
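Following the weighting rule above, a small sketch (our own hypothetical helper, not code from the paper) reproduces the example's value: "shuttle" and "missions" match, contributing weights 4 and 3 out of a total 4 + 3 + 2 + 1 = 10.

```python
def surl(url_a, url_b):
    """URL-based similarity: matched positional weight / total weight."""
    a = [s for s in url_a.split("/") if s]
    b = [s for s in url_b.split("/") if s]
    if len(b) > len(a):
        a, b = b, a                       # weight the longer array
    weights = list(range(len(a), 0, -1))  # first component gets the top weight
    matched = sum(w for w, x, y in zip(weights, a, b) if x == y)
    return matched / sum(weights)

s = surl("/shuttle/missions/sts-73/mission-sts-73.html",
         "/shuttle/missions/sts-71/mission-sts-71.html")
assert s == 0.7                  # (4 + 3) / (4 + 3 + 2 + 1)
assert surl("/a/b", "/a/b") == 1.0
```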
PSPR_i = (1 − α) · [ w_i (d_i / s_i) / Σ_{p_j ∈ WS} w_j · max(d_m / s_m) ]
         + α · Σ_{p_j ∈ In(p_i)} PSPR_j · [ w_ji (d_ji / s_i) · SURL_ji / Σ_{p_k ∈ Out(p_j)} w_jk · max(d_mn / s_n) · SURL_jk ]    (1)
In equation (1), α is a damping factor, usually α = 0.85. In(p_i) is the set of pages
that link to page p_i, and Out(p_j) is the set of pages that p_j points to. w_ji is the
number of times pages j and i appear consecutively in all user sessions, d_ji is the
duration of the transition, and s_i is the size of the transition's result page. WS is the
web session, and SURL_ji is the similarity of web page j to page i.
The term w_ji (d_ji / s_i) · SURL_ji / Σ_{p_k ∈ Out(p_j)} w_jk · max(d_mn / s_n) · SURL_jk
is the transition popularity, based on the transition frequency and the
average duration calculation for page i. The popularity of a page is calculated based
on the page frequency and the average duration of the page.
By using this equation, the popularity and similarity-based page rank (PSPR) of
every page can be calculated. To make rank calculations faster, the required
intermediate values are stored in the database: the average
duration of pages, the average duration of transitions, page sizes,
page frequencies, transition frequencies, and the
similarity values of pages. The result is used to make the correct decision when
an ambiguous result is found in the Markov model.
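As an illustration only, equation (1) can be iterated like ordinary PageRank once the weights are precomputed. The sketch below is simplified and hypothetical: it assumes the popularity prior pop[i] and the normalized transition weights trans[j][i] (frequency, duration/size and SURL factors, already divided by the denominators of equation (1)) have been stored in advance, as the paper stores its intermediate values.

```python
def pspr(pop, trans, alpha=0.85, iterations=50):
    """pop: {page: popularity prior}; trans: {j: {i: normalised weight}}."""
    pages = list(pop)
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        rank = {
            i: (1 - alpha) * pop[i]
               + alpha * sum(rank[j] * trans[j].get(i, 0.0) for j in pages)
            for i in pages
        }
    return rank

pop = {"a": 0.5, "b": 0.3, "c": 0.2}
trans = {"a": {"b": 0.7, "c": 0.3}, "b": {"c": 1.0}, "c": {"a": 1.0}}
ranks = pspr(pop, trans)
# with stochastic transition rows, the total rank converges towards 1
assert abs(sum(ranks.values()) - 1.0) < 1e-2
```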
5. EXPERIMENTAL EVALUATION
This paper introduces a method that integrates k-means clustering, Markov model
and popularity and similarity-based page rank algorithm in order to improve the
Web page prediction accuracy. In this section, we present experimental results to
evaluate the performance of our system. Overall our experiment has verified the
effectiveness of our proposed techniques in web page access prediction based on a
particular website.
For our experiments, we used the NASA web server data sets. We obtained the web
logs of August 1995 and used the logs from 01/Aug/1995 to 15/Aug/1995 as
the training data set. Six testing data sets were built: the first (D1) uses the web
logs of 16/Aug/1995, and each subsequent set extends the range by one day, so
that D6 covers 16/Aug/1995 to 21/Aug/1995. We filtered out records for embedded
resources (such as *.jpg, *.gif, *.jpeg) and kept only the hits
requesting web pages. When identifying user sessions, we set the session timeout
to 30 minutes, with a minimum of 10 pageviews per session. After filtering the
web session data by preprocessing, the training data set contained 94307 records
and 5574 sessions. Table 1 shows the testing data after the preprocessing phase.
Table 1. Testing data set after preprocessing
                              D1     D2     D3     D4     D5     D6
Records after preprocessing   7965   17804  27400  33054  39006  50019
Sessions                      346    736    1124   1376   1617   2072
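The session identification rules just described (a 30-minute inactivity timeout and at least 10 pageviews per session) can be sketched as follows; the (user, timestamp, url) record format is an assumption of this sketch, not the paper's data layout.

```python
from itertools import groupby

TIMEOUT = 30 * 60        # 30-minute session timeout, in seconds
MIN_PAGEVIEWS = 10       # sessions shorter than this are discarded

def sessionize(records):
    """records: iterable of (user, timestamp_seconds, url) tuples."""
    sessions = []
    for user, hits in groupby(sorted(records), key=lambda r: r[0]):
        current, last_ts = [], None
        for _, ts, url in hits:
            if last_ts is not None and ts - last_ts > TIMEOUT:
                sessions.append(current)   # gap too long: start a new session
                current = []
            current.append(url)
            last_ts = ts
        sessions.append(current)
    return [s for s in sessions if len(s) >= MIN_PAGEVIEWS]

recs = ([("u", i * 60, "/p%d" % i) for i in range(12)]   # 12 steady hits
        + [("v", i * 60, "/x") for i in range(5)])       # too few pageviews
out = sessionize(recs)
assert len(out) == 1 and len(out[0]) == 12
```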
In comparing the predictions with the real page visits, we use two similarity
algorithms that are commonly preferred for finding the similarity of two sets. The
first, called OSim [11, 17, 18], calculates the similarity of
two sets A and B without considering the ordering of their elements,
and is defined as:
OSim(A, B) = |A ∩ B| / n    (2)
As the second similarity metric, we use the KSim algorithm, which applies the
Kendall tau distance [11, 17, 18] to measure the similarity between the next-page
prediction set produced from the training data and the real page visit set on the test data.
The Kendall tau distance is the number of pairwise disagreements between two rankings.
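Both metrics can be sketched directly; exact normalizations of KSim vary in the literature, so treat the version below as one common Kendall tau style formulation (agreeing ordered pairs over all pairs in the union), an assumption rather than the paper's exact definition.

```python
from itertools import combinations

def osim(a, b):
    """Order-insensitive overlap of two top-n rankings of equal length."""
    n = len(a)
    return len(set(a) & set(b)) / n

def ksim(a, b):
    """Fraction of page pairs whose relative order agrees in both rankings."""
    union = set(a) | set(b)
    # pages absent from a ranking are placed after all present ones
    pos_a = {p: a.index(p) if p in a else len(a) for p in union}
    pos_b = {p: b.index(p) if p in b else len(b) for p in union}
    agreed = sum(
        1 for p, q in combinations(union, 2)
        if (pos_a[p] - pos_a[q]) * (pos_b[p] - pos_b[q]) > 0
    )
    total = len(union) * (len(union) - 1) / 2
    return agreed / total

assert osim(["a", "b", "c"], ["a", "c", "d"]) == 2 / 3
assert ksim(["a", "b", "c"], ["a", "b", "c"]) == 1.0
```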
similarity-based page rank based on the 2nd-order Markov model can improve the
accuracy of Web page prediction.
Table 2. Top 3 prediction based on Average OSim and KSim value
     OSim results based   KSim results based   OSim results based   KSim results based
     on 1st order         on 1st order         on 2nd order         on 2nd order
     Markov Model (%)     Markov Model (%)     Markov Model (%)     Markov Model (%)
D1 35.59 53.04 41.46 54.49
D2 38.68 55.92 48.22 59.23
D3 38.08 55.46 49.6 61.1
D4 38.39 55.66 50.22 61.94
D5 39.5 56.6 50.84 62.44
D6 40.53 57.33 51.53 63.2
6. CONCLUSIONS
The method presented in this paper improves Web page access
prediction accuracy by integrating three algorithms: K-means clustering,
the Markov model, and the popularity and similarity-based page rank algorithm.
The OSim and KSim algorithms are used to evaluate the similarity of our
predictions. In our experiments, we observed that in both metrics, PSPR based
on the 2nd-order Markov model outperforms PSPR based on the 1st-order
Markov model in terms of accuracy (OSim and KSim). Higher-order
Markov models result in better prediction accuracy since they look at more of the
previous browsing history. We used the idea of the PageRank algorithm to
improve the prediction accuracy and modified this algorithm in order to
analyze user behavior.
7. ACKNOWLEDGMENTS
My sincere thanks to my supervisor Dr. Ei Ei Chaw, Faculty of Information and
Communication Technology, University of Technology (Yatanarpon Cyber City),
Myanmar for providing me an opportunity to do my research work. I express my
thanks to my Institution namely University of Technology (Yatanarpon Cyber
City) for providing me with a good environment and facilities like internet, books,
computers and all that as my source to complete this research. My heart-felt thanks
to my family, friends and colleagues who have helped me for the completion of
this work.
REFERENCES
[1] Vaidehi .V, Monica .S, Mohamed Sheik Safeer .S, Deepika .M, Sangeetha .S, "A
Prediction System Based on Fuzzy Logic", Proceedings of the World Congress on
Engineering and Computer Science 2008, WCECS 2008, October 22 - 24, 2008, San
Francisco, USA.
[2] Siriporn Chimphlee, Naomie Salim, Mohd Salihin Bin Ngadiman, Witcha Chimphlee,
Surat Srinoy, "Rough Sets Clustering and Markov model for Web Access Prediction",
Proceedings of the Postgraduate Annual Research Seminar 2006, pp. 470-475.
[3] A. Anitha, "A New Web Usage Mining Approach for Next Page Access Prediction",
International Journal of Computer Applications .Vol. 8, No.11, October 2010.
[4] V.V.R.Maheswara Rao, Dr. V. Valli Kumari, "An Efficient Hybrid Successive Markov
Model for Predicting Web User Usage Behavior using Web Usage Mining".
[5] Yong Zhen Guo and Kotagiri Ramamohanarao and Laurence A. F. Park, "Web Page
Prediction Based on Conditional Random Fields".
[6] Ke Yiping, "A Survey on Preprocessing Techniques in Web Usage Mining", The Hong
Kong University of Science and Technology, Dec 2003.
[7] S. Brin, L. Page, 1998. "The anatomy of a large-scale hypertextual Web search
engine", Computer Networks, Vol. 30, No. 1-7, pp. 107-117, Proc. of WWW7
Conference.
[8] F.Khalil, J. Li and H. Wang, 2007. "Integrating markov model with clustering for
predicting web page accesses". Proceedings of the 13th Australasian World Wide Web
Conference (AusWeb 2007), June 30-July 4, Coffs Harbor, Australia, pp. 1-26.
[9] M. Deshpande and G. Karypis. May 2004. Selective markov models for predicting web
page accesses. ACM Trans. Internet Technol., Vol. 4, pp. 163-184.
[10] M. Eirinaki, M. Vazirgiannis, D. Kapogiannis, Web Path Recommendations based on
Page Ranking and Markov Models, WIDM05, November 5, 2005, Bremen, Germany.
[11] M. Eirinaki and M. Vazirgiannis. Nov. 2005. "Usage-based pagerank for web
personalization". In Data Mining, Fifth IEEE International Conference on, pp. 8.
[12] K. R. Suneetha, Dr. R. Krishnamoorthi, "Identifying User Behavior by Analyzing Web
Server Access Log File", IJCSNS International Journal of Computer Science and
Network Security, Vol. 9 No. 4, April 2009.
[13] J. Zhu, "Using Markov Chains for Structural Link Prediction in Adaptive Web Sites"
[14] M. Vazirgiannis, D. Drosos, P. Senellart, A. Vlachou, "Web Page Rank Prediction with
Markov Models", April 21-25, 2008 Beijing, China.
[15] M. Eirinaki, M. Vazirgiannis, D. Kapogiannis, "Web Path Recommendations based on
Page Ranking and Markov Models", WIDM05, November 5, 2005, Bremen, Germany
[16] R. Khanchana, Dr. M. Punithavalli, "Web Page Prediction for Web Personalization: A
Review", Global Journal of Computer Science and Technology, Vol. 11, No. 7, 2011.
[17] Y. Z. Guo, K. Ramamohanarao, and L. Park. Nov. 2007. "Personalized pagerank for
web page prediction based on access time-length and frequency". In Web Intelligence,
IEEE/WIC/ACM International Conference, pp. 687-690.
[18] B. D. Gunel, P. Senkul, "Investigating the Effect of Duration, Page Size and Frequency
on Next Page Recommendation with Page Rank Algorithm", ACM, 2011.
[19] P. Thwe, "Proposed Approach for Web Page Access Prediction Using Popularity and
Similarity based Page Rank Algorithm", International Journal of Science and
Technology Research, Vol. 2, No. 3, March 2013.
M. Ramalingam
Assistant Professor in Information Technology
Gobi Arts & Science College (Autonomous)
Gobichettipalayam
Dr. V. Thiagarasu
Associate Professor of Computer Science
Gobi Arts & Science College (Autonomous)
Gobichettipalayam
ABSTRACT
Biclustering is the simultaneous partitioning of the set of samples and the set of
their attributes into classes. Samples and attributes are classified together when they are
believed to have high relevance to each other. However, the results of applying classic
clustering methods to genes are limited. These limited results are caused by the existence of a
number of experimental conditions where the activity of genes is not correlated. For this
purpose, a number of algorithms that perform simultaneous clustering of the expression matrix
have been proposed. This survey analyzes the most widely used biclustering
techniques and their associated applications in various fields. It presents
a study of several biclustering algorithms proposed by various authors to deal with
gene expression data efficiently. The existing algorithms are analyzed thoroughly to
identify their advantages and limitations. The performance evaluation of the existing
algorithms is carried out to determine the best approach. Then, in order to improve the
performance of the best approach, a novel approach is proposed in this paper.
Keywords
Biclustering, simultaneous clustering, co-clustering, Data Mining, Gene Expression Data,
Gene Selection.
INTRODUCTION
Analyzing variations in expression levels of genes under different
conditions (samples) is significant to recognize the basic complex biological
processes that the genes take part in. In gene expression data analysis,
expression levels of genes in each sample are characterized by a real-valued
data matrix whose rows and columns represent the genes and the samples,
respectively. The objective is to identify genes that have correlated
expression values across a variety of samples [1].
Gene expression matrices have been widely investigated in two dimensions,
that is, the gene dimension and the condition dimension. This corresponds to
the [2]:
Investigation of expression patterns of genes by comparing rows in
the matrix.
Investigation of expression patterns of samples by comparing
columns in the matrix.
However, applying clustering algorithms to gene expression data runs into
an important difficulty: numerous activation patterns are common to a
group of genes only under definite experimental conditions [3]. It is then
highly desirable to move beyond the clustering model [4].
This paper proceeds as follows. In the next section, the background study is
described. Section 3 describes related works in this field.
2. SURVEY
Biclustering, which has been applied intensively in molecular biology
research in recent times, gives a framework for identifying hidden
substructures in large high-dimensional matrices. Tanay et al. [5] defined a
bicluster as a subset of genes that react together upon a subset of conditions.
Biclustering algorithms may have one of two objectives: to find a single
bicluster, or to identify a given number of biclusters.
Cheng and Church's algorithm (CC) [6] defines a bicluster as a submatrix
whose mean squared residue score lies beneath a user-defined threshold. In
order to identify the largest such bicluster in the data, they recommend a
two-phase approach: first, rows and columns are removed from the original
expression matrix until the above constraint is satisfied. Afterward,
previously deleted rows and columns are added back to the resulting
submatrix as long as the bicluster score does not exceed the threshold. This
process is iterated numerous times, with previously found biclusters masked
with random values.
In [7], an improved form of the CC algorithm was proposed, which avoids the
problem of random interference caused by masked biclusters. SAMBA [8]
introduces a graph-theoretic approach to biclustering in combination with a
statistical data model. In this framework, the expression matrix is modeled as
a bipartite graph, a bicluster is defined as a subgraph, and a likelihood score
is used to assess the significance of observed subgraphs. A related
heuristic algorithm, also called SAMBA, aims at identifying highly significant and
distinctive biclusters. In a recent investigation, this approach has been
extended to integrate multiple types of experimental data.
In [9], the Order-Preserving Submatrix (OPSM) algorithm defines a bicluster
as a submatrix that preserves the order of the chosen columns for all of the
selected rows. Equivalently, the expression values of the genes
inside a bicluster induce the same linear ordering across the selected
samples. Based on a stochastic model, the authors implemented a
deterministic algorithm to discover large and statistically significant biclusters.
This idea has been taken up in a recent investigation by [10].
Tang et al. [11] proposed the Interrelated Two-Way Clustering (ITWC)
algorithm, which combines the results of one-way clustering along both
dimensions of the data matrix to generate biclusters. After normalizing the
rows of the data matrix, they calculate the vector-angle cosine between each
row and a pre-defined stable pattern to test whether the row values vary much
across the columns, and remove the rows with little variation. They then use a
correlation coefficient as the similarity measure for the strength of the linear
relationship between two rows or two columns, to carry out two-way clustering.
As this similarity measure is based on the pattern, not on the absolute
magnitude of the spatial vector, it also allows the identification of biclusters
with coherent values represented by the multiplicative model.
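The two similarity notions that ITWC relies on can be sketched as follows. This is a hedged NumPy sketch under my reading of [11] (names are illustrative): the vector-angle cosine against a flat pattern flags low-variation rows for removal, and the Pearson correlation coefficient serves as the two-way clustering similarity, which is invariant under a multiplicative model:

```python
import numpy as np

def cosine_with_flat(row):
    """Vector-angle cosine between a row and the flat pattern (1, ..., 1).
    A value near 1 means the row barely varies across columns and is a
    candidate for removal in the ITWC filtering step."""
    flat = np.ones_like(row)
    return float(row @ flat / (np.linalg.norm(row) * np.linalg.norm(flat)))

def pearson(u, v):
    """Correlation coefficient used as the two-way clustering similarity."""
    return float(np.corrcoef(u, v)[0, 1])
```

Note that pearson(u, 3 * u) equals 1, which is why biclusters following a multiplicative model are still detected.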
The worst-case running-time complexity of BiMax for matrices comprising
disjoint biclusters is O(nmb), and for arbitrary matrices it is of order
O(nmb min{n, m}) (Noureen and Qadir [12]). The main goal in [13] is to find
market segments among tourists who are similar to each other, thereby
allowing a targeted marketing mix to be developed; the data generally used to
segment tourists are described. Small samples and many questions give rise
to serious methodological problems that have usually been addressed by
means of factor-cluster analysis to reduce the dimensionality of the data.
The technique in [14] is based on a force-directed graph where biclusters
are represented as possibly overlapping groups of genes and conditions. In
[15], an expression-pattern-based biclustering approach, CoBi, is introduced for
grouping both positively and negatively regulated genes from microarray
expression data. The regulation pattern and the similarity in degree of
fluctuation are accounted for when computing the similarity between two genes.
Unlike conventional biclustering approaches, which rely on greedy iterative
searches, CoBi uses a BiClust tree that requires a single pass over the entire
dataset to find a set of biologically relevant biclusters. Biclusters determined
from various gene expression datasets by this technique show highly
enriched functional categories. In the MSBE biclustering algorithm [16], the
threshold on the average similarity score is a user-supplied parameter that
allows the user to control the quality of the biclustering results.
In [17], a fuzzy biclustering technique is introduced. It is based on
formulating one-way clustering along the row and column dimensions as
a normalized graph cut problem. The graph cut problem is then solved
by a spectral decomposition, followed by K-means clustering of the
eigenvectors. The biclustering of the row and column dimensions is
accomplished by a three-stage procedure. Initially, the original data matrix
undergoes one-way clustering in the row dimension to obtain k clusters.
Next, a new pattern matrix is computed in which each row is the average
of the rows that belong to the same cluster in the original data matrix.
This new matrix then undergoes the same one-way clustering in the column
dimension to obtain k clusters. Lastly, a table of fuzzy relation coefficients
that relate each of the k row clusters to each of the k column clusters is
computed. By computing the new data matrix from the result of the
first-stage clustering, the fuzzy biclustering algorithm attains a biclustering
of the original data matrix.
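The second stage of this procedure, building the pattern matrix of cluster averages, can be sketched as follows. This is an assumed interpretation of the description of [17]; the function name and interface are illustrative:

```python
import numpy as np

def cluster_mean_matrix(X, labels, k):
    """Stage-2 pattern matrix: row c is the average of the rows of X that
    the first-stage one-way clustering assigned to row-cluster c."""
    return np.vstack([X[labels == c].mean(axis=0) for c in range(k)])
```

The resulting k-row matrix is what then undergoes one-way clustering in the column dimension.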
3. CONCLUSIONS
A comprehensive survey of the models, methods, and applications developed in
the field of biclustering has been presented and analyzed. The list of
applications presented is by no means comprehensive; an all-inclusive list
of potential applications would be prohibitively long. The list of
available algorithms is also very extensive, and many combinations of
ideas can be tailored to obtain new algorithms potentially more
effective in particular applications. The modification and validation of
biclustering methods by comparison with known biological data is surely
one of the most important open issues. Another interesting area is the
application of robust biclustering techniques to new and existing application
domains.
REFERENCES
[1] Doruk Bozdag, Ashwin S. Kumar and Umit V. Catalyurek, Comparative Analysis of
Biclustering Algorithms, 2010.
[2] Sara C. Madeira and Arlindo L. Oliveira, Biclustering Algorithms for Biological Data
Analysis: A Survey, INESC-ID TEC. REP. 1/2004, JAN 2004.
[3] Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulus, and Prabhakar Raghavan.
Automatic subspace clustering of high dimensional data for data mining applications.
In Proceedings of the ACM/SIGMOD International Conference on Management of
Data, pp. 94-105, 1998.
[4] Amir Ben-Dor, Benny Chor, Richard Karp, and Zohar Yakhini. Discovering local
structure in gene expression data: The orderpreserving submatrix problem. In
Proceedings of the 6th International Conference on Computational Biology
(RECOMB'02), pp. 49-57, 2002.
[5] A. Tanay, R. Sharan and R. Shamir,Discovering statistically significant biclusters in
gene expression data. Bioinformatics, Vol. 18, pp. 136-144, 2002.
[6] Y. Cheng and G. M. Church. Biclustering of expression data. In Proc. of the
International Conference on Intelligent Systems for Molecular Biology, pp. 93-103,
2000.
[7] Yang, J., Wang, H., Wang, W., Yu, P.S., (2003) Enhanced Biclustering on Expression
Data. BIBE 2003, pp. 321-327.
[8] Tanay, A., Sharan, R., Kupiec, M., Shamir, R., (2004) Revealing Modularity and
Organization in the Yeast Molecular Network by Integrated Analysis of Highly
Heterogeneous Genomewide Data, PNAS, pp. 101-9, 2981-2986.
[9] Ben-Dor, A., Chor, B., Karp, R., Yakhini, Z., (2002) Discovering Local Structure in
Gene Expression Data: The Order-Preserving Sub-Matrix Problem, Proceedings of the
6th Annual International Conference on Computational Biology, pp. 49-57.
[10] Liu, J., Wang, W., (2003) OP-Clusters: Clustering by tendency in high dimensional
space, Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM),
pp. 187-194.
[11] Chun Tang, Li Zhang, Idon Zhang, and Murali Ramanathan. Interrelated two-way
clustering: an unsupervised approach for gene expression data analysis. In Proceedings
of the 2nd IEEE International Symposium on Bioinformatics and Bioengineering, pp.
4148, 2001.
[12] Noureen, N., Qadir, M.A., BiSim: A Simple and Efficient Biclustering Algorithm,
Soft Computing and Pattern Recognition, SOCPAR '09. International Conference of
2009, pp. 1-6.
[13] Sara Dolnicar, Sebastian Kaiser, Katie Lazarevski, Friedrich Leisch, Biclustering
Overcoming Data Dimensionality Problems in Market Segmentation, Journal of Travel
Research.Vol. 51 No. 1 41-49, 2012.
[14] Rodrigo Santamaría, Roberto Therón and Luis Quintales, BicOverlapper: A tool
for bicluster visualization, Bioinformatics, Vol. 24, No. 9, 2008, pp. 1212-1213.
[15] Swarup Roy, Dhruba K. Bhattacharyya, Jugal K. Kalita, CoBi: Pattern Based Co-
Regulated Biclustering of Gene Expression Data, preprint submitted to Elsevier, March
9, 2013.
[16] Liu X, Wang L: Computing the maximum similarity bi-clusters of gene expression
data. Bioinformatics 2007, Vol. 23, No. 1, pp. 50-56.
[17] Koutsonikola VA, Vakali A. A Fuzzy Bi-clustering Approach to Correlate Web Users
and Pages. Int. J. Knowledge and Web Intelligence 2009, Vol. 1 No.1-2, pp. 3-23.
Bindu Elias
Dept. of Electrical and Electronics Engineering
MA College of Engineering
Kothamangalam, Kerala, India
VPS Naidu
Multi Sensor Data Fusion Lab
CSIR National Aerospace Laboratories
Bangalore-17, India
ABSTRACT
Image fusion is done for integrating images obtained from different sensors, which outputs
a single image containing all relevant data from the source images. Five different image
fusion algorithms, SWT, Fuzzy, Neuro-Fuzzy, Fuzzylet and Neuro-Fuzzylet, have
been discussed and tested with two datasets (mono-spectral and multi-spectral). The results
are compared using fusion quality performance evaluation metrics. It was observed that
Neuro-Fuzzy gives better results than Fuzzy and SWT. Fuzzylet and Neuro-Fuzzylet were
obtained by combining Fuzzy and Neuro-Fuzzy respectively with SWT. It was observed
that Fuzzylet gives better results for mono-spectral images and on the other hand, Neuro-
Fuzzylet had given better results for multi-spectral images at the cost of execution time.
Keywords
Image fusion, Fuzzy logic, image processing, Neuro-fuzzy.
1. INTRODUCTION
For Intelligent systems, integration of information from different sensors
plays a great role. Image fusion is done for integrating images obtained from
different sensors, which outputs a single image containing all relevant data
from the source images and provides a human/machine perceivable result
with more useful complete information. Image Fusion has got great
importance in many applications such as object detection, automatic target
recognition, remote sensing, computer vision, flight vision, robotics etc.
This paper deals with a comparison of certain pixel level image fusion
techniques based on SWT, Fuzzy and Neuro-Fuzzy.
Many methods have been proposed and implemented for image fusion [1].
Wavelet transform based image fusion has the merits of multi-scale and
multi-resolution. In [2], an approach of multi-sensor image fusion using
wavelet transform and principal component analysis (PCA) was proposed
and comparison of image fusion with different techniques based on fusion
quality performance metrics is done. Wavelets have a disadvantage of shift
variance which results in loss of edge information in fused image [3].
Stationary Wavelet Transform (SWT) solves this problem which is shift
invariant [4]. Since the concept of image fusion is not certain and crisp,
Fuzzy logic and Neuro-Fuzzy logic are implemented for image fusion in
order to incorporate uncertainty into the images [5]. Sensor fusion can be
achieved with the help of neuro-fuzzy or fuzzy systems. The major difference between
neuro-fuzzy and fuzzy systems is that a neuro-fuzzy system can be trained
using the input data obtained from the sensors. The basic concept is to
associate the given sensory inputs with some decision outputs. After
developing the system, another group of input data is used to evaluate the
performance of the system. Algorithms for image fusion using Fuzzy and
Neuro-Fuzzy approaches are introduced in [6]. In [7], SWT with higher
level of decomposition is introduced and Fuzzy logic is incorporated into it
to form a novel algorithm called Fuzzylet.
This work is done as an extension to the work done in [7]. In this paper
Neuro-fuzzy based image fusion is tested and compared with SWT and
Fuzzy logic. An algorithm is formed in which Neuro-fuzzy is incorporated
into SWT which is named as Neuro-Fuzzylet and compared with Fuzzylet.
All the comparisons are done by evaluating Fusion Quality Performance
Metrics and results are verified with different sets of images. In this paper, it
is assumed that images to be fused are already registered.
2.1 Neuro-Fuzzy Approach to Image Fusion
A Neural Network (NN) is a network which stores experimental
knowledge and uses it on test data. Neuro-Fuzzy is a combination of an
Artificial Neural Network (ANN) and Fuzzy logic. Using this method we
can train the system with an input dataset and a desired output; after training,
the system can be used on any other set of input data. A Neuro-fuzzy
system is a fuzzy system that is trained by one of the neural network
learning algorithms, with the system parameters modified automatically
according to the training data. The Neuro-Fuzzy system is implemented
using ANFIS, which stands for Adaptive Neural Fuzzy Inference System.
The Fuzzy Inference System (FIS) is a model that does the following
mappings:
A set of input characteristics to input membership functions
Input membership functions to rules
Rules to a set of output characteristics
Output characteristics to output membership functions and
The output membership function to a single-valued output
(Figure 1: the inputs I1 and I2 are fed to the Fuzzy Inference System, which produces the fused image If.)
The ANFIS training structure obtained from Matlab ANFIS editor toolbox
for two inputs and three membership functions is as shown in Fig. 2.
Figure 2. ANFIS training structure obtained for two inputs and three membership
functions
For image fusion, the pixel values of input images and reference (desired)
image are given to the ANFIS for training the FIS, so that the system will
produce a fused image which is closer to the reference image from the input
images. The algorithm for image fusion using Neuro-Fuzzy logic (abbreviated
as NF(I1, I2)) is as follows:
Step 1: Read the images (I1 and I2) to be fused into two variables.
Step 2: Obtain the training data, a matrix with three columns (two
columns of input data and one column of output data).
Step 3: Obtain the check data, a matrix of the pixel values of the two
input images in column format.
Step 4: Decide the number and type of membership functions for both
input images.
Step 5: Generate a FIS structure from the training data and train the FIS.
Step 6: Provide the check data to the FIS structure for processing and obtain
the output image in column format.
Step 7: Convert the column form into matrix form to get the fused
image If.
In the case of a dataset without a reference output, the third column (output) of
the training data is given as the maximum of the absolute pixel values of the
input images.
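Steps 2 and 3, together with the no-reference rule just described, can be sketched in NumPy as follows (a hedged sketch of the data layout only, not of the ANFIS training itself; names are illustrative):

```python
import numpy as np

def build_training_data(i1, i2, ref=None):
    """Steps 2 and 3 of the neuro-fuzzy fusion algorithm: flatten the
    registered input images into columns and append the desired output
    column. When no reference image is available, the output column is
    the element-wise maximum of absolute pixel values, as in the text."""
    c1 = i1.ravel().astype(float)
    c2 = i2.ravel().astype(float)
    if ref is not None:
        out = ref.ravel().astype(float)
    else:
        out = np.maximum(np.abs(c1), np.abs(c2))
    train = np.column_stack([c1, c2, out])  # 3-column training matrix
    check = np.column_stack([c1, c2])       # 2-column check data
    return train, check
```

The fused image is then obtained by reshaping the trained system's column output back to the image size (Step 7).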
(Block diagram: the registered input images I1 and I2 are decomposed by SWT into wavelet coefficient maps; the coefficient maps are fused by the Fuzzy Inference System / ANFIS into a fused wavelet coefficient map, and the ISWT then yields the fused image If.)
Each input image (here shown for I2) is decomposed into an approximation
subband A_K and detail subbands H_k, V_k, D_k for k = 1, 2, ..., K. The fused
image If can be obtained by applying the ISWT to the fused wavelet
coefficient maps.
RMSE = sqrt( (1/MN) Σ_{i=1..M} Σ_{j=1..N} ( I_r(i,j) - I_f(i,j) )² )   (6)
where I_f(i,j) and I_r(i,j) are the gray values of the fused image and the
reference image, respectively, at index (i,j). For better quality
images, the root mean square error should be low.
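Eq. 6 translates directly into a few lines of NumPy (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def rmse(i_ref, i_fused):
    """Root mean square error between reference and fused images (Eq. 6)."""
    d = i_ref.astype(float) - i_fused.astype(float)
    return float(np.sqrt(np.mean(d ** 2)))
```

Identical images give an RMSE of 0, consistent with "lower is better".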
where h/l is the resolution ratio, m(b) is the mean of the b-th band, and B is
the number of bands.
Structural content (SC) is computed as:
SC = ( Σ_{i=1..M} Σ_{j=1..N} I_f(i,j)² ) / ( Σ_{i=1..M} Σ_{j=1..N} I_r(i,j)² )   (9)
1. Entropy (H)
Entropy is used to measure the information content of an image.
Entropy is sensitive to noise and other unwanted rapid fluctuations.
An image with high information content would have high entropy.
Entropy is defined as:
H = -sum( p log2 p )   (11)
where p contains the normalized histogram counts returned from the Matlab
function imhist.
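Eq. 11 can be sketched with NumPy in place of imhist (an illustrative sketch; bin count and range are assumptions for 8-bit images):

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy H = -sum(p * log2(p)) of the image histogram
    (Eq. 11), with p the normalized histogram counts."""
    counts, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute 0 (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))
```

A constant image has entropy 0; an image with two equally frequent gray levels has entropy 1 bit.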
2. Mean (m)
Mean gives the mean pixel value, which is formulated as:
m = (1/MN) Σ_{i=1..M} Σ_{j=1..N} I_f(i,j)   (12)
where I_f(i,j) is the gray value of the fused image at index (i,j), and M×N
is the size of the image.
3. Standard Deviation (SD)
SD = sqrt( (1/MN) Σ_{i=1..M} Σ_{j=1..N} ( I_f(i,j) - m )² )   (13)
4. Spatial Frequency (SF)
Row Frequency (RF):
RF = sqrt( (1/MN) Σ_{i=1..M} Σ_{j=2..N} ( I_f(i,j) - I_f(i,j-1) )² )   (14)
Column Frequency (CF):
CF = sqrt( (1/MN) Σ_{j=1..N} Σ_{i=2..M} ( I_f(i,j) - I_f(i-1,j) )² )   (15)
Spatial Frequency (SF): SF = sqrt( RF² + CF² )   (16)
5. Cross Entropy (CE)
Cross-entropy evaluates the similarity in information content
between input images ( I 1 & I 2 ) and fused image. Better fusion
result would have low cross entropy. Cross entropy can be calculated
as:
CE(I1, I2; If) = ( CE(I1; If) + CE(I2; If) ) / 2   (17)
where CE(I1; If) = Σ_i p_i1 log2( p_i1 / p_if ),
CE(I2; If) = Σ_i p_i2 log2( p_i2 / p_if ),
and p_i is the normalized histogram of image I.
6. Fusion Factor (FF)
The fusion factor of the two input images (I1 and I2) and the fused image
(If) is given by:
FF = I1f + I2f   (18)
where I1f = Σ p_i1,if log2( p_i1,if / ( p_i1 p_if ) ) and
I2f = Σ p_i2,if log2( p_i2,if / ( p_i2 p_if ) ) are the mutual information
between each input image and the fused image; p_i1 and p_if are the
probability density functions of the individual images, and p_i1,if is the
joint probability density function of both images together.
FF indicates the amount of information present in the fused image from
both input images. Hence, a higher value of FF indicates good fusion
quality. However, it does not indicate whether the information is fused
symmetrically; for that, another metric called fusion symmetry is used.
7. Fusion Symmetry(FS)
Fusion symmetry indicates how symmetrically the information from
input images is fused to obtain the fused image. It is given by:
FS = abs( I1f / ( I1f + I2f ) - 0.5 )   (19)
Since this metric is a symmetry factor, from the equation it is clear
that its value should be as low as possible so that the fused image
would contain the features of both input images. Fusion quality
depends on degree of Fusion symmetry.
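Eqs. 18 and 19 can be sketched together, estimating the mutual information terms I1f and I2f from a joint histogram (an illustrative NumPy sketch; the bin count is an assumption, and the paper does not specify an estimator):

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Mutual information I(A;B) estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pab = joint / joint.sum()
    pa = pab.sum(axis=1, keepdims=True)  # marginal of a
    pb = pab.sum(axis=0, keepdims=True)  # marginal of b
    nz = pab > 0
    return float(np.sum(pab[nz] * np.log2(pab[nz] / (pa @ pb)[nz])))

def fusion_factor(i1, i2, fused):
    """FF = I1f + I2f (Eq. 18)."""
    return mutual_information(i1, fused) + mutual_information(i2, fused)

def fusion_symmetry(i1, i2, fused):
    """FS = |I1f / (I1f + I2f) - 0.5| (Eq. 19); lower is more symmetric."""
    m1 = mutual_information(i1, fused)
    m2 = mutual_information(i2, fused)
    return abs(m1 / (m1 + m2) - 0.5)
```

When the fused image carries equal information from both inputs, FS is 0, the ideal value.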
8. Fusion Quality Index (FQI)
FQI computes a quality index over a window for a given source image and
fused image.
The range of this metric is 0 to 1, where 1 indicates that the fused image
contains all the information from the source images; a better fusion
therefore has an FQI close to 1.
9. Execution Time (Et)
It gives the time taken to execute the algorithm.
The fusion techniques are tested one by one on Dataset-1 in Matlab. In SWT
algorithm, it is observed that fusion quality increases with the increase in
levels of decomposition at the cost of execution time and it is found out that
fusion results with 4 decomposition levels of SWT gives the better results
[7]. In Fuzzy logic based algorithm, Sugeno FIS with 5 membership
function had given better results. Fuzzylet algorithm is formed by
combining SWT with 4 decomposition levels and Fuzzy with 5 membership
functions [7]. ANFIS training is done to the FIS to get Neuro-fuzzy
algorithm. Here also the number of membership functions can be varied.
The performance of image fusion using 3 and 5 membership functions with
ANFIS is tabulated in Table-1. From the table, it is observed that there is no
improvement in the evaluation indices on increasing the number of
membership functions, while the execution time increases. So, ANFIS with
3 membership functions is selected for
evaluation. For formulating Neuro-Fuzzylet, the fuzzy function is replaced
with Neuro-fuzzy function in the Fuzzylet algorithm. The performance
metrics obtained for different methods is tabulated in Table-2 for
comparison.
Table-2 Comparison of the Performance metrics obtained from five image fusion
techniques for Dataset-1
Algorithm Performance evaluation metrics
Entropy RMSE PSNR SD ERGAS SF SC CE FF FQI FS Et(sec.)
SWT 3.89 0.007 69.944 0.195 0.744 0.066 1.002 4.215 3.378 0.811 0.016 0.826
Fuzzy 3.578 0.031 63.142 0.195 3.560 0.046 0.981 5.228 3.358 0.771 0.009 0.455
Neurofuzzy 3.578 0.016 66.542 0.199 1.828 0.067 1.003 4.682 3.376 0.816 0.015 0.292
Fuzzylet 4.061 0.005 71.062 0.198 0.224 0.068 1.000 3.255 3.389 0.882 0.013 3.324
Neurofuzzylet 3.912 0.006 70.165 0.199 0.771 0.066 1.002 3.626 3.379 0.848 0.017 3.212
From the table it is clear that Neuro-Fuzzy gives better results than fuzzy
(see values shown in red). But when it is combined with SWT, fuzzy gives
better results. So out of the five algorithms, Fuzzylet gives best fusion
results (see bold values) for Dataset-1. The fused and error images for all the
algorithms are given in Fig. 6 and Fig. 7, respectively.
Fig. 6 Fused image using SWT, Fuzzy, Neuro-Fuzzy, Fuzzylet and Neuro-Fuzzylet
respectively for Dataset-1
Fig. 7 Error image using SWT, Fuzzy, Neuro-Fuzzy, Fuzzylet and Neuro-Fuzzylet
respectively for Dataset-1
B. Dataset-2
Dataset-2 is a multispectral dataset consisting of LLTV (I1) and FLIR (I2)
images as inputs, as shown in Fig. 8. A reference image is not available for
this dataset; hence the evaluation metrics explained in section 3.B are used
for the comparison.
Human eye is sensitive to a limited range of the electromagnetic spectrum as
well as to low light intensity. To obtain data that cannot be sensed by the
eye, one can use sensor data such as IR sensors or image intensifier night
time sensors. The human observer may use data from multiple sensors. For
example, using the visual channel as well as the IR channel can substantially
improve the ability to detect a target. This can be observed in the input
images shown in Fig.8. In the LLTV image, the bushes, trees etc are more
visible while in FLIR image, the roads are more visible. The fused image
should render the necessary features of both images.
Fused images using all the five algorithms are shown in Fig. 9. It is
observed that in SWT result, all the features of both input images are visible
but with poor clarity. Rendering of land texture, visual quality of image, etc
are poor. In Fuzzy and Neuro-Fuzzy, it is observed that IR features are
prominent. Its rendering quality is poor with dark texture and over enhanced
view of elements like bushes, trees, etc.
From the table it is clear that Neuro-Fuzzy gives better results than Fuzzy
and SWT (see values shown in red). When combined with SWT, Neuro-Fuzzy
gives better results. So out of the five algorithms, Neuro-Fuzzylet
gives the best fusion results (see bold values) for multispectral images.
Fig. 9 Fused image using SWT, Fuzzy, Neuro-Fuzzy, Fuzzylet and Neuro-Fuzzylet
respectively for Dataset-2
5. CONCLUSION
Five different image fusion algorithms, SWT, fuzzy, Neuro-Fuzzy, Fuzzylet
and Neuro-Fuzzylet algorithms were discussed and tested with two datasets
(monospectral and multispectral). The results were compared using fusion
quality performance evaluation metrics. It was observed that Neuro-Fuzzy
gives better results than Fuzzy and SWT. Fuzzylet and Neuro-Fuzzylet were
obtained by combining Fuzzy and Neuro-Fuzzy respectively with SWT. It
was observed that Fuzzylet gives better results for monospectral images and
on the other hand, Neuro-Fuzzylet had given better results for multispectral
images at the cost of execution time. It is hoped that the proposed algorithm
can be extended for real time and color images.
6. REFERENCES
[1] Yanfen Guo, Mingyuan Xie, Ling Yang, An Adaptive Image Fusion Method
Based on Local Statistical Feature of Wavelet Coefficients 978-1-4244-5273-6/9
2009 IEEE.
[2] VPS Naidu and J.R. Raol, Pixel-level Image Fusion using Wavelets and Principal
Component Analysis, Defence Science Journal, Vol. 58, No. 3, May 2008, pp. 338-352.
[3] Andrew. P. Bradley, Shift-invariance in the Discrete Wavelet Transform, in Proc.
VIIth Digital Image Computing: Techniques and Applications, Dec. 2003, Sydney.
[4] Pusit Borwonwatanadelok, Wirat Rattanapitakand Somkait Udomhunsakul, Multi-
Focus Image Fusion based on Stationary Wavelet Transform and extended Spatial
Frequency Measurement, International Conference on Electronic Computer
Technology, 2009, pp. 77-81.
[5] R. Maruthi and K. Sankarasubramanian, Pixel Level Multifocus Image Fusion Based
on Fuzzy Logic Approach, Asian Journal of Information Technology 7(4): 168-171,
2008 ISSN: 1682-3915.
[6] Harpreet Singh, Jyoti Raj and Gulsheen Kaur, Image Fusion using Fuzzy Logic and
Applications, Budapest Hungary, 25-29 July. 2004.
[7] Swathy Nair, Bindu Elias, VPS Naidu, Pixel Level Image Fusion using Fuzzylet
Fusion Algorithm, in International Journal Of Advanced Research in Electrical
Electronics And Instrumentation Engineering, ISSN 2278-8875, Dec 2013.
[8] www.mathworks.in/matlabcentral/fileexchange/authors/104729. Accessed on
21/2/2014.
A Comparative Analysis on
Visualization of Microarray Gene
Expression Data
Poornima. S
PG Scholar, Department of Information Technology
Sona College of Technology
Salem, India
ABSTRACT
Visualization techniques help in the easy analysis of data. However, visualization of
biological data is one of the most challenging processes, and visualization of the
computed clustered and biclustered data still remains an open issue. Clustering and
biclustering techniques are the popular methods for classifying gene
expression data. There is no standard visualization technique for biclustered
data, and visualization of multiple biclusters is very hard to implement because of the
overlapping property. Here, we analyze the merits and demerits of various
visualization techniques and visualization tools, and compare and provide a
detailed study of each technique. Finally, a conclusion on overcoming the common
challenges in the visualization of microarray gene expression data is provided.
Keywords
Visualization, microarray gene expression data, clusters, biclusters.
1. INTRODUCTION
Visualization technique is the study of the visual representation of data.
Visualizing gene expression data is a challenging one. The most common
and efficient method for analyzing gene expression data is clustering that
groups together genes with similar expression profiles. We have many
standard visualization techniques for gene clustering. But it is not the case
for the gene biclustering. In gene expression data, a bicluster is a subset of
the genes exhibiting consistent patterns over a subset of the conditions.
Biclustering techniques group the genes under a certain subset of conditions.
At the same time, a gene or condition can be in more than one bicluster
called overlapping, but in clustering a gene or condition is usually assigned
to a unique cluster. The outputs of biclustering algorithms provide the basis
for better understanding of the biological process underlying the data.
However, to provide a clear knowledge about the biclustered results a better
and efficient visualization technique is required. Here, we have analyzed
some visualization techniques of both clustering tools and biclustering tools.
This paper proceeds as follows. In the next section, the background study is
described. Section 3 describes related works in this field.
2.1.2 Integration of Clustering and Visualization
Gunther H. Weber et al proposed [12], where both clustering and
visualization methods are involved in the analysis of gene expression.
Brushing and linking is the visualization technique used. In physical views,
colour and height are used for visualizing spatial gene expression pattern
clusters. In abstract views, positions are ignored and expression levels for
multiple genes in the cluster are plotted against each other using
scatter plots or parallel coordinates. Colour mapping plays the major part.
The approach combines different visualization methods to overcome the
shortcomings of single techniques, and it provides very high interactivity.
clustering visualization. In clustering visualization there is hierarchical
clustering, where an N*N matrix is the input and D is the pairwise distance;
a depth-first algorithm is implemented for displaying the results of
hierarchical clustering. In self-organising maps the clusters are represented
as a Cartesian graph, and in the k-means algorithm clusters are represented
as groups of spheres with distinct colors.
3.1.2 Bicluster Viewer
3.1.3 Bicoverlapper
Athanasios Theocharidis et al designed a tool [13] for various analyses and
visualization of biological processes. It supports both unweighted and
weighted graphs together with edge annotation of pairwise relationships
[13]. The Fruchterman-Reingold layout algorithm is used for the visualization.
Various colour schemes make the network more informative, and the clusters
can be easily visualized. The Markov chain clustering algorithm is designed
specifically for the clustering of simple or weighted graphs [13]. The tool has
very high user interaction; the UI functions supported are zooming,
scaling, rotation, translation, and selection of the graph. A landscape plot is
used as an alternative representation of the dendrogram. The subtrees at a given
internal node are reordered so that the larger tree appears on the left, and
finally a histogram is created based on the joining events corresponding to
the subsequent gene pairs.
3.1.5 Bicat
Simon Barkow et al proposed in [5] a software tool where both clustering and
biclustering techniques are used for the visualization of gene expression
data. Many biclustering and clustering algorithms are provided, and a high
level of user interaction is supported. The expression matrix is displayed as a
heatmap. Biclusters can be visualized as heatmaps and as gene expression
profiles. The heatmap is rearranged in such a way that the genes and
conditions that define the bicluster under consideration appear in the upper
left corner of the map. Alternatively, the expression view of a bicluster
displays the profiles of the genes that are grouped within it. For
different conditions, different colour lines are drawn between the gene
pairs.
3.1.7 Jexpress
3.1.8 clusterMaker
3.1.9 EUREKA-DMA
difference between the computed results. This heat map representation is
helpful to the users with high degree of knowledge about the microarray
gene expression data.
6. BiVisu [6]. Type: Bicluster. Visualization: Heatmap and parallel coordinates. Interactivity: High. Merits: Selection, zooming, and comparison features are available. Demerits: Simultaneous navigation is not possible.
7. Bicluster Viewer [2]. Type: Bicluster. Visualization: Heatmap and parallel coordinates. Interactivity: Medium. Merits: Avoids visual clutter; zooming is possible. Demerits: Continuous duplication and reordering of rows and columns may lead to the loss of originality.
8. BicOverlapper [13]. Type: Bicluster in 2D. Visualization: Force-directed graph for the overlapper, heatmap, parallel coordinates. Interactivity: Very high. Merits: Simultaneous visualization of all the biclusters in a single window; a high level of user interaction is possible. Demerits: The heatmap and parallel coordinates are not good enough in the UI.
9. EUREKA-DMA. Type: Cluster in 2D. Visualization: Heat map. Interactivity: High. Merits: Apart from clustering, visualization techniques are available for other purposes. Demerits: Very basic tool for cluster visualization.
4. Conclusion
According to this study, visualization of biclustered gene expression data in 3D
provides an efficient way of analyzing the genes and conditions present in all
the biclusters, but it is a challenging process. Heatmaps and parallel
coordinates are efficient and are often used for the visualization of gene
clusters, but for gene biclusters there is no standard visualization technique,
because the biclusters present in a microarray gene expression data set cannot
be visualized simultaneously and must instead be analysed one by one.
BicOverlapper provides an efficient way of visualizing gene biclusters. It is
therefore better to visualize gene biclusters in 3D, so that all the biclusters
can be visualized and analyzed in a single plane with a higher level of user
interaction.
REFERENCES
[1] Leishi Zhang, Xiaohui Liu and Weiguo Sheng, 3D Visualization of Gene Clusters, Computer Vision and Graphics, Computational Imaging and Vision, Vol. 32, pp. 349-354, 2006.
[2] Julian Heinrich, Robert Seifert, Michael Burch and Daniel Weiskopf, BiCluster Viewer: A Visualization Tool for Analyzing Gene Expression Data, Advances in Visual Computing, Lecture Notes in Computer Science, Vol. 6938, pp. 641-652, 2011.
[3] Rodrigo Santamaria, Roberto Theron and Luis Quintales, A Visual Analytics Approach for Understanding Biclustering Results from Microarray Data, BMC Bioinformatics, 9 (247), 2008.
[4] K.O. Cheng, N.F. Law, W.C. Siu and A.W.C. Liew, Biclusters Visualization and Detection Using Parallel Coordinate Plots, Proceedings of the International Symposium on Computational Models for Life Sciences, American Institute of Physics, 2007.
[5] Simon Barkow, Stefan Bleuler, Amela Prelic, Philip Zimmermann and Eckart Zitzler, BicAT: A Biclustering Analysis Toolbox, Bioinformatics, Vol. 22, No. 10, pp. 1282-1283, 2006.
[6] K.O. Cheng, N.F. Law, W.C. Siu and T. H., BiVisu: Software Tool for Bicluster Detection and Visualization, Bioinformatics, Vol. 23, No. 17, pp. 2342-2344, 2007.
[7] Sebastian Kaiser and Friedrich Leisch, A Toolbox for Bicluster Analysis in R, in Compstat 2008: Proceedings in Computational Statistics, 2008.
[8] Gregory A. Grothaus, Adeel Mufti and T.M. Murali, Automatic Layout and Visualization of Biclusters, Algorithms for Molecular Biology, 2006.
[9] Yonggao Yang and Jim X. Chen, Gene Expression Clustering and 3D Visualization, Computing in Science and Engineering, Vol. 5, Issue 5, 2003.
[10] B. Dysvik and I. Jonassen, J-Express: Exploring Gene Expression Data Using Java, Bioinformatics, Vol. 17, Issue 4, pp. 369-370, 2001.
[11] John H. Morris, Leonard Apeltsin, Aaron M. Newman, Jan Baumbach, Tobias Wittkop, Gang Su, Gary D. Bader and Thomas E. Ferrin, clusterMaker: A Multi-Algorithm Clustering Plugin for Cytoscape, BMC Bioinformatics, 2011.
[12] Oliver Rubel, Gunther H. Weber, Min-Yu Huang, E. Wes Bethel, Mark D. Biggin, Charless C. Fowlkes, Cris L. Luengo Hendriks, Soile V.E. Keranen, Michael B. Eisen, David W. Knowles, Jitendra Malik, Hans Hagen and Bernd Hamann, Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data, IEEE/ACM Transactions on Computational Biology and Bioinformatics, pp. 64-79, 2010.
[13] Athanasios Theocharidis, Stijn van Dongen, Anton J. Enright and Tom C. Freeman, Network Visualization and Analysis of Gene Expression Data Using BioLayout Express 3D, Nature Protocols, Vol. 4, pp. 1535-1550, 2009.
[14] Ashraf S. Hussein, Analysis and Visualization of Gene Expressions and Protein Structures, 2008.
[15] O. Rubel, G.H. Weber, M.-Y. Huang, E.W. Bethel, S.V.E. Keranen, C.C. Fowlkes, C.L. Luengo Hendriks, Angela H. DePace, L. Simirenko, M.B. Eisen, M.D. Biggin, H. Hagen, J. Malik, D.W. Knowles and B. Hamann, PointCloudXplore 2: Visual Exploration of 3D Gene Expression, Journal of Software, Vol. 3, 2008.
[16] Sagi Abelson, Eureka-DMA: An Easy-to-Operate Graphical User Interface for Fast Comprehensive Investigation and Analysis of DNA Microarray Data, BMC Bioinformatics, 2014.
[17] Georgios A. Pavlopoulos, Anna-Lynn Wegener and Reinhard Schneider, A Survey of Visualization Tools for Biological Network Analysis, BioData Mining, 2008.
[18] Tangirala Venkateswara Prasad and Syed Ismail Ahson, A Survey of Visualization of Microarray Gene Expression Data, Bioinformation, pp. 141-145, 2006.
ABSTRACT
A current-mode square-root domain filter based on a new geometric-mean circuit is
presented. The circuit is a tunable first-order low-pass filter that works at low
supply voltage and features low nonlinearity. It employs MOSFETs that operate in
both the strongly inverted saturation and triode regions. Simulation results from
HSPICE confirm the validity of the proposed design technique and show the high
performance of the proposed circuit.
Keywords
Current-mode, Geometric-mean, squarer/divider, Companding, Filter.
1. INTRODUCTION
The companding (compressing and expanding) process, as an attractive technique in
analog circuit design, has drawn the attention of many researchers. In a filter
employing the companding technique, the circuit is internally nonlinear and the
dynamic range (DR) of its signal differs at various points of the signal path,
yet the resulting input-output current relationship is linear. The main advantage
of these filters is their large dynamic range at low supply voltages, obtained by
reducing the voltage swing at internal nodes. The first implementations of
companding systems employed the exponential I-V characteristic of bipolar
transistors, which led to log-domain structures [1, 2]. Developments in CMOS
circuits, together with the similarity of the I-V characteristics, led to the
bipolar transistors being replaced by MOS transistors operating in the weak
inversion region [3]. However, limited speed and transistor mismatches restricted
their applications. Afterwards, companding systems employed MOS transistors in
the saturation region, based on the voltage translinear principle and class-AB
linear transconductors, which led to Square-Root-Domain (SRD) structures [4-14].
The main drawback of these topologies is that, for correct operation, all MOS
transistors of the circuit must work in the saturation region; if, under some
conditions, the transistors are forced into the triode region, the MOS
translinear or transconductance principle is invalidated. This restricts the
input range and affects the linearity. In this work, to overcome the above
problems in SRD filters, a new approach in which MOS transistors operate in both
the saturation and triode regions is proposed [15]. Simulation results of the
proposed circuit show less nonlinearity and lower total harmonic distortion (THD)
than those reported before [4]-[8].
This paper is organized as follows. In section 2, the basic principle of
current-mode SRD filter operation is presented. In section 3, a divider-square
root-multiplier (DSM), the basic building block of the filter, is presented; this
DSM is realized using a new geometric-mean circuit. In section 4, a current-mode
first-order LPF is designed; the simulation results show that the circuit has
favorable characteristics.
2. BACKGROUND STUDY
In a current-mode first-order low-pass filter, the output current $I_{out}$ and
the input current $I_{in}$ are related in the Laplace domain by:

\[ \frac{I_{out}(s)}{I_{in}(s)} = \frac{A}{1 + s/\omega_c} \qquad (1) \]

in which $A$ is the DC gain and $\omega_c$ is the cutoff frequency of the filter.
In the time domain, eqn. (1) can be written as:

\[ \frac{1}{\omega_c}\frac{dI_{out}}{dt} + I_{out} = A\,I_{in}. \qquad (2) \]
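The first-order relation of eqn. (2) can be checked numerically. A small sketch
follows, with illustrative values for the gain, cutoff and input current (none of
these are taken from the paper):

```python
import math

# Forward-Euler integration of (1/wc)*dIout/dt + Iout = A*Iin (eqn. 2).
A = 2.0                      # DC gain (illustrative)
wc = 2 * math.pi * 1e3       # 1 kHz cutoff (illustrative)
dt, steps = 1e-7, 200_000    # 20 ms of simulated time, about 125 time constants
Iin = 1e-6                   # 1 uA DC input current
Iout = 0.0
for _ in range(steps):
    Iout += dt * wc * (A * Iin - Iout)
# At DC the filter settles to Iout = A * Iin
```

Running the loop long enough for the transient to die out shows the output
settling to the DC gain times the input, as eqn. (1) predicts at s = 0.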
Also, in the circuit shown in Figure 1, the I-V relation of MOSFET MF in the
saturation region can be expressed as:

\[ I_{out} = \frac{\beta}{2}\,(V_{cap} - V_{th})^2 \;\Rightarrow\; \frac{dI_{out}}{dt} = \sqrt{2\beta I_{out}}\,\frac{dV_{cap}}{dt} \qquad (3) \]

in which $V_{cap} = V_{gs}$ is the voltage of capacitor C, $\beta = \mu C_{ox}\frac{W}{L}$
is the transconductance parameter, and $V_{th}$ is the threshold voltage of the
MOS transistor.
\[ I_{cap} = \sqrt{\frac{I_{tune2}}{I_{out}}}\, I_{in} - \sqrt{\frac{I_{tune1}}{I_{out}}}\, I_{out} \qquad (4) \]

in which the tuning currents $I_{tune1}$ and $I_{tune2}$ are given by:

\[ I_{tune1} = \frac{(C\omega_c)^2}{2\beta}, \qquad I_{tune2} = \frac{(AC\omega_c)^2}{2\beta}. \qquad (5) \]

Eqn. (4) describes the internal nonlinear dynamic operation of the filter;
nevertheless, the $I_{out}$-$I_{in}$ relationship remains linear, as described by
(2). The cutoff frequency and DC gain can be written as:

\[ \omega_c = \frac{\sqrt{2\beta I_{tune1}}}{C}, \qquad A = \sqrt{\frac{I_{tune2}}{I_{tune1}}}. \qquad (6) \]
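Eqn. (6) ties the tuning currents directly to the filter parameters. A quick
numeric check follows; the values of $\beta$ and C are assumptions for the sake
of the example, not design values from the paper:

```python
import math

beta = 100e-6   # A/V^2, transconductance parameter (assumed)
C = 10e-12      # 10 pF integrating capacitor (assumed)
Itune1, Itune2 = 0.4e-6, 1.6e-6   # example tuning currents

wc = math.sqrt(2 * beta * Itune1) / C   # cutoff frequency, eqn (6)
A = math.sqrt(Itune2 / Itune1)          # DC gain, eqn (6)
fc_kHz = wc / (2 * math.pi) / 1e3
```

The check illustrates the electronic tunability claimed for the filter: scaling
$I_{tune1}$ by a factor k scales the cutoff by sqrt(k), while the ratio
$I_{tune2}/I_{tune1}$ independently sets the gain.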
The divider-square root-multiplier (DSM) operator is defined as:

\[ I_z = \sqrt{\frac{I_{tune}}{I_y}}\, I_x. \qquad (7) \]

Therefore, the right-hand side of eqn. (4) is the subtraction of two DSM
operators with different inputs and outputs.
Figure 2 shows the symbol of the DSM operator, and Figure 3 shows the block
diagram of the current-mode first-order LPF, which consists of two DSMs, one
capacitor, two MOS transistors (MF1, MF2) and two current mirrors (CM1, CM2).
3. Circuit Design
3.1 Geometric mean
Figure 4 shows the proposed geometric-mean circuit, which consists of two input
current mirrors (CM1, CM2), two $V_{th}$ level shifters, two isolating MOS
transistors (M7, M8), and an output current mirror (CM3). $I_x$ and $I_y$ are the
inputs and $I_z$ is the output of the geometric-mean circuit.
In this circuit, transistors M1, M2, M3 and M4 are identical
($\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta$) and also $\beta_5 = \beta_6$.
As the figure shows, transistors M1 and M2 of the input current mirrors are
always in the saturation region, and their I-V relationships are represented
respectively by:

\[ V_{g1} = \sqrt{\frac{2I_x}{\beta}} + V_{th} \qquad (8) \]

\[ V_{g2} = \sqrt{\frac{2I_y}{\beta}} + V_{th}. \qquad (9) \]

The drain voltages $V_{d3}$ and $V_{d4}$ are determined by:

\[ V_{d3} = V_{g2} - V_{th} \qquad (10) \]

\[ V_{d4} = V_{g1} - V_{th}. \qquad (11) \]

Figure 4: Geometric-mean circuit
Case 1:
For $I_x \ge I_y$, transistor M3 operates in the triode region and transistor M4
operates in the saturation region, so we can write:

\[ I_{d3} = \frac{\beta}{2}\left[ 2(V_{g1} - V_{th})V_{d3} - V_{d3}^2 \right] \qquad (12) \]

\[ I_{d4} = \frac{\beta}{2}(V_{g2} - V_{th})^2. \qquad (13) \]

Substituting (10) into (12) results in:

\[ I_{d3} = \frac{\beta}{2}\left[ 2(V_{g1} - V_{th})(V_{g2} - V_{th}) - (V_{g2} - V_{th})^2 \right]. \qquad (14) \]

The drain current of transistors M5 and M6 can be written as:

\[ I_z = I_{d3} + I_{d4}. \qquad (15) \]

Substituting (13) and (14) into (15) gives:

\[ I_z = \beta (V_{g1} - V_{th})(V_{g2} - V_{th}) \qquad (16) \]

and substituting (8) and (9) into (16) results in:

\[ I_z = 2\sqrt{I_x I_y}. \qquad (17) \]
Case 2:
For $I_x < I_y$, transistor M3 operates in the saturation region and transistor
M4 operates in the triode region, so we have:

\[ I_{d3} = \frac{\beta}{2}(V_{g1} - V_{th})^2 \qquad (18) \]

\[ I_{d4} = \frac{\beta}{2}\left[ 2(V_{g2} - V_{th})V_{d4} - V_{d4}^2 \right]. \qquad (19) \]

Following the same steps as in Case 1:

\[ I_z = 2\sqrt{I_x I_y}. \qquad (23) \]
As (17) and (23) show, in both cases the circuit acts as a geometric mean.
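The two-case analysis can be verified numerically by evaluating the drain-current
expressions directly; the values of $\beta$ and $V_{th}$ below are illustrative,
not the paper's design numbers:

```python
import math

beta, Vth = 100e-6, 0.5   # assumed transconductance parameter and threshold voltage

def geometric_mean_output(Ix, Iy):
    Vg1 = math.sqrt(2 * Ix / beta) + Vth        # eqn (8)
    Vg2 = math.sqrt(2 * Iy / beta) + Vth        # eqn (9)
    Vd3, Vd4 = Vg2 - Vth, Vg1 - Vth             # eqns (10), (11)
    if Ix >= Iy:   # Case 1: M3 triode, M4 saturation
        Id3 = beta / 2 * (2 * (Vg1 - Vth) * Vd3 - Vd3 ** 2)   # eqn (12)
        Id4 = beta / 2 * (Vg2 - Vth) ** 2                     # eqn (13)
    else:          # Case 2: M3 saturation, M4 triode
        Id3 = beta / 2 * (Vg1 - Vth) ** 2                     # eqn (18)
        Id4 = beta / 2 * (2 * (Vg2 - Vth) * Vd4 - Vd4 ** 2)   # eqn (19)
    return Id3 + Id4                                          # eqn (15)
```

In both branches the sum of the two drain currents collapses to
$\beta(V_{g1}-V_{th})(V_{g2}-V_{th}) = 2\sqrt{I_x I_y}$, matching eqns (17) and
(23) regardless of which transistor is in triode.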
The squarer/divider is obtained after a few modifications: taking $I_y$ as the
output and $I_z$ and $I_x$ as the inputs, removing the connection between the
gate and drain of transistor M2, and instead biasing transistors M2 and M4 by
connecting their gates to the drain of transistor M8. This results in:

\[ I_y = \frac{I_z^2}{4 I_x}. \qquad (24) \]
Figure 5 shows the squarer/divider circuit with input currents $I_x$ and $I_y$
and output current $I_z = \frac{I_x^2}{4 I_y}$.
Figure 6 shows the block diagram of the DSM. The advantage of cascading a
squarer/divider and a geometric mean with inverse functions is its ability to
compensate the nonlinearity:

\[ I_o = \frac{I_x^2}{4 I_y}, \qquad I_z = 2\sqrt{I_o I_{tune}} = \sqrt{\frac{I_{tune}}{I_y}}\, I_x. \]
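The cascade can be sketched with one hypothetical function per block, showing how
the square root of the geometric-mean stage undoes the squaring of the first
stage to leave the linear DSM relation of eqn. (7):

```python
import math

def squarer_divider(Ix, Iy):
    # Squarer/divider block: Io = Ix^2 / (4*Iy)
    return Ix ** 2 / (4 * Iy)

def geometric_mean(Io, Itune):
    # Geometric-mean block: Iz = 2*sqrt(Io * Itune)
    return 2 * math.sqrt(Io * Itune)

def dsm(Ix, Iy, Itune):
    # Cascading the two inverse functions yields eqn (7): Iz = Ix*sqrt(Itune/Iy)
    return geometric_mean(squarer_divider(Ix, Iy), Itune)
```

For example, with $I_x$ = 3 uA, $I_y$ = 1 uA and $I_{tune}$ = 4 uA, the cascade
returns $I_x\sqrt{I_{tune}/I_y}$ = 6 uA, the nonlinear intermediate current
$I_o$ never appearing at the output.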
As the simulation results show, the distortion of the output current is
negligible and the input-output current relationship is linear. This is despite
the fact that the capacitor voltage in the lower half of the signal is strongly
distorted, meaning that the system is internally nonlinear. Figure 8 shows the
simulated frequency response for different tuning currents, ranging from
I_tune1 = I_tune2 = 0.4 uA to 40 uA (from left to right); the results agree
closely with those expected from eqn. (6). Figure 9 shows the nonlinear
behaviour of the output current, measured as Total Harmonic Distortion (THD)
using a 4096-point Fast Fourier Transform (FFT). The worst-case THD of the
output current is less than -51 dB (0.28%) for input amplitudes close to the DC
bias current. Compared with the results reported in other works [4]-[8], this is
about 10 dB lower THD with a wider tuning-current range.
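The THD measurement described above can be reproduced in outline: take a
4096-point FFT of the output and compare the harmonic energy with the
fundamental. The synthetic signal below (a fundamental plus a 1% third harmonic)
stands in for the simulated output current:

```python
import numpy as np

n = 4096                       # FFT length used for the THD measurement
k0 = 8                         # fundamental placed exactly on bin 8 (8 cycles per record)
t = np.arange(n)
x = np.sin(2 * np.pi * k0 * t / n) + 0.01 * np.sin(2 * np.pi * 3 * k0 * t / n)

spec = np.abs(np.fft.rfft(x))
fund = spec[k0]
harm = np.sqrt(sum(spec[k0 * m] ** 2 for m in range(2, 6)))  # 2nd..5th harmonics
thd_db = 20 * np.log10(harm / fund)   # 1% distortion corresponds to -40 dB
```

Placing the fundamental exactly on an FFT bin avoids spectral leakage, so the
harmonic bins can be read off directly without windowing; a -51 dB worst case
corresponds to a harmonic-to-fundamental ratio of about 0.28%.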
5. CONCLUSIONS
Using MOS transistors operating in both the saturation and triode regions, a new
current-mode geometric-mean structure for low-voltage square-root domain filters
has been presented. The proposed circuit has low nonlinearity and a high dynamic
range. Simulation results of a first-order LPF show that the proposed technique
is applicable to the design of filters with very low voltage requirements. Since
the time-dependent state-space equation of a higher-order linear filter can be
reduced to a set of first-order differential equations, the technique can
readily be extended to higher-order configurations.
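The reduction to first-order sections can be sketched as a cascade of the eqn.
(2) update; the gains, cutoff and input below are illustrative values only:

```python
import math

def first_order_step(state, Iin, A, wc, dt):
    # One Euler step of (1/wc)*d(state)/dt + state = A*Iin  (eqn. 2)
    return state + dt * wc * (A * Iin - state)

A, wc = 1.0, 2 * math.pi * 1e3   # unity-gain stages, 1 kHz cutoff (illustrative)
dt, steps = 1e-7, 200_000        # 20 ms of simulated time
s1 = s2 = 0.0
for _ in range(steps):
    s1 = first_order_step(s1, 1e-6, A, wc, dt)   # first first-order section
    s2 = first_order_step(s2, s1, A, wc, dt)     # cascaded second section
# The pair realizes a second-order low-pass response from first-order blocks
```

Each state variable corresponds to one capacitor voltage, so a chain of such
sections maps directly onto a chain of the first-order SRD cells described above.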
Figure 8: Frequency response of the LPF (for Itune1 from 0.4 uA to 40 uA, from
left to right)
6. REFERENCES
[1] D. R. Frey, (1996) Exponential state space filters: A generic current mode design
strategy, IEEE Trans. Circuits Syst. I, Vol. 43, pp. 34-42.
[2] E. M. Drakakis, A. J. Payne, and C. Toumazou, (1999) Log-domain state-space: A
systematic transistor-level approach for log-domain filtering, IEEE. Trans. Circuits Syst.
II, Vol. 46, pp. 290-305.
[3] E. Morsy and J. Wu, (2000) Low voltage micropower log-domain filters, Analog
Integr. Circuits Signal Processing , Vol. 22, pp. 103-114.
[4] C. A. D. L. C. Blas, A. J. Lopez-Martin and A. Carlosena, (2004) 1.5V tunable square-root domain filter, Electron. Lett., Vol. 40, No. 4, pp. 133-147.
[5] A. J. Lopez-Martin and A. Carlosena, (2002) A 1.5v CMOS companding filter,
Electron. Lett. , Vol. 38, No. 22, pp. 1299-1300.
[6] J. Mulder, A. C. V. D. Woerd, W. A. Serdijn and A. H. M. V. Roermund, (1996)
Current-mode companding x-domain integrator, Electron. Lett. , Vol. 32, No.3, pp.198-
199.
[7] R. G. Carvajal, J. Ramirez-Angulo, A. J. Lopez-Martin, A. Torralba, A. G. Galan, A.
Carlosena and F. M. Chavero, (2005) The flipped voltage follower: A useful cell for low-
voltage low power circuit design, IEEE Trans. Circuits Syst. I: Regular Papers, Vol. 52,
No. 7, pp. 1276-1291.
[8] C. A. D. L. C. Blas, A. J. Lopez-Martin and A. Carlosena, (2003) 1.5v MOS
translinear loops with improved dynamic range and their application to current-mode signal
processing, IEEE. Trans. Circuits Syst. II, Analog Digit. Signal process, Vol. 50, No. 12,
pp. 918-927.
[9] C. Psychalinos and S. Vlassis, (2002) A systematic design procedure for square-root -
domain circuits based on the signal flow graph approach, IEEE Trans. Circuits Syst. I,
Fundam. Theory Appl., Vol. 49, No. 12, pp. 1702-1712.
[10] M. Eskiyerli and Payne, (2000) Square root domain filter design and performance,
Analog Integr. Circuits Signal Processing, Vol. 22, pp. 231-243.
[11] E. Seevinck, (1990) Companding current-mode integrator: A new circuit principle for
continuous-time monolithic filters, Electron. Lett., Vol. 26, pp. 2046-2047.
[12] C. A. D. L. C. Blas, A. J. Lopez-Martin and A. Carlosena, (2004) Low voltage
CMOS nonlinear transconductors and their application to companding current-mode
filters, Analog Integr. Circuits Signal Processing, Vol. 38, pp. 137-147.
[13] J. Mulder, W. A. Serdijn, A. C. V. D. Woerd, and A. H. M. Roermund, (2000)
Dynamic translinear circuits - an overview, Analog Integr. Circuits Signal
Processing, Vol. 22, pp. 111-126.
[14] C. Psychalinos and S. Vlassis, (2001) A high performance square-root domain
integrator, Analog Integr. Circuits Signal Processing, Vol. 28, No. 3, pp. 97-101.
[15] E. Farshidi and S. M. Sayedi, (2007) A Square-Root Domain Filter Based on a New
Geometric-mean Circuit, 15th Iranian Conference on Electrical Engineering ICEE2007,
pp. 6-11.