An Efficient Connection between Statistical Software and Database Management System
Sunghae Jun
Pragmatic Approach to Component Based Software Metrics Based on Static Methods
S. Sagayaraj and M. Poovizhi
SDI System with Scalable Filtering of XML Documents for Mobile Clients
Yi Yi Myint and Hninn Aye Thant
An Easy yet Effective Method for Detecting Spatial Domain LSB Steganography
Minati Mishra and Flt. Lt. Dr. M. C. Adhikary
Design of ATL Rules for Transforming UML 2 Sequence Diagrams into Petri Nets
Elkamel Merah, Nabil Messaoudi, Dalal Bardou and Allaoua Chaoui
International Journal of Computer Science and Business Informatics
S. SOPHIA
Professor, Department of Electronics and Communication Engineering
Sri Krishna College of Engineering and Technology, Coimbatore, India.
ABSTRACT
Recent advances in wireless communications and electronics have enabled the development of low-cost wireless sensor nodes. Clustering is an effective topology-control approach that can prolong the lifetime and increase the scalability of wireless sensor networks. The widely accepted criteria for a clustering methodology are to select cluster heads with more residual energy and to rotate them periodically. Sensors at heavy-traffic locations quickly deplete their energy resources and die much earlier, leaving behind an energy hole and a network partition. In this paper, a model of a distributed layer-based clustering algorithm is proposed based on three concepts. First, the aggregated data is forwarded from a cluster head to the base station through the cluster head of the next higher layer with the shortest distance between the cluster heads. Second, the cluster head is elected based on the clustering factor, which is the combination of the residual energy and the number of neighbors of a particular node within a cluster. Third, each cluster has a crisis hindrance node, which performs the function of the cluster head when the cluster head fails to carry out its work under some critical conditions. The key aim of the proposed algorithm is to accomplish energy efficiency and to prolong the network lifetime. The proposed distributed clustering algorithm is contrasted with the existing clustering algorithm LEACH.
1. INTRODUCTION
A wireless sensor network (WSN) is a collection of a large number of small, low-power, and low-cost electronic devices called sensor nodes. Each sensor node consists of four major blocks: sensing, processing, power, and communication units, which are responsible for sensing, processing, and wireless communication (figure 1). These nodes gather the relevant data from the environment and then transfer the gathered data to the base station (BS). Since WSNs have many advantages such as self-organization, infrastructure-free operation, fault tolerance, and locality, they have a wide variety of potential applications such as border security and surveillance, environmental monitoring and forecasting, wildlife protection and home automation, and disaster management and control. Considering that sensor nodes are usually deployed in remote locations, it is impossible to recharge their batteries. Therefore, how to utilize the limited energy resource wisely to extend the lifetime of sensor networks is a very demanding research issue for these networks.
route setup, uses communication bandwidth [17] efficiently, and improves network lifetime [12-16]. Through the data aggregation process, unnecessary communication between sensor nodes, the cluster head, and the base station is avoided. In this paper, a well-defined model of a distributed layer-based clustering algorithm is proposed based on three concepts: the aggregated data is forwarded from the cluster head to the base station through the cluster head of the next higher layer with the shortest distance between the cluster heads; the cluster head is elected based on the clustering factor; and the crisis hindrance node performs the function of the cluster head when the cluster head fails to carry out its work. The prime aim of the proposed algorithm is to attain energy efficiency and increased network lifetime.
Nagpal and Coore proposed CLUBS [20], which forms overlapping clusters with a maximum cluster diameter of two hops. The clusters are created by local broadcasting, and convergence depends on the local density of the wireless sensor nodes. This algorithm can be implemented in an asynchronous environment without losing efficiency. The main difficulty is the overlapping of clusters: when two clusters have their CHs within one-hop range of each other, both clusters collapse and the CH election process restarts.
Demirbas, Arora and Mittal brought out FLOC [21], which exploits the double-band nature of the wireless radio model for communication. Nodes can communicate reliably with the nodes in the inner band and unreliably with the nodes in the outer band. The chief disadvantage of the algorithm is that communication between nodes in the outer band is unreliable, and those messages have a high probability of being lost during communication.
Ye, Li, Chen and Wu proposed EECS [22], which is based on the assumption that all CHs can communicate directly with the BS. The clusters have variable size: those closer to the BS are larger and those farther from the BS are smaller. It is very energy efficient in intra-cluster communication and shows an excellent improvement in network lifetime.
EEUC is intended for uniform energy consumption within the sensor network. It forms unequal clusters, allowing each cluster to have a variable size. Probabilistic selection of the CH is the focal shortcoming of this algorithm: a few nodes may be left without being part of any cluster.
Yu, Li and Levy proposed DECA, which selects the CH based on residual energy, connectivity, and a node identifier. It is highly energy efficient, as it uses fewer messages for CH selection. The main trouble with this algorithm is the high risk of wrong CH selection, which leads to the discarding of every packet sent by the affected wireless sensor node.
Ding, Holliday and Celik proposed DWEHC, which elects the CH on the basis of weight, a combination of a node's residual energy and its distance to the neighboring nodes. It produces well-balanced clusters, independent of network topology. The node possessing the largest weight in a cluster is designated as the CH. The algorithm constructs multilevel clusters, and the nodes in every cluster reach the CH by relaying through other intermediate nodes. The foremost problem is the considerable energy consumed by the several iterations needed before the nodes settle into the most energy-efficient topology.
After a sensor node elects itself as a CH, it broadcasts an advertisement message that contains its own ID. The non-cluster-head nodes can then decide which cluster to join based on the strength of the received advertisement signal. After the decision is made, every non-cluster-head node transmits a join-request message to the chosen cluster head to specify that it will be a member of the cluster. After receiving all the join-request messages, the cluster head creates and broadcasts a time division multiple access (TDMA) schedule for exchanging data with the non-cluster-head sensor nodes without collision. A node elects itself as a CH according to the threshold given in equation (1).
T(n) = p / (1 - p (r mod 1/p)) if n is in G, and T(n) = 0 otherwise      (1)
where p is the desired percentage of cluster heads, r is the current round, and G is the set of nodes that have not served as cluster heads in the last 1/p rounds.
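A minimal R sketch of this self-election rule follows; the fraction p = 0.05 and the ten-node network are illustrative assumptions, not values from the paper.

# Sketch of LEACH cluster-head self-election using equation (1).
# p and the node count are illustrative assumptions.
leach_threshold <- function(r, p = 0.05) {
  p / (1 - p * (r %% round(1/p)))      # threshold T(n) for eligible nodes
}
elect_ch <- function(eligible, r, p = 0.05) {
  # eligible: TRUE for nodes that have not been CH in the last 1/p rounds
  (runif(length(eligible)) < leach_threshold(r, p)) & eligible
}
set.seed(1)
elect_ch(eligible = rep(TRUE, 10), r = 0)   # each node becomes CH with probability ~p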
The steady phase commences after the clusters are formed and the TDMA schedules are broadcast. All sensor nodes transmit their data to the cluster head once per round during their allotted transmission slot based on the TDMA schedule; at other times, they turn off their radios in order to reduce energy consumption. However, the cluster heads must stay awake all the time, so that they can receive all the data from the nodes within their own clusters. On receiving the data from the cluster, the cluster head carries out the data aggregation mechanism and forwards the result to the base station directly. This is the entire mechanism of the steady-state phase. After a certain predefined time, the network steps into the next round. LEACH is the basic clustering protocol that introduced the cluster approach, and it can prolong the network lifetime in comparison with other multi-hop routing and static routing schemes. However, there are still some hidden problems that should be considered.
LEACH does not take the residual energy into account when electing cluster heads and constructing the clusters. As a result, nodes with less energy may be elected as cluster heads and then die much earlier. Moreover, since a node selects itself as a cluster head only according to the value of the calculated probability, it is hard to guarantee the number of cluster heads and their distribution. Also, in the LEACH clustering algorithm, the cluster heads are selected randomly, and hence the weaker nodes drain easily. To rise above these shortcomings in LEACH, a model of a distributed layer-based clustering algorithm is proposed, where clusters are arranged into hierarchical layers. Instead of sending the aggregated data directly to the base station, each cluster head sends it to the nearest cluster head in the next higher layer. These cluster heads send their own data, along with that received from lower-layer cluster heads, to the nearest cluster heads of the next layer. This cumulative process is repeated until the data from all the layers finally reach the base station. The proposed model incorporates several careful design choices, focusing on reduced energy utilization and improved network lifetime of the sensor network.
4. THE PROPOSED CLUSTERING ALGORITHM
The proposed clustering algorithm is fully distributed; the sensor nodes are deployed randomly to sense the target environment. The nodes are divided into clusters, with each cluster having a CH. The nodes send their information during their TDMA time slot to their respective CH, which fuses the data to avoid redundant information through the process of data aggregation. The aggregated data is forwarded to the BS. Compared to the existing algorithms, the proposed algorithm has three distinguishing features. First, the aggregated data is forwarded from the cluster head to the base station through the cluster head of the next higher layer with the shortest distance between the cluster heads. Second, the cluster head is elected based on the clustering factor, which is the combination of the residual energy and the number of neighbors of a particular node within a cluster; a sketch of this election is given below. Third, each cluster has a crisis hindrance node, which performs the function of the cluster head when the cluster head fails to carry out its work in some conditions.
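The paper does not fix how residual energy and neighbor count are combined into the clustering factor, so the sketch below assumes a simple normalized weighted sum (the weight w is hypothetical); the node with the highest factor becomes the CH, and the runner-up serves as the crisis hindrance node.

# Sketch of clustering-factor-based election within one cluster.
# The weighted-sum form and w = 0.5 are assumptions for illustration.
elect_heads <- function(residual_energy, num_neighbors, w = 0.5) {
  e  <- residual_energy / max(residual_energy)   # normalised residual energy
  nn <- num_neighbors / max(num_neighbors)       # normalised neighbour count
  cf <- w * e + (1 - w) * nn                     # clustering factor per node
  ranked <- order(cf, decreasing = TRUE)
  list(cluster_head     = ranked[1],             # highest clustering factor
       crisis_hindrance = ranked[2])             # standby node for recovery
}
elect_heads(residual_energy = c(0.9, 0.4, 0.7, 0.8, 0.5),
            num_neighbors   = c(3, 5, 4, 4, 2))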
A. Aggregated Data Forwarding
In a network of N nodes, each node is assigned an exclusive Node Identity (NID). The NID serves only as an identifier of the node and has no relationship with location or clustering. The CH is placed at the center, and the nodes are organized into several layers around the CH. All clusters are arranged into hierarchical layers, and a layer number is assigned to each cluster. The cluster that is farthest from the base station is designated as the lowest layer, and the cluster nearest to the base station is designated as the highest layer. The main characteristic of the proposed algorithm is that the lowest-layer cluster head forwards only its own aggregated data to the next-layer cluster head, whereas the highest layer forwards all the aggregated data from the preceding cluster heads to the base station (figure 3). Thus, a lower workload is assigned to the lower layers, and the higher layers are assigned a greater workload. The workload assigned to a particular cluster head is directly proportional to the energy utilization of the cluster head. In order to balance the energy utilization among the cluster heads, the concept of variable transmission power is employed, where the transmission power reduces with increasing layer number. In LEACH, each cluster head forwards the aggregated data to the base station directly, which uses much energy. The proposed algorithm uses multi-hop data forwarding from the cluster heads to the base station, resulting in reduced energy utilization.
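A compact sketch of the layer assignment and next-hop choice is given below; the planar geometry, the number of layers, and the equal-width layering are assumptions, since the paper does not specify how layer boundaries are drawn.

# Sketch: layer numbers grow toward the BS; a CH relays to the nearest
# CH of the next higher layer. Geometry and layer widths are assumed.
assign_layers <- function(dist_to_bs, n_layers = 3) {
  cut(-dist_to_bs, breaks = n_layers, labels = FALSE)  # nearer BS -> higher layer
}
next_hop <- function(ch, layer, pos) {
  upper <- which(layer == layer[ch] + 1)
  if (length(upper) == 0) return(NA)   # highest layer forwards to the BS itself
  d <- sqrt(rowSums((pos[upper, , drop = FALSE] -
                     matrix(pos[ch, ], length(upper), 2, byrow = TRUE))^2))
  upper[which.min(d)]                  # shortest-distance next-layer CH
}
pos   <- rbind(c(0.1, 0.1), c(0.4, 0.5), c(0.8, 0.8), c(0.9, 0.7))  # CH positions
d_bs  <- sqrt(rowSums((pos - 1)^2))    # distances to a BS at (1, 1)
layer <- assign_layers(d_bs)           # lowest layer = farthest cluster
next_hop(ch = 1, layer, pos)           # CH 1 (lowest layer) relays via CH 2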
B. Cluster Head Election Based on the Clustering Factor
Across a cluster, the numbers of neighbors of the nodes vary, which should be taken into account, but this is not considered in the LEACH clustering mechanism.
C. Alternate Crisis Hindrance Node
In a cluster with a large number of nodes, a cluster crisis does not affect the overall performance of the wireless sensor system. But in a network with a small number of nodes, a cluster crisis greatly affects the wireless sensor system. Care should therefore be taken during the cluster head selection process by applying alternate recovery mechanisms. In addition to the regular cluster head, an additional node in the cluster is assigned the task of secondary cluster head; this particular node is called the crisis hindrance node. Generally, the cluster collapses when the cluster head fails. In such situations, the crisis hindrance node acts as the cluster head and recovers the cluster. The main characteristic of the proposed algorithm is that the crisis hindrance node solely performs the recovery function and does not take part in the sensing process. In LEACH, even though the cluster heads are switched periodically, the distribution and loading of CHs across the nodes in the network are not uniform. Hence, there is a high probability that a cluster collapses easily, but this can be avoided in the proposed algorithm with the help of the crisis hindrance node; a minimal sketch of this failover follows.
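The failover itself can be stated in a few lines; the sketch below assumes the backup stays fixed until the next re-clustering, a bookkeeping detail the paper does not spell out.

# Sketch of crisis-hindrance failover for one cluster (assumed bookkeeping).
cluster <- list(ch = 3, backup = 5, alive = rep(TRUE, 6))
on_ch_failure <- function(cl) {
  cl$alive[cl$ch] <- FALSE   # the failed cluster head leaves the cluster
  cl$ch     <- cl$backup     # the crisis hindrance node takes over as CH
  cl$backup <- NA            # a new backup is chosen at the next re-clustering
  cl
}
on_ch_failure(cluster)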
7. ACKNOWLEDGMENTS
Our sincere gratitude goes to the management of SVS Educational Institutions and to my research supervisor Dr. S. Sophia, who served as a guiding light for this research work.
REFERENCES
[1] W.B.Heinzelman, A.P.Chandrakasan, H.Balakrishnan, (2002), An application specific
protocol architecture for wireless microsensor networks, IEEE Transactions on Wireless
Communication Volume 1, Number 4, Pages 660-670.
[2] O.Younis, S.Fahmy, (2004), HEED: A hybrid energy-efficient distributed clustering
approach for adhoc sensor networks, IEEE Transactions on Mobile Computing, Volume 3,
Number 4, Pages 366-379.
[3] S.Zairi, B.Zouari, E.Niel, E.Dumitrescu, (2012), Nodes self-scheduling approach for
maximizing wireless sensor network lifetime based on remaining energy IET Wireless
Sensor Systems, Volume 2, Number 1, Pages 52-62.
[4] I.Akyildiz, W.Su, Y.Sankarasubramaniam, E.Cayirci, (2002), A Survey on sensor
networks, IEEE Communications Magazine, Pages 102-114.
[5] G.J.Pottie, W.J.Kaiser, (2000), Embedding the internet: wireless integrated network
sensors, Communications of the ACM, Volume 43, Number 5, Pages 51-58.
[6] J.H.Chang, L.Tassiulas, (2004), Maximum lifetime routing in wireless sensor
networks, IEEE/ACM Transactions on Networking, Volume 12, Number 4, Pages 609-
619.
[7] S.R.Boselin Prabhu, S.Sophia, (2011), A survey of adaptive distributed clustering
algorithms for wireless sensor networks, International Journal of Computer Science and
Engineering Survey, Volume 2, Number 4, Pages 165-176.
[8] S.R.Boselin Prabhu, S.Sophia, (2012), A Research on decentralized clustering
algorithms for dense wireless sensor networks, International Journal of Computer
Applications , Volume 57, Number 20, Pages 0975-0987.
[9] S.R.Boselin Prabhu, S.Sophia, (2013), Mobility assisted dynamic routing for mobile
wireless sensor networks, International Journal of Advanced Information Technology ,
Volume 3, Number 1, Pages 09-19.
[10] S.R.Boselin Prabhu, S.Sophia, (2013), A review of energy efficient clustering
algorithm for connecting wireless sensor network fields, International Journal of
Engineering Research & Technology, Volume 1, Number 4, Pages 477-481.
[11] S.R.Boselin Prabhu, S.Sophia, (2013), Capacity based clustering model for dense
wireless sensor networks, International Journal of Computer Science and Business
Informatics, Volume 5, Number 1.
[12] J.Deng, Y.S.Han, W.B.Heinzelman, P.K.Varshney, (2005), Balanced-energy sleep
scheduling scheme for high density cluster-based sensor networks, Elsevier Computer
Communications Journal, Special Issue on ASWN04, Pages 1631-1642.
[13] C.Y.Wen, W.A.Sethares, (2005), Automatic decentralized clustering for wireless
sensor networks, EURASIP Journal of Wireless Communication Networks, Volume 5,
Number 5, Pages 686-697.
[14] S.D.Murugananthan, D.C.F.Ma, R.I.Bhasin, A.O.Fapojuwo, (2005) A centralized
energy-efficient routing protocol for wireless sensor networks, IEEE Transactions on
Communication Magazine, Volume 43, Number 3, Pages S8-13.
[15] F.Bajaber, I.Awan, (2009), Centralized dynamic clustering for wireless sensor
networks, Proceedings of the International Conference on Advanced Information
Networking and Applications.
[16] Pedro A. Forero, Alfonso Cano, Georgios B.Giannakis, (2011), Distributed clustering
using wireless sensor networks, IEEE Journal of Selected Topics in Signal Processing,
Volume 5, Pages 707-724.
[17] Lianshan Yan, Wei Pan, Bin Luo, Xiaoyin Li, Jiangtao Liu, (2011), Modified energy-
efficient protocol for wireless sensor networks in the presence of distributed optical fiber
sensor link, IEEE Sensors Journal, Volume 11, Number 9, Pages 1815-1819.
[18] S.Bandyopadhay, E.Coyle, (2003), An energy-efficient hierarchical clustering
algorithm for wireless sensor networks, Proceedings of the 22nd Annual Joint Conference
of the IEEE Computer and Communications Societies (INFOCOM 2003), San Francisco,
California.
[19] D.J.Barker, A.Ephremides, J.A.Flynn, (1984), The design and simulation of a mobile
radio network with distributed control, IEEE Journal on Selected Areas in
Communications, Pages 226-237.
[20] R.Nagpal, D.Coore, (2002), An algorithm for group formation in an amorphous
computer, Proceedings of IEEE Military Communications Conference (MILCOM 2002),
Anaheim, CA.
[21] M.Demirbas, A.Arora, V.Mittal, (2004), FLOC: A fast local clustering service for
wireless sensor networks, Proceedings of Workshop on Dependability Issues in Wireless
Ad Hoc Networks and Sensor Networks (DIWANS04), Italy.
[22] M.Ye, C.F.Li, G.H.Chen, J.Wu, (2005), EECS: An energy efficient clustering scheme
in wireless sensor networks, Proceedings of the Second IEEE International Performance
Computing and Communications Conference (IPCCC), Pages 535-540.
ABSTRACT
In the big data era, we need to manipulate and analyze big data. As a first step of big data manipulation, we can consider the traditional database management system. To discover novel knowledge from the big data environment, we should analyze the big data. Many statistical methods have been applied to big data analysis, and most statistical analysis work depends on diverse statistical software such as SAS, SPSS, or the R project. In addition, a considerable portion of big data is stored in diverse database systems. But the data types of general statistical software differ from those of database systems such as Oracle or MySQL. So, many approaches to connect statistical software to a database management system (DBMS) have been introduced. In this paper, we study an efficient connection between statistical software and a DBMS. To show its performance, we carry out a case study using a real application.
Keywords
Statistical software, Database management system, Big data analysis, Database connection,
MySQL, R project.
1. INTRODUCTION
Every day, huge amounts of data are created in diverse fields and stored in computer systems. These big data are extremely large and complex [1], so it is very difficult to manage and analyze them. But big data analysis is an important issue in many fields such as marketing, finance, technology, and medicine. Big data analysis is based on statistics and machine learning algorithms. In addition, data analysis depends on statistical software, and the data are stored in database systems. So, for big data analysis, we should manage statistical software and the database system effectively. In this paper, we consider the R project as the statistical software. R is an environment for statistical computing, including statistical analysis and graphical display of data [2]. This program provides most of the statistical and machine learning methods needed for big data analysis. We use MySQL as the database system connected from R. MySQL is a database management system (DBMS) product that is the most popular open-source database in the world; in addition, it is free software like the R system [3]. So, in our research, we use R and MySQL for an efficient connection between statistical software and a DBMS. There was previous work on DB access through R [4]. This covered
the DB access problems of R and showed the ODBC (open database connectivity) drivers for connecting R with DBMSs such as MySQL, PostgreSQL, and Oracle. The authors also introduced the installation and technological environment for DB access. But they did not illustrate detailed approaches for real applications; that is, their work was a conceptual suggestion for the access of R to MySQL. So, in this paper, we perform a more specific study of the connection between the statistical software R and the DBMS MySQL. In our case study, we will show a detailed and efficient connection of R to MySQL using a specific data set from the University of California, Irvine (UCI) machine learning repository [5]. We cover our research background in the next section. In section 3, our proposed methodology is presented. We also introduce an efficient connection between statistical software and the DBMS in section 4. Lastly, we conclude our study and offer future works for statistical database systems.
2. RESEARCH BACKGROUND
2.1 Statistical Software
To analyze data, we can consider diverse approaches using statistical software. These days, there are many statistical software products. SAS (Statistical Analysis System) is the most popular software for statistical analysis [6]. But it is expensive, so few companies other than large ones use SAS. SPSS (Statistical Package for the Social Sciences) is another representative software package [7], but it is also expensive. Minitab [8] and S-Plus [9] are widely used statistics packages, and none of them is free. Recently, R has been used in many works for statistical data analysis, and it is free. In addition, R provides most of the statistical functions included in SAS or SPSS. R is an open-source program, so we can modify R functions for our own statistical computing. This is a very useful advantage of R. Therefore, we consider R for the connection to the database system in this research.
2.2 Database Management System
A database is a collection of data, and a database management system (DBMS) is software for managing databases using the structured query language (SQL) [10],[11]. Oracle is one of the popular DBMS products [12], but it is expensive. MySQL is another DBMS and the most widely used open-source one in the world [3]. Also, most functions of MySQL are similar to Oracle's [3]. So, in this paper, we use MySQL as the DBMS connecting to the statistical software R. To use the MySQL DBMS efficiently, we use the RODBC package supported by R CRAN in our research [13].
For the connection between SAS and a DBMS, we need the SAS/Access product as supplementary software, which is in general expensive. So, we tried to make the connection between statistical software and a DBMS without cost; the efficiency of our approach lies in its cost. There are many approaches to connect statistical software and a DBMS, but to use most of them, we should buy additional products, and there are few free approaches. So, we looked for an approach to connect statistical software and a DBMS without cost. In this paper, we study an efficient connection between a DBMS and statistical software. We select MySQL as the DBMS for our research and use the R project as the statistical software, because not only are they free but they also have good functionality. In addition, R and MySQL have strong performance in statistical computing and database management, respectively, for constructing a statistical database system [14],[15],[16],[17]. In general, big data are transformed into a structured data type for statistical analysis.
In this paper, we use SQL code in the MySQL console. Also, we use RODBC as an ODBC database interface between R and MySQL [13]. In the R system, a package is a set of additional R functions. R packages are not installed in the basic R system; if we need to use a package, we have to add it to the R system. We can search for all packages on the R CRAN and install them from there [19]. The RODBC package provides efficient functions for ODBC database access, so our research is based on the RODBC package to connect R to MySQL. To install RODBC in the R system, we should select an R CRAN mirror site. After the RODBC installation, we load this package in the R system as follows:
>library(RODBC)
The R system uses the library function for loading a package. After this R code, we can use all functions provided by the RODBC package, such as odbcConnect, sqlFetch, and sqlQuery; they are used in our research for DB access and connection. To connect to the MySQL DB, we use the odbcConnect function of the RODBC package as follows:
>db_con = odbcConnect("stat_MySQL")
User = , Password = , Database =
The DSN is stat_MySQL, and the db_con object in the R system holds the connection result. In this connection process, we supply the user name, password, and target database. If R and MySQL are connected to each other, we can show the tables of the MySQL DB using the sqlTables function as follows:
>sqlTables(db_con)
TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE REMARKS
The result of this function is the information about the connected DB and its tables.
3.1 Structure of DB Connection Software
In general, for connecting a DBMS to application software, we should use an ODBC connector [20]. R, as statistical software, also needs an ODBC driver to access the MySQL DBMS. In this paper, we consider the RODBC package for an efficient connection between R and MySQL. Figure 3 shows the ODBC connection between the DBMS and the statistical software, and their specific products.
sqlQuery: function for running an SQL query
sqlSave: function for writing a data frame to a table in the DB
We can use further functions for accessing and manipulating the MySQL DB through the RODBC package. The process of connection between R and MySQL is as follows; a sketch is given below.
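The following minimal end-to-end sketch assumes the DSN stat_MySQL from above and a hypothetical table demo_tbl; the calls are standard RODBC functions.

# Sketch: write a data frame to MySQL through ODBC and query it back.
# The DSN "stat_MySQL" and the table name "demo_tbl" are assumptions.
library(RODBC)
con <- odbcConnect("stat_MySQL")
df  <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))
sqlSave(con, df, tablename = "demo_tbl", rownames = FALSE)  # data frame -> table
sqlFetch(con, "demo_tbl")                                   # whole table -> data frame
sqlQuery(con, "SELECT id, value FROM demo_tbl WHERE value > 3")
odbcClose(con)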
4. CASE STUDY
To illustrate a case study on a real problem, we used the RODBC package from the R project [13]. This is the software for an ODBC database connection between R and a DBMS such as MySQL. We experimented using an example data set from the UCI machine learning repository [5].
4.1 UCI Machine Learning Repository
For our case study, we used the Abalone data set from the UCI machine learning repository [5]. This data set consists of 8 variables (columns) and 4,177 observations (rows). The main goal with this data is to predict the age of abalone from physical measurements. The next table shows the variables and their values [5].
Table 1. Variables of the Abalone data set
The last variable (rings) is the target variable, and the others are all input variables. We constructed the MySQL DB using this data set. The original data file from the UCI machine learning repository was comma-separated text, but MySQL needed a tab-separated data file for DB loading. So, we transformed the data using Excel, as follows.
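As a side note, the same conversion can be done in R itself instead of Excel; the sketch below assumes the raw UCI file is named abalone.data.

# Sketch: convert the comma-separated UCI file into the tab-separated
# format expected by MySQL's loader, without Excel.
abalone <- read.csv("abalone.data", header = FALSE)
write.table(abalone, "abalone.tsv", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)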
>library(RODBC)
>abalone_con=odbcConnect("abalone_ODBC")
>sqlTables(abalone_con)
TABLE_SCHEM TABLE_NAME TABLE_TYPE
case_study abalone TABLE
>vars=sqlQuery(abalone_con, "SELECT sex, diameter, rings FROM
abalone")
Sex Diameter Rings
1 M 0.365 15
2 M 0.265 7
3 F 0.420 9
4 M 0.365 10
5 I 0.255 7
Using the above R code, we saved three variables of the abalone data set into the vars R object. The SQL query result returned by the sqlQuery function showed that the abalone table had been created correctly. This function enables the use of SQL inside the R system, so we could analyze the abalone data using the analytical functions of R. The result of the data analysis is shown next.
4.4 Data Analysis
First, we performed data summarization of the three variables using the summary function of the R system as follows:
>summary(vars)
sex diameter rings
F:1307 Min. :0.0550 Min. : 1.000
I:1342 1st Qu.:0.3500 1st Qu.: 8.000
M:1528 Median :0.4250 Median : 9.000
Mean :0.4079 Mean : 9.934
3rd Qu.:0.4800 3rd Qu.:11.000
Max. :0.6500 Max. :29.000
This function provides frequencies or descriptive statistics according to the data type (nominal or continuous). For example, diameter is a continuous variable, so we got the minimum, 25th percentile, median, mean, 75th percentile, and maximum values. Next, we carried out data visualization as follows:
>boxplot(vars$diameter)
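Further exploratory plots on the same fetched columns could look as follows; these particular plots are illustrative additions, not part of the original analysis.

# Additional visualization of the columns fetched into vars (sketch).
hist(vars$rings, main = "Distribution of rings", xlab = "rings")
boxplot(diameter ~ sex, data = vars, xlab = "sex", ylab = "diameter")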
5. CONCLUSIONS
In our case study, we illustrated how our approach can be applied in a real application. We selected the Abalone data set from the UCI machine learning repository for our case study. Our result contributes to work related to big data analysis. In addition, we can analyze the data in the DBMS directly with statistical methods. In future work, we will expand the scope of the connection between DBMSs and statistical software to more products.
6. DISCUSSION
The biggest problem of a statistical database system is the cost of connecting the statistical software and the DBMS. For example, we should additionally buy the SAS/Access product and install it into the SAS base system for connecting SAS to a DBMS. Generally, this supplementary product is expensive, so most users have had difficulty using statistical database systems. In this paper, we selected the R system as the statistical software instead of SAS, and we used RODBC as the ODBC connector instead of SAS/Access, because R and RODBC are both free. Yet their performance is similar to SAS, and for new analytical functions, such as statistical learning theory and machine learning algorithms, they surpass SAS.
REFERENCES
[1] Sathi, A. Big Data Analytics. An Article from IBM Corporation, 2012.
[2] Heiberger, R. M., and Neuwirth, E. R through Excel: A Spreadsheet Interface for Statistics, Data Analysis, and Graphics. Springer, 2009.
[3] MySQL, The World's most popular open source database. http://www.mysql.com, accessed on October 2013.
[4] Sim, S., Kang, H., and Lee, Y. Access to Database through the R-Language. The
Korean Communications in Statistics, 15, 1 (2008), 51-64.
[5] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, accessed on October
2013.
[6] SAS, http://www.sas.com,accessed on October 2013.
[7] SPSS, http://www-01.ibm.com/software/analytics/spss/, accessed on October 2013.
[8] Minitab, http://www.minitab.com, accessed on October 2013.
[9] S-Plus, http://solutionmetrics.com.au/products/splus/, accessed on October 2013.
[10] Wikipedia, the free encyclopedia. http://en.wikipedia.org, accessed on October 2013.
[11] Date, C. J. An Introduction to Database Systems. 7th edition, Addison-Wesley, 2000.
[12] Oracle, http://www.oracle.com, accessed on October 2013.
[13] Ripley, B. Package RODBC. CRAN R-Project, 2013.
[14] R-bloggers, On R versus SAS. http://www.r-bloggers.com/on-r-versus-sas/, accessed on
December, 2013.
[15] LinkedIn, Advanced Business Analytics, Data Mining and Predictive Modeling. http://www.linkedin.com/groups/SAS-versus-R-35222.S.65098787, accessed on December, 2013.
[16] Clever Logic, MySQL vs. Oracle Security, http://cleverlogic.net/articles/mysql-vs-
oracle, accessed on December, 2013.
[17] Find The Best, Oracle vs MySQL, http://database-management-
systems.findthebest.com/saved_compare/Oracle-vs-MySQL, accessed on December,
2013.
[18] Han, J., and Kamber, M. Data Mining Concepts and Techniques. Morgan Kaufmann,
2001.
[19] R system, The R Project for Statistical Computing. http://www.r-project.org, accessed
on October 2013.
[20] Spector, P. Data Manipulation with R, Springer, 2008.
[21] James, D. A., and DebRoy, S. Package RMySQL. CRAN R-Project, 2013.
M. Poovizhi
Department of Computer Science
Sacred Heart College, Tirupattur
ABSTRACT
Component-Based Software Engineering (CBSE) is an emerging technique for software reuse. This paper presents component-based software metrics by investigating improved measurement techniques. Two types of metrics are used: static metrics and dynamic metrics. This work presents the measured metric values for the complexity metrics and the criticality metric. The static metrics are applied to an e-healthcare application that is developed from reusable software components, and the value of each metric is analyzed for the application. The measured metric values are evidence of the reusability and good maintainability of the component-based software system.
Keywords
Component Based Software Engineering, Component Based Software Metrics, Component
Based Software System.
1. INTRODUCTION
The demand for new software applications is currently increasing at an exponential rate, while the number of qualified and experienced professionals required for creating new software is not increasing commensurately [1]. Software-reuse applications are built from existing components, primarily by assembling and replacing interoperable parts. Software professionals have therefore recognized reuse as a powerful means of potentially overcoming this software crisis, and it promises significant improvements in software productivity and quality [2].
There are two approaches to code reuse: develop the reusable code from scratch, or identify and extract the reusable code from already developed code [3]. For organizations that have experience in developing software, there is an extra cost to develop reusable components from scratch in order to build and strengthen their reusable software reservoir. The cost of developing software from scratch can be saved by identifying and extracting the reusable components from already developed and existing software systems or legacy systems [4]. But the problem of how to recognize reusable components in existing systems has remained relatively unexplored. In
both cases, whether the organization is developing software from scratch or reusing code from already developed projects, there is a need to evaluate the quality of the potentially reusable piece of software. Metrics are essential to prove the quality of the components [5].
During the last decade, the software reuse and software engineering communities have come to a better understanding of component-based software engineering. The development of a reuse process and repository
produces a base of knowledge that improves in excellence after every reuse,
minimizing the amount of development work necessary for future projects,
and ultimately reducing the risk of new projects that are based on repository
knowledge [8].
2. RELATED WORKS
Many works are carried out in the area of Component Based Software
Metrics. Some of the works are listed below:
In 2006, Nael Salman focused mainly on the complexity that results from factors related to system structure and connectivity [10]. A new set of properties that a component-oriented complexity metric must possess was also defined, and the metrics were evaluated using the defined properties. A case study was conducted to assess the power of the complexity metrics in predicting integration and maintenance efforts. The results of the study revealed that component-oriented complexity metrics can be of great value in predicting both integration and maintenance efforts.
Majdi Abdellatief et al. proposed dependency metrics for the Component-Based Software System (CBSS) in 2011 [14]. Two sets of metrics, Component Information Flow Metrics and Component Coupling Metrics, are proposed based on the concept of component information flow from the CBSS designer's point of view.
Jianguo Chen, Hui Wang, Yongxia Zhou and Stefan D. Bruda presented such efforts in 2011 by investigating improved measurement tools and techniques, i.e., effective software metrics [15]. Coupling, cohesion and interface metrics were newly proposed and evaluated.
The previous research covered a variety of component-based software metrics. This paper deals with the static and dynamic metrics of component-based software. The work is extended by developing an e-healthcare application, and results are obtained for the static metrics.
3. COMPONENT BASED SOFTWARE METRICS
The traditional software metrics focus on non-CBSS and are inappropriate for CBSS, mainly because the component size is normally not known in advance, and the inaccessibility of the source code of some components prevents comprehensive testing. So, component-based metrics are defined to evaluate the component-based application.
Two types of metrics are considered in this paper for measuring the values.
Static Metrics
Static metrics cover the complexity and the criticality within an integrated component. They are collected from static analysis of the component assembly. The complexity and criticality metrics are intended to be used early, during the design stage. The list of static metrics [16] is provided in Table 1.
Dynamic Metrics
Dynamic metrics are gathered during execution of the complete application and are meant to be used at the implementation stage. The dynamic metrics are listed in Table 2 [15].
Table 1. Static Metrics
Sl.no Metric Name Formula
1 Component Packing Density (CPD) CPD = #constituents / #components
4. IMPLEMENTATION
The e-healthcare application is developed to measure the static metrics. The application is designed with a number of components, the metrics are applied to the application, and the values are measured. There are five modules in the e-healthcare application.
4.1 Admin
The Admin module is used to store the user, doctor, and admin details. The admin has the responsibility of managing every record in the database.
4.2 Appointments and payments
This module is used to add and drop doctor details and helps users get appointments. The admin is the person responsible for adding new doctor details; an existing doctor can also be deleted by the admin.
4.3 Diagnosis and health
The Diagnosis and Health module is used to retrieve users' diagnosis details. The information of the users who take treatment through the application is stored in the database.
4.4 First aid and E-certificate
This module is used to get blood bank details for a required blood group. First-aid medicine details for a particular disease are provided to the users. The user can get the treatment type, which helps users in their emergencies.
4.5 Symptoms and alerts
The Symptoms and Alerts module is used to check the BP level of the user. The patient information is retrieved from the database, and the symptoms and causes of disease help the users to prevent disease.
The metric measurements are carried out manually with the application. With the help of the database tables, web page forms, and components, the metric values are calculated.
5. ANALYSIS
The analysis is made to show that the CBSS has good reusability, maintainability, and independence. The Component Packing Density metric, the Component Interaction metrics (incoming, outgoing, average), and the criticality metrics are analyzed as follows.
5.1 Component Packing Density Metric
CPD is used to measure the number of operations that each component contains. The CPD is defined as the ratio of #constituents (LOC, objects/classes, operations, classes and/or modules) to #components. For this metric, the number of operations of each component is listed in Table 3.
Table 3. Component packing Density
S.No Component Name No. of operations
1 Admin 3
2 Appointments and payments 4
3 Diagnosis and health 4
4 Firstaid and e-certificate 6
5 Symptoms and alerts 5
6 DBHelper 1
7 EhealthBL 19
CPD = (3 + 4 + 4 + 6 + 5 + 1 + 19)/7 = 42/7 = 6
Hence, the CPD metric gives the average number of operations that each component contains.
5.2 Component Interaction Density Metric
The CID is defined as the ratio of actual interactions to potential ones. A higher interaction density causes a higher complexity in the interaction [17]. The CID metric is applied to the E-Healthcare application. The measured value of the actual interactions in each component of E-Healthcare is illustrated in Table 4.
CID = #I / #Imax, where
#I = number of actual interactions
#Imax = number of maximum available interactions.
Table 4. Actual interactions
S.No Name of the page No. of actual interactions
1 Registration.aspx 4
2 Postquestion.aspx 2
3 Search.aspx 5 i/p, 5 o/p
4 Doctormanagement.aspx 6
5 Diagnosis.aspx 1
6 Searchmedicine.aspx 2 i/p, 3 o/p
7 Medicine.aspx 5
8 Bloodbank.aspx 4
9 Firstaidsuggestion.aspx 2
10 Medicalcertificate.aspx 3
11 Treatmenttype.aspx 1 i/p, 2 o/p
12 Symptoms.aspx 1 i/p, 3 o/p
Total 51
The actual interaction value with other components is 51, and the maximum number of available interactions with other components is 87.
CID = 51/87 = 0.586
This metric brings out the number of incoming and outgoing interactions available in each component and helps to identify which component has greater connectivity with other components.
5.3 Component Incoming Interaction Density
CIID is defined as the ratio of the number of incoming interactions to the maximum number of incoming interactions. A higher interaction density causes a higher complexity in the interaction. The number of actual incoming interactions in each component is shown in Table 5.
Table 5. Incoming Interactions
S.No Name of the page No. of incoming interactions
1 Registration.aspx 4
2 Postquestion.aspx 1
3 Search.aspx 5
4 Doctormanagement.aspx 4
5 Diagnosis.aspx 1
6 Searchmedicine.aspx 2
7 Medicine.aspx 5
8 Bloodbank.aspx 4
9 Firstaidsuggestion.aspx 1
10 Medicalcertificate.aspx 2
11 Treatmenttype.aspx 4
12 Symptoms.aspx 4
Total 37
The number of incoming interactions is 37. The maximum number of available incoming interactions is 51; out of these 51 interactions, only 37 actually have a link to another component.
CIID = 37/51 = 0.725
The CIID metric value of 0.725 clearly states that the incoming interaction density with other components is very high.
5.4 Component Outgoing Interaction Density
COID is defined as the ratio of the number of outgoing interactions to the maximum number of outgoing interactions. A higher interaction density causes a higher complexity in the interaction. The number of outgoing interactions in each component is shown in Table 6.
4 Doctormanagement.aspx 1
5 Diagnosis.aspx 3
6 Searchmedicine.aspx 3
7 Medicine.aspx 1
8 Bloodbank.aspx 3
9 Firstaidsuggestion.aspx 1
10 Medicalcertificate.aspx 1
11 Treatmenttype.aspx 4
12 Symptoms.aspx 3
Total 28
The number of outgoing interactions is 28. The maximum number of available outgoing interactions is 46; only 28 outgoing interactions are actually connected with other components.
COID = 28/46 = 0.608
The calculated value of 0.608 shows that there are substantial outgoing interactions among the components.
5.5 Component Average Interaction Density
CAID represents the sum of CID for each component divided by the number
of components.
The summation of CID for the Admin component is 7/16: seven actual interactions out of sixteen. This component has good reliability.
Appointments and payments: The sum of the interaction density of the appointments and payments component is shown in Table 8. The sum considers both the incoming and outgoing interfaces of the appointments and payments component.
Firstaid and e-certificate: Table 10 shows the sum of the CID values for the component called firstaid and e-certificate.
Table 10. Sum of CID for firstaid and e-certificate
S.No Name of the page Sum of CID
1 Bloodbank.aspx 1 out of 1; 3 out of 3
2 Firstaidsuggestion.aspx 1 out of 1
3 Medicalcertificate.aspx 2 out of 4
4 Treatmenttype.aspx 1 out of 1; 3 out of 7
Table 11. Sum of CID for symptoms and alerts
S.No Name of the page Sum of CID
1 Searchpatient.aspx 1 out of 1; 3 out of 3
The Component Average Interaction Density metric takes the ratio between the sum of the CID values of the components and the number of existing components:
CAID = (7/16 + 10/12 + 7/9 + 11/17 + 4/4)/7 = 0.5279
The measured value for this metric indicates good reliability of the components.
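The arithmetic of sections 5.1-5.5 can be checked mechanically; the short R sketch below recomputes the reported values from the counts in Tables 3-11 (note that 28/46 rounds to 0.609, which the paper truncates to 0.608).

# Recomputing the static metric values from the paper's reported counts.
ops  <- c(3, 4, 4, 6, 5, 1, 19)                 # operations per component (Table 3)
cpd  <- sum(ops) / length(ops)                  # Component Packing Density
cid  <- 51 / 87                                 # interaction density
ciid <- 37 / 51                                 # incoming interaction density
coid <- 28 / 46                                 # outgoing interaction density
caid <- (7/16 + 10/12 + 7/9 + 11/17 + 4/4) / 7  # average interaction density
round(c(cpd = cpd, cid = cid, ciid = ciid, coid = coid, caid = caid), 3)
#   cpd   cid  ciid  coid  caid
# 6.000 0.586 0.725 0.609 0.528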
So, the bridge component value is 1. The value 1 explicitly tells that one component operates as a bridge component to all other components.
Root components
Symptoms and alerts (patient info inherited by the diagnosis component)
EhealthBL (the query is inherited from the base query)
So, the root component value is 2; this value shows that object-oriented programming concepts are utilized between the components.
Threshold Value
The threshold value is fixed at 0.5, and it is used for comparison with the computed value of each metric. The comparison with this threshold value checks whether the metric value has increased or decreased with respect to reusability and good maintainability. Table 12 shows the results of the comparison with the threshold value.
6. CONCLUSIONS
Building software systems with reusable components brings many advantages to organizations. Reusability has several direct or indirect factors, such as cost, effort, and time. This paper discussed various aspects of reusability for component-based systems and gave an insight into various reusability metrics for component-based systems. The quality of the components is measured by applying the metrics to an e-healthcare application in an electronic commerce domain. The component-based metrics result in improving the quality of design components and in developing component-based systems with good maintainability, reusability, and independence. Most of the metrics allow future enhancement; such enhancements help to add features in the future. The demand for new software applications is currently increasing at an exponential rate, so these future enhancements will help to fulfill those requirements. The dynamic metric analysis can be applied to the component-based software application and validated. Based on the applications, enhanced metrics can be proposed for component-based software systems.
REFERENCES
[1] Dr. Nedhal A. Al Saiyd, Dr. Intisar A. Al Said, Ahmed H. Al Takrori, Semantic-Based
Retrieving Model of Reuse Software Component, IJCSNS International Journal of
Computer Science and Network Security, VOL.10 No.7, July 2010.
[2] Joaquina Martín-Albo, Manuel F. Bertoa, Coral Calero, Antonio Vallecillo, Alejandra Cechich and Mario Piattini, CQM: A Software Component Metric Classification Model.
[3] Anas Bassam AL-Badareen, Mohd Hasan Selamat, Marzanah A. Jabar, Jamilah Din,
Sherzod Turaev, Reusable Software Component Life Cycle, International Journal of
Computers, Issue 2, Volume 5, 2011.
[4] Chintakindi Srinivas, Dr.C.V.Guru rao, Software Reusable Components With
Repository System, International Journal of Computer Science & Informatics, Volume-
1, Issue-1,2011
[5] Parvinder S.Sandhu, Harpreet Kaur, and Amanpreet Singh, Modeling of Reusability of
Object oriented Software System, World Academy of Science, Engineering and
Technology 56 2009.
[6] Sarbjeet Singh, Manjit Thapa, Sukhvinder Singh and Gurpreet Singh, International Journal of Computer Applications (0975-8887), Volume 8, No. 12, October 2010.
[7] Linda L. Westfall, Seven steps to designing a software metrics, Principles of software
measurement services.
[8] K.S. Jasmine and R.Vasantha, DRE A Quality metric for Component Based Software
Products, World Academy of Science, Engineering and Technology 34 2007.
[9] Iqbaldeep Kaur, Parvinder S. Sandhu, Hardeep Singh, and Vandana Saini, Analytical
Study of Component Based Software Engineering, World Academy of Science,
Engineering and Technology 50 2009.
[10] Nael Salman, Complexity metrics as predicators of maintainability and integrability of
software components, Journal of arts and science, May 2006.
[11] Arun Sharma, Rajesh Kumar, and P. S. Grover, A critical survey of reusability aspects
for component-Based systems, World academy of science, Engineering and
Technology 33 2007.
[12] V. Lakshmi Narasimhan, P. T. Parthasarathy, and M. Das, Evaluation of a suite of
metrics for CBSE, Issues in informing science and information technology, Vol 6,
2009.
[13] Misook Choi, Injoo J. Kim, Jiman Hong, Jungyeop Kim, Component-Based Metrics
Applying the Strength of Dependency between Classes, ACM Journal, March 2009.
[14] Majdi Abdellatief, Abu Bakar Md Sultan, Abdul Azim Abd Ghani, Marzanah A.Jabar,
Component-based Software System Dependency Metrics based on Component
Information Flow Measurements, ICSEA 2011.
[15] Jianguo Chen, Hui Wang, Yongxia Zhou, Stefan D.Bruda, Complexity Metrics for
Component-based Software Systems, International Journal of Digital Content
Technology and its Applications. Vol.5, No.3, March 2011.
[16] V. Lakshmi Narasimhan, and Bayu Hendradjaya, Theoretical Considerations for Software
Component Metrics, World Academy of Science, Engineering and Technology 10
2005.
[17] E. S. Cho, M.S. Kim, S.D. Kim, Component Metrics to Measure Component Quality,
the 8th Asia-Pacific Software Engineering Conference (APSEC), Macau, 2001, pp.
419-426.
ABSTRACT
As the number of users grows and the amount of available information becomes ever bigger, information dissemination applications are gaining popularity for distributing data to end users. A Selective Dissemination of Information (SDI) system distributes the right information to the right users based upon their profiles. Typically, the Extensible Markup Language (XML) is exploited for profile representation, and XML query languages assist the employment of query indexing techniques in SDI systems. As a consequence of these advances, mobile information retrieval is crucial for sharing the vast information from diverse data sources. However, the inherent limitations of mobile devices require the information delivered to mobile clients to be highly personalized, consistent with their profiles. In this paper, we address the issue of scalable filtering of XML documents for mobile clients. We describe an efficient indexing mechanism, enhancing the XFilter algorithm with a modified Finite State Machine (FSM) approach that can quickly locate and evaluate relevant profiles. Finally, our experimental results show that the proposed indexing method outperforms the previous XFilter algorithm in terms of filtering time.
Keywords
XML, FSM, scalable filtering, SDI.
1. INTRODUCTION
Nowadays the SDI system is becoming an increasingly important research area and industrial topic. There is an obvious trend to create new applications for small and light computing devices such as cell phones and PDAs. Amongst the new applications, mobile information dissemination applications (e.g. electronic personalized newspaper delivery, e-commerce site monitoring, headline news, alerting services for digital libraries, etc.) deserve special attention.
Recently, there have been a number of efforts to build efficient large-scale XML filtering systems. In an XML filtering system [4], constantly arriving streams of XML documents are passed through a filtering engine that matches documents to queries and routes the matched documents
accordingly. XML filtering techniques comprise a key component of
modern SDI applications.
XML [3] is becoming a standard for information exchange and a textual representation of data designed for describing content, especially on the internet. The basic mechanism used to describe user profiles in XML format is the XPath query language. XPath is a query language for addressing parts of an XML document. However, this technique often suffers from a restricted capability to express user interests, being unable to properly capture the semantics of the user requirements. Therefore, expressing deeply personalized profiles requires a querying power like the one SQL provides on relational databases. Moreover, as user profiles are complex in a mobile environment, a more powerful language than XPath is needed. In this case, the choice is XML-QL. XML-QL [7] has more expressive power compared to XPath and is also considered the most powerful among all XML query languages. XML-QL's querying power and its elaborate CONSTRUCT statement allow the format of the query results to be specified.
The rest of the paper is organized as follows: Section 2 briefly summarizes the related works. Section 3 describes the proposed system architecture and its components. Section 4 explains the operation of the system, that is, how the query index is created, how the finite state machine operates, and how the customized results are generated. Section 5 gives the performance evaluation of the system. Finally, Section 6 concludes the paper.
2. RELATED WORKS
We now introduce some existing XML filtering methods. XFilter [1] was one of the early works. The XFilter system is designed and implemented for pushing XML documents to users according to their profiles expressed in the XML Path Language (XPath). XFilter employs a separate FSM per path query and a novel indexing mechanism that allows all of the FSMs to be executed simultaneously during the processing of a document. A major drawback of XFilter is its lack of expressiveness.
In addition, XFilter does not execute the XPath queries to generate partial results. As a result, the whole document is pushed to the user when a document matches a user's profile. This feature prevents XFilter from being used in mobile environments, because the limited capability of mobile devices is not enough to handle the entire document. XFilter also does not exploit the commonalities between queries, i.e. it produces one FSM per query. This observation motivated us to develop mechanisms that employ only a single FSM for the queries that have a common element structure; a small sketch of such grouping follows.
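A toy sketch of the idea is shown here: queries are keyed by their element path, so queries with an identical path can share one FSM. The queries named q1-q3 are hypothetical.

# Sketch: group profile queries by their common element structure so
# that each group can be served by a single FSM. Queries are made up.
queries <- list(q1 = c("major", "name"),
                q2 = c("syllabus", "sub-code"),
                q3 = c("major", "name"))
split(names(queries), sapply(queries, paste, collapse = "/"))
# $`major/name`:        "q1" "q3"  -> one shared FSM
# $`syllabus/sub-code`: "q2"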
YFilter [2] overcomes this disadvantage of XFilter by using a Nondeterministic Finite Automaton (NFA) to exploit prefix sharing. The resulting shared processing provides tremendous improvements in the performance of structure matching but complicates the handling of value-based predicates. Moreover, the ancestor/descendant relationship introduces more matching states, which may cause the number of active states to increase exponentially. Post-processing is required for YFilter.
FoXtrot [5] is an efficient XML filtering system that integrates the strengths of automata and distributed hash tables to create a fully distributed system. FoXtrot also describes different methods for evaluating value-based predicates. Its performance evaluation demonstrates that it can index millions of queries and attain an excellent filtering throughput. However, FoXtrot necessitates extensions of the query language to reach full XPath or sufficiently powerful expressiveness for user profiles.
The NiagaraCQ system [6] uses XML-QL to express user profiles. It provides scalability through query grouping and caching techniques. However, its query grouping is derived from execution plans, which differs from our proposed method, and the execution times of the queries do not make such planning a feasible candidate for mobile environments. Accordingly, our system solves the above problems and reduces the filtering time as much as possible.
3. PROPOSED SYSTEM ARCHITECTURE
User profiles are automatically converted into an XML-QL format that can be efficiently stored in the profile database and evaluated by the filtering system. These profiles are effectively standing queries, which are applied to all incoming documents. The filtered engine first creates query indices for the user profiles and then parses the incoming XML documents to obtain the query results. The results are stored in a special content list, so that the whole document need not be sent; extracting parts of an XML document saves bandwidth in a mobile environment. After that, the filtered engine sends the filtered XML documents to the related mobile clients.
3.1 Defining User Profiles with XML-QL
XML-QL has a WHERE ... CONSTRUCT form that, like SQL's SELECT ... WHERE, can express queries to extract pieces of data from XML documents. It can also specify transformations that, for example, map XML data between Document Type Definitions (DTDs) and integrate XML data from different sources. Profiles defined through a GUI are transformed into XML documents which contain XML-QL queries, as shown in Figure 2.
<Profile>
<XML-QL>
WHERE<course>
<major>
<name>ICT</name>
<program>First Year</program>
<syllabus>$n</syllabus>
</major></course> IN course.xml
CONSTRUCT<result><syllabus>$n</syllabus></result>
</XML-QL>
<PushTo> <address></address> </PushTo>
</Profile>
Figure 2. Profile syntax represented in XML containing XML-QL query
3.2 Filtered Engine
The basic components of the filtered engine are: 1) an event-based XML parser, implemented using the SAX API, for the XML documents; 2) a profile parser that contains an XML-QL parser for the user profiles and creates the query index; 3) a query execution engine that contains the query index, which is associated with finite state machines, to query the XML documents; and 4) a delivery component that pushes the results to the related mobile clients (see Figure 3).
[Figure 3: architecture of the filtered engine, showing the XML-QL parser,
the query nodes it feeds into the Query Index, the document events matched
against the index, and the results passed to the Delivery component]
<!ELEMENT course (degree, major*)>
<!ELEMENT degree (#PCDATA)>
<!ELEMENT major (name, program, semester, syllabus*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT program (#PCDATA)>
<!ELEMENT semester (#PCDATA)>
<!ELEMENT syllabus (sub-code, sub-title, instructor)>
<!ELEMENT sub-code (#PCDATA)>
<!ELEMENT sub-title (#PCDATA)>
<!ELEMENT instructor (#PCDATA)>

<root>
  <course>
    <degree>Bachelor</degree>
    <major>
      <name>ICT</name>
      <program>First Year</program>
      <semester>First Semester</semester>
      <syllabus>
        <sub-code>EM-101</sub-code>
        <sub-title>English</sub-title>
        <instructor>Dr. Thiri</instructor>
      </syllabus>
    </major>
  </course>
</root>
Figure 4. An example XML document and its DTD (course.xml)
The example queries and their FSM representations are shown in Figure 5.
Note that there is a node in the FSM representation corresponding to each
element in the query, and the FSM representation's tree structure follows
from the XML-QL query structure.
Query 1: Retrieve all syllabuses of the first year program for the ICT major.
WHERE <major> <name>ICT</><program>First Year</><syllabus>$n</>
</> IN course.xml
CONSTRUCT <result><syllabus>$n</></>

Query 2: Retrieve the instructor of the subject with code EM-101.
WHERE <syllabus> <sub-code>EM-101</><instructor>$s</>
</> IN course.xml
CONSTRUCT <result><syllabus>$s</></>
[Figure 5: the example queries and their FSM representations, showing FSM
nodes for the elements major, name, program, syllabus, sub-code and
instructor, each with a candidate list (CL) and a wait list (WL) holding
the query nodes Q1.1-Q1.4, Q2.1-Q2.3 and Q3.1-Q3.5]
The Start Element Handler checks whether the query element matches the
element in the document. For this purpose it performs a level check and an
attribute check. If these are satisfied, it either enables data comparison
or starts variable content generation. As the next step, the nodes in the
WL that are the immediate successors of this node are moved to the CL.
The End Element Handler evaluates the state of a node by considering the
states of its successor nodes, and it generates the output when the root
node is reached. It also deletes from the CL the nodes that were inserted
by the start element handler of this node; this provides backtracking in
the FSM.
The Element Data Handler performs data comparison in the query. If the
expression is true, the state of the node is set to true, and this value is
used by the End Element Handler of the current element node.
The End Document Handler signals the end of result generation and passes
the results to the Delivery Component.
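The following simplified Python sketch (our own reading of the four handlers, under the assumption of a single linear query path; a descendant match stands in for the exact level and attribute checks) shows how query nodes move between the WL and the CL as SAX events arrive:

import xml.sax

class FilterHandler(xml.sax.ContentHandler):
    def __init__(self, path):            # path: element names, e.g. ["major", "name"]
        super().__init__()
        self.path = path
        self.depth = 0
        self.next_idx = 0                # front of the WL: next node waiting to match
        self.match_depths = []           # depths at which nodes were promoted to the CL
        self.collect, self.buffer, self.results = False, "", []

    def startElement(self, name, attrs):      # Start Element Handler
        self.depth += 1
        if self.next_idx < len(self.path) and name == self.path[self.next_idx]:
            self.match_depths.append(self.depth)   # WL -> CL promotion
            self.next_idx += 1
            if self.next_idx == len(self.path):    # accepting state: bind the variable
                self.collect, self.buffer = True, ""

    def characters(self, content):             # Element Data Handler
        if self.collect:
            self.buffer += content

    def endElement(self, name):                # End Element Handler
        if self.match_depths and self.match_depths[-1] == self.depth:
            if self.next_idx == len(self.path):    # emit a partial result
                self.results.append(self.buffer.strip())
                self.collect = False
            self.match_depths.pop()                # backtracking: CL -> WL
            self.next_idx -= 1
        self.depth -= 1

doc = ("<root><course><major><name>ICT</name>"
       "<program>First Year</program></major></course></root>")
h = FilterHandler(["major", "name"])
xml.sax.parseString(doc.encode(), h)           # after End Document: h.results == ['ICT']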
4.3 Generating Customized Results
Results are generated when the end element of the root node of the query is
encountered. The content lists of the variable nodes are traversed to
obtain content groups, which are further processed to produce results. This
process is repeated until the end of the document is reached. The results
are formatted as defined in the CONSTRUCT clause and finally sent to the
related mobile clients.
5. PERFORMANCE EVALUATION
In this section, we conducted three sets of experiments to demonstrate the
performance of the architecture for different document sizes and query
workloads. The graph shown in Figure 7 contains the results for different
query groups, that is, queries that have the same FSM representation but
different constants, for the document course.xml (1 MB). When the number of
queries on the same XML document is very large, the probability of having
queries with the same FSM representation increases considerably.
The results show that the execution time depends mainly on the number of
query groups, and that generating a single FSM per query group rather than
per query is well justified.
Figure 9. Execution time of queries for different numbers of query groups
and document sizes
Figure 9 shows the execution times of queries while varying the number of
query groups and the size of the documents. The results indicate that
performance is more sensitive to document size when the number of query
groups increases, which again confirms the importance of query grouping.
In conclusion, the FSM approach proposed in this paper for executing XML-QL
queries on XML documents is very promising for mobile environments.
6. CONCLUSIONS
Mobile communication is blooming, and Internet access from mobile devices
has become possible. Given this new technology, researchers and developers
are figuring out what users really want to do anytime, from anywhere, and
how to make this possible. In addition, a high degree of personalization is
an important requirement for SDI services in mobile environments, as the
limited capability of mobile devices is not enough to handle entire
documents. This paper develops an efficient and scalable SDI system that
serves mobile clients based on their profiles. We anticipate that one
common use of mobile devices will be the delivery of personalized
information from XML sources. We believe that querying power is necessary
to express highly personalized user profiles, and that the system must be
scalable if it is to serve millions of mobile users. Since the critical
issue is the number of profiles rather than the number of documents,
indexing queries rather than documents makes sense. Our experiments show
that the system is highly scalable, so we expect its performance to remain
acceptable in mobile environments even with millions of queries.
7. ACKNOWLEDGMENTS
The authors wish to acknowledge Dr. Soe Khaing for her useful comments on
earlier drafts of the paper. Our heartfelt thanks go to our family, friends
and colleagues who have helped us complete this work.
REFERENCES
[1] M. Altinel and M. Franklin, "Efficient filtering of XML documents for selective dissemination of information", Proc. of the Int'l Conf. on VLDB, pp. 53-64, Sept. 2000.
[2] Y. Diao, M. Altinel, M. Franklin, H. Zhang and P. M. Fischer, "Path sharing and predicate evaluation for high-performance XML filtering", ACM Trans. Database Syst., 28(4), Dec. 2003, pp. 467-516.
[3] Extensible Markup Language, http://www.w3.org/XML/.
[4] I. Miliaraki, "Distributed Filtering and Dissemination of XML Data in Peer-to-Peer Systems", PhD Thesis, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, July 2011.
[5] I. Miliaraki and M. Koubarakis, "FoXtrot: distributed structural and value XML filtering", ACM Transactions on the Web, Vol. 6, No. 3, Article 12, September 2012.
[6] J. Chen, D. DeWitt, F. Tian and Y. Wang, "NiagaraCQ: a scalable continuous query system for internet databases", ACM SIGMOD, Texas, USA, June 2000, pp. 379-390.
[7] XML-QL: A Query Language for XML, http://www.w3.org/TR/1998/NOTE-xml-ql-19980819.
ABSTRACT
Digitization was a revolutionary step for the fields of photography and
image processing, as it made the editing of images much easier. Image
editing was not an issue while it remained limited to corrective procedures
used to enhance the quality of an image, such as contrast stretching, noise
filtering and sharpening. But it became a headache for many fields when
image editing became manipulative. Digital images have become an easy
target of tampering and forgery during the last few decades. Today, users
and editing specialists, equipped with easily available image editing
software, manipulate digital images with varied goals: photo journalists
tamper with photographs to give dramatic effect to their stories;
scientists and researchers use this trick to get their work published;
patients' diagnoses are misrepresented by manipulating medical imagery;
lawyers and politicians use tampered images to sway the opinion of people
or courts in their favor; and terrorists and anti-social groups use
manipulated stego images for secret communication. In this paper we present
an effective method for detecting spatial domain Steganography.
Keywords
Digital Image, Steganography, Secret Message, Cover image, Stego Image, Encryption,
Decryption, Steganalysis, Tampering, Histogram, JPEG.
1. INTRODUCTION
Significant advancements in digital imaging during the last decade have
added a few innovative dimensions to the field of image processing.
Steganography and watermarking are two such creative dimensions that have
gained wide popularity among researchers. Digital image watermarking
techniques are generally used for authentication purposes and are achieved
by embedding a small piece of information into copyrighted digital
information. Steganography, on the other hand, hides a large amount of data
secretly in a digital medium and is generally used for secret
communication. It is an effective means of data hiding that protects data
from unauthorized or unwanted disclosure and can be used in various fields
such as medicine, research, defence and intelligence, for secret data
storage, confidential communication, protection of data from alteration and
disclosure, and access control in digital distribution.
Like every coin with two sides, all technological developments have both
good and bad applications, and Steganography is no exception [1]. Though
there are many good reasons to use data hiding techniques, and they should
be used for legitimate applications only, Steganography can unfortunately
also be used for illegitimate purposes [2]. For instance, someone trying to
steal data can conceal it in another file and send it out through an
innocent-looking email. The stolen information may be a patient's
confidential test reports, the tender information of a company or
organization, or even the defence plans of a country. Terrorists and
criminals can certainly use this method to secretly spread their action
plans; although no evidence has been established for the claim, it has been
alleged that Steganography was used to pass the execution plan of the 9/11
WTC attack [3]. Since Steganography mainly targets innocent-looking digital
images, because of their large size and popularity, it has become an
important requirement of our time to differentiate real innocent images
from innocent-looking stego images. In this paper we suggest an easy method
that detects spatial domain LSB Steganography by analyzing the histograms
of digital images.
2. STEGANOGRAPHY
Steganography is a process of secret communication where a piece of
information (a secret message) is hidden into another piece of innocent
looking information, called a cover. The message is hidden inside the cover
in such a way that the very existence of the secret information remains
concealed without raising any suspicion in the minds of the viewers [4, 5].
and the extraction function X recovers the secret message S from the stego
medium C' using the key K:
X : {C' × K} → S --------- (eq. 2)
Spatial domain LSB embedding usually targets lossless image formats (such
as GIF), as they offer the highest capacity and best overall security [8].
The 0th bit-plane, or least significant bit-plane (LSB plane), is the plane
consisting of the bits with minimum positional value (2^0 = 1), and the MSB
plane (most significant bit-plane, the 7th) consists of the highest order
bits. For example, a pixel with gray value 225 has the bits
1, 1, 1, 0, 0, 0, 0, 1 in the 7th to 0th bit-planes respectively. Figure 2
shows the different bit-planes of the grayscale Lena image.
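The bit-plane decomposition is easy to reproduce. The following short numpy sketch (a toy 8x8 array stands in for a real image) slices an image into its eight planes, checks the 225 example above, and shows why LSB replacement changes a pixel by at most one gray level:

import numpy as np

img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)   # toy "image"
planes = [(img >> k) & 1 for k in range(8)]               # planes[0]=LSB ... planes[7]=MSB

# gray value 225 = 11100001 in the 7th..0th bit-planes, as stated above
assert [(225 >> k) & 1 for k in range(7, -1, -1)] == [1, 1, 1, 0, 0, 0, 0, 1]

# LSB embedding: overwrite plane 0 with secret bits
secret = np.random.randint(0, 2, img.shape, dtype=np.uint8)
stego = (img & 0xFE) | secret
assert np.abs(stego.astype(int) - img.astype(int)).max() <= 1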
3. STEGANALYSIS
Steganalysis is the art and science of detecting messages hidden using
Steganography [9]. It can be considered a two-class pattern classification
problem that aims to determine whether a medium under test is a cover or a
stego medium [2]. Hence, the goal of steganalysis is to identify suspected
packages, determine whether or not they have a payload encoded in them,
and, if possible, recover that payload. Steganalysis is similar to
cryptanalysis, with a slight difference: in cryptanalysis it is obvious
that the intercepted data contains a message, and the task is simply to
recover it by decrypting the encrypted text. Steganalysis, on the other
hand, generally starts with a huge set of suspected data files without any
prior information about which files, if any, contain a hidden message.
Therefore, steganalysis can be viewed as a two-step process: first, reduce
this large set to a smaller subset of files that are most likely to have
been altered; second, separate the covert signal from the carrier.
Detection of suspected files is straightforward when the original,
unmodified carriers are available for comparison, but when only a single
image is available it becomes a tough problem to say whether it has been
manipulated, since steganography attempts to make the stego medium
indistinguishable from the cover.
Distinguishing a stego file from a real innocent file is a major task in
steganalysis. Because of its global nature, Steganography generally leaves
detectable traces in the medium's characteristics, and careful examination
of the modified media can reveal the existence of some secret content,
defeating the very purpose of Steganography even if the secret content
itself is not exposed. This paper illustrates how stego images can be
easily separated from real innocent ones through histogram analysis
(histanalysis).
3.1 Histogram
A digital image is a two-dimensional function f(x, y), where x and y are
spatial coordinates and f is the amplitude at (x, y), also called the
intensity or gray level of the image at that point; x, y and f are finite,
discrete quantities. The histogram of a digital image with gray levels from
0 to L−1 is a discrete function h(r_k) = n_k, where r_k is the kth gray
level, n_k is the number of pixels in the image with that gray level, n is
the total number of pixels in the image and k = 0, 1, 2, ..., L−1 [10]. In
other words, a histogram gives the count of each gray level. In a histogram
plot, the horizontal axis corresponds to the gray level values r_k and the
vertical axis to the values of h(r_k) = n_k, or p(r_k) = n_k/n if the
values are normalized. Histograms are
the basis of various spatial domain processing techniques and provide
useful image statistics; the information inherent in histograms is quite
useful for a number of image processing applications. Figure 4 shows a
grayscale image and its corresponding histogram.
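A minimal numpy sketch of h(r_k) = n_k and its normalized form p(r_k) = n_k/n for an 8-bit image:

import numpy as np

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
h = np.bincount(img.ravel(), minlength=256)  # h[k] = number of pixels with gray level k
p = h / img.size                             # normalized histogram; p sums to 1
assert h.sum() == img.size and abs(p.sum() - 1.0) < 1e-9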
[Figure 5: JPEG encoding and decoding pipeline]
3.2.1 Discrete Cosine Transform
Given an image f(x, y) of size N×N, the forward discrete transform F(u, v)
and, given F(u, v), the inverse discrete transform f(x, y) can be obtained
from the general relations:

F(u,v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, g(x,y,u,v)

f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} F(u,v)\, h(x,y,u,v)

where g(x, y, u, v) and h(x, y, u, v) are called the forward and inverse
transform kernels respectively. In the case of the discrete cosine
transform (DCT), both the forward and inverse kernels are given by a single
relation [8]:

g(x,y,u,v) = h(x,y,u,v) = \alpha(u)\,\alpha(v)\,
\cos\!\left(\frac{(2x+1)u\pi}{2N}\right)\cos\!\left(\frac{(2y+1)v\pi}{2N}\right)

where

\alpha(k) = \sqrt{1/N} for k = 0, and \alpha(k) = \sqrt{2/N} for
k = 1, 2, ..., N−1.
In a DCT-transformed block F(u, v), the upper left coefficient F(0, 0)
represents the DC component and the remaining F(u, v) are called AC
components. The significant signal energy of an image lies in the
low-frequency components that appear in the upper left corner of the DCT
block, while the lower right values, representing higher frequencies, are
small enough to be neglected without causing much visible distortion to the
image; this is why the DCT offers compression.
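The kernel above can be transcribed directly. The sketch below is a deliberately literal (and therefore slow, O(N^4)) numpy implementation for checking the formula on small blocks; real codecs use fast separable transforms instead:

import numpy as np

def alpha(k, N):
    return np.sqrt(1.0 / N) if k == 0 else np.sqrt(2.0 / N)

def dct2(f):
    N = f.shape[0]
    F = np.zeros((N, N))
    x = np.arange(N)
    for u in range(N):
        for v in range(N):
            cu = np.cos((2 * x + 1) * u * np.pi / (2 * N))   # cosine term over x
            cv = np.cos((2 * x + 1) * v * np.pi / (2 * N))   # cosine term over y
            F[u, v] = alpha(u, N) * alpha(v, N) * (np.outer(cu, cv) * f).sum()
    return F

F = dct2(np.random.rand(8, 8))
# most of the signal energy sits in the low-frequency corner around F[0, 0]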
3.2.2 Quantization
It is the process of approximating a continuous signal by a set of discrete
values. In this step the quantized blocks C(i, j) corresponding to the DCT
coefficient blocks D(i, j) are obtained using the formula

C(i, j) = round( D(i, j) / Q(i, j) )

where Q(i, j) is a quantization table of the selected quality. After this
quantization process, the values of the high-frequency AC coefficients of
the DCT block usually become zero, providing a lossy compression. The
non-zero DC coefficients of each block are then coded separately using
Huffman's lossless entropy encoding algorithm. Figure 6(b) shows the DCT
coefficients of the matrix A (given in Figure 6(a)) after quantization by
Q50 (Figure 6(c)) and rounding.
[panels (a), (b), (c)]
Figure 6: An 8×8 matrix and its coefficients after the DCT and quantization
steps
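The quantization step is a one-liner once Q is fixed. The sketch below uses the standard JPEG luminance table commonly quoted for quality 50 (our illustrative choice; not necessarily the exact Q50 used for Figure 6):

import numpy as np

Q50 = np.array([[16, 11, 10, 16, 24, 40, 51, 61],
                [12, 12, 14, 19, 26, 58, 60, 55],
                [14, 13, 16, 24, 40, 57, 69, 56],
                [14, 17, 22, 29, 51, 87, 80, 62],
                [18, 22, 37, 56, 68, 109, 103, 77],
                [24, 35, 55, 64, 81, 104, 113, 92],
                [49, 64, 78, 87, 103, 121, 120, 101],
                [72, 92, 95, 98, 112, 100, 103, 99]])

def quantize(D, Q=Q50):
    return np.round(D / Q).astype(int)  # high-frequency AC terms mostly become 0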
Natural photographs are true-tone images, so their gray levels generally
show a continuous variation between a minimum and a maximum gray value. For
example, the Lena image of Figure 4 has pixels with every gray value
between 11 and 242. But when one or more bit-planes of an image are
replaced by other data, the gray counts of certain bins increase or
decrease at random, giving rise to a kind of discontinuity in the gray
value counts. In Figure 7, image (a) is an original image (without a secret
message embedded) captured in PNG format, and (b) shows the histogram of
its gray level counts, where the x-axis represents the bins and the y-axis
the gray level counts. Figure 7(e) shows the histogram of the same image
after saving it in JPEG. In both these histograms ((b) and (e) in the
figure below), it can be seen that the gray counts vary continuously
between the minimum and maximum bins. Figure 7(c) is a stego image formed
by replacing the LSB bits of image (a) with a secret message and saving the
result in TIFF format.
[panels (a)-(e)]
Figure 7: Histograms of original and stego images saved in PNG, TIFF and
JPEG
The histogram of this image is given in Figure 7(d), and it is clear from
the plot that the gray counts of certain bins have increased (e.g., some
bins between 50 and 100) whereas those of other bins have decreased,
introducing a kind of discontinuity that looks as if two different
histograms were superimposed on each other. This discontinuity increases
further, as shown in Figure 7(e), when the stego image is saved in JPEG
format and the histogram is plotted again. The gray level counts are
clearly partitioned into different gray zones, and the histogram shows a
clear sign of discreteness, with alternating peaks and valleys, which is
not a usual phenomenon for photographic images.
[panels (j), (k)]
Figure 8: Histograms of the 128×128 Lena (TIFF) image without and with
stego embedding
Figure 8(a) shows the histogram of the original Lena (TIFF) image; (b) is
the histogram of stego-Lena after the LSB plane is replaced with text and
saved in .tif format; (c) shows the histogram of the LSB-replaced
stego-Lena saved in JPEG format; and (d), (e), (f) and (h) show the
histograms of the 2-bit-plane and 3-bit-plane replaced stego-Lena images
(TIFF) and their JPEG counterparts respectively. The histogram counts of
the original Lena image and of the stego-Lena are plotted in (j), and those
of the stego-Lena after saving it in JPEG are given in Figure 8(k).
JPEG images with a high compression ratio (low quality factor) also produce
histograms of the type in Figure 8(b), but, as discussed in section 2.1,
JPEG images are considered very poor candidates for spatial Steganography
and are generally not used for this purpose. Secondly, after data is
embedded into bit-planes, stego images are never saved in lossy compression
formats such as JPEG, as these formats discard the lower order bits (and
with them the secret messages embedded in those bits!) in order to achieve
compression. Thirdly, uncompressed natural images never show a histogram
pattern like that of Figure 8(b) or 8(d). Therefore, when a raw format
image such as BMP or TIFF produces a histogram with such an unexpected
pattern, it can be suspected to be a stego medium.
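A hedged sketch of that detection idea: natural histograms vary smoothly across neighbouring bins, while bit-plane replacement tends to leave comb-like jumps. The score and threshold below are illustrative choices of ours, not values from the paper:

import numpy as np

def comb_score(img):
    h = np.bincount(np.asarray(img, dtype=np.uint8).ravel(),
                    minlength=256).astype(float)
    jumps = np.abs(np.diff(h))             # bin-to-bin discontinuity
    return jumps.mean() / (h.mean() + 1e-9)

def looks_like_stego(img, threshold=1.5):  # threshold is an assumed tuning value
    return comb_score(img) > threshold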
Today, digital images not only provide forged information but also work as
agents of secret communication. With the availability of a wide range of
easy Steganographic methods, these popular digital media are used for
secret data transmission, sometimes with legitimate goals and sometimes for
immoral purposes. A lot of work has been done on steganalysis and tamper
detection techniques, and researchers worldwide are still working to detect
manipulations made to digital images. In this paper we have used an
extremely easy but highly effective histanalysis method to detect spatial
domain digital image Steganography.
REFERENCES
[1] Minati Mishra, P. Mishra and M. C. Adhikary, "Digital Image Data Hiding Techniques: A Comparative Study", ANVESA - the journal of F. M. University (ISSN 0974-715X), Vol. 7, Issue 2, pp. 105-115, Dec. 2012.
[2] Bin Li et al., "A Survey on Image Steganography and Steganalysis", Journal of Info. Hiding and Multimedia Signal Processing (ISSN 2073-4212), Vol. 2, No. 2, pp. 142-172, Apr. 2011.
[3] http://en.wikipedia.org/wiki/Steganography
[4] Neil F. Johnson, "Steganography: Seeing the Unseen", George Mason University, http://www.iitc.com/stegdoc/sec202.html
[5] M. Mishra et al., "High Security Image Steganography with Modified Arnold's Cat Map", IJCA, Vol. 37, No. 9, pp. 16-20, January 2012.
[6] M. Mishra and M. C. Adhikary, "Digital Image Tamper Detection Techniques: A Comprehensive Study", International Journal of Computer Science and Business Informatics (ISSN 1694-2108), Vol. 2, No. 1, pp. 1-12, June 2013.
[7] T. Aura, "Invisible Communication", Proc. of the HUT Seminar on Network Security '95, Espoo, Finland, November 1995. Telecommunications Software and Multimedia Laboratory, Helsinki University of Technology.
[8] J. Fridrich and R. Du, "Secure Steganographic Methods for Palette Images", Proc. of the 3rd Information Hiding Workshop, September 28-30, Dresden, Germany, LNCS vol. 1768, Springer-Verlag, New York, pp. 47-60, 1999.
[9] http://en.wikipedia.org/wiki/Steganalysis
[10] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd Ed., PHI, New Delhi, 2006.
[11] http://en.wikipedia.org/wiki/JPEG
Dusko Parezanovic
Assistant, Gimnazija, Ivanjica, Serbia
Zoran Vucetic
Assistant, Gimnazija, Ivanjica, Serbia
ABSTRACT
In this paper we present experimental results that suggest, more clearly
than any theory, an answer to the question: when, in the detection of large
(probably) prime numbers, should the very resource-demanding Miller-Rabin
algorithm be applied? Or, to put it another way, when should dividing by
the first several tens of prime numbers be replaced by primality testing?
As an innovation, this procedure is supplemented by considering the use of
the well-known Goldbach conjecture in solving this and some other important
questions about the RSA cryptosystem, always guided by the motto "harm
neither the security nor the time spent".
Keywords
Public key cryptosystems, Prime numbers, Trial division, Miller-Rabin algorithm,
Goldbach conjecture.
1. INTRODUCTION
In asymmetric schemes [1] for protecting the confidentiality and integrity
of data, there is a need for large prime numbers. For some tasks the
required number of bits now exceeds 15,000, and this is still just a
passing figure in the endless game between those who protect data and those
who attack it. It is therefore quite clear that the time spent on the
detection of large prime numbers must be as short as possible.
It would be best to check the divisibility of the number n by all prime
numbers less than or equal to sqrt(n). However, with so many bits this is
not realistic. Therefore, the number being tested for primality is first
divided by the first several tens of prime numbers and then, if it is not
divisible by any of them, it is left to the Miller-Rabin algorithm [1].
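A sketch of this two-stage test in Python (ours; parameters such as the 30 trial primes and the 40 Miller-Rabin rounds are illustrative):

import random

def first_primes(k):
    primes, n = [], 2
    while len(primes) < k:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes

def miller_rabin(n, rounds=40):
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False               # a composite witness was found
    return True                        # probably prime

def is_probable_prime(n, k_trial=30):
    for p in first_primes(k_trial):    # cheap trial divisions first
        if n % p == 0:
            return n == p
    return miller_rabin(n)             # expensive test only for the survivors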
It is a very difficult task to find theoretically an optimal ratio between
the time spent dividing the number and the time spent testing it with the
Miller-Rabin algorithm. It is perhaps also a redundant task in terms of our
needs, since in practical tasks such as ours we have no reason to pretend
that computers do not exist, or that experimentally obtained, very useful
results are less valuable than values obtained theoretically.
As a useful tool for our task (minimizing the time required to detect prime
numbers) we see the Goldbach conjecture [2], which states that every (for
us, large) even number is the sum of two prime numbers. It may forever
remain a conjecture; one day some talented mathematician may write a book
of hundreds of pages proving it, or some computer may find a number for
which it does not hold and thereby refute it. For those of us looking for
large prime numbers, none of these three outcomes matters. We will, in any
case, generate a random large even number 2n of, say, 1024 bits, detect a
much smaller random prime number of, say, 128 or 256 bits (which takes
negligible time), and then verify whether the difference of those two
numbers is a prime number. If so, we have a large prime number. If not, we
repeat the procedure, or we use this difference to generate the prime
number closest to it by a combination of division by the first prime
numbers and the Miller-Rabin algorithm. We will show experimentally that
these procedures can also save time in detecting a large prime number.
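The Goldbach-based shortcut just described can be sketched as follows (reusing is_probable_prime from the sketch above; the bit sizes follow the text's example):

import random

def random_prime(bits):
    while True:
        c = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(c):
            return c

def goldbach_candidate(big_bits=1024, small_bits=128):
    two_n = (random.getrandbits(big_bits) | (1 << (big_bits - 1))) & ~1  # even 2n
    p = random_prime(small_bits)               # small prime: negligible cost
    q = two_n - p
    return q if is_probable_prime(q) else None # None: retry, or search near q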
2. WHY WE NEED PRIME NUMBERS
Public key cryptography (PK) [1][3], a major breakthrough in the field of
data secrecy and integrity protection, is mostly based on the assurance,
never mathematically proved, that some mathematical problems are difficult
to solve. Two of them are particularly prominent and widely used. Since we
opted for the RSA [1][3] mechanism, we will point to one of them: the
multiplication of two large prime numbers is a one-way function [1],
meaning that we can easily compute the product, but factoring that product
back into its prime factors has turned out to be very difficult. This
factoring problem and the problem of identifying the private key d in PK
cryptography, given the public key pair (n, e), are two equivalent problems
[1][3].
Certainly, there are many other PK schemes (asymmetric algorithms) apart
from RSA. They are based on similar problems that are difficult to solve in
practice when the number of digits is large enough, and by means of these
schemes a one-way function with a trap door is created.
Technological development and progress in algorithms for whole-number
factorization have demonstrated the need for larger and larger prime
numbers, which means that their product will consist of more and more
digits. The competition between those who attack protected data and those
who protect it using the RSA mechanism requires faster operations on large
numbers. The new arithmetic requires more efficient code for addition,
subtraction, multiplication and division of large numbers and, most
significantly, for solving modular exponentiation in the most efficient way
possible [4].
All of this makes sense only if special attention is paid to the creation
of one's own large (probable) prime numbers, since using such numbers
available on the Internet, or obtained in any other way, contradicts the
very aim of data protection. Since the process of large prime generation
requires a lot of time and computer resources [1][4], it is of particular
interest to us to find ways to avoid applying the primality testing
algorithm to a number as much as possible.
3. EXPERIMENTAL RESULTS
In order to avoid unnecessary applications of the Miller-Rabin algorithm to
the number in question, we resort to trial division by a few initial prime
numbers, since such divisions take less time. How far we should go with
such division is the question we try to answer in this paper: in theory the
matter is fully resolved, but in practice that is of little use.
Trial division takes less time than exponentiation [1][4], but it would
certainly be wrong to conclude that we should divide the number for as long
as possible. It is very difficult to determine the real relation between
the two, since everything depends on the number we start with and on the
odd numbers we examine in order to generate a probable prime.
Therefore, we present two solutions that are probably irrelevant to
theorists but very useful to people who have spent many nights producing
large (probably) prime numbers using their own software [4].
3.1 Dividing by First Several Tens of Prime Numbers
In this paragraph we show the results of the detection of prime numbers of
513, 1024 and 1500 bits: without dividing by prime numbers, and dividing by
the first 10, 20, 30, ..., 100 prime numbers.
Example 1
We start from the number c with ones in positions c[512], c[255], c[200],
c[127], c[100], c[50], c[10], c[9], c[8], c[7], c[2], c[1] and c[0]. By
dividing and testing we intend to detect the first prime number, which has
ones in positions c[512], c[255], c[200], c[127], c[100], c[50], c[11],
c[9], c[6], c[3], c[2], c[1] and c[0]. As a result we have the following
table:
TABLE 1. The Timing of Detection of a Prime Number

a    353    110    91     81     73     72     67     66     65     65
b    0      10     20     30     40     50     60     70     80     90
c    1455   466    398    361    337    342    329    330    334    337
d    1455   453    375    334    301    297    276    272    268    268
e    0      13     23     27     36     45     53     58     66     69

a    63     62     61     61     61     60     58     57     57     56
b    100    110    120    130    140    150    160    170    180    200
c    326    328    331    337    343    345    343    344    353    358
d    260    255    251    251    251    247    239    235    235    231
e    66     73     80     86     92     98     104    109    118    127
a    584     178     129     115     111     107     105     101
b    0       10      30      50      60      70      80      100
c    18144   5583    4251    3872    3778    3711    3734    3792

a    50      16      15      14      12      12      12
b    0       10      20      30      40      50      60
c    4805    1551    1458    1387    1175    1178    1256
d    4805    1538    1442    1345    1153    1153    1153
e    0       13      16      42      22      25      103
through the upper part (numbers larger than n), which is, in terms of time,
a far more demanding job than the detection of prime numbers in the lower
part (numbers less than n).
Of course, all of this is possible only if there are enough pairs with
prime coordinates among all pairs of numbers (p, q), where p + q = 2n, p is
prime and q is odd.
TABLE 4. Number of GC Pairs

Even number 2n:         2^20     2^21     2^25      2^26      2^27
Number of GC pairs:     4244     7492     83543     153881    283830
Number of pairs (*1):   43458    82125    1078257   2064123   3958400
% (GC) in (*1):         9.77%    9.12%    7.75%     7.45%     7.17%
Table 4 shows that the Goldbach conjecture can be a useful tool in our
task, because there is a probability, though not a large one, of guessing a
large prime number directly. The possible loss of time in detecting the
prime number for the first coordinate is negligible, because it is a number
less than n, and it is particularly negligible compared with the
possibility of immediately detecting the other prime coordinate, a large
prime number. We can get an even more favorable result if we consider the
set (*2) = {(p, r)} for a given number 2n, with p ≤ n − 1 and r ≥ n + 1,
where p is a prime number, r is an odd number from the set
{6k + 1, 6k − 1} and p + r = 2n. In any case, it is clear that this process
cannot increase the time of detecting a large prime number, while under
favourable conditions it can significantly reduce it.
With our own software we conducted an experiment whose aim was to find all
pairs (p, q) for a given number 2n, with p ≤ n − 1 and q ≥ n + 1, where p
is a prime number, q is an odd number and p + q = 2n (*1); then, among
these pairs, to find those in which the second coordinate is also a prime
number (the pairs of the Goldbach conjecture (GC)); and to measure the time
needed to find the number of representations of 2n that satisfy the
Goldbach conjecture.
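A small-scale version of this experiment can be sketched as follows (ours; a sieve replaces the primality tests, and the exact counts depend on the boundary conventions used, so the results only approximate Table 4):

def sieve(limit):
    is_p = bytearray([1]) * (limit + 1)
    is_p[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if is_p[i]:
            is_p[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return is_p

def count_pairs(two_n):
    is_p = sieve(two_n)
    n = two_n // 2
    star1 = gc = 0
    for p in range(3, n):          # p <= n - 1, so p < q; p odd implies q odd
        if is_p[p]:
            star1 += 1             # (p, 2n - p) is a (*1) pair
            if is_p[two_n - p]:
                gc += 1            # q is also prime: a GC pair
    return star1, gc

print(count_pairs(2 ** 20))        # compare with the 2^20 column of Table 4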
4. SOME FURTHER OBSERVATIONS
If we consider the time needed to find all GC pairs of some even number 2n,
we can see that the time increases significantly with n.
TABLE 5. Time to Find the GC Pairs

Even number 2n:             2^24        2^25         2^26         2^27
Number of GC pairs:         45752       83543        153881       283830
Time to find the GC pairs:  56' 42''    2h 1' 14''   3h 29' 48''  7h 18' 17''
RSA rests on the multiplication of two large prime numbers, chosen so that
it is almost impossible to factor the resulting product. RSA has become the
algorithm that most people associate with the notion of public key
cryptography. The technique produces public keys that are tied to specific
private keys. If Alice has a copy of Bob's public key she can encrypt a
message to him, and he uses his private key to decrypt it. RSA also allows
the holder of a private key to encrypt data with it so that anyone with a
copy of the public key can decrypt it. While public decryption obviously
does not provide secrecy, the technique does provide a digital signature,
which attests that a particular cryptographic transform was performed by
the owner of a particular private key [1][3].
RSA keys consist of three special numeric values that are used in pairs to
perform encryption or decryption. The public key value is generally a
selected constant, recommended to be either 3 or 65537. After choosing the
public key value we generate two large prime numbers P and Q. The private
key value is derived from P, Q and the public key value. The distributed
public keying material includes the constant public key value and the
modulus N, which is the product of P and Q. The modulus is used in both the
encryption and decryption procedures, whether the public or the private key
is used. The original primes P and Q are discarded [1][3].
Key generation for RSA encryption:
Each entity creates an RSA public key and a corresponding private key [1].
Algorithm
Each entity A should do the following:
1. Generate two large distinct random primes p and q, each roughly the same
size.
2. Compute n = pq and φ = (p − 1)(q − 1).
3. Select a random integer e, 1 < e < φ, such that gcd(e, φ) = 1.
4. Use the extended Euclidean algorithm [1] (Algorithm 2.107) to compute
the unique integer d, 1 < d < φ, such that ed ≡ 1 (mod φ).
A's public key is (n, e); A's private key is d.
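A direct transcription of these steps in Python (toy key sizes; the compact Miller-Rabin mirrors the sketch in Section 3, and pow(e, -1, phi) plays the role of the extended Euclidean algorithm, available from Python 3.8):

import math, random

def _mr(n, rounds=40):                 # compact Miller-Rabin primality test
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def gen_prime(bits):
    while True:
        c = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if _mr(c):
            return c

def rsa_keygen(bits=256):              # toy size; real keys are far larger
    p = gen_prime(bits // 2)
    q = gen_prime(bits // 2)
    while q == p:
        q = gen_prime(bits // 2)
    n, phi = p * q, (p - 1) * (q - 1)
    e = 65537                          # a recommended constant; 3 is also common
    while math.gcd(e, phi) != 1:
        e += 2
    d = pow(e, -1, phi)                # d with e*d = 1 (mod phi)
    return (n, e), d

(n, e), d = rsa_keygen()
m = 42
assert pow(pow(m, e, n), d, n) == m    # encrypt then decrypt recovers m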
Example 1
Let the message m [4] be "rat" ("war"), or in binary: m =
(0)10100100110000101110100.
Select two 128-bit primes:
p:
1000000000000000000000000001000000000000000000000000000000000
0000000000000000100000000000000000000000000000000000000010101
011111.
q:
1000000000000000000000000001000000000000000000000000000000000
0000000000000000100000000000000000000000000000100000000000111
001011
We compute
n = pq:
1000000000000000000000000010000000000000000000000000001000000
0000000000000001000000000000000000000000001000100000000011100
1010100000001000000000111001010110000000000000000000000000000
0100000000011100101010000000000000000000010101011111100110100
00101010101.
φ = (p − 1)(q − 1):
1000000000000000000000000010000000000000000000000000001000000
0000000000000001000000000000000000000000001000100000000011100
1010000000001000000000111001010010000000000000000000000000000
0100000000011100101000000000000000000000010101011110100110011
01000101100.
Public key e = 3 or binary 11.
Secret key d:
1010101010101010101010101101010101010101010101010101100000000
0000000000000001010101010101010101010101100000101010101111011
1000000000001010101011110111000010101010101010101010101010101
1010101011010000110101010101010101010101110001111110001000100
0101110011.
Encrypted message m:
1000100001111110101010110100101100001100010110010011100000010
1000000.
Example 2
Let the message m [4] be "Ratne godine!" ("War years!"), in binary:
m=(0)10100100110000101110100011011100110010100100000011001110
11011110110010001101001011011100110010100100001
Encrypted message m:
1011111001011001110110101110101110011111011010011010100110000
0011011001000101010101000010100101001110001011010101101100111
1011000001001001100000100100110111110100100101110001001001010
0111101110011001100001000111000010101011010010101010010000010
1011110101.
Now we point out two possible connections between RSA and the Goldbach
conjecture.
4.2.1 The First Possibility
Only powerful computers can calculate the (GC) counts for numbers of 1024,
2048 or more bits. We have no reason not to believe that (*1) and (GC) are
larger and larger numbers and, at the same time, (probably) unique for a
given number 2n. Even if various even numbers had the same representation,
it would not matter, because we will create a table containing, in each
row, a given even number together with the hash values of (*1) and (GC).
For large, probably prime numbers p1 and q1 we calculate the number
2n = p1 * q1 + 1. For this number we find (*1) and (GC) and their hash
values h(*1) and h(GC). The procedure is repeated k times, and the table of
k rows (each containing an even number and its corresponding values h(*1)
and h(GC)) is delivered to the users in a safe manner.
Instead of the pair (2n − 1, e) as a public key, we suggest that the first
part of the public key, instead of 2n − 1, be h(*1) and h(GC); from these
the user would obtain the number 2n by reading the table, and therefore the
number 2n − 1 (the RSA modulus) would be known.
It is clear that this procedure does not weaken RSA. It only makes things
more difficult for those who intend to reveal the secret key, because
before using algorithms for finding the prime factors of the number
2n − 1 = p1 * q1, that number must first be determined, which is very
difficult for large numbers if only h(*1) and h(GC) are known.
4.2.2 The Second Possibility
Another possibility would be to publish the number 2n, which implicitly
publishes (GC) too. (GC) may then serve as the other part of the key pair,
the public key for RSA (in the standard notation, e), if gcd((GC), φ) = 1,
or else the first number greater than it that is relatively prime to φ,
where φ = (p1 − 1)(q1 − 1). This would be a semi-public key cryptosystem:
the users, in addition to the secret key d, obtain the public key e in the
same safe way, while those with bad intentions must first find the
(implicitly published) public key e, which is a very demanding job in terms
of time, and only then can they attempt to disclose the secret key d. It is
clear that in the meantime we can change the parameters of RSA and thus
further complicate efforts to breach the confidentiality and integrity of
our data.
5. CONCLUSIONS
This work, too, is in line with our belief that each country needs to
protect the confidentiality and integrity of its data using its own
software [7]. Good experts are a prerequisite for this, and they cannot
exist without an increased interest of young people in cryptography. We
believe there is no such interest without a more interesting approach to
cryptography, and implementing cryptographic algorithms and experimenting
with one's own software is the best way to achieve it. To this end we have
written this paper on a topic important for practical cryptography:
minimizing the time of detection of large (probably) prime numbers.
The consideration of our problem naturally led to Goldbach's conjecture
[2]. We have noticed that Goldbach's conjecture can find a place in
cryptography, because its assumed property can only reduce the detection
time (not increase it). It is quite possible that Goldbach's conjecture can
play an important role in hindering the intentions of an unauthorized user
to find the secret key mathematically, and thus to compromise the integrity
and confidentiality of our data.
Viewed more widely, we believe that using Goldbach's conjecture can slow
down the trend of massive transition from RSA to ECC. In that case, the
increase in the number of bits would no longer be the only asset of RSA.
REFERENCES
[1] A. Menezes, P. C. van Oorschot and S. Vanstone, Handbook of Applied Cryptography, CRC Press, New York, 1997.
[2] C. Goldbach, Letter to L. Euler, June 7, 1742.
[3] R. Smith, Internet Cryptography, Addison-Wesley, Reading, MA, October 1997.
[4] D. Vidakovic, Analysis and Implementation of Asymmetric Algorithms for Data Secrecy and Integrity Protection, Master Thesis (mentor J. Golic), Faculty of Electrical Engineering, Belgrade, Serbia, 1999.
[5] N. Koblitz, "Elliptic Curve Cryptosystems", Mathematics of Computation, 48, pp. 203-209, 1987.
[6] D. Vidakovic and D. Parezanovic, "Generating Keys in Elliptic Curve Cryptosystems", International Journal of Computer Science and Business Informatics, Vol. 4, No. 1, August 2013.
[7] D. Vidakovic and D. Simic, "A Novel Approach to Building Secure Systems", Second International Conference on Availability, Reliability and Security, 1st IEEE International Workshop on Secure Software Engineering (SecSE 2007), Vienna, Austria, 2007, pp. 1074-1081.
Nabil Messaoudi
University of Khenchela, Algeria; MISC Laboratory, University of Constantine, Algeria
Dalal Bardou
University of Khenchela, Algeria
Allaoua Chaoui
MISC Laboratory, University of Constantine, Algeria
ABSTRACT
UML 2 sequence diagrams are a well-known graphical language, widely used to
specify the dynamic behavior of transaction-oriented systems. However,
sequence diagrams are expressed in a semi-formal modeling language and need
a well-defined formal semantic base for their notations. Such formalization
enables analysis and verification tasks. Many efforts have been made to
transform sequence diagrams into formal representations, including Petri
Nets. Petri Nets are a mathematical tool that allows the formal
specification of system dynamics, and they are commonly used in model
checking. In this paper, we present a transformation approach that consists
of a source metamodel for UML 2 sequence diagrams, a target metamodel for
Petri Nets, and transformation rules. This approach has been implemented
using the Atlas Transformation Language (ATL). A cellular phone system is
considered as a case study.
Keywords
UML 2, Sequence diagrams, Petri Nets, Model checking, Model transformation,
Metamodeling, Transformation rules, ATL.
1. INTRODUCTION
The Unified Modeling Language (UML) [2] is a general-purpose graphical
object-oriented modeling language designed to visualize, specify, construct
and document software systems in both their structural and behavioral
aspects. UML is intended to be a common way of capturing and expressing
relationships and behaviors in a notation that is easy to learn and
efficient to write [17].
This paper deals with transforming UML 2 sequence diagrams into Petri Net
models for analysis and verification purposes, using transformation rules
expressed in the ATL language. Our work is a step forward in a project
exploring means to define a semantics for UML 2 communication diagrams.
3. THE BASIC METAMODELS
3.1 UML 2 Diagrams For Interaction
UML 2 divides diagrams into two categories: structural modeling
diagrams and behavioral modeling diagrams:
Structural diagrams illustrate the static features of a model. Static
features include classes, objects, interfaces and physical components. In
addition, they are used to model the relationships and dependencies between
elements. Structural diagrams include the Class diagram, the Object diagram
and some others.
Behavioral diagrams describe how the resources modeled in the structural
diagrams interact and invoke each other's capabilities. The behavioral view
puts the resources in motion, in contrast to the structural view, which
provides a static definition of the resources [16]. Behavioral diagrams
include the Interaction diagrams, the Use Case diagram, the Activity
diagram, the State Machine diagram and others.
- Rule1: The role of this rule is to verify the kind of the combined
fragment; the number of operands does not matter. From the
CombinedFragment class, the places and transitions corresponding to the
Parallel transformation are initialized.
- Rule2: It is the same as Rule2 of the Basic Interaction Transformation;
the only difference is that we handle all the cases needed to connect the
Parallel begin transition with the places corresponding to the first send
and receive messages, and the places corresponding to the last send and
receive messages with the Parallel end transition.
rule Interaction2PetriNet {
	from
		s : SequenceDiagram!Interaction
	to
		p : PetriNet!PetriNet (name <- s.name) -- produces the Petri Net name
}

rule Interaction {
	from
		s : SequenceDiagram!Message
	to
		l : PetriNet!Place ( -- produces the initial send place
			name <- 'Send' + s.name + '.Before = ' + s.SourceLifeLineName + '.Begin',
			id <- s.MessageSendOrder),
		n : PetriNet!Transition ( -- produces the send transition
			name <- 'Send' + s.name + '(' + s.SourceLifeLineName + ',' + s.TargetLifeLineName + ')'),
		r : PetriNet!Place ( -- produces the final send place
			name <- s.SourceLifeLineName + ':Send' + s.name + '.After',
			id <- s.MessageSendOrder + 1),
		m : PetriNet!Place ( -- produces the middle place
			name <- s.name),
		t : PetriNet!Place ( -- produces the initial receive place
			name <- 'Receive' + s.name + '.Before = ' + s.TargetLifeLineName + '.Begin',
			id <- s.MessageReceiveOrder),
		p : PetriNet!Transition ( -- produces the receive transition
			name <- 'Receive' + s.name + '(' + s.SourceLifeLineName + ',' + s.TargetLifeLineName + ')'),
		d : PetriNet!Place ( -- produces the final receive place
			name <- s.TargetLifeLineName + ':Receive' + s.name + '.After',
			id <- s.MessageReceiveOrder + 1),
		isp_st : PetriNet!PlaceToTransArc ( -- each arc gets its source and target nodes, likewise below
			source <- l,
			target <- n),
		...
		st_mp : PetriNet!TransToPlaceArc (source <- n, target <- m)
}
rule Alt {
	from
		c : SequenceDiagram!CombinedFragment (c.IsAlt() and c.Has2Operand())
		-- checks that the combined fragment is an Alt and that it has two operands
	to
		pl : PetriNet!Place ( -- produces the first operand begin place
			name <- 'Alt part one:Begin'),
		sl : PetriNet!Place ( -- produces the second operand begin place
			name <- 'Alt part two:Begin'),
		rl : PetriNet!Place ( -- produces the first operand end place
			name <- 'Alt part one:End'),
		tl : PetriNet!Place ( -- produces the second operand end place
			name <- 'Alt part two:End'),
		nl : PetriNet!Transition ( -- produces the first operand transition, named after the operand
			name <- 'ConditionOne : ' + c.getFirstOperandName),
		bl : PetriNet!Transition ( -- produces the second operand transition, named after the operand
			name <- 'ConditionTwo : ' + c.getSecondOperandName),
		al : PetriNet!Transition ( -- produces the first operand end transition
			name <- 'ConditionOne:' + c.getFirstOperandName + '.End'),
		dl : PetriNet!Transition ( -- produces the second operand end transition
			name <- 'ConditionTwo:' + c.getSecondOperandName + '.End')
		-- the arcs' source and target nodes are as in the Basic Interaction Transformation
}
rule AltCase1 {
	from
		s : SequenceDiagram!Message (s.FirstSendMessage() and
			s.FirstReceiveMessage() and s.IsPartOne() and s.IsTheLastSend() and
			s.IsTheLastReceive())
		-- checks in which part the message lies and its extremities, to attach the right arcs
	to
		l : PetriNet!Place (
			name <- 'Send' + s.name + '.Before = ' + s.SourceLifeLineName + '.Begin',
			id <- s.MessageSendOrder),
		n : PetriNet!Transition (
			name <- 'Send' + s.name + '(' + s.SourceLifeLineName + ',' + s.TargetLifeLineName + ')'),
		r : PetriNet!Place (
			name <- 'Send' + s.name + '.After = ' + s.SourceLifeLineName + '.End',
			id <- s.MessageSendOrder + 1),
		m : PetriNet!Place (name <- s.name),
		t : PetriNet!Place (
			name <- 'Receive' + s.name + '.Before = ' + s.TargetLifeLineName + '.Begin',
			id <- s.MessageReceiveOrder),
		p : PetriNet!Transition (
			name <- 'Receive' + s.name + '(' + s.SourceLifeLineName + ',' + s.TargetLifeLineName + ')'),
		d : PetriNet!Place (
			name <- 'Receive' + s.name + '.After = ' + s.SourceLifeLineName + '.End',
			id <- s.MessageReceiveOrder + 1),
		isp_st : PetriNet!PlaceToTransArc (source <- l, target <- n),
		...
		fsp1_ft : PetriNet!PlaceToTransArc (
			source <- r,
			target <- thisModule.resolveTemp(thisModule.root, 'al')),
		frp1_ft : PetriNet!PlaceToTransArc (
			source <- d,
			target <- thisModule.resolveTemp(thisModule.root, 'al'))
}
rule Parallel {
	from
		c : SequenceDiagram!CombinedFragment (c.IsParallel())
	to
		pl : PetriNet!Place (name <- 'Operator Parallel'),
		pl2 : PetriNet!Place (name <- 'Operator Parallel'),
		nl : PetriNet!Transition (name <- 'Operator Parallel Begin'),
		tl : PetriNet!Place (name <- 'Operator Parallel'),
		tl2 : PetriNet!Place (name <- 'Operator Parallel'),
		kl : PetriNet!Transition (name <- 'Operator Parallel End'),
		pl_nl : PetriNet!PlaceToTransArc (
			source <- pl,
			target <- nl),
		pl2_nl : PetriNet!PlaceToTransArc (
			source <- pl2,
			target <- nl),
		tl_kl : PetriNet!TransToPlaceArc (source <- kl, target <- tl),
		tl2_kl : PetriNet!TransToPlaceArc (source <- kl, target <- tl2)
}

rule ParallelCase1 {
	from
		s : SequenceDiagram!Message (s.FirstSendMessage() and
			s.FirstReceiveMessage() and s.IsTheLastSend() and s.IsTheLastReceive()),
		c : SequenceDiagram!CombinedFragment (c.IsParallel())
	to
		l : PetriNet!Place (
			name <- 'Send' + s.name + '.Before = ' + s.SourceLifeLineName + '.Begin',
			id <- s.MessageSendOrder),
		n : PetriNet!Transition (
			name <- 'Send' + s.name + '(' + s.SourceLifeLineName + ',' + s.TargetLifeLineName + ')'),
		r : PetriNet!Place (
			name <- 'Send' + s.name + '.After = ' + s.SourceLifeLineName + '.End',
			id <- s.MessageSendOrder + 1),
		m : PetriNet!Place (name <- s.name),
		t : PetriNet!Place (
			name <- 'Receive' + s.name + '.Before = ' + s.TargetLifeLineName + '.Begin',
			id <- s.MessageReceiveOrder),
		p : PetriNet!Transition (
			name <- 'Receive' + s.name + '(' + s.SourceLifeLineName + ',' + s.TargetLifeLineName + ')'),
		d : PetriNet!Place (
			name <- 'Receive' + s.name + '.After = ' + s.SourceLifeLineName + '.End',
			id <- s.MessageReceiveOrder + 1),
		...
		d_kl : PetriNet!PlaceToTransArc (
			source <- d,
			target <- thisModule.resolveTemp(thisModule.root, 'kl'))
}
6. IMPLEMENTATION
We have chosen the Atlas Transformation Language (ATL) [8][7], under the
Eclipse development platform [6], to express the transformation rules. ATL
is a model transformation language that contains a mixture of declarative
and imperative constructs, and it is accompanied by a set of tools built on
top of the Eclipse platform. According to the adopted transformation
process, the implementation requires the following steps:
1. The representation of the source metamodel, described for UML 2 sequence
diagrams, in the Ecore Diagram Tool, which generates an Ecore file named
SequenceDiagram.ecore described in the XMI language [14].
2. The representation of the target metamodel, described for Petri Nets, in
the Ecore Diagram Tool, which generates an Ecore file named PetriNet.ecore
described in the XMI language.
3. The representation of a model instance of the source metamodel, i.e. a
sequence diagram, in an Ecore file.
4. Applying the model transformation rules specified in the ATL language to
the source model. This process generates an XMI file containing a Petri Net
that formally describes the behavior of the source sequence diagram.
Figure 9. An extract from the Petri Net for the phone system in abstract syntax
REFERENCES