
Wireless Personal Communications (2019) 106:971–985

https://doi.org/10.1007/s11277-019-06171-x

Sequence Statistical Code Based Data Compression Algorithm for Wireless Sensor Network

S. Jancy1 · C. Jayakumar2

Published online: 27 April 2019


© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
Sensors play an integral part in the technologically advanced real world. Wireless sensors are powered by batteries with limited capacity, so energy efficiency is one of the major issues with wireless sensors. Many techniques have been proposed to improve sensor efficiency. This paper discusses improving the energy efficiency of sensors through data compression. A sequence statistical code based data compression algorithm is proposed to improve the energy efficiency of sensors. SDC and FOST codes are used in this algorithm to achieve a better compression ratio. The simulation result was compared with arithmetic data compression techniques. The computation process of the proposed algorithm is much simpler than that of arithmetic data compression techniques.

Keywords  Data compression · Wireless sensor networks · Entropy coding · Arithmetic data compression techniques

1 Introduction

1.1 Data Compression

Data compression involves encoding and modelling. It is a process in which the bit structure of data is converted so that it occupies less space on the disk, thereby reducing the storage size of the data. This process is also called bit rate reduction or source reduction. Reference [1] discusses DCT and DWT data compression techniques. Reference [2] analyses the state of the art of confidentiality-preserving techniques for WSNs. Reference [3] examines the problem of irregular energy consumption. Reference [4] presents a methodical and comprehensive classification of energy conservation schemes. Reference [5] evaluates recent trends and developments in the use of wavelets in wireless communication. Reference [6] proposes an energy-saving clustering algorithm. Reference [7] analyses signal processing tasks.

* S. Jancy
jancyphd16@gmail.com

1 CSE Department, Sathyabama University, Chennai, India
2 Sri Venkateswara College of Engineering, Chennai, India


Table 1  Compression ratio of arithmetic coding and Huffman coding

File no. | File size (bytes) | Compression ratio (arithmetic coding) | Compression ratio (Huffman coding)
1 | 52,331 | 1.12 | 1.06
2 | 57,206 | 1.10 | 1.25
3 | 37,137 | 1.58 | 1.51
4 | 55,050 | 1.13 | 1.09

Table 2  Compression ratio and compression time for sequential code data compression (SDC) [18]

File no. | File size (bytes) | Compressed file size (bytes) | Compression ratio | Compression time (ms)
1 | 35,060 | 14,036 | 1.7768 | 185
2 | 687,426 | 465,841 | 1.70568 | 480
3 | 7692 | 5521 | 1.8165 | 78
4 | 269,852 | 154,781 | 967 | 209

References [8–13] analyse the power consumption problem and hybrid routing algorithms. With its wide adoption in computing services, data compression is heavily used in data communication. Several software solutions and compression techniques are employed to reduce the size of data. An average compression ratio is normally achieved for common types of data using the available techniques. These data compression techniques yield much smaller storage requirements and faster communication. Data redundancy and storage costs can be limited through data storage compression (Table 1).

1.2 Entropy Coding

A coding scheme in which codes are assigned to symbols so as to match code lengths with the probabilities of the symbols is called entropy coding. Characteristically, symbols represented by equal-length codes are replaced by codes whose length is proportional to the negative logarithm of the probability, so the shortest codes go to the most common symbols. In accordance with Shannon's theorem, $-\log_b p$ is the optimal code length for a symbol, where $b$ is the number of symbols used to make the output codes and $p$ is the probability of the input symbol. Huffman coding and arithmetic coding are the two most widely used encoding techniques. Variable-length codes and fixed-length codes are the two classifications of code types. In accordance with Shannon's theorem, the entropy $H$ of a discrete random variable $X$ is a measure of the amount of uncertainty associated with the value of $X$:
$$H(X) = \sum_{x \in X} p(x) \cdot \log_2 \frac{1}{p(x)}$$

From the information theory point of view, $X$ is the information source and $p(x)$ is the probability that symbol $x$ in $X$ will occur (Table 2).
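As an illustration, the following sketch estimates this entropy from the symbol counts of a sample message (the message string and function name are only examples):

```python
import math
from collections import Counter

def entropy(message):
    """H(X) = sum of p(x) * log2(1/p(x)) over the symbols of X,
    with p(x) estimated from the symbol counts of the message."""
    counts = Counter(message)
    n = len(message)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(round(entropy("PROCEDURE"), 4))  # ~2.7255 bits per symbol
```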


1.3 Information Theory

Mathematical laws constrain the transformation of information. Two fundamental concepts of communication theory are addressed by information theory: the transmission rate of communication and data compression. In data compression, the compression limit is the entropy of the data, $H$. The channel capacity $C$ is the limit on the transmission rate of communication. Every communication scheme is positioned between these two limits. The term entropy refers to the information encoded in a message [14]. In information theory, entropy is a measure that allows for the evaluation of the level of randomness in a string of symbols [15]. The two major entropy coding techniques are:

(A) Huffman coding.


(B) Arithmetic coding.

A. Huffman coding

The Huffman coding algorithm is used when data compression is performed using individual letter frequencies. It is an optimal compression algorithm. The algorithm works on the concept of using fewer bits to encode the letters that occur more frequently than the letters that occur less frequently, and it is based on statistical coding. A symbol's probability has a direct bearing on the length of its representation: with the Huffman compression coding system, the more frequently used characters are assigned smaller codes and the less frequently used characters larger codes. This variable-length coding system reduces the size of the file that is compressed and transferred. A drawback of the Huffman algorithm is that its compression ratio depends on the individual character frequencies (Fig. 1).
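The sketch below is one minimal way to build such a code table with a binary heap; the function name and sample input are illustrative, not taken from the paper:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table from individual letter frequencies."""
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    counter = len(heap)  # tie-breaker so tuples never compare dicts
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prefix the lighter subtree's codes with 0, the heavier with 1
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_codes("PROCEDURE"))  # frequent letters (R, E) get shorter codes
```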

B. Arithmetic coding

Arithmetic coding is commonly used in both lossless and lossy data compression algorithms. This entropy coding technique encodes the more frequent symbols with fewer bits than the less frequent symbols.

Fig. 1  Compression ratio of arithmetic coding and Huffman coding (bar chart of the data in Table 1)


Fig. 2  Compression ratio and compression time for SDC (chart of the data in Table 2)

Fig. 3  Compression ratio and compression time for PLDC (chart of the data in Table 3)

With arithmetic coding, the input message, which is composed of symbols, is converted into a floating-point number greater than or equal to zero and less than one. To characterize the symbols, arithmetic coding relies on a model; the model's task is to tell the encoder the probability of each character in the input message. If the probabilities of the characters in the message are given accurately by the model, the message will be encoded close to the optimum. On the contrary, if the probabilities of the symbols are misrepresented by the model, the encoder expands the message instead of compressing it [12]. With arithmetic coding, an interval of real numbers between 0 and 1 represents the message. The interval needed to represent the message varies inversely with the length of the message, and as the interval shrinks, the number of bits needed to specify it grows. References [16, 17] propose a universal algorithm for sequential data compression and a pixel scanning method for data compression (Fig. 2).
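A toy sketch of this interval narrowing is given below, under an assumed fixed probability model; exact rationals are used only to sidestep floating-point rounding, and the symbol set is invented for illustration:

```python
from fractions import Fraction

def arithmetic_encode(message, probs):
    """Narrow the interval [0, 1) once per symbol; any number in the
    final interval identifies the whole message."""
    # Cumulative probability table: symbol -> (low, high) within [0, 1)
    cum, low = {}, Fraction(0)
    for sym, p in probs.items():
        cum[sym] = (low, low + p)
        low += p
    lo, hi = Fraction(0), Fraction(1)
    for sym in message:
        width = hi - lo
        s_lo, s_hi = cum[sym]
        lo, hi = lo + width * s_lo, lo + width * s_hi
    return lo, hi

probs = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}
lo, hi = arithmetic_encode("ABC", probs)
print(float(lo), float(hi))  # the interval shrinks as the message grows
```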

1.4 Theory of Data Compression


In data compression theory, the source is defined as $x = (x_1, x_2, x_3, \ldots, x_n)$, where $x$ represents the whole sensed information and $x_1, x_2, \ldots$ represent the successive characters of the sensor information (Fig. 3).
The following model represents a simple data compression.


INPUT → COMPRESSION ALGORITHM → OUTPUT

OUTPUT → DECOMPRESSION ALGORITHM → INPUT

1.5 First Order Model

In the first order model the characters are statistically independent. Let $p_i$ be the probability of the $i$th letter in the alphabet and let $m$ be the size of the alphabet. The entropy is calculated as

$$H = -\sum_{i=1}^{m} p_i \log_2 p_i$$

where $m$ is the size of the alphabet and $p_i$ is the probability of each letter (Table 3).


The statistical distribution of English text is described by:

1. First order statistics.
2. Second order statistics.
3. Third order statistics.

The first order statistics table is:

A | B | C | D | E | F | G | H
0.0651738 | 0.0124248 | 0.0217339 | 0.0349835 | 0.1041442 | 0.0197881 | 0.0158610 | 0.0492888

I | J | K | L | M | N | O | P
0.0558094 | 0.0009033 | 0.0050529 | 0.0331490 | 0.0202124 | 0.0564513 | 0.0596302 | 0.0137645

Q | R | S | T | U | V | W | X
0.0008606 | 0.0497563 | 0.0515760 | 0.0729357 | 0.0225134 | 0.0082903 | 0.0171272 | 0.0013692

Y | Z | Space
0.0145984 | 0.0007836 | 0.1918182
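Feeding these first order probabilities into the entropy formula of Sect. 1.5 gives roughly 4.07 bits per character for English; the sketch below reproduces the calculation:

```python
import math

# First order probabilities of English text, from the table above
first_order = {
    "A": 0.0651738, "B": 0.0124248, "C": 0.0217339, "D": 0.0349835,
    "E": 0.1041442, "F": 0.0197881, "G": 0.0158610, "H": 0.0492888,
    "I": 0.0558094, "J": 0.0009033, "K": 0.0050529, "L": 0.0331490,
    "M": 0.0202124, "N": 0.0564513, "O": 0.0596302, "P": 0.0137645,
    "Q": 0.0008606, "R": 0.0497563, "S": 0.0515760, "T": 0.0729357,
    "U": 0.0225134, "V": 0.0082903, "W": 0.0171272, "X": 0.0013692,
    "Y": 0.0145984, "Z": 0.0007836, " ": 0.1918182,
}

# H = -sum of p_i * log2(p_i), the first order entropy of English
H = -sum(p * math.log2(p) for p in first_order.values())
print(round(H, 2))  # ~4.07 bits per character
```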

Table 3  Compression ratio and compression time for packet level data compression (PLDC) [18]

File no. | File size (bytes) | Compressed file size (bytes) | Compression ratio | Compression time (ms)
1 | 35,060 | 13,300 | 1.8339 | 184
2 | 687,426 | 415,921 | 1.95940 | 460
3 | 7692 | 5060 | 1.6601 | 67
4 | 269,852 | 12,589 | 1.5378 | 190


Fig. 4  Compression ratio and compression time for SECSDC (chart of the data in Table 4)

Table 4  Compression ratio and compression time for the proposed algorithm (SECSDC)

File no. | File size (bytes) | Compressed file size (bytes) | Compression ratio | Compression time (ms)
1 | 35,060 | 12,300 | 1.5980 | 143
2 | 687,426 | 315,921 | 1.83940 | 410
3 | 7692 | 4960 | 1.4301 | 54
4 | 269,852 | 11,519 | 1.4378 | 132

2 Proposed Algorithm (SECSDC)

The proposed algorithm uses the first order statistic code (FOST) and the sequence code (SDC) (Fig. 4, Table 4).
Steps for Proposed Algorithm:

1. A sequence code is generated and assigned to the given input data.
2. The first order statistic code is assigned to the given input data.
3. The difference between the sequence code and the first order statistic code is calculated:
   ΔD = SCD − FOSD
   where SCD is the sequential coded data, FOSD is the first order statistic coded data, and ΔD is the difference.
4. Double digits and single digits are segregated in the SDC.
5. Once segregated, all double digits are converted into single digits.
6. A location table is generated for every double and single digit.
7. Once a double digit has been converted into a single digit, the derived value is assigned the corresponding SDC code.
8. Steps 3, 4, 5, 6 and 7 are repeated once the sequence digits are assigned.
9. This process continues until all the input data (SDC) are converted into single digits.
10. The SDC of the resultant single digits is derived.

13
Sequence Statistical Code Based Data Compression Algorithm… 977

11. The FOST of the SDC is identified.
12. Only the integer part of the derived value is taken into account. The integer part is converted into binary digits.
13. Once the data is converted into binary values, the resultant values are added.
14. Using this procedure, the final compressed data is obtained.

3 Methodology for Proposed Algorithm (SECSDC)

Generate sequence codes for all the letters of the alphabet:

A | B | C | D | E | F | G | H | I | J
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10

K | L | M | N | O | P | Q | R | S | T
11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20

U | V | W | X | Y | Z
21 | 22 | 23 | 24 | 25 | 26
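This mapping is straightforward to express in code (the dictionary name is illustrative):

```python
# Sequence code: A = 1, B = 2, ..., Z = 26, as in the table above
SEQ = {chr(ord("A") + i): i + 1 for i in range(26)}

print([SEQ[ch] for ch in "PROCEDURE"])  # [16, 18, 15, 3, 5, 4, 21, 18, 5]
```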

Example:
The input data is "PROCEDURE".
Assign sequence codes to the given input data:

P | R | O | C | E | D | U | R | E
16 | 18 | 15 | 3 | 5 | 4 | 21 | 18 | 5

Input data | SDC | FOST
P | 16 | 0.0137645
R | 18 | 0.0497563
O | 15 | 0.0564513
C | 3 | 0.0217339
E | 5 | 0.1041442
D | 4 | 0.0349835
U | 21 | 0.0225134
R | 18 | 0.0497563
E | 5 | 0.1041442

Calculate the difference:

ΔD = SCD − first order statistic code

ΔP = 15.9862355   ΔR = 17.9502437
ΔO = 14.9435487   ΔC = 2.9782661
ΔE = 4.8958558    ΔD = 3.9650165
ΔU = 20.9774866   ΔR = 17.9502437
ΔE = 4.8958558


Segregate the double digits and single digits in the sequential coded data.

Double digits:

P—16   R—18
O—15   U—21
R—18

Single digits:

C—3   E—5
D—4   E—5

All the double digits are converted into single digits by summing their digits:

P—16—7   R—18—9
O—15—6   U—21—3
R—18—9
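The conversion rule is assumed here to be the iterated digit sum, which matches every pair above; a tiny sketch:

```python
def to_single_digit(code):
    """Collapse a code to one digit by repeatedly summing its digits,
    e.g. 16 -> 1 + 6 = 7 (rule assumed from the worked example)."""
    while code >= 10:
        code = sum(int(d) for d in str(code))
    return code

print([to_single_digit(c) for c in (16, 18, 15, 21, 18)])  # [7, 9, 6, 3, 9]
```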

Generate the location table for the double digits and single digits:

Location | Double digit | Single digit
A[0] P | YES | –
A[1] R | YES | –
A[2] O | YES | –
A[3] C | – | YES
A[4] E | – | YES
A[5] D | – | YES
A[6] U | YES | –
A[7] R | YES | –
A[8] E | – | YES

Once the double digits are converted into single digits, assign the corresponding SDC of the single digits (Fig. 5, Table 5):

P — 16 — 7 — G — 0.0158610
R — 18 — 9 — I — 0.0558094
O — 15 — 6 — F — 0.0197881
U — 21 — 3 — C — 0.0124248
R — 18 — 9 — I — 0.0558094


Fig. 5  Comparison ratio of SDC, PLDC and proposed algorithm (chart of the data in Table 5)

Table 5  Comparison ratio of SDC, PLDC and SECSDC

Algorithm | File size (bytes) | Compression ratio | Computation process
Arithmetic data compression | 37,137 | 1.58 | High
Huffman coding | 37,137 | 1.51 | High
SDC | 35,060 | 1.7768 | Low
PLDC | 35,060 | 1.8339 | Low
SECSDC | 35,060 | 1.5980 | Low

Generate the location table for the double digits:

Location | Double digit | Single digit
A[0] P(G) | – | YES
A[1] R(I) | – | YES
A[2] O(F) | – | YES
A[6] U(C) | – | YES
A[7] R(I) | – | YES

Obtain the difference between the newly generated sequential coded data and the corresponding FOST:

ΔG = 7 − 0.0158610 = 6.984139
ΔI = 9 − 0.0558094 = 8.9441906
ΔF = 6 − 0.0197881 = 5.9802119
ΔC = 3 − 0.0124248 = 2.9875752
ΔI = 9 − 0.0558094 = 8.9441906


Combine all the double digits:

P — 16 — 7 — G ⎫
R — 18 — 9 — I ⎪
O — 15 — 6 — F ⎬ 34 — 7 — G — 0.0158610
U — 21 — 3 — C ⎪
R — 18 — 9 — I ⎭

Obtain the difference between SDC and FOST:

ΔG = 7 − 0.0158610 = 6.984139
Combine all single digits.

C−3⎫
E−5 ⎪
D−4⎬
17 − 8 − H − 0.0492888

E−5 ⎭

Obtain the difference between SDC and FOST: ΔH = 8 − 0.0492888 = 7.9507112.

Therefore the derived values for "P R O C E D U R E" are G[6.984139] and H[7.9507112].
Obtain only the integer part of these values: G[6] and H[7].
The integer parts are converted into binary digits: G[110] and H[111].
The derived binary digits [110, 111] are then added:

  110
+ 111
 1101

The final compressed data value of "PROCEDURE" is 1101.
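This last step is easy to check directly (the variable names are illustrative):

```python
# Integer parts of the derived values G[6.984139] and H[7.9507112]
g, h = int(6.984139), int(7.9507112)
print(bin(g)[2:], bin(h)[2:])  # 110 111
print(bin(g + h)[2:])          # 1101, the final compressed value
```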

4 Calculation of Entropy Rate

Entropy rate for the first pass:

H(P) = 15.9 log 4 = 9.5    H(R) = 17.9 log 4 = 10.7
H(O) = 14.9 log 4 = 8.9    H(C) = 2.9 log 2 = 0.8
H(E) = 4.8 log 2 = 1.4     H(D) = 3.9 log 2 = 1.1
H(U) = 20.9 log 4 = 12.5   H(R) = 17.9 log 4 = 10.7
H(E) = 4.8 log 2 = 1.4


Entropy rate for the second pass:

H(P) = 6.9 log 2 = 2.0    H(R) = 8.9 log 2 = 2.6
H(O) = 5.9 log 2 = 1.7    H(C) = 2.9 log 2 = 0.8
H(E) = 4.8 log 2 = 1.4    H(D) = 3.9 log 2 = 1.1
H(U) = 2.9 log 2 = 0.8    H(R) = 8.9 log 2 = 2.6
H(E) = 4.8 log 2 = 1.4

Entropy rate for the third pass:

H(G) = 6.9 log 2 = 2.0
H(H) = 7.9 log 2 = 2.3

The final entropy rates are H = 2.0 and H = 2.3.

Entropy | Bits/character
2.0 | 4.3
2.3 |


5 Algorithm for SECSDC Data Compression
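A minimal sketch of the compression flow, reconstructed from the steps in Sect. 2 and the worked example in Sect. 3; the iterated digit-sum reduction is assumed, only the FOST entries needed by the example are shown, and all names are illustrative rather than the authors' exact listing:

```python
# Sketch of SECSDC compression (assumptions: iterated digit-sum
# reduction as in the worked example; FOST table truncated).
SEQ = {chr(ord("A") + i): i + 1 for i in range(26)}   # A=1 ... Z=26
LETTER = {v: k for k, v in SEQ.items()}
FOST = {"G": 0.0158610, "H": 0.0492888}               # entries used below

def digital_root(n):
    # Repeatedly sum digits until a single digit remains (steps 5 and 9)
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n

def secsdc_compress(data):
    double, single = [], []
    for ch in data:                       # steps 1-2 and 4: code and segregate
        (double if SEQ[ch] >= 10 else single).append(SEQ[ch])
    total = 0
    for group in (double, single):        # steps 5-11, once per group
        digit = digital_root(sum(group))  # combine, then reduce to one digit
        letter = LETTER[digit]            # the digit's corresponding SDC letter
        delta = digit - FOST[letter]      # step 3: difference with FOST
        total += int(delta)               # step 12: keep the integer part
    return bin(total)[2:]                 # steps 13-14: add the binary values

print(secsdc_compress("PROCEDURE"))       # 1101
```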


6 Algorithm for SECSDC Data Decompression

7 Performance and Measurement

Data compression techniques can be measured by the following factors:

1. Compression ratio:
   $$\text{Compression Ratio} = \frac{\text{Uncompressed size}}{\text{Compressed size}}$$
2. Space saving:
   $$\text{Space Saving} = 1 - \frac{\text{Compressed size}}{\text{Uncompressed size}}$$
3. Compression time: the time taken by the algorithm to compress the file.
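These measures are straightforward to compute; the sketch below uses made-up file sizes purely for illustration, not values from the paper's tables:

```python
def compression_metrics(uncompressed_size, compressed_size):
    """Compression ratio and space saving as defined above."""
    ratio = uncompressed_size / compressed_size
    saving = 1 - compressed_size / uncompressed_size
    return ratio, saving

# Hypothetical sizes in bytes
ratio, saving = compression_metrics(50_000, 20_000)
print(f"ratio = {ratio:.2f}, space saving = {saving:.0%}")  # ratio = 2.50, saving = 60%
```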

8 Conclusion

Wireless sensor networks are made up of many sensors. Each sensor has its own processing capacity, and these sensors are battery powered. Limited energy is the major disadvantage of a wireless sensor network. In order to improve the efficiency of wireless sensor networks, compression techniques are widely used. The sequence statistical code based data compression algorithm is proposed in this paper to achieve a better compression ratio. The proposed algorithm uses the SDC and FOST codes. Its computational process is much simpler when compared with arithmetic coding.

References
1. Sheltami, T., Musaddiq, M., & Shakshuki, E. (2016). Data compression techniques in wireless sen-
sor networks. Future Generation Computer Systems, 64, 151–162.
2. Li, N., Zhang, N., Das, S. K., & Thuraisingham, B. (2009). Privacy preservation in wireless sensor
networks: a state-of-the-art survey. Ad Hoc Networks, 7, 1501–1514.
3. Li, J., & Mohapatra, P. (2007). Analytical modeling and mitigation techniques for the energy hole
problem in sensor networks. Pervasive and Mobile Computing, 3, 233–254.
4. Anastasi, G., Conti, M., Di Francesco, M., & Passarella, A. (2009). Energy conservation in wireless
sensor networks: A survey. Ad Hoc Networks, 7, 537–568.
5. Lakshmanan, M. K., & Nikookar, H. (2006). A review of wavelets for digital wireless communication. Wireless Personal Communications, 37, 387–420.
6. Chang, J.-Y., & Pei-Hao, J. (2012). An efficient cluster-based power saving scheme for wireless
sensor networks. EURASIP Journal on Wireless Communications and Networking, 2012, 172.
7. Xiao, J.-J., Ribeiro, A., Luo, Z.-Q., & Giannakis, G. B. (2006). Distributed compression-estimation
using wireless sensor networks. IEEE Signal Processing Magazine, 23(4), 41.
8. Alippi, C., Anastasi, G., Di Francesco, M., & Roveri, M. (2010). An adaptive sampling algorithm
for effective energy management in wireless sensor networks with energy hungry sensors. IEEE
Transactions on Instrumentation and Measurement, 59(2), 335–344.
9. Srisooksai, T., Keamarungsi, K., Lamsrichan, P., & Araki, K. (2012). Practical data compression in
wireless sensor networks: A survey. Journal of Network and Computer Applications, 35, 37–59.
10. Ravindra Babu, T., Narasimha Murty, M., & Agrawal, V. K. (2007). Classification of run-length
encoded binary data. Pattern Recognition, 40, 321–323.
11. Yick, J., Mukherjee, B., & Ghosal, D. (2008). Wireless sensor network survey. Computer Networks,
52, 2292–2330.
12. Abdulla, A. E. A. A., Nishiyama, H., & Kato, N. (2012). Extending the lifetime of wireless sensor
networks: A hybrid routing algorithm. Computer Communications, 35, 1056–1063.
13. Kolo, J. G., Ang, L.-M., Shanmugam, S. A., Lim, D. W. G., & Seng, K. P. (2013). A simple data
compression algorithm for wireless sensor networks. AISC, 188, 327–336.
14. Witten, I. H., Neal, R. M., & Cleary, J. G. (1987). Arithmetic coding for data compression. Communications of the ACM, 30(6), 520–540.
15. Giancarlo, R., Scaturro, D., & Utro, F. (2012). Textual data compression in computational biology:
Algorithmic techniques. Computer Science Review, 6, 1–25.
16. Ziv, J., & Lempel, A. (1977). A universal algorithm for sequential data compression. IEEE Trans-
actions on Information Theory, 23(3), 337–343.
17. Kolo, J. G., Seng, K. P., Ang, L.-M., & Prabaharan, S. R. S. (2011). Data compression algorithm
for visual information. In ICIEIS 2011, Part III, CCIS (Vol. 253, pp. 484–497). Berlin: Springer.
18. Jancy, S., & Jayakumar, C. (2015). Packet level data compression techniques for wireless sensor
networks. Journal of Theoretical and Applied Information Technology, 75. ISSN:1992-8645.

Publisher’s Note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.


S. Jancy  received her B.Sc. in Information Technology from Bharathidasan University in the year 2005, MCA (Master of Computer Applications) in the year 2008 and M.Tech. (Information Technology) in the year 2011 from Sathyabama University. She is currently pursuing a Ph.D. at Sathyabama Institute of Science and Technology, where she is also working as an Assistant Professor.

Dr. C. Jayakumar  received his B.E. degree in Electrical and Electronic Engineering from Manonmaniam Sundaranar University in the year 1997. He received his M.E. degree in Computer Science and Engineering from Anna University in 2001. He received his Ph.D. degree in 2006 from Anna University and currently works as a Professor at Sri Venkateswara College of Engineering. His areas of interest include networks, mobile ad hoc networks, pervasive computing and wireless sensor networks.
