
2009 International Conference on Artificial Intelligence and Computational Intelligence

An Image Steganalysis Method Based on Characteristic Function Moments of Wavelet Subbands

Ziwen Sun, Hui Li, Zhijian Wu, Zhiping Zhou


School of Communication and Control Engineering, Jiangnan University, Wuxi, China
sunziwen@jiangnan.edu.cn

Abstract: In this paper a universal steganalysis scheme for images is proposed. The scheme is based on the characteristic function moments of the three-level wavelet subbands, including the coefficients of the further decomposition of the first-scale diagonal subband. The first three statistical moments of each band are selected to form a feature vector for steganalysis. The Euclidean distance is used as the separability criterion to analyze the effectiveness of the feature vectors for classification, and a BP neural network is adopted as the classifier. Simulation results show the efficacy of our steganalyzer on several typical steganography algorithms. Compared with other well-known methods, the proposed scheme performs best in attacking JSteg, OutGuess, F5, JPHide and S-Tools.

Keywords: characteristic function moments; wavelet subbands; steganalysis

I. INTRODUCTION

Information hiding is the art of transmitting secret information through cover media in such a way that the very presence of the covert message is deliberately concealed. With the rapid development of information technology, information hiding techniques are applied ever more widely in the field of information security, and the corresponding detection technology, steganalysis, has developed rapidly in order to counter illegal uses of data hiding. Existing steganalysis methods can be divided into specific steganalysis and universal (blind) steganalysis. Specific steganalysis targets a particular embedding algorithm and usually achieves a higher detection rate; universal (blind) steganalysis is independent of any particular steganography algorithm. Given a group of image pixel or wavelet coefficient samples, two kinds of statistical moments have been used as features: empirical probability density function (PDF) moments and empirical characteristic function (CF) moments. In [1], Farid was the first to propose a universal steganalysis method to cope with the twin difficulties of unknown image statistics and unknown steganographic embedding, showing how a statistical model based on first- and higher-order magnitude statistics derived from high-frequency wavelet subbands can be used to detect steganography in grayscale

images. The statistical moments of the high-frequency subbands are used as features for steganalysis, which opened a new path for steganalysis research. In [2], Lyu and Farid extended the statistical model to color images, and in [3] they extended it to include phase statistics; however, as the message size becomes small, the detection accuracy falls. In [4,5], Harmsen and Pearlman put forward a steganalysis method based on the histogram characteristic function (HCF); using the HCF and its center of mass (COM), they proposed a method to attack spread-spectrum steganography in raw images. In [6,7], Liu et al. used statistical moments of the differential CF and applied principal component analysis (PCA) to reduce the feature dimension. In [8], Xuan et al. selected the first three CF moments of the subbands of a three-level Haar wavelet decomposition; their experiments showed that using more than three decomposition levels and/or more than the first three moments does not further improve performance, and a Bayes classifier was adopted to classify the test images. In [9], applying the HCF and its moments to wavelet subbands of grayscale BMP images, Shi et al. designed a steganalytic method using the first three CF moments derived from both the test image and its prediction-error image, together with their wavelet subbands, with an artificial neural network as the classifier; compared with [4], the performance was improved. In [10], Mehrabi et al. decomposed the test image by a three-level Haar DWT into 13 subbands, computed the DFT of each subband, divided it into low- and high-frequency bands, and selected the first three statistical moments of each band to form a 78-dimensional feature vector; their results showed that the scheme outperforms [1] and [4]. In [11], Wang showed that CF moments of three-level wavelet coefficients are better choices than PDF moments for image steganalysis. The references above indicate that the CF-moment approach tends to be more successful, and it is the approach adopted in this paper. First, we take the CFs of the coefficients of the three-level wavelet decomposition subbands, and of the further decomposition of the first-scale diagonal subband, for both the image and its prediction-error image; second, we use the first three moments of these CFs to construct the feature vector; and

then we train a BP neural network to obtain a classifier. Experimental results show that our method outperforms Harmsen's method and Farid's method.

II. FEATURE EXTRACTION

A. Prediction-error image

In steganalysis we only care about the subtle changes caused by embedding. Because natural images vary smoothly, adjacent pixels are highly correlated, and this small change is easily concealed by the image itself: if features are extracted from an image directly, the ability to distinguish whether a secret message is embedded, and to distinguish between different steganographic methods, is affected by the image content. Therefore, in order to enhance the noise introduced by data hiding, we predict the grayscale value of each pixel of the cover image from the grayscale values of its neighboring pixels and obtain a prediction-error image by subtracting the predicted image from the test image. This prediction-error image is expected to remove most information other than that introduced by data hiding, which makes the steganalysis more efficient because the hidden data are usually unrelated to the cover medium. In other words, the prediction-error image erases the image content; the features extracted from it are less correlated with the content, which enhances the sensitivity of the steganalysis.

Suppose the size of an image is M x N, the current cover grayscale pixel value is $x$, the predicted pixel value is $\hat{x}$, and the prediction-error pixel value is $\Delta x = x - \hat{x}$. The neighboring context of the cover pixel $x$ consists of the pixels $a$, $b$ and $c$, where $c$ is the diagonal neighbor of $x$. The prediction algorithm, taken from [12], is

$\hat{x} = \begin{cases} \max(a, b), & c \le \min(a, b) \\ \min(a, b), & c \ge \max(a, b) \\ a + b - c, & \text{otherwise} \end{cases}$    (1)
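For concreteness, a minimal NumPy sketch of this step is given below. Assigning $a$ to the left neighbor, $b$ to the upper neighbor and $c$ to the upper-left neighbor follows the usual LOCO-I convention and is our assumption, as is leaving border pixels at zero; the paper only states that $c$ is the diagonal neighbor.

```python
import numpy as np

def prediction_error_image(img):
    """Prediction-error image of Eq. (1), using the predictor of [12]."""
    img = img.astype(np.int32)
    err = np.zeros_like(img)
    a = img[1:, :-1]    # left neighbor of each pixel in img[1:, 1:]   (assumed)
    b = img[:-1, 1:]    # upper neighbor                               (assumed)
    c = img[:-1, :-1]   # upper-left (diagonal) neighbor               (assumed)
    pred = np.where(c >= np.maximum(a, b), np.minimum(a, b),
           np.where(c <= np.minimum(a, b), np.maximum(a, b), a + b - c))
    err[1:, 1:] = img[1:, 1:] - pred    # delta_x = x - x_hat
    return err
```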

B. The statistical moments of the characteristic function

The statistics of the samples of a random process are completely described by their probability distributions: by the probability density function (PDF) for a continuous-valued process and by the probability mass function (PMF) for a discrete-valued process. If the image DWT coefficients are taken as a random variable, their distribution, i.e. the coefficient histogram, reflects the statistical characteristics of the image; the image histogram is in fact the PMF. The CF of the image is simply the Fourier transform of the PDF $h(x)$, defined as

$h(t) = E[e^{jtx}] = \int_{-\infty}^{+\infty} h(x)\, e^{jtx}\, dx$    (2)

For the CF $h(t)$, its nth moment is defined by

$M_n = \int_{-\infty}^{+\infty} h(t)\, t^n\, dt$    (3)

and its nth absolute moment is given in (4):

$M_n^A = \int_{-\infty}^{+\infty} |h(t)|\, |t|^n\, dt$    (4)

In (4), $h(t)$ is weighted by $|t|^n$, so any change in the tails of $h(t)$, which correspond to the high-frequency components of $h(x)$, is polynomially amplified. The CF moments $M_n$ and $M_n^A$ are related to the nth derivative of $h(x)$ at $x = 0$ by

$M_n = j^{-n}\, 2\pi \left. \frac{d^n h(x)}{dx^n} \right|_{x=0}$    (5)

and

$M_n^A \ge |M_n| = 2\pi \left| \left. \frac{d^n h(x)}{dx^n} \right|_{x=0} \right|$    (6)

If a CF $h(t)$ has heavy tails and $M_n$ is large, then the corresponding PDF $h(x)$ is peaky.
In our experiments we use the discrete form of the CF. The K-point discrete CF $\{h(k)\}_{k=0}^{K-1}$ is defined as

$h(k) = \sum_{m=0}^{M-1} h(m)\, \exp\!\left( \frac{j 2\pi m k}{K} \right), \quad 0 \le k \le K-1$    (7)

where $K = 2^{\lceil \log_2 M \rceil}$ and $\{h(m)\}_{m=0}^{M-1}$ is the M-bin histogram used to estimate the PDF $h(x)$.
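A small sketch of this computation is given below; the zero-padded FFT call realizes Eq. (7) up to the sign of the exponent, which NumPy fixes as $e^{-j2\pi mk/K}$ and which does not affect the magnitudes $|h(k)|$ used by the moments that follow.

```python
import numpy as np

def discrete_cf(hist):
    """K-point discrete CF of an M-bin histogram h(m), per Eq. (7)."""
    hist = np.asarray(hist, dtype=float)
    K = 2 ** int(np.ceil(np.log2(hist.size)))   # K = 2^ceil(log2 M)
    return np.fft.fft(hist, n=K)                # zero-pads the histogram to K points
```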

Harmsen and Pearlman [4] defined the nth absolute moment of the discrete CF as

$M_n = \sum_{k=0}^{K/2-1} |h(k)|\, k^n$    (8)

We use the following nth moment of the discrete CF, defined in [11]:

$M_n = \sum_{k=0}^{K-1} h(k)\, \sin^n\!\left( \frac{\pi k}{K} \right)$    (9)

and its nth absolute form is

$M_n^A = \sum_{k=0}^{K-1} |h(k)|\, \sin^n\!\left( \frac{\pi k}{K} \right)$    (10)

$M_n^A$ provides an upper bound on the discrete derivatives of the histogram $\{h(m)\}_{m=0}^{M-1}$.
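A minimal sketch of the moment computation follows, reusing discrete_cf() from the sketch above. The absolute form (10) is computed; since $\sin(\pi k/K) \ge 0$ for $0 \le k < K$, Eqs. (9) and (10) differ only in using $h(k)$ versus $|h(k)|$.

```python
def cf_moments(hist, orders=(1, 2, 3)):
    """First three CF moments of a histogram, per Eq. (10)."""
    cf = discrete_cf(hist)                       # Eq. (7)
    K = cf.size
    w = np.sin(np.pi * np.arange(K) / K)
    return [float(np.sum(np.abs(cf) * w ** n)) for n in orders]
```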


C. Feature extraction and analysis

1) Feature extraction: In this paper, the image and its prediction-error image are each decomposed by a three-level Haar wavelet transform, which yields nine detail subbands, horizontal $H_i$, vertical $V_i$ and diagonal $D_i$ ($i = 1, 2, 3$), and three approximation subbands $L_i$ ($i = 1, 2, 3$). The further decomposition of the first-scale diagonal subband $D_1$ was proposed in [10] to improve the performance of the learning system, so four extra subbands are obtained: lowpass $L_2'$, horizontal $H_2'$, vertical $V_2'$ and diagonal $D_2'$. The reason for doing so is as follows: $D_1$ is the finest detail subband of the Haar wavelet transform, and each of its coefficients involves diagonal differences within a four-pixel block. The coefficients of the further decomposition of $D_1$ involve more neighboring pixels, so $L_2'$, $H_2'$, $V_2'$ and $D_2'$ reveal more information about the differences between neighboring pixels. According to formula (10), we calculate the first three CF moments of the image itself, of the nine detail subbands $H_i$, $V_i$, $D_i$ ($i = 1, 2, 3$), of the three approximation subbands $L_i$ ($i = 1, 2, 3$), and of $L_2'$, $H_2'$, $V_2'$ and $D_2'$. The same CF moments are also obtained for the prediction-error image, so we obtain a 102-dimensional feature vector (17 bands x 3 moments x 2 images).
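A sketch of how this feature extraction might be implemented is shown below. It assumes the PyWavelets package (pywt) and the cf_moments() and prediction_error_image() helpers from the earlier sketches; the per-subband histogram binning (256 bins over each band's range) is our own choice, since the paper does not specify it.

```python
import numpy as np
import pywt   # PyWavelets

def subband_features(img):
    """51 CF-moment features of one image (17 bands x 3 moments)."""
    bands = [np.asarray(img, dtype=float)]          # the image itself
    approx = bands[0]
    D1 = None
    for level in range(3):                          # three-level Haar DWT
        approx, (H, V, D) = pywt.dwt2(approx, 'haar')
        bands += [H, V, D, approx]                  # H_i, V_i, D_i and L_i
        if level == 0:
            D1 = D                                  # first-scale diagonal band
    L2p, (H2p, V2p, D2p) = pywt.dwt2(D1, 'haar')    # further decomposition of D1
    bands += [L2p, H2p, V2p, D2p]
    feats = []
    for band in bands:
        hist, _ = np.histogram(band.ravel(), bins=256, density=True)
        feats.extend(cf_moments(hist))              # first three CF moments
    return np.array(feats)

# The full 102-dimensional feature vector concatenates the features of the
# image and of its prediction-error image:
#   x = np.concatenate([subband_features(img),
#                       subband_features(prediction_error_image(img))])
```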

2) The illustration of the selected features: The CF moments are selected as features. The bigger the difference between the cover-image moments and the stego-image moments, the more sensitive the corresponding moments are to embedding. In this section we illustrate the effectiveness of the selected features from the point of view of the between-class distance measure. The task of feature selection and extraction is to derive the most effective features for classification. Different classes of samples can be separated from each other because their feature vectors lie in different regions of the feature space; the greater the distance between these regions, the better the classes can be separated. Here the distance between classes is used as the separability criterion. The D-dimensional feature vectors

$x_k^{(i)}$ and $x_l^{(j)}$ are the kth sample of class $i$ and the lth sample of class $j$ respectively, and $\delta(x_k^{(i)}, x_l^{(j)})$ is the distance between the two vectors. The average distance between the feature vectors of the various classes is then

$J_d(x) = \frac{1}{2} \sum_{i=1}^{c} \sum_{j=1}^{c} P_i P_j \frac{1}{n_i n_j} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} \delta\!\left( x_k^{(i)}, x_l^{(j)} \right)$    (11)

where $c$ is the number of classes, $n_i$ and $n_j$ are the numbers of samples of classes $i$ and $j$ respectively, and $P_i$ and $P_j$ are the prior probabilities of the corresponding classes. The selected feature vector $x^*$ should give the largest average distance between the samples of all $c$ classes, that is,

$J(x^*) = \max\left( J_d(x) \right)$    (12)
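A small sketch of this criterion, for illustration only: average_between_class_distance() is a hypothetical helper that evaluates Eq. (11) with equal priors by default, using the Euclidean distance (the s = 2 Minkowski measure of Eq. (13) below), as in the paper's separability analysis.

```python
import numpy as np

def average_between_class_distance(classes, priors=None):
    """Average between-class distance J_d(x) of Eq. (11).

    `classes` is a list of (n_i, D) feature arrays, one per class.
    """
    c = len(classes)
    priors = priors if priors is not None else [1.0 / c] * c
    J = 0.0
    for i in range(c):
        for j in range(c):
            Xi, Xj = np.asarray(classes[i]), np.asarray(classes[j])
            # all pairwise Euclidean distances between samples of classes i and j
            d = np.linalg.norm(Xi[:, None, :] - Xj[None, :, :], axis=-1)
            J += 0.5 * priors[i] * priors[j] * d.mean()
    return J
```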

There are various distance measures for computing $\delta(x_k^{(i)}, x_l^{(j)})$; the s-order Minkowski measures are commonly used:

$\delta_s\!\left( x_k^{(i)}, x_l^{(j)} \right) = \left[ \sum_{m=1}^{D} \left| x_{km}^{(i)} - x_{lm}^{(j)} \right|^s \right]^{1/s}$    (13)

where $m$ indexes the features. When $s = 2$ the Euclidean distance is obtained. We use the average Euclidean distance to analyze the effectiveness of the feature vectors for classification. We randomly select 100 pairs of cover images and stego images carrying messages at different embedding rates, produced with the embedding tools F5, JPHide, JSteg and OutGuess, and compute the average Euclidean distances of the first three CF moments of various subband combinations for the selected images. The CF moments of all subbands are scaled to the range [-1, 1] before the distances are computed. For the different embedding rates and the various embedding tools, the simulation results show that the largest average distance is obtained when all 102 moments mentioned above are selected as features. From this comparison it can be seen that the selected CF moments are effective; in particular, the CF moments of the prediction-error image also change significantly after embedding. By comparing and analyzing the images, we therefore take all 102 moments as the feature vector.

III. BP NEURAL NETWORK CLASSIFIER

The design of the classifier is also a very important part of steganalysis. In our experiments a BP neural network classifier is used; compared to linear classifiers, its learning ability is greatly superior. The number of hidden-layer neurons is initialized from the empirical formula

$n_2 = \sqrt{n_1 + l} + \alpha$    (14)

where $n_2$ is the number of hidden-layer nodes, $n_1$ the number of input-layer nodes, $l$ the number of output-layer nodes, and $\alpha$ a constant between 1 and 10 (with $n_1 = 102$ inputs and $l = 2$ outputs, $\sqrt{104} \approx 10.2$). Starting from this value and revising it during training, we find that 11 hidden nodes perform better than other choices. The hidden layer uses the tan-sigmoid transfer function. The output layer has 2 neurons: an image containing secret information is labeled 1, an image containing no secret information is labeled -1, and the log-sigmoid function is used. For classification we use a Euclidean linear discriminant rule: the mean feature vector of each of the two categories is taken as the "typical model" of that class, and a sample is assigned to the class whose typical model is nearest. The classification rule is as follows. Let $m_1$ be the mean feature vector of class $\omega_1$, $m_2$ the mean feature vector of class $\omega_2$, and $x$ the feature vector of a sample. Then

$x \in \begin{cases} \omega_1, & \|x - m_1\| < \|x - m_2\| \\ \omega_2, & \text{otherwise} \end{cases}$    (15)

where all distances are assumed to be Euclidean.


If the midpoint of the two class means,

$m_c = \frac{m_1 + m_2}{2}$    (16)

is used, then (15) can be simplified to

$x \in \begin{cases} \omega_1, & x < m_c \\ \omega_2, & \text{otherwise} \end{cases}$    (17)
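A minimal sketch of this decision rule is given below. Only the nearest-mean step of Eqs. (15)-(17) is shown; the BP network of Section III is a separate component and is not reproduced here, and the assignment of the cover class to $\omega_1$ (label -1) and the stego class to $\omega_2$ (label +1) is our reading of the output coding described above.

```python
import numpy as np

class NearestMeanClassifier:
    """Euclidean nearest-mean rule of Eqs. (15)-(17)."""

    def fit(self, X_cover, X_stego):
        self.m1 = np.asarray(X_cover).mean(axis=0)   # mean of class omega_1 (cover, assumed)
        self.m2 = np.asarray(X_stego).mean(axis=0)   # mean of class omega_2 (stego, assumed)
        return self

    def predict(self, X):
        X = np.asarray(X)
        d1 = np.linalg.norm(X - self.m1, axis=1)
        d2 = np.linalg.norm(X - self.m2, axis=1)
        # label -1: no hidden message, +1: hidden message, per the coding above
        return np.where(d1 < d2, -1, 1)
```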
IV. EXPERIMENTAL RESULTS

To evaluate the performance of the proposed method, we carried out experiments with F5, OutGuess, JSteg, JPHide and S-Tools, which are commonly used steganography algorithms. We randomly selected 596 images from the CorelDraw image database. Before the experiments, the images were resized to 256 x 256. A secret message was then embedded into each image by each of the steganography algorithms above to produce the stego images.
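As an illustration of how such an experiment might be assembled, the sketch below trains a small neural network on the 102-dimensional features. It is hypothetical: `cover_images` and `stego_images` are placeholder lists, `subband_features()` and `prediction_error_image()` come from the earlier sketches, scikit-learn's MLPClassifier merely stands in for the BP network of Section III, and the per-image split only approximates the pair-wise 396/200 split described after Table I.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def extract_features(img):
    # 51 image features + 51 prediction-error-image features = 102 total
    return np.concatenate([subband_features(img),
                           subband_features(prediction_error_image(img))])

X = np.array([extract_features(im) for im in cover_images + stego_images])
y = np.array([-1] * len(cover_images) + [1] * len(stego_images))

# The paper splits by cover/stego pairs (396 pairs train, 200 pairs test);
# a simple per-image split is used here for brevity.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=2 * 396,
                                                    random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(11,), activation='tanh', max_iter=2000)
clf.fit(X_train, y_train)
print("detection accuracy: %.3f" % clf.score(X_test, y_test))
```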
TABLE I. THE EXPERIMENTAL RESULTS (DETECTION RATES, %)

Steg. tool   Embedding rate   Harmsen's method   Farid's method   Proposed method
JSteg        90%                   83.70              98.90            89.40
             50%                   58.40              78.60            74.00
             10%                   51.20              54.30            55.60
             Average               64.40              77.30            73.00
OutGuess     90%                   75.00              68.30            74.70
             50%                   56.20              52.90            68.50
             10%                   50.00              45.00            56.80
             Average               60.40              55.40            66.70
F5           90%                   57.90              63.30            76.40
             50%                   52.70              54.70            58.30
             10%                   50.10              50.30            52.20
             Average               53.60              56.10            62.30
JPHide       about 90%             90.20              99.50            98.90
             about 50%             79.20              87.30            88.40
             10%                   73.40              72.10            79.90
             Average               80.90              86.30            89.10
S-Tools      about 90%             79.70              89.90            94.40
             about 50%             73.40              80.10            87.20
             10%                   68.30              73.20            72.70
             Average               73.70              81.10            84.80

All of the selected 596 original images were embedded with data by each embedding method, yielding 596 pairs of images (stego image and cover image) per method. We randomly chose 396 pairs for training and the remaining 200 pairs for testing. In order to test the universality of the proposed method more effectively, the five kinds of stego images were also treated as a single class, randomly choosing 396*5+396 images for training and the remaining 200*5+200 images for testing. Each type of test was repeated several times, and the results in Table I are the average testing values. For comparison, we ran the same experiments with Harmsen's method [4] and Farid's method [1]. The experimental results show that the detection rates differ considerably for different embedding rates, and that our proposed steganalysis method outperforms Harmsen's method and Farid's method.

V. CONCLUSIONS

The success of steganalysis largely depends on the ability to identify the statistics most changed by embedding and to extract features that are reliable and sensitive to these changes. This paper proposed a universal steganalysis method based on CF moments. Steganalysis experiments on several typical steganographic methods show that the proposed method achieves good performance, but the detection rate at low embedding rates is lower than at high embedding rates. One of our future research directions is therefore to find features that are more sensitive to low embedding rates; further work includes training on large numbers of images with different embedding rates and the design of better-performing classifiers.

REFERENCES
[1] H. Farid, "Detecting Hidden Messages Using Higher-order Statistical Models," Proc. IEEE Int'l Conf. on Image Processing (ICIP 2002), IEEE Press, Sep. 2002, pp. 905-908, doi:10.1109/ICIP.2002.1040098.
[2] S. Lyu and H. Farid, "Steganalysis Using Color Wavelet Statistics and One-Class Support Vector Machines," Proc. SPIE Electronic Imaging 5306, Security, Steganography, and Watermarking of Multimedia Contents VI, SPIE Press, Jan. 2004, pp. 35-45.
[3] S. Lyu and H. Farid, "Steganalysis Using Higher-order Image Statistics," IEEE Trans. Information Forensics and Security, vol. 1, no. 1, Jan. 2006, pp. 111-119.
[4] J. J. Harmsen and W. A. Pearlman, "Steganalysis of Additive Noise Modelable Information Hiding," Proc. SPIE Electronic Imaging 5022, Security, Steganography, and Watermarking of Multimedia Contents VI, SPIE Press, Jan. 2003, pp. 131-142.
[5] J. J. Harmsen, K. D. Bowers, and W. A. Pearlman, "Fast Additive Noise Steganalysis," Proc. SPIE Electronic Imaging 5306, Security, Steganography, and Watermarking of Multimedia Contents VI, SPIE Press, Jan. 2004, pp. 489-495, doi:10.1117/12.526003.
[6] Z. G. Liu, L. D. Ping, J. Chen, J. M. Wang, and X. Z. Pan, "Steganalysis Based on Differential Statistics," Cryptology and Network Security (CANS 2006), Springer LNCS 4301, Springer Press, Dec. 2006, pp. 224-240, doi:10.1007/11935070_16.
[7] Z. G. Liu, L. D. Ping, L. Shi, and X. Z. Pan, "Steganalysis Based on Principal-component Features," Journal of Zhejiang University (Engineering Science), vol. 41, no. 12, 2007, pp. 1991-1996.
[8] G. Xuan, Y. Q. Shi, J. Gao, et al., "Steganalysis Based on Multiple Features Formed by Statistical Moments of Wavelet Characteristic Functions," Proc. 7th Information Hiding Workshop (IHW 2005), Springer LNCS 3727, Springer Press, Oct. 2005, pp. 262-277.


[9] Y. Q. Shi, G. R. Xuan, D. K. Zou, J. J. Gao, C. Y. Yang, Z. P. Zhang, et al., "Image Steganalysis Based on Moments of Characteristic Functions Using Wavelet Decomposition, Prediction-Error Image, and Neural Network," Proc. IEEE Int'l Conf. on Multimedia and Expo (ICME 2005), IEEE Press, Jul. 2005, pp. 269-272, doi:10.1109/ICME.2005.1521412.
[10] M. A. Mehrabi, K. Faez, and A. R. Bayesteh, "Image Steganalysis Based on Statistical Moments of Wavelet Subband Histograms in Different Frequencies and Support Vector Machine," Proc. Third Int'l Conf. on Natural Computation (ICNC 2007), IEEE Press, Aug. 2007, pp. 587-590, doi:10.1109/ICNC.2007.432.
[11] Y. Wang, "Detection- and Information-theoretic Analysis of Steganography and Fingerprinting," Ph.D. dissertation, Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 2006.
[12] M. J. Weinberger, G. Seroussi, and G. Sapiro, "LOCO-I: A Low Complexity, Context-based, Lossless Image Compression Algorithm," Proc. Data Compression Conference (DCC '96), IEEE Press, Mar./Apr. 1996, pp. 140-149, doi:10.1109/DCC.1996.488319.

