Abstract—This work investigates a central problem in steganography, namely: how much data can safely be hidden without being detected? To answer this question a formal definition of steganographic capacity is presented. Once this has been defined, a general formula for the capacity is developed. The formula is applicable to a very broad spectrum of channels due to the use of an information-spectrum approach. This approach allows for the analysis of arbitrary steganalyzers as well as non-stationary, non-ergodic encoder and attack channels. After the general formula is presented, various simplifications are applied to gain insight into example hiding and detection methodologies. Finally, the context and applications of the work are summarized in a general discussion.

Index Terms—Steganographic capacity, stego-channel, steganalysis, steganography, information theory, information spectrum

This work was carried out at Rensselaer Polytechnic Institute and was supported by the Air Force Research Laboratory, Rome, NY.
J. Harmsen is now with Google Inc. in Mountain View, CA 94043, USA; E-mail: jeremiah@google.com.
W. Pearlman is with the Elec. Comp. and Syst. Engineering Dept., Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA; E-mail: pearlw@ecse.rpi.edu.

I. INTRODUCTION

A. Background

SHANNON'S pioneering work provides bounds on the amount of information that can be transmitted over a noisy channel. His results show that capacity is an intrinsic property of the channel itself. This work takes a similar viewpoint in seeking to find the amount of information that may be transferred over a stego-channel as seen in Figure 1.

The stego-channel is equivalent to the classic channel with the addition of the detection function and attack channel. For the classic channel, a transmission is considered successful if the decoder properly determines which message the encoder has sent. In the stego-channel a transmission is successful not only if the decoder properly determines the sent message, but also if the detection function is not triggered.

This additional constraint on the channel use leads to the fundamental view that the capacity of a stego-channel is an intrinsic property of both the channel and the detection function. That is, the properties of the detection function influence the capacity just as much as the noise in the channel.

B. Previous Work

There have been a number of applications of information theory to the steganographic capacity problem [1], [2], [3]. These works give capacity results under distortion constraints on the hider as well as an active adversary. The additional constraint that the stego-signal retain the same distribution as the cover-signal serves as the steganalysis detection function.

Somewhat less work exists exploring capacity with arbitrary detection functions. These works are written from a steganalysis perspective [4], [5] and accordingly give heavy consideration to the detection function.

This work differs from previous work in a number of aspects. Most notable is the use of information-spectrum methods that allow for the analysis of arbitrary detection algorithms and channels. This eliminates the need to restrict interest to detection algorithms that operate on sample averages or behave consistently. Instead the detection functions may be instantaneous; that is, the properties of a detector for n samples need not have any relation to the same detector for n + 1 samples. Additionally, the typical restriction that the channel under consideration be consistent, ergodic or stationary is also lifted.

Another substantial difference is the presence of noise before the detector. This placement enables the modeling of common signal processing distortions such as compression, quantization, etc. The location of the noise adds complexity not only because of confusion at the decoder, but also because a signal carefully crafted to avoid detection may be corrupted into one that will trigger the detector.

Finally, the consideration of a cover-signal and distortion constraint in the encoding function is omitted. This is due to the view that steganographic capacity is a property of the channel and the detection function. This viewpoint, along with the above differences, makes a direct comparison to previous work somewhat difficult, although possible with a number of simplifications explored in Section V.

C. Groundwork

This chapter lays the groundwork for determining the amount of information that may be transferred over the channel shown in Figure 1. Here, the adversary's goal is to disrupt any steganographic communication between the encoder and decoder. To accomplish this a steganalyzer is used to intercept steganographic messages, and an attack function may alter the signal.

We now formally define each of the components in the system, beginning with the random variable notation.

1) Random Variables: Random variables are denoted by capital letters, e.g. X. Realizations of these random variables are denoted by lowercase letters, e.g. x. Each random variable is defined over a domain denoted with a script X. A sequence of n random variables is denoted X^n = (X_1, ..., X_n). Similarly, an n-length sequence of random variable realizations is denoted x = (x_1, ..., x_n) ∈ X^n. The probability of X taking value x ∈ X is p_X(x).

Following a signal through Figure 1 we begin in the space of n-length stego-signals denoted X^n. The signal then undergoes
SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 2
Fig. 1. The Stego-Channel: Encoder f_n(m) → Noise W^n(y|x) → Steganalyzer g_n(y) / Attack Channel A^n(z|y) → Decoder φ_n(z)

some distortion as it travels through the encoder-channel. This results in an element from the corrupted stego-signal space Y^n. Finally, the signal is attacked to produce the attacked stego-signal in the space Z^n.

2) Steganalyzer: The steganalyzer is a function g_n : Y^n → {0, 1} that classifies a sequence of signals from Y^n into one of two categories: containing steganographic information, and not containing steganographic information. The function is defined as follows for all y ∈ Y^n,

  g_n(y) = 1, if y is steganographic; 0, if y is not steganographic.   (1)

The specific type of function may be that of a support vector machine or a Bayesian classifier, etc.

A steganalyzer sequence is denoted as,

  g := {g_1, g_2, g_3, ...},   (2)

where g_n : Y^n → {0, 1}. The set of all n-length steganalyzers is denoted G_n.

3) Permissible Set: For any steganalyzer g_n, the space of signals Y^n is split into the permissible set and the impermissible set, defined below.

The permissible set P_gn ⊆ Y^n is the inverse image of 0 under g_n. That is,

  P_gn := g_n^{-1}({0}) = {y ∈ Y^n : g_n(y) = 0}.   (3)

The permissible set is the set of all signals of Y^n that the given steganalyzer g_n will classify as non-steganographic.

Since each steganalyzer has a binary range, a steganalyzer sequence may be completely described by a sequence of permissible sets. To denote a steganalyzer sequence in such a way the following notation is used,

  g ≅ {P_1, P_2, P_3, ...},

where P_n ⊆ Y^n is the permissible set for g_n.

Fig. 2. Permissible and Impermissible Sets (g_n maps P_gn = g_n^{-1}({0}) to 0 and I_gn = g_n^{-1}({1}) to 1)

4) Impermissible Set: The impermissible set I_gn ⊆ Y^n is the inverse image of 1 under g_n. That is,

  I_gn := g_n^{-1}({1}) = {y ∈ Y^n : g_n(y) = 1}.   (4)

For a given g_n the impermissible set is the set of all signals in Y^n that g_n will classify as steganographic.

Example 1: Consider the illustrative sum steganalyzer defined for binary channel outputs (Y = {0, 1}). The steganalyzer is defined for y = (y_1, ..., y_n) as,

  g_n(y) = 1, if Σ_{i=1}^n y_i > n/2; 0, else.   (5)

The permissible sets for n = 1, 2, 3, 4 are shown in Table I.

TABLE I
SUM STEGANALYZER PERMISSIBLE SETS
P_1 = {(0)}
P_2 = {(0,0),(0,1),(1,0)}
P_3 = {(0,0,0),(1,0,0),(0,1,0),(0,0,1)}
P_4 = {(0,0,0,0),(1,0,0,0),(0,1,0,0),(0,0,1,0),(0,0,0,1),(1,1,0,0),(1,0,1,0),(1,0,0,1),(0,1,1,0),(0,1,0,1),(0,0,1,1)}

5) Memoryless Steganalyzers: A memoryless steganalyzer g = {g_n}_{n=1}^∞ is one where each g_n is defined for y = (y_1, y_2, ..., y_n) as,

  g_n(y) = 1, if ∃ i ∈ {1, 2, ..., n} such that g(y_i) = 1; 0, if g(y_i) = 0 ∀ i ∈ {1, 2, ..., n},   (6)

where g ∈ G_1 is said to specify g_n (and g). To denote that a steganalyzer sequence is memoryless the following notation will be used: g = {g}.

The analysis of the memoryless steganalyzer is motivated by current real-world implementations of detection systems. As an example we may consider each y_i to be a digital image sent via email. When sending n emails, the hider attaches one of the y_i's to each message. The entire sequence of images is considered to be y. Typically steganalyzers do not make use of the entire sequence y. Instead each image is sequentially processed by a given steganalyzer g, where if any of the y_i trigger the detector the entire sequence of emails is treated as steganographic.

Clearly for a memoryless steganalyzer g_n, defined by g, we have that,

  P_gn = P_g × P_g × ⋯ × P_g   (n factors).   (7)

That is, the permissible set of g_n is the n-dimensional product of P_g.
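The sum steganalyzer of Example 1 and its permissible sets are small enough to enumerate by brute force; the sketch below (an illustrative implementation, not code from the paper) reproduces the set sizes of Table I.

```python
from itertools import product

def g(y):
    # Sum steganalyzer, eq. (5): flag y as steganographic (1) when
    # the number of ones exceeds n/2.
    return 1 if sum(y) > len(y) / 2 else 0

def permissible_set(n):
    # P_gn = g_n^{-1}({0}), eq. (3): all length-n binary signals the
    # steganalyzer classifies as non-steganographic.
    return [y for y in product((0, 1), repeat=n) if g(y) == 0]

for n in range(1, 5):
    # Sizes match Table I: 1, 3, 4, 11.
    print(n, len(permissible_set(n)))
```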
4) Memoryless Channels: In the case where channel distortions act independently and identically on each input letter x_i, we say the channel is memoryless. In this instance the n-length transition probabilities can be written as,

  W^n(y|x) = Π_{i=1}^n W(y_i|x_i),   (11)

where W is said to define the channel. To denote that a channel is memoryless and defined by W we will write W = {W}.

1) Discrete Stego-Channel: A discrete stego-channel is one where at least one of the following holds: |X| < ∞, |Y| < ∞, |Z| < ∞, or |P_gn| < ∞ ∀ n.

2) Discrete Memoryless Stego-Channel: A discrete memoryless stego-channel (DMSC) is a stego-channel where,
1) (W, g, A) is discrete
2) W is memoryless
3) g is memoryless
4) A is memoryless
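The factorization in (11) is easy to sketch concretely; here a binary symmetric channel with crossover probability p stands in as a hypothetical single-letter law W (the BSC is our illustration, not a channel used in the text):

```python
import math

def bsc(p):
    # Single-letter transition probability W(y|x) of a binary symmetric
    # channel: flips the input with probability p (illustrative choice).
    return lambda y, x: p if y != x else 1.0 - p

def W_n(W, y, x):
    # n-length transition probability of a memoryless channel, eq. (11):
    # W^n(y|x) = prod_{i=1}^n W(y_i | x_i).
    return math.prod(W(yi, xi) for yi, xi in zip(y, x))

W = bsc(0.1)
# Three letters pass unchanged and one is flipped: 0.9**3 * 0.1.
print(W_n(W, (0, 1, 1, 0), (0, 1, 1, 1)))
```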
A DMSC is said to be defined by the triple (W, g, A) and will be denoted (W, g, A) = {(W, g, A)}.

G. Steganographic Capacity

The secure capacity tells us how much information can be transferred with arbitrarily low probabilities of error and detection.

An (n, M_n, ε_n, δ_n)-code (for a given stego-channel) consists of an encoder and decoder. The encoder and decoder are capable of transferring one of M_n messages in n uses of the channel with an average probability of error of less than (or equal to) ε_n and a probability of detection of less than (or equal to) δ_n.

1) Secure Capacity: A rate R is said to be securely achievable for a stego-channel (W, g, A) = {(W^n, g_n, A^n)}_{n=1}^∞ if there exists a sequence of (n, M_n, ε_n, δ_n)-codes such that:
1) lim_{n→∞} ε_n = 0
2) lim_{n→∞} δ_n = 0
3) lim inf_{n→∞} (1/n) log M_n ≥ R

The secure capacity of a stego-channel (W, g, A) is denoted C(W, g, A). It is defined as the supremum of all securely achievable rates for (W, g, A).

H. (ε, δ)-Secure Capacity

A rate R is said to be (ε, δ)-securely achievable for a stego-channel (W, g, A) = {(W^n, g_n, A^n)}_{n=1}^∞ if there exists a sequence of (n, M_n, ε_n, δ_n)-codes such that:
1) lim sup_{n→∞} ε_n ≤ ε
2) lim sup_{n→∞} δ_n ≤ δ
3) lim inf_{n→∞} (1/n) log M_n ≥ R

II. SECURE CAPACITY FORMULA

A. Information-Spectrum Methods

The information-spectrum method [6], [7], [8], [9], [10] is a generalization of information theory created to apply to systems where either the channel or its inputs are not necessarily ergodic or stationary. Its use is required in this work because the steganalyzer is not assumed to have any ergodic or stationary properties.

The information-spectrum method uses the general source (also called a general sequence) defined as,

  X := {X^n = (X_1^(n), X_2^(n), ..., X_n^(n))}_{n=1}^∞,   (14)

where each X_m^(n) is a random variable defined over the alphabet X. It is important to note that the general source makes no assumptions about consistency, ergodicity, or stationarity.

The information-spectrum method also uses two novel quantities defined for sequences of random variables, called the lim sup and lim inf in probability.

The limsup in probability of a sequence of random variables {Z_n}_{n=1}^∞ is defined as,

  p-lim sup_{n→∞} Z_n := inf{α : lim_{n→∞} Pr{Z_n > α} = 0}.

Similarly, the liminf in probability of a sequence of random variables {Z_n}_{n=1}^∞ is,

  p-lim inf_{n→∞} Z_n := sup{β : lim_{n→∞} Pr{Z_n < β} = 0}.

The spectral sup-entropy rate of a general source X = {X^n}_{n=1}^∞ is defined as,

  H̄(X) := p-lim sup_{n→∞} (1/n) log [1/p_{X^n}(X^n)].   (15)

Analogously, the spectral inf-entropy rate of a general source X = {X^n}_{n=1}^∞ is defined as,

  H̲(X) := p-lim inf_{n→∞} (1/n) log [1/p_{X^n}(X^n)].   (16)

The spectral entropy rates have a number of natural properties, such as, for any X, H̄(X) ≥ H̲(X) ≥ 0 [6, Thm. 1.7.2].

The spectral sup-mutual information rate for the pair of general sequences (X, Y) = {(X^n, Y^n)}_{n=1}^∞ is defined as,

  Ī(X; Y) := p-lim sup_{n→∞} (1/n) i(X^n; Y^n),   (17)

where,

  i(X^n; Y^n) := log [p_{Y^n|X^n}(Y^n|X^n) / p_{Y^n}(Y^n)].   (18)

Likewise the spectral inf-mutual information rate for the pair of general sequences (X, Y) = {(X^n, Y^n)}_{n=1}^∞ is defined as,

  I̲(X; Y) := p-lim inf_{n→∞} (1/n) i(X^n; Y^n).   (19)

B. Information-Spectrum Results

This section lists some of the fundamental results from information-spectrum theory [6] that will be used in the remainder of the paper.

  H̲(X) ≤ lim inf_{n→∞} (1/n) H(X^n)   (20)
  Ī(X; Y) ≤ H̄(Y) − H̲(Y|X)   (21)
  I̲(X; Y) ≥ H̲(Y) − H̄(Y|X)   (22)

C. Secure Sequences

1) Secure Input Sequences: For a given stego-channel (W, g, A), a general source X = {X^n}_{n=1}^∞ is called δ-secure if the resulting Y = {Y^n}_{n=1}^∞ satisfies,

  lim sup_{n→∞} Pr{g_n(Y^n) = 1} ≤ δ,   (23)

or either of the following equivalent conditions,

  lim sup_{n→∞} p_{Y^n}(I_gn) ≤ δ,   (24)

or

  lim inf_{n→∞} p_{Y^n}(P_gn) ≥ 1 − δ.   (25)

The set S_δ of all general sources that are δ-secure is defined as,

  S_δ := {X : lim sup_{n→∞} Σ_{x∈X^n} W^n(I_gn|x) p_{X^n}(x) ≤ δ},   (26)

where X = {X^n}_{n=1}^∞. The set for δ = 0 is called the secure input set and denoted S_0.
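For a stationary memoryless source the spectrum (1/n) log [1/p_{X^n}(X^n)] concentrates at the Shannon entropy, so H̄(X) = H̲(X) = H(X); the general definitions above matter precisely when this concentration fails. A Monte Carlo sketch for an i.i.d. Bernoulli(0.3) source (parameters chosen arbitrarily for illustration):

```python
import math
import random

def normalized_self_information(p, n, rng):
    # One sample of (1/n) * log2 (1 / p_{X^n}(X^n)) for an i.i.d.
    # Bernoulli(p) source; the i.i.d. law factorizes, so this is an
    # average of per-letter self-informations.
    total = 0.0
    for _ in range(n):
        x = rng.random() < p
        total += -math.log2(p if x else 1.0 - p)
    return total / n

p = 0.3
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # Shannon entropy
sample = normalized_self_information(p, 50_000, random.Random(0))
print(H, sample)  # the sample sits close to H, illustrating concentration
```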
2) Secure Output Sequences: For a given steganalyzer sequence g = {g_n}_{n=1}^∞, a general sequence Y = {Y^n}_{n=1}^∞ is called δ-secure if,

  lim sup_{n→∞} Pr{g_n(Y^n) = 1} ≤ δ.   (27)

The set T_δ of all δ-secure general output sequences is defined as,

  T_δ := {Y = {Y^n}_{n=1}^∞ : lim sup_{n→∞} p_{Y^n}(I_gn) ≤ δ}.   (28)

The set for δ = 0 is called the secure output set and denoted T_0.

... with J(R + γ|X) ≤ ε shows that lim sup_{n→∞} ε_n ≤ ε. Finally, since X ∈ S_δ we have that,

  lim sup_{n→∞} p_{Z^n}(I_gn) ≤ δ.   (35)

Converse: Let R > C, and choose γ > 0 such that R − 2γ > C. Assume that R is (ε, δ)-achievable, so there exists an (n, M_n, ε_n, δ_n)-code such that,

  lim inf_{n→∞} (1/n) log M_n ≥ R,   (36)
  lim sup_{n→∞} ε_n ≤ ε,   (37)

TABLE II
SECURE CAPACITY FORMULAS

Fig. 3. Stegochannels (W, g, A): X^n → W^n(y|x) → Y^n → A^n(z|y) → Z^n, with steganalyzer g_n(y)

Theorem 2.2 (Secure Capacity): The secure capacity C(W, g, A) of a stego-channel (W, g, A) is given by,

  C(W, g, A) = sup_{X∈S_0} I̲(X; Z).   (44)

Proof: We apply Theorem 2.1 with ε = 0 and δ = 0. This gives,

  C(W, g, A) = C(0, 0|W, g, A)   (45a)
  = sup_{X∈S_0} sup{R : J(R|X) ≤ 0}   (45b)
  = sup_{X∈S_0} sup{R : lim sup_{n→∞} Pr{(1/n) i(X^n; Z^n) ≤ R} ≤ 0}   (45c)
  = sup_{X∈S_0} I̲(X; Z).   (45d)

Here the last line is due to the definition of p-lim inf.

Theorem 2.3 (Noiseless Encoder, Active Adversary): The secure capacity of a stego-channel (·, g, A), with a noiseless encoder and active adversary, denoted C(·, g, A), is given by,

  C(·, g, A) = sup_{Y∈T_0} I̲(Y; Z).   (46)

Proof: Apply Theorem 2.2 with X = Y and S_0 = T_0.

Theorem 2.4 (Passive Adversary): The secure capacity with a passive adversary, denoted C(W, g), of a stego-channel (W, g, ·) is given by,

  C(W, g) = sup_{X∈S_0} I̲(X; Y).   (47)

Proof: Since the adversary is passive, we have that Z = Y.

Theorem 2.5 (Noiseless Encoder, Passive Adversary): The secure capacity of a stego-channel (·, g, ·), with a noiseless encoder and passive adversary, denoted C(·, g), is given by,

  C(·, g) = sup_{X∈S_0} I̲(X; Y).   (48)

Proof: Since the adversary is passive, we have that Z = Y, and since there is no encoder noise we have that X = Y and S_0 = T_0.

F. Strong Converse

A stego-channel (W, g, A) is said to satisfy the ε-strong converse property if for any R > C(0, δ|W, g, A), every (n, M_n, ε_n, δ_n)-code with,

  lim inf_{n→∞} (1/n) log M_n ≥ R,

and

  lim sup_{n→∞} δ_n ≤ δ,

satisfies,

  lim_{n→∞} ε_n = 1.

Thus if a channel satisfies the ε-strong converse,

  C(ε, δ|W, g, A) = C(0, δ|W, g, A),   (49)

for any ε ∈ [0, 1).

Theorem 2.6 (ε-Strong Converse): A stego-channel (W, g, A) satisfies the ε-strong converse property (for a fixed δ) if and only if,

  sup_{X∈S_δ} I̲(X; Z) = sup_{X∈S_δ} Ī(X; Z).   (50)

The proof is essentially the ε-strong converse [6], [7] with a restriction to the secure input set. See Appendix A for details.

G. Bounds

We now derive a number of useful bounds on the spectral entropy of an output sequence in relation to the permissible set. These bounds will then be used to prove general bounds for steganographic systems, and see further application in Chapter III.

Theorem 2.7 (Spectral inf-entropy bound): For a discrete g = {P_n}_{n=1}^∞ with corresponding secure output set T_0,

  sup_{Y∈T_0} H̲(Y) = lim inf_{n→∞} (1/n) log |P_n|.   (51)

See Appendix B for proof.

Theorem 2.8 (Spectral sup-entropy bound): For a discrete g = {P_n}_{n=1}^∞ with corresponding secure output set T_0,

  sup_{Y∈T_0} H̄(Y) = lim sup_{n→∞} (1/n) log |P_n|.   (52)

See Appendix C for proof.
H. Capacity Bounds

This section presents a number of fundamental bounds on the secure capacity of a stego-channel based on the properties of that channel. We make use of the following lemma.

Lemma 2.1: For a stego-channel (W, g, A) the following hold,

  I̲(X; Z) ≤ I̲(X; Y),   (53)
  I̲(X; Z) ≤ I̲(Y; Z).   (54)

Proof: We note that the general distributions form a Markov chain X → Y → Z (X → Y → Z is said to hold when for all n, X^n and Z^n are conditionally independent given Y^n). A property of the inf-information rate [7] is,

  I̲(X; Z) ≤ I̲(X; Y),   (55)

when X → Y → Z. Since X → Y → Z implies Z → Y → X we also have,

  I̲(X; Z) ≤ I̲(Y; Z).   (56)

The first capacity bound gives an upper bound based on the sup-entropy of the secure input set.

Theorem 2.9 (Input Sup-Entropy Bound): For a stego-channel (W, g, A) the secure capacity is bounded as,

  C(W, g, A) ≤ sup_{X∈S_0} H̄(X).   (57)

Proof: Using (21) and the property that H̲(X|Z) ≥ 0 we have,

  C(W, g, A) = sup_{X∈S_0} I̲(X; Z)   (T2.2)
  ≤ sup_{X∈S_0} [H̄(X) − H̲(X|Z)]   (21)
  ≤ sup_{X∈S_0} H̄(X).

The next theorem gives two upper bounds on the capacity based on the sup-entropy of the secure input and output sets.

Theorem 2.10 (Output Sup-Entropy Bounds): For a stego-channel (W, g, A) the secure capacity is bounded as,

  C(W, g, A) ≤ sup_{X∈S_0} H̄(Y)   (59a)
  ≤ sup_{Y∈T_0} H̄(Y).   (59b)

Proof: Using (21) and the property that H̲(Y|X) ≥ 0 we have,

  C(W, g, A) = sup_{X∈S_0} I̲(X; Z)
  ≤ sup_{X∈S_0} I̲(X; Y)   (L2.1)
  ≤ sup_{X∈S_0} [H̄(Y) − H̲(Y|X)]   (21)
  ≤ sup_{X∈S_0} H̄(Y)
  ≤ sup_{Y∈T_0} H̄(Y).

Here the final line follows since if X ∈ S_0 and X →(W) Y, then Y ∈ T_0.

The next corollary specializes the above theorem when the permissible set is finite.

Corollary 2.1 (Discrete Permissible Set Bound): For a given discrete stego-channel (W, g, A) = {(W^n, P_gn, A^n)}_{n=1}^∞ the secure capacity is bounded from above as,

  C(W, g, A) ≤ lim sup_{n→∞} (1/n) log |P_gn|.   (61)

Proof: Combining Theorem 2.8 and line (59b) of Theorem 2.10 gives the desired result.

The next theorem provides an intuitive result dealing with the capacity of two stego-channels having related steganalyzers.

Theorem 2.11 (Permissible Set Relation): For two stego-channels (W, g, A) and (W, v, A), if P_gn ⊆ P_vn for all but finitely many n, then,

  C(W, g, A) ≤ C(W, v, A).   (62)

Proof: Let {f_n}_{n=1}^∞ and {φ_n}_{n=1}^∞ be sequences of encoding and decoding functions that achieve C(W, g, A). Such sequences exist by the definition of secure capacity. The following definitions will be used for i = 1, ..., M_n,

  u_i = f_n(i),
  D_i = φ_n^{-1}({i}).

The probability of error for this sequence is given by (12),

  ε_n = (1/M_n) Σ_{i=1}^{M_n} Q^n(D_i^c | u_i),

where Q^n = A^n ∘ W^n. Clearly, this value is independent of the permissible sets, and if ε_n → 0 for the stego-channel (W, g, A) then it also goes to zero for (W, v, A).

Next we know that the probability of detection for (W, g, A) is given by (13),

  δ_n^g = (1/M_n) Σ_{i=1}^{M_n} W^n(I_gn | u_i),

and that δ_n^g → 0.

Since P_gn ⊆ P_vn for all n > N, we have that I_gn ⊇ I_vn for n > N, and thus,

  W^n(I_gn | x) ≥ W^n(I_vn | x), ∀ n > N, x ∈ X^n.   (63)

Using this we may bound the probability of detection for (W, v, A) and n > N as,

  δ_n^v = (1/M_n) Σ_{i=1}^{M_n} W^n(I_vn | u_i)
  ≤ (1/M_n) Σ_{i=1}^{M_n} W^n(I_gn | u_i)   (63)
  = δ_n^g.

Since δ_n^g → 0 we see that δ_n^v → 0 as well.
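Theorem 2.11's set relation can be checked exhaustively for small n. Below, two sum-style detectors with hypothetical thresholds (chosen only for illustration) give nested permissible sets, and the stricter detector's impermissible set contains the looser one's, mirroring (63):

```python
from itertools import product

def permissible(n, t):
    # Permissible set of a detector that flags y when sum(y) > t.
    return {y for y in product((0, 1), repeat=n) if sum(y) <= t}

n = 6
signals = set(product((0, 1), repeat=n))
P_g = permissible(n, 2)          # stricter detector g
P_v = permissible(n, 4)          # looser detector v
I_g, I_v = signals - P_g, signals - P_v

# P_gn ⊆ P_vn and, equivalently, I_gn ⊇ I_vn.
print(P_g <= P_v, I_g >= I_v)
```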
Fig. 5. Two Noise Channel: Encoder f_n(m) → Noise A(y|x) → Detection g_n(y) → Noise B(z|y) → Detection v_n(z) → Decoder φ_n(z)

[Figure: noiseless channel with X^n = Y^n = Z^n — message M_n into encoder f_n(m), decoder φ_n produces M̂_n]

I. Applications

1) Composite Steganalyzers: The final theorem of the previous section is intuitively pleasing and leads to some immediate results. An example of this is the composite steganalyzer pictured in Figure 4. In this system two steganalyzers, g and v, are used sequentially on the corrupted stego-signal. If either of these steganalyzers is triggered, the message is considered steganographic. We will denote the composite stego-channel of this system as (W, h, A).

As one would expect, the capacity of the composite channel, C(W, h, A), is smaller than either C(W, g, A) or C(W, v, A). This is shown in the next theorem.

Theorem 2.12 (Composite Stego-Channel): For a composite stego-channel (W, h, A) defined by g and v, the following inequality holds,

  C(W, h, A) ≤ min{C(W, g, A), C(W, v, A)}.   (65)

Proof: We first show that C(W, h, A) ≤ C(W, g, A). The permissible set of the composite is equal to the intersection of those of the base detection functions,

  P_hn = P_gn ∩ P_vn, ∀ n,   (66)

thus we have that P_hn ⊆ P_gn and we may apply Theorem 2.11 to state,

  C(W, h, A) ≤ C(W, g, A).

The above argument may be applied using P_hn ⊆ P_vn to show C(W, h, A) ≤ C(W, v, A).

2) Two Noise Systems: We briefly present and discuss an interesting case that is somewhat counter-intuitive. Consider the channel shown in Figure 5. In this case there is a distortion A after the encoder and a second distortion B before the second steganalyzer. In the previous section it was shown that for the composite steganalyzer (Figure 4) the addition of a second steganalyzer lowers the capacity of the stego-channel. A surprising result for the two noise system is that this may not be the case; in fact, the addition of a second distortion may increase the capacity of a stego-channel!

To see this consider the two steganalyzers g and v. Assume that g classifies signals with positive means as steganographic, while v classifies signals with negative means as steganographic. If these detection functions were in series, the permissible set (of the composite detection function) is essentially empty: to avoid both detectors, a signal's mean would have to be neither positive nor negative. Now consider a specific, deterministic distortion B_n(−y|y) = 1. Now we may send any signal we wish, as long as its mean is negative: such a signal passes g, and after B negates it, its mean is positive and it passes v. So in some instances, it is possible for the addition of a distortion to actually increase the capacity.

III. NOISELESS CHANNELS

This section investigates the capacity of the noiseless stego-channel shown in Figure 6. In this system there is no encoder-noise and the adversary is passive. This means that not only does the decoder receive exactly what the encoder sends, but the steganalyzer does as well.

This section finds the secure capacity of this system, and then derives a number of intuitive bounds relating to this capacity.

A. Secure Noiseless Capacity

Theorem 3.1 (Secure Noiseless Capacity): For a discrete noiseless channel (·, g, ·) the secure capacity is given by,

  C(·, g) = lim inf_{n→∞} (1/n) log |P_gn|.   (67)

Proof: The proof follows directly from Theorem 2.5 and Theorem 2.7.

Example 2 (Capacity of the Sum Steganalyzer): We now use this result to find the secure noiseless capacity of the sum steganalyzer of Example 1. The size of the permissible set for n is equal to the number of different ways we may arrange up to ⌊n/2⌋ 1s in n positions,

  |P_gn| = Σ_{i : 0 ≤ i ≤ n/2} C(n, i),   (68)

where C(n, i) denotes the binomial coefficient. For n even, |P_gn| = 2^{n−1} + (1/2) C(n, n/2), and for n odd, |P_gn| = 2^{n−1}. Applying the noiseless theorem,

  C(·, g) = lim inf_{n→∞} (1/n) log |P_gn| = lim_{n→∞} (1/n) log 2^{n−1} = 1 bit/use.   (69a)
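Example 2's closed forms and the 1 bit/use limit can be verified numerically (a quick sanity check, not part of the paper):

```python
import math

def permissible_size(n):
    # |P_gn| for the sum steganalyzer: binary words of length n with at
    # most floor(n/2) ones, eq. (68).
    return sum(math.comb(n, i) for i in range(n // 2 + 1))

def closed_form(n):
    # Closed forms from Example 2.
    if n % 2 == 0:
        return 2 ** (n - 1) + math.comb(n, n // 2) // 2
    return 2 ** (n - 1)

for n in (7, 8, 50, 200):
    assert permissible_size(n) == closed_form(n)
    # (1/n) log2 |P_gn| climbs toward the capacity of 1 bit/use.
    print(n, math.log2(permissible_size(n)) / n)
```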
B. ε-Strong Converse for Noiseless Channels

We now present a fundamental result for discrete noiseless channels regarding the ε-strong converse property. It gives the necessary and sufficient condition for a noiseless stego-channel to satisfy the ε-strong converse property.

Theorem 3.2 (Noiseless ε-Strong Converse): A discrete noiseless stego-channel (·, g, ·) satisfies the ε-strong converse property if and only if,

  C(·, g) = lim_{n→∞} (1/n) log |P_gn|.   (70)

Proof: When the limit exists,

  sup_{X∈S_0} Ī(X; Z) = sup_{Y∈T_0} H̄(Y)   (72)
  = lim sup_{n→∞} (1/n) log |P_gn|   (T2.8)
  = lim_{n→∞} (1/n) log |P_gn|
  = lim inf_{n→∞} (1/n) log |P_gn|
  = sup_{Y∈T_0} H̲(Y)   (T2.7)
  = sup_{X∈S_0} I̲(X; Z).   (73c)

Thus sup_{X∈S_0} I̲(X; Z) = sup_{X∈S_0} Ī(X; Z) and by Theorem 2.6 the stego-channel satisfies the ε-strong converse property.

Example 3 (Sum Steganalyzer): We now determine whether the sum steganalyzer satisfies the ε-strong converse. From Example 2 the size of the permissible set is,

  |P_gn| = 2^{n−1} + (1/2) C(n, n/2), for even n;  2^{n−1}, for odd n.   (75)

We will make use of Stirling's approximation,

  n! = √(2π) n^{n+1/2} e^{−n+λ_n},   (76)

where 1/(12n + 1) < λ_n < 1/(12n).

For n even, applying (76) to the central binomial coefficient C(n, n/2) shows that (1/n) log |P_gn| → 1, matching the odd case. Since the liminf and limsup coincide, the limit is indeed a true one. Thus, this stego-channel satisfies the ε-strong converse.

C. Properties of the Noiseless DMSC

Proof: As the channel is noiseless and the input alphabet is finite we may use Theorem 3.1,

  C(·, g) = lim inf_{n→∞} (1/n) log |P_gn|.   (84)

Note that by (7) we have, for all n,

  (1/n) log |P_gn| = (1/n) log |P_g × P_g × ⋯ × P_g|
  = (1/n) log |P_g|^n
  = log |P_g|.
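The Robbins form of Stirling's approximation quoted in (76), with its bounds on λ_n, can be checked directly:

```python
import math

def lambda_n(n):
    # Solve eq. (76), n! = sqrt(2*pi) * n**(n + 1/2) * exp(-n + lambda_n),
    # for lambda_n using the exact log-factorial lgamma(n + 1) = ln n!.
    return math.lgamma(n + 1) - (0.5 * math.log(2 * math.pi)
                                 + (n + 0.5) * math.log(n) - n)

for n in (1, 2, 10, 100):
    lam = lambda_n(n)
    # The correction term obeys 1/(12n + 1) < lambda_n < 1/(12n).
    assert 1 / (12 * n + 1) < lam < 1 / (12 * n)
    print(n, lam)
```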
[Figure: additive Gaussian stego-channels — Y^n = X^n + N_e^n, Z^n = Y^n + N_a^n, with encoder f_n(m) and decoder φ_n]

For example, when testing the signal y = (y_1, ..., y_n) the variance steganalyzer operates as,

  g_n(y) = 1, if (1/n) Σ_{i=1}^n y_i² > c; 0, else.   (97)

Thus, if the empirical variance of a test signal is above a certain threshold, the signal is considered steganographic.

2) Additive Gaussian Channel, Active Adversary: In this section we derive the capacity under an active adversary. Assume that the adversary uses additive i.i.d. Gaussian noise with variance σ_a² while the encoder noise is additive i.i.d. Gaussian with variance σ_e². Let N_e = {N_e} where N_e ~ N(0, σ_e²) and N_a = {N_a} where N_a ~ N(0, σ_a²).

Let N = N_e + N_a = {N^n = N_e^n + N_a^n}_{n=1}^∞. Since both N_e and N_a are i.i.d. as N(0, σ_e²) and N(0, σ_a²), respectively, their sum is i.i.d. as N(0, σ_e² + σ_a²), i.e. N = {N} with N ~ N(0, σ_e² + σ_a²).

This allows for a lower bound of,

  C(W, g, A) = sup_{X∈S_0} [H̄(Z) − (1/2) log 2πe(σ_e² + σ_a²)]   (103a), by (101)
  ≥ H̄(Z) − (1/2) log 2πe(σ_e² + σ_a²)   (103b)
  = (1/2) log [(c + σ_a²)/(σ_e² + σ_a²)].   (103c)

Converse: To find the upper bound we will make use of a number of simple lemmas.

Lemma 4.1: For a given stego-channel with secure input distribution set S_0 and secure output distribution set T_0, the following holds,

  sup_{X∈S_0} H̄(Z) ≤ sup_{Y∈T_0} H̄(Z).   (104)

Proof: By definition, for any X ∈ S_0 and X →(W) Y, we ...

... The result follows from application of the arithmetic-geometric inequality.

Lemma 4.4: For the above stego-channel, any Y ∈ T_0 and any ε > 0 we have,

  lim inf_{n→∞} (1/n) H(Z^n) < (1/2) log 2πe(c + σ_a²) + ε,   (110)

where Z = {Z^n}_{n=1}^∞ and Y →(A) Z.

Proof: Let any ε > 0 be given and choose γ > 0 such that,

  γ ≤ (c + σ_a²)(e^{2ε} − 1),
  (1/2) log 2πe(c + σ_a² + γ) ≤ (1/2) log 2πe(c + σ_a²) + ε.   (111)

TABLE III
GAUSSIAN ADDITIVE NOISE CAPACITIES
Channel | Secure Capacity | Encoder Noise | Attack Noise
C(W, g, A) | (1/2) log [(c + σ_a²)/(σ_e² + σ_a²)] | σ_e² | σ_a²
C(W, g) | (1/2) log (c/σ_e²) | σ_e² | 0
C(·, g, A) | (1/2) log [(c + σ_a²)/σ_a²] | 0 | σ_a²
C(·, g) | lim_{σ²→0} (1/2) log [(c + σ²)/σ²] | 0 | 0

Fig. 9. AWGN Channel, Passive Adversary: Encoder f_n(m) → Noise → Detection → Decoder φ_n(y)
Letting C_ij^(n) = E[Z_i^(n) Z_j^(n)] and K_ij^(n) = E[Y_i^(n) Y_j^(n)], we note that Z_i^(n) = Y_i^(n) + N_a. This gives,

  C_ii^(n) = K_ii^(n) + σ_a².   (112)

This gives,

  (1/n) H(Z^n) ≤ (1/2n) log [(2πe)^n ((1/n) Σ_{i=1}^n C_ii^(n))^n]   (113), by L4.3
  = (1/2n) log [(2πe)^n ((1/n) Σ_{i=1}^n K_ii^(n) + σ_a²)^n]   (114), by (112)
  < (1/2n) log [(2πe)^n (c + σ_a² + γ)^n]   (115), by L4.2
  ≤ (1/2) log 2πe(c + σ_a²) + ε.   (116), by (111)

The inequality of (115) holds for all but a finite number of n.

5) Large Attack Case: We first consider the case where σ_a² is much larger than both c and σ_e². This gives,

  C(W, g, A) = (1/2) log [(c + σ_a²)/(σ_e² + σ_a²)] ≈ (1/2) log (σ_a²/σ_a²) = 0.

Thus when the attack noise is large enough the capacity of the stego-channel goes to zero. Intuitively this is due to the fact that the variance steganalyzer places a power constraint (of c) on any signal it allows to pass. If the attack noise is much larger than c, a message simply cannot be transmitted with enough power to overcome that noise, and ε_n → 0 is impossible.

6) Large Encoder-Noise Case: Next we consider the case where σ_e² ≥ c. Since (c + σ_a²)/(σ_e² + σ_a²) ≤ 1, we have log [(c + σ_a²)/(σ_e² + σ_a²)] ≤ 0. This gives,
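The Table III formula for C(W, g, A) and the large-attack limit of case 5 can be checked with a few illustrative variances (the numbers are hypothetical):

```python
import math

def secure_capacity(c, var_e, var_a):
    # C(W, g, A) = (1/2) log2((c + sigma_a^2) / (sigma_e^2 + sigma_a^2)),
    # in bits, for the variance-steganalyzer threshold c with Gaussian
    # encoder noise sigma_e^2 and attack noise sigma_a^2 (Table III).
    return 0.5 * math.log2((c + var_a) / (var_e + var_a))

c, var_e = 4.0, 1.0
# Passive adversary (var_a = 0) recovers (1/2) log2(c / sigma_e^2) = 1 bit.
print(secure_capacity(c, var_e, 0.0))
for var_a in (1.0, 100.0, 1e6):
    # As the attack noise dwarfs c and sigma_e^2, capacity decays toward 0.
    print(var_a, secure_capacity(c, var_e, var_a))
```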
From the above theorem we see that,

  C(W, g) = (1/2) log (c/σ²).   (118)

The most basic element will be the volume of an n-dimensional sphere of radius r. In this case the volume is equal to A_n r^n, where A_n is a constant dependent only on the dimension n.

The fundamental question is what is the capacity of the stego-channel, or how many codewords we can reliably use. To answer this, we must consider the two constraints on a secure system: error probability and detection probability.

9) Error Probability: Since we have that X^n = Y^n = ℝ^n, we may view each codeword as a point in ℝ^n. When we transmit a given codeword we may think of the addition of noise as moving the point around in that space. Since the power of the noise is σ², the probability that the received codeword has moved more than √(nσ²) away from where it started goes to zero as n → ∞. Thus we know that if we transmit a codeword, it will likely be contained in a sphere (centered on the codeword) of radius √(nσ²).

This means that if we receive a signal inside such a sphere, it is likely that the transmitted codeword was the center of that sphere. In this manner we can define a coding system.

We know that for secure capacity the probability of error must go to zero. We also know that each codeword has an associated sphere that the received signal will fall inside. Thus if we choose the codewords such that their spheres do not overlap, there will be no confusion in decoding and the probability of error will go to zero.

10) Detection Probability: We begin by looking at the permissible set. The permissible set for our g_n is given by,

  P_gn = {y ∈ Y^n : Σ_{i=1}^n y_i² < nc}.   (119)

Clearly the permissible set is a sphere of radius √(nc) centered at the origin. If a test signal falls inside this sphere it is classified as non-steganographic, whereas if it is outside it is considered steganographic.

The second criterion for a secure system is that the probability of detection go to zero. If we place each codeword such that its sphere is inside the permissible set, we know that the probability of detection will go to zero.

11) Capacity: From the above we know that the codeword spheres cannot overlap (to ensure no errors), and we also know that all the codeword spheres must fit inside the permissible set (to ensure no detection). Thus if we calculate the number of non-overlapping spheres we may pack into the permissible set, we will have a general idea of the number of codewords we can use.

Since the volume of the permissible set is A_n (nc)^{n/2} and the volume of each codeword sphere is A_n (nσ²)^{n/2}, we can place approximately,

  A_n (nc)^{n/2} / A_n (nσ²)^{n/2} = (c/σ²)^{n/2}.

Thus, using the center of each sphere as a codeword, we have M_n codewords where M_n = (c/σ²)^{n/2}. If we consider the capacity as C(W, g) = lim_{n→∞} (1/n) log M_n, we have,

  C(W, g) = lim_{n→∞} (1/n) log (c/σ²)^{n/2}   (120a)
  = (1/2) log (c/σ²),   (120b)

which agrees with the result of Theorem 4.2.

V. PREVIOUS WORK REVISITED

A. Cachin Perfect Security

In Cachin's definition of perfect security [16] the cover-signal distribution and the stego-signal distribution are each required to be independent and identically distributed. This gives the following secure input set,

  S_0 = {X = {X} : lim_{n→∞} (1/n) D(S^n || X^n) = 0}.   (121)

The i.i.d. property means that D(S^n || X^n) = n D(S || X), so we see that the above is equivalent to,

  S_0 = {X = {X} : D(S || X) = 0}   (122)
  = {X = {X} : p_S = p_X}.   (123)

Since Cachin's definition does not model noise, we may consider it as noiseless and apply Theorem 3.1,

  C(W, g) = sup_{X∈S_0} H(X) = H(S).   (124)

This result states that in a system that is perfectly secure (in Cachin's definition) the limit on the amount of information that may be transferred each channel use is equal to the entropy of the source. This is intuitive because in Cachin's definition the output distribution of the encoder is constrained to be equal to the cover distribution.

B. Empirical Distribution Steganalyzer

The empirical distribution steganalyzer is motivated by the fact that the empirical distribution of a stationary memoryless source converges to the actual distribution of that source. Accordingly, if the empirical distribution of the test signal converges to the cover-signal distribution it is considered to be non-steganographic.

Assume that p_S is a discrete distribution over the finite alphabet S. Let a sequence {s^n}_{n=1}^∞, with each s^n ∈ S^n, be used to specify the steganalyzer for a test signal x as,

  g_n(x) = 0, if P_[s^n] = P_[x]; 1, if P_[s^n] ≠ P_[x],   (125)

where P_[x] is the empirical distribution of x.

The permissible set for g_n is equal to the type class of P_[s^n], i.e.,

  P_gn = T(P_[s^n]) := {x ∈ X^n : P_[x] = P_[s^n]}.   (126)

Theorem 5.1 (Empirical Distribution Steganalyzer Capacity):
An (nσ 2 ) 2 σ2
non-overlapping sphere inside the permissible set. C(W, g) = H (S) . (127)
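As a quick numerical check on the sphere-packing count (this sketch is our own illustration, not part of the paper; the function names are hypothetical), the number of non-overlapping noise spheres of radius √(nσ^2) inside the permissible sphere of radius √(nc) grows as (c/σ^2)^{n/2}, so the per-sample rate (1/n) log M is the constant (1/2) log(c/σ^2) regardless of n:

```python
import math

def sphere_packing_count(c: float, sigma2: float, n: int) -> float:
    """Approximate number of non-overlapping noise spheres (radius
    sqrt(n*sigma2)) that fit in the permissible sphere (radius sqrt(n*c)):
    the volume ratio (c/sigma2)**(n/2)."""
    return (c / sigma2) ** (n / 2)

def rate(c: float, sigma2: float, n: int) -> float:
    """Per-sample log of the codeword count, (1/n) * log M."""
    return math.log(sphere_packing_count(c, sigma2, n)) / n

# The per-sample rate does not depend on n: it equals 0.5 * log(c / sigma2).
for n in (10, 100, 1000):
    assert abs(rate(4.0, 1.0, n) - 0.5 * math.log(4.0)) < 1e-9
```

This mirrors the classical Gaussian-channel packing argument: the blocklength cancels, leaving a fixed rate set by the ratio of the detection threshold c to the noise power σ^2.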
SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 14
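The empirical-distribution steganalyzer of (125), and the fact that its permissible set (126) is exactly a type class, can be sketched in a few lines. This is our own illustration with hypothetical names, assuming a small binary alphabet:

```python
from collections import Counter
from itertools import permutations

def empirical(x):
    """Empirical distribution (type) of a sequence x, as a dict of frequencies."""
    n = len(x)
    return {a: k / n for a, k in Counter(x).items()}

def g_n(x, s):
    """Empirical-distribution steganalyzer in the spirit of (125):
    returns 0 (non-steganographic) iff the type of x equals the type of s."""
    return 0 if empirical(x) == empirical(s) else 1

s = (0, 0, 1, 1)  # reference cover sequence s^n with type (1/2, 1/2)

# The permissible set of g_n is the type class T(P_[s^n]): every
# rearrangement of s passes, while any sequence of a different type fails.
assert all(g_n(p, s) == 0 for p in permutations(s))
assert g_n((0, 0, 0, 1), s) == 1
```

Only the per-symbol frequencies matter, not the ordering, which is why the permissible set is as large as a whole type class and the capacity in Theorem 5.1 reaches H(S).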
[Figure: Source S ∼ p_S → Encoder f_N(m, s) (input message M) → X → Noise A_n(y|x) → Y → Decoder φ_N(y) → M̂; a Detection block compares X against a hypothetical X ∼ p_S; D_1 and D_2 denote the encoder and attacker distortion levels.]
Fig. 10. Moulin Stego-channel

[Figure 11: the simplified stego-channel — X checked by Detection against p_S.]

Proof: Since the channel is noiseless we may apply Theorem 3.1.

    C(W, g) = lim inf_{n→∞} (1/n) log |P_{g_n}|    (128a)
            = lim inf_{n→∞} (1/n) log |T(P_[s^n])|    (128b)
            = H(S).    (128c)

Here we have used the fact that the permissible set for the empirical distribution detection function is the type class in (128b). Additionally, by Varadarajan’s Theorem[17], P_[s^n](x) → p_S(x) almost surely (here the convergence is uniform in x as well). This allows for the use of the type class-entropy bound from Theorem D.1 that provides the final result.

C. Moulin Steganographic Capacity

Moulin’s formulation[2], [3] of the stego-channel is shown in Figure 10. This is somewhat different from the formulation shown in Figure 1; most notable is the presence of distortion constraints and the absence of a distortion function prior to the steganalyzer. Additionally, an explicit steganalyzer is not defined and a hypothetical X ∼ p_S is used. In order to have the two formulations coincide a number of simplifications are needed for each model.

For our model,
• The stego-channel is noiseless
• The steganalyzer is the empirical distribution

For Moulin’s model,
• Passive adversary (D_2 = 0)
• No distortion constraint on encoder (D_1 = ∞)

These changes produce the stego-channel shown in Figure 11.

Theorem 5.2: For the stego-channel shown in Figure 11, the capacities of this work and Moulin’s agree. That is,

    C(W, g) = C_STEG(∞, 0) = H(S).    (129)

Proof: Theorem 5.1 shows C(W, g) = H(S). We now show Moulin’s capacity is equal to this value. In the case of a passive adversary (D_2 = 0), the following is the capacity of the stego-channel[2],

    C_STEG(D_1, 0) = sup_{p ∈ Q} H(X|S)    (130)

where a p ∈ Q is feasible if,

    Σ_{s,x} p(x|s) p_S(s) d(s, x) ≤ D_1,    (131)

and

    Σ_s p(x|s) p_S(s) = p_S(x).    (132)

With D_1 = ∞ the distortion constraint (131) is inactive, so that,

    C_STEG(∞, 0) = sup_{p ∈ Q} H(X|S)    (133a)
                 ≤ sup_{p ∈ Q} H(X)    (133b)
                 = H(S)    (133c)

where the final line comes from choosing p(x|s) = p_S(x): this choice satisfies (132), makes X independent of S, and so achieves the bound with H(X|S) = H(X) = H(S).

VI. CONCLUSIONS

A framework for evaluating the capacity of steganographic channels under an active adversary has been introduced. The system considers a noise corrupting the signal before the detection function in order to model real-world distortions such as compression, quantization, etc.

Constraints on the encoder dealing with distortion and a cover-signal are not considered. Instead, the focus is to develop the theory necessary to analyze the interplay between the channel and detection function that results in the steganographic capacity.

The method uses an information-spectrum approach that allows for the analysis of arbitrary detection functions and channels. This provides the machinery necessary to analyze a very broad range of steganographic channels.

In addition to offering insight into the limits of performance for steganographic algorithms, this formulation of capacity can be used to analyze a different, and fundamentally important, facet of steganalysis. While false alarms and missed signals have rightfully dominated the steganalysis literature, very little is known about the amount of information that can be sent past these algorithms. This work presents a theory to shed light on this important quantity called steganographic capacity.

APPENDIX A
ε-STRONG CONVERSE PROOF

A stego-channel (W, g, A) satisfies the ε-strong converse property (for a fixed δ) if and only if,

    sup_{X ∈ S_δ} I̲(X; Z) = sup_{X ∈ S_δ} Ī(X; Z).    (A.134)

Proof: First assume sup_{X ∈ S_δ} I̲(X; Z) = sup_{X ∈ S_δ} Ī(X; Z). Let R = C(0, δ|W, g, A) + 3γ with γ > 0. Consider an (n, M_n, ε_n, δ_n)-code with,

    lim inf_{n→∞} (1/n) log M_n ≥ R,
Let,

    A_{k_n} = {y ∈ Y^{k_n} : p_{Ȳ^{k_n}}(y) > e^{-k_n γ} / |P_{k_n}|},    (B.161)

and for all n > n_0 we have p_{Ȳ^{k_n}}(A_{k_n}) < ε.

For n > n_0 we may calculate the probability of the permissible set (for the subsequence) as,

    p_{Ȳ^{k_n}}(P_{k_n}) = Σ_{y ∈ P_{k_n}} p_{Ȳ^{k_n}}(y)    (B.162a)
        = Σ_{y ∈ P_{k_n} ∩ A^c_{k_n}} p_{Ȳ^{k_n}}(y) + Σ_{y ∈ P_{k_n} ∩ A_{k_n}} p_{Ȳ^{k_n}}(y)    (B.162b)
        ≤ Σ_{y ∈ P_{k_n}} e^{-k_n γ} / |P_{k_n}| + Σ_{y ∈ A_{k_n}} p_{Ȳ^{k_n}}(y)    (B.162c)
        < e^{-k_n γ} + ε.    (B.162d)

This shows that p_{Ȳ^{k_n}}(P_{k_n}) does not converge to 1, and we have a contradiction, so Y ∉ T_0.

APPENDIX C
SPECTRAL SUP-ENTROPY BOUND

For discrete g = {P_n}_{n=1}^∞ with corresponding secure output set T_0,

    sup_{Y ∈ T_0} H(Y) = lim sup_{n→∞} (1/n) log |P_n|.

Proof: Since Y* = {U(P_n)}_{n=1}^∞ ∈ T_0 we have,

    sup_{Y ∈ T_0} H(Y) ≥ H(Y*)    (C.163a)
        = lim sup_{n→∞} (1/n) log |P_n|.    (C.163b)

Now assume there exists Y ∈ T_0, with Y = {Ȳ^n}_{n=1}^∞, such that,

    H(Y) = H(Y*) + γ/4,    (C.164)

for some γ > 0.

This means that,

    lim_{n→∞} Pr[ (1/n) log (1 / p_{Ȳ^n}(Ȳ^n)) > H(Y*) + γ/2 ] = 0.    (C.165)

By the definition of lim sup, for some subsequence k_n we have,

    (1/k_n) log |P_{k_n}| + γ > H(Y*) + γ/2,    (C.166)

and

    lim_{n→∞} Pr[ (1/k_n) log (1 / p_{Ȳ^{k_n}}(Ȳ^{k_n})) > (1/k_n) log |P_{k_n}| + γ ] = 0.    (C.167)

For any ε > 0, letting,

    A_{k_n} = {y ∈ X^{k_n} : p_{Ȳ^{k_n}}(y) < e^{-k_n γ} / |P_{k_n}|},    (C.168)

we may find n_0 where, for n > n_0,

    p_{Ȳ^{k_n}}(A_{k_n}) < ε.    (C.169)

For n > n_0 the probability of the permissible set (in this subsequence) is,

    p_{Ȳ^{k_n}}(P_{k_n}) = Σ_{y ∈ P_{k_n}} p_{Ȳ^{k_n}}(y)    (C.170a)
        = Σ_{y ∈ P_{k_n} ∩ A^c_{k_n}} p_{Ȳ^{k_n}}(y) + Σ_{y ∈ P_{k_n} ∩ A_{k_n}} p_{Ȳ^{k_n}}(y)    (C.170b)
        ≤ Σ_{y ∈ P_{k_n} ∩ A^c_{k_n}} e^{-k_n γ} / |P_{k_n}| + Σ_{y ∈ P_{k_n} ∩ A_{k_n}} p_{Ȳ^{k_n}}(y)    (C.170c)
        < e^{-k_n γ} + ε,    (C.170d)

showing it is impossible for Y ∈ T_0.

APPENDIX D
TYPE SET SIZE ENTROPY

Theorem D.1: Let (p_1, p_2, . . .) be a sequence of types defined over the finite alphabet X where p_n ∈ P_n. Assume this sequence satisfies the following:

1) p_n → p
2) p_n ≪ p, ∀n

Then,

    lim_{n→∞} (1/n) log |T(p_n)| = H(p).    (D.171)

Proof: We first show,

    lim inf_{n→∞} (1/n) log |T(p_n)| ≥ H(p).    (D.172)

A sharpening of Stirling’s approximation states that for 1/(12n + 1) < λ_n < 1/(12n),

    n! = √(2π) n^{n + 1/2} e^{-n} e^{λ_n}.

Let the empirical distribution p_n be specified by (n_1, . . . , n_{K_n}). That is, if we enumerate the outcomes as (a_1, . . . , a_{K_n}) we have that,

    p_n(a_i) = n_i / n.

By definition Σ_{i=1}^{K_n} n_i = n, and from the above condition of absolute continuity we have that K_n ≤ s(p) for all n, where s(p) is the size of the support of the limit distribution p.
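The claim of Theorem D.1 — that the per-sample log-size of a type class approaches H(p) — is easy to check numerically. The sketch below is our own illustration (the function names are hypothetical): it computes log |T(p_n)| exactly as a log-multinomial coefficient and compares it with H(p), using the standard type-class bounds e^{nH}/(n+1)^{|X|} ≤ |T(p_n)| ≤ e^{nH} for the gap:

```python
import math

def log_type_class_size(counts):
    """log |T(p_n)| for a type with the given outcome counts, i.e. the log
    of the multinomial coefficient n! / (n_1! ... n_K!), via lgamma."""
    n = sum(counts)
    total = math.lgamma(n + 1)
    for k in counts:
        total -= math.lgamma(k + 1)
    return total

def entropy(counts):
    """Entropy H(p_n) (in nats) of the type given by the counts."""
    n = sum(counts)
    return -sum(k / n * math.log(k / n) for k in counts if k > 0)

# For the balanced binary type (1/2, 1/2), the gap between H(p) and
# (1/n) log |T(p_n)| is non-negative and shrinks like (|X|/n) log(n+1).
for n in (10, 100, 10_000):
    counts = (n // 2, n // 2)
    gap = entropy(counts) - log_type_class_size(counts) / n
    assert 0 <= gap < 2 * math.log(n + 1) / n + 1e-9
```

Using `math.lgamma` instead of factorials keeps the computation stable even for large n, where the multinomial coefficient itself would overflow.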