Outline
Introduction to Data Hiding
Watermarking
- Definition and History
- Applications
- Basic Principles
- Requirements
- Attacks
- Evaluation and Benchmarking
- Examples
Steganography
- Definition and History
- Applications
- Basic Principles
- Examples of Techniques
- Demos
Data Hiding
[Figure: data hiding model. A secret message is embedded into a carrier document by a key-driven embedding algorithm, transmitted via a network, and recovered as the secret message by a key-driven detector.]
Information hiding is a general term encompassing many subdisciplines. Two important subdisciplines are steganography and watermarking.
Steganography:
Hiding: keeping the existence of the information secret
Watermarking
Hiding: making the information imperceptible
Information hiding is different from cryptography (cryptography is about protecting the content of messages).
Requirements
Application
- Covert communication
- Copyright protection of images (authentication)
- Fingerprinting (traitor tracing)
- Adding captions to images, or additional information such as subtitles to videos
- Image integrity protection (fraud detection)
- Copy control in DVD
- Intelligent browsers, automatic copyright information, viewing movies in a given rated version
Digital watermarking
Security
Robustness
and
Information-theoretic Removed by lossless compression
[Figure: an original image combined with embedded data, shown with 5 gray levels and with 31 gray levels]
Watermarking
Intent: data embedding conveys some information about the cover medium, such as owner, copyright, or other information.
- The watermark can be considered an extended attribute of the data
- Robustness of the watermark is a main issue
- An observer may know that a watermark is present
- Can be visible or invisible
Steganography
Intent: transmit a secret message hidden in an innocuous-looking cover medium so that its existence is undetectable.
- Robustness is not typically an issue
- Large message capacity is desired
- Always invisible
- Typically dependent on file format
Watermarking
Watermarking:Definition
- Watermarking is the practice of imperceptibly altering a cover to embed a message about that cover
- Watermarking is closely related to steganography, but there are differences between the two
- In watermarking, the message is related to the cover
- Steganography typically relates to covert point-to-point communication between two parties. Therefore, steganography requires only limited robustness
- Watermarking is often used when the cover is available to parties who know of the existence of the hidden data and may have an interest in removing it
- Therefore, watermarking has the additional notion of resilience against attempts to remove the hidden data
- Watermarks are inseparable from the cover in which they are embedded. Unlike cryptography, watermarks can protect content even after it is decoded
Watermarking:History
- More than 700 years ago, watermarks were used in Italy to indicate the paper brand and the mill that produced it
- By the 18th century, watermarks began to be used as anti-counterfeiting measures on money and other documents
- The term "watermark" was introduced near the end of that century. It was probably chosen because the marks resemble the effects of water on paper
- The first example of a technology similar to digital watermarking is a patent filed in 1954 by Emil Hembrooke for identifying works
- In 1988, Komatsu and Tominaga appear to have been the first to use the term "digital watermarking"
- Around 1995, interest in digital watermarking began to mushroom
Motivation
- The rapid revolution in digital multimedia and the ease of generating identical and unauthorized digital data
- USA Today, Jan. 2000: estimated lost revenue from digital audio piracy of US $8,500,000,000.00
- The need to establish reliable methods for copyright protection and authentication
- The need to establish secure invisible channels for covert communications
- Adding captions and other additional information
Watermarking:Applications
Copyright protection
- Most prominent application
- Embed information about the owner to prevent others from claiming copyright
- Requires a very high level of robustness
Copy protection
- Embed a watermark to disallow unauthorized copying of the cover
- For example, a compliant DVD player will not play back or copy data that carry a "copy never" watermark
Content Authentication
- Embed a watermark to detect modifications to the cover
- The watermark in this case has low robustness ("fragile")
Watermarking:Basic principles
Watermarking: Requirements
Imperceptibility
The modifications caused by watermark embedding should be below the perceptible threshold
Robustness
- The ability of the watermark to resist distortion introduced by standard or malicious data processing
Security
- A watermark is secure if knowing the algorithms for embedding and extraction does not help an unauthorized party detect or remove the watermark
- Effect on quality of original content: how does the watermarking technique affect the level of degradation, and what level of degradation is acceptable?
- Visible vs. invisible: visible, such as a company logo stamped on an image or movie, or invisible and imperceptible
- Fragile vs. robust: fragile watermarks break down easily, whereas robust watermarks survive manipulations of the content (in some watermarking of audio files, both are used)
- Public vs. private: private watermarking techniques require that the original be used as a basis of encryption, whereas public techniques do not
- Public-key vs. secret-key: secret-key watermarking uses the same watermarking key to read the content as the key that was inserted into the image; public-key watermarking uses different keys for watermarking the image and reading the image
[Figure: original image + watermark = watermarked image]
- Robustness against all kinds of image distortion
- Robustness to intentional removal even when all details about the watermarking scheme are known (Kerckhoffs's principle)
- The watermark pattern must be perceptually transparent
- The watermark depends on a secret key
- Robustness to over-watermarking, collusion, and other attacks
Detectable watermark: Pseudo-random sequence is either present or not present (1 bit embedded)
Readable watermark: One can recover a short message, e.g. info about the owner (100 bits)
[Figure: fingerprinting. A different watermark W1, W2, ..., WN is added to the copy delivered to each of N customers.]
Robust, secure, invisible watermark, resistant with respect to the collusion attack (averaging copies of documents with different marks).
Watermarking principles
- In the spatial domain, the watermark is embedded by directly modifying the pixel values
- In a transform domain, the watermark is embedded in the transform space by modifying coefficients
Inverse DCT
Watermarking for color images One or more selected color channels. Luminance
Oblivious vs. non-oblivious watermarking
- Non-oblivious: the original image is needed for extraction
- Oblivious: the original image is not necessary
NEC Scheme
Watermark embedding: the 1000 highest-energy DCT coefficients $v_k$ are modulated with a Gaussian random sequence $w_k \sim N(0,1)$:
$$v'_k = v_k (1 + \alpha w_k),$$
where $v'_k$ are the modified DCT coefficients and $\alpha$ is the watermark strength, which also directly influences watermark visibility.
NEC Scheme
Watermark detection: subtract the original image from the watermarked (possibly attacked) image and extract the watermark sequence $W'$, which may be corrupted by image distortion. Correlate $W'$ with the original watermark sequence $W$:
$$\mathrm{sim}(W, W') = \frac{W' \cdot W}{\sqrt{W' \cdot W'}}$$
$\mathrm{sim}(W, W')$ is called the similarity.
- $\mathrm{sim}(W, W') > Th$: watermark is present
- $\mathrm{sim}(W, W') < Th$: watermark is not present
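The embedding and detection rules of this scheme can be illustrated with a minimal sketch. It works on a plain list of numbers standing in for the selected DCT coefficients; the function names, the strength `alpha = 0.1`, the noise level, and the threshold value 6 are illustrative assumptions, and no actual DCT or image is involved.

```python
import math
import random

def embed(coeffs, watermark, alpha=0.1):
    """Multiplicative embedding: v'_k = v_k * (1 + alpha * w_k)."""
    return [v * (1.0 + alpha * w) for v, w in zip(coeffs, watermark)]

def extract(marked, original, alpha=0.1):
    """Non-oblivious extraction: invert the embedding rule using the original."""
    return [(m / v - 1.0) / alpha for m, v in zip(marked, original)]

def sim(w, w_extracted):
    """Similarity (W' . W) / sqrt(W' . W')."""
    dot = sum(a * b for a, b in zip(w_extracted, w))
    norm = math.sqrt(sum(a * a for a in w_extracted))
    return dot / norm if norm > 0 else 0.0

rng = random.Random(1)
n = 1000
# Stand-in values for the 1000 highest-energy DCT coefficients (assumption).
coeffs = [rng.uniform(50.0, 500.0) for _ in range(n)]
w = [rng.gauss(0.0, 1.0) for _ in range(n)]       # embedded watermark
other = [rng.gauss(0.0, 1.0) for _ in range(n)]   # unrelated watermark

marked = embed(coeffs, w)
attacked = [m + rng.gauss(0.0, 2.0) for m in marked]  # mild additive noise
w_est = extract(attacked, coeffs)

score_true = sim(w, w_est)       # large when the tested watermark was embedded
score_other = sim(other, w_est)  # close to zero for an unrelated watermark
THRESHOLD = 6.0                  # illustrative choice of Th
print(score_true > THRESHOLD, abs(score_other) > THRESHOLD)
```

For an unrelated watermark the similarity behaves roughly like a standard normal variable, which is why a threshold of a few units already separates the two cases in this toy setting.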
Watermark detection
$$S = \sum_{i=1}^{n} (a_i - b_i)$$
Hypothesis testing is used to confirm the presence of the watermark at a given confidence level.
- $S \approx 0$ with $\sigma \approx 104.5\sqrt{n}$ if no watermark is present
- $S \approx 2\delta n$ if the watermark is present ($\delta$ denoting the embedding strength)
- Set the threshold Th to adjust the probability of false alarms and missed detections
- Using patches of pixels rather than single pixels improves robustness
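A small simulation can make the detection statistic concrete. The sketch below assumes a Patchwork-style scheme (the slide does not name it): a key selects two disjoint sets of n pixels, one set is brightened and the other darkened by an amount `delta`, and the detector sums the differences. The image is uniform random noise, and `delta`, `n`, the key, and the 4-sigma threshold are illustrative assumptions.

```python
import math
import random

def choose_sets(num_pixels, key, n):
    """Key-dependent choice of two disjoint pixel index sets of size n."""
    rng = random.Random(key)
    idx = rng.sample(range(num_pixels), 2 * n)
    return idx[:n], idx[n:]

def embed(pixels, key, n, delta):
    """Brighten set A by delta and darken set B by delta (clipped to 0..255)."""
    set_a, set_b = choose_sets(len(pixels), key, n)
    out = list(pixels)
    for i in set_a:
        out[i] = min(255, out[i] + delta)
    for i in set_b:
        out[i] = max(0, out[i] - delta)
    return out

def statistic(pixels, key, n):
    """S = sum over i of (a_i - b_i)."""
    set_a, set_b = choose_sets(len(pixels), key, n)
    return sum(pixels[a] - pixels[b] for a, b in zip(set_a, set_b))

rng = random.Random(0)
image = [rng.randrange(256) for _ in range(100_000)]  # toy 8-bit "image"
n, delta, key = 20_000, 3, "secret"

marked = embed(image, key, n, delta)
threshold = 4 * 104.5 * math.sqrt(n)   # four standard deviations of S

s_clean = statistic(image, key, n)     # fluctuates around 0
s_marked = statistic(marked, key, n)   # shifted by roughly 2 * delta * n
print(s_clean, s_marked, threshold)
```

The constant 104.5 is the standard deviation of the difference of two independent pixel values uniform on 0..255, so it only applies exactly to this idealized image model.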
Watermark approximation
- Correlate uk with wk
- Threshold the result
- Make a decision about watermark presence
Frequency masking
The presence of a signal of one frequency can raise the perceptual threshold of signals with frequencies close to the masking frequency.
[Figure: masking threshold plotted against frequency, showing the masking signal and a masked signal]
Spatial masking
Image discontinuities also have the ability to mask small image distortions.
[Figure: luminance and masking threshold plotted against position x around an edge]
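[Figure: an 8 x 8 signature S is embedded in the DCT domain by quantizing a value p with step T: $p = kT - T/4$ encodes 0 and $p = kT + T/4$ encodes 1. The axis shows the points $(k-1)T$, $kT$, $(k+1)T$.]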
Forensic analysis
- Analysis of lighting and shadows
- Localized analysis of noise, histogram, and colors
- Looking for discontinuities
Fragile watermarks
Properties:
- Break easily
- Computationally cheap
- Good localization properties
- Too sensitive for redundant data
Examples:
- Embedding check-sums in the LSBs
- Adding m-sequences to image blocks
Steve Walton, "Information authentication for a slippery new age," Dr. Dobb's Journal, vol. 20, no. 4, pp. 18-26, April 1995.
The image authenticity is verified by checking the relationship $B(i,j) = f_R(R(i,j)) \oplus f_G(G(i,j)) \oplus f_B(B(i,j))$ for each pixel $(i,j)$.
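[Figure: the original image is compared pixel by pixel with a binary logo through f( ); where the relationship does not hold, the corresponding pixels are perturbed, giving the authenticated image.]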
Properties:
Examples:
J. Fridrich, Image Watermarking for Tamper Detection, Proc. ICIP 98, Chicago, Oct 1998.
[Figure: block B, the block number, and the secret key K produce the watermark W(K, B), which is embedded to give the watermarked block B'.]
Hybrid watermark
Properties:
- Fragile, sensitive, and robust
- Good localization properties
- Can distinguish malicious and non-malicious modifications
Examples:
The watermarked image "Lena" with outlined blocks and block numbers.
(After brightness adjustment and JPG compression) Presence of the robust watermark (above); Fragile watermark indicated tampered areas with black dots (below).
(Retouched eyes) Presence of the robust watermark (above); Fragile watermark indicated tampered areas with black dots (below).
(Replaced face and softened) Presence of the robust watermark (above); Fragile watermark indicated tampered areas with black dots (below).
Self-embedding
Properties:
- Fragile
- Security problems
- Good localization properties
- Tampered areas can be fixed
- Easy to remove
Examples:
- Coding quantized DCT-transformed blocks in distant blocks
J. Fridrich and M. Goljan Protection of Digital Images Using Self Embedding, Symposium on Content Security and Data Hiding in Digital Media, New Jersey Institute of Technology, May 14, 1999.
- The content of block B1 is compressed and encoded in the LSBs of block B2
- B1 and B2 are separated by a random vector p
Selfembedding algorithm #1
QUANTIZATION
Selfembedding algorithm #2
QUANTIZATION
Binary encoding 21 coefficients + up to 2 next nonzero coefficients CODE2 : 128 bits per block
The bit lengths provided for encoding the 64 coefficients (with sign)
Original image
Tampered image: the license plate has been replaced with a different one. 2 LSBs have been used for self-embedding.
Reconstructed image
Attacks
Attacks are carried out with the intention of destroying the watermark so that the content can be used without paying royalties to its originator. A watermark must withstand various signal processing attacks:
- Compression
- Cropping, editing, composing
- Printing
- Adding small amounts of noise
Attack: Example
- Alice puts an image on her web page
- Eve and Mallet copy the image and claim it as their own
- All three appear before the Judge
- Alice, using her image and Eve's, extracts the watermark
- Alice, using her image and Mallet's altered one, extracts a noisy version of her watermark
- Alice must convince the Judge that the noisy watermark is indeed hers and not a false alarm
Watermark attacks
- Robustness attacks: intended to remove the watermark. JPEG compression, filtering, cropping, histogram equalization, additive noise, etc.
- Presentation attacks: cause watermark detection failure. Geometric transformation, rotation, scaling, translation, change of aspect ratio, line/frame dropping, affine transformation, etc.
- Counterfeiting attacks: render the original image useless, generate a fake original, deadlock problem
- Court-of-law attacks: take advantage of legal issues
[Figure: original X belongs to Alice; X + Alice's watermark W1 = watermarked image Y, the distributed image. False original X' belongs to Bob; X' + Bob's watermark W2 = watermarked image Y. The two watermarked images are identical.]
- Bob generates a random watermark W2, subtracts it from Y, and creates a false original: $X' = Y - W_2$
- $X' + W_2 = Y = X + W_1$
- $X' = X + W_1 - W_2$, so X' contains W1
- $X = X' + W_2 - W_1$, so X contains W2
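The arithmetic behind this counterfeiting (false original) attack can be checked with a few lines of code. This is a minimal sketch under simplifying assumptions: the image is a short list of integers, watermarks are additive sequences of +1/-1, and clipping to the valid pixel range is ignored. All names are illustrative.

```python
import random

rng = random.Random(7)
n = 16

# Alice's side: original X, watermark W1, distributed image Y = X + W1.
X = [rng.randint(0, 255) for _ in range(n)]
W1 = [rng.choice((-1, 1)) for _ in range(n)]
Y = [x + w for x, w in zip(X, W1)]

# Bob's side: random watermark W2 and a false original X' = Y - W2.
W2 = [rng.choice((-1, 1)) for _ in range(n)]
X_fake = [y - w for y, w in zip(Y, W2)]

# Both parties can "prove" ownership by subtracting their original from Y.
alice_extracted = [y - x for y, x in zip(Y, X)]
bob_extracted = [y - x for y, x in zip(Y, X_fake)]
print(alice_extracted == W1, bob_extracted == W2)  # prints: True True

# Each claimed original also appears to contain the other party's watermark.
fake_contains_w1 = [x + a - b for x, a, b in zip(X, W1, W2)] == X_fake
real_contains_w2 = [x + b - a for x, a, b in zip(X_fake, W1, W2)] == X
```

Because both subtractions succeed, a judge looking only at these extractions cannot tell which original is genuine, which is the deadlock problem mentioned among the counterfeiting attacks.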
Th: threshold. H: Heaviside step function, H(x) = 1 for x > 0, H(x) = 0 otherwise.
Attack (Cox, Linnartz, Kalker, Dijk, ...):
(1) Find a critical image by progressively deteriorating the image (for example, by replacing the pixel values one by one with the average gray level)
(2) Feed the detector with special images to reconstruct wk or to learn the sensitivity of the detection function to various pixels
Steganography
Steganography
Embed information in such a way that its very existence is concealed.
Goal
- Hide information in an undetectable way, both perceptually and statistically
- Security: prevent extraction of the hidden information
Steganography is a different concept from cryptography, but it uses some of its basic principles.
HISTORY
440 B.C.
Histiaeus shaved the head of his most trusted slave and tattooed it with a message, which disappeared from view after the hair had regrown. The aim was to instigate a revolt against the Persians.
Early steganography
Pictographs: e.g., Sherlock Holmes's Dancing Men.
An Example: Null-Cipher
Message sent by a German spy during World War I:
PRESIDENT'S EMBARGO RULING SHOULD HAVE IMMEDIATE NOTICE. GRAVE SITUATION AFFECTING INTERNATIONAL LAW. STATEMENT FORESHADOWS RUIN OF MANY NEUTRALS. YELLOW JOURNALS UNIFYING NATIONAL EXCITEMENT IMMENSELY.
Null Cipher-Solved!
Taking the first letter of each word of the message above reveals the hidden text: PERSHING SAILS FROM NY JUNE I.
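Decoding this null cipher is mechanical, as a short sketch shows. The function name and the word-splitting rule (letters and apostrophes form a word) are illustrative assumptions.

```python
import re

def first_letters(text):
    """Concatenate the first letter of every word in the text."""
    words = re.findall(r"[A-Za-z']+", text)
    return "".join(word[0] for word in words).upper()

cover = (
    "PRESIDENT'S EMBARGO RULING SHOULD HAVE IMMEDIATE NOTICE. "
    "GRAVE SITUATION AFFECTING INTERNATIONAL LAW. "
    "STATEMENT FORESHADOWS RUIN OF MANY NEUTRALS. "
    "YELLOW JOURNALS UNIFYING NATIONAL EXCITEMENT IMMENSELY."
)

hidden = first_letters(cover)
print(hidden)  # PERSHINGSAILSFROMNYJUNEI
```

Inserting word breaks by hand gives "PERSHING SAILS FROM NY JUNE I".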
Problem Formulation
[Figure: "Hello" messages passing over a channel observed by Wendy]
Terminology
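[Figure: Alice feeds the secret message and a cover message into the embedding algorithm, producing the stego message. Wendy asks "Is it a stego message?"; if the answer is no, the message passes to Bob, who recovers the secret message.]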
Steganography Techniques
Substitution methods
Bit plane methods Palette-based methods
Coding methods
Quantizing, dithering Error correcting codes
Stego-system Criteria
- Cover data should not be significantly modified, i.e., the changes should not be perceptible to the human perception system
- The embedded data should be directly encoded in the cover, not in a wrapper or header
- Embedded data should be immune to modifications to the cover
- Distortion cannot be eliminated, so error-correcting codes need to be included whenever required
Steganography in Text
Soft Copy Text
- Encode data by varying the number of spaces after punctuation
- Slight modifications of formatted text will be immediately apparent to anyone reading the text
- Use of white space (tabs and spaces) is much more effective and less noticeable
- This is the most common method for hiding data in text
- Encode data in additional spaces placed at the end of a line
[Example: the text "Four score and ..." shown character by character with extra trailing spaces]
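End-of-line white space encoding can be sketched in a few lines. The convention used here (one bit per line: a single trailing space means 1, no trailing space means 0) and the function names are assumptions chosen for illustration; real tools may pack several bits per line using runs of spaces and tabs.

```python
def hide(lines, bits):
    """Hide one bit per line as the presence (1) or absence (0) of a trailing space."""
    if len(bits) > len(lines):
        raise ValueError("not enough lines to carry the message")
    stego = []
    for i, line in enumerate(lines):
        base = line.rstrip(" ")  # normalize any existing trailing spaces
        carries_one = i < len(bits) and bits[i] == 1
        stego.append(base + (" " if carries_one else ""))
    return stego

def reveal(lines, nbits):
    """Read the bits back from the first nbits lines."""
    return [1 if line.endswith(" ") else 0 for line in lines[:nbits]]

cover = ["Four score and", "seven years ago", "our fathers", "brought forth"]
message = [1, 0, 1, 1]

stego = hide(cover, message)
recovered = reveal(stego, len(message))
print(recovered)  # [1, 0, 1, 1]
```

The visible text is unchanged, but any editor or mail system that strips trailing white space destroys the message, which illustrates why such soft-copy methods are fragile.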
Hard Copy Text
Line Shift Coding
Shifts every other line up or down slightly in order to encode data
Some methods that can be used with either hard or soft copy text
Feature Coding Syntactic Semantic
Steganography in Audio
Low Bit Coding Phase Coding Spread Spectrum Echo Data Hiding
Low Bit Coding
- Most digital audio is created by sampling the signal and quantizing each sample with a 16-bit quantizer
- The rightmost (low-order) bit of each sample can be changed from 0 to 1 or from 1 to 0
- This change from one sample value to another is not perceptible to most people, and the audio signal still sounds the same
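A minimal sketch of low bit coding on audio samples follows. It treats the audio as a plain list of signed 16-bit integers rather than reading a real audio file; the function names and sample values are illustrative assumptions.

```python
def embed_lsb(samples, bits):
    """Overwrite the lowest bit of the first len(bits) samples with the message bits."""
    if len(bits) > len(samples):
        raise ValueError("message is longer than the cover signal")
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the low bit, then set it to the message bit
    return out

def extract_lsb(samples, nbits):
    """Read the lowest bit of the first nbits samples."""
    return [s & 1 for s in samples[:nbits]]

# Toy signed 16-bit samples (assumed values, not a real recording).
samples = [1200, -873, 15, -32768, 32767, 0, -1, 4096]
message = [1, 0, 0, 1, 0, 1, 1, 0]

stego = embed_lsb(samples, message)
recovered = extract_lsb(stego, len(message))
max_change = max(abs(a - b) for a, b in zip(samples, stego))
print(recovered, max_change)
```

Each sample changes by at most one quantization step out of 65536, which is why the change is normally inaudible; the same property makes the hidden data easy to destroy by re-encoding or adding noise.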
Phase Coding
- Relies on the relative insensitivity of the human auditory system to phase changes
- Substitutes the initial phase of an audio signal with a reference phase that represents the data
- More complex than low bit coding, but much more robust and less likely to distort the signal that carries the hidden data
Direct Sequence Spread Spectrum
- Spreads the signal by multiplying it by a chip, which is a maximal-length pseudorandom sequence
- DSSS introduces additive random noise to the sound file
Echo Data Hiding
- Discrete copies of the original signal are mixed in with the original signal, creating echoes of each sound
- By using two different time delays between an echo and the original sound, a binary 1 or a binary 0 can be encoded
Steganography in MP3
A music company releases albums as MP3 files over the Internet. Some people take these MP3 files and publish them under their own names. The case goes to court, and the music company needs to prove that the exhibited material is indeed what it published. It needs a hidden copyright mark.
Steganography in Images
Way images are stored:
- An image is an array of numbers representing RGB values for each pixel
- Common images are in 8-bit/pixel and 24-bit/pixel formats
- 24-bit images have a lot of space for storage but are huge and invite compression
- 8-bit images are good options
- Proper selection of the cover image is important
- Best candidates: grayscale images
- The methods exploit the limitations of human visual perception
According to researchers, on average only about 50% of the pixels actually change (from 0 to 1 or from 1 to 0) when message bits are written into them.
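The roughly 50% figure is easy to reproduce in simulation. The sketch below assumes least-significant-bit embedding into 8-bit pixel values, random cover pixels, and a random message; all names and sizes are illustrative.

```python
import random

def embed_lsb(pixels, bits):
    """Write one message bit into the least significant bit of each pixel."""
    return [(p & 0xFE) | b for p, b in zip(pixels, bits)]

def extract_lsb(pixels):
    """Read the least significant bit of each pixel."""
    return [p & 1 for p in pixels]

rng = random.Random(42)
pixels = [rng.randrange(256) for _ in range(10_000)]    # toy 8-bit cover pixels
message = [rng.getrandbits(1) for _ in range(10_000)]   # random message bits

stego = embed_lsb(pixels, message)
changed = sum(p != s for p, s in zip(pixels, stego)) / len(pixels)
print(f"fraction of pixels changed: {changed:.3f}")
```

A pixel changes only when its current low bit differs from the message bit, which happens about half the time when the message looks random (for example, after compression or encryption).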
[Figure: image downgrading example ("TOP SECRET"), showing two pairs of original image and extracted image. Example: Copyright Fabian A.P. Petitcolas, Computer Laboratory, University of Cambridge, http://www.cl.cam.ac.uk/~fapp2/steganography/image_downgrading/]
Palette-based Methods
- Palette manipulation means changing the way the color or grayscale palette represents the image colors
- Bit methods are used in palette manipulation schemes
- Data is hidden in the noise of the image
- Radical color shifts often occur, which can tip off an observer that data is hidden
- Use grayscale to overcome the color shift problem
Sample palettes
- Pseudo-color 8-bit image: 256 different colors indexed by the numbers 0, ..., 255
- To insert information, S-Tools, for example, reduces the number of colors from 256 to 32 and uses the low-order bit positions to hide data
- In this case, 8 colors are the same before data embedding; after data embedding, 8 colors are visually very close but differ in their bit representation
A secure steganographic method will produce modified carriers compatible with the source
Possibilities
- Hiding in the palette
- Hiding in the image data
- Non-adaptive techniques
- Adaptive techniques
Artifacts
- Palette artifacts
- Image data artifacts
Possible approaches
Message hiding in the image data - greedy techniques
Decrease color depth and expand:
1. Collapse 256 colors to 128 colors
2. Expand 128 colors to 256 colors by including a close color (e.g., flip the LSB of the blue channel)
3. Embed a binary message into the LSB of the blue channel of randomly selected pixels (1 bpp)
4. Read the message from the LSB of the blue channel
Alternatively:
1. Decrease color depth to 32 colors and include all colors obtained from LSB shuffling of all 32 colors (one color produces 2^3 new colors; 3 bpp)
2. Encode messages into the LSB of pixel colors
Possible approaches
Message hiding in the image data
Parity embedding:
1. Assign parity to palette colors
2. Embed message bits as the parity of colors

Message: 0 1 1 0 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 0 0 1 1 1 1

[Figure: embedding one message bit using the sorted palette]
- A randomly chosen pixel has color C1, with index 00011110
- Replace the LSB of the index to color C1 with the message bit: 00011111
- The new index now points to a neighboring color C2
- Replace the index of the pixel in the original image to point to the new color C2
Critical assumption: Colors close in the luminance-sorted palette are also close in the color space.
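Index-LSB embedding over a luminance-sorted palette can be sketched as follows. The luminance formula L = 0.299R + 0.587G + 0.114B is the standard one; everything else (function names, the tiny 8-color palette, embedding into the first pixels instead of randomly chosen ones, and the requirement of an even palette size) is an assumption made to keep the example short.

```python
def luminance(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def sorted_order(palette):
    """Return (order, rank): order[pos] is the palette index at sorted position pos,
    rank[idx] is the sorted position of palette index idx."""
    order = sorted(range(len(palette)), key=lambda i: luminance(palette[i]))
    rank = {idx: pos for pos, idx in enumerate(order)}
    return order, rank

def embed(pixels, palette, bits):
    """Replace the LSB of each pixel's sorted-palette position with a message bit."""
    assert len(palette) % 2 == 0, "sketch assumes an even number of palette entries"
    order, rank = sorted_order(palette)
    out = list(pixels)
    for k, bit in enumerate(bits):
        pos = (rank[out[k]] & ~1) | bit
        out[k] = order[pos]   # pixel now points to a neighboring color
    return out

def extract(pixels, palette, nbits):
    _, rank = sorted_order(palette)
    return [rank[p] & 1 for p in pixels[:nbits]]

palette = [(0, 0, 0), (255, 255, 255), (10, 10, 10), (250, 250, 250),
           (128, 0, 0), (130, 2, 1), (0, 128, 0), (1, 130, 2)]
pixels = [0, 1, 2, 3, 4, 5, 6, 7, 0, 4]   # palette indices of a toy image
message = [1, 0, 0, 1, 1, 1, 0, 0, 0, 1]

stego = embed(pixels, palette, message)
recovered = extract(stego, palette, len(message))
print(recovered == message)
```

In this toy palette the luminance neighbors are also close in color, so the swap is invisible; with a real palette two colors of similar luminance may differ strongly in hue, which is exactly where the critical assumption fails.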
White Noise Storm inserts data by using spread spectrum technology and frequency hopping, but severely changes the palette.
Airfield embedded using White Noise Storm. Original 24-bit Renoir converted to a 248-color GIF; Airfield inserted using S-Tools; the final stego image has 256 colors in GIF format.
Palette Methods
- Color c_i: 10110010; color c_{i+1}: 10110011
- When the palette is ordered by luminance, there are groups of pixel colors that look identical to the eye; L = 0.299R + 0.587G + 0.114B
- Airfield is a 3-bit image placed in the last 3 bits of the Renoir image
- Very fragile: destroyed by image manipulation
Original Image
Steganography: DCT
- The DCT is applied to each 8 x 8 block in the image, producing a block Bi
- Each block encodes a single bit, 0 or 1
- If the message bit is 1, the larger of the two values Bi(4,1) and Bi(3,2) is put in location (4,1); if the message bit is 0, the smaller of the two values is put in location (4,1)
DCT Steganography
- If the difference |Bi(4,1) - Bi(3,2)| is smaller than a chosen threshold, the values Bi(4,1) and Bi(3,2) are adjusted so that the difference exceeds that threshold
- This ensures that the relative difference will not be lost when compression is applied
- This last step can introduce distortion into the image
- JPEG compression is performed (if desired), and the resulting image is then inverse transformed
- Modifications of this algorithm that overcome some of these limitations have been researched
DCT Steganography
- To extract the data, the DCT is performed on each block, and the coefficient values at locations (4,1) and (3,2) are compared
- If Bi(4,1) > Bi(3,2), the message bit is 1; otherwise it is 0
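The coefficient-ordering rule can be tried on a single block with a pure-Python 8 x 8 DCT. This is a sketch under stated assumptions: the coefficient positions are taken as zero-based indices [4][1] and [3][2], the minimum gap of 10 is an arbitrary choice, JPEG quantization is replaced by simple rounding of pixel values, and all function names are illustrative.

```python
import math

N = 8

def _scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# Orthonormal DCT-II basis matrix: C[u][x].
C = [[_scale(u) * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
     for u in range(N)]

def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def _transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    return _matmul(_matmul(C, block), _transpose(C))

def idct2(coeffs):
    return _matmul(_matmul(_transpose(C), coeffs), C)

def embed_bit(block, bit, gap=10.0):
    """Order the two coefficients so that [4][1] > [3][2] encodes 1, else 0."""
    d = dct2(block)
    hi, lo = max(d[4][1], d[3][2]), min(d[4][1], d[3][2])
    if hi - lo < gap:                       # enforce a minimum separation
        mid = (hi + lo) / 2.0
        hi, lo = mid + gap / 2.0, mid - gap / 2.0
    d[4][1], d[3][2] = (hi, lo) if bit == 1 else (lo, hi)
    return idct2(d)

def extract_bit(block):
    d = dct2(block)
    return 1 if d[4][1] > d[3][2] else 0

# Toy 8 x 8 block of mid-range gray values (assumed data).
block = [[100 + ((3 * r + 5 * c) % 40) for c in range(N)] for r in range(N)]

results = []
for bit in (0, 1):
    stego = embed_bit(block, bit)
    rounded = [[round(v) for v in row] for row in stego]  # back to integer pixels
    results.append(extract_bit(rounded))
print(results)
```

The enforced gap is what lets the bit survive rounding here; surviving actual JPEG quantization would require a gap matched to the quantization table, which this sketch does not model.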
Wavelet Steganography
- Many different schemes have been proposed
- Wang and Kao give a multi-threshold wavelet coding scheme in which coefficients with high values are used to store information
- These coefficients are assumed to keep the same relative values even after multiple image processing operations
- If the coefficients change value much, the visual difference is noticeable in the image
- Can be used for textured and natural images
1 a ( x, y ) = 2 N
Ref : K. Solanki, N. Jacobsen, U. Madhow, B. S. Manjunath and S. Chandrasekaran, "Robust ImageAdaptive Data Hiding Based on Erasure and Error Correction" IEEE Transactions on Image Processing, vol. 13, no. 12, pp. 1627-1639, Dec. 2004.
where $b_l$ is the message and $Q_{b_l}$ is the quantizer Q0 or Q1, depending on the message.
- Q0: quantize to an even number
- Q1: quantize to an odd number
If the result after embedding equals t, the same message is embedded into the next qualified coefficient to keep synchronization with the decoder.
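The odd/even quantizer pair can be illustrated with a short sketch. It shows only the parity rule (Q0 maps to an even multiple of the step, Q1 to an odd multiple); the threshold test and the re-embedding rule are not modeled, and the step size, function names, and coefficient values are assumptions.

```python
def embed_bit(coeff, bit, step=4.0):
    """Quantize coeff to the nearest multiple of step whose index has parity `bit`."""
    x = coeff / step
    q = round(x)
    if q % 2 != bit:
        q += 1 if x >= q else -1   # step toward the side on which coeff lies
    return q * step

def extract_bit(coeff, step=4.0):
    """The decoder quantizes and reads the parity of the index."""
    return round(coeff / step) % 2

coeffs = [-7.3, 0.2, 12.6, 5.0, 30.9]
checks = []
for c in coeffs:
    for bit in (0, 1):
        marked = embed_bit(c, bit)
        checks.append((extract_bit(marked) == bit, abs(marked - c) <= 4.0))
print(all(ok and small for ok, small in checks))
```

The embedding distortion is bounded by one quantization step, and the decoder needs neither the original coefficient nor the original image, only the step size.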
Steganalysis
Definition Searching for the existence of hidden messages or Stego-content in a given medium.
- Stego-only: only the stego medium is available for analysis
- Known cover: both the original cover medium and the stego medium are used
- Known message: the hidden message is revealed to facilitate review of media in preparation for future attacks
Goals
Passive steganalysis
Active Steganalysis
- Estimate the message length and location
- Determine the algorithm or stego tool
- Estimate the secret key used in embedding
- Extract the message
Types of Steganalysis
Embedding Universal
Steganalysis
Steganalysis in Practice
Techniques designed for a specific steganography algorithm
Good
Hence, by estimating the transformation A & its inverse the secret message can be obtained.
Which yields,
The log error between the actual and predicted coefficients is computed; the mean, variance, skewness, and kurtosis of this log error are then used as another 36 features per color.
Embedding leaves Statistical Artifacts. Correlation between the low-bit planes for a cover image differs from a stego image. Set of Binary Similarity Measures used to detect the artifacts. A feature vector is generated using the BSMs.
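Bit planes
[Figure: 8-bit pixel values such as 11010011, 00011011, and 00011010 are split into bit planes numbered 1 to 8 from the most significant bit to the least significant bit; Bitplane-7 and Bitplane-8 are shown as binary images.]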
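Splitting pixel values into bit planes takes one shift and one mask per pixel. The sketch below assumes 8-bit pixels and numbers the planes from 1 (most significant bit) to 8 (least significant bit), so that the 7th and 8th planes are the two lowest ones; the function name is illustrative.

```python
def bit_plane(pixels, plane):
    """Return the given bit plane (1 = most significant, 8 = least significant)."""
    if not 1 <= plane <= 8:
        raise ValueError("plane must be between 1 and 8")
    shift = 8 - plane
    return [(p >> shift) & 1 for p in pixels]

pixels = [0b11010011, 0b00011011, 0b00011010]

planes_of_first = [bit_plane(pixels[:1], k)[0] for k in range(1, 9)]
plane7 = bit_plane(pixels, 7)
plane8 = bit_plane(pixels, 8)
print(planes_of_first)  # [1, 1, 0, 1, 0, 0, 1, 1]
print(plane7, plane8)   # [1, 1, 1] [1, 1, 0]
```

Steganalysis features based on binary similarity are then computed on the resulting binary images, typically on the two lowest planes, since LSB embedding disturbs their natural correlation.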
We define an agreement variable for pixel $X_i$, with $j = 1, \ldots, K$, $K = 4$, and $i = 1, \ldots, M \times N$, using the Kronecker delta function. From it we can calculate the one-step co-occurrence values.
The first group consists of the computed similarity differences $dm_i = m_i^{7th} - m_i^{8th}$, $i = 1, \ldots, 10$, across the 7th and 8th bit planes. These use {a, b, c, d}. The second group consists of histogram and entropic features. We first normalize the histograms of the agreement scores for the new bit planes (7th and 8th).
The third set of measures is somewhat different, as it uses a neighborhood weighting mask. For each binary image, we compute a 512-bin histogram based on the weighted neighborhood, where the score is given by weighting the eight directional neighbors with the following mask.
We get 18 such measures for grayscale images and 54, for color images
Digimarc MarcSpider
- Tracks all images carrying Digimarc's watermark on the Internet
- Searches over 50 million images on the Internet
- Digimarc provides secure identification solutions to over 200 government units in over 24 countries, including the states of New Jersey, Vermont, and Michigan
- Philips Digital Network: WaterCast for videos
Playboy
Webbworld paid $310,000, as well as reasonable attorney's fees, for using 62 of Playboy's images.
Conclusion
- Steganography has its place in security. The field is very young
- On its own it will not achieve much, but when used as a layer on top of cryptography it leads to greater security
- There are far-reaching applications in privacy protection and intellectual property rights protection
- Research is going on in both directions: one is how to incorporate hidden or visible copyright information in various media that will be published
- At the same time, in the opposite direction, researchers are working on how to detect the trafficking of illicit material and covert messages published by certain outlawed groups
On-line Sources
Stego-Tools: <http://www.stegoarchive.com/>
- Lots of freeware (and commercial) tools for hiding information in text, audio, video, and image files
- Well-known stego tools for images: Outguess+, F5+, S-Tools, etc.