This paper explores methods of audio steganography with emphasis on psychoacoustic approaches. Specifically, it describes a project that had the requirement of hiding a text-based message inside an audio signal with minimal or no distortion of the signal as perceived by the human ear. The theory and experimental results of each approach are discussed.
Psychoacoustic Approaches to Audio Steganography Report
ECES 434 Report
Cody A. Ray
Drexel University
Fall 2009

Introduction
Steganography is the art and science of writing hidden messages in such a way that no one, apart from the sender and intended recipient, suspects the existence of the message, a form of security through obscurity. The word steganography is of Greek origin and means "concealed writing". Apart from the obvious applications of transporting hidden information between entities, the methods of steganography are also used within copyright protection, the detection of content manipulation, fingerprinting, and watermarking.

The objective of this project was to explore methods of audio steganography with emphasis on psychoacoustic approaches. Specifically, the project has the requirement of hiding a text-based message inside an audio signal with minimal or no distortion of the signal as perceived by the human ear. In all approaches, we assume that the length of the message to be hidden is much smaller than the number of samples in the original sound track. We did not consider the resilience of the embedded message to attacks or otherwise “friendly” transformations of the host signal.

Approaches
We will compare and contrast three different approaches to audio steganography.

1. The first and simplest method we implemented is known as the Least-Significant Bit (LSB) method. In the LSB method, the least significant bit of each sampling point of the original signal is substituted with a bit of the binary message.

2. The second method we demonstrated was an amplitude modulation (AM) algorithm for the time domain. We slice the time signal into “blocks” and scale each block according to bits of the message.

3. The last method we explored was use of the MPEG psychoacoustic model 1 for Layer 1 to calculate the unnecessary bits using the signal-to-mask ratios (SMR). Then we replace the unnecessary bits with those of the message.

Least-Significant Bit Method
The method of least-significant bit (LSB) coding is the simplest technique for embedding information in a digital audio file. The least-significant bit of each sample in the signal is substituted with a bit from the secret message. One bit is embedded per sample; thus, the LSB method allows for encoding a large amount of data.

To recover the message hidden inside an LSB-encoded audio track, the receiver needs to know the sequence of indices corresponding to each embedded sample. There are a number of methods used to choose the subset of samples in which to embed bits of the message; however, whatever the method, the receiver must also know the algorithm used for selecting the samples. One trivial method starts at a constant distance from the beginning of the audio track and performs LSB coding until the message has been completely embedded within the signal, without changing any of the remaining samples. However, this approach creates an easy-to-detect statistical anomaly, as the probability that a sample has been modified is non-uniform across the sample set.

One way to avoid this issue is by padding the message with random bits in order to make the message length the same as the number of samples. However, we’re now embedding far more information than required to convey the given message. By modifying more of the file than necessary, we’re increasing the amount of noise in the signal, which in turn increases the probability of detection of the hidden message.

A more sophisticated approach involves the use of a random number generator to spread the secret message out over the audio track in a random manner. One popular approach uses a shared secret as a seed for the random number generator, allowing the sender and receiver to independently construct the same pseudorandom sequence of sample indices. One drawback is the necessity to avoid collisions created by using the same sample index twice; a bookkeeping system can be used to track previous indices. Alternatively, a pseudorandom permutation of the entire set can be constructed through the use of a secure hash function.

None of the above variants requires the original audio track to recover the message.

Since we did not consider resilience to attacks in this study, we implemented the trivial method outlined above. As a matter of practical concern, we also prefixed the message with an identifier string to mark the file as containing a secret message, and included the size of the secret message to guide the receiver as to where to stop decoding the signal.

Time-domain Amplitude Modulation Method
Time domain amplitude modulation (TDAM) capitalizes on the difficulty of perceiving subtle changes in loudness. The signal is sliced in the time domain, and the message is encoded as a scale factor applied to each time slice. One bit is encoded per block, where the block size is the ratio of the number of samples to the length of the message in bits. Correspondingly, this technique can carry a smaller message than LSB coding.

To recover the message hidden inside a TDAM-encoded audio track, the receiver needs access to the original audio file, and must know the scale factors used in coding the message. Extraction is done by scaling the original file by the lowest scale factor, and comparing whether each frame of the “dirty” signal is greater than the scaled original.

Many of the issues addressed in the previous section on LSB coding apply to TDAM as well. These issues will not be covered again here. Note, however, that this method doesn’t require any additional data to be embedded, and the signal is modified uniformly.

Amplitude Modulation via Psychoacoustic Models
The most sophisticated approach is amplitude modulation in the frequency domain based upon the MPEG psychoacoustic model 1 for Layer 1. The basic algorithm is as follows:

1. Calculate the power spectrum.

2. Identify the tonal and non-tonal components.

3. Decimate the maskers to eliminate all irrelevant maskers.

4. Compute the individual masking thresholds.

5. Compute the global masking threshold.

6. Determine the minimum masking threshold in each subband.

7. Shape the power of the message below the masking threshold.

The psychoacoustic model identifies components in the signal that do not affect perception. The masking threshold defines the frequency response of the loudness threshold minimum filter, which is used to shape the message. The filtered message is scaled to shift the message noise and added to the delayed original signal in order to produce the “dirty” track.
Results
[Figure: LSB Coding for Mono Wav]
[Figure: LSB Coding for Stereo Wav]
[Figure: Time Domain Amplitude Modulation for Mono Wav]
[Figure: Time Domain Amplitude Modulation for Stereo Wav]
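
The two methods evaluated in these results, LSB coding and TDAM, can each be sketched in a few lines. The following is a minimal sketch under stated assumptions, not the code used in the project. Samples are taken to be a plain list of integer PCM values. The identifier string `MAGIC` and the 4-byte length field are hypothetical stand-ins for the identifier and size prefix the report mentions. The scale factors 0.98 and 0.99 are assumed from the 1-2% amplitude reduction reported for TDAM. The extractor compares each block against the midpoint of the two scale factors, a slightly more tolerant variant of the "greater than the lowest-scaled original" test.

```python
import struct

MAGIC = b"STEG"  # hypothetical identifier; the report does not state the real one


def _to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def _from_bits(bits):
    """Pack a list of bits (MSB first) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def lsb_embed(samples, message, start=0):
    """Overwrite the LSB of consecutive samples, beginning at index `start`."""
    payload = MAGIC + struct.pack(">I", len(message)) + message
    bits = _to_bits(payload)
    if start + len(bits) > len(samples):
        raise ValueError("message does not fit in the host signal")
    stego = list(samples)
    for offset, bit in enumerate(bits):
        i = start + offset
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego


def lsb_extract(samples, start=0):
    """Read the header, then as many message bytes as the header announces."""
    header_bits = 8 * (len(MAGIC) + 4)
    header = _from_bits([s & 1 for s in samples[start:start + header_bits]])
    if header[:len(MAGIC)] != MAGIC:
        raise ValueError("no embedded message found")
    (size,) = struct.unpack(">I", header[len(MAGIC):])
    body = start + header_bits
    return _from_bits([s & 1 for s in samples[body:body + 8 * size]])


def tdam_embed(samples, bits, low=0.98, high=0.99):
    """Scale each block by `high` for a 1 bit and by `low` for a 0 bit."""
    block = len(samples) // len(bits)
    dirty = [float(s) for s in samples]  # samples past the last block stay unscaled
    for b, bit in enumerate(bits):
        scale = high if bit else low
        for i in range(b * block, (b + 1) * block):
            dirty[i] = samples[i] * scale
    return dirty


def tdam_extract(dirty, original, n_bits, low=0.98, high=0.99):
    """Recover bits by comparing each block with the scaled original.

    A block of pure silence cannot carry a bit and always decodes as 0.
    """
    block = len(original) // n_bits
    midpoint = (low + high) / 2.0
    bits = []
    for b in range(n_bits):
        span = range(b * block, (b + 1) * block)
        dirty_level = sum(abs(dirty[i]) for i in span)
        threshold = midpoint * sum(abs(original[i]) for i in span)
        bits.append(1 if dirty_level > threshold else 0)
    return bits
```

Note the asymmetry the report describes: `lsb_extract` needs only the stego samples and the start offset, while `tdam_extract` needs the original signal as well as the scale factors.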
Discussion and Conclusion
We tested on both mono and stereo channel wave files. These are depicted in the results section above. It should be noted that the magnitude axis for the mono channel is always twice as large as that of the stereo files due to an artifact from the preprocessing stage, where we converted the original stereo wav file to a mono wav file. Also, note that our time-domain amplitude modulation approach currently outputs a mono channel WAVE format file regardless of the number of channels available in the input.

In LSB coding, when modifying the least significant bit in the first coding system, the “bin” into which the quantized signal falls is being directly modified. Since we’re only modifying the quantization level by one, at worst, we’re only modifying the time-domain signal by a small value that’s dependent on the number of bits used for quantization. Effectively, we’re increasing the noise due to quantization and hiding the message in this noise. This incurs a small penalty, since it is a perceptible modification of the audio.

In TDAM coding, we’re decreasing the amplitude of the time-domain signal by 1-2%. This will primarily affect the perceived loudness of the sound. Additionally, this coding system could slightly affect perception of pitch, because perceived pitch depends in part on intensity. However, because the scale is small, this system produces a “dirty” audio signal that yields a negligible difference from the magnitude of the original signal.

Unfortunately, time did not allow for the completion of the MPEG-based steganography system implementation prior to final reporting. However, the hypothesis that each approach is successively better than the previous held for the two completed methods, which suggests that, when completed, this technique will be superior to the others.

Bibliography
Arnold, Michael. “Audio Watermarking.” Published: November 1, 2001. Access: December 2, 2009. http://www.ddj.com/security/184404839

Cvejic, Nedeljko. “Algorithms for Audio Watermarking and Steganography.” University of Oulu. 2004.

Garcia, R.A. “Digital Watermarking of Audio Signals using Psychoacoustic Auditory Model and Spread Spectrum Theory.” Preprints-Audio Engineering Society. Citeseer. 1999.

Petitcolas, Fabien. “MPEG for MATLAB.” Published: August 11, 2003. Access: December 2, 2009. http://www.petitcolas.net/fabien/software/mpeg/

Welsh, Eric. Chen, Alex. Shehad, Nader. Virani, Aamir. “W.A.V.S Compression.” http://is.rice.edu/~welsh/elec431/

Wikipedia. “Steganography.” Access: November 17, 2009. http://en.wikipedia.org/wiki/Steganography

Wilson, Scott. “Microsoft WAVE soundfile format.” Published: January 20, 2003. Access: November 19, 2009. ccrma.stanford.edu/courses/422/projects/WaveFormat/