A
Project Report
On
Developing Secure Steganography Scheme
Developed at
Ahmedabad University
IET, Ahmedabad University,
Opposite M.G. Science College,
Ahmedabad, Gujarat, India-380009
Developed by
Anuradha Chaudhary - IT Department, DD University
Guided By
Internal Guide:
Nikita Desai
Department of Information Technology
Faculty of Technology
DD University
External Guide:
Dr. Mehul Raval
Associate Professor
IET
Ahmedabad University
Department of Information Technology
Faculty of Technology, Dharmsinh Desai University
College Road, Nadiad-387001
April-2015
CANDIDATE’S DECLARATION
I declare that the final semester report entitled “Developing Secure Steganography Scheme”
is my own work, conducted under the supervision of the external guide Dr. Mehul Raval,
Associate Professor at IET, Ahmedabad University.
I further declare that, to the best of my knowledge, the report for the B.Tech. final semester
does not contain any part of work which has been submitted for the award of a B.Tech.
degree, either in this or any other university, without proper citation.
Candidate’s Signature
Anuradha Chaudhary
Branch: IT
Student ID: 11ITUOS013
DHARMSINH DESAI UNIVERSITY
NADIAD-387001, GUJARAT
CERTIFICATE
This is to certify that the project entitled “Developing Secure
Steganography Scheme” is a bona fide report of the work carried out by
Ms. Anuradha S Chaudhary, Student ID No: 11ITUOS013 of Department of
Information Technology, semester VIII, under the guidance and supervision
for the award of the degree of Bachelor of Technology at Dharmsinh Desai
University, Nadiad (Gujarat). She was involved in Project training during
academic year 2014-2015.
Mrs. Nikita Desai
(Project Guide)
Department of Information Technology,
Faculty of Technology,
Dharmsinh Desai University, Nadiad
Date:
Prof. R.S.Chhajed
Head, Department of Information Technology,
Faculty of Technology,
Dharmsinh Desai University, Nadiad
Date
B.Tech. Dissertation, Information Technology Department, D. D. University Page i
Acknowledgement
At this moment of accomplishment, I acknowledge the valuable guidance and wisdom of
my research supervisors without whom this dissertation would not have been feasible.
First and foremost I would like to thank my project guide Dr Mehul S. Raval, Associate
Professor, IET, Ahmedabad University for introducing me to this interesting and
challenging domain of steganography and steganalysis. I am grateful to him for being
very patient with my knowledge gaps in the area. His teaching style and enthusiasm for
the topic made a strong impression on me. It served as a gateway for me to think
innovatively. During our discussions he raised many precious points which I hope I have
managed to address here.
I would also like to show my gratitude to my guide Prof Nikita P Desai, Associate
Professor, Dharmsinh Desai University for her dedicated involvement in every step
throughout the process of research. I appreciate all of her expertise, guidance and careful
critique of this research work. Her valuable guidance and support encouraged me and
demonstrated to me that learning never ends.
Lastly, I would like to express my sincere thanks to our Head of Department Prof.
R.S.Chhajed who gave me an opportunity to explore the research domain at
undergraduate level.
Chaudhary Anuradha Sanjay
(11ITUOS013)
B.Tech IT
Dharmsinh Desai University, Nadiad
April 2015
chaudhary.anuradha94@gmail.com
Abstract
Developing Secure Steganography Scheme
B.Tech Dissertation by Anuradha Sanjay Chaudhary
At
Dharmsinh Desai University, Nadiad, April 2015
Steganography is the art of hidden communication from a sender to a receiver. This dissertation proposes a novel steganography scheme in which an image is divided into sub-blocks and data is embedded in blocks selected by their variance. The global variance of the overall image is used as the threshold: an image sub-block whose variance is greater than the threshold is eligible for data embedding. The Least Significant Bit (LSB) embedding technique is used for the data insertion. Empirical results show that the proposed technique provides high steganographic capacity. Highly textured images give good results with this technique. As there is no constraint on the image selection, one is free to use textured images in the steganography domain. The experimental results are derived on a data set consisting of 2000 grayscale images drawn from the NRCS and Corel databases.
TABLE OF CONTENTS
Acknowledgement
Abstract
Table of Contents
List of Tables
List of Figures
Abbreviations
Definitions
1.0 Introduction
1.1 Introduction to the Research Problem
1.2 Motivation for the Research Work
1.3 Objective and Scope
2.0 Background Theory
2.1 The Steganography Problem
2.2 Applications of Steganography
2.3 Steganalysis
2.4 Performance Measures
3.0 Review of Literature
3.1 Literature Survey Summary
4.0 Analysis and Findings
4.1 Definition of various statistical parameters
4.2 Proposed Technique
4.3 Analysis
4.3.1 Effect of block size on embedding capacity
4.4 Snapshots
4.5 Steganalysis by SPAM
4.5.1 The SPAM features
4.5.2 Ensemble Classifiers
5.0 Implementation Details
5.1 What is MATLAB?
5.2 Installation steps
5.3 What is an m-file?
5.4 Why use m-files?
5.5 How to run the m-file?
5.6 Significant Language Features
5.7 Applications
5.8 Experiment Setup
5.8.1 Experiment Setup-I
5.8.1.1 Results
5.8.2 Experiment Setup-II
5.8.2.1 Results
6.0 Conclusion and Future Work
References
List of Tables
Table 1: Example of Confusion Matrix
Table 2: Literature Survey Summary
Table 3: Effect of block size on embedding capacity and PSNR
Table 4: Order and Dimension of SPAM features
Table 5: Result set of NRCS database
Table 6: Result set of Corel database
List of Figures
Figure 1: Block Diagram of Proposed Technique
Figure 2: Graph of PSNR v/s No of Bits embedded
Figure 3: Graph of No of bits embedded v/s size of block
Figure 4: LSB bit slicing technique sample
Figure 5: Snapshot of image from NRCS database
Figure 6: Snapshot of image from Corel database
Figure 7: Snapshot of textured image
Figure 8: Snapshot of facial image
Figure 9: Block Diagram of Steganalysis
Figure 10: Feature Matrix of NRCS training dataset
Figure 11: Ensemble output for NRCS database
Figure 12: Feature Matrix of Corel training dataset
Figure 13: Ensemble output for Corel database
Abbreviations
LSB: Least Significant Bit
MSE: Mean Squared Error
PSNR: Peak Signal to Noise Ratio
SNR: Signal to Noise Ratio
SPAM: Subtractive Pixel Adjacency Matrix
SRM: Spatial Rich Model
SVM: Support Vector Machine
Stego: Steganographic content
Definitions
Steganography: Steganography is the art of hidden communication from a sender to a receiver.
Steganalysis: Steganalysis is the practice of detecting a hidden message in the stego content.
Steganalyzer: The person practicing steganalysis is called the steganalyzer.
Binary steganalyzer: A steganalyzer who is concerned only with the presence of a hidden message, and simply classifies the sample as stego or non stego, is called a binary steganalyzer.
Quantitative steganalyzer: A steganalyzer who estimates the message length or the number of embedding changes is called a quantitative steganalyzer.
Steganographic capacity: The number of bits that can be embedded in the cover without being detected by steganalyzers is called the steganographic capacity.
Bits per pixel (bpp): bpp is the ratio of the number of bits embedded to the product of the size of the image and the bit scale of the image.
1 Introduction
1.1 Introduction to Research Problem
Steganography is the art of hidden communication from a sender to a receiver. To achieve secrecy, different types of cover media such as images, video and audio files are used as carriers which contain the secret message. Steganography should ensure that a third party cannot conclude anything about the hidden message. The main idea is to hide data within the cover in such a manner that an unintended receiver cannot even suspect its existence.
However, embedding the secret message in the cover medium generates stego content, and it causes distortion in the visual as well as statistical properties of the cover medium. This embedding distortion may lead to detection of the confidential message. The person trying to detect the hidden message and break the steganography technique is known as a steganalyst, and the study of detecting messages in stego content is known as steganalysis.
1.2 Motivation for Research work
The motivation for steganography is quite simple: it is among the most efficient ways of covert communication. Using images as cover, one can transmit a confidential message easily across the Internet. Image steganography has bloomed in recent years as images serve as good carriers. Content adaptability, visual resilience and the smaller size of images make them preferable [10]. Also, the more detailed the image, the fewer the constraints on embedding.
There exist several image steganography techniques, along with various attacks by steganalysts. The security of a steganography technique depends on the selection of pixels. Pixels of noisy or textured areas serve as good candidates for embedding bits because the variance of such areas is high compared to smoother areas. Steganographic security is challenged by the classical binary steganalyzers, which predict the presence or absence of a hidden message within a cover medium. Recently proposed quantitative steganalyzers go a step beyond the binary decision and estimate the length of the hidden message within a cover medium. The steganalyzer uses machine learning for detecting the stego content.
1.3 Objective and Scope
The objective is to develop a steganographic scheme which has a large data hiding capacity while remaining hidden from the steganalyzer. The challenge is to minimize the distortion caused by message embedding, which alters the statistical features of the cover. If these features do not change significantly, the technique remains undetectable by state-of-the-art binary and quantitative steganalyzers.
The objective of this dissertation is to find the optimal steganographic capacity for cover images. To test the security of the steganographic scheme, testing is done on 2000 grayscale images from the NRCS and Corel databases. The SPAM feature extractor is used for extracting features from cover and stego images. An ensemble classifier is trained using supervised learning to learn the difference between the features of cover and stego images. Lastly, random cover and stego images are given to the classifier to check whether the stego images are detected or not. The goal is to embed data in such a way that the classifier is not able to correctly identify a stego image as stego.
2 Background Theory
2.1 Steganography Problem [8]
The steganography problem can be defined in the following manner:
A sender, commonly known as Alice, is the steganographer who wants to send a confidential message to Bob, the receiver. She possesses a source of covers for covert communication, and there exists a channel for the communication. This channel is monitored by the attacker or warden, who is the steganalyst. He wishes to detect the hidden message and sometimes even decode it.
One solution is to use a channel that the warden is not aware of. However, this approach is not satisfactory because it relies on the ignorance of the warden. The other solution is to determine the appropriate steganographic capacity for a given cover.
2.2 Application of Steganography
Steganography is a means of storing information in a way that hides the information’s existence. Steganography can be used to carry out confidential communication by combining it with existing communication methods. Digital steganography provides vast potential for the following [7]:
• National security is a major concern for any government. To protect national security from terrorist organizations, a government can use digital steganography techniques.
• In today’s competitive world, businesses need to protect their trade secrets or new product information from their competitors. Hence, a business can communicate internally using steganography in order to avoid leakage of its private information.
• Lastly, steganography can also be used for private communication between two individuals. If a person wants to communicate without being subjected to monitoring systems, then digital steganography is a good solution.
2.3 Steganalysis
Steganalysis is the countermeasure to steganography: the practice of detecting a hidden message in a stego image constructed using a steganographic scheme. Steganalysis tools detect the presence of a secret message in an image by tracking the distortion caused by data insertion. There are three types of steganalysis tools: visual, structural and non-structural.
Visual steganalysis attacks analyze images for distortions which are visible to the human visual system. The distortions could be visible in the stego image or in the LSB plane extracted from the stego image [10].
Structural attacks analyze structural properties of an image to find anomalies introduced by steganography. Structural detectors such as the histogram attack, sample pair analysis, the RS method and weighted stego can reliably detect the presence of stego data and even estimate the message length [10].
Non-structural detectors use feature extractors to model the cover image and to compute the distortion between cover and stego images to detect embedding. A classifier is trained on the feature sets of a large number of stego and cover images. During training, the classifier learns the differences in features, and this learning is used to classify a fresh image as a stego or clean image. Non-structural detectors such as the subtractive pixel adjacency matrix (SPAM) and the spatial rich model (SRM) claim a better probability of detecting embedding in a stego image.
Feature-based steganalysis techniques use a support vector machine (SVM) or ensemble classifiers for supervised learning. SVMs scale poorly to high-dimensional feature vectors, whereas ensemble classifiers handle them well with performance comparable to SVMs [10].
For the steganalysis of the proposed steganographic scheme, the SPAM feature extractor and an ensemble classifier are used. The SPAM feature extractor gives a high-dimensional feature vector of 686 features per image. Therefore, an ensemble classifier is used, as an SVM is not well suited to such a high-dimensional feature vector. In addition, the ensemble classifier combines several base classifiers and takes the majority of their votes as the final output, which can give higher accuracy than a single SVM.
Details of the SPAM feature extractor and the ensemble classifier are discussed in Section 4.
2.4 Performance Measure
The performance measure used to evaluate the classifier is the confusion matrix. In the field of machine learning, a confusion matrix, also known as a contingency table or an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Supervised learning is the type of learning in which the training instances are labeled with the correct class. In an unsupervised algorithm the training instances are not labeled. For steganography and steganalysis, supervised learning is used as the two classes are known: stego and non stego. So the classifier is trained using supervised learning to get satisfactory results.
Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
Example: If a classification system has been trained to distinguish between stego and non stego images, a confusion matrix will summarize the results of testing the algorithm for further inspection. Assuming a sample of 50 images, 25 non stego and 25 stego, the resulting confusion matrix could look like the table below:
                  Predicted class
Actual class      Non stego    Stego
Non stego         15           10
Stego             14           11
Table 1: Confusion matrix
Here the actual class and the predicted class are recorded for stego and non stego images. Out of 25 non stego images, 15 are correctly predicted as non stego and 10 are wrongly predicted as stego. Similarly, out of 25 stego images, 14 are wrongly predicted as non stego and 11 are correctly predicted as stego.
In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives. This allows more detailed analysis than the mere proportion of correct guesses (accuracy). Accuracy is not a reliable metric for the real performance of a classifier, because it will yield misleading results if the data set is unbalanced (that is, when the number of samples in different classes varies greatly). For example, if there were 95 non stego and only 5 stego images in the data set, the classifier could easily be biased into classifying all the samples as non stego. The overall accuracy would be 95%, but in practice the classifier would have a 100% recognition rate for the non stego class and a 0% recognition rate for the stego class. Assuming the confusion matrix above, its corresponding table of confusion, for the non stego class, would be:
15 true positives (actual non stego that were correctly classified as non stego) | 10 false negatives (non stego that were incorrectly marked as stego)
14 false positives (stego that were incorrectly labeled as non stego) | 11 true negatives (stego correctly classified as stego)
The final table of confusion would contain the average values for all classes combined.
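As a minimal sketch, the counts of Table 1 can be turned into a table of confusion and an accuracy figure as shown below. The dictionary layout and the function names `table_of_confusion` and `accuracy` are illustrative assumptions, not part of the dissertation's implementation.

```python
# Confusion matrix from Table 1: outer keys are actual classes,
# inner keys are predicted classes.
confusion = {
    "non_stego": {"non_stego": 15, "stego": 10},
    "stego": {"non_stego": 14, "stego": 11},
}


def table_of_confusion(matrix, positive):
    """Return TP, FN, FP and TN counts, treating `positive` as the positive class."""
    tp = matrix[positive][positive]
    fn = sum(n for pred, n in matrix[positive].items() if pred != positive)
    fp = sum(row[positive] for actual, row in matrix.items() if actual != positive)
    total = sum(sum(row.values()) for row in matrix.values())
    return {"tp": tp, "fn": fn, "fp": fp, "tn": total - tp - fn - fp}


def accuracy(matrix):
    """Fraction of samples whose predicted class equals the actual class."""
    correct = sum(matrix[c][c] for c in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


counts = table_of_confusion(confusion, "non_stego")
overall = accuracy(confusion)

# Unbalanced case from the text: every sample is classified as non stego.
unbalanced = {
    "non_stego": {"non_stego": 95, "stego": 0},
    "stego": {"non_stego": 5, "stego": 0},
}
unbalanced_accuracy = accuracy(unbalanced)
```

For Table 1 this gives an accuracy of 26/50, close to random guessing, while the unbalanced case reaches 95% accuracy without detecting a single stego image.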
3 Literature Review
Saiful Islam, Mangat R Modi and Phalguni Gupta [10], in their paper “Edge-based image steganography”, proposed a steganography technique which hides the secret message only in the edges of the cover image. The authors report excellent security against steganalysis attacks.
Andrew D. Ker, Patrick Bas, Rainer Bohme, Remi Cogranne, Scott Craver, Tomas Filler, Jessica Fridrich and Tomas Pevny [8], in their paper “Moving Steganography and Steganalysis from the Laboratory into the Real World”, raised some important questions which have been left unanswered, and highlighted some that have already been addressed successfully, for steganography and steganalysis to be used in the real world.
Jan Kodovsky, Jessica Fridrich and Vojtech Holub [9], in their paper “Ensemble Classifiers for Steganalysis of Digital Media”, proposed the use of a well known machine learning tool, ensemble classifiers. They argued that ensemble classifiers scale much more favorably w.r.t. the number of training examples and the feature dimensionality, with performance comparable to the more complex SVMs.
Donovan Artz, in his paper “Digital Steganography: Hiding Data within Data” [7], gave a detailed description of steganography and steganalysis, highlighting their importance. The history and uses of steganography and various methods for steganography and steganalysis were discussed.
Neil F Johnson and Sushil Jajodia, in their paper “Exploring Steganography: Seeing the Unseen” [11], discussed image files and how information can be hidden in them. They also discussed results obtained from evaluating available steganographic software.
3.1 Literature review summary
Table 2: Literature survey summary
| Author | Title | Year | Method described | Advantage | Accuracy |
|---|---|---|---|---|---|
| Saiful Islam, Mangat R Modi and Phalguni Gupta [10] | Edge-based image Steganography | 2014 | Edge-based image steganography | Robust to visual, structural and non-structural attacks | 51.1% |
| Andrew Ker, Patrick Bas, Rainer Bohme, Remi Cogranne, Scott Craver, Tomas Filler, Jessica Fridrich [8] | Moving Steganography and Steganalysis from the Laboratory into the Real World | 2013 | Open problems on steganography and steganalysis discussed | Hints for steganography and steganalysis techniques to address the real world | N/A |
| Jan Kodovsky, Jessica Fridrich, Vojtech Holub [9] | Ensemble Classifiers for Steganalysis of Digital Media | 2010 | Ensemble classifier | Better precision and accuracy for high dimensional feature set. Training time is less compared to SVM | Better than SVMs |
| Donovan Artz [7] | Digital Steganography | 2001 | Summary of various methods of digital steganography | Highlights use of steganography and tools used for same | N/A |
| Neil F Johnson and Sushil Jajodia [11] | Exploring Steganography: Seeing the Unseen | 1998 | Hiding message using image files. LSB insertion technique | Easy to implement. Does not change visual and statistical properties | N/A |
4 Analysis and Finding
From the literature review we found that a message embedded in noisy or textured regions is not easily detected by steganalyzers. Also, any steganography technique should be such that the embedding does not cause distortion in the visual and statistical properties of an image. So we came up with the idea of using a statistical property of an image, its variance, as the threshold for embedding.
4.1 Definition of Statistical parameters
Variance:
In statistics, variance measures how far a set of numbers is spread out. A variance of zero
indicates that all the values are identical. Variance is always non-negative: a small
variance indicates that the data points tend to be very close to the mean (expected value)
and hence to each other, while a high variance indicates that the data points are very
spread out around the mean and from each other [13].
The variance of a set of n equally likely values can be written as:
$$\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{n}\left(\bar{x} - x_i\right)^2$$
Mean squared error (MSE):
In statistics, the mean squared error (MSE) of an estimator measures the average of the
square of the errors that is the difference between the estimator and what is estimated
[12].
If $\hat{Y}$ is a vector of n predictions, and Y is the vector of the true values, then the (estimated) MSE of the predictor is:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2$$
Peak signal to noise ratio (PSNR):
PSNR is an engineering term for the ratio between the maximum power of a signal and the power of corrupting noise that affects the fidelity of its representation [14].
It is usually expressed in terms of the logarithmic decibel scale.
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{MAX^2}{MSE}\right)$$
For grayscale images the range of pixel values is 0 to 255, so
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{255^2}{MSE}\right)$$
Ideally the PSNR is infinite, which happens when the stego image is identical to the cover (MSE = 0). In practice, a higher PSNR indicates less embedding distortion.
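The MSE and PSNR definitions above can be illustrated with a short sketch. It assumes grayscale images stored as lists of rows of integers in the range 0 to 255; the function names and the tiny 2x2 example are illustrative, not taken from the dissertation's MATLAB code.

```python
import math


def mse(cover, stego):
    """Mean squared error between two equally sized grayscale images."""
    total = 0
    count = 0
    for cover_row, stego_row in zip(cover, stego):
        for c, s in zip(cover_row, stego_row):
            total += (c - s) ** 2
            count += 1
    return total / count


def psnr(cover, stego, max_value=255):
    """PSNR in decibels; infinite when the two images are identical."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


# Two of the four pixels differ by one grey level, as LSB embedding might cause.
cover = [[100, 101], [102, 103]]
stego = [[101, 101], [102, 102]]

example_mse = mse(cover, stego)    # (1 + 0 + 0 + 1) / 4
example_psnr = psnr(cover, stego)  # roughly 51.1 dB
```

Because LSB embedding changes a pixel by at most one grey level, the MSE of a stego image can never exceed 1, so the PSNR stays above about 48 dB.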
4.2 Proposed technique
• The proposed technique is a novel steganography technique.
• An image of size NxN is divided into sub-blocks of size nxn.
• The variance of each sub-block is calculated. The variance of the overall image is used as the threshold.
• Each block with variance greater than or equal to the threshold is an eligible candidate for data insertion.
• The least significant bit (LSB) embedding technique is used for data insertion.
• In order to maintain the visual and statistical properties of an image, the steganographic capacity, i.e. the number of bits that can be embedded in the cover image, is determined for the given cover. If more bits are embedded than the steganographic capacity of the image, the embedding may become detectable.
• The variance of textured areas is high. Hence, these areas prove to be better candidates than smoother areas for embedding data, because manipulations done in these areas are not visible to the human eye.
Figure 1: Block Diagram of Proposed technique
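The steps of the proposed technique can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's MATLAB implementation: the image is a list of rows of grey levels, one bit is embedded per pixel in raster order within each eligible block, and the function names are hypothetical. How the receiver identifies the same blocks during extraction is not covered here.

```python
def variance(values):
    """Population variance, matching the definition in Section 4.1."""
    mean = sum(values) / len(values)
    return sum((mean - v) ** 2 for v in values) / len(values)


def block_pixels(image, top, left, size):
    """Flatten the size x size block whose top-left corner is (top, left)."""
    return [image[r][c] for r in range(top, top + size)
            for c in range(left, left + size)]


def embed(image, bits, block_size):
    """Embed bits into the LSBs of blocks whose variance is >= the global variance.

    Returns the stego image and the number of bits actually embedded.
    """
    stego = [row[:] for row in image]  # leave the cover untouched
    threshold = variance([p for row in image for p in row])
    bit_iter = iter(bits)
    embedded = 0
    for top in range(0, len(image) - block_size + 1, block_size):
        for left in range(0, len(image[0]) - block_size + 1, block_size):
            if variance(block_pixels(image, top, left, block_size)) < threshold:
                continue  # smooth block: not eligible
            for r in range(top, top + block_size):
                for c in range(left, left + block_size):
                    bit = next(bit_iter, None)
                    if bit is None:  # message exhausted
                        return stego, embedded
                    stego[r][c] = (stego[r][c] & ~1) | bit  # overwrite the LSB
                    embedded += 1
    return stego, embedded


# 4x4 cover with one textured 2x2 block (top right) and three flat blocks.
cover_image = [
    [10, 10, 200, 50],
    [10, 10, 60, 220],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
stego_image, n_embedded = embed(cover_image, [1, 0, 1, 1, 0, 1], block_size=2)
```

In this example only the top-right block exceeds the global variance, so four of the six message bits are embedded and the flat blocks stay unchanged.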
4.3 Analysis
4.3.1 Effect of block size on embedding capacity and PSNR
With an increase in block size, the embedding capacity increases and the PSNR decreases.
Database   Block Size   Embedding Capacity (bits)   PSNR (dB)
NRCS       8x8          6400                        70.30
NRCS       16x16        25600                       64.28
NRCS       32x32        51200                       60.51
NRCS       64x64        81920                       59.23
Table 3: Effect of block size on embedding capacity and PSNR
Figure 2: Graph of PSNR v/s No of Bits embedded
Figure 3: Graph of No of bits embedded v/s size of block
4.4 Screenshots
Example of the LSB bit slicing method which is used in the proposed technique for embedding bits:
Figure 4: LSB bit slicing technique sample (panels: Original Image, Stego Image)
Stego images constructed using proposed steganography technique
1) NRCS database:
Figure 5: Snapshot of image from NRCS database
2) Corel Database:
Figure 6: Snapshot of image from Corel database
3) Textured Image:
Figure 7: Snapshot of textured image
4) Facial Image:
Figure 8: Snapshot of facial image
4.5 Steganalysis by SPAM
• Analysis of the proposed technique is performed by extracting SPAM feature sets from the stego images and from natural (cover) images.
• These features are used to train an ensemble classifier to learn the difference in features caused by steganography.
• The ensemble classifier is trained by supervised learning: a variable Y=1 is assigned to cover images and Y=0 to stego images.
• Testing is performed by taking random samples of cover and stego images.
• For binary classification of the testing set, the predict method of the ensemble classifier is used.
• Training and testing are performed on images from the following databases:
1) NRCS
2) Corel
Figure 9: Block Diagram of Steganalysis
4.5.1 The SPAM features [6]
We now explain the Subtractive Pixel Adjacency Model of covers (SPAM) that
will be used to compute features for steganalysis. First, the transition probabilities
along eight directions are computed. The differences and the transition probability
are always computed along the same direction. We explain further calculations
only on the horizontal direction as the other directions are obtained in a similar
manner. All direction-specific quantities will be denoted by a superscript showing
the direction of the calculation.
The calculation of features starts by computing the difference array $D^{\cdot}$. For a horizontal direction left-to-right,
$$D^{\rightarrow}_{i,j} = I_{i,j} - I_{i,j+1},$$
$i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n-1\}$.
Order   T   Dimension
1st     4   162
2nd     3   686
Table 4: Dimension of the SPAM models. The 686-dimensional model is used in our experiments. Column “order” shows the order of the Markov chain and T is the range of differences.
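The first-order SPAM features, $F^{1st}$, model the difference arrays D by a first-order Markov process. For the horizontal direction, this leads to
$$M^{\rightarrow}_{u,v} = \Pr\left(D^{\rightarrow}_{i,j+1} = u \mid D^{\rightarrow}_{i,j} = v\right),$$
where $u, v \in \{-T, \ldots, T\}$. If $\Pr(D^{\rightarrow}_{i,j} = v) = 0$ then $M^{\rightarrow}_{u,v} = \Pr(D^{\rightarrow}_{i,j+1} = u \mid D^{\rightarrow}_{i,j} = v) = 0$.
The second-order SPAM features, $F^{2nd}$, model the difference arrays D by a second-order Markov process. Again, for the horizontal direction,
$$M^{\rightarrow}_{u,v,w} = \Pr\left(D^{\rightarrow}_{i,j+2} = u \mid D^{\rightarrow}_{i,j+1} = v, D^{\rightarrow}_{i,j} = w\right),$$
where $u, v, w \in \{-T, \ldots, T\}$. If $\Pr(D^{\rightarrow}_{i,j+1} = v, D^{\rightarrow}_{i,j} = w) = 0$ then $M^{\rightarrow}_{u,v,w} = \Pr(D^{\rightarrow}_{i,j+2} = u \mid D^{\rightarrow}_{i,j+1} = v, D^{\rightarrow}_{i,j} = w) = 0$.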
To decrease the feature dimensionality, we make a plausible assumption that the
statistics in natural images are symmetric with respect to mirroring and flipping
(the effect of portrait / landscape orientation is negligible). Thus, we separately
average the horizontal and vertical matrices and then the diagonal matrices to
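form the final feature sets, $F^{1st}$, $F^{2nd}$. With a slight abuse of notation, this can be formally written:
$$F^{\cdot}_{1,\ldots,k} = \frac{1}{4}\left[M^{\rightarrow}_{\cdot} + M^{\leftarrow}_{\cdot} + M^{\downarrow}_{\cdot} + M^{\uparrow}_{\cdot}\right],$$
$$F^{\cdot}_{k+1,\ldots,2k} = \frac{1}{4}\left[M^{\searrow}_{\cdot} + M^{\nwarrow}_{\cdot} + M^{\swarrow}_{\cdot} + M^{\nearrow}_{\cdot}\right], \tag{1}$$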
where $k = (2T + 1)^2$ for the first-order features and $k = (2T + 1)^3$ for the second-order features. In the experiments, T = 4 is used for the first-order features, giving 2k = 162 features, and T = 3 for the second-order features, giving 2k = 686 features (cf. Table 4). To summarize, the SPAM features are formed by the averaged sample Markov transition probability matrices (1) in the range [−T, T]. The dimensionality of the model is determined by the order of the Markov model and the range of differences T.
The order of the Markov chain, together with the parameter T, controls the
complexity of the model. The concrete choice depends on the application,
computational resources, and the number of images available for the classifier
training.
The calculation of the difference array can be interpreted as high-pass filtering
with the kernel [−1, +1], which is, in fact, the simplest edge detector. The filtering
suppresses the image content and exposes the stego noise, which results in a
higher SNR. The filtering can be also seen as a different form of calibration. From
this point of view, it would make sense to use more sophisticated filters with a
better SNR. Interestingly, none of these filters provided consistently better
performance. We believe that the superior accuracy of the simple filter [−1, +1] is
because it does not distort the stego noise as more complex filters do.
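To make the first-order SPAM computation concrete, the sketch below estimates the horizontal left-to-right transition matrix for a tiny image. It is an illustration only: it covers one of the eight directions, uses plain Python lists, and the function names are assumptions. The full 686-dimensional extractor averages second-order matrices over all directions.

```python
from collections import Counter


def horizontal_differences(image):
    """D[i][j] = I[i][j] - I[i][j+1] for the left-to-right direction."""
    return [[row[j] - row[j + 1] for j in range(len(row) - 1)] for row in image]


def first_order_transitions(image, T):
    """Estimate M[(u, v)] = Pr(D[i][j+1] = u | D[i][j] = v) for u, v in [-T, T].

    Entries whose conditioning value v never occurs are set to 0.
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for row in horizontal_differences(image):
        for v, u in zip(row, row[1:]):  # consecutive differences
            prev_counts[v] += 1
            pair_counts[(u, v)] += 1
    values = range(-T, T + 1)
    return {
        (u, v): pair_counts[(u, v)] / prev_counts[v] if prev_counts[v] else 0.0
        for u in values
        for v in values
    }


# One-row image; its horizontal differences are [-1, 0, -1, 2].
tiny_image = [[10, 11, 11, 12, 10]]
M = first_order_transitions(tiny_image, T=2)

# Feature dimensions from Table 4: 2k with k = (2T + 1) ** order.
first_order_dim = 2 * (2 * 4 + 1) ** 2
second_order_dim = 2 * (2 * 3 + 1) ** 3
```

Here a difference of −1 is followed once by 0 and once by 2, so both transitions get probability 0.5, while a difference of 0 is always followed by −1.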
4.5.2 Ensemble Classifier
An ensemble of classifier is a set of classifiers whose individual decisions
are integrated to classify the new samples [1]. The advantage of using this
classifier is that the result obtained is more certain, precise and accurate compared
to SVM.
Numerous methods are used for creating ensemble classifiers [2]:
Bagging
One method is called bagging. From a training set of size X, X instances
are drawn at random with replacement (i.e. using a uniform distribution),
and a classifier is learned from these X instances. Because the draw is
with replacement, the sample will contain some duplicates and will omit
some instances of the original training set. Each cycle through the process
results in one classifier. Several such classifiers are constructed, and the
final prediction is made by taking a vote among the classifiers.
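The two ingredients of bagging described above, drawing a bootstrap sample and voting, can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names. It leaves out the base learner itself, which would be trained once per bootstrap sample.

```python
import random
from collections import Counter


def bootstrap_sample(training_set, rng):
    """Draw len(training_set) instances uniformly at random, with replacement."""
    n = len(training_set)
    return [training_set[rng.randrange(n)] for _ in range(n)]


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)                 # fixed seed so the run is repeatable
training_set = list(range(10))         # stand-in for 10 training instances
sample = bootstrap_sample(training_set, rng)

# Votes of three hypothetical classifiers for one test image.
decision = majority_vote(["stego", "cover", "stego"])
print(sample, decision)
```

The sample has the same size as the training set but is drawn with replacement, so duplicates and omissions are expected.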
Boosting
Another method is called boosting. AdaBoost is a practical version
of the boosting approach. Boosting is similar in overall structure to
bagging. The only difference here is that one keeps track of the
performance of the learning algorithm and forces it to concentrate its
efforts on instances that have not been correctly learned. Instead of
choosing the X training instances randomly using a uniform distribution,
one chooses the training instances in such a manner as to favor the
instances that have not been accurately learned. After several cycles, the
prediction is performed by taking a weighted vote of the predictions of
each classifier, with the weights being proportional to each classifier’s
accuracy on its training set.
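The two differences from bagging described above can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions. The fixed boost factor and the function names are hypothetical, and the weight update is not the exact update rule of AdaBoost.M1.

```python
import random


def resample_favoring_errors(instances, misclassified, rng, boost=3.0):
    """Draw a new training sample in which misclassified instances are
    `boost` times more likely to be picked (illustrative weighting only)."""
    weights = [boost if wrong else 1.0 for wrong in misclassified]
    return rng.choices(instances, weights=weights, k=len(instances))


def weighted_vote(predictions, accuracies):
    """Combine predictions, weighting each classifier by its accuracy."""
    totals = {}
    for label, accuracy in zip(predictions, accuracies):
        totals[label] = totals.get(label, 0.0) + accuracy
    return max(totals, key=totals.get)


rng = random.Random(1)
instances = ["img0", "img1", "img2", "img3"]
misclassified = [False, True, False, False]   # img1 was learned incorrectly
next_round = resample_favoring_errors(instances, misclassified, rng)

# One accurate classifier outvotes two weaker ones: 0.9 against 0.3 + 0.4.
decision = weighted_vote(["stego", "cover", "cover"], [0.9, 0.3, 0.4])
print(next_round, decision)
```

Unlike the plain majority vote of bagging, the weighted vote lets a single accurate classifier override several less accurate ones.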
Boosting algorithms are considered stronger than bagging on noise-free
data. However, there are strong empirical indications that bagging is much
more robust than boosting in noisy settings.
5 Implementation Details
MATLAB was chosen as the implementation tool. Details of the tool and its language are
given below.
5.1 What is MATLAB?
MATLAB is widely used in all areas of applied mathematics, in education and research
at universities, and in the industry. MATLAB stands for MATrix LABoratory and the
software is built up around vectors and matrices. This makes the software particularly
useful for linear algebra but MATLAB is also a great tool for solving algebraic and
differential equations and for numerical integration. MATLAB has powerful graphic
tools and can produce nice pictures in both 2D and 3D. It is also a programming
language, and is one of the easiest programming languages for writing
mathematical programs. MATLAB also has toolboxes useful for signal processing,
image processing, optimization and many other applications.
5.2 Installation steps
1. Load the DVD into the PC on which you want to install MATLAB. The DVD should
automatically start the installation program and show the first splash screen.
Press Next…
2. Agree to the MathWorks license. Press Next…
3. Choose the ‘Typical’ installation. Press Next…
4. Choose the location of the installation. Press Next…
5. If the location doesn’t exist, you will be prompted to create it. Press Yes…
6. Confirm the installation settings by pressing Install.
7. MATLAB will now install; this may take several minutes.
8. After the installation has completed, you need to license your install. Have the
serial number ready; it can be found on the DVD case. Press Next…
9. MATLAB will first make an internet connection to MathWorks before you enter
the serial number.
10. Answer yes when asked if you are a student. Press Next…
11. Continue with the rest of the registration process until the installation is complete.
5.3 What is an m-file?
An m-file, or script file, is a simple text file where you can place MATLAB commands.
When the file is run, MATLAB reads the commands and executes them exactly as it
would if you had typed each command sequentially at the MATLAB prompt. All m-file
names must end with the extension '.m' (e.g. test.m). If you create a new m-file with the
same name as an existing m-file, MATLAB will choose the one which appears first in the
path order (type help path in the command window for more information). To make life
easier, choose a name for your m-file which doesn't already exist. To see if a
filename.m already exists, type help filename at the MATLAB prompt.
5.4 Why use m-files?
For simple problems, entering your requests at the MATLAB prompt is fast and efficient.
However, as the number of commands increases or trial and error is done by changing
certain variables or values, typing the commands over and over at the MATLAB prompt
becomes tedious. M-files will be helpful and almost necessary in these cases.
5.5 How to run the m-file?
After the m-file is saved with the name filename.m in the current MATLAB folder or
directory, you can execute the commands in the m-file by simply typing filename at the
MATLAB command window prompt. If you don't want to run the whole m-file, you can
just copy the part of the m-file that you want to run and paste it at the MATLAB prompt.
5.6 Significant Language Features
MATLAB® is a mathematical scripting language that looks very much like C++. Some
features of the language are:
• Efficient matrix and vector computations
• Easy creation of scientific and engineering graphics
• Application development, including graphical user interface building
• Object-oriented programming
• Extensibility (toolboxes)
• File I/O functions
• String processing
5.7 Applications
Because of MATLAB®'s numerous matrix and vector computation and manipulation
algorithms, the software is primarily used for:
• Producing solutions to complex systems of equations
• Modeling, simulation, and prototyping
• Data analysis, exploration, and visualization
5.8 Experiment Setup
5.8.1 Experiment Setup I
Database: NRCS
• The stego algorithm of the proposed technique is applied to 1500 grayscale images of
the NRCS database, producing one stego image for each cover image. The cover and
stego images are then passed to the SPAM feature extractor to extract their
features.
• Two datasets are then created:
i) Training dataset: 686 features for each of 3000 images (1500 cover
images and the corresponding 1500 stego images).
ii) Testing dataset: 686 features for each of 100 images (50 cover
images and the corresponding 50 stego images).
Figure 10: Feature Matrix - training set
• The ensemble classifier is built using the fitensemble function of MATLAB:
Ensemble = fitensemble(X, Y, 'AdaBoostM1', 3000, 'Tree')
where
X is the training dataset, with one row of features per image;
Y is a logical column vector of class labels, 0 for a stego image and 1 for a cover
image; each row of Y gives the class of the corresponding row of X;
'AdaBoostM1' is a boosting algorithm for two-class classification;
3000 is the number of learners to be trained;
'Tree' selects a decision tree as the weak learner, which fitensemble supports for
classification;
the output, Ensemble, is the trained ensemble model.
Figure 11: Ensemble classifier structure
• The predict method of the Ensemble model is used for binary classification of the
testing dataset.
5.8.1.1: Results
• Each 512x512 image is divided into sub blocks of various sizes: 8x8, 16x16,
32x32 and 64x64. Experiments were carried out for each of these sizes. Table 5
shows the results for the different sub block sizes. The number of bits embedded,
the bits per pixel and the accuracy of the classifier vary with the block size.
Block size   No. of bits embedded   Bits per pixel (bpp)   Accuracy   Detected
8x8          6400                   0.003                  53%        Misclassification
16x16        25600                  0.012                  73%        Misclassification
32x32        51200                  0.024                  85.2%      Misclassification
64x64        81920                  0.039                  91%        Correctly classified
Table 5: Result set for NRCS
• The classifier misclassifies the images for payloads of up to 51200 bits, i.e. it
is confused and cannot reliably distinguish stego from non-stego images. Beyond
that payload it classifies stego and non-stego images correctly. Therefore, the
steganographic capacity for the NRCS database is 51200 bits, i.e. 0.024
bits per pixel (bpp).
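The bpp column can be reproduced from the definition given in the Definitions section: the number of embedded bits divided by the product of the image size and the bit scale of the image. The Python sketch below is an illustrative check only. It assumes 512x512 images with a bit scale of 8 (8-bit grayscale), and the function name is hypothetical.

```python
def bits_per_pixel(bits_embedded, width=512, height=512, bit_depth=8):
    """Embedded bits divided by (image size x bit scale of the image)."""
    return bits_embedded / (width * height * bit_depth)


# Payloads reported for each sub block size.
payloads = {"8x8": 6400, "16x16": 25600, "32x32": 51200, "64x64": 81920}

# Rounded to three decimals, as in the result tables.
bpp = {block: round(bits_per_pixel(bits), 3) for block, bits in payloads.items()}
print(bpp)
```

With these assumptions the rounded values agree with the bpp column of the result table.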
5.8.2 Experiment Setup II
Database: Corel
• The same procedure as described above for the NRCS database is carried out for
grayscale images of the Corel database. The only difference is that 2000 images are
used, which leads to the following changes:
• The training dataset consists of 686 features for each of 3800 images (1900
cover images and the corresponding 1900 stego images).
• The testing dataset consists of 686 features for each of 200 images (100
cover images and the corresponding 100 stego images).
• The ensemble classifier is built by specifying 3800 learners in the
fitensemble call.
Figure 12: Feature Matrix – training set
Figure 13: Ensemble Output
5.8.2.1: Results
• Each 512x512 image is divided into sub blocks of various sizes: 8x8, 16x16,
32x32 and 64x64. Experiments were carried out for each of these sizes. Table 6
shows the results for the different sub block sizes. The number of bits embedded,
the bits per pixel and the accuracy of the classifier vary with the block size.
Block size   No. of bits embedded   Bits per pixel (bpp)   Accuracy   Detected
8x8          6400                   0.003                  56%        Misclassification
16x16        25600                  0.012                  75%        Misclassification
32x32        51200                  0.024                  98%        Correctly classified
64x64        81920                  0.039                  100%       Correctly classified
Table 6: Result set of Corel Database
• The classifier misclassifies the images for payloads of up to 25600 bits, i.e. it
is confused and cannot reliably distinguish stego from non-stego images. Beyond
that payload it classifies stego and non-stego images correctly. Therefore, the
steganographic capacity for the Corel database is 25600 bits, i.e. 0.012 bits per
pixel (bpp).
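The way the capacity is read off the two result tables can be written down explicitly. The Python sketch below is one reading of that rule, not code from the experiments. It takes the largest payload whose "Detected" entry is still "Misclassification". The function name and the data layout are hypothetical, and the verdicts are copied from Tables 5 and 6.

```python
def steganographic_capacity(results):
    """Largest payload (in bits) that the classifier still misclassifies.

    `results` is a list of (bits_embedded, verdict) pairs taken from a
    result table. Returns 0 if every payload was correctly classified.
    """
    undetected = [bits for bits, verdict in results
                  if verdict == "Misclassification"]
    return max(undetected) if undetected else 0


nrcs = [(6400, "Misclassification"), (25600, "Misclassification"),
        (51200, "Misclassification"), (81920, "Correctly classified")]
corel = [(6400, "Misclassification"), (25600, "Misclassification"),
         (51200, "Correctly classified"), (81920, "Correctly classified")]

print(steganographic_capacity(nrcs), steganographic_capacity(corel))
```

Applied to the two tables, this rule gives the capacities stated in the text for the NRCS and Corel databases.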
6 Conclusion and Future Work
From the result sets it can be concluded that the proposed steganographic scheme
provides high steganographic capacity compared to other prevailing steganographic
schemes and remains robust to the attacks of steganalysts.
To make the steganographic scheme more secure, a randomization technique can be
used. In this technique, data is embedded in a block chosen at random inside each
sub block of the image. Because the candidate block for data insertion is chosen
randomly, the classifier is not able to detect the presence of stego content.
However, the steganographic capacity is lower than that of the proposed technique.
Future work can address this problem by increasing the steganographic capacity of
the randomization technique.
References
1) http://www.cs.iastate.edu/~jtian/cs573/WWW/Lectures/lecture13-Ensemble-2up.pdf
last accessed on 27/03/2015
2) http://en.wikipedia.org/wiki/Ensembles_of_classifiers
last accessed on 22/03/2015
3) http://en.wikipedia.org/wiki/Steganography
last accessed on 22/03/2015
4) http://en.wikipedia.org/wiki/Steganalysis
last accessed on 22/03/2015
5) http://dde.binghamton.edu/download/spam/
last accessed on 27/01/2015
6) Pevny, T.; Bas, P.; Fridrich, J., “Steganalysis by Subtractive Pixel Adjacency
Matrix”, IEEE Transactions on Information Forensics and Security, Volume 5,
Issue 2.
http://ws2.binghamton.edu/fridrich/Research/paper_6_dc.pdf
last accessed on 28/01/2015
7) Artz, D., “Digital steganography: hiding data within data”, IEEE Internet
Computing, Volume 5, Issue 3.
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=935180
last accessed on 23/12/2014
8) Andrew Ker, Patrick Bas, Rainer Bohme, Remi Cogranne, Scott Craver, Tomas
Filler, Jessica Fridrich, Tomas Pevny, “Moving Steganography and Steganalysis
from the Laboratory into the Real World”, Proceedings of the first ACM workshop
on Information hiding and multimedia security, DOI: 10.1145/2482513.2482965.
http://www.cs.ox.ac.uk/andrew.ker/docs/ADK57B.pdf
last accessed on 15/12/2014
9) Kodovsky, J.; Fridrich, J.; Holub, V., “Ensemble Classifiers for Steganalysis of
Digital Media”, IEEE Transactions on Information Forensics and Security,
Volume 7, Issue 2.
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6081929&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F10206%2F6166811%2F06081929.pdf%3Farnumber%3D608192
10) Saiful Islam, Mangat R Modi and Phalguni Gupta, “Edge-based image
steganography”, EURASIP Journal on Information Security 2014, 2014:8.
http://jis.eurasipjournals.com/content/pdf/1687-417X-2014-8.pdf
11) Johnson, N.F.; Jajodia, S., “Exploring steganography: Seeing the unseen”,
Computer, Volume 31, Issue 2, pages 26–34.
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4655281&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D465528
12) http://en.wikipedia.org/wiki/Mean_squared_error
13) http://en.wikipedia.org/wiki/Variance
14) http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
15) Madhuri A. Joshi, Mehul S. Raval, Yogesh H. Dandawate, Kalyani R. Joshi,
Shilpa P. Metkar, Image and Video Compression: Fundamentals, Techniques
and Applications, Chapman and Hall / CRC, FL, US, 17th Nov. 2014, DOI:
10.1201/b17738-1, Print ISBN 978-1-4822-2822-9, eBook ISBN 978-1-4822-2823-6.

ANURADHA_FINAL_REPORT

  • 1.
    A Project Report On Developing SecureSteganography Scheme Developed at Ahmedabad University IET, Ahmedabad University, Opposite M.G. Science College, Ahmedabad, Gujarat, India-380009 Developed by Anuradha Chaudhary - IT Department, DD University Guided By Internal Guide: External Guide: Nikita Desai Dr. Mehul Raval Department of Information Technology Associate Professor Faculty of Technology IET DD University Ahmedabad University Department of Information Technology Faculty of Technology, Dharmsinh Desai University College Road, Nadiad-387001 April-2015
  • 2.
    CANDIDATE’S DECLARATION I declarethat final semester report entitled “Developing Secure Steganography Scheme” is my own work conducted under the supervision of the external guide Dr. Mehul Raval, Associate Professor at IET, Ahmedabad University. I further declare that to the best of my knowledge the report for B.Tech. final semester does not contain part of the work which has been submitted for the award of B.Tech. Degree either in this or any other university without proper citation. Candidate’s Signature Anuradha Chaudhary Branch: IT Student ID: 11ITUOS013
  • 3.
    DHARMSINH DESAI UNIVERSITY NADIAD-387001,GUJARAT CERTIFICATE This is to certify that the project entitled “Developing Secure Steganography Scheme” is a bonafied report of the work carried out by Ms. Anuradha S Chaudhary, Student ID No: 11ITUOS013 of Department of Information Technology, semester VIII, under the guidance and supervision for the award of the degree of Bachelor of Technology at Dharmsinh Desai University, Nadiad (Gujarat). She was involved in Project training during academic year 2014-2015. Mrs. Nikita Desai (Project Guide) Department of Information Technology, Faculty of Technology, Dharmsinh Desai University, Nadiad Date: Prof. R.S.Chhajed Head, Department of Information Technology, Faculty of Technology, Dharmsinh Desai University, Nadiad Date
  • 4.
    B.Tech. Dissertation, InformationTechnology Department, D. D. University Page i Acknowledgement At this moment of accomplishment, I acknowledge the valuable guidance and wisdom of my research supervisors without whom this dissertation would not have been feasible. First and foremost I would like to thank my project guide Dr Mehul S. Raval, Associate Professor, IET, Ahmedabad University for introducing me to this interesting and challenging domain of steganography and steganalysis. I am grateful to him for being very patient with my knowledge gaps in the area. His teaching style and enthusiasm for topic made a strong impression on me. It served as a gateway for me to think innovatively. During our discussions he raised my precious points which I hope I have managed to address here. I would also like to show my gratitude to my guide Prof Nikita P Desai, Associate Professor, Dharmsinh Desai University for her dedicated involvement in every step throughout the process of research. I appreciate all of her expertise, guidance and careful critique of this research work. Her valuable guidance and support encouraged me and demonstrated to me that learning never ends. Lastly, I would like to express my sincere thanks to our Head of Department Prof. R.S.Chhajed who gave me an opportunity to explore the research domain at undergraduate level. Chaudhary Anuradha Sanjay (11ITUOS013) B.Tech IT Dharmsinh Desai University, Nadiad April 2015 chaudhary.anuradha94@gmail.com
  • 5.
    B.Tech. Dissertation, InformationTechnology Department, D. D. University Page ii Abstract Developing Secure Steganography Scheme B.Tech Dissertation by Anuradha Sanjay Chaudhary At Dharmsinh Desai University, Nadiad, April 2015 Steganography is an art of hidden communication from a sender to a receiver. A novel steganography scheme is proposed in this dissertation where an image which is divided into sub blocks and data is embedded in them using the variance. The embedding blocks are selected based on global variance of overall image is used as the threshold. An image sub block whose variance is greater than threshold is eligible of data embedding. The Least Significant Bit (LSB) embedding technique is used for the data insertion. Empirical results show that the proposed technique provides high steganographic capacity. Highly textured images gives good results with this technique. As there is no constraint on the image selection one is free to use textured images in steganography domain. The experimental results are derived on a data set consisting of 2000 grayscale images derived from NRCS and Corel databases.
  • 6.
    B.Tech. Dissertation, InformationTechnology Department, D. D. University Page iii TABLE OF CONTENTS Chapter Topics Page No Acknowledgment i Abstract ii Table of Contents iii List of Tables v List of Figures vi Abbreviations vii Definitions viii 1.0 Introduction 1 1.1 Introduction to the Research Problem 1 1.2 Motivation for the Research Work 1 1.3 Objective and Scope 2 2.0 Background Theory 3 2.1 The Steganography Problem 3 2.2 Applications of Steganography 3 2.3 Steganalysis 4 2.4 Performance Measures 5 3.0 Review of Literature 7 3.1 Literature Survey Summary 8 4.0 Analysis and Findings 9 4.1 Definition of various statistical parameter 9 4.2 Proposed Technique 10 4.3 Analysis 11 4.3.1 Effect of block size on embedding capacity 11 4.4 Snapshots 13 4.5 Steganalysis by SPAM 18 4.5.1 The SPAM features 19 4.5.2 Ensemble Classifiers 22
  • 7.
    B.Tech. Dissertation, InformationTechnology Department, D. D. University Page iv 6.0 Conclusion and Future Work 31 References 32 5.0 Implementation Details 23 5.1 What is MATLAB? 23 5.2 Installation steps 23 5.3 What is an m-file? 24 5.4 Why use m-files? 24 5.5 How to run the m-file? 24 5.6 Significant Language Features 24 5.7 Applications 25 5.8 Experiment Setup 25 5.8.1 Experiment Setup-I: 25 5.8.1.1 Results 26 5.8.2 Experiment Setup-II: 28 5.8.2.1 Results 29
  • 8.
    B.Tech. Dissertation, InformationTechnology Department, D. D. University Page v List of Tables Table 1: Example of Confusion Matrix 5 Table 2: Literature Survey Summary 8 Table 3: Effect of block size on embedding capacity and PSNR 11 Table 4: Order and Dimension of Spam features 19 Table 5: Result set of NRCS database. 26 Table 6: Result set of Corel database 28
  • 9.
    B.Tech. Dissertation, InformationTechnology Department, D. D. University Page vi List of Figures Figure 1: Block Diagram of Proposed Technique 11 Figure 2: Graph of PSNR v/s No of Bits embedded 12 Figure 3: Graph of No of bits embedded v/s size of block 12 Figure 4: LSB bit slicing technique sample 13 Figure 5: Snapshot of image from NRCS database 14 Figure 6: Snapshot of image from Corel database 15 Figure 7: Snapshot of textured image 16 Figure 8: Snapshot of facial image 17 Figure 9: Block Diagram of Steganalysis 19 Figure 10: Feature Matrix of NRCS training dataset 25 Figure 11: Ensemble output for NRCS database 26 Figure 12: Feature Matrix of Corel training dataset 28 Figure 13: Ensemble output for Corel database 29
  • 10.
    B.Tech. Dissertation, InformationTechnology Department, D. D. University Page vii Abbreviations LSB: Least Significant Bit MSE: Mean Squared Error PSNR: Peak Signal to Noise Ratio SNR: Signal to Noise Ratio SPAM: Subtractive Pixel Adjacency Matrix SRM: Spatial Rich Model SVM: Support Vector Machine Stego: Steganographic content
  • 11.
    B.Tech. Dissertation, InformationTechnology Department, D. D. University Page viii Definitions Steganography: Steganography is an art of hidden communication from a sender to a receiver. Steganalysis: Steganalysis is the practice of detecting hidden message in the stego content. Steganalyzer: The person practicing steganalysis is called the steganalyzer. Binary steganalyzer: The steganalyzer who concerned only with the presence of hidden message and simply classifies the sample as stego or non stego is called a binary steganalyzer. Quantitative steganalyzer: The steganalyzer who is estimates the message length or the number of embedding changes is called a quantitative steganalyzer. Steganographic capacity: The number of bits that can be embedded in the cover without beingdetected by the steganalyzers is called steganographic capacity. Bits per pixel (bpp): bpp is the ratio of number of bits embedded to the product of size of imageand bit scale of image.
  • 12.
    1 Introduction B.Tech. Dissertation,Information Technology Department, D. D. University Page 1 1 Introduction 1.1 Introduction to Research Problem Steganography is an art of hidden communication from a sender to a receiver. To achieve secrecy different types of cover media like images, video and audio files are used as carriers which contain the secret message. Steganography should ensure that the third party cannot conclude anything about the hidden message. The main idea here is to hide data within the cover in such a manner that the unintended receiver cannot even predict its existence. However, embedding the secret message in the cover medium generated stego content and it causes distortion in the visual as well as statistical properties of the cover medium. The embedding distortion may lead to detection of confidential message. The person trying to detect the hidden message and break steganography technique is known as steganalyst and this study of detecting messages from stego content is known as steganalysis. 1.2 Motivation for Research work Motivation for steganography is quite simple; it among the most efficient way of covert communication. Using images as cover one can transmit the confidential message easily across the Internet. Image steganography has bloomed in recent years as images serve as good carriers. Content adaptability, visual resilience and smaller size of images make them more preferable [10]. Also more detailed the image, fewer the constraints on embedding. There exist several image steganography techniques along with various attacks by steganalysts. Security of steganography technique depends on the selection of pixels. Pixels of noisy or textured area serve as good candidate for embedding bits because the variance of such areas is high compared to smoother areas. The steganographic security is challenged by the classical binary steganalyzers which predict presence or absence of hidden message within a cover media. 
Recently proposed quantitative steganalyzers go a step beyond the binary decision and estimates the hidden message within a cover media. The steganalyzer uses machine learning for detecting the stego content.
  • 13.
    1 Introduction B.Tech. Dissertation,Information Technology Department, D. D. University Page 2 1.3 Objective and Scope The objective is to develop a steganographic scheme which has large data hiding capacity while it is hidden to the steganalyzer. The challenge is to find a way to minimize the distortion caused due to message embedding which varies statistical features. If they do not change significantly then technique remains undetectable to state of the art: binary and quantitative steganalyzers. The objective of this dissertation is to find optimal steganographic capacity for cover images. To test the security of steganographic scheme testing is done on 2000 grayscale images of both NRCS and Corel database. SPAM feature extractor is used for extracting features from cover and stego images. Ensemble classifier is trained using supervised learning to learn the difference in features of cover and stego images. Lastly, random cover and stego images are given to classifier to check whether the stego images are detected or not. The goal is to embed data in such a way that the classifier is not able to detect the stego image correctly as stego image.
  • 14.
    2 Background Theory B.Tech.Dissertation, Information Technology Department, D. D. University Page 3 2 Background Theory 2.1 Steganography Problem [8] The steganography problem can be defined in the following manner: A sender more commonly known as Alice is the steganographer who wants to send a confidential message to Bob, the receiver. She possesses a source of covers for covert communication and there exists a channel for the communication. This channel is monitored by the attacker or warden who is the steganalyst. He wishes to detect the hidden message and sometimes even decode the message. One solution is to use a channel that the warden is not aware of. However, this approach is not satisfactory because it relies on ignorance of the warden. Other solution is to determine appropriate steganographic capacity for given cover. 2.2 Application of Steganography Steganography is a means of storing information in a way that hides the information’s existence. Steganography can be used to carry out confidential communication by combining it with existing communication methods. Digital Steganography provides vast potential for following [7]:  National security is a major concern for any Government. To protect national security from terrorist organizations, government can use digital steganography techniques.  In today’s competitive world businesses need to protect their trade secrets or new product information from their competitors. Hence, internally in any business they can communicate using steganography in order to avoid leakage of their private information.  Lastly, Steganography can also be used for private communication between two individuals. If a person wants to communicate without being subjected to monitoring systems then digital steganography is a good solution.
  • 15.
    2 Background Theory [B.Tech.Dissertation, Information Technology Department, D. D. University Page 4 2.3 Steganalysis Steganalysis is the countermeasure to steganography. Steganalysis is the practice of detecting hidden message in the stego image constructed using steganographic scheme. To detect the presence of secret message in an image Steganalysis tools are used. These tools track the distortion caused due to data insertion. There are three different types of steganalysis tools: visual, structural and non-structural. Visual Steganalysis attacks analyze images for some distortions which are visible to human vision system. The distortions could be visible in stego image or in LSB plane extracted from the stego image [10]. Structural attacks analyze structural properties of an image to find any anomaly which are introduced by steganography. Structural detectors such as histogram attack, sample pair analysis, RS method and weighted stego can reliably detect presence of stego data and even estimate message length [10]. Non-structural detectors use feature extractors to model cover image and to compute distortion between cover and stego image to detect embedding. A classifier is trained by the feature set from large number of stego and cover images. During training, the classifier learns the differences in features and this learning is used to classify a fresh image into stego or clean image. Non-structural detectors such as subtractive pixel adjacency matrix (SPAM) and spatial rich model (SRM) claim better probability of detection of embedding in a stego image. Feature based on steganalysis techniques use support vector machine (SVM) or ensemble classifiers for supervised learning. SVM is not suitable for any high- dimension feature vector, while this not the case with ensemble classifier but its performance is comparable to SVM [10]. For the steganalysis of proposed steganographic scheme SPAM feature extractor and ensemble classifier is used. 
SPAM feature extractor gives high-dimension feature vector of 686 features of an image. Therefore, ensemble classifier is used as SVM does not support such high dimension feature vector. Also accuracy of ensemble classifier is greater than SVM because ensemble classifier is an ensemble of classifiers and final output is the majority amongst the votes of each classifier. Details of SPAM feature extractor and ensemble classifier is discussed in section 4.
  • 16.
    2 Background Theory B.Tech.Dissertation, Information Technology Department, D. D. University Page 5 2.4 Performance Measure The performance measures used to evaluate the classifier is confusion matrix. In the field of machine learning, a confusion matrix, also known as a contingency table or an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Supervised learning is the type of learning in which the training instances are labeled with the correct class. In an unsupervised algorithm the training instances are not labeled. For steganography and steganalysis supervised learning is used as two classes are known: Stego and Non stego. So the classifier is trained using supervised learning to get satisfactory results. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another). Example: If a classification system has been trained to distinguish between stego and non stego images, a confusion matrix will summarize the results of testing the algorithm for further inspection. Assuming a sample of 50 images — 25 non stego and 25 stego images, the resulting confusion matrix could look like the table below: Predicted class Non stego Stego Actual Class Non stego 15 10 Stego 14 11 Table 1: Confusion matrix
  • 17.
    2 Background Theory B.Tech.Dissertation, Information Technology Department, D. D. University Page 6 Here there are two classes: Actual class and predicted class for stego and non stego images. Out of 25 non stego images 15 are correctly predicted as non stego and 10 as stego. Similarly out of 25 stego images 14 are correctly predicted as non stego and 10 as stego. In predictive analytics, a table of confusion (sometimes also called a confusion matrix), is a table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives. This allows more detailed analysis than mere proportion of correct guesses (accuracy). Accuracy is not a reliable metric for the real performance of a classifier, because it will yield misleading results if the data set is unbalanced (that is, when the number of samples in different classes vary greatly). For example, if there were 95 non stego and only 5 stego in the data set, the classifier could easily be biased into classifying all the samples as non stego. The overall accuracy would be 95%, but in practice the classifier would have a 100% recognition rate for the non stego class but a 0% recognition rate for the stego class. Assuming the confusion matrix above, its corresponding table of confusion, for the non stego class, would be: The final table of confusion would contain the average values for all classes combine. 5 true positives (actual non stego that were correctly classified as non stego) 3 false negatives (non stego that were incorrectly marked as stego) 2 false positives (stego that were incorrectly labeled as non stego) 17 true negatives (stego correctly classified as stego)
3 Literature Review

Saiful Islam, Mangat R Modi and Phalguni Gupta [10], in their paper "Edge-based image steganography", proposed a steganography technique that hides the secret message only in the edges of the cover image. The authors report that the technique has excellent security against steganalysis attacks.

Andrew D. Ker, Patrick Bas, Rainer Bohme, Remi Cogranne, Scott Craver, Tomas Filler, Jessica Fridrich and Tomas Pevny [8], in their paper "Moving Steganography and Steganalysis from the Laboratory into the Real World", raised some of the important questions that have been left unanswered, and highlighted some that have already been addressed successfully, for steganography and steganalysis to be used in the real world.

Jan Kodovsky, Jessica Fridrich and Vojtech Holub [9], in their paper "Ensemble Classifiers for Steganalysis of Digital Media", proposed the use of ensemble classifiers, a well-known machine learning tool, for steganalysis. They argued that ensemble classifiers scale much more favorably with respect to the number of training examples and the feature dimensionality, with performance comparable to the more complex SVMs.

Donovan Artz, in his paper "Digital Steganography: Hiding Data within Data" [7], gave a detailed description of steganography and steganalysis, highlighting their importance. The history and uses of steganography and various methods for steganography and steganalysis were discussed.

Neil F Johnson and Sushil Jajodia, in their paper "Exploring Steganography: Seeing the Unseen" [11], discussed image files and how information can be hidden in them. They also discussed results obtained from evaluating available steganographic software.
3.1 Literature review summary

Table 2: Literature survey summary

[10] Saiful Islam, Mangat R Modi and Phalguni Gupta
     Title:            Edge-based image steganography
     Year:             2014
     Method described: Edge-based image steganography
     Advantage:        Robust to visual, structural and non-structural attacks
     Accuracy:         51.1%

[8]  Andrew D. Ker, Patrick Bas, Rainer Bohme, Remi Cogranne, Scott Craver, Tomas Filler, Jessica Fridrich, Tomas Pevny
     Title:            Moving Steganography and Steganalysis from the Laboratory into the Real World
     Year:             2013
     Method described: Open problems in steganography and steganalysis discussed
     Advantage:        Hints for steganography and steganalysis techniques to address the real world
     Accuracy:         N/A

[9]  Jan Kodovsky, Jessica Fridrich, Vojtech Holub
     Title:            Ensemble Classifiers for Steganalysis of Digital Media
     Year:             2010
     Method described: Ensemble classifier
     Advantage:        Better precision and accuracy for high-dimensional feature sets. Training time is less compared to SVM
     Accuracy:         Better than SVMs

[7]  Donovan Artz
     Title:            Digital Steganography
     Year:             2001
     Method described: Summary of various methods of digital steganography
     Advantage:        Highlights the use of steganography and the tools used for it
     Accuracy:         N/A

[11] Neil F Johnson and Sushil Jajodia
     Title:            Exploring Steganography: Seeing the Unseen
     Year:             1998
     Method described: Hiding messages using image files. LSB insertion technique
     Advantage:        Easy to implement. Does not change visual and statistical properties
     Accuracy:         N/A
4 Analysis and Finding

The literature review indicates that a message embedded in noisy or textured regions is not easily detected by steganalyzers. In addition, any steganography technique should embed data without distorting the visual and statistical properties of the image. This led to the idea of using a statistical property of the image, its variance, as the threshold for embedding.

4.1 Definition of Statistical parameters

Variance: In statistics, variance measures how far a set of numbers is spread out. A variance of zero indicates that all the values are identical. Variance is always non-negative: a small variance indicates that the data points tend to be very close to the mean (expected value) and hence to each other, while a high variance indicates that the data points are very spread out around the mean and from each other [13]. The variance of a set of n equally likely values can be written as:

\[ \mathrm{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (\bar{x} - x_i)^2 \]

Mean squared error (MSE): In statistics, the mean squared error (MSE) of an estimator measures the average of the squares of the errors, that is, of the differences between the estimator and what is estimated [12]. If \(\hat{Y}\) is a vector of n predictions and \(Y\) is the vector of the true values, then the (estimated) MSE of the predictor is:

\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2 \]
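The two definitions above translate almost literally into MATLAB. The following is a small sketch on made-up vectors; the sample values and variable names are assumptions chosen only for illustration.

```matlab
% Illustrative samples; any numeric vectors of matching length would do.
x    = [2 4 4 4 5 5 7 9];     % values whose spread is measured
yHat = [3 5 2 7];             % predictions
y    = [3 4 4 5];             % true values

% Variance as defined above: mean squared deviation from the mean (divide by n).
n    = numel(x);
varX = sum((mean(x) - x).^2) / n;

% MATLAB's var(x) divides by n-1 by default; var(x, 1) divides by n instead.
varBuiltin = var(x, 1);

% MSE: mean of the squared differences between predictions and true values.
mse = sum((yHat - y).^2) / numel(y);

fprintf('Variance: %.2f (built-in: %.2f), MSE: %.2f\n', varX, varBuiltin, mse);
```

Note that `var(x)` without the second argument normalises by n-1, so it would not match the formula given in this section.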
Peak signal to noise ratio (PSNR): PSNR is an engineering term for the ratio between the maximum power of a signal and the power of the corrupting noise that affects the fidelity of its representation [14]. It is usually expressed on the logarithmic decibel scale.

\[ \mathrm{PSNR} = 10 \log_{10}\left( \frac{\mathrm{MAX}^2}{\mathrm{MSE}} \right) \]

For gray scale images the range of pixel values is 0 to 255, so

\[ \mathrm{PSNR} = 10 \log_{10}\left( \frac{255^2}{\mathrm{MSE}} \right) \]

Ideally the PSNR is infinite, which corresponds to an MSE of zero, i.e. a stego image identical to the cover image.

4.2 Proposed technique

• The proposed technique is a novel steganography technique.
• An image of size NxN is divided into sub-blocks of size nxn.
• The variance of each sub-block is calculated. The variance of the overall image is used as the threshold.
• Each block whose variance is greater than or equal to the threshold is an eligible candidate for data insertion.
• The least significant bit (LSB) embedding technique is used for data insertion.
• To maintain the visual and statistical properties of the image, the steganographic capacity, i.e. the number of bits that can be embedded in the cover image, is determined for the given cover. Embedding more bits than the steganographic capacity of the image may make the embedding detectable.
• The variance of textured areas is high. Hence, these areas are better candidates for embedding data than smoother areas, because manipulations in them are not visible to the human eye.
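The steps listed above can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions, not the implementation used in this work: the cover is a synthetic image, the message bits are arbitrary, blocks are visited in row order, and one bit is written per pixel of each eligible block. Extraction is not covered.

```matlab
% Synthetic 64x64 grayscale cover: smooth left half, textured right half.
N = 64;                                   % image is N x N
n = 8;                                    % sub-blocks are n x n
[c, r] = meshgrid(1:N, 1:N);
cover = uint8(128 * ones(N));
texture = uint8(mod(37 * r .* c, 256));
cover(:, N/2+1:end) = texture(:, N/2+1:end);

bits = mod(floor((1:500) / 3), 2);        % arbitrary message bits (0/1)

threshold = var(double(cover(:)), 1);     % variance of the whole image

stego = cover;
k = 0;                                    % number of bits embedded so far
for r0 = 1:n:N
    for c0 = 1:n:N
        block = stego(r0:r0+n-1, c0:c0+n-1);
        eligible = var(double(block(:)), 1) >= threshold;
        if eligible && k < numel(bits)
            m = min(numel(block), numel(bits) - k);   % bits that fit in this block
            % Clear the least significant bit, then write the message bit.
            block(1:m) = bitand(block(1:m), uint8(254)) + uint8(bits(k+1:k+m));
            stego(r0:r0+n-1, c0:c0+n-1) = block;
            k = k + m;
        end
    end
end

% Distortion introduced by the embedding, using the MSE and PSNR definitions.
mse = mean((double(stego(:)) - double(cover(:))).^2);
psnrValue = 10 * log10(255^2 / mse);      % Inf if no pixel changed
fprintf('Embedded %d bits, PSNR = %.2f dB\n', k, psnrValue);
```

Because the smooth half of the synthetic cover has zero block variance, none of its blocks is eligible and it is left untouched; only textured blocks carry message bits.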
Figure 1: Block Diagram of Proposed technique

4.3 Analysis

4.3.1 Effect of block size on embedding capacity and PSNR

As the block size increases, the embedding capacity increases and the PSNR decreases.

Database   Block size   Embedding capacity (bits)   PSNR (dB)
NRCS       8x8           6400                       70.30
           16x16        25600                       64.28
           32x32        51200                       60.51
           64x64        81920                       59.23

Table 3: Effect of block size on embedding capacity and PSNR
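To get a feel for how small the distortion behind the PSNR values in Table 3 is, the PSNR definition from Section 4.1 can be inverted. The figures below are derived from the table by arithmetic only and are rounded; they are not additional measurements.

```latex
\[
\mathrm{MSE} = \frac{255^2}{10^{\mathrm{PSNR}/10}}
\]
\[
\text{8x8 blocks: } \mathrm{MSE} = \frac{65025}{10^{7.030}} \approx 0.0061,
\qquad
\text{64x64 blocks: } \mathrm{MSE} = \frac{65025}{10^{5.923}} \approx 0.078
\]
```

Since LSB embedding changes a pixel value by at most 1, the MSE equals the fraction of pixels whose value actually changed, i.e. roughly 0.6% of the pixels for 8x8 blocks and roughly 7.8% for 64x64 blocks.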
Figure 2: Graph of PSNR vs. number of bits embedded

Figure 3: Graph of number of bits embedded vs. size of block
4.4 Screenshots

Example of the LSB bit slicing method, which is used in the proposed technique for embedding bits (panels: Original Image, Stego Image):

Figure 4: LSB bit slicing technique sample
Stego images constructed using the proposed steganography technique:

1) NRCS database:

Figure 5: Snapshot of image from NRCS database
2) Corel database:

Figure 6: Snapshot of image from Corel database
3) Textured image:

Figure 7: Snapshot of textured image
4) Facial image:

Figure 8: Snapshot of facial image
4.5 Steganalysis by SPAM

• The proposed technique is analysed by extracting SPAM feature sets from the stego images and from the corresponding natural (cover) images.
• These features are used to train an ensemble classifier to learn the difference in features caused by steganography.
• The ensemble classifier is trained by supervised learning: a variable Y = 1 is assigned to cover images and Y = 0 to stego images.
• Testing is performed by taking random samples of cover and stego images.
• For binary classification of the testing set, the predict method of the ensemble classifier is used.
• Training and testing are performed on images from the following databases:
  1) NRCS
  2) Corel
Figure 9: Block Diagram of Steganalysis

4.5.1 The SPAM features [6]

We now explain the Subtractive Pixel Adjacency Model of covers (SPAM) that is used to compute features for steganalysis. First, the transition probabilities along eight directions are computed. The differences and the transition probabilities are always computed along the same direction. We explain the calculations only for the horizontal direction, as the other directions are obtained in a similar manner. All direction-specific quantities are denoted by a superscript showing the direction of the calculation.

The calculation of features starts by computing the difference array \(D^{\cdot}\). For the horizontal left-to-right direction,

\[ D^{\rightarrow}_{i,j} = I_{i,j} - I_{i,j+1}, \qquad i \in \{1, \dots, m\},\ j \in \{1, \dots, n-1\}. \]

Order   T   Dimension
1st     4   162
2nd     3   686

Table 4: Dimension of the SPAM models. The 686-dimensional model is used in our experiments. Column "Order" shows the order of the Markov chain and T is the range of differences.

The first-order SPAM features, \(F_{1st}\), model the difference arrays D by a first-order Markov process. For the horizontal direction, this leads to

\[ M^{\rightarrow}_{u,v} = \Pr\left( D^{\rightarrow}_{i,j+1} = u \mid D^{\rightarrow}_{i,j} = v \right), \]

where \(u, v \in \{-T, \dots, T\}\). If \(\Pr(D^{\rightarrow}_{i,j} = v) = 0\), then \(M^{\rightarrow}_{u,v} = 0\).

The second-order SPAM features, \(F_{2nd}\), model the difference arrays D by a second-order Markov process. Again, for the horizontal direction,

\[ M^{\rightarrow}_{u,v,w} = \Pr\left( D^{\rightarrow}_{i,j+2} = u \mid D^{\rightarrow}_{i,j+1} = v,\ D^{\rightarrow}_{i,j} = w \right), \]

where \(u, v, w \in \{-T, \dots, T\}\). If \(\Pr(D^{\rightarrow}_{i,j+1} = v,\ D^{\rightarrow}_{i,j} = w) = 0\), then \(M^{\rightarrow}_{u,v,w} = 0\).

To decrease the feature dimensionality, we make the plausible assumption that the statistics in natural images are symmetric with respect to mirroring and flipping (the effect of portrait / landscape orientation is negligible). Thus, we separately average the horizontal and vertical matrices and then the diagonal matrices to form the final feature sets, \(F_{1st}\) and \(F_{2nd}\). With a slight abuse of notation, this can be formally written as

\[ F^{\cdot}_{1,\dots,k} = \frac{1}{4}\left[ M^{\rightarrow}_{\cdot} + M^{\leftarrow}_{\cdot} + M^{\downarrow}_{\cdot} + M^{\uparrow}_{\cdot} \right], \]
\[ F^{\cdot}_{k+1,\dots,2k} = \frac{1}{4}\left[ M^{\searrow}_{\cdot} + M^{\nwarrow}_{\cdot} + M^{\swarrow}_{\cdot} + M^{\nearrow}_{\cdot} \right], \qquad (1) \]
where \(k = (2T + 1)^2\) for the first-order features and \(k = (2T + 1)^3\) for the second-order features. In the experiments of [6], T = 4 was used for the first-order features, giving 2k = 162 features, and T = 3 for the second-order features, giving 2k = 686 features (cf. Table 4).

To summarize, the SPAM features are formed by the averaged sample Markov transition probability matrices (1) in the range [−T, T]. The dimensionality of the model is determined by the order of the Markov model and the range of differences T. The order of the Markov chain, together with the parameter T, controls the complexity of the model. The concrete choice depends on the application, the computational resources, and the number of images available for classifier training.

The calculation of the difference array can be interpreted as high-pass filtering with the kernel [−1, +1], which is, in fact, the simplest edge detector. The filtering suppresses the image content and exposes the stego noise, which results in a higher SNR. The filtering can also be seen as a different form of calibration. From this point of view, it would make sense to use more sophisticated filters with a better SNR. Interestingly, none of the more sophisticated filters tested in [6] provided consistently better performance. The authors of [6] attribute the superior accuracy of the simple filter [−1, +1] to the fact that it does not distort the stego noise as more complex filters do.
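As a hedged illustration of the definitions in this section, the first-order transition matrix for the left-to-right direction could be computed as below. This is a sketch, not the reference SPAM extractor [5]: the input is a toy matrix, only one of the eight directions is shown, and differences outside [−T, T] are simply not counted, which is an assumption about how the range restriction is applied.

```matlab
% First-order SPAM transition matrix for the left-to-right direction only.
I = double(magic(8));               % toy "image"; the pixel values are arbitrary
T = 4;                              % range of differences

D = I(:, 1:end-1) - I(:, 2:end);    % D(i,j) = I(i,j) - I(i,j+1)

v = D(:, 1:end-1);                  % D(i,j)
u = D(:, 2:end);                    % D(i,j+1), the horizontal successor of v

% M(u+T+1, v+T+1) = Pr( D(i,j+1) = u | D(i,j) = v ), zero if v never occurs.
M = zeros(2*T+1, 2*T+1);
for vv = -T:T
    nV = sum(v(:) == vv);           % number of occurrences of D(i,j) = vv
    if nV == 0
        continue;                   % conditional probability defined as 0
    end
    for uu = -T:T
        M(uu+T+1, vv+T+1) = sum(v(:) == vv & u(:) == uu) / nV;
    end
end

% Feature counts implied by the averaging in (1).
k1 = (2*4 + 1)^2;                   % first order,  T = 4
k2 = (2*3 + 1)^3;                   % second order, T = 3
fprintf('First order: %d features, second order: %d features\n', 2*k1, 2*k2);
```

The full feature vector would repeat this for the remaining directions and average the matrices as in (1); the second-order model adds one more conditioning difference in the same way.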
4.5.2 Ensemble Classifier

An ensemble of classifiers is a set of classifiers whose individual decisions are integrated to classify new samples [1]. The advantage of using this classifier is that the combined result tends to be more reliable, precise and accurate than that of a single classifier such as an SVM. Numerous methods are used for creating ensemble classifiers [2]:

Bagging

One of the methods is called bagging. In bagging, from a training set of size X, X random instances are drawn with replacement (i.e. using a uniform distribution). A classifier is learned from these X instances, and the process can be repeated several times. Since the draw is with replacement, the instances drawn will contain some duplicates and some omissions compared to the original training set. Each cycle through the process results in one classifier. Several such classifiers are constructed, and the final prediction is made by taking the votes of all classifiers.

Boosting

Another method is called boosting. AdaBoost is a practical version of the boosting approach. Boosting is similar in overall structure to bagging. The only difference is that one keeps track of the performance of the learning algorithm and forces it to concentrate its efforts on instances that have not been correctly learned. Instead of choosing the X training instances randomly using a uniform distribution, one chooses the training instances in such a manner as to favor the instances that have not been accurately learned. After several cycles, the prediction is performed by taking a weighted vote of the predictions of each classifier, with the weights being proportional to each classifier's accuracy on its training set.

Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings.
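The sampling step of bagging and the final vote can be illustrated with a short sketch. The training-set size and the vote matrix below are made-up values, and no actual classifier is trained here.

```matlab
% Bootstrap sample as used by bagging: X draws from 1..X with replacement.
X = 1000;                           % training-set size (illustrative)
idx = randi(X, X, 1);               % uniform draws, duplicates allowed

nUnique  = numel(unique(idx));      % distinct instances present in the sample
nOmitted = X - nUnique;             % instances that were never drawn
fprintf('%d distinct instances drawn, %d omitted\n', nUnique, nOmitted);

% Majority vote over the predictions of three hypothetical classifiers.
% Rows = test samples, columns = classifiers, 1 = cover, 0 = stego.
votes = [1 1 0;
         1 0 1;
         0 0 1];
finalLabel = sum(votes, 2) > size(votes, 2) / 2;
disp(finalLabel');
```

On average a bootstrap sample contains roughly 63% of the original instances, so each classifier in the ensemble sees a noticeably different training set.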
5 Implementation Details

MATLAB was chosen as the implementation tool. Details of the tool and its language are given below.

5.1 What is MATLAB?

MATLAB is widely used in all areas of applied mathematics, in education and research at universities, and in industry. MATLAB stands for MATrix LABoratory, and the software is built around vectors and matrices. This makes the software particularly useful for linear algebra, but MATLAB is also a great tool for solving algebraic and differential equations and for numerical integration. MATLAB has powerful graphics tools and can produce plots in both 2D and 3D. It is also a programming language, and one of the easiest for writing mathematical programs. MATLAB also has toolboxes for signal processing, image processing, optimization and many other applications, some of which are discussed later.

5.2 Installation steps

1. Load the DVD into the PC on which you want to install MATLAB. The DVD should automatically start the installation program, and the first splash screen appears. Press Next.
2. Agree to the MathWorks license. Press Next.
3. Choose the 'Typical' installation. Press Next.
4. Choose the location of the installation. Press Next.
5. If the location doesn't exist, you will be prompted to create it. Press Yes.
6. Confirm the installation settings by pressing Install.
7. MATLAB will now install; this may take several minutes.
8. After the installation has completed, you need to license your installation. Have the serial number ready; it can be found on the DVD case. Press Next.
9. MATLAB will initially make an internet connection to MathWorks before you enter the serial number.
10. Answer Yes when asked if you are a student. Press Next.
11. Continue with the rest of the registration process until the installation is complete.
5.3 What is an m-file?

An m-file, or script file, is a simple text file in which you can place MATLAB commands. When the file is run, MATLAB reads the commands and executes them exactly as it would if you had typed each command sequentially at the MATLAB prompt. All m-file names must end with the extension '.m' (e.g. test.m). If you create a new m-file with the same name as an existing m-file, MATLAB will choose the one that appears first in the path order (type help path in the command window for more information). To make life easier, choose a name for your m-file that doesn't already exist. To see if a filename.m already exists, type help filename at the MATLAB prompt.

5.4 Why use m-files?

For simple problems, entering your requests at the MATLAB prompt is fast and efficient. However, as the number of commands increases, or when trial and error is done by changing certain variables or values, typing the commands over and over at the MATLAB prompt becomes tedious. M-files are helpful and almost necessary in these cases.

5.5 How to run the m-file?

After the m-file is saved with the name filename.m in the current MATLAB folder or directory, you can execute the commands in the m-file by simply typing filename at the MATLAB command window prompt. If you don't want to run the whole m-file, you can copy the part of the m-file that you want to run and paste it at the MATLAB prompt.

5.6 Significant Language Features

MATLAB® is a mathematical scripting language that looks very much like C++. Some features of the language are:

• Efficient matrix and vector computations
• Easy creation of scientific and engineering graphics
• Application development, including graphical user interface building
• Object-oriented programming
• Extensibility (toolboxes)
• File I/O functions
• String processing
5.7 Applications

Because of MATLAB®'s numerous matrix and vector computation and manipulation algorithms, the software is primarily used for:

• Producing solutions to complex systems of equations
• Modeling, simulation, and prototyping
• Data analysis, exploration, and visualization

5.8 Experiment Setup

5.8.1 Experiment Setup I

Database: NRCS

• The stego algorithm of the proposed technique is applied to 1500 grayscale images of the NRCS database. As a result, 1500 stego images are constructed for the respective cover images. To extract the features of the cover and stego images, they are given to the SPAM feature extractor.
• Thereafter two datasets are created:
  i) Training dataset: contains 686 features for each of 3000 images (1500 cover and 1500 stego images, in pairs).
  ii) Testing dataset: contains 686 features for each of 100 images (50 cover and 50 stego images, in pairs).

Figure 10: Feature Matrix - training set
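A training set of the shape described above could be assembled as in the following sketch. The names `coverFeatures` and `stegoFeatures` are hypothetical, and random numbers stand in for the output of the SPAM feature extractor; the labelling follows the convention of Section 4.5 (1 for cover, 0 for stego).

```matlab
nPairs    = 1500;                   % cover/stego pairs in the training set
nFeatures = 686;                    % second-order SPAM features per image

% Placeholders for the SPAM extractor output, one row per image.
coverFeatures = rand(nPairs, nFeatures);
stegoFeatures = rand(nPairs, nFeatures);

X = [coverFeatures; stegoFeatures];         % 3000 x 686 feature matrix
Y = [true(nPairs, 1); false(nPairs, 1)];    % 1 = cover image, 0 = stego image

fprintf('X is %d x %d, %d cover and %d stego rows\n', ...
        size(X, 1), size(X, 2), sum(Y), sum(~Y));
```

Keeping the rows of X and Y in the same order is what lets each row of Y act as the class label for the corresponding row of X.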
• The ensemble classifier is built using the fitensemble method of MATLAB:

    Ensemble = fitensemble(X, Y, 'AdaBoostM1', 3000, 'Tree')

  where
  X is the training dataset.
  Y is the logical column vector: 0 for a stego image and 1 for a cover image. Each row of Y represents the classification of the corresponding row of X.
  'AdaBoostM1' selects the adaptive boosting algorithm for classification of two classes.
  3000 is the number of learners to be trained.
  'Tree' is a weak learner supported by fitensemble for classification.
  The output is the Ensemble model.

Figure 11: Ensemble classifier structure

• The predict method of the Ensemble model is used for binary classification of the testing dataset.

5.8.1.1 Results

• Each 512x512 image is divided into sub-blocks of various sizes: 8x8, 16x16, 32x32 and 64x64. Experiments were carried out for each of these sizes. Table 5 shows the results for the different sub-block sizes. The number of bits embedded, the bits per pixel and the accuracy of the classifier vary with the block size.
Block size   No. of bits embedded   Bits per pixel (bpp)   Accuracy   Detected
8x8           6400                  0.003                  53%        Misclassification
16x16        25600                  0.012                  73%        Misclassification
32x32        51200                  0.024                  85.2%      Misclassification
64x64        81920                  0.039                  91%        Correctly classified

Table 5: Result set for NRCS

• The classifier misclassifies until 51200 bits are embedded, i.e. the classifier is confused and is not able to correctly distinguish stego from non stego images. Beyond that it starts to classify the stego and non stego images correctly. Therefore, the steganographic capacity for the NRCS database is 51200 bits, i.e. 0.024 bits per pixel (bpp).
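The training and testing steps of Section 5.8.1 could be strung together as in the sketch below. It requires the Statistics and Machine Learning Toolbox, and it uses small synthetic data in place of SPAM features so that it runs quickly; the sizes, the number of learners and the data are assumptions, and the printed accuracy is unrelated to the results in Table 5.

```matlab
% Requires the Statistics and Machine Learning Toolbox (fitensemble, predict).
% Synthetic two-class data standing in for SPAM features; not real results.
nTrain = 200;  nTest = 40;  nFeatures = 10;
Xtrain = [randn(nTrain/2, nFeatures); randn(nTrain/2, nFeatures) + 1];
Ytrain = [true(nTrain/2, 1); false(nTrain/2, 1)];     % 1 = cover, 0 = stego
Xtest  = [randn(nTest/2, nFeatures); randn(nTest/2, nFeatures) + 1];
Ytest  = [true(nTest/2, 1); false(nTest/2, 1)];

% Boosted ensemble of decision trees, as in the fitensemble call above,
% but with only 50 learners to keep the example fast.
ens = fitensemble(Xtrain, Ytrain, 'AdaBoostM1', 50, 'Tree');

Ypred = predict(ens, Xtest);              % predicted label for each test row
accuracy = mean(Ypred == Ytest);          % fraction classified correctly
fprintf('Accuracy: %.1f%%\n', 100 * accuracy);
```

An accuracy close to 50% on balanced test data means the classifier cannot separate the two classes, which is the situation labelled "Misclassification" in the result tables.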
5.8.2 Experiment Setup II

Database: Corel

• The same procedure as discussed above for the NRCS database is carried out for grayscale images of the Corel database. The only difference is that 2000 images are used, which leads to the following changes:
• The training dataset consists of 686 features for each of 3800 images (1900 cover and 1900 stego images, in pairs).
• The testing dataset consists of 686 features for each of 200 images (100 cover and 100 stego images, in pairs).
• The ensemble classifier is built by passing 3800 as the number of base learners to fitensemble.

Figure 12: Feature Matrix - training set
Figure 13: Ensemble Output

5.8.2.1 Results

• Each 512x512 image is divided into sub-blocks of various sizes: 8x8, 16x16, 32x32 and 64x64. Experiments were carried out for each of these sizes. Table 6 shows the results for the different sub-block sizes. The number of bits embedded, the bits per pixel and the accuracy of the classifier vary with the block size.
Block size   No. of bits embedded   Bits per pixel (bpp)   Accuracy   Detected
8x8           6400                  0.003                  56%        Misclassification
16x16        25600                  0.012                  75%        Misclassification
32x32        51200                  0.024                  98%        Correctly classified
64x64        81920                  0.039                  100%       Correctly classified

Table 6: Result set for Corel database

• The classifier misclassifies until 25600 bits are embedded, i.e. the classifier is confused and is not able to correctly distinguish stego from non stego images. Beyond that it starts to classify the stego and non stego images correctly. Therefore, the steganographic capacity for the Corel database is 25600 bits, i.e. 0.012 bits per pixel (bpp).
6 Conclusion and Future Work

From the result sets it can be concluded that the proposed steganographic scheme provides high steganographic capacity compared to other prevailing steganographic schemes and remains robust to the attacks of steganalysts.

To make the steganographic scheme more secure, a randomization technique can be used. In this technique, data is embedded in a block chosen randomly inside the sub-block of the image. Because the candidate block for data insertion is chosen randomly, the classifier is not able to detect the presence of stego content. However, the steganographic capacity is lower than that of the proposed technique. Future work can address this problem by increasing the steganographic capacity of the randomization technique.
References

1) http://www.cs.iastate.edu/~jtian/cs573/WWW/Lectures/lecture13-Ensemble-2up.pdf, last accessed on 27/03/2015
2) http://en.wikipedia.org/wiki/Ensembles_of_classifiers, last accessed on 22/03/2015
3) http://en.wikipedia.org/wiki/Steganography, last accessed on 22/03/2015
4) http://en.wikipedia.org/wiki/Steganalysis, last accessed on 22/03/2015
5) http://dde.binghamton.edu/download/spam/, last accessed on 27/01/2015
6) Pevny, T., Bas, P., Fridrich, J., "Steganalysis by Subtractive Pixel Adjacency Matrix", IEEE Transactions on Information Forensics and Security, Volume 5, Issue 2, http://ws2.binghamton.edu/fridrich/Research/paper_6_dc.pdf, last accessed on 28/01/2015
7) Artz, D., "Digital steganography: hiding data within data", IEEE Internet Computing, Volume 5, Issue 3, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=935180, last accessed on 23/12/2014
8) Andrew D. Ker, Patrick Bas, Rainer Bohme, Remi Cogranne, Scott Craver, Tomas Filler, Jessica Fridrich, Tomas Pevny, "Moving Steganography and Steganalysis from the Laboratory into the Real World", Proceedings of the first ACM workshop on Information hiding and multimedia security, DOI: 10.1145/2482513.2482965, http://www.cs.ox.ac.uk/andrew.ker/docs/ADK57B.pdf, last accessed on 15/12/2014
9) Kodovsky, J., Fridrich, J., Holub, V., "Ensemble Classifiers for Steganalysis of Digital Media", IEEE Transactions on Information Forensics and Security, Volume 7, Issue 2, http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6081929&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F10206%2F6166811%2F06081929.pdf%3Farnumber%3D608192
10) Saiful Islam, Mangat R Modi and Phalguni Gupta, "Edge-based image steganography", EURASIP Journal on Information Security 2014, 2014:8, http://jis.eurasipjournals.com/content/pdf/1687-417X-2014-8.pdf
11) Johnson, N.F., Jajodia, S., "Exploring steganography: Seeing the unseen", Computer, Volume 31, Issue 2, pages 26-34, http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4655281&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D465528
12) http://en.wikipedia.org/wiki/Mean_squared_error
13) http://en.wikipedia.org/wiki/Variance
14) http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
15) Madhuri A. Joshi, Mehul S. Raval, Yogesh H. Dandawate, Kalyani R. Joshi, Shilpa P. Metkar, Image and Video Compression: Fundamentals, Techniques and Applications, Chapman and Hall / CRC, FL, US, 17th Nov. 2014, DOI: 10.1201/b17738-1, Print ISBN 978-1-4822-2822-9, eBook ISBN 978-1-4822-2823-6.