Literature Survey on Interest Points based Watermarking

Multimedia System (CS 569)
Literature Survey
On
SIFT based Video Watermarking
Group Members
Priyatham Bollimapalli 10010148
Pydi Peddigari Venkat Sai 10010149
Pasumarthi Venkata Sai Dileep 10010180

Contents
1. Interest Point Detectors
1.1. Motivation
1.2. Harris Corner Detector
1.3. Scale Invariant Feature Point Detector
1.4. Harris 3D
1.5. n-SIFT
1.6. MoSIFT
1.7. Discussion
2. An Overview of Watermarking Techniques
2.1. Introduction
2.2. Digital Watermarking
2.3. Classifications of Watermarking
3. Application of SIFT in Image Watermarking
3.1. Introduction
3.2. Local Invariant Features
3.3. Watermarking Scheme
3.4. Other related works
4. Video Watermarking
4.1. Introduction
4.2. Applications of watermarking video content
4.3. Challenges in video watermarking
4.3.1. Various nonhostile video processings
4.3.2. Resilience against collusion
4.3.3. Real-time watermarking
4.4. The major trends in video watermarking
4.4.1. From still image to video watermarking
4.4.2. Integration of the temporal dimension
4.4.3. Exploiting the video compression formats
4.5. Discussion
5. Application of SIFT in Video Watermarking
6. Bibliography

1. Interest Point Detectors
1.1. Motivation
With the widespread distribution of digital information over the World Wide Web
(WWW), the protection of intellectual property rights has become increasingly
important. Digital information, which includes still images, video, audio and text,
can be copied without loss of quality and distributed efficiently. Because
reproduction, retransmission and even manipulation are easy, a pirate (a
person or organization) can violate the copyright of the real owner.
Digital watermarking is regarded as a promising tool for protecting intellectual
property rights. The ideal properties of a digital watermark include
imperceptibility and robustness. Imperceptibility means that the watermarked data
should retain the quality of the original as closely as possible. Robustness refers
to the ability to detect the watermark after various types of intentional or
unintentional alterations (so-called attacks).
The robustness of the watermark against geometrical attacks is a major problem in
the field of watermarking. Even a minor geometrical manipulation of the watermarked
image dramatically reduces the ability of the watermark detector to detect the
watermark. Moreover, due to the diversity of applications and devices used by
consumers today, video streams are adapted to the requirements of various
communication channels and user-end display devices. Such streams suffer from
content adaptation attacks, where scaling of the resolution, quality and frame rate
of the video corrupts the embedded data and causes problems for watermarking.
Thus, there is a need for identification of stable interest/feature points in videos
which are invariant to rotation, scaling, translation, and partial illumination
changes. These points can be used as the reference locations for both the
watermark embedding and detection process.
Feature point detectors are used to extract the feature points. This section describes
two of the most popular feature point detectors today, namely Harris corner
detector and scale invariant feature point detector. Their extension to videos in the
form of 3D Harris, n-SIFT and MoSIFT is discussed. The drawbacks of each
technique and the scope for further improvements are also discussed.

1.2. Harris Corner Detector
The ubiquitous Harris corner detector starts with the assumption that corners
are the interest points in an image. A corner is defined as a point at which the
direction of the boundary of an object changes abruptly. A corner can therefore be
recognized within a small window: shifting the window in any direction should give
a large change in intensity.
The salient features are:
• The intensity variation E for a shift (x, y) of the window is approximated by
an analytic expansion about the origin of the shifts, using a Taylor series.
• A circular Gaussian window is used to weight the intensity variations
in the neighborhood, so that variations closer to the center of the
window are given higher importance while the weighting stays smooth over
the entire window.
• Instead of considering only the minimum of E(x, y) over the shift directions,
its variation with the direction of shift is considered. The intensity variation
is expressed in matrix form, and the R-measure is calculated from this matrix
as a measure of the change in intensity in both the x and y directions.
• Corners are detected as the local maxima of R over the 8-neighbourhood of a
pixel.
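The R-measure can be illustrated with a minimal sketch. It assumes the common
form R = det(M) - k * trace(M)^2, where M is the window-weighted matrix of
gradient products; the constant k = 0.04, the 3x3 binomial window (standing in
for the Gaussian window) and the central-difference gradients are illustrative
choices, not values taken from the surveyed work.

```python
def harris_response(img, k=0.04):
    """Return the Harris R-measure for a 2-D list of intensities.

    Border pixels are left at 0.0. The 3x3 binomial window and k are
    assumptions made for this sketch.
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients; the outermost ring stays zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    # Small smooth window approximating a Gaussian.
    win = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    r = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = b = c = 0.0  # entries of M = [[a, b], [b, c]]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    wgt = win[dy + 1][dx + 1] / 16.0
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    a += wgt * gx * gx
                    b += wgt * gx * gy
                    c += wgt * gy * gy
            det = a * c - b * b
            trace = a + c
            r[y][x] = det - k * trace * trace
    return r
```

On an image containing a bright square, R is positive at the corner of the
square, negative along its straight edges and zero in flat regions, which is why
corners can be picked out as local maxima of R.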

1.3. Scale Invariant Feature Point Detector
The SIFT detector developed by Lowe involves the following step-by-step process.
Scale space peak selection:
The scale-space representation is a set of images represented at different levels of
resolution. The different levels of resolution are created by the convolution of the
Gaussian kernel G(σ) with the image I(x1, x2):
Is(x1, x2, σ) = G(σ) * I(x1, x2)
where * is the convolution operation in x1 and x2.
The standard deviation σ of the Gaussian kernel is referred to as the scale parameter.
The characteristic scale is a feature relatively independent of the image scale. It
can be defined as the scale at which the response of a differential operator is
maximized. Among such operators, the Laplacian obtains the highest percentage of
correct scale detection.
The feature points are detected through a staged filtering approach that identifies
stable points in the scale-space. To detect stable keypoint locations in scale
space efficiently, the scale-space extrema of the difference-of-Gaussian function
(DDG(x1, x2, σ)) convolved with the image are used. To detect the local maxima and
minima of DDG(x1, x2, σ), each point is compared with its 8 neighbors at the same
scale and with the 9 neighbors in each of the scales above and below. If its value
is the minimum or maximum of all these points, the point is an extremum.
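The neighbour comparison can be sketched as follows. The layout of the
difference-of-Gaussian stack as a nested list dog[scale][row][column] and the
function name are assumptions made for illustration; the check itself compares a
sample with its 8 neighbours at the same scale and 9 in each adjacent scale,
26 in total.

```python
def is_scale_space_extremum(dog, s, y, x):
    """Return True if dog[s][y][x] is strictly above or strictly below
    all 26 neighbours in the 3x3x3 block centred on it.

    The caller is assumed to pass interior indices only.
    """
    centre = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return centre > max(neighbours) or centre < min(neighbours)
```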

Key point localization:
Once the candidate points are found, points with low contrast and poorly
localized points are removed by measuring the stability of each feature point at its
location and scale. Similar to the Harris corner detector, this is done using the
Hessian matrix and a Taylor series expansion.
Orientation Assignment:
Orientation of each feature point is assigned by considering the local image
properties. The keypoint descriptor can then be represented relative to this
orientation, achieving invariance to rotation.
An orientation histogram is formed from the gradient orientations of sample points
within a region (a circular window) around the keypoint. Each sample added to the
histogram is weighted by its gradient magnitude and by a Gaussian-weighted
circular window. Peaks in the orientation histogram correspond to the dominant
orientations of the local gradients. A keypoint is created for the highest peak and
for any other local peak within 80% of its height, each with the corresponding
orientation. Some points will therefore be assigned multiple orientations.
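A minimal sketch of the peak selection is given below. The 36-bin histogram is an
assumption (the text above does not fix the bin count), as is the input format of
(angle in degrees, gradient magnitude, window weight) triples; peak interpolation
is omitted.

```python
def dominant_orientations(samples, num_bins=36, peak_ratio=0.8):
    """Return the bin-centre angles (degrees) of the highest histogram peak
    and of every other local peak within peak_ratio of its height.

    samples: iterable of (angle_degrees, magnitude, window_weight).
    """
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for angle, magnitude, weight in samples:
        index = int((angle % 360.0) // bin_width) % num_bins
        hist[index] += magnitude * weight
    top = max(hist)
    peaks = []
    for i, value in enumerate(hist):
        left = hist[i - 1]                # wraps around for i == 0
        right = hist[(i + 1) % num_bins]
        is_local_peak = value > left and value > right
        if value > 0 and is_local_peak and value >= peak_ratio * top:
            peaks.append(i * bin_width + bin_width / 2.0)
    return peaks
```

With two strong gradient directions whose histogram heights differ by less than
20%, the function returns two angles, so the keypoint would be duplicated with
two orientations.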
Key point descriptor:
The keypoint descriptor is computed from the local gradient data. To achieve
rotation invariance, the coordinates of the descriptor and the gradient
orientations are rotated to line up with the orientation of the keypoint. The
gradient magnitude is weighted by a Gaussian function whose variance depends on
the keypoint scale. This data is then used to create a set of histograms over a
window centered on the keypoint. SIFT uses a set of 16 histograms, aligned in a
4x4 grid, each with 8 orientation bins. This gives a feature vector containing
4x4x8 = 128 elements.
The figure shows 2x2 array of orientation histograms.

The histogram values are stored in a vector. The vector is then normalized to unit
length to account for contrast changes and to obtain illumination invariance. To
handle non-linear intensity transforms, each element of the unit vector is capped
at a maximum of 0.2, i.e. the influence of large gradients is reduced. The vector
is then renormalized to unit length.
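The normalize, cap and renormalize steps can be sketched as follows; the function
name is hypothetical and the cap of 0.2 is the value quoted above.

```python
import math


def normalize_descriptor(vec, clamp=0.2):
    """Normalize to unit length, cap each element at `clamp`, renormalize."""

    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)

    capped = [min(x, clamp) for x in unit(vec)]
    return unit(capped)
```

Because the first step divides by the vector length, multiplying all gradient
magnitudes by a constant (a contrast change) leaves the result unchanged.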
Key point matching:
The descriptors for the two images to be compared are calculated. For each
keypoint, the nearest neighbor, i.e. the keypoint with the minimum Euclidean
distance between descriptors, is found. To reject ambiguous matches, SIFT accepts
a match only if the ratio of the distance to the best match to the distance to the
second-best match is less than 0.8.
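A brute-force sketch of nearest-neighbour matching with the distance-ratio test
follows. It uses Lowe's usual formulation, in which a match is kept only when the
best distance is smaller than 0.8 times the second-best distance; the function
name and the list-of-lists descriptor format are assumptions, and a practical
implementation would use an approximate nearest-neighbour index instead of the
exhaustive search shown here.

```python
import math


def match_keypoints(desc_a, desc_b, max_ratio=0.8):
    """Return (i, j) index pairs that pass the distance-ratio test."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i of image A to every descriptor of B.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs a second-best candidate
        (best, j), (second, _) = dists[0], dists[1]
        if best < max_ratio * second:
            matches.append((i, j))
    return matches
```

A descriptor that has two almost equally close candidates is discarded, since
either of them could be the wrong match.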
1.4. Harris 3D
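The Harris 3-D space-time interest point detector was developed by Ivan Laptev. It
uses the 3-D gradient structure together with a scale-space representation to find
interest points.
The video f(x,y,t) is convolved with a 3D Gaussian g with spatial variance
$\sigma^2$ and temporal variance $\tau^2$ to give
$L(x, y, t; \sigma^2, \tau^2) = g(x, y, t; \sigma^2, \tau^2) * f(x, y, t)$
The 3D Harris matrix is then computed analogously to the 2-D Harris matrix, by
smoothing the products of the first-order derivatives of L with a Gaussian at the
integration scales $s\sigma^2$ and $s\tau^2$, to obtain
$\mu = g(x, y, t; s\sigma^2, s\tau^2) * \begin{pmatrix} L_x^2 & L_x L_y & L_x L_t \\ L_x L_y & L_y^2 & L_y L_t \\ L_x L_t & L_y L_t & L_t^2 \end{pmatrix}$
Spatio-temporal interest points are detected as the local maxima of the 3-D Harris
corner measure defined by
$H = \det(\mu) - k\,\mathrm{trace}^3(\mu)$
The spatial and temporal scales of each interest point are determined as the
maxima of the scale-normalized Laplacian.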

1.5. n-SIFT
n-SIFT is a direct extension of SIFT from 2D images to arbitrary nD images.
n-SIFT uses an nD Gaussian scale-space to find interest points and then describes
them using nD gradients in terms of orientation bins, as the SIFT descriptor does.
The method used to find the interest points is the same as in SIFT and can be
understood directly from the following figures.
Figures: Scale-Space Pyramid, Local Maxima, Orientation Assignment

n-SIFT creates a $2^{5n-3}$-dimensional feature vector by closely following the
descriptor computation steps of SIFT:
1. First, the gradients in a $16^n$ hypercube around the interest point are
calculated and expressed in terms of a magnitude and (n-1) orientation directions.
The gradient magnitudes are weighted by a Gaussian centered at the interest point
location.
2. An (n-1)-orientation bin histogram is built by adding each voxel's gradient
magnitude to the bin corresponding to its orientation, and the bin with the
highest value is taken as the orientation.
3. The hypercube is split into $4^n$ subregions, each of which is described by 8
bins for each of the (n-1) directions. Each subregion is thus represented by
$8^{n-1}$ bins, and in total a $4^n \cdot 8^{n-1} = 2^{5n-3}$-dimensional feature
vector is created.
4. The feature vector is normalized as in the case of SIFT.
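The descriptor length can be checked with a few lines of arithmetic. The function
name is hypothetical; the counts (4 subregions per axis, 8 bins per orientation
direction) are those given in the steps above.

```python
def nsift_descriptor_length(n, regions_per_axis=4, bins_per_direction=8):
    """Number of elements in an n-SIFT descriptor for an n-dimensional image:
    4^n subregions, each with 8^(n-1) orientation bins."""
    return regions_per_axis ** n * bins_per_direction ** (n - 1)


for n in (2, 3, 4):
    # 4^n * 8^(n-1) = 2^(2n) * 2^(3n-3) = 2^(5n-3)
    assert nsift_descriptor_length(n) == 2 ** (5 * n - 3)
```

For n = 2 this gives the familiar 128-element SIFT vector, while for video
treated as a 3D image (n = 3) the descriptor already has 4096 elements.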

1.6. MoSIFT
The MoSIFT descriptor describes both the spatial gradient structure and the local
motion structure. The algorithm is shown in the following figure.
The SIFT points whose optical flow exceeds a minimum threshold are chosen as
the spatio-temporal interest points.
For the spatial dimensions, the SIFT descriptor is computed as described before. To
describe local motion, a descriptor is computed from the local optical flow in the
same way as the SIFT descriptor is computed from the image gradients. The local
16 x 16 patch is divided into sixteen 4 x 4 patches, each of which is described by
an 8-bin orientation histogram computed from the optical flow. The sixteen 8-bin
histograms are concatenated into a 128-bin optical flow histogram that describes
the local motion. The final descriptor is obtained by concatenating the SIFT and
optical flow descriptors, giving a 256-dimensional descriptor.
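The layout of the motion half of the descriptor can be sketched as below. This is
a simplified illustration: the optical flow is assumed to be given as two 16 x 16
lists of x and y components, and the Gaussian weighting and bin interpolation used
by SIFT are left out. The function names are hypothetical.

```python
import math


def grid_orientation_histogram(vx, vy, cells=4, bins=8):
    """Build cells x cells histograms of `bins` orientations from a square
    vector field (vx, vy), weighting each vector by its magnitude."""
    size = len(vx)
    cell = size // cells
    hist = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            mag = math.hypot(vx[y][x], vy[y][x])
            if mag == 0.0:
                continue
            angle = math.atan2(vy[y][x], vx[y][x]) % (2 * math.pi)
            b = min(int(angle / (2 * math.pi) * bins), bins - 1)
            c = (y // cell) * cells + (x // cell)
            hist[c * bins + b] += mag
    return hist


def mosift_descriptor(sift_part, flow_x, flow_y):
    """Concatenate a 128-element SIFT vector with the 128-bin flow histogram."""
    return list(sift_part) + grid_orientation_histogram(flow_x, flow_y)
```

For a 16 x 16 flow field the second function returns 128 + 128 = 256 values.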

1.7. Discussion
A brief critique of each detector and descriptor is given below.
Harris-Corner: The corner points detected by the Harris corner detector are
invariant to rotation. However, they are susceptible to scaling of the image and
depend on the scale at which the derivatives, and hence the intensity variations,
are computed.
SIFT: The SIFT detector considers local image characteristics and retrieves feature
points that are invariant to image rotation, scaling, translation, partial
illumination changes and projective transforms. The interest points are very robust
and can be matched efficiently across similar images.
Harris 3-D: Harris 3D does not have a method for descriptor computation of its own.
As such, the concatenated histogram of gradients descriptor provided by Piotr
Dollár is used along with the detector. Among the interest points detected by
Harris 3-D, some spatial points with no motion are also captured. These points
have high gradients in all three dimensions even though there is no motion, which
shows the susceptibility of this technique to intensity variations.
This is a general problem of spatio-temporal interest point detectors, since they
do not explicitly compute the motion between frames and instead rely on the
gradient magnitude, which is susceptible to such intensity variations between
frames. The problem can be solved to some extent by Gaussian smoothing along the
spatial and temporal domains.
n-SIFT: The feature point detection step detects a large number of spatial interest
points without motion. The techniques used to remove unstable edges and points
due to variations in contrast (as in SIFT) were tried here without any success.
Such techniques include thresholding on the ratio of all three eigenvalues of the
Hessian matrix, or computing the 2-D Hessian and using Lowe’s threshold criterion.
The problem can be solved to some extent by Harris 3-D, which uses a decoupled
Gaussian convolution that effectively weights the gradient computation and
hence handles inter-frame brightness variation.
Another problem with n-SIFT is its large memory usage. Since it treats the video
as a 3D image and builds octaves from it, the memory requirement is very large.
When handling 1000 frames of resolution 200 x 300, the algorithm was observed to
take 8GB of memory.

MoSIFT: Motion-SIFT captures multiple similar points with the same motion,
which is redundant. The algorithm detects points on a frame-by-frame basis,
and hence the redundancy, though useful for obtaining a sufficient number of
points, might not be efficient in terms of repeatability. This is because optical
flow is not temporally invariant. So instead of merely thresholding the optical
flow, the local characteristics of the optical flow may be used to further prune
the interest points for stability in the temporal domain.
The optical flow magnitude in the region around an interest point tends to be
similar, and hence a descriptor based on the optical flow structure around the
interest point need not be unique. If instead the local motion trajectory is
encoded in terms of the optical flow values of the interest point over time in the
nearby frames, the descriptor can be unique, since natural motion tends to vary
with time.
2. An Overview of Watermarking Techniques
2.1. Introduction
In the early days, encryption and access control techniques were used to protect the
ownership of media. More recently, watermarking techniques have been used to
protect the copyright of media. Digital content is spreading rapidly across the
world via the internet, and an unlimited number of copies identical to the original
data can be produced. The rapid development of new IT technologies for
multimedia services has resulted in a strong demand for reliable and secure
copyright protection techniques for multimedia data.
Digital watermarking is a technique for embedding invisible or inaudible data within
multimedia content. The watermarked content carries particular data for copyright
purposes. The hidden data is called a watermark, and it can take the form of an
image or any other type of media. In case of an ownership conflict during
distribution, digital watermarking makes it possible to search for and extract the
evidence of ownership.

2.2. Digital Watermarking
The principles of watermark embedding & detection:
If an original image I and a watermark W are given, the watermarked image I’ is
represented as I’ = I + f(I,W) . An optional public or secret key k may be used for
this purpose.
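A minimal sketch of the embedding rule I' = I + f(I, W) is shown below. It makes
several assumptions for illustration: the image is a flat list of pixel values,
the watermark is a key-seeded Gaussian pseudo-random sequence, and the embedding
function is the simplest possible choice, f(I, W) = alpha * W with a hypothetical
strength alpha. Practical schemes usually make f depend on I as well, for example
through a perceptual mask.

```python
import random


def make_watermark(length, key):
    """Generate a reproducible pseudo-random watermark from a secret key."""
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def embed(image, watermark, alpha=2.0):
    """Additive embedding: I' = I + alpha * W (assumed form of f)."""
    return [p + alpha * w for p, w in zip(image, watermark)]
```

Anyone holding the same key can regenerate the same watermark at the detector,
while a different key produces an unrelated sequence.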
Generic Watermark insertion
Watermark extraction and detection
The embedded watermark can be extracted later in many ways, and there are several
ways to evaluate the similarity between the original and the extracted
watermarks. The most commonly used similarity measures are correlation-based,
and a widely used one is denoted sim(w, w*).
To decide whether w and w* match, one may test whether sim(w, w*) > T, where T is
some threshold.
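The threshold test can be sketched as follows. The particular measure used here,
sim(w, w*) = (w* . w) / sqrt(w* . w*), is one common correlation-based choice and
is an assumption of this sketch, as are the function names and the demonstration
data (a watermark of length 1000, additive noise as the attack, and T = 6).

```python
import math
import random


def sim(w, w_star):
    """Correlation of the original watermark w with the extracted w*,
    normalized by the length of w* (assumed similarity measure)."""
    energy = math.sqrt(sum(x * x for x in w_star))
    if energy == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(w, w_star)) / energy


def is_present(w, w_star, threshold):
    return sim(w, w_star) > threshold


rng = random.Random(1)
w = [rng.gauss(0.0, 1.0) for _ in range(1000)]
attacked = [a + rng.gauss(0.0, 0.5) for a in w]       # noisy extraction of w
unrelated = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # some other watermark
```

With these inputs the noisy copy of w scores far above the threshold, while an
unrelated watermark scores close to zero.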
Main characteristics for a watermarking algorithm:
• Invisibility: the embedded watermark is not visible.
• Robustness: piracy attacks or image processing should not affect the
embedded watermark.
• Security: a particular watermark signal is tied to a secret number (key) used
for embedding and extracting.

2.3. Classifications of Watermarking
• On perceptivity:
o Visible watermarking
o Invisible watermarking
• On robustness:
o Robust watermarking: robustness is the most important factor in
digital watermarking, and robust watermarking is the most common
case.
o Semi-fragile watermarking: a semi-fragile watermark is capable of
tolerating some degree of change to a watermarked image, such as
the addition of quantization noise from lossy compression.
o Fragile watermarking: a fragile watermark is designed to be easily
destroyed if a watermarked image is manipulated in the slightest
manner. This watermarking method can be used for the protection and
the verification of original contents.
3. Application of SIFT in Image Watermarking
3.1. Introduction
The following survey is based on the work on robust image watermarking using local
invariant features by Lee, Kim, et al. Most previous watermarking algorithms are
unable to resist geometric distortions that desynchronize the location where
copyright information is inserted. In this work, a watermarking method that is
robust to geometric distortion is proposed.
Geometric distortion desynchronizes the location of the watermark and hence causes
incorrect watermark detection. Using the media content itself as a reference is one
solution for watermark synchronization, and this method belongs to that approach.
In this case, the location of the watermark is tied not to image coordinates but
to image semantics.
In content-based synchronization methods, the selection of features is a major
criterion. It is believed that local image characteristics are more useful than
global ones. As discussed in the previous sections, SIFT extracts features by
considering the local image properties and is invariant to rotation, scaling,
translation, and partial illumination changes.

Using SIFT, circular patches invariant to translation and scaling distortions are
generated. The watermark is inserted into the circular patches in an additive way in
the spatial domain, and the rotation invariance is achieved using the translation
property of the polar-mapped circular patches.
3.2. Local Invariant Features
As discussed in Section 1, the SIFT descriptor extracts features and their
properties, such as the location (t1, t2), the scale s, and the orientation theta.
Modifications for Watermarking:
The local features from the SIFT descriptor are not directly applicable to
watermarking. The SIFT descriptor was originally devised for image-matching
applications, so it extracts many features that are densely distributed
over the whole image. Hence, the number, distribution, and scale of the features
are adjusted, and features that are susceptible to watermark attacks are removed.
A circular patch is constructed using only the location (t1, t2) and scale s of
the extracted SIFT features, as follows:
$(x - t_1)^2 + (y - t_2)^2 = (k s)^2$
where k is a magnification factor to control the radius of the circular patches. These
patches are invariant to image scaling and translation as well as spatial
modifications.
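The scale invariance of the patch can be illustrated with a short sketch. It
assumes the patch is the disc of radius k * s centred on the feature location
(t1, t2); the function name and the numeric values are illustrative only.

```python
def in_patch(x, y, t1, t2, scale, k):
    """True if pixel (x, y) lies inside the circular patch of radius
    k * scale centred at the feature location (t1, t2)."""
    return (x - t1) ** 2 + (y - t2) ** 2 <= (k * scale) ** 2


# A pixel two units from a feature of scale 2 with k = 3 is inside the patch.
assert in_patch(12, 10, 10, 10, scale=2.0, k=3)
# After the image is enlarged by a factor of 2, the feature location, its
# scale and the pixel position all double, so the pixel is still inside.
assert in_patch(24, 20, 20, 20, scale=4.0, k=3)
```

Because the radius follows the feature scale, the patch covers the same image
content before and after resizing, which is what makes it usable as a
synchronization reference.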
The distance between adjacent features must also be taken into consideration. If the
distance is small, patches will overlap in large areas, and if the distance is large,
the number of patches will not be sufficient for the effective insertion of the
watermark. The distance D between adjacent features depends on the dimensions
of the image, its width w and height h, and is quantized by the r value. The r
value is a constant that controls the distance between adjacent features and is
set at 16 and 32 in the insertion and detection processes, respectively.
3.3. Watermarking Scheme
Watermark Generation:
A 2-D rectangular watermark that follows a Gaussian distribution is generated
using a random number generator. The rectangular watermark is treated as a
polar-mapped watermark and is inversely polar-mapped to assign the insertion
locations within the circular patches. Note that the size of the circular patches
differs, so a separate circular watermark should be generated for each patch.
M and N are the dimensions of the rectangle and r is the radius of a circular patch.
The circular patch is divided into homocentric regions. To generate the circular
watermark, the x- and y-axes of the rectangular watermark are inversely
polar-mapped into the radius and angle directions of the patch. In the relation
between the coordinates of the rectangular watermark and the circular watermark,
x and y are the rectangular watermark coordinates, ri and theta are the
coordinates of the circular watermark, rM is equal to the radius of the patch, and r0
is a fixed fraction of rM.
To increase the robustness and invisibility of the inserted watermark, we transform
the rectangular watermark to be mapped to only the upper half of the patch, i.e., the
y-axis of the rectangular watermark is scaled by the angle of a half circle, not the
angle of a full circle. The lower half of the patch is set symmetrically with respect
to the upper half.
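The inverse polar mapping can be sketched as a lookup from a pixel offset inside
the patch to a cell of the rectangular watermark. The sketch assumes a linear
mapping, x = M (r - r0) / (rM - r0) along the radius and y = N theta / pi along
the half-circle angle, and it assumes that the lower half repeats the upper half
through the patch centre (theta and theta + pi share a value). The function name,
the parameter r0_fraction and its default of 0.25 are hypothetical.

```python
import math


def circular_watermark_value(rect, dx, dy, r_max, r0_fraction=0.25):
    """Look up the watermark value for a pixel at offset (dx, dy) from the
    patch centre. rect has N rows (angle axis) and M columns (radius axis).
    Pixels outside the ring r0 <= r < r_max carry no watermark."""
    n_rows, m_cols = len(rect), len(rect[0])
    r0 = r0_fraction * r_max
    r = math.hypot(dx, dy)
    if r < r0 or r >= r_max:
        return 0.0
    theta = math.atan2(dy, dx) % (2 * math.pi)
    if theta >= math.pi:
        theta -= math.pi  # lower half mirrors the upper half (assumption)
    x = min(int(m_cols * (r - r0) / (r_max - r0)), m_cols - 1)
    y = min(int(n_rows * theta / math.pi), n_rows - 1)
    return rect[y][x]
```

Since the radius is measured relative to rM, patches of different sizes draw from
the same rectangular watermark, each at its own resolution.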
Watermark insertion:
This consists of the following steps:
• Circular patches are extracted using the SIFT descriptor. The watermark is
inserted into all the patches of the image to increase the robustness of the scheme.
• A circular watermark, which depends on the radius of the patch, is generated
as described above.
• The watermark is inserted additively in the spatial domain. The insertion is
represented as the spatial addition between the pixels of the image
and the pixels of the circular watermark as follows:
$v'_i = v_i + \Lambda_i w_{ci}$
where vi and wci denote the pixels of the image and of the
circular watermark, respectively, and $\Lambda$ denotes the perceptual mask that
controls the insertion strength of the watermark.
Watermark detection:
This consists of the following steps:
• Circular patches are extracted using the SIFT descriptor. When there are several
patches in an image, watermark detection is applied to all the patches.
• The additive watermarking method in the spatial domain inserts the
watermark into the image contents as noise. Therefore, a Wiener filter is first
applied to extract this noise by calculating the difference between the
watermarked image and its Wiener-filtered image, and that difference is
regarded as the retrieved watermark.
• To measure the similarity between the reference watermark generated during
watermark insertion and the retrieved watermark, the retrieved circular
watermark is converted into a rectangular watermark by applying the
polar-mapping technique. Since the watermark is inserted symmetrically, the
mean value of the two semi-circular areas is taken. With this mapping, a
rotation of the circular patch is represented as a translation,
and hence rotation invariance is achieved for the watermarking scheme.
• There are several circular patches in an image. If the watermark is detected
in at least one patch, ownership is proved, and not otherwise. As the
watermark is inserted into several circular patches, rather
than just one, it is highly likely that the proposed scheme will detect the
watermark, even after image distortions.
This watermarking scheme is robust against geometric distortion attacks as
well as signal-processing attacks. Scaling and translation invariance is achieved by
extracting circular patches from the SIFT descriptor. Rotation invariance is
achieved by using the translation property of the polar mapped circular patches.

3.4. Other related works
Hanling, Jie et al. proposed a novel robust image watermarking scheme for digital
images using local invariant features and Independent Component Analysis (ICA).
This method belongs to the blind watermarking category, since it uses ICA for
detection, which does not need the original image.
Framework for Watermark detection
It differs from the scheme above in that it uses FastICA for the watermark
extraction. Here Iwe is the patch extracted in the detection procedure and
K is a random key. Three signals are then obtained and the watermark is extracted
by FastICA. This method is robust against geometric distortion attacks as well as
signal processing attacks.
Another method, proposed by Pham, Miyaki, et al., is a robust object-based
watermarking algorithm that uses scale invariant features in conjunction with a
new data embedding method based on the Discrete Cosine Transform
(DCT). The watermark is embedded by modifying the DCT coefficients. To detect the
hidden information in the object, the object region is first detected by
using object matching.

Embedding Scheme Detecting Scheme
By calculating the affine parameters, the object can be geometrically recovered
and the hidden message can easily be read. The results show that this method
can resist very strong attacks such as scaling by a factor of 0.4, rotation by any
angle, etc.
4. Video Watermarking
4.1. Introduction
There exists a complex trade-off between three parameters in digital watermarking:
data payload, fidelity and robustness. The data payload is the amount of
information, i.e. the number of bits, that is encoded by the watermark. The fidelity
is another property of the watermark: the distortion, that the watermarking process
is bound to introduce, should remain imperceptible to a human observer. Finally,
the robustness of a watermarking scheme can be seen as the ability of the
detector to extract the hidden watermark from some altered watermarked data. The
watermarking process is considered as the transmission of a signal through a noisy
channel.
4.2. Applications of watermarking video content
The increasing interest in digital watermarking during the last decade is
most likely due to the increase in concern over copyright protection of digital
content.
Application              Purpose of the embedded watermark
Copy Control             Prevent unauthorized copying
Broadcast Monitoring     Identify the video item being broadcast
Fingerprinting           Trace back a malicious user
Video Authentication     Ensure that original content hasn’t been tampered with
Copyright Protection     Prove ownership
Enhanced Video Coding    Bring additional information, e.g. error correction
Table 1: Video watermarking: applications and associated purpose
4.3. Challenges in video watermarking
Although watermarking still images and watermarking video are similar problems,
they are not identical. Three challenges for digital video watermarking are:
• The presence of nonhostile video processing, which is likely to alter the
watermark signal.
• Resilience to collusion, which is much more critical in the context of video.
• Real-time operation, which is a requirement for video watermarking.
4.3.1. Various nonhostile video processings
The robustness of digital watermarking has always been evaluated via the survival of
the embedded watermark after attacks. Nonhostile refers to the fact that even
content providers are likely to process their digital data to some extent in order
to manage their resources efficiently.
Photometric Attacks:
This category gathers all the attacks which modify the pixel values in the frames.
Data transmission, for example, is likely to introduce some noise. Similarly,
digital-to-analog and analog-to-digital conversions introduce some distortions in
the video signal. Another common processing step is gamma correction, performed in
order to increase the contrast. Converting a video from one format to another
can also have this effect.
Spatial Desynchronization:
Many watermarking algorithms rely on an implicit spatial synchronisation between
the embedder and the detector. A pixel at a given location in the frame is assumed
to be associated with a given bit of the watermark. Many nonhostile video
processing operations introduce spatial desynchronisation, which may result in a
drastic loss of performance of a watermarking scheme.
The pixel position is also susceptible to jitter. In particular, positional jitter
occurs for video sent over poor analog links, e.g. broadcasting in a wireless
environment.
Temporal desynchronisation:
Temporal desynchronisation may affect the watermark signal. For example, if the
secret key for embedding is different for each frame, simple frame rate
modification would make the detection algorithm fail. Since changing frame rate
is a quite common processing, watermarks should be designed so that they survive
such an operation.
Video editing:
Cut-and-splice and cut-insert-splice are two very common operations used during
video editing. Cut-insert-splice is basically what happens when a commercial is
inserted in the middle of a movie. Moreover, transition effects, like
fade-and-dissolve or wipe-and-matte, can be used in order to smooth the transition
between two scenes of the video.
4.3.2. Resilience against collusion
Collusion is a problem that was pointed out for still images some time
ago. It refers to a set of malicious users who merge their knowledge, i.e. different
watermarked data, in order to produce illegal content, i.e. unwatermarked data.
Such collusion is successful in two distinct cases.
• Collusion type I:
The same watermark is embedded into different copies of different data. The
colluders can estimate the watermark from each watermarked data item and
obtain a refined estimate of the watermark by a linear combination, e.g. the
average, of the individual estimates. A good estimate of the
watermark makes it possible to obtain unwatermarked data by simply subtracting
it from the watermarked data.
• Collusion type II:
Different watermarks are embedded into different copies of the same data.
The colluders only have to form a linear combination of the different
watermarked data, e.g. the average, to produce unwatermarked data. Indeed,
the average of different watermarks generally converges toward zero.
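Collusion type II can be illustrated with a small simulation. It is a sketch
under simplifying assumptions: the data is a list of 1000 values, each copy
carries its own independent watermark of +1 or -1 per sample, and the colluders
simply average 64 copies. All names and sizes are chosen for illustration.

```python
import random


def average_copies(copies):
    """Sample-wise average of several watermarked copies of the same data."""
    n = len(copies)
    return [sum(values) / n for values in zip(*copies)]


def mean_abs_residual(signal, original):
    """Average absolute difference from the unwatermarked original."""
    return sum(abs(a - b) for a, b in zip(signal, original)) / len(original)


rng = random.Random(0)
original = [rng.uniform(0.0, 255.0) for _ in range(1000)]
# Each copy receives a different, independent +/-1 watermark.
copies = [[p + rng.choice((-1.0, 1.0)) for p in original] for _ in range(64)]
colluded = average_copies(copies)
```

A single copy differs from the original by 1 at every sample, whereas the average
of the 64 copies differs by only a small fraction of that, because the independent
watermarks largely cancel each other.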
Collusion is a very important issue in the context of digital video since there are
twice more opportunities to design collusion than with still images. When video is
considered, the origin of the collusion can be twofold.
1. Inter-videos collusion: This is the initial origin considered for still images.
A set of users have a watermarked version of a video which they gather in
order to produce unwatermarked video content. In the context of copyright
protection, the same watermark is embedded in different videos and
collusion type I is possible. Alternatively, in a fingerprinting application, the
watermark will be different for each user and collusion type II can be
considered. Inter- videos collusion requires several watermarked videos to
produce unwatermarked video content.
2. Intra-video collusion: This is a video-specific origin. Many water-marking
algorithms consider a video as a succession of still images. Watermarking
video comes then down to watermarking series of still images. Unfortunately
this opens new opportunities for collusion. If the same watermark is inserted
in each frame, collusion type I can be enforced since different images can be
obtained from moving scenes. On the other hand, if alternative watermarks
are embedded in each frame, collusion type II becomes a danger in static
scenes since they produce similar images. As a result, the water-marked
video alone permits to remove the water-mark from the video stream.
The main danger is intra-frame collusion i.e. when a watermarked video
alone is enough to remove the watermark from the video. It has been shown
that both strategies always insert the same watermark in each frame and

always insert a different watermark in each frame make collusion attacks
conceivable.
A basic rule has been stated so that intra-video collusion is prevented:
the watermarks inserted into two different frames of a video should be as
similar, in terms of correlation, as the two frames are similar. In other
words, “if two frames look like quite the same, the embedded watermarks
should be highly correlated. On the contrary, if two frames are really
different, the watermark inserted into those frames should be unalike”.
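The rule above can be illustrated with a minimal sketch. It assumes frames and watermarks are flat lists of samples, measures frame similarity with normalized correlation, and mixes a reference watermark with fresh noise so that the watermark correlation tracks the frame correlation. The function names and the mixing formula are illustrative assumptions, not the scheme of any surveyed paper.

```python
import math
import random


def correlation(a, b):
    """Normalized (Pearson) correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


def correlated_watermark(ref_wm, similarity, rng):
    """Return a watermark whose correlation with ref_wm is close to `similarity`.

    Assumes ref_wm is zero-mean, unit-variance noise. The mixing formula is an
    illustrative choice, not a published embedding rule.
    """
    rho = max(0.0, min(1.0, similarity))  # negative similarity treated as unrelated
    fresh = [rng.gauss(0.0, 1.0) for _ in ref_wm]
    return [rho * r + math.sqrt(1.0 - rho * rho) * f
            for r, f in zip(ref_wm, fresh)]


if __name__ == "__main__":
    rng = random.Random(7)
    n = 4096
    frame_a = [rng.uniform(0, 255) for _ in range(n)]
    frame_b = [p + rng.gauss(0, 5) for p in frame_a]   # nearly identical frame
    frame_c = [rng.uniform(0, 255) for _ in range(n)]  # unrelated frame

    wm_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    wm_b = correlated_watermark(wm_a, correlation(frame_a, frame_b), rng)
    wm_c = correlated_watermark(wm_a, correlation(frame_a, frame_c), rng)

    print("frames a/b:", round(correlation(frame_a, frame_b), 3),
          "watermarks a/b:", round(correlation(wm_a, wm_b), 3))
    print("frames a/c:", round(correlation(frame_a, frame_c), 3),
          "watermarks a/c:", round(correlation(wm_a, wm_c), 3))
```

With this construction, near-identical frames receive near-identical watermarks (so averaging static frames does not cancel the mark), while unrelated frames receive nearly uncorrelated watermarks (so the mark cannot be estimated by combining different frames).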
4.3.3. Real-time watermarking
Real-time operation can be an additional requirement for video watermarking. It was
not a real concern with still images. In order to meet the real-time requirement, the
complexity of the watermarking algorithm should be as low as possible.
Moreover, if the watermark can be inserted directly into the compressed stream,
full decompression and recompression are avoided, which reduces computational
needs.
Another way of achieving real-time operation is to split the computations. The basic
idea is to perform intensive computations once and for all on the provider side and
then simple client-dependent processing on request. This can be seen as a form of
preprocessing.
4.4. The major trends in video watermarking
The simplest and most straightforward approach is to consider a video as a
succession of still images and to reuse an existing watermarking scheme for still
images. Another point of view considers and exploits the additional temporal
dimension in order to design new robust video watermarking algorithms.
4.4.1. From still image to video watermarking
The first proposed algorithm for video coding was indeed Moving JPEG (M-JPEG),
which simply compresses each frame of the video with the image compression
standard JPEG. In the same spirit, the simplest way of extending a watermarking
scheme for still images is to embed the same watermark in the frames of the video
at a regular rate. On the detector side, the presence of the watermark is checked in
every frame.
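The frame-by-frame extension can be sketched as follows. This is a minimal illustration assuming an additive pseudo-random pattern and a linear correlation detector. The names `make_pattern`, `embed` and `detect`, the key string, the strength and the threshold are all hypothetical choices, not taken from a specific scheme in the literature.

```python
import random


def make_pattern(key, size):
    """Pseudo-random +/-1 pattern derived from a secret key (hypothetical helper)."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(size)]


def embed(frames, pattern, strength=3.0, period=1):
    """Add the same pattern to every `period`-th frame; other frames are copied."""
    marked = []
    for index, frame in enumerate(frames):
        if index % period == 0:
            marked.append([p + strength * w for p, w in zip(frame, pattern)])
        else:
            marked.append(list(frame))
    return marked


def detect(frame, pattern, threshold=1.5):
    """Linear correlation detector applied to a single frame.

    Returns (decision, score); the score is close to `strength` for a marked
    frame and close to zero otherwise.
    """
    mean = sum(frame) / len(frame)
    score = sum((p - mean) * w for p, w in zip(frame, pattern)) / len(frame)
    return score > threshold, score


if __name__ == "__main__":
    rng = random.Random(1)
    size = 128 * 128  # one flattened luminance frame
    frames = [[rng.gauss(128.0, 20.0) for _ in range(size)] for _ in range(6)]
    pattern = make_pattern("secret-key", size)
    marked = embed(frames, pattern, strength=3.0, period=2)
    for index, frame in enumerate(marked):
        decision, score = detect(frame, pattern)
        print(index, decision, round(score, 2))
```

Because every marked frame carries the same pattern, this sketch is exactly the kind of scheme that intra-video collusion type I targets in moving scenes.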
The Differential Energy Watermarks (DEW) method was initially designed for still
images and has been extended to video by watermarking the I-frames of an MPEG
stream. It is based on selectively discarding high-frequency DCT coefficients in the
compressed data stream.
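A heavily simplified sketch of the idea is given below. It assumes each block is a list of 64 zigzag-ordered quantized DCT coefficients, that one bit is carried by a pair of block groups (here called regions A and B), and that the bit is read from which region retains more high-frequency energy. The fixed cutoff and all function names are illustrative assumptions. A practical scheme would also choose the cutoff adaptively so that the energy difference is large enough to survive processing.

```python
def high_freq_energy(block, cutoff):
    """Energy of the zigzag-ordered coefficients at index >= cutoff."""
    return sum(c * c for c in block[cutoff:])


def region_energy(blocks, cutoff):
    """Total high-frequency energy of a group of blocks."""
    return sum(high_freq_energy(block, cutoff) for block in blocks)


def discard_high_freq(blocks, cutoff):
    """Zero every coefficient at index >= cutoff in each block."""
    return [block[:cutoff] + [0] * (len(block) - cutoff) for block in blocks]


def embed_bit(region_a, region_b, bit, cutoff):
    """Bit 0: discard high frequencies in B. Bit 1: discard them in A."""
    if bit == 0:
        return region_a, discard_high_freq(region_b, cutoff)
    return discard_high_freq(region_a, cutoff), region_b


def extract_bit(region_a, region_b, cutoff):
    """Read the bit from the sign of the energy difference between A and B.

    If neither region has high-frequency energy the result is not meaningful.
    """
    if region_energy(region_a, cutoff) > region_energy(region_b, cutoff):
        return 0
    return 1


if __name__ == "__main__":
    # Synthetic blocks with magnitudes decaying along the zigzag scan.
    region_a = [[max(0, 40 - k) for k in range(64)] for _ in range(4)]
    region_b = [[max(0, 40 - k) for k in range(64)] for _ in range(4)]
    for bit in (0, 1):
        marked_a, marked_b = embed_bit(region_a, region_b, bit, cutoff=20)
        print(bit, extract_bit(marked_a, marked_b, cutoff=20))
```

Since the embedding only removes coefficients, it can be applied to an already compressed stream without a full decode and re-encode, which is what makes this family of methods attractive for the real-time setting discussed in 4.3.3.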
4.4.2. Integration of the temporal dimension
Many researchers have investigated how to reduce the visual impact of the
watermark for still images by considering the properties of the Human Visual
System (HVS) such as frequency masking, luminance masking and contrast
masking.
4.4.3. Exploiting the video compression formats
The last trend considers the video data as some data compressed with a
video-specific compression standard. Indeed, most of the time, a video is stored in
a compressed version in order to save storage space. As a result,
watermarking methods have been designed which embed the watermark directly
into the compressed video stream. Algorithms are adapted so that the watermark
can be directly inserted in the nonzero DCT coefficients of an MPEG video stream.
4.5. Discussion
A watermark can be separated into two parts: one for copyright protection and the
other for customer fingerprinting. However, many challenges have to be taken up.
Robustness has to be considered carefully. There are indeed many nonhostile
video processings which might alter the watermark signal. It might not even be
possible to be immune to all those attacks, and detailed constraints have to be
defined according to the targeted application. Since collusion is far more critical in
the context of video, it must be seriously considered. Finally, the real-time
constraint has to be met in many applications.

5. Application of SIFT in Video Watermarking
This area of research is relatively new and is unexplored to a large extent owing to
the lack of robust and suitable interest point detectors. The problems associated
with them were already discussed in Section 1.7.
In the paper titled “SIFT features in semi-fragile video watermarks” by Stefan
Thiemert et al., SIFT is used to detect manipulations in videos. From the detected
SIFT feature points, an authentication message is generated, which is embedded
with a robust video watermark. In the verification process, a temporal filtering
approach is introduced to reduce the distortions caused by content-preserving
manipulations.
6. Bibliography
1. Distinctive Image Features from Scale-Invariant Keypoints by D. G. Lowe
2. A Combined Corner and Edge Detector by C. Harris and M. Stephens
3. n-SIFT: n-Dimensional Scale Invariant Feature Transform by Warren
Cheung and Ghassan Hamarneh
4. On Space-Time Interest Points by Ivan Laptev
5. MoSIFT: Recognizing Human Actions in Surveillance Videos by Ming-Yu
Chen and Alexander Hauptmann
6. A Survey of Watermarking Techniques applied to Multimedia by Sin-Joo
Lee, Sung-Hwan Jung
7. Robust image watermarking using local invariant features by Hae-Yeoun
Lee and Hyungshin Kim
8. Robust Image Watermarking Using Local Invariant Features and
Independent Component Analysis by Zhang Hanling and Liu Jie
9. Geometrically Invariant Object-Based Watermarking using SIFT feature by
Viet Quoc Pham, Takashi Miyaki, Toshihiko Yamasaki, Kiyoharu
Aizawa
10. A guide tour of video watermarking by Gwenaël Doërr, Jean-Luc Dugelay
11. SIFT features in semi-fragile video watermarks by Stefan Thiemert, Martin
Steinebach
