Literature Survey on Interest Points based Watermarking
A Literature Survey on Interest Points based Watermarking in images and videos.

  • Multimedia System (CS 569)
    Literature Survey on SIFT based Video Watermarking

    Group Members:
    Priyatham Bollimapalli (10010148)
    Pydi Peddigari Venkat Sai (10010149)
    Pasumarthi Venkata Sai Dileep (10010180)
  • Contents
    1. Interest Point Detectors
       1.1. Motivation
       1.2. Harris Corner Detector
       1.3. Scale Invariant Feature Point Detector
       1.4. Harris 3D
       1.5. n-SIFT
       1.6. MoSIFT
       1.7. Discussion
    2. An Overview of Watermarking Techniques
       2.1. Introduction
       2.2. Digital Watermarking
       2.3. Classifications of Watermarking
    3. Application of SIFT in Image Watermarking
       3.1. Introduction
       3.2. Local Invariant Features
       3.3. Watermarking Scheme
       3.4. Other related works
    4. Video Watermarking
       4.1. Introduction
       4.2. Applications of watermarking video content
       4.3. Challenges in video watermarking
            4.3.1. Various nonhostile video processings
            4.3.2. Resilience against collusion
            4.3.3. Real-time watermarking
       4.4. The major trends in video watermarking
            4.4.1. From still image to video watermarking
            4.4.2. Integration of the temporal dimension
            4.4.3. Exploiting the video compression formats
       4.5. Discussion
    5. Application of SIFT in Video Watermarking
    6. Bibliography
  • 1. Interest Point Detectors

    1.1. Motivation

    With the widespread distribution of digital information over the World Wide Web (WWW), the protection of intellectual property rights has become increasingly important. Digital information, which includes still images, video, audio and text, can be copied without loss of quality and distributed efficiently. Such easy reproduction, retransmission and even manipulation allows a pirate (a person or organization) to violate the copyright of the real owner. Digital watermarking is expected to be an effective tool for protecting intellectual property rights.

    The ideal properties of a digital watermark are imperceptibility and robustness. The watermarked data should retain the quality of the original as closely as possible. Robustness refers to the ability to detect the watermark after various intentional or unintentional alterations (so-called attacks). Robustness against geometrical attacks is a major problem in watermarking: even a minor geometrical manipulation of the watermarked image dramatically reduces the detector's ability to find the watermark. Moreover, owing to the diversity of the applications and devices used by consumers today, video streams are adapted to the requirements of various communication channels and user-end display devices. These streams suffer from content adaptation attacks, in which scaling of the resolution, quality and frame rate of the video corrupts the embedded data and hampers watermarking.

    Thus there is a need to identify stable interest/feature points in videos that are invariant to rotation, scaling, translation and partial illumination changes. These points can serve as reference locations for both the watermark embedding and detection processes. Feature point detectors are used to extract such points.
    This section describes two of the most popular feature point detectors, namely the Harris corner detector and the scale invariant feature point detector (SIFT). Their extensions to video, Harris 3D, n-SIFT and MoSIFT, are then discussed, along with the drawbacks of each technique and the scope for further improvement.
  • 1.2. Harris Corner Detector

    The ubiquitous Harris corner detector starts from the assumption that corners are the interest points of an image. A corner is a point at which the direction of an object boundary changes abruptly, so shifting a window centred on a corner in any direction should produce a large change in intensity. The salient features of the detector are:

    - The intensity variation E(x, y) of every pixel is expanded analytically about the origin of the shifts using a Taylor series expansion.
    - A circular Gaussian window weights the intensity variations in the neighbourhood, so variations closer to the centre of the window are assigned higher importance while keeping a smooth weighting over the entire window.
    - Instead of taking the minimum of E(x, y) along the shift directions, the variation of E with the direction of the window shift is considered. The intensity variation is expressed in matrix form, and the R-measure, which captures the change in intensity in both the x and y directions, is calculated.
    - Corners are detected as local maxima of R over the 8-neighbourhood of a pixel.
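    As a concrete illustration, the R-measure pipeline above can be sketched in a few lines of numpy/scipy. The parameter values (sigma = 1, k = 0.04) are conventional choices, not taken from the survey, and the final non-maximum suppression over the 8-neighbourhood is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img, sigma=1.0, k=0.04):
    """R = det(M) - k * trace(M)**2, where M is the Gaussian-weighted
    structure tensor built from the image gradients."""
    img = img.astype(float)
    ix = sobel(img, axis=1)   # horizontal gradient
    iy = sobel(img, axis=0)   # vertical gradient
    # Gaussian window: variations near the window centre weigh more
    sxx = gaussian_filter(ix * ix, sigma)
    syy = gaussian_filter(iy * iy, sigma)
    sxy = gaussian_filter(ix * iy, sigma)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

# Toy image: a bright square on a dark background has four corners.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
R = harris_response(img)
```

    On such an image the response is strongly positive near the square's corners, negative along its edges, and near zero in flat regions, which is exactly the behaviour the R-measure is designed to exhibit.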
  • 1.3. Scale Invariant Feature Point Detector

    The SIFT detector developed by Lowe involves the following step-by-step process.

    Scale-space peak selection: The scale-space representation is a set of images represented at different levels of resolution, created by convolving a Gaussian kernel G(σ) with the image I(x1, x2):

        Is(x1, x2, σ) = G(σ) * I(x1, x2)

    where * is the convolution operation in x1 and x2. The variance σ of the Gaussian kernel is referred to as the scale parameter. The characteristic scale, which is relatively independent of the image scale, can be defined as the scale at which the response of a differential operator is maximized; the Laplacian obtains the highest percentage of correct scale detections. Feature points are detected through a staged filtering approach that identifies stable points in scale space. To detect stable keypoint locations in scale space efficiently, the scale-space extrema of the difference-of-Gaussian function D(x1, x2, σ) convolved with the image are used. To find the local maxima and minima of D(x1, x2, σ), each point is compared with its 8 neighbors at the same scale and its 9 neighbors at each of the adjacent upper and lower scales. If its value is the minimum or maximum among all these points, the point is an extremum.
  • Key point localization: Once candidate points are found, points with low contrast or poor localization are removed by measuring the stability of each feature point at its location and scale. As in the Harris corner detector, this is done using the Hessian matrix and a Taylor series expansion.

    Orientation assignment: An orientation is assigned to each feature point from its local image properties, so that the keypoint descriptor can be represented relative to this orientation, achieving invariance to rotation. An orientation histogram is formed from the gradient orientations of sample points within a circular window around the keypoint. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window. Peaks in the orientation histogram correspond to dominant orientations of the local gradients. A keypoint is created for the highest peak and for any other local peak within 80% of its height, so some points are assigned multiple orientations.

    Key point descriptor: The local gradient data is used to create the keypoint descriptor. To achieve rotation invariance, the coordinates of the descriptor and the gradient information are rotated to line up with the orientation of the keypoint. The gradient magnitudes are weighted by a Gaussian function whose variance depends on the keypoint scale. This data is then used to create a set of histograms over a window centred on the keypoint: SIFT uses 16 histograms, aligned in a 4x4 grid, each with 8 orientation bins, giving a feature vector of 4x4x8 = 128 elements. The figure shows a 2x2 array of orientation histograms.
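    The orientation-assignment step can be sketched as follows. The 36-bin histogram width and the treatment of the patch are assumptions in the spirit of Lowe's method, not details given in the survey; the gradient magnitude and angle fields are taken as already computed.

```python
import numpy as np

def dominant_orientations(mag, ang, sigma, nbins=36):
    """mag/ang: gradient magnitude and orientation (radians) over a square
    patch centred on the keypoint. Returns the bin centres of every peak
    within 80% of the highest histogram peak."""
    h, w = mag.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Gaussian weighting of samples around the keypoint
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    bins = np.floor((ang % (2 * np.pi)) / (2 * np.pi) * nbins).astype(int) % nbins
    hist = np.bincount(bins.ravel(), weights=(mag * g).ravel(), minlength=nbins)
    peaks = np.where(hist >= 0.8 * hist.max())[0]
    return (peaks + 0.5) * 2 * np.pi / nbins

# A patch whose gradients all point along +x should yield one orientation.
mag = np.ones((16, 16))
ang = np.zeros((16, 16))
peaks = dominant_orientations(mag, ang, sigma=4.0)
```

    A real implementation would additionally interpolate the peak position between bins; the sketch only returns bin centres.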
  • The numbers found are stored in a vector, which is normalized to a unit vector to account for contrast changes and obtain illumination invariance. To handle non-linear intensity transforms, each element of the unit vector is capped at 0.2 (i.e., overly large gradients are discarded), and the vector is then renormalized.

    Key point matching: The descriptors for the two images to be compared are calculated, and for each descriptor the nearest neighbor (the keypoint with the minimum Euclidean distance) is found. For reliable nearest-neighbor matching, SIFT accepts a match only if the ratio of the distance to the best match over the distance to the second-best match is below 0.8.

    1.4. Harris 3D

    The Harris 3D space-time interest point detector was developed by Ivan Laptev, using the 3-D gradient structure together with a scale-space representation to find interest points. The video f(x, y, t) is convolved with a 3-D Gaussian g(σ, τ), with spatial scale σ and temporal scale τ, to give

        L(x, y, t; σ, τ) = g(σ, τ) * f(x, y, t)

    A 3-D second-moment matrix μ is then computed analogously to the 2-D Harris matrix. Spatio-temporal interest points are detected as the local maxima of the 3-D Harris corner measure

        H = det(μ) - k·trace³(μ)

    The spatial and temporal scales of each interest point are determined as the maxima of the scale-normalized Laplacian.
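    Returning to the SIFT matching step above, Lowe's nearest-neighbour ratio test can be sketched as follows. The descriptors here are toy 2-D vectors for illustration; real SIFT descriptors are 128-dimensional.

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.8):
    """Return (i, j) pairs where descriptor i of image 1 matches
    descriptor j of image 2 under the ratio criterion."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # accept only if the best match is clearly better than the runner-up
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

d1 = np.array([[0.0, 0.0], [1.0, 1.0]])
d2 = np.array([[0.0, 0.0], [5.0, 5.0], [0.9, 1.1]])
matches = ratio_test_match(d1, d2)
```

    The brute-force loop is O(N·M); practical systems replace it with an approximate nearest-neighbour index, but the acceptance rule is the same.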
  • 1.5. n-SIFT

    n-SIFT is a direct extension of SIFT from 2-D images to arbitrary n-D images. It uses an n-D Gaussian scale space to find interest points and then describes them using n-D gradients in terms of orientation bins, as the SIFT descriptor does. The method used to find the interest points directly parallels SIFT.

    [Figures: scale-space pyramid; local-maxima detection]
  • [Figure: orientation assignment]

    n-SIFT creates a 2^(5n-3)-dimensional feature vector by closely following the descriptor computation steps of SIFT:

    1. The gradients in a 16^n hypercube around the interest point are calculated and expressed as a magnitude and (n-1) orientation directions. The gradient magnitudes are weighted by a Gaussian centered at the interest point location.
    2. An (n-1)-direction orientation-bin histogram is built: each voxel's gradient magnitude is added to the bin corresponding to its orientation, and the bin with the highest value is taken as the dominant orientation.
    3. The hypercube is split into 4^n subregions, each of which is described by 8 bins for each of the (n-1) directions. Thus each subregion is represented by 8^(n-1) bins and, in total, a 4^n × 8^(n-1) = 2^(5n-3)-dimensional feature vector is created.
    4. The feature vector is normalized as in SIFT.
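    The dimensionality claim, 4^n subregions times 8^(n-1) bins giving 2^(5n-3) elements, is easy to sanity-check; for n = 2 it recovers the standard 128-element SIFT descriptor.

```python
# Descriptor length of n-SIFT: 4**n subregions, 8**(n-1) bins each.
def nsift_descriptor_dim(n):
    return (4 ** n) * (8 ** (n - 1))

assert nsift_descriptor_dim(2) == 128          # 2-D: standard SIFT length
assert nsift_descriptor_dim(3) == 4096         # 3-D video volume
for n in (2, 3, 4):
    assert nsift_descriptor_dim(n) == 2 ** (5 * n - 3)
```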
  • 1.6. MoSIFT

    The MoSIFT descriptor describes both the spatial gradient structure and the local motion structure; the algorithm is summarized in the figure. SIFT points whose optical flow crosses a minimum threshold are chosen as the spatio-temporal interest points. For the spatial dimensions, the SIFT descriptor is computed as described before. To describe local motion, a descriptor is computed from the local optical flow in the same way the SIFT descriptor is computed from image gradients: the local 16 × 16 patch is divided into sixteen 4 × 4 patches, each described by an 8-bin orientation histogram computed from the optical flow. The sixteen 8-bin histograms are concatenated into a 128-bin optical-flow histogram describing the local motion. The final descriptor is obtained by concatenating the SIFT and optical-flow descriptors into a 256-element descriptor.
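    The selection rule, keep a SIFT point only where the optical-flow magnitude crosses the threshold, can be sketched with plain numpy. The flow field is assumed to be precomputed (e.g. by a Farneback-style estimator); the sizes and threshold here are illustrative.

```python
import numpy as np

def filter_by_flow(points, flow, min_flow=1.0):
    """points: list of (row, col) SIFT locations; flow: (H, W, 2) field.
    Keep only points whose flow magnitude reaches min_flow."""
    keep = []
    for r, c in points:
        if np.hypot(flow[r, c, 0], flow[r, c, 1]) >= min_flow:
            keep.append((r, c))
    return keep

flow = np.zeros((4, 5, 2))
flow[2, 3] = (3.0, 4.0)          # magnitude 5: a genuinely moving point
kept = filter_by_flow([(2, 3), (0, 0)], flow)
```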
  • 1.7. Discussion

    A brief critique of each detector is given below.

    Harris corner: The corner points detected by the Harris corner detector are invariant to rotation, but they are susceptible to image scaling and depend on the scale at which the derivatives, and hence the intensity variations, are computed.

    SIFT: The SIFT detector considers local image characteristics and retrieves feature points that are invariant to image rotation, scaling, translation, partial illumination changes and projective transforms. The interest points are very robust and can efficiently match feature points across similar images.

    Harris 3D: Harris 3D does not define a method for descriptor computation; the concatenated histogram-of-gradient descriptor provided by Piotr Dollar is therefore used along with the detector. Among the interest points detected by Harris 3D are some spatial points with no motion: they have high gradients in all three dimensions even though nothing moves, which shows the susceptibility of the technique to intensity variations. This is a problem with spatio-temporal interest detectors that do not explicitly compute the motion between frames and instead rely on the gradient magnitude, which is susceptible to such inter-frame intensity variations. The problem can be mitigated to some extent by Gaussian smoothing along the spatial and temporal domains.

    n-SIFT: The feature point detection finds a large number of spatial interest points without motion. The techniques used in SIFT to remove unstable edge points and points due to contrast variations were tried here without success; they include thresholding the ratio of the three eigenvalues of the Hessian matrix, and computing the 2-D Hessian and applying Lowe's threshold criterion.
    The problem can be solved to some extent, as in Harris 3D, by using a decoupled Gaussian convolution, which effectively weights the gradient computation and hence handles inter-frame brightness variation. Another problem with n-SIFT is its large memory usage: since it treats the video as a 3-D image and builds octaves from it, the memory requirement is huge. When handling 1000 frames at a resolution of 200 × 300, the algorithm was observed to take 8 GB of memory.
  • MoSIFT: Motion-SIFT captures multiple similar points with the same motion, which is redundant. The algorithm detects points on a frame-by-frame basis, and the resulting redundancy, though useful for obtaining a sufficient number of points, may not be efficient in terms of repeatability, because optical flow is not temporally invariant. So instead of mere thresholding on optical flow, the local characteristics of the optical flow may be used to further prune the interest points for stability in the temporal domain. The optical-flow magnitude in the region around an interest point tends to be uniform, so a descriptor built from the optical-flow structure around the point need not be unique. Instead, if the local motion trajectory is encoded in terms of the optical-flow values of the interest point across nearby frames, the descriptor can be unique, since natural motion tends to vary with time.

    2. An Overview of Watermarking Techniques

    2.1. Introduction

    In the early days, encryption and access control techniques were used to protect the ownership of media; more recently, watermarking techniques are used to protect the copyright of media. Digital content spreads rapidly around the world via the Internet, and it is possible to produce any number of copies identical to the original data without limitation. The current rapid development of new IT technologies for multimedia services has therefore created a strong demand for reliable and secure copyright protection techniques for multimedia data. Digital watermarking is a technique to embed invisible or inaudible data within multimedia content; the hidden data is called a watermark, and its format can be an image or any other media type. In case of ownership conflict during distribution, the digital watermark makes it possible to search for and extract the grounds for ownership.
  • 2.2. Digital Watermarking

    The principles of watermark embedding and detection: given an original image I and a watermark W, the watermarked image I' is represented as I' = I + f(I, W). An optional public or secret key k may be used for this purpose.

    [Figures: generic watermark insertion; watermark extraction and detection]

    The embedded watermark can be extracted later in many ways, and there are several ways to evaluate the similarity between the original and extracted watermarks. The most widely used similarity measures are correlation based: to decide whether w and the extracted w* match, one checks whether sim(w, w*) > T, where T is some threshold.

    Main characteristics of a watermarking algorithm:

    - Invisibility: the embedded watermark is not visible.
    - Robustness: piracy attacks or image processing should not affect the embedded watermark.
    - Security: a particular watermark signal is tied to a secret number used for embedding and extraction.
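    A minimal numpy sketch of this generic model, with a plain additive embedder and a normalized-correlation detector; the amplitude alpha, image size and threshold are illustrative values, not from the surveyed papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(image, watermark, alpha=2.0):
    """Additive embedding: the simplest instance of I' = I + f(I, W)."""
    return image + alpha * watermark

def sim(w, w_star):
    """Normalized correlation between reference and extracted watermarks."""
    w, w_star = w.ravel(), w_star.ravel()
    return float(w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star)))

image = rng.uniform(0, 255, (64, 64))
w = rng.choice([-1.0, 1.0], size=(64, 64))     # pseudo-random +-1 watermark
marked = embed(image, w)
w_star = marked - image                         # ideal non-blind extraction
w_other = rng.choice([-1.0, 1.0], size=(64, 64))
```

    With ideal extraction, sim(w, w_star) is essentially 1, while the similarity to an unrelated watermark w_other stays near 0, so a threshold T between the two separates genuine detections from false ones.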
  • 2.3. Classifications of Watermarking

    On perceptivity:
    - Visible watermarking
    - Invisible watermarking

    On robustness:
    - Robust watermarking: robustness is the most important factor in digital watermarking, and robust watermarking is the most common case.
    - Semi-fragile watermarking: a semi-fragile watermark can tolerate some degree of change to the watermarked image, such as the addition of quantization noise from lossy compression.
    - Fragile watermarking: a fragile watermark is designed to be destroyed by even the slightest manipulation of the watermarked image. This method can be used for the protection and verification of original content.

    3. Application of SIFT in Image Watermarking

    3.1. Introduction

    The following survey is based on robust image watermarking using local invariant features, by Lee, Kim, et al. Most previous watermarking algorithms cannot resist geometric distortions, which desynchronize the location where the copyright information is inserted and hence cause incorrect watermark detection; here, a watermarking method robust to geometric distortion is proposed. Using the media content itself is one solution to watermark synchronization, and this method belongs to that approach: the location of the watermark is tied not to image coordinates but to image semantics. In content-based synchronization methods, the selection of features is a major criterion, and local image characteristics are believed to be more useful than global ones. As discussed in previous sections, SIFT extracts features by considering local image properties and is invariant to rotation, scaling, translation, and partial illumination changes.
  • Using SIFT, circular patches invariant to translation and scaling distortions are generated. The watermark is inserted into the circular patches additively in the spatial domain, and rotation invariance is achieved using the translation property of the polar-mapped circular patches.

    3.2. Local Invariant Features

    As discussed in Section 1, the SIFT descriptor extracts features and their properties, such as the location (t1, t2), the scale s, and the orientation θ.

    Modifications for watermarking: The local features from the SIFT descriptor are not directly applicable to watermarking. The SIFT descriptor was originally devised for image-matching applications, so it extracts many features densely distributed over the whole image. Hence, the number, distribution, and scale of the features are adjusted, and features that are susceptible to watermark attacks are removed. A circular patch is constructed using only the location (t1, t2) and scale s of each extracted SIFT feature, with the patch radius proportional to the scale s through a magnification factor k that controls the radius of the circular patches. These patches are invariant to image scaling and translation as well as spatial modifications.

    The distance between adjacent features must also be taken into consideration: if the distance is small, patches overlap over large areas, and if it is large, the number of patches is insufficient for effective insertion of the watermark. The distance D between adjacent features depends on the dimensions of the image (width w and height h) and is quantized by a constant r that controls the distance between adjacent features; r is set to 16 in the insertion process and 32 in the detection process.

    3.3. Watermarking Scheme

    Watermark generation: A 2-D rectangular watermark following a Gaussian distribution is generated using a random number generator.
    Here, the rectangular watermark is treated as a polar-mapped watermark and is inversely polar-mapped to assign the insertion
  • location of the circular patches. Note that the sizes of the circular patches differ, so a separate circular watermark must be generated for each patch. Let M and N be the dimensions of the rectangular watermark and r the radius of a circular patch. The circular patch is divided into homocentric regions. To generate the circular watermark, the x- and y-axes of the rectangular watermark are inversely polar-mapped into the radius and angle directions of the patch: the x-coordinate is mapped linearly onto the radial coordinate ri between r0 and rM, and the y-coordinate onto the angle θ, where rM equals the radius of the patch and r0 is a fixed fraction of rM. To increase the robustness and invisibility of the inserted watermark, the rectangular watermark is mapped to only the upper half of the patch, i.e., its y-axis is scaled by the angle of a half circle rather than a full circle; the lower half of the patch is set symmetrically with respect to the upper half.

    Watermark insertion: This consists of the following steps:

    - Circular patches are extracted using the SIFT descriptor. The watermark is inserted into all patches of the image to increase the robustness of the scheme.
    - A circular watermark, dependent on the radius of the patch, is generated as described above.
  • - The circular watermark is inserted additively in the spatial domain, as a pixel-wise addition between the image and the circular watermark:

        v'i = vi + αi · wci

    where vi and wci denote the pixels of the image and of the circular watermark, respectively, and αi denotes the perceptual mask that controls the insertion strength of the watermark.

    Watermark detection: This consists of the following steps:

    - Circular patches are extracted using the SIFT descriptor. When there are several patches in an image, watermark detection is applied to all of them.
    - The additive watermarking method in the spatial domain inserts the watermark into the image content as noise. Therefore, a Wiener filter is first applied to extract this noise: the difference between the watermarked image and its Wiener-filtered version is computed, and that difference is regarded as the retrieved watermark.
    - To measure the similarity between the reference watermark generated during insertion and the retrieved watermark, the retrieved circular watermark is converted into a rectangular watermark by polar mapping. Since the watermark was inserted symmetrically, the mean value of the two semicircular areas is taken. Under this mapping, a rotation of a circular patch becomes a translation, which gives the scheme its rotation invariance.
    - As there are several circular patches in an image, ownership is proved if the watermark is detected in at least one patch, and not otherwise. Because the watermark is inserted into several circular patches rather than just one, the scheme is highly likely to detect the watermark even after image distortions.

    This watermarking scheme is robust against geometric distortion attacks as well as signal-processing attacks. Scaling and translation invariance is achieved by extracting circular patches from the SIFT descriptor.
    Rotation invariance is achieved by using the translation property of the polar-mapped circular patches.
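    The noise-extraction step of the detector, taking the difference between the watermarked image and its Wiener-filtered version, can be sketched with scipy. The host signal, watermark amplitude and filter window here are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy.signal import wiener

rng = np.random.default_rng(1)

def extract_noise(marked):
    # the watermark behaves as additive noise; Wiener filtering suppresses
    # it, so the difference estimates the embedded watermark
    return marked - wiener(marked, mysize=3)

def corr(a, b):
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Smooth synthetic host plus a +-2 pseudo-random watermark
x = np.linspace(0, 1, 64)
host = 20 * np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)) + 128
w = rng.choice([-2.0, 2.0], size=(64, 64))
residual = extract_noise(host + w)
w_other = rng.choice([-2.0, 2.0], size=(64, 64))
```

    On a smooth host the residual correlates strongly with the embedded watermark and negligibly with an unrelated one, which is what makes the subsequent correlation test meaningful.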
  • 3.4. Other related works

    Hanling, Jie, et al. proposed a robust image watermarking scheme for digital images using local invariant features and Independent Component Analysis (ICA). This method belongs to the blind watermarking category, since it uses ICA for detection, which does not need the original image.

    [Figure: framework for watermark detection]

    It differs in that it uses FastICA for the watermark extraction: from the patch Iwe extracted in the detection procedure and a random key K, three signals are obtained, and the watermark is extracted by FastICA. This method is robust against geometric distortion attacks as well as signal-processing attacks.

    Another method, proposed by Pham, Miyaki, et al., is a robust object-based watermarking algorithm that uses scale invariant features in conjunction with a new data embedding method based on the Discrete Cosine Transform (DCT). The watermark is embedded by modifying the DCT coefficients. To detect the hidden information in the object, the object region is first located using object matching.
  • [Figures: embedding scheme; detecting scheme]

    By calculating the affine parameters, the object can be geometrically recovered and the hidden message easily read. The results have shown that this method can resist very strong attacks, such as 0.4 scaling and rotation by any angle.

    4. Video Watermarking

    4.1. Introduction

    There exists a complex trade-off between three parameters in digital watermarking: data payload, fidelity and robustness. The data payload is the amount of information, i.e. the number of bits, encoded by the watermark. Fidelity requires that the distortion the watermarking process is bound to introduce remain imperceptible to a human observer. Finally, the robustness of a watermarking scheme is the ability of the detector to extract the hidden watermark from altered watermarked data. The watermarking process can be considered as the transmission of a signal through a noisy channel.

    4.2. Applications of watermarking video content

    The increasing interest in digital watermarking during the last decade is most likely due to the increase in concern over copyright protection of digital content.

    Application              Purpose of the embedded watermark
    ----------------------   ---------------------------------------------------
    Copy control             Prevent unauthorized copying
    Broadcast monitoring     Identify the video item being broadcast
    Fingerprinting           Trace back a malicious user
    Video authentication     Ensure that original content has not been tampered with
    Copyright protection     Prove ownership
    Enhanced video coding    Carry additional information, e.g. for error correction

    Table 1: Video watermarking: applications and associated purposes

    4.3. Challenges in video watermarking

    Although watermarking still images and watermarking video are similar problems, they are not identical. Three challenges for digital video watermarking are:

    - the presence of nonhostile video processing, which is likely to alter the watermark signal;
    - resilience to collusion, which is much more critical in the context of video;
    - real-time operation, which is a requirement for video watermarking.

    4.3.1. Various nonhostile video processings

    The robustness of digital watermarking has always been evaluated via the survival of the embedded watermark after attacks. "Nonhostile" refers to the fact that even content providers are likely to process their digital data somewhat in order to manage their resources efficiently.

    Photometric attacks: This category gathers all the attacks that modify the pixel values in the frames. Data transmission is likely to introduce some noise, for example; similarly, digital-to-analog and analog-to-digital conversions introduce some distortion into the video signal. Another common processing is gamma correction to increase the contrast. Conversion of the video from one format to another can also cause this kind of attack.

    Spatial desynchronization:
  • Many watermarking algorithms rely on an implicit spatial synchronization between the embedder and the detector: a pixel at a given location in the frame is assumed to be associated with a given bit of the watermark. Many nonhostile video processings introduce spatial desynchronization, which may result in a drastic loss of performance of a watermarking scheme. The pixel position is susceptible to jitter; in particular, positional jitter occurs in video over poor analog links, e.g. broadcasting in a wireless environment.

    Temporal desynchronization: Temporal desynchronization may also affect the watermark signal. For example, if the secret embedding key differs for each frame, a simple frame-rate modification would make the detection algorithm fail. Since changing the frame rate is quite common, watermarks should be designed to survive such an operation.

    Video editing: Cut-and-splice and cut-insert-splice are two very common operations in video editing; cut-insert-splice is basically what happens when a commercial is inserted in the middle of a movie. Moreover, transition effects such as fade-and-dissolve or wipe-and-matte can be used to smooth the transition between two scenes of the video.

    4.3.2. Resilience against collusion

    Collusion is a problem that was pointed out for still images some time ago. It refers to a set of malicious users who merge their knowledge, i.e. different watermarked data, in order to produce illegal, unwatermarked content. Such collusion is successful in two distinct cases.

    - Collusion type I: The same watermark is embedded into copies of different data. The colluders can estimate the watermark from each watermarked copy and
  • obtain a refined estimate of the watermark by a linear combination, e.g. the average, of the individual estimates. A good estimate of the watermark permits unwatermarked data to be obtained by simple subtraction from the watermarked data.

    - Collusion type II: Different watermarks are embedded into different copies of the same data. The colluders only have to take a linear combination of the different watermarked copies, e.g. the average, to produce unwatermarked data; indeed, averaging different watermarks generally converges toward zero.

    Collusion is a very important issue for digital video, since there are twice as many opportunities to mount a collusion attack as with still images. When video is considered, the origin of the collusion can be twofold.

    1. Inter-video collusion: This is the origin initially considered for still images. A set of users who each have a watermarked version of a video gather in order to produce unwatermarked video content. In the context of copyright protection, the same watermark is embedded in different videos and collusion type I is possible; alternatively, in a fingerprinting application, the watermark is different for each user and collusion type II can be considered. Inter-video collusion requires several watermarked videos to produce unwatermarked video content.

    2. Intra-video collusion: This is a video-specific origin. Many watermarking algorithms consider a video as a succession of still images, so watermarking video comes down to watermarking a series of still images. Unfortunately, this opens new opportunities for collusion. If the same watermark is inserted in each frame, collusion type I can be mounted, since different images can be obtained from moving scenes. On the other hand, if distinct watermarks are embedded in each frame, collusion type II becomes a danger in static scenes, since they produce similar images.
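    The averaging arguments behind both collusion types can be checked numerically. The following numpy sketch uses synthetic data; the number of colluders, signal sizes and amplitudes are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
K, N = 64, 4096
hosts = rng.normal(0, 5, (K, N))      # K different host signals (frames/videos)
w = rng.choice([-1.0, 1.0], N)        # one shared +-1 watermark

def corr(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Type I: the same watermark in K different data sets.
# Averaging the copies drives the hosts toward zero, refining the estimate.
w_hat = (hosts + w).mean(axis=0)

# Type II: K different watermarks in the same data.
# Averaging the copies drives the watermarks toward zero.
host = rng.normal(0, 5, N)
marks = rng.choice([-1.0, 1.0], (K, N))
colluded = (host + marks).mean(axis=0)
```

    The averaged estimate w_hat correlates with the true watermark far better than any single copy does (type I), while the residual watermark energy in the colluded copy shrinks roughly as 1/sqrt(K) (type II).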
As a result, a watermarked video alone is enough to remove the watermark from the video stream; this intra-video collusion is the main danger. It has been shown that both strategies, namely always insert the same watermark in each frame and
always insert a different watermark in each frame, make collusion attacks conceivable. A basic rule has been stated to prevent intra-video collusion: the watermarks inserted into two different frames of a video should be as similar, in terms of correlation, as the two frames themselves. In other words, "if two frames look quite the same, the embedded watermarks should be highly correlated; on the contrary, if two frames are really different, the watermarks inserted into those frames should be unalike".

4.3.3. Real-time watermarking

Real-time operation can be an additional specification for video watermarking; it was not a real concern with still images. In order to meet the real-time requirement, the complexity of the watermarking algorithm should obviously be as low as possible. Moreover, if the watermark can be inserted directly into the compressed stream, full decompression and recompression are avoided, which reduces the computational load. Another way of achieving real-time operation is to split the computations: intensive computations are performed once and for all on the provider side, followed by simple client-dependent processing on request. This can be seen as a form of preprocessing.

4.4. The major trends in video watermarking

The simplest and most straightforward approach is to consider a video as a succession of still images and to reuse an existing watermarking scheme for still images. Another point of view considers and exploits the additional temporal dimension in order to design new robust video watermarking algorithms.

4.4.1. From still image to video watermarking

The first proposed algorithm for video coding was indeed Motion JPEG (M-JPEG), which simply compresses each frame of the video with the image
compression standard JPEG. The simplest way of extending a watermarking scheme for still images is to embed the same watermark in the frames of the video at a regular rate; on the detector side, the presence of the watermark is checked in every frame. Differential Energy Watermarking (DEW) was initially designed for still images and has been extended to video by watermarking the I-frames of an MPEG stream. It is based on selectively discarding high-frequency DCT coefficients in the compressed data stream.

4.4.2. Integration of the temporal dimension

Many researchers have investigated how to reduce the visual impact of the watermark for still images by considering properties of the Human Visual System (HVS), such as frequency masking, luminance masking and contrast masking.

4.4.3. Exploiting the video compression formats

The last trend considers the video data as data compressed with a video-specific compression standard. Indeed, most of the time, a video is stored in compressed form in order to save storage space. As a result, watermarking methods have been designed which embed the watermark directly into the compressed video stream. Algorithms are adapted so that the watermark can be inserted directly into the nonzero DCT coefficients of an MPEG video stream.

4.5. Discussion

A watermark can be separated into two parts: one for copyright protection and the other for customer fingerprinting. However, many challenges have to be taken up. Robustness has to be considered carefully: there are many nonhostile video processings which might alter the watermark signal. It might not even be possible to be immune against all of these attacks, and detailed constraints have to be defined according to the targeted application. Since collusion is far more critical in the context of video, it must be seriously considered. Finally, the real-time constraint has to be met in many applications.
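As a concrete illustration of the compressed-domain embedding idea described in Section 4.4.3, the toy sketch below embeds one watermark bit per 8x8 block by forcing the parity of a chosen quantised DCT coefficient. This is a hypothetical minimal scheme operating on synthetic coefficient arrays; it does not parse a real MPEG bitstream, and the coefficient position and block values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for quantised 8x8 DCT blocks of an I-frame
# (no real MPEG parsing here; coefficient values are illustrative).
blocks = rng.integers(-20, 21, size=(16, 8, 8))
POS = (2, 3)                                  # mid-frequency position (assumed)
bits = rng.integers(0, 2, size=len(blocks))   # one watermark bit per block

def embed_bit(block, bit, pos=POS):
    """Force the parity of one coefficient to encode `bit`,
    keeping it nonzero as MPEG-domain schemes typically require."""
    b = block.copy()
    c = int(b[pos])
    if c == 0:
        c = 1
    if c % 2 != bit:                 # adjust by one quantisation step,
        c += 1 if c > 0 else -1      # moving away from zero
    b[pos] = c
    return b

def extract_bit(block, pos=POS):
    return int(block[pos]) % 2

marked = np.stack([embed_bit(b, bit) for b, bit in zip(blocks, bits)])
recovered = np.array([extract_bit(b) for b in marked])
print(bool((recovered == bits).all()))   # prints True
```

Because only one quantised coefficient per block is nudged by a single step, the distortion stays small and no decompression to the pixel domain is needed, which is precisely the real-time advantage of compressed-domain embedding discussed above.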
5. Application of SIFT in Video Watermarking

This area of research is relatively new and largely unexplored, owing to the lack of robust interest point detectors; the problems associated with them have already been discussed in section 1.7. In the paper titled SIFT features in semi-fragile video watermarks by Stefan Thiemert et al., SIFT is used to detect manipulations in videos. From the detected SIFT feature points, an authentication message is generated, which is embedded with a robust video watermark. In the verification process, a temporal filtering approach is introduced to reduce the distortions caused by content-preserving manipulations.

6. Bibliography

1. Distinctive image features from scale-invariant keypoints by D. G. Lowe
2. A combined corner and edge detector by C. Harris and M. Stephens
3. n-SIFT: n-Dimensional Scale Invariant Feature Transform by Warren Cheung and Ghassan Hamarneh
4. On Space-Time Interest Points by Ivan Laptev
5. MoSIFT: Recognizing Human Actions in Surveillance Videos by Ming-Yu Chen and Alexander Hauptmann
6. A Survey of Watermarking Techniques applied to Multimedia by Sin-Joo Lee and Sung-Hwan Jung
7. Robust image watermarking using local invariant features by Hae-Yeoun Lee and Hyungshin Kim
8. Robust Image Watermarking Using Local Invariant Features and Independent Component Analysis by Zhang Hanling and Liu Jie
9. Geometrically Invariant Object-Based Watermarking using SIFT Feature by Viet Quoc Pham, Takashi Miyaki, Toshihiko Yamasaki and Kiyoharu Aizawa
10. A guide tour of video watermarking by Gwenaël Doërr and Jean-Luc Dugelay
11. SIFT features in semi-fragile video watermarks by Stefan Thiemert and Martin Steinebach