What Is Video Fingerprinting?
• A video fingerprint is a unique identifier extracted
from video content
– Video fingerprints are often just a string of bits,
representing some “signatures” of the video content,
and are usually not of fixed length.
– Video fingerprinting refers to the process of extracting
fingerprints from the video content.
– Compared with watermarking, fingerprinting does not
add to or alter the video content.
– Also known as “robust hashing”, “perceptual hashing”,
“content-based copy detection (CBCD)” in research
Human vs. Video Fingerprint
Human Fingerprint          Video Fingerprint
Uniquely identify human    Uniquely identify video
Physical form              Digital form
Pictorial                  Time-based binary
Identification by Fingerprint
– Largely invariant for the same content under various
types of processing, conversion, and manipulation.
– Distinctly different for different content.
– Low data rate
• Low complexity
– Fast fingerprint generation and matching
Type of Video Signatures
[Diagram: four signature types (spatial, temporal, color, transform-domain). Surviving labels: blocks or other types of ...; down-sampled ...; group of frames; bins of histograms; 3D transforms on GOP.]
Variants of Spatial Signatures
– Quantized mean block intensity
– Luminance block patterns ✪
• ordinal ranking of average block intensity
– Differential luminance block patterns ✪
• Centroid of gradient orientations
• Dominant edge orientation
– Corner features (Harris points)
– Scale-space features
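A minimal sketch of two block-based spatial signatures from the list above: ordinal ranking of average block intensity, and a differential luminance pattern. The function names, the 2x2 grid, and the choice of comparing each block with the next one in raster order are illustrative assumptions, not details given in the slides.

```python
def block_means(frame, rows, cols):
    """Average intensity of each block in a rows x cols grid over a 2D luminance frame."""
    height, width = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(pixels) / len(pixels))
    return means


def ordinal_signature(means):
    """Rank of each block's mean intensity (0 = darkest); ties broken by block index."""
    order = sorted(range(len(means)), key=lambda i: (means[i], i))
    ranks = [0] * len(means)
    for rank, block in enumerate(order):
        ranks[block] = rank
    return ranks


def differential_signature(means):
    """One bit per adjacent block pair (assumed raster order): 1 if the next block is brighter."""
    return [1 if b > a else 0 for a, b in zip(means, means[1:])]


frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [90, 90, 50, 50],
    [90, 90, 50, 50],
]
means = block_means(frame, 2, 2)
print(ordinal_signature(means))       # [0, 3, 2, 1]
print(differential_signature(means))  # [1, 0, 0]
```

Because both signatures depend only on the relative order of block intensities, a uniform brightness change leaves them unchanged, which is one reason such patterns tend to be robust to common processing.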
Variants of Temporal Signatures
• Temporal luminance patterns
– Ordinal ranking of average frame or block intensity in
a group of frames
• Temporal differential luminance patterns ✪
– Sum of absolute pixel or block difference – quantized
– Block motion vectors – histogram of quantized
• Shot duration sequence
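A rough sketch of a temporal differential luminance signature: the sum of absolute pixel differences between consecutive frames, quantized to one bit per transition. Thresholding at the clip's mean difference is an assumption made for illustration; the slides do not specify the quantizer.

```python
def frame_sad(prev, curr):
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(
        abs(a - b)
        for row_prev, row_curr in zip(prev, curr)
        for a, b in zip(row_prev, row_curr)
    )


def temporal_signature(frames):
    """One bit per frame transition: 1 if the change exceeds the clip's mean change."""
    sads = [frame_sad(p, c) for p, c in zip(frames, frames[1:])]
    if not sads:
        return []
    threshold = sum(sads) / len(sads)  # assumed quantizer: compare with the mean
    return [1 if s > threshold else 0 for s in sads]


frames = [
    [[10, 10], [10, 10]],
    [[12, 10], [10, 10]],
    [[90, 90], [90, 90]],  # large change, e.g. a shot cut
    [[91, 90], [90, 90]],
    [[20, 20], [20, 20]],  # another large change
]
print(temporal_signature(frames))  # [0, 1, 0, 1]
```

The positions of the 1 bits also give a crude shot-boundary estimate, from which a shot duration sequence could be derived.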
Variants of Color Signatures
– Level-quantized histogram, e.g., (32, 16, 16)
for Y, U, V, followed by magnitude quantization
on each bin ✪
– Level-quantized histogram, followed by
ordinal ranking of histogram bins by
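The level-quantized histogram with (32, 16, 16) bins for Y, U, V, followed by magnitude quantization on each bin, might be sketched as follows. The four magnitude levels and the rule of scaling each bin count by the total sample count are assumptions chosen for illustration.

```python
def channel_histogram(values, bins, max_value=256):
    """Count 8-bit samples into `bins` equal-width levels."""
    hist = [0] * bins
    for v in values:
        hist[v * bins // max_value] += 1
    return hist


def quantize_magnitudes(hist, levels=4):
    """Map each bin count to one of `levels` coarse magnitudes relative to the total."""
    total = sum(hist) or 1
    return [min(levels - 1, count * levels // total) for count in hist]


def color_signature(y, u, v, bins=(32, 16, 16)):
    """Concatenate the quantized Y, U, V histograms into one 64-entry signature."""
    signature = []
    for samples, n_bins in zip((y, u, v), bins):
        signature.extend(quantize_magnitudes(channel_histogram(samples, n_bins)))
    return signature


sig = color_signature(y=[0, 0, 0, 255], u=[128] * 4, v=[128] * 4)
print(len(sig))  # 64
```

With four levels each entry fits in 2 bits, so this signature costs 128 bits per frame before any further compaction.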
Variants of Transform-Domain Signatures
• Affine transformation resilient
– Polar Fourier transform
– Radon transform ✪
– Singular Value Decomposition
• Energy compaction
– 3D DCT
– 3D Wavelet transform
Which One to Use?
• Spatial signatures, particularly block-based ones, are the
overall winner among the categories and the most widely used.
• Temporal and color signatures are less robust, but can
be used along with spatial signatures to enhance
discriminability.
• Transform-domain signatures are computationally
expensive and not widely used in practice.
• The weakness of block-based spatial signatures is their
lack of resilience against excessive geometric distortion,
e.g., rotation and cropping.
Challenges of Geometric Distortions
[Figure: example frames under rotation by 10 degrees and under rotation + cropping]
• Video fingerprint using block-based spatial
– Data size: a few hundred bits per frame or
– Speed: 1/10 playback time (10x RT) or faster
for standard-def video.
• Distance-based ✪
– L1 (Manhattan) or L2 (Euclidean) distance
• For non-binary signatures
• Weights can be assigned when multiple signatures are used
– Hamming Distance
• For binary signatures
– Probabilistic models for common distortion vectors
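The distance measures listed above can be sketched in a few lines. Storing binary signatures as Python integers and combining per-signature distances with a weighted sum are assumptions made for this example.

```python
def hamming_distance(a, b):
    """Number of differing bits between two binary signatures stored as integers."""
    return bin(a ^ b).count("1")


def l1_distance(a, b):
    """Manhattan distance between two non-binary signature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance between two non-binary signature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def weighted_distance(distances, weights):
    """Combine the distances of several signatures using per-signature weights."""
    return sum(w * d for w, d in zip(weights, distances))


print(hamming_distance(0b10110010, 0b10010011))  # 2
print(l1_distance([3, 1, 4], [1, 1, 6]))         # 4
print(l2_distance([0, 0], [3, 4]))               # 5.0
```

For example, a spatial distance of 2 and a color distance of 4 with weights 0.75 and 0.25 combine to 2.5, which lets the more robust signature dominate the match decision.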
Complexity of Fingerprint Search
• Exhaustive search has linear complexity, or O(K*N)
– N is the size of reference fingerprint DB, in minutes or
– K is length of the query video.
– N can be further decomposed into M*L
• M is number of reference video fingerprints in DB
• L is the average length of video fingerprints in DB
• The curse is on N or M, the DB size.
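To make the cost concrete, here is a sketch of exhaustive search under the assumption that each fingerprint is a list of per-frame binary signatures stored as integers. The query of length K is compared at every offset of every reference, so the work grows as K times the total reference length N = M*L.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")


def exhaustive_search(query, references):
    """Slide the query over every offset of every reference.

    Returns (distance, ref_id, offset) for the best alignment, or None.
    """
    best = None
    for ref_id, ref in references.items():              # M reference fingerprints
        for offset in range(len(ref) - len(query) + 1):  # about L offsets each
            dist = sum(hamming(q, ref[offset + i]) for i, q in enumerate(query))  # K frames
            if best is None or dist < best[0]:
                best = (dist, ref_id, offset)
    return best


references = {
    "clip_a": [0x0F, 0xF0, 0xAA, 0x55, 0x3C],
    "clip_b": [0x11, 0x22, 0x33, 0x44],
}
query = [0xAA, 0x54]  # frames 2-3 of clip_a with one bit flipped
print(exhaustive_search(query, references))  # (1, 'clip_a', 2)
```

Doubling the database doubles the running time, which is why the strategies that follow concentrate on shrinking the part of N that is actually visited.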
Strategies for Fast Search
Strategies             Fingerprint Search      Motion Vector Search
Reduce search space    ✪ LSH
Greedy search          Sequential alignment    Hierarchical search
Early exit             Hamming distance > T    SAD > T
                       Frame down-sampling     Block down-sampling
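The early-exit rule (stop once the Hamming distance exceeds T) can be sketched as below. The function name and the convention of returning None for a rejected alignment are hypothetical choices for this example.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")


def distance_with_early_exit(query, ref, offset, threshold):
    """Accumulated Hamming distance at one alignment, or None once it exceeds `threshold`."""
    total = 0
    for i, q in enumerate(query):
        total += hamming(q, ref[offset + i])
        if total > threshold:
            return None  # early exit: this alignment is already too far off
    return total


ref = [0x0F, 0xF0, 0xAA, 0x55]
query = [0xAA, 0x54]
print(distance_with_early_exit(query, ref, offset=2, threshold=4))  # 1
print(distance_with_early_exit(query, ref, offset=0, threshold=4))  # None
```

Most alignments in a large database are mismatches, so they tend to be rejected after only a few frames rather than after all K.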
Locality Sensitive Hashing (LSH)
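• Consider the ε-NNS (approximate nearest neighbor search) problem:
– For a query point q, find an approximate nearest point p in
the point set P such that d(q,p) < (1+ε) d(q,P), where d(q,P)
is the distance from q to its true nearest neighbor in P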
– LSH guarantees p can be found, with high probability,
• Geometric reasoning:
– Close points in space are likely to be close after
hashing (e.g., a projection onto a lower-dimensional space)
– By using multiple hash functions, the probability that
close points hash close together is increased
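One simple LSH family for binary fingerprints is bit sampling: each hash table keys a fingerprint by a random subset of its bit positions, which is a projection onto a lower-dimensional space. The sketch below is an illustration under that assumption; the class name and parameters are hypothetical, and the slides do not say which LSH family is used in practice.

```python
import random
from collections import defaultdict


class BitSamplingLSH:
    """LSH for Hamming space: each table hashes a fingerprint by a random subset of its bits."""

    def __init__(self, n_bits, bits_per_key, n_tables, seed=0):
        rng = random.Random(seed)
        self.positions = [rng.sample(range(n_bits), bits_per_key) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = {}

    def _key(self, fingerprint, positions):
        return tuple((fingerprint >> p) & 1 for p in positions)

    def add(self, item_id, fingerprint):
        self.items[item_id] = fingerprint
        for positions, table in zip(self.positions, self.tables):
            table[self._key(fingerprint, positions)].append(item_id)

    def query(self, fingerprint):
        """Return (distance, item_id) of the closest candidate sharing a bucket, or None."""
        candidates = set()
        for positions, table in zip(self.positions, self.tables):
            candidates.update(table.get(self._key(fingerprint, positions), []))
        scored = [(bin(fingerprint ^ self.items[c]).count("1"), c) for c in candidates]
        return min(scored) if scored else None


index = BitSamplingLSH(n_bits=64, bits_per_key=8, n_tables=8)
index.add("clip_a", 0x0123456789ABCDEF)
index.add("clip_b", 0xFEDCBA9876543210)
print(index.query(0x0123456789ABCDEF))  # (0, 'clip_a')
```

Only fingerprints that share a bucket with the query are compared in full. A fingerprint differing in a few bits collides in a given table only if none of the sampled positions hit a flipped bit, so using several tables raises the chance that a near-duplicate is found, but does not guarantee it.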
Other Approximation Techniques
• Multi-resolution coarse-to-fine search
– Fine-level search can be terminated (early exit) if
coarse-level search is far off.
– Rank candidates by coarse-level search scores and
take only top N candidates for fine-level search.
• Adaptive hashing – “learning to hash”
– Hashing is non-deterministic; system is trained to
adapt to identification task and data.
– A substantial reduction in search space.
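The multi-resolution coarse-to-fine search described above might look like the following sketch. It assumes the query and references are already aligned and of equal length, uses every `step`-th frame as the coarse level, and treats `top_n` and `coarse_limit` as hypothetical tuning parameters.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")


def sequence_distance(a, b):
    """Total Hamming distance between two aligned sequences of per-frame signatures."""
    return sum(hamming(x, y) for x, y in zip(a, b))


def coarse_to_fine_search(query, references, step=4, top_n=2, coarse_limit=None):
    """Rank references on every `step`-th frame, then rescore only the best `top_n` in full."""
    coarse = []
    for ref_id, ref in references.items():
        score = sequence_distance(query[::step], ref[::step])
        if coarse_limit is not None and score > coarse_limit:
            continue  # early exit: coarse match is far off, skip the fine search
        coarse.append((score, ref_id))
    shortlist = sorted(coarse)[:top_n]
    fine = [(sequence_distance(query, references[ref_id]), ref_id) for _, ref_id in shortlist]
    return min(fine) if fine else None


references = {"a": [0xFF] * 8, "b": [0x00] * 8, "c": [0xF0] * 8}
query = [0xFE] * 8
print(coarse_to_fine_search(query, references))  # (8, 'a')
```

Here the coarse pass touches 2 of 8 frames per reference and the fine pass runs on only 2 of 3 references; on a large database the saving comes mainly from the short candidate list.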
UGC & P2P – copyright concerns?
• UGC Traffic in 07/2007 (Source: comScore, November 30, 2007)
– 70 million people viewed 2.5 billion videos on YouTube.com (39.4% of total
– 38 million people viewed 360 million videos on MySpace.com (22.6% of total
• P2P Traffic 2007 (Source: iPoque, November 28, 2007)
– Average 50-60% total Internet traffic: 49% in Middle East; 83% in Eastern
– BitTorrent 66.7%, eDonkey 28.6% of total P2P traffic
Video Content Registration
• A reference video fingerprint database is pre-
• Two types of information are stored with video
fingerprint data in the reference database
– Metadata, e.g., title, owner, release date, etc.
– Business rules, e.g., allow, filter, or advertise, possibly
based on certain conditions
• MovieLabs’ Content Recognition Rules (CRR) is an industry
standard interface for expressing and exchanging rules.
• Broadcast monitoring
– Audit TV program and commercial airings
• Contextual Ads (monetization)
– Pair ads with identified content, similar to Google AdSense
• Video asset management
– Content-based IDs identify linkage between edits and
• Content-based video search
– Query by video clip
• Research in video fingerprinting began a decade ago; it has since
developed into a technology adopted by the industry.
• Different types of signatures are used to form a video fingerprint,
including spatial, temporal, color, and transform-domain signatures.
• Spatial signatures are the overall winner judged by multiple criteria and
are widely adopted as primary signatures; temporal and color signatures
can be used as secondary signatures to enhance discriminability.
• Brute-force, exhaustive fingerprint search is an O(K*N) problem.
• Fast approximate algorithms make fingerprint search tractable and
scalable for practical applications.
• Current applications focus on copyright enforcement; other
applications under development and experimentation include contextual
advertising, asset management, and content-based video search.