Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-Words
Hyun-seok Min, Se Min Kim, Wesley De Neve, and Yong Man Ro
Image and Video Systems Lab
Korea Advanced Institute of Science and Technology (KAIST)
Daejeon, South Korea
e-mail: ymro@ee.kaist.ac.kr
website: http://ivylab.kaist.ac.kr

I. INTRODUCTION
- Bag-of-visual-words (BoVW)-based approaches can be effectively used for the detection of both image and video copies
   - however, these approaches typically ignore the inherent temporal nature of video content
- Conventional video tomography extracts slices from a space-time cube that are parallel to the time axis
   - however, slices that are parallel to the time axis do not take advantage of spatial information
- This paper proposes to create a content-based video signature by means of the following two sequential steps
   1) extraction of inclined tomography images from the video content
      - angle of inclination is dependent on the amount of motion
   2) characterization of the inclined tomography images by means of BoVW

II. CREATION OF A VIDEO SIGNATURE BY MEANS OF INCLINED VIDEO TOMOGRAPHY AND BOVW
1. Extraction of inclined tomography images
   - To extract inclined tomography images from a video clip V, we first segment V into N space-time cubes such that V = <S1, S2, …, SN>
   - We subsequently segment each space-time cube into several space-time sub-cubes
        Fv, Fb: number of frames in a space-time cube and space-time sub-cube
        Wv, Wb: width of a space-time cube and space-time sub-cube
        Hv, Hb: height of a space-time cube and space-time sub-cube

   Fig. 1. Segmentation of a space-time cube into space-time sub-cubes.

   - The angle of inclination of the extracted tomography image reflects the intensity of motion in the space-time sub-cube under consideration

   Fig. 2. Extraction of an inclined tomography image from a space-time sub-cube.

   $$\theta = \frac{\beta}{W_b \times H_b \times F_b} \sum_{(x, y, f)} \left| L(x, y, f + 1) - L(x, y, f) \right|$$

        L(x, y, t): the luminance value of a pixel (x, y) of a particular frame at time t (the sum uses the frame index f for t)
        β: a weight parameter

2. BoVW applied to inclined tomography images
   - each space-time cube Si can be represented as a vector Ai that summarizes how the space-time sub-cubes are distributed over the vocabulary of visual words used

   $$A_i = \langle a_{i,1}, a_{i,2}, \ldots, a_{i,M} \rangle$$

        M: the number of visual words vj in the vocabulary used
        ai,j: the weight of the jth visual word

   Fig. 3. Extraction of a histogram of visual words from an inclined tomography image.

III. VIDEO MATCHING USING HISTOGRAMS
- The dissimilarity between two video clips Vq and Vr:

   $$D(V^q, V^r) = \min_{p} \frac{1}{N} \sum_{i=1}^{N} D_{shot}(S_i^q, S_{i+p}^r)$$

   $$V^q = \{ S_i^q \}_{i=1}^{N}, \qquad V^r = \{ S_l^r \}_{l=1}^{L}$$

        p: the position of the video shot in the reference video clip at which similarity measurement starts

- The dissimilarity between two video shots Sq and Sr is measured by making use of the cosine similarity:

   $$D_{shot}(S^q, S^r) = 1 - \frac{\sum_{j=1}^{M} a_j^q \times a_j^r}{\sqrt{\sum_{j=1}^{M} (a_j^q)^2} \, \sqrt{\sum_{j=1}^{M} (a_j^r)^2}}$$

        M: the number of visual words in the vocabulary
        aj: the weight of the jth visual word

IV. EXPERIMENTS
1. Experimental setup
   - Use of TRECVID 2009 for creating near-duplicate video clips (NDVCs) and reference video clips
   - Creation of 100 query video clips by applying five transformations to 20 video clips randomly selected from the reference video database
      - blurring: we blurred frames using a Gaussian kernel with a radius of 15;
      - picture-in-picture: we inserted a picture with a size that is 30% of the size of the main frame;
      - change in brightness: we increased the brightness by 40%;
      - mirroring: we flipped frames horizontally (left to right);
      - change in frame rate: we halved the frame rate.
2. Experimental results

   [Fig. 4 plots: two bar charts, recall (left) and precision (right), per transformation (blur, pattern insertion, change in brightness, mirroring, frame rate change, average), comparing the proposed video signature, BoVW using SIFT, and video tomography.]

   Fig. 4. Comparison of the effectiveness of several video signatures.

   Fig. 5. Example images: (a) example key frame and (b) 16 inclined tomography images extracted from the key frame shown in (a).

V. CONCLUSIONS
- This paper introduced a novel video signature that takes advantage of both inclined video tomography and BoVW
- The proposed video signature is able to capture both spatial and temporal information
   - the angle of inclination of the extracted tomography images is dependent on the amount of motion in the local volumes
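
The three formulas of Sections II and III (the motion-dependent angle θ, the shot dissimilarity Dshot, and the clip dissimilarity D) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the nested-list data layout, the use of absolute frame differences in θ, the 0-based offset p, and the handling of all-zero histograms are assumptions made only for illustration.

```python
import math
from typing import Sequence

# Assumed layout: a sub-cube is a list of frames, a frame is a list of rows,
# and a row is a list of luminance values, so L(x, y, f) == sub_cube[f][y][x].
Frame = Sequence[Sequence[float]]
Histogram = Sequence[float]  # BoVW weights a_1 ... a_M of one shot


def inclination_angle(sub_cube: Sequence[Frame], beta: float = 1.0) -> float:
    """theta = beta / (Wb * Hb * Fb) * sum |L(x, y, f+1) - L(x, y, f)|."""
    f_b = len(sub_cube)
    h_b = len(sub_cube[0])
    w_b = len(sub_cube[0][0])
    total = 0.0
    # Assumption: the last frame has no successor inside the sub-cube,
    # so only Fb - 1 frame differences contribute to the sum.
    for f in range(f_b - 1):
        for y in range(h_b):
            for x in range(w_b):
                total += abs(sub_cube[f + 1][y][x] - sub_cube[f][y][x])
    return beta * total / (w_b * h_b * f_b)


def shot_dissimilarity(a_q: Histogram, a_r: Histogram) -> float:
    """Dshot = 1 - cosine similarity of two visual-word histograms."""
    dot = sum(q * r for q, r in zip(a_q, a_r))
    norm_q = math.sqrt(sum(q * q for q in a_q))
    norm_r = math.sqrt(sum(r * r for r in a_r))
    if norm_q == 0.0 or norm_r == 0.0:
        return 1.0  # assumption: an empty histogram matches nothing
    return 1.0 - dot / (norm_q * norm_r)


def clip_dissimilarity(query: Sequence[Histogram],
                       reference: Sequence[Histogram]) -> float:
    """D = min over p of the mean Dshot between aligned shots."""
    n = len(query)
    if n == 0 or n > len(reference):
        raise ValueError("query must be non-empty and no longer than reference")
    best = math.inf
    # p is a 0-based start offset into the reference clip (assumption).
    for p in range(len(reference) - n + 1):
        mean = sum(shot_dissimilarity(query[i], reference[i + p])
                   for i in range(n)) / n
        best = min(best, mean)
    return best
```

In this sketch a query clip that reappears unchanged somewhere inside a longer reference clip yields a dissimilarity of 0 at the matching offset, which is the behaviour the min over p is meant to capture.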

                                     IEEE International Conference on Multimedia and Expo (ICME), July 2012, Melbourne (Australia)
