In this work, we address the problem of audio-based near-duplicate video retrieval. We propose the Audio Similarity Learning (AuSiL) approach that effectively captures temporal patterns of audio similarity between video pairs. For robust similarity calculation between two videos, we first extract representative audio-based video descriptors by leveraging transfer learning from a Convolutional Neural Network (CNN) trained on a large-scale dataset of audio events, and then compute the similarity matrix derived from the pairwise similarities of these descriptors. The similarity matrix is subsequently fed to a CNN that captures the temporal structures within its content. We train our network with a triplet generation process, optimizing the triplet loss function. To evaluate the effectiveness of the proposed approach, we have manually annotated two publicly available video datasets based on the audio duplicity of their videos. The proposed approach achieves very competitive results compared to three state-of-the-art methods. Moreover, unlike the competing methods, it is very robust to the retrieval of audio duplicates generated with speed transformations.
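The two core ingredients of the pipeline described above — a pairwise similarity matrix over segment descriptors, and a triplet loss that pushes audio-duplicate pairs above non-duplicate pairs — can be sketched as follows. This is a minimal illustration with NumPy, not the authors' implementation; function names and the margin value are illustrative assumptions.

```python
import numpy as np

def similarity_matrix(x, y):
    """Pairwise cosine similarity between two sets of audio descriptors
    (one row per time segment). Rows are L2-normalized first, so the
    dot product of any two rows is their cosine similarity."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return x @ y.T  # shape: (segments_of_x, segments_of_y)

def triplet_loss(sim_pos, sim_neg, margin=0.5):
    """Hinge-style triplet loss on scalar video-level similarity scores:
    the anchor-positive score should exceed the anchor-negative score
    by at least `margin` (an illustrative value, not from the paper)."""
    return max(0.0, sim_neg - sim_pos + margin)
```

In the full approach the similarity matrix is not aggregated with a fixed rule but passed through a learnable CNN, which is what lets AuSiL capture temporal alignment patterns between the two audio streams.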
Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning | Presentation@ICPR2020
1. Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning
Pavlos Avgoustinakis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Andreas L. Symeonidis, Ioannis Kompatsiaris
2. Problem statement
Duplicate Audio Video Retrieval (DAVR)
• Given a video query, search a video database and retrieve videos that
share the same audio content
[Figure: a query video is matched against a video database to retrieve videos with duplicate audio]
3. Motivation
State-of-the-art limitations
• Use of handcrafted approaches for fingerprint extraction
• Rigid aggregation schemes for similarity calculation
• Lack of evaluation benchmarks with user-generated content annotated
based on audio duplicity
Our objective
• Leverage deep learning and transfer learning
• Compose evaluation benchmarks annotated for audio duplicity
4. Contribution
• Robust audio-based video similarity calculation
• Transfer learning from a pre-trained CNN
• Audio similarity learning
• Two annotated datasets that serve as benchmarks for DAVR
5. Feature Extraction
• Generate Mel-spectrograms of the audio signals
• Divide into overlapping time segments
• Employ a pre-trained CNN (Kumar et al. 2018)
• Max Activation of Convolutions (MAC) on intermediate CNN layers
• Apply PCA whitening and attention-based weighting
A. Kumar, M. Khadkevich, and C. Fugen. “Knowledge transfer from weakly labeled audio using convolutional neural network for sound events and scenes”. ICASSP, 2018.
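The segmentation and MAC pooling steps above can be illustrated with a short NumPy sketch. This is a simplified stand-in, not the AuSiL code: the segment length, hop size, and function names are hypothetical, and in the real pipeline the activations come from the pre-trained audio CNN of Kumar et al. (2018), followed by PCA whitening and attention-based weighting.

```python
import numpy as np

def segment_spectrogram(mel, seg_len, hop):
    """Split a (mel_bands, frames) Mel-spectrogram into overlapping
    time segments of `seg_len` frames, stepped every `hop` frames."""
    return [mel[:, s:s + seg_len]
            for s in range(0, mel.shape[1] - seg_len + 1, hop)]

def mac(feature_map):
    """Max Activation of Convolutions (MAC): global max-pool each
    channel of a (channels, height, width) CNN activation tensor,
    yielding one compact descriptor value per channel."""
    return feature_map.reshape(feature_map.shape[0], -1).max(axis=1)
```

Each overlapping segment is fed to the CNN, and MAC vectors from intermediate layers are concatenated into the segment's audio descriptor; the global max keeps the descriptor length fixed regardless of the activation map's spatial size.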
7. Evaluation datasets
FIVR-200Kα
• FIVR-200K (Kordopatis-Zilos et al., 2019) annotated for Fine-grained Incident Video Retrieval
• 76 video queries
• 3,392 audio duplicate pairs
SVDα
• SVD (Jiang et al., 2019) annotated for Near-Duplicate Video Retrieval
• 167 video queries
• 1,492 audio duplicate pairs
G. Kordopatis-Zilos, S. Papadopoulos, I. Patras, and I. Kompatsiaris. “Fine-grained Incident Video Retrieval”. IEEE TMM, 2019.
Q. Y. Jiang, Y. He, G. Li, J. Lin, L. Li, and W. J. Li. “SVD: A Large-Scale Short Video Dataset for Near Duplicate Video Retrieval”. ICCV, 2019.
10. Thank you!
Code available at:
https://github.com/mever-team/ausil
Get in touch:
Giorgos Kordopatis-Zilos: georgekordopatis@iti.gr / @g_kordo
MeVer team:
https://mever.iti.gr/web/ / @meverteam