CERTH at MediaEval 2015
Synchronization of Multi-User Event
Media Task
Konstantinos Apostolidis, Vasileios Mezaris
Information Technologies Institute / Centre for Research and Technology Hellas
Overview
• Aims and objectives
• Proposed approach
• Experimental setup
• Experiments and results
• Conclusions & Future Work
Aims and objectives
• People attending large events collect dozens of photos and
video clips with their smartphones, tablets and cameras.
• Complications:
– Time information may be wrong.
– Geolocation may be missing.
• SEM Task Challenge: align and present the media galleries of
different users in a consistent way, so as to preserve the
temporal evolution of the event.
Proposed Method
1. Media Similarity Assessment:
Assess media (photos, videos and
audio files) similarity using multiple
similarity measures.
2. Media Similarity Combination:
Combine all similarity measures to
construct a global similarity matrix.
3. Temporal Synchronization: Very
similar media define links between
different collections. Create a graph
with galleries as nodes and those links
as edges. Traverse it starting from the
reference gallery.
4. Sub-event Clustering: Cluster media into
sub-events using the corrected timestamps,
geolocation information and DCNN features.
1. Media Similarity Assessment
• Photo Similarity Measures:
1. Geometric Consistency of Local Features (GC): We check the geometric
consistency of SIFT keypoints for each pair of photos, using geometric
coding. The GC similarity can discover near-duplicate photos.
2. Scene Similarity (S): We calculate the pairwise cosine distances between
the extracted GIST descriptors of the photos. High S similarity indicates
photos captured in similar scenery (indoor, urban, nature).
3. Color Allocation Similarity (CA): We divide each image into three equal,
non-overlapping horizontal strips, and extract the HSV histogram of each. We
calculate the pairwise cosine distances between the concatenations of the
HSV histograms. High CA similarity indicates photos with similar colors.
4. DCNN Concept Scores (DCS): We use the Caffe framework and the pre-trained
GoogLeNet model to extract concept scores for photos. We use the
Euclidean distance to calculate pairwise distances between the concept score
vectors of photos. High DCS similarity indicates semantically similar photos.
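As an illustration of the Color Allocation measure, the following is a minimal sketch, assuming the photo is already available as an HSV array with values in [0, 1]. The function names, the bin count and the use of cosine similarity (one minus the cosine distance mentioned above) are assumptions made for this example, not details taken from the slides.

```python
import numpy as np

def strip_histogram_descriptor(hsv, bins=8):
    """Concatenate per-channel HSV histograms of three horizontal strips.

    hsv: array of shape (height, width, 3) with values in [0, 1].
    The bin count is an illustrative choice.
    """
    strips = np.array_split(hsv, 3, axis=0)  # top, middle, bottom strips
    parts = []
    for strip in strips:
        for channel in range(3):  # H, S, V
            hist, _ = np.histogram(strip[..., channel], bins=bins, range=(0.0, 1.0))
            parts.append(hist)
    return np.concatenate(parts).astype(float)

def cosine_similarity(a, b):
    """Cosine similarity of two descriptors (1 minus the cosine distance)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Synthetic stand-ins for real photos.
rng = np.random.default_rng(0)
photo_a = rng.random((90, 120, 3))
photo_b = photo_a.copy()           # identical colors
photo_c = np.zeros((90, 120, 3))   # uniformly black image

desc_a = strip_histogram_descriptor(photo_a)
ca_same = cosine_similarity(desc_a, strip_histogram_descriptor(photo_b))
ca_diff = cosine_similarity(desc_a, strip_histogram_descriptor(photo_c))
```

Splitting into strips keeps coarse spatial layout (for example sky on top, ground at the bottom), which a single global histogram would lose.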
• Video Similarity Measures:
– We perform Normalized Cross Correlation of video barcodes.
– We construct video barcodes as follows: [figure: video barcode construction]
1. Media Similarity Assessment
• Audio Similarity Measures:
– We reduce the sampling rate of audio files to 11 kHz.
– We perform Normalized Cross Correlation on the raw audio data.
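A minimal sketch of normalized cross-correlation between two one-dimensional signals follows, assuming both are already resampled to the same rate. The function name `ncc_best_lag` and the global (whole-signal) normalization are assumptions for this example; the slides do not say which NCC variant was used.

```python
import numpy as np

def ncc_best_lag(a, v):
    """Return (lag, score) maximizing the normalized cross-correlation.

    The returned lag k is the shift for which a[n + k] best matches v[n].
    """
    a = np.asarray(a, dtype=float) - np.mean(a)
    v = np.asarray(v, dtype=float) - np.mean(v)
    denom = np.linalg.norm(a) * np.linalg.norm(v)
    if denom == 0:
        return 0, 0.0
    corr = np.correlate(a, v, mode="full") / denom
    lags = np.arange(-(len(v) - 1), len(a))  # lag of each entry in corr
    best = int(np.argmax(corr))
    return int(lags[best]), float(corr[best])

# Synthetic example: the second recording starts 30 samples of silence later.
rng = np.random.default_rng(0)
clip = rng.standard_normal(500)
delayed = np.concatenate([np.zeros(30), clip])
lag, score = ncc_best_lag(delayed, clip)
```

The peak score can serve as the similarity value, and the lag divided by the sampling rate gives a candidate time difference between the two recordings.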
2. Media Similarity Combination
• We combine the information of similarity measures for
photos, using the following procedure:
– Initially, the similarity O(i, j) of photos i and j is set equal to GC(i,j).
– If S(i,j) > ts and S(i,j) > GC(i,j) then O(i,j) is updated as O(i,j) = S(i,j).
– The same update process is subsequently repeated using CA similarity
and DCS similarity (and the respective tc, td thresholds).
– We weight each similarity value so that the similarity of photos whose
capture locations are closer than a threshold m is emphasized.
– The threshold m is calculated by fitting a Gaussian mixture model
of two Gaussian distributions to the histogram of all photos' pairwise
capture location distances. The Gaussian distribution with the lowest
mean (m) presumably corresponds to photos captured in the same
sub-event.
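The threshold-based update procedure can be sketched as below. This is a sketch under stated assumptions: the function name is hypothetical, the later measures (CA, DCS) are compared against the current value of O rather than against GC (the slides leave this open), and the location-based weighting is omitted because its exact form is not given.

```python
import numpy as np

def combine_photo_similarities(gc, others):
    """Combine similarity matrices following the update rule above.

    gc:     GC similarity matrix, used as the initial O.
    others: list of (matrix, threshold) pairs applied in order,
            e.g. [(S, ts), (CA, tc), (DCS, td)].
    """
    combined = np.array(gc, dtype=float)
    for sim, threshold in others:
        sim = np.asarray(sim, dtype=float)
        # Assumption: compare against the current O, not the original GC.
        update = (sim > threshold) & (sim > combined)
        combined[update] = sim[update]
    return combined

# Toy 2x2 matrices with illustrative values and thresholds.
gc = np.array([[1.0, 0.2], [0.2, 1.0]])
s = np.array([[1.0, 0.7], [0.7, 1.0]])
ca = np.array([[1.0, 0.9], [0.9, 1.0]])
dcs = np.array([[1.0, 0.6], [0.6, 1.0]])
o = combine_photo_similarities(gc, [(s, 0.5), (ca, 0.95), (dcs, 0.5)])
```

In this toy case S replaces the GC value (0.7 exceeds both ts and 0.2), CA is ignored because it does not pass its threshold, and DCS is ignored because it does not exceed the current O.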
2. Media Similarity Combination
• All photo similarity measures information is combined into a
single similarity matrix.
• If different media types are present (audio or video) the
respective similarity matrices are normalized and added into
the same similarity matrix.
• Media that:
1. exhibit similarity above a threshold t,
2. belong to different user galleries,
are treated as potential links between those galleries.
• The t threshold is computed empirically from the training
datasets.
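Selecting the potential links from a combined similarity matrix can be sketched as follows. The function name `find_links`, the list-based inputs and the example threshold are assumptions for illustration only.

```python
def find_links(similarity, gallery_of, t):
    """Return (i, j, similarity) for media pairs that link two galleries.

    similarity: square matrix (list of lists) of combined similarities.
    gallery_of: gallery identifier of each media item, by index.
    t:          similarity threshold (hypothetical value in the example).
    """
    links = []
    n = len(gallery_of)
    for i in range(n):
        for j in range(i + 1, n):
            if gallery_of[i] != gallery_of[j] and similarity[i][j] > t:
                links.append((i, j, similarity[i][j]))
    return links

similarity = [
    [1.0, 0.95, 0.9],
    [0.95, 1.0, 0.3],
    [0.9, 0.3, 1.0],
]
gallery_of = ["user1", "user1", "user2"]
links = find_links(similarity, gallery_of, t=0.8)
```

Items 0 and 1 are highly similar but belong to the same gallery, so only the pair (0, 2) is kept as a link.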
3. Temporal Synchronization
• Having identified potential links, we construct a weighted
graph with:
– nodes representing the galleries,
– edges representing the links between galleries,
– weights calculated as the sum of similarities of the media linking the
two galleries.
• We compute the temporal offset of each gallery by traversing
the minimum spanning tree (MST) of the galleries graph.
3. Temporal Synchronization
• Two approaches to traversing the graph:
1. MSTt:
a) Start from the reference gallery node.
b) Select the next node of the MST.
c) Compute the temporal offset of the node as the median of the
capture time differences of the pairs of similar media.
d) Repeat for all nodes in MST.
2. MSTx:
a) Detect fully-connected triplets of nodes and average the offset of the
shortest path with the alternative path in each triplet.
b) Follow the procedure of MSTt method on the averaged offsets.
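The MSTt traversal can be sketched with Prim's algorithm started at the reference gallery. This is a sketch under assumptions: the input layout and the name `synchronize` are hypothetical, offsets are accumulated along tree edges so that corrected time equals raw time plus offset, and the edge weights (sum of link similarities) are used in a minimum spanning tree exactly as stated on the slide, which does not say whether they are inverted first.

```python
import heapq
from statistics import median

def synchronize(links, reference):
    """Compute a temporal offset per gallery relative to the reference.

    links: {(a, b): [(time_in_a, time_in_b, similarity), ...]}
    """
    graph = {}
    for (a, b), pairs in links.items():
        weight = sum(s for _, _, s in pairs)              # edge weight
        diff = median(ta - tb for ta, tb, _ in pairs)     # clock of a minus clock of b
        graph.setdefault(a, []).append((weight, b, diff))
        graph.setdefault(b, []).append((weight, a, -diff))

    offsets = {reference: 0.0}
    heap = [(w, reference, nbr, d) for w, nbr, d in graph.get(reference, [])]
    heapq.heapify(heap)
    while heap:
        _, parent, node, diff = heapq.heappop(heap)
        if node in offsets:
            continue  # already reached through a lighter edge
        offsets[node] = offsets[parent] + diff
        for w, nbr, d in graph.get(node, []):
            if nbr not in offsets:
                heapq.heappush(heap, (w, node, nbr, d))
    return offsets

links = {
    ("ref", "A"): [(100, 40, 0.9), (200, 141, 0.8), (300, 240, 0.95)],
    ("A", "B"): [(50, 45, 0.9)],
    ("ref", "B"): [(10, -56, 0.99), (20, -45, 0.99), (30, -35, 0.99), (40, -25, 0.99)],
}
offsets = synchronize(links, "ref")
# offsets == {"ref": 0.0, "A": 60.0, "B": 65.0}
```

Using the median rather than the mean of the time differences makes the offset of an edge robust to a few wrongly matched media pairs.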
3. Temporal Synchronization
• Comparison of the MSTt and MSTx traversal methods: [figure: MSTt vs. MSTx traversal]
4. Sub-event Clustering
• Two different approaches for sub-event clustering:
1. MPC sub-event clustering approach:
a) Split the media timeline where consecutive photos have temporal
distance above the mean of all temporal distances.
b) Within each cluster, all media whose capture location distance is
above m are split into a different cluster. The process is repeated until
no splitting occurs.
c) Merge neighboring clusters where the maximum intra-cluster
temporal distances of media in each of these clusters are above the
minimum inter-cluster temporal distance.
Intra-cluster distances are the pairwise distances between all the
media in a cluster.
Inter-cluster distances are the pairwise distances between the media
of one cluster and those of another.
[figure: intra-cluster distances within Cluster X; inter-cluster distances between Cluster X and Cluster Y]
4. Sub-event Clustering
d) Merge neighboring clusters where the maximum intra-cluster
spatial distances of media in each of these clusters are above the
minimum inter-cluster spatial distance.
e) For neighboring clusters in which no media have geolocation
information, merging continues by checking whether the pairwise
inter-cluster concept score distance is above an empirically set
threshold.
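Step a) of the MPC approach, splitting the timeline at gaps larger than the mean gap, can be sketched as follows. The function name and the use of plain numeric timestamps (seconds) are assumptions for this example; the later splitting and merging steps are not covered.

```python
def split_timeline(timestamps):
    """Split sorted capture times wherever a gap exceeds the mean gap."""
    times = sorted(timestamps)
    if len(times) < 2:
        return [times] if times else []
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    clusters = [[times[0]]]
    for gap, t in zip(gaps, times[1:]):
        if gap > mean_gap:
            clusters.append([t])    # start a new cluster after a large gap
        else:
            clusters[-1].append(t)  # continue the current cluster
    return clusters

clusters = split_timeline([0, 10, 20, 500, 510, 1200])
# gaps are 10, 10, 480, 10, 690 with mean 240
# clusters == [[0, 10, 20], [500, 510], [1200]]
```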
4. Sub-event Clustering
2. APC sub-event clustering approach:
a) Augment the DCNN feature vectors with the normalized time
information.
b) Cluster the media using Affinity Propagation.
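Step a) of the APC approach can be sketched as below, assuming min-max normalization of the timestamps to [0, 1]; the slides do not state which normalization was used, and the function name is hypothetical. The augmented matrix could then be passed to an Affinity Propagation implementation, for instance the `AffinityPropagation` class in scikit-learn, which is not used here to keep the sketch dependency-light.

```python
import numpy as np

def augment_with_time(features, timestamps):
    """Append min-max normalized capture time as an extra feature column."""
    features = np.asarray(features, dtype=float)
    times = np.asarray(timestamps, dtype=float)
    span = times.max() - times.min()
    if span > 0:
        normalized = (times - times.min()) / span
    else:
        normalized = np.zeros_like(times)  # all media share one timestamp
    return np.hstack([features, normalized[:, None]])

# Toy feature vectors standing in for DCNN features.
features = np.array([[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]])
timestamps = [1000.0, 1060.0, 4600.0]
augmented = augment_with_time(features, timestamps)
```

Normalizing time to the same range as the features keeps the time column from dominating, or being drowned out by, the distances used in clustering.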
Experimental setup
• We selected the Tour De France 2014 (TDF14), The NAMM
Show 2015 (NAMM15) and the Salford Test Shoot (SAL)
datasets to evaluate our method.
• For Temporal Synchronization we used the following
evaluation metrics:
– Precision (percentage of synchronized galleries).
– Accuracy (the average temporal offset calculated over the
synchronized collections).
• For Sub-event clustering we used the F-Score metric.
Experiments and results
Precision and Accuracy: time synchronization evaluation. F-Score: sub-event clustering evaluation.
Dataset   Run          Precision   Accuracy   F-Score
NAMM15    MSTt + APC   0.8333      0.9083     0.241
NAMM15    MSTt + MPC   0.8333      0.9083     0.3658
NAMM15    MSTx + APC   0.8333      0.9083     0.241
NAMM15    MSTx + MPC   0.8333      0.9083     0.3658
TDF14     MSTt + APC   0.125       0.8446     0.1134
TDF14     MSTt + MPC   0.125       0.8446     0.0167
TDF14     MSTx + APC   0.125       0.8446     0.1134
TDF14     MSTx + MPC   0.125       0.8446     0.0167
SAL       MSTt + APC   0.4242      0.9998     0.1229
SAL       MSTt + MPC   0.4242      0.9998     0.164
SAL       MSTx + APC   0.4242      0.9998     0.1229
SAL       MSTx + MPC   0.4242      0.9998     0.164
Experiments and results
• Our method achieved very good accuracy but managed to
synchronize only a small number of galleries, particularly in
the TDF14 dataset.
• The offset averaging of the MSTx method takes place only if the
difference between the offsets of the two paths is lower than a
maxDiff threshold. In our experiments, we set maxDiff = 10, i.e. we
perform the averaging only if the offsets of the two paths in a
triplet differ by less than 10 seconds. The MSTt and MSTx methods
performed the same because maxDiff was set too low and
allowed only minute adjustments, so the MSTx method
degenerated to MSTt.
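The maxDiff gate can be sketched as follows. The function name is hypothetical, and falling back to the shortest-path offset when the two paths disagree by maxDiff or more is an assumption that is consistent with MSTx behaving like MSTt in that case.

```python
def averaged_offset(shortest_path_offset, alternative_offset, max_diff=10.0):
    """Average the two path offsets only when they nearly agree (seconds)."""
    if abs(shortest_path_offset - alternative_offset) < max_diff:
        return (shortest_path_offset + alternative_offset) / 2.0
    return shortest_path_offset  # assumption: keep the shortest-path offset

close = averaged_offset(60.0, 65.0)  # paths differ by 5 s, so they are averaged
far = averaged_offset(60.0, 90.0)    # paths differ by 30 s, so no averaging
```

With maxDiff = 10 the averaged value can move an offset by less than 5 seconds, which illustrates why the adjustments were too small to change the reported results.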
Conclusions & Future Work
• Better fine-tuning of the algorithm parameters is required to
achieve consistently good performance on diverse datasets.
• Extend the algorithm with automatic parameter selection
(e.g. select more links between galleries to improve
precision), experiment with different values of maxDiff
threshold.
• Perform cross-correlation on extracted audio and video
features rather than directly on the raw data.
• Apply a more sophisticated method to combine different
similarity measures.
Thank you for your attention!
Questions?
More information and contact:
Dr. Vasileios Mezaris
bmezaris@iti.gr
http://www.iti.gr/~bmezaris

More Related Content

What's hot

Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Riley Waite
 
Incremental clustering based object tracking in wireless sensor networks
Incremental clustering based object tracking in wireless sensor networksIncremental clustering based object tracking in wireless sensor networks
Incremental clustering based object tracking in wireless sensor networks
Sachin MS
 

What's hot (11)

Time fluid field-based Coordination
Time fluid field-based CoordinationTime fluid field-based Coordination
Time fluid field-based Coordination
 
IMAGE QUALITY OPTIMIZATION USING RSATV
IMAGE QUALITY OPTIMIZATION USING RSATVIMAGE QUALITY OPTIMIZATION USING RSATV
IMAGE QUALITY OPTIMIZATION USING RSATV
 
Infinum Android Talks #04 - Google Maps Android API utility library
Infinum Android Talks #04 - Google Maps Android API utility libraryInfinum Android Talks #04 - Google Maps Android API utility library
Infinum Android Talks #04 - Google Maps Android API utility library
 
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
 
ME_Poster
ME_PosterME_Poster
ME_Poster
 
Marvuglia
MarvugliaMarvuglia
Marvuglia
 
Robust FIR System Identification for Super-Gaussian Noise Based on Hyperbolic...
Robust FIR System Identification for Super-Gaussian Noise Based on Hyperbolic...Robust FIR System Identification for Super-Gaussian Noise Based on Hyperbolic...
Robust FIR System Identification for Super-Gaussian Noise Based on Hyperbolic...
 
Ay33292297
Ay33292297Ay33292297
Ay33292297
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
 
Incremental clustering based object tracking in wireless sensor networks
Incremental clustering based object tracking in wireless sensor networksIncremental clustering based object tracking in wireless sensor networks
Incremental clustering based object tracking in wireless sensor networks
 
Group01_Project3
Group01_Project3Group01_Project3
Group01_Project3
 

Viewers also liked

Viewers also liked (17)

MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep ModelsMediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models
 
MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015
MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015
MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015
 
MediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons LearnedMediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons Learned
 
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
 
MediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience TaskMediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience Task
 
The InVID Plug-in: Web Video Verification on the Browser
The InVID Plug-in: Web Video Verification on the BrowserThe InVID Plug-in: Web Video Verification on the Browser
The InVID Plug-in: Web Video Verification on the Browser
 
MediaEval 2016: LAPI at Predicting Media Interestingness Task
MediaEval 2016: LAPI at Predicting Media Interestingness TaskMediaEval 2016: LAPI at Predicting Media Interestingness Task
MediaEval 2016: LAPI at Predicting Media Interestingness Task
 
MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop
MediaEval 2016 - IR Evaluation: Putting the User Back in the LoopMediaEval 2016 - IR Evaluation: Putting the User Back in the Loop
MediaEval 2016 - IR Evaluation: Putting the User Back in the Loop
 
MediaEval 2016 - Verifying Multimedia Use Task Overview
MediaEval 2016 - Verifying Multimedia Use Task OverviewMediaEval 2016 - Verifying Multimedia Use Task Overview
MediaEval 2016 - Verifying Multimedia Use Task Overview
 
MediaEval 2016 - BUT Zero-Cost Speech Recognition
MediaEval 2016 - BUT Zero-Cost Speech RecognitionMediaEval 2016 - BUT Zero-Cost Speech Recognition
MediaEval 2016 - BUT Zero-Cost Speech Recognition
 
MediaEval 2016 - TUD-MMC Predicting media Interestingness Task
MediaEval 2016 - TUD-MMC Predicting media Interestingness TaskMediaEval 2016 - TUD-MMC Predicting media Interestingness Task
MediaEval 2016 - TUD-MMC Predicting media Interestingness Task
 
MediaEval 2015 - Verifying Multimedia Use at MediaEval 2015
MediaEval 2015 - Verifying Multimedia Use at MediaEval 2015MediaEval 2015 - Verifying Multimedia Use at MediaEval 2015
MediaEval 2015 - Verifying Multimedia Use at MediaEval 2015
 
MediaEval 2016: A Multimodal System for the Verifying Multimedia Use Task
MediaEval 2016: A Multimodal System for the Verifying Multimedia Use TaskMediaEval 2016: A Multimodal System for the Verifying Multimedia Use Task
MediaEval 2016: A Multimodal System for the Verifying Multimedia Use Task
 
Video Retrieval for Multimedia Verification of Breaking News on Social Networks
Video Retrieval for Multimedia Verification  of Breaking News on Social NetworksVideo Retrieval for Multimedia Verification  of Breaking News on Social Networks
Video Retrieval for Multimedia Verification of Breaking News on Social Networks
 
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
 
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
 
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
 

Similar to MediaEval 2015 - CERTH at MediaEval 2015 Synchronization of Multi-User Event Media Task

final_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptxfinal_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptx
shwetabhagat25
 
CenterForDomainSpecificComputing-Poster
CenterForDomainSpecificComputing-PosterCenterForDomainSpecificComputing-Poster
CenterForDomainSpecificComputing-Poster
Yunming Zhang
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
Nandhini S
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
refedey275
 

Similar to MediaEval 2015 - CERTH at MediaEval 2015 Synchronization of Multi-User Event Media Task (20)

final_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptxfinal_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptx
 
Multimedia Mining
Multimedia Mining Multimedia Mining
Multimedia Mining
 
Crowd Density Estimation Using Base Line Filtering
Crowd Density Estimation Using Base Line FilteringCrowd Density Estimation Using Base Line Filtering
Crowd Density Estimation Using Base Line Filtering
 
MediaEval 2015 - UNED-UV @ Retrieving Diverse Social Images Task - Poster
MediaEval 2015 - UNED-UV @ Retrieving Diverse Social Images Task - PosterMediaEval 2015 - UNED-UV @ Retrieving Diverse Social Images Task - Poster
MediaEval 2015 - UNED-UV @ Retrieving Diverse Social Images Task - Poster
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
 
CenterForDomainSpecificComputing-Poster
CenterForDomainSpecificComputing-PosterCenterForDomainSpecificComputing-Poster
CenterForDomainSpecificComputing-Poster
 
Paper 153
Paper 153Paper 153
Paper 153
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
 
Tele immersion
Tele immersionTele immersion
Tele immersion
 
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
slide-171212080528.pptx
slide-171212080528.pptxslide-171212080528.pptx
slide-171212080528.pptx
 
Ay33292297
Ay33292297Ay33292297
Ay33292297
 
People counting in low density video sequences2
People counting in low density video sequences2People counting in low density video sequences2
People counting in low density video sequences2
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
Real Time Object Dectection using machine learning
Real Time Object Dectection using machine learningReal Time Object Dectection using machine learning
Real Time Object Dectection using machine learning
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
CLUSTERING HYPERSPECTRAL DATA
CLUSTERING HYPERSPECTRAL DATACLUSTERING HYPERSPECTRAL DATA
CLUSTERING HYPERSPECTRAL DATA
 
U_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthU_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in Depth
 

More from multimediaeval

Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
multimediaeval
 

More from multimediaeval (20)

Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
 
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
 
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
 
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
 
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 TaskEssex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
 
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
 
Fooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality EstimatorFooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality Estimator
 
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
 
Pixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social ImagesPixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social Images
 
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-MatchingHCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
 
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
 
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
 
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentation
 
A Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image DetectionA Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image Detection
 
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
 
Fine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with AttentionFine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with Attention
 
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
 
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
 
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
 Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ... Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
 

Recently uploaded

Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
AnaAcapella
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
EADTU
 

Recently uploaded (20)

Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Simple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdfSimple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdf
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.ppt
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Details on CBSE Compartment Exam.pptx1111
Details on CBSE Compartment Exam.pptx1111Details on CBSE Compartment Exam.pptx1111
Details on CBSE Compartment Exam.pptx1111
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 

MediaEval 2015 - CERTH at MediaEval 2015 Synchronization of Multi-User Event Media Task

  • 1. CERTH at MediaEval 2015 Synchronization of Multi-User Event Media Task Konstantinos Apostolidis, Vasileios Mezaris Information Technologies Institute / Centre for Research and Technology Hellas
  • 2. Overview • Aims and objectives • Proposed approach • Experimental setup • Experiments and results • Conclusions & Future Work
  • 3. Aims and objectives • People attending large events collect dozens of photos and video clips with their smartphones, tablets and cameras. • Complications: – Time information may be wrong. – Geolocation may be missing. • SEM Task Challenge: align and present the media galleries of different users in a consistent way, so as to preserve the temporal evolution of the event.
  • 4. Proposed Method 1. Media Similarity Assessment: Assess media (photos, videos and audio files) similarity using multiple similarity measures. 2. Media Similarity Combination: Combine all similarity measures to construct a global similarity matrix. 3. Temporal Synchronization: Very similar media define links between different collections. Create a graph with galleries as nodes and those links as edges. Traverse it starting from the reference gallery. 4. Sub-event Clustering: Cluster media into sub-events using the corrected timestamps, geolocation information and DCNN features.
  • 5. 1. Media Similarity Assessment • Photo Similarity Measures: 1. Geometric Consistency of Local Features (GC): We check the geometric consistency of SIFT keypoints for each pair of photos, using geometric coding. The GC similarity can discover near-duplicate photos. 2. Scene Similarity (S): We calculate the pairwise cosine distances between the extracted GIST descriptors of the photos. High S similarity indicates photos captured at similar scenery (indoor, urban, nature). 3. Color Allocation Similarity (CA): We divide each image into three equal, non-overlapping horizontal strips, and extract the HSV histogram of each. We calculate the pairwise cosine distances between the concatenations of the HSV histograms. High CA similarity indicates photos with similar colors. 4. DCNN Concept Scores (DCS): We use the Caffe DCNN framework and the pre-trained GoogLeNet model to extract concept scores for photos. We use the Euclidean distance to calculate pairwise distances between the concept score vectors of photos. High DCS similarity indicates semantically similar photos.
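As a rough illustration of the Color Allocation (CA) measure described on this slide, the sketch below splits an image into three horizontal strips, builds a small HSV histogram for each strip, concatenates them and compares two images with the cosine similarity. The function names, the 4-bin quantization and the image representation (rows of RGB tuples in [0, 1]) are assumptions made for the example, not details from the slides.

```python
import colorsys
import math


def strip_hsv_histogram(image, bins=4):
    """Concatenated HSV histograms of three horizontal strips.

    `image` is a list of rows, each row a list of (r, g, b) floats in [0, 1].
    The strips are equal only when the height is divisible by 3 (assumption).
    """
    height = len(image)
    bounds = [0, height // 3, 2 * height // 3, height]
    descriptor = []
    for top, bottom in zip(bounds, bounds[1:]):
        hist = [0.0] * (3 * bins)  # bins for H, then S, then V
        for row in image[top:bottom]:
            for r, g, b in row:
                for channel, value in enumerate(colorsys.rgb_to_hsv(r, g, b)):
                    index = min(int(value * bins), bins - 1)
                    hist[channel * bins + index] += 1.0
        descriptor.extend(hist)
    return descriptor


def cosine_similarity(u, v):
    """Cosine similarity of two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two tiny 3x2 test images: one uniformly red, one uniformly blue.
red = [[(1.0, 0.0, 0.0)] * 2 for _ in range(3)]
blue = [[(0.0, 0.0, 1.0)] * 2 for _ in range(3)]
ca_red_blue = cosine_similarity(strip_hsv_histogram(red), strip_hsv_histogram(blue))
```

Here the red and blue images agree in saturation and value but not in hue, so their similarity is below that of two identical images.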
  • 6. 1. Media Similarity Assessment • Video Similarity Measures: – We perform Normalized Cross Correlation of video barcodes. – The construction of the video barcodes is illustrated by a figure on the slide.
  • 7. 1. Media Similarity Assessment • Audio Similarity Measures: – We reduce the sampling rate of audio files to 11 kHz. – We perform Normalized Cross Correlation on the raw audio data.
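The Normalized Cross Correlation used for the video barcodes and the raw audio can be sketched on plain 1-D sequences as follows. This is a minimal pure-Python illustration: the function names, the `max_lag` search window and the `min_overlap` guard are assumptions of the example, and the slides do not state how the correlation is actually implemented.

```python
import math


def ncc_at_lag(a, b, lag, min_overlap=2):
    """NCC of the overlapping samples when b[0] is aligned with a[lag]."""
    if lag >= 0:
        x, y = a[lag:], b[:len(a) - lag]
    else:
        x, y = a[:len(b) + lag], b[-lag:]
    n = min(len(x), len(y))
    if n < min_overlap:
        return 0.0  # too little overlap to give a meaningful score
    x, y = x[:n], y[:n]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    den = math.sqrt(sum((p - mean_x) ** 2 for p in x) *
                    sum((q - mean_y) ** 2 for q in y))
    return num / den if den else 0.0


def best_lag(a, b, max_lag, min_overlap=2):
    """Return (lag, score) with the highest NCC within [-max_lag, max_lag]."""
    scores = {lag: ncc_at_lag(a, b, lag, min_overlap)
              for lag in range(-max_lag, max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# b is an excerpt of a that starts at sample 2.
signal_a = [0, 0, 1, 3, 2, 0, 0, 0]
signal_b = [1, 3, 2, 0]
lag, score = best_lag(signal_a, signal_b, max_lag=4, min_overlap=4)
```

The peak score serves as the similarity of the two media items, and the lag at the peak indicates how their timelines are shifted relative to each other.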
  • 8. 2. Media Similarity Combination • We combine the information of similarity measures for photos, using the following procedure: – Initially, the similarity O(i,j) of photos i and j is set equal to GC(i,j). – If S(i,j) > ts and S(i,j) > GC(i,j) then O(i,j) is updated as O(i,j) = S(i,j). – The same update process is subsequently repeated using CA similarity and DCS similarity (and the respective tc, td thresholds). – We weigh each similarity value so that the similarity of photos whose capture locations are closer than a threshold m is emphasized. – The m threshold is calculated by fitting a Gaussian mixture model of two Gaussian distributions to the histogram of all pairwise capture location distances of the photos. The Gaussian distribution with the lower mean (m) presumably signifies photos captured in the same sub-event.
  • 9. 2. Media Similarity Combination • All photo similarity measures information is combined into a single similarity matrix. • If different media types are present (audio or video) the respective similarity matrices are normalized and added into the same similarity matrix. • Media that: 1. Exhibit similarity above a t threshold, 2. belong to different user galleries, are treated as potential links between these photo galleries. • The t threshold is computed empirically from the training datasets.
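The combination and link-selection steps of the two slides above can be sketched as below. It is an illustration under stated assumptions: each later measure is compared against the current combined value O(i,j) (the slides spell out the comparison only for S versus GC), the location-based weighting is left out, and the function names, matrices and threshold values are made up for the example.

```python
def combine_similarities(gc, others):
    """Start from GC and let S, CA, DCS overwrite entries they exceed.

    `others` is a list of (matrix, threshold) pairs in the order S, CA, DCS.
    A value replaces O(i, j) only if it is above its own threshold and
    above the current O(i, j) (assumed reading of the update rule).
    """
    n = len(gc)
    combined = [row[:] for row in gc]
    for matrix, threshold in others:
        for i in range(n):
            for j in range(n):
                value = matrix[i][j]
                if value > threshold and value > combined[i][j]:
                    combined[i][j] = value
    return combined


def potential_links(combined, gallery_of, t):
    """Pairs above threshold t whose media belong to different galleries."""
    links = []
    n = len(combined)
    for i in range(n):
        for j in range(i + 1, n):
            if combined[i][j] > t and gallery_of[i] != gallery_of[j]:
                links.append((i, j, combined[i][j]))
    return links


# Hypothetical similarity matrices for three photos.
gc = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.0], [0.1, 0.0, 1.0]]
s = [[1.0, 0.9, 0.3], [0.9, 1.0, 0.4], [0.3, 0.4, 1.0]]
ca = [[1.0, 0.1, 0.8], [0.1, 1.0, 0.2], [0.8, 0.2, 1.0]]
combined = combine_similarities(gc, [(s, 0.5), (ca, 0.7)])
links = potential_links(combined, gallery_of=["A", "A", "B"], t=0.75)
```

In this toy input photos 0 and 1 are very similar but come from the same gallery, so only the pair (0, 2) is kept as a potential link.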
  • 10. 3. Temporal Synchronization • Having identified potential links, we construct a weighted graph with: – nodes representing the galleries, – edges representing the links between galleries, – weights calculated as the sum of similarities of the media linking the two galleries. • We compute the temporal offset of each gallery by traversing the minimum spanning tree (MST) of the galleries graph.
  • 11. 3. Temporal Synchronization • Two approaches to traversing the graph: 1. MSTt: a) Start from the reference gallery node. b) Select the next node of the MST. c) Compute the temporal offset of the node as the median of the capture time differences of the pairs of similar media. d) Repeat for all nodes in the MST. 2. MSTx: a) Detect fully-connected triplets of nodes and average the offset of the shortest path with that of the alternative path in each triplet. b) Follow the procedure of the MSTt method on the averaged offsets.
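The MSTt traversal can be illustrated with the following sketch. It assumes the spanning tree has already been computed and is passed in as a list of edges, and that offsets accumulate along the tree so that every gallery ends up expressed relative to the reference gallery; the function name and the input layout are hypothetical.

```python
from collections import deque
from statistics import median


def mst_offsets(reference, tree_edges, link_time_diffs):
    """Offset (in seconds) to add to each gallery's timestamps.

    tree_edges: (gallery_u, gallery_v) edges of a precomputed spanning tree.
    link_time_diffs: maps (u, v) to the list of differences
        "capture time in u minus capture time in v" over the linked media pairs.
    """
    neighbours = {}
    for u, v in tree_edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)

    offsets = {reference: 0.0}
    queue = deque([reference])
    while queue:
        parent = queue.popleft()
        for child in neighbours.get(parent, []):
            if child in offsets:
                continue  # already synchronized
            if (parent, child) in link_time_diffs:
                diffs = link_time_diffs[(parent, child)]
            else:
                # Stored in the opposite direction: flip the sign.
                diffs = [-d for d in link_time_diffs[(child, parent)]]
            # Median is robust to a few wrong links (step c of MSTt).
            offsets[child] = offsets[parent] + median(diffs)
            queue.append(child)
    return offsets


offsets = mst_offsets(
    reference="ref",
    tree_edges=[("ref", "B"), ("B", "C")],
    link_time_diffs={("ref", "B"): [10, 12, 11], ("C", "B"): [5, 5, 7]},
)
```

In the example, gallery B is 11 seconds behind the reference, and gallery C is 5 seconds ahead of B, so C needs a correction of 6 seconds.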
  • 12. 3. Temporal Synchronization • Comparison of the MSTt and MSTx traversal methods (shown as a figure on the slide).
  • 13. 4. Sub-event Clustering • Two different approaches for sub-event clustering: 1. MPC sub-event clustering approach: a) Split the media timeline where consecutive photos have a temporal distance above the mean of all temporal distances. b) Within each cluster, media whose capture location distance is above m are split into a different cluster. The process is repeated until no splitting occurs. c) Merge neighboring clusters where the maximum intra-cluster temporal distances of media in each of these clusters are above the minimum inter-cluster temporal distance. Intra-cluster distances are the pairwise distances between all the media in a cluster. Inter-cluster distances are the pairwise distances between the media of one cluster and those of another.
  • 14. 4. Sub-event Clustering d) Merge neighboring clusters where the maximum intra-cluster spatial distances of media in each of these clusters are above the minimum inter-cluster spatial distance. e) For neighboring clusters in which none of the media have geolocation information, the merging is continued by checking if the pairwise inter-cluster concept scores distance is above an empirically set threshold.
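Step (a) of the MPC approach, splitting the timeline at gaps larger than the mean gap, can be sketched as below. The function name and the toy timestamps are illustrative; the later splitting and merging steps (b) to (e) are not covered.

```python
def split_by_mean_gap(timestamps):
    """Split a timeline wherever the gap between consecutive items exceeds the mean gap.

    `timestamps` are capture times in seconds (already synchronized).
    Returns a list of clusters, each a sorted list of timestamps.
    """
    times = sorted(timestamps)
    if len(times) < 2:
        return [times] if times else []
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps)

    clusters = [[times[0]]]
    for gap, t in zip(gaps, times[1:]):
        if gap > mean_gap:
            clusters.append([t])    # large pause: start a new cluster
        else:
            clusters[-1].append(t)  # still within the same burst of activity
    return clusters


# Gaps are 10, 10, 480, 10 and 690 seconds; the mean gap is 240 seconds.
clusters = split_by_mean_gap([0, 10, 20, 500, 510, 1200])
```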
  • 15. 4. Sub-event Clustering 2. APC sub-event clustering approach: a) Augment the DCNN feature vectors with the normalized time information. b) Cluster the media using Affinity Propagation.
  • 16. Experimental setup • We selected the Tour de France 2014 (TDF14), The NAMM Show 2015 (NAMM15) and the Salford Test Shoot (SAL) datasets to evaluate our method. • For Temporal Synchronization we used the following evaluation metrics: – Precision (percentage of synchronized galleries). – Accuracy (the average temporal offset calculated over the synchronized collections). • For Sub-event clustering we used the F-Score metric.
  • 17. Experiments and results
      Precision and Accuracy: Time Synchronization Evaluation. F-Score: Sub-event Clustering Evaluation.

      Dataset  Run         Precision  Accuracy  F-Score
      NAMM15   MSTt + APC  0.8333     0.9083    0.241
      NAMM15   MSTt + MPC                       0.3658
      NAMM15   MSTx + APC  0.8333     0.9083    0.241
      NAMM15   MSTx + MPC                       0.3658
      TDF14    MSTt + APC  0.125      0.8446    0.1134
      TDF14    MSTt + MPC                       0.0167
      TDF14    MSTx + APC  0.125      0.8446    0.1134
      TDF14    MSTx + MPC                       0.0167
      SAL      MSTt + APC  0.4242     0.9998    0.1229
      SAL      MSTt + MPC                       0.164
      SAL      MSTx + APC  0.4242     0.9998    0.1229
      SAL      MSTx + MPC                       0.164
  • 18. Experiments and results • Our method achieved very good accuracy but synchronized only a small number of galleries, particularly in the TDF14 dataset. • The offset averaging of the MSTx method takes place only if the difference between the two paths is lower than the maxDiff threshold. In our experiments, we set maxDiff = 10, i.e. we perform the averaging only if the offsets of the two paths in a triplet differ by less than 10 seconds. The MSTt and MSTx methods performed the same because maxDiff was set too low and allowed only minute adjustments, degenerating the MSTx method to MSTt.
  • 19. Conclusions & Future Work • Better fine-tuning of the algorithm parameters is required to achieve consistently good performance on diverse datasets. • Extend the algorithm with automatic parameter selection (e.g. select more links between galleries to improve precision), and experiment with different values of the maxDiff threshold. • Perform cross correlation on extracted audio and video features rather than directly on raw data. • Apply a more sophisticated method to combine different similarity measures.
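The effect of the maxDiff threshold can be seen in a small sketch of the triplet averaging. It assumes that the alternative path's offset is the sum of the offsets along its two edges and that a rejected average simply falls back to the direct offset; the function and parameter names are hypothetical.

```python
def averaged_offset(direct_offset, via_first, via_second, max_diff=10.0):
    """Average the direct offset with the two-edge alternative path.

    direct_offset: offset (seconds) along the direct edge of the triplet.
    via_first, via_second: offsets along the two edges of the alternative path.
    The average is used only if both paths agree to within max_diff seconds.
    """
    alternative = via_first + via_second
    if abs(direct_offset - alternative) < max_diff:
        return (direct_offset + alternative) / 2.0
    return float(direct_offset)


close_paths = averaged_offset(100, 60, 44)    # paths differ by 4 s: averaged
distant_paths = averaged_offset(100, 60, 80)  # paths differ by 40 s: kept as is
```

With maxDiff = 10 the averaged value can move the direct offset by less than 5 seconds, which is consistent with the observation that MSTx behaved like MSTt in these experiments.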
  • 20. Thank you for your attention! Questions? More information and contact: Dr. Vasileios Mezaris bmezaris@iti.gr http://www.iti.gr/~bmezaris