HCMUS at MediaEval 2020: Image-Text Fusion for Automatic
News-Images Re-Matching
Thuc Nguyen-Quang 1,3, Tuan-Duy H. Nguyen1,3, Thang-Long Nguyen-Ho1,3,
Anh-Kiet Duong1,3, Xuan-Nhat Hoang1,3, Vinh-Thuyen Nguyen-Truong1,3,
Hai-Dang Nguyen1,3, Minh-Triet Tran1,2,3
1University of Science, VNU-HCM,
2John von Neumann Institute, VNU-HCM,
3Vietnam National University, Ho Chi Minh city, Vietnam
December 14-15, 2020
T. Nguyen-Quang et al. HCMUS at MediaEval 2020
Outline
1 Introduction
2 Methods
Metric Learning
Image-Text Matching via Categorization
Image-Text Fusion with Image Captioning and Contextual Embeddings
Image-Text Fusion with Knowledge Graph-based Contextual Embeddings
Graph-based Face-Name Matching
Ensemble
3 Results
4 Conclusion and future works
5 Bibliography
Introduction
We mainly focus on fusing cross-modal embedded information, extracted as:
Simple set intersection
Deep neural features
Knowledge-graph-enhanced neural features
M1 Metric Learning
M2 Image-Text Matching via Categorization
M3 Image-Text Fusion with Image Captioning and Contextual Embeddings
M4 Image-Text Fusion with Knowledge Graph-based Contextual Embeddings
M5 Graph-based Face-Name Matching
Methods
Metric Learning
We use a triplet-loss model to project the embeddings of image-text pairs onto bases of
significant similarity.
Title texts are embedded with BERT.
Image embeddings:
Global context embedding: EfficientNet
Local context embedding: the top-k bottom-up-attention objects, passed to a
self-attention sequential model.
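As a rough illustration of the triplet objective, the sketch below computes a margin-based triplet loss on already-projected embeddings. The margin value, the Euclidean distance, and the toy vectors are assumptions made for the example; the slides do not state which distance or margin was used.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss.

    The loss is zero once the matching pair is closer than the
    non-matching pair by at least `margin` (an assumed value).
    """
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(0.0, gap + margin)


# Toy projected embeddings (hypothetical values, 2-D for readability).
title = [1.0, 0.0]            # anchor: embedded article title
matching_image = [0.9, 0.1]   # positive: the image published with the article
other_image = [0.0, 1.0]      # negative: an unrelated image

loss_easy = triplet_loss(title, matching_image, other_image)
loss_hard = triplet_loss(title, other_image, matching_image)
```

With these toy values the first triplet is already well separated and contributes no loss, while the swapped triplet is penalized.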
Image-Text Matching via Categorization
Categorizing images and texts with two gradient boosting decision trees.
Target categories extracted from URLs:
nrw
kultur
region
panorama
sport
wirtschaft
koeln
ratgeber
politik
unknown
Augment and extract image features with VGG16, InceptionResNetV2, MobileNetV2,
EfficientNetB1-7, Xception, ResNet152V2, NASNetLarge, DenseNet201.
Texts are mapped to BERT and ELECTRA contextual embeddings.
We use an iterative ranking method that takes into account the order of matched categories:
At the k-th iteration, find the top-k categories for each image and the top-k categories
for each article.
For each article, the candidate images are those whose top-k categories intersect those
of the article.
Concatenate the k candidate lists sequentially, then append the remaining images to
the tail to form the final ranked list.
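The iterative ranking can be sketched as follows. This is a minimal interpretation, assuming that an image already placed in an earlier iteration is not repeated and that images found in the same iteration keep their input order; the function name, `max_k`, and the toy predictions are hypothetical.

```python
def rank_images(article_cats, image_cats, max_k=3):
    """Rank images for one article by category intersection.

    article_cats: predicted categories of the article, best first.
    image_cats:   dict image_id -> predicted categories, best first.
    Images matched at a smaller k are placed earlier in the list.
    """
    ranked, seen = [], set()
    for k in range(1, max_k + 1):
        article_top = set(article_cats[:k])
        for image_id, cats in image_cats.items():
            # Candidate: the image's top-k categories intersect the article's.
            if image_id not in seen and article_top & set(cats[:k]):
                ranked.append(image_id)
                seen.add(image_id)
    # Images that never matched are appended to the tail.
    ranked.extend(i for i in image_cats if i not in seen)
    return ranked


# Toy predictions (hypothetical), using category names from the slides.
article = ["sport", "region", "nrw"]
images = {
    "a": ["politik", "sport", "kultur"],
    "b": ["sport", "nrw", "region"],
    "c": ["kultur", "panorama", "politik"],
    "d": ["region", "koeln", "sport"],
}
ranking = rank_images(article, images)
```

Here image "b" matches at k = 1, images "a" and "d" match at k = 2, and "c" never matches, so it ends up at the tail.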
Image-Text Fusion with Image Captioning and Contextual Embeddings
We hypothesize that the description of the image is semantically similar to the title.
The captioning model consists of three parts:
Image feature extractor: We use EfficientNet (Tan and Le, EfficientNet:
Rethinking Model Scaling for Convolutional Neural Networks) for feature
extraction. The extracted feature has shape (8, 8, 2048).
Feature encoder: The features pass through a fully connected layer, giving a
256-dimensional vector.
Decoder: To generate the caption, we use Bahdanau attention (Bahdanau, Cho,
and Bengio, Neural Machine Translation by Jointly Learning to Align and
Translate) and a GRU to predict the next word.
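The attention step of the decoder can be illustrated with a small pure-Python sketch of additive (Bahdanau-style) attention. The weight matrices, the 2-D toy features, and the three regions are assumptions chosen for readability; the model described above would operate on 64 encoded regions of 256 dimensions, with learned weights.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def additive_attention(features, hidden, w_feat, w_hid, v):
    """Additive attention over image regions.

    features: encoded region vectors; hidden: current decoder (GRU) state.
    score_i = v . tanh(W_feat f_i + W_hid h); weights = softmax(scores).
    Returns (context vector, attention weights).
    """
    proj_hidden = matvec(w_hid, hidden)
    scores = []
    for f in features:
        proj_feat = matvec(w_feat, f)
        scores.append(sum(vi * math.tanh(a + b)
                          for vi, a, b in zip(v, proj_feat, proj_hidden)))
    # Softmax over regions, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the region features.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights


# Toy inputs (hypothetical): three regions, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [0.5, -0.5]
context, weights = additive_attention(regions, state, identity, identity, [1.0, 1.0])
```

In a full decoder, the context vector would be combined with the previous word embedding and fed to the GRU to predict the next word.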
To represent the caption and the title as vectors, we use RoBERTa and doc2vec. We then
compute their similarity as:
$S_{total} = S_{wiki} + S_{apnews} + S_{RoBERTa} + (1 - D_{fuzzy}) + (1 - D_{partial})$
where
$S_{wiki}$, $S_{apnews}$, and $S_{RoBERTa}$ are the cosine similarities of the two vectors generated by the
enwiki dbow model, the apnews dbow model, and RoBERTa, respectively, and
$D_{fuzzy}$ and $D_{partial}$ are the fuzzywuzzy ratio and partial ratio, respectively.
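The combined score can be sketched as below. The sketch assumes that the three pairs of embeddings are already computed and that the two string-matching terms have been scaled to [0, 1]; the slides do not specify that scaling, and the dictionary keys and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def total_similarity(caption_vecs, title_vecs, d_fuzzy, d_partial):
    """Sum of three cosine similarities and the two string-matching terms.

    caption_vecs / title_vecs: dicts with keys 'wiki', 'apnews', 'roberta'.
    d_fuzzy, d_partial: assumed to be precomputed and scaled to [0, 1].
    """
    s = sum(cosine_similarity(caption_vecs[key], title_vecs[key])
            for key in ("wiki", "apnews", "roberta"))
    return s + (1.0 - d_fuzzy) + (1.0 - d_partial)


# Toy embeddings (hypothetical, 2-D for readability).
caption = {"wiki": [1.0, 0.0], "apnews": [0.0, 1.0], "roberta": [1.0, 0.0]}
title = {"wiki": [1.0, 0.0], "apnews": [0.0, 1.0], "roberta": [0.0, 1.0]}
score = total_similarity(caption, title, d_fuzzy=0.5, d_partial=0.25)
```

In this toy case two of the three embedding pairs agree perfectly and the third is orthogonal, so the cosine terms contribute 2.0 and the string-matching terms contribute 0.5 and 0.75.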
Image-Text Fusion with Knowledge Graph-based Contextual Embeddings
To account for high-level semantics, we exploit the BabelNet knowledge graph.
For articles:
Link textual entities from the texts to their synsets in the WordNet subset of
BabelNet using the EWISER word sense disambiguator.
Represent each text by the mean of the SensEmBERT+LMMS embeddings that
correspond to the extracted synsets.
For images:
Use TResNet-L with Asymmetric Loss (ASL), pre-trained on Open Images V6, to
extract multiple labels from each image.
Map the concatenated labels to SensEmBERT+LMMS synset embeddings, in the same way as
for the texts.
Train canonical correlation analysis (CCA) on the training set to project the cross-modal
embeddings onto bases of significant similarity.
Finally, rank all images in the test set by the L2 distance between the transformed
embeddings.
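The final ranking step can be sketched as follows, assuming the CCA projection has already been applied to both the article and the image embeddings. The function name and the toy vectors are hypothetical, and fitting CCA itself is not shown.

```python
import math


def rank_by_l2(article_vec, image_vecs):
    """Return image ids sorted by Euclidean distance to the article embedding.

    Both inputs are assumed to be CCA-transformed already.
    """
    def distance(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(article_vec, vec)))

    return sorted(image_vecs, key=lambda image_id: distance(image_vecs[image_id]))


# Toy transformed embeddings (hypothetical values).
article_embedding = [0.0, 0.0]
image_embeddings = {
    "img_far": [3.0, 4.0],
    "img_near": [0.1, 0.0],
    "img_mid": [1.0, 1.0],
}
l2_ranking = rank_by_l2(article_embedding, image_embeddings)
```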
Graph-based Face-Name Matching
In many instances, the publisher uses a portrait of somebody mentioned in the text.
Person name extraction: We use entity-fishing to automatically extract people's
names from the text.
Face encoding: We use the open-source face_recognition library to detect faces and
represent each as a 128-dimensional vector.
We connect each person mentioned in the articles with the features extracted from
the accompanying faces in the training set.
During testing, we encode the faces in the image and count the
matched faces connected to the people mentioned in the text.
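The graph construction and test-time scoring can be sketched as below. The sketch assumes that names and face encodings are already extracted, uses 2-D toy encodings instead of 128-dimensional ones, and treats two faces as matching when their Euclidean distance is at most 0.6; that threshold is an assumption for the example, not a value given in the slides.

```python
import math
from collections import defaultdict


def build_graph(train_samples):
    """Link every person named in a training article to the faces in its image.

    train_samples: iterable of (names_in_article, face_encodings_in_image).
    """
    graph = defaultdict(list)
    for names, faces in train_samples:
        for name in names:
            graph[name].extend(faces)
    return graph


def match_score(graph, names, image_faces, threshold=0.6):
    """Count (known face, candidate face) pairs within `threshold`.

    Only faces linked to the people named in the query article are compared.
    """
    score = 0
    for name in names:
        for known in graph.get(name, []):
            for face in image_faces:
                if math.dist(known, face) <= threshold:
                    score += 1
    return score


# Toy training data (hypothetical names and encodings).
train = [
    (["Person A"], [[0.0, 0.0]]),
    (["Person A", "Person B"], [[0.1, 0.0], [1.0, 1.0]]),
]
graph = build_graph(train)
score_a = match_score(graph, ["Person A"], [[0.05, 0.0]])
score_b = match_score(graph, ["Person B"], [[0.05, 0.0]])
score_unknown = match_score(graph, ["Person C"], [[0.05, 0.0]])
```

Candidate images can then be ranked for an article by sorting them on this score in descending order.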
Ensemble
The Ensemble submission combines all described methods, weighting each model
based on its efficiency. The final rank of a candidate image is:
$R_{Ensemble} = w_1 R_{Caption} + w_2 R_{Triplet} + w_3 R_{Face} + w_4 R_{KG-Fusion}$
where
$R_{Ensemble}$, $R_{Caption}$, $R_{Triplet}$, $R_{Face}$, and $R_{KG-Fusion}$ are the ranks of the image produced by the
respective methods.
The weighting factors are chosen empirically as $w_1 = w_4 = 1$, $w_2 = 0.02$, and $w_3 = 0.25$.
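The weighted rank fusion can be sketched as follows, using the weights stated above. The sketch assumes that rank 1 is best, that every method ranks the same set of images, and that a lower weighted sum means a better final position; the method keys and toy ranks are hypothetical.

```python
# Weights from the slides: w1 = w4 = 1, w2 = 0.02, w3 = 0.25.
WEIGHTS = {"caption": 1.0, "triplet": 0.02, "face": 0.25, "kg_fusion": 1.0}


def ensemble_rank(ranks_per_method, weights=WEIGHTS):
    """Fuse per-method ranks into one ordering.

    ranks_per_method: dict method -> dict image_id -> rank (1 = best).
    Returns (image ids ordered by weighted rank sum, the sums themselves).
    """
    image_ids = next(iter(ranks_per_method.values())).keys()
    combined = {
        image_id: sum(weights[method] * ranks[image_id]
                      for method, ranks in ranks_per_method.items())
        for image_id in image_ids
    }
    return sorted(combined, key=combined.get), combined


# Toy ranks for three candidate images (hypothetical values).
ranks = {
    "caption": {"a": 1, "b": 2, "c": 3},
    "triplet": {"a": 3, "b": 1, "c": 2},
    "face": {"a": 2, "b": 3, "c": 1},
    "kg_fusion": {"a": 2, "b": 1, "c": 3},
}
order, combined = ensemble_rank(ranks)
```

With these weights the caption and knowledge-graph ranks dominate, and the triplet rank acts mostly as a tie-breaker.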
Results
Figure: Submission result
Figure: Visualized result
Conclusion and future works
Conclusion
Our methods systematically increase performance on the recall@100 metric.
The results are consistent, i.e., high-ranking images are relevant to the queried articles.
Incorporating high-level semantics increases performance.
System builders should use multiple methods to handle different aspects of the
complex image-text multimodal relation.
Future Works
Investigate better fusion methods.
Conduct a thorough ablation study of the proposed methods.
Enhance the dataset to allow thorough evaluation with information retrieval metrics
such as NDCG.
Bibliography
References
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. 2016. arXiv:
1409.0473 [cs.CL].
Ben-Baruch, Emanuel et al. “Asymmetric Loss For Multi-Label Classification”. In: arXiv preprint arXiv:2009.14119 (2020).
Bevilacqua, Michele and Roberto Navigli. “Breaking through the 80% glass ceiling: Raising the state of the art in Word Sense Disambiguation by
incorporating knowledge graph information”. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020,
pp. 2854–2864.
Bollacker, Kurt et al. “Freebase: a collaboratively created graph database for structuring human knowledge”. In: Proceedings of the 2008 ACM
SIGMOD international conference on Management of data. 2008, pp. 1247–1250.
Chan, Branden, Timo Möller, Malte Pietsch, and Tanay Soni. “Model from https://huggingface.co/bert-base-german-cased”. In: (2020).
Chan, Branden, Stefan Schweter, and Timo Möller. German’s Next Language Model. 2020. arXiv: 2010.10906 [cs.CL].
Chollet, François. “Xception: Deep learning with depthwise separable convolutions”. In: Proceedings of the IEEE conference on computer vision and
pattern recognition. 2017, pp. 1251–1258.
dbmdz. “Model from https://huggingface.co/dbmdz/bert-base-german-uncased”. In: (2020).
Devlin, Jacob et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. arXiv: 1810.04805 [cs.CL].
Geitgey, Adam. Face Recognition. 2018. url: https://github.com/ageitgey/face_recognition.
He, Kaiming et al. “Identity mappings in deep residual networks”. In: European conference on computer vision. Springer. 2016, pp. 630–645.
Hoffer, Elad and Nir Ailon. Deep metric learning using Triplet network. 2018. arXiv: 1412.6622 [cs.LG].
Hossain, MD Zakir et al. “A comprehensive survey of deep learning for image captioning”. In: ACM Computing Surveys (CSUR) 51.6 (2019),
pp. 1–36.
Huang, Gao et al. “Densely connected convolutional networks”. In: Proceedings of the IEEE conference on computer vision and pattern recognition.
2017, pp. 4700–4708.
Ke, Guolin et al. “LightGBM: A Highly Efficient Gradient Boosting Decision Tree”. In: Advances in Neural Information Processing Systems. Ed. by
I. Guyon et al. Vol. 30. Curran Associates, Inc., 2017, pp. 3146–3154. url:
https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf.
Kille, Benjamin, Andreas Lommatzsch, and Özlem Özgöbek. “News Images in MediaEval 2020”. In: Proc. of the MediaEval 2020 Workshop. Online.
2020.
King, Davis E. dlib-models. 2018. url: https://github.com/davisking/dlib-models.
Kuznetsova, Alina et al. “The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale”. In:
IJCV (2020).
Lau, Jey Han and Timothy Baldwin. An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation. 2016. arXiv:
1607.05368 [cs.CL].
Lopez, Patrice. Entity Fishing. 2020. url: https://github.com/kermitt2/entity-fishing.
“Model from https://huggingface.co/german-nlp-group/electra-base-german-uncased”. In: (2020).
“Model from https://huggingface.co/T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb”. In: (2020).
Navigli, Roberto and Simone Paolo Ponzetto. “BabelNet: Building a very large multilingual semantic network”. In: Proceedings of the 48th annual
meeting of the association for computational linguistics. 2010, pp. 216–225.
Oostdijk, NHJ et al. “The Connection between the Text and Images of News Articles: New Insights for Multimedia Analysis”. In: (2020).
Ridnik, Tal et al. “TResNet: High Performance GPU-Dedicated Architecture”. In: arXiv preprint arXiv:2003.13630 (2020).
Sandler, Mark et al. “Mobilenetv2: Inverted residuals and linear bottlenecks”. In: Proceedings of the IEEE conference on computer vision and pattern
recognition. 2018, pp. 4510–4520.
Simonyan, Karen and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition”. In: arXiv preprint arXiv:1409.1556
(2014).
Szegedy, Christian et al. “Inception-v4, inception-resnet and the impact of residual connections on learning”. In: arXiv preprint arXiv:1602.07261
(2016).
Tan, Mingxing and Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. 2020. arXiv: 1905.11946 [cs.LG].
Xu, Kelvin et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. 2016. arXiv: 1502.03044 [cs.LG].
Zoph, Barret et al. “Learning transferable architectures for scalable image recognition”. In: Proceedings of the IEEE conference on computer vision
and pattern recognition. 2018, pp. 8697–8710.
T. Nguyen-Quang et al. HCMUS at MediaEval 2020 December 14-15, 2020 19 / 19

More Related Content

What's hot

Image Restoration Using Joint Statistical Modeling in a Space-Transform Domain
Image Restoration Using Joint Statistical Modeling in a Space-Transform DomainImage Restoration Using Joint Statistical Modeling in a Space-Transform Domain
Image Restoration Using Joint Statistical Modeling in a Space-Transform Domain
john236zaq
 
Detection of leaf diseases and classification using digital image processing
Detection of leaf diseases and classification using digital image processingDetection of leaf diseases and classification using digital image processing
Detection of leaf diseases and classification using digital image processing
Naeem Shehzad
 
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret...
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret...Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret...
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret...
CSCJournals
 
Image Fusion and Image Quality Assessment of Fused Images
Image Fusion and Image Quality Assessment of Fused ImagesImage Fusion and Image Quality Assessment of Fused Images
Image Fusion and Image Quality Assessment of Fused Images
CSCJournals
 
Development and Comparison of Image Fusion Techniques for CT&MRI Images
Development and Comparison of Image Fusion Techniques for CT&MRI ImagesDevelopment and Comparison of Image Fusion Techniques for CT&MRI Images
Development and Comparison of Image Fusion Techniques for CT&MRI Images
IJERA Editor
 
Image Co-segmentation via Saliency Co-fusion
Image Co-segmentation via Saliency Co-fusionImage Co-segmentation via Saliency Co-fusion
Image Co-segmentation via Saliency Co-fusion
Koteswar Rao Jerripothula
 
Modelling Probability Distributions using Neural Networks: Applications to Me...
Modelling Probability Distributions using Neural Networks: Applications to Me...Modelling Probability Distributions using Neural Networks: Applications to Me...
Modelling Probability Distributions using Neural Networks: Applications to Me...
Christian Baumgartner
 
A survey on methods and applications of meta-learning with GNNs
A survey on methods and applications of meta-learning with GNNsA survey on methods and applications of meta-learning with GNNs
A survey on methods and applications of meta-learning with GNNs
Shreya Goyal
 
Microstructural Analysis and Machine Learning
Microstructural Analysis and Machine LearningMicrostructural Analysis and Machine Learning
Microstructural Analysis and Machine Learning
PFHub PFHub
 
Long-Term Robust Tracking Whith on Failure Recovery
Long-Term Robust Tracking Whith on Failure RecoveryLong-Term Robust Tracking Whith on Failure Recovery
Long-Term Robust Tracking Whith on Failure Recovery
TELKOMNIKA JOURNAL
 
Adaptive threshold for moving objects detection using gaussian mixture model
Adaptive threshold for moving objects detection using gaussian mixture modelAdaptive threshold for moving objects detection using gaussian mixture model
Adaptive threshold for moving objects detection using gaussian mixture model
TELKOMNIKA JOURNAL
 
CenterForDomainSpecificComputing-Poster
CenterForDomainSpecificComputing-PosterCenterForDomainSpecificComputing-Poster
CenterForDomainSpecificComputing-PosterYunming Zhang
 
Mnist soln
Mnist solnMnist soln
Mnist soln
DanishFaisal4
 
Energy minimization based spatially
Energy minimization based spatiallyEnergy minimization based spatially
Energy minimization based spatially
sipij
 
IRJET- A Review on Data Dependent Label Distribution Learning for Age Estimat...
IRJET- A Review on Data Dependent Label Distribution Learning for Age Estimat...IRJET- A Review on Data Dependent Label Distribution Learning for Age Estimat...
IRJET- A Review on Data Dependent Label Distribution Learning for Age Estimat...
IRJET Journal
 
Failed handoffs in collaborative Wi-Fi networks
Failed handoffs in collaborative Wi-Fi networksFailed handoffs in collaborative Wi-Fi networks
Failed handoffs in collaborative Wi-Fi networks
TELKOMNIKA JOURNAL
 
A LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETION
A LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETIONA LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETION
A LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETION
Nexgen Technology
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals
 
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
idescitation
 

What's hot (20)

Image Restoration Using Joint Statistical Modeling in a Space-Transform Domain
Image Restoration Using Joint Statistical Modeling in a Space-Transform DomainImage Restoration Using Joint Statistical Modeling in a Space-Transform Domain
Image Restoration Using Joint Statistical Modeling in a Space-Transform Domain
 
Detection of leaf diseases and classification using digital image processing
Detection of leaf diseases and classification using digital image processingDetection of leaf diseases and classification using digital image processing
Detection of leaf diseases and classification using digital image processing
 
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret...
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret...Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret...
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret...
 
Image Fusion and Image Quality Assessment of Fused Images
Image Fusion and Image Quality Assessment of Fused ImagesImage Fusion and Image Quality Assessment of Fused Images
Image Fusion and Image Quality Assessment of Fused Images
 
Development and Comparison of Image Fusion Techniques for CT&MRI Images
Development and Comparison of Image Fusion Techniques for CT&MRI ImagesDevelopment and Comparison of Image Fusion Techniques for CT&MRI Images
Development and Comparison of Image Fusion Techniques for CT&MRI Images
 
Image Co-segmentation via Saliency Co-fusion
Image Co-segmentation via Saliency Co-fusionImage Co-segmentation via Saliency Co-fusion
Image Co-segmentation via Saliency Co-fusion
 
40120130406009
4012013040600940120130406009
40120130406009
 
Modelling Probability Distributions using Neural Networks: Applications to Me...
Modelling Probability Distributions using Neural Networks: Applications to Me...Modelling Probability Distributions using Neural Networks: Applications to Me...
Modelling Probability Distributions using Neural Networks: Applications to Me...
 
A survey on methods and applications of meta-learning with GNNs
A survey on methods and applications of meta-learning with GNNsA survey on methods and applications of meta-learning with GNNs
A survey on methods and applications of meta-learning with GNNs
 
Microstructural Analysis and Machine Learning
Microstructural Analysis and Machine LearningMicrostructural Analysis and Machine Learning
Microstructural Analysis and Machine Learning
 
Long-Term Robust Tracking Whith on Failure Recovery
Long-Term Robust Tracking Whith on Failure RecoveryLong-Term Robust Tracking Whith on Failure Recovery
Long-Term Robust Tracking Whith on Failure Recovery
 
Adaptive threshold for moving objects detection using gaussian mixture model
Adaptive threshold for moving objects detection using gaussian mixture modelAdaptive threshold for moving objects detection using gaussian mixture model
Adaptive threshold for moving objects detection using gaussian mixture model
 
CenterForDomainSpecificComputing-Poster
CenterForDomainSpecificComputing-PosterCenterForDomainSpecificComputing-Poster
CenterForDomainSpecificComputing-Poster
 
Mnist soln
Mnist solnMnist soln
Mnist soln
 
Energy minimization based spatially
Energy minimization based spatiallyEnergy minimization based spatially
Energy minimization based spatially
 
IRJET- A Review on Data Dependent Label Distribution Learning for Age Estimat...
IRJET- A Review on Data Dependent Label Distribution Learning for Age Estimat...IRJET- A Review on Data Dependent Label Distribution Learning for Age Estimat...
IRJET- A Review on Data Dependent Label Distribution Learning for Age Estimat...
 
Failed handoffs in collaborative Wi-Fi networks
Failed handoffs in collaborative Wi-Fi networksFailed handoffs in collaborative Wi-Fi networks
Failed handoffs in collaborative Wi-Fi networks
 
A LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETION
A LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETIONA LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETION
A LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETION
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance...
 

Similar to HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
Willy Marroquin (WillyDevNET)
 
Adversarial Variational Autoencoders to extend and improve generative model -...
Adversarial Variational Autoencoders to extend and improve generative model -...Adversarial Variational Autoencoders to extend and improve generative model -...
Adversarial Variational Autoencoders to extend and improve generative model -...
Loc Nguyen
 
Comparison of thresholding methods
Comparison of thresholding methodsComparison of thresholding methods
Comparison of thresholding methods
Vrushali Lanjewar
 
IRJET- Image Captioning using Multimodal Embedding
IRJET-  	  Image Captioning using Multimodal EmbeddingIRJET-  	  Image Captioning using Multimodal Embedding
IRJET- Image Captioning using Multimodal Embedding
IRJET Journal
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
Anubhav Jain
 
Turning Data into Knowledge (KESW2014 Keynote)
Turning Data into Knowledge (KESW2014 Keynote)Turning Data into Knowledge (KESW2014 Keynote)
Turning Data into Knowledge (KESW2014 Keynote)
Stefan Dietze
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
IJERD Editor
 
Ijarcet vol-2-issue-4-1374-1382
Ijarcet vol-2-issue-4-1374-1382Ijarcet vol-2-issue-4-1374-1382
Ijarcet vol-2-issue-4-1374-1382Editor IJARCET
 
Improved wolf algorithm on document images detection using optimum mean techn...
Improved wolf algorithm on document images detection using optimum mean techn...Improved wolf algorithm on document images detection using optimum mean techn...
Improved wolf algorithm on document images detection using optimum mean techn...
journalBEEI
 
An effective RGB color selection for complex 3D object structure in scene gra...
An effective RGB color selection for complex 3D object structure in scene gra...An effective RGB color selection for complex 3D object structure in scene gra...
An effective RGB color selection for complex 3D object structure in scene gra...
IJECEIAES
 
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEYIMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY
ijcsit
 
Image Generation with Gans-based Techniques: A Survey
Image Generation with Gans-based Techniques: A SurveyImage Generation with Gans-based Techniques: A Survey
Image Generation with Gans-based Techniques: A Survey
AIRCC Publishing Corporation
 
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
ijsc
 
Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...
ijsc
 
Ensemble based method for the classification of flooding event using social m...
Ensemble based method for the classification of flooding event using social m...Ensemble based method for the classification of flooding event using social m...
Ensemble based method for the classification of flooding event using social m...
multimediaeval
 
A method for semantic-based image retrieval using hierarchical clustering tre...
A method for semantic-based image retrieval using hierarchical clustering tre...A method for semantic-based image retrieval using hierarchical clustering tre...
A method for semantic-based image retrieval using hierarchical clustering tre...
TELKOMNIKA JOURNAL
 
Channel and spatial attention mechanism for fashion image captioning
Channel and spatial attention mechanism for fashion image captioning Channel and spatial attention mechanism for fashion image captioning
Channel and spatial attention mechanism for fashion image captioning
IJECEIAES
 
Knowledge maps for e-learning. Jae Hwa Lee, Aviv Segev
Knowledge maps for e-learning. Jae Hwa Lee, Aviv SegevKnowledge maps for e-learning. Jae Hwa Lee, Aviv Segev
Knowledge maps for e-learning. Jae Hwa Lee, Aviv Segev
eraser Juan José Calderón
 
One dimensional vector based pattern
One dimensional vector based patternOne dimensional vector based pattern
One dimensional vector based pattern
ijcsit
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
tuxette
 

Similar to HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching (20)

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Ad...
 
Adversarial Variational Autoencoders to extend and improve generative model -...
Adversarial Variational Autoencoders to extend and improve generative model -...Adversarial Variational Autoencoders to extend and improve generative model -...
Adversarial Variational Autoencoders to extend and improve generative model -...
 
Comparison of thresholding methods
Comparison of thresholding methodsComparison of thresholding methods
Comparison of thresholding methods
 
IRJET- Image Captioning using Multimodal Embedding
IRJET-  	  Image Captioning using Multimodal EmbeddingIRJET-  	  Image Captioning using Multimodal Embedding
IRJET- Image Captioning using Multimodal Embedding
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
 
Turning Data into Knowledge (KESW2014 Keynote)
Turning Data into Knowledge (KESW2014 Keynote)Turning Data into Knowledge (KESW2014 Keynote)
Turning Data into Knowledge (KESW2014 Keynote)
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
HCMUS at MediaEval 2020: Image-Text Fusion for Automatic News-Images Re-Matching

  • 1. HCMUS at MediaEval 2020: Image-Text Fusion for Automatic News-Images Re-Matching. Thuc Nguyen-Quang (1,3), Tuan-Duy H. Nguyen (1,3), Thang-Long Nguyen-Ho (1,3), Anh-Kiet Duong (1,3), Xuan-Nhat Hoang (1,3), Vinh-Thuyen Nguyen-Truong (1,3), Hai-Dang Nguyen (1,3), Minh-Triet Tran (1,2,3). (1) University of Science, VNU-HCM; (2) John von Neumann Institute, VNU-HCM; (3) Vietnam National University, Ho Chi Minh City, Vietnam. December 14-15, 2020. T. Nguyen-Quang et al., HCMUS at MediaEval 2020.
  • 2. Outline: 1 Introduction; 2 Methods (Metric Learning; Image-Text Matching via Categorization; Image-Text Fusion with Image Captioning and Contextual Embeddings; Image-Text Fusion with Knowledge Graph-based Contextual Embeddings; Graph-based Face-Name Matching; Ensemble); 3 Results; 4 Conclusion and Future Work; 5 Bibliography.
  • 3. Introduction
  • 4. Introduction: We mainly focus on fusing cross-modal embedded information extracted in three ways: simple set intersection, deep neural features, and knowledge-graph-enhanced neural features.
  • 5. Introduction: Our methods are M1, Metric Learning; M2, Image-Text Matching via Categorization; M3, Image-Text Fusion with Image Captioning and Contextual Embeddings; M4, Image-Text Fusion with Knowledge Graph-based Contextual Embeddings; M5, Graph-based Face-Name Matching.
  • 6. Methods
  • 7. Methods, Metric Learning: A Triplet Loss model projects the embeddings of image-text pairs onto bases in which matching pairs are highly similar. Title texts are embedded with BERT. Image embeddings come in two forms: a global context embedding from EfficientNet, and a local context embedding obtained by passing the top-k bottom-up-attention objects to a self-attention sequential model.
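The triplet objective named on this slide can be illustrated with a minimal sketch. The Euclidean distance, the margin of 0.2, and the two-dimensional toy vectors are assumptions made for illustration; the slide does not state the distance function, the margin, or the embedding sizes used by the authors.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is a hypothetical choice, not taken from the slides.
    """
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# Toy embeddings (assumed): a title, its matching image, and an unrelated image.
title = [1.0, 0.0]
matching_image = [0.9, 0.1]
other_image = [0.0, 1.0]

# Correct pair closer than the wrong pair: no penalty.
loss_good = triplet_loss(title, matching_image, other_image)
# Roles swapped: the loss becomes positive and would drive a training update.
loss_bad = triplet_loss(title, other_image, matching_image)
```

The loss is zero once the matching image is closer to the title than the non-matching image by at least the margin, which is what pulls matching pairs together during training.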
  • 8. Methods, Image-Text Matching via Categorization: Images and texts are categorized with two gradient boosting decision tree classifiers. Target categories are extracted from URLs: nrw, kultur, region, panorama, sport, wirtschaft, koeln, ratgeber, politik, unknown.
  • 9. Methods, Image-Text Matching via Categorization: Images are augmented and their features extracted with VGG16, InceptionResNetV2, MobileNetV2, EfficientNetB1-7, Xception, ResNet152V2, NASNetLarge, and DenseNet201. Texts are mapped to BERT and ELECTRA contextual embeddings. An iterative ranking method takes into account the order of matched categories: at the k-th iteration, find the top-k categories for each image and the top-k categories for each article; for each article, the candidate images are those whose top-k categories intersect the article's top-k categories; sequentially concatenate the k candidate lists, then append the remaining images to the tail to form the final ranked list.
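The iterative ranking described on this slide can be sketched as follows. This is one possible reading of the procedure, under stated assumptions: category scores are given as dictionaries, an image is added only the first time it becomes a candidate, and `max_k`, the function names, and the toy scores are all hypothetical.

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring category names."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def rank_images(article_scores, image_scores, max_k=2):
    """Concatenate candidate lists for k = 1..max_k, then append the rest."""
    ranked, seen = [], set()
    for k in range(1, max_k + 1):
        article_top = top_k(article_scores, k)
        for image_id, scores in image_scores.items():
            # Candidate: the image's top-k categories intersect the article's.
            if image_id not in seen and article_top & top_k(scores, k):
                ranked.append(image_id)
                seen.add(image_id)
    # Images that never matched go to the tail of the list.
    ranked.extend(i for i in image_scores if i not in seen)
    return ranked

# Toy classifier outputs (assumed values).
article = {"sport": 0.7, "politik": 0.2, "kultur": 0.1}
images = {
    "img_a": {"kultur": 0.6, "sport": 0.3, "politik": 0.1},
    "img_b": {"sport": 0.8, "kultur": 0.1, "politik": 0.1},
    "img_c": {"politik": 0.9, "kultur": 0.06, "sport": 0.04},
    "img_d": {"ratgeber": 0.9, "region": 0.1},
}
ranking = rank_images(article, images)
```

Images that agree with the article on the single most likely category come first, images that only agree within the top two come next, and images with no overlap are appended at the end.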
  • 10. Methods, Image-Text Fusion with Image Captioning and Contextual Embeddings: We hypothesize that the description of the image is semantically similar to the title. The captioning model consists of three parts. Image feature extractor: we use EfficientNet (Tan and Le, EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks) for feature extraction; the feature has the shape (8, 8, 2048). Feature encoder: the features pass through a fully connected layer, giving 256-dimensional vectors. Decoder: to generate the caption, we use Bahdanau attention (Bahdanau, Cho, and Bengio, Neural Machine Translation by Jointly Learning to Align and Translate) and a GRU to predict the next word.
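As an illustration of the decoder's attention step, here is a minimal NumPy sketch of additive (Bahdanau-style) attention over the 8 x 8 = 64 encoded image locations. The GRU state size of 512, the attention size of 128, the random weights, and all variable names are assumptions; only the 64 locations and the 256-dimensional encoder output come from the slide.

```python
import numpy as np

def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive attention: score_i = v . tanh(W1 f_i + W2 h)."""
    scores = np.tanh(features @ W1 + hidden @ W2) @ v   # one score per location
    weights = np.exp(scores - scores.max())             # numerically stable softmax
    weights /= weights.sum()
    context = weights @ features                        # weighted sum of features
    return context, weights

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 256))    # 8 x 8 grid flattened, 256-dim encoder output
hidden = rng.normal(size=(512,))         # hypothetical GRU state size
W1 = rng.normal(size=(256, 128)) * 0.05  # hypothetical attention parameters
W2 = rng.normal(size=(512, 128)) * 0.05
v = rng.normal(size=(128,))

context, weights = bahdanau_attention(features, hidden, W1, W2, v)
```

In a captioning decoder of this kind, the context vector is typically combined with the previous word embedding and fed to the GRU, which then predicts the next word.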
  • 11. Methods, Image-Text Fusion with Image Captioning and Contextual Embeddings: To represent the caption and the title as vectors, we use RoBERTa and doc2vec. We then compute their similarity as $S_{\text{total}} = S_{\text{wiki}} + S_{\text{apnews}} + S_{\text{RoBERTa}} + (1 - D_{\text{fuzzy}}) + (1 - D_{\text{partial}})$, where $S_{\text{wiki}}$, $S_{\text{apnews}}$, and $S_{\text{RoBERTa}}$ are the cosine similarities of the two vectors generated by enwiki dbow, apnews dbow, and RoBERTa, respectively, and $D_{\text{fuzzy}}$ and $D_{\text{partial}}$ are the fuzzywuzzy ratio and partial ratio, respectively.
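The total similarity on this slide can be sketched in a few lines. The sketch assumes the three vector pairs have already been computed and that the two string terms are supplied as values rescaled to [0, 1]; the dictionary keys, function names, and toy inputs are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def total_similarity(caption_vecs, title_vecs, d_fuzzy, d_partial):
    """Sum three cosine similarities and two (1 - D) string terms.

    d_fuzzy and d_partial are assumed to be already scaled to [0, 1].
    """
    s = sum(cosine(caption_vecs[name], title_vecs[name])
            for name in ("wiki", "apnews", "roberta"))
    return s + (1.0 - d_fuzzy) + (1.0 - d_partial)

# Toy vectors (assumed): identical caption and title embeddings.
caption = {"wiki": [1.0, 0.0], "apnews": [0.0, 1.0], "roberta": [1.0, 1.0]}
title = {"wiki": [1.0, 0.0], "apnews": [0.0, 1.0], "roberta": [1.0, 1.0]}
score = total_similarity(caption, title, d_fuzzy=0.25, d_partial=0.5)
```

Each of the five terms contributes at most 1, so the score of a candidate image lies at most at 5 under these assumptions.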
  • 12. Methods, Image-Text Fusion with Knowledge Graph-based Contextual Embeddings: To account for high-level semantics, we exploit the BabelNet knowledge graph. For articles: link textual entities from the texts to their synsets in the WordNet subset of BabelNet using the EWISER word sense disambiguator, then represent each text by the mean of the SensEmBERT+LMMS embeddings corresponding to the extracted synsets. For images: use TResNet-L with Asymmetric Loss (ASL), pre-trained on OpenImagesV6, to extract multiple labels from each image, then map the concatenated labels to SensEmBERT+LMMS synset embeddings in the same way as for the texts.
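The mean-pooling step for synset embeddings can be sketched as below. The lookup table, the synset identifiers, and the two-dimensional vectors are toy assumptions; skipping synsets without an embedding is also an assumption, since the slide does not say how missing entries are handled.

```python
def mean_embedding(synsets, synset_vectors):
    """Average the embeddings of the synsets that have a known vector."""
    vecs = [synset_vectors[s] for s in synsets if s in synset_vectors]
    if not vecs:
        return None  # nothing to average
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy lookup table (assumed identifiers and values).
synset_vectors = {
    "dog.n.01": [1.0, 0.0],
    "ball.n.01": [0.0, 1.0],
}
article_synsets = ["dog.n.01", "ball.n.01", "unknown.n.01"]
article_vec = mean_embedding(article_synsets, synset_vectors)
```

The same function would apply to the synsets obtained from an image's predicted labels, so that articles and images end up as vectors in the same embedding space.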
  • 13. Methods, Image-Text Fusion with Knowledge Graph-based Contextual Embeddings: We train a canonical correlation analysis (CCA) model on the training set to project the cross-modal embeddings onto bases in which matching pairs are highly correlated. Finally, we rank all images in the test set by the L2 distance between the transformed embeddings.
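A minimal NumPy sketch of this step, assuming a textbook CCA (whitening followed by an SVD of the cross-covariance) with a small ridge term for numerical stability. The slides do not say which CCA implementation or settings were used, and the synthetic data, dimensions, and function names here are hypothetical.

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def fit_cca(X, Y, n_components, reg=1e-6):
    """Fit CCA on paired rows of X (texts) and Y (images)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, corrs, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return (mx, my, Wx, Wy), corrs[:n_components]

def rank_images(article_vec, image_vecs, model):
    """Return image indices sorted by L2 distance in the projected space."""
    mx, my, Wx, Wy = model
    a = (article_vec - mx) @ Wx
    imgs = (image_vecs - my) @ Wy
    return np.argsort(np.linalg.norm(imgs - a, axis=1))

# Synthetic paired data sharing a 2-dimensional latent factor (assumed).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
texts = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
images = latent @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(200, 4))

model, corrs = fit_cca(texts, images, n_components=2)
order = rank_images(texts[0], images, model)
```

Here `order` lists candidate images from nearest to farthest for the first article; in the setting of the slide, the model would be fitted on training pairs and applied to test articles and images.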
  • 14. Methods, Graph-based Face-Name Matching: In many instances, the publisher uses a portrait of somebody mentioned in the text. Person name extraction: we use entity-fishing to automatically extract people's names from the text. Face encoding: we use the open-source Face Recognition library to detect faces and represent each as a 128-dimensional vector. On the training set, we connect each person mentioned in an article with the features extracted from the accompanying faces. During testing, we encode the faces in the image and count the matched faces connected to the people mentioned in the text.
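The face-name graph can be sketched with plain dictionaries. The sketch assumes face encodings and person names are already extracted, uses two-dimensional toy vectors in place of 128-dimensional encodings, and treats two faces as matching when their Euclidean distance is at most 0.6; the threshold, the names, and the data are hypothetical.

```python
import math
from collections import defaultdict

def distance(u, v):
    """Euclidean distance between two face encodings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_graph(train_pairs):
    """Connect each mentioned person to every face seen in the same article.

    train_pairs: iterable of (person_names, face_encodings), one per article.
    """
    graph = defaultdict(list)
    for names, faces in train_pairs:
        for name in names:
            graph[name].extend(faces)
    return graph

def face_match_score(article_names, image_faces, graph, tolerance=0.6):
    """Count known faces of the mentioned people that match a face in the image."""
    score = 0
    for name in article_names:
        for known in graph.get(name, []):
            score += sum(distance(known, face) <= tolerance for face in image_faces)
    return score

# Toy training data (assumed): two articles with extracted names and faces.
train = [
    (["Person A"], [[0.0, 0.0]]),
    (["Person A", "Person B"], [[0.1, 0.0], [1.0, 1.0]]),
]
graph = build_graph(train)
score_hit = face_match_score(["Person B"], [[1.0, 0.9]], graph)
score_miss = face_match_score(["Person B"], [[5.0, 5.0]], graph)
```

Candidate images can then be ranked for an article by this count, with higher counts first.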
  • 15. Methods, Ensemble: The Ensemble submission combines the described methods, weighting each model according to its effectiveness. The final ranking of a candidate image is $R_{\text{Ensemble}} = w_1 R_{\text{Caption}} + w_2 R_{\text{Triplet}} + w_3 R_{\text{Face}} + w_4 R_{\text{KG-Fusion}}$, where $R_{\text{Ensemble}}$, $R_{\text{Caption}}$, $R_{\text{Triplet}}$, $R_{\text{Face}}$, and $R_{\text{KG-Fusion}}$ are the ranks of the image produced by the respective methods. The weighting factors are chosen empirically: $w_1 = w_4 = 1$, $w_2 = 0.02$, and $w_3 = 0.25$.
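The weighted rank combination can be worked through on a toy example. The weights are those given on the slide; the per-method ranks, the image identifiers, and the convention that a lower combined value means a better final position are assumptions.

```python
# Weights from the slide: w1 (Caption), w2 (Triplet), w3 (Face), w4 (KG-Fusion).
WEIGHTS = {"caption": 1.0, "triplet": 0.02, "face": 0.25, "kg_fusion": 1.0}

def ensemble_rank(ranks_by_method, weights=WEIGHTS):
    """Combine per-method ranks into a weighted sum and sort ascending."""
    images = next(iter(ranks_by_method.values())).keys()
    scores = {
        img: sum(weights[m] * ranks_by_method[m][img] for m in weights)
        for img in images
    }
    return sorted(scores, key=scores.get), scores

# Toy ranks (assumed): position of each image under each method, 1 = best.
ranks = {
    "caption": {"a": 1, "b": 2, "c": 3},
    "triplet": {"a": 3, "b": 1, "c": 2},
    "face": {"a": 2, "b": 3, "c": 1},
    "kg_fusion": {"a": 2, "b": 1, "c": 3},
}
order, scores = ensemble_rank(ranks)
```

With these weights the caption and knowledge-graph ranks dominate, while the triplet and face ranks mostly act as tie-breakers.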
  • 16. Results
  • 17. Results: [Figure: Submission result] [Figure: Visualized result]
  • 18. Conclusion: Our methods systematically increase performance on the recall@100 metric. The results are consistent, i.e., high-ranking images are relevant to the queried articles.
  • 19. Conclusion: Incorporating high-level semantics increases performance. System builders should use multiple methods to handle the different aspects of the complex image-text multimodal relation.
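For readers unfamiliar with the recall@100 metric mentioned in the conclusion, a minimal sketch of recall@k follows. It assumes each article has exactly one ground-truth image; the identifiers and the toy rankings are hypothetical, and k is set to 2 only to keep the example small.

```python
def recall_at_k(ranked_lists, ground_truth, k=100):
    """Fraction of articles whose true image appears in the top-k ranked images."""
    hits = sum(
        1 for article, images in ranked_lists.items()
        if ground_truth[article] in images[:k]
    )
    return hits / len(ranked_lists)

# Toy rankings (assumed): one ranked image list per article.
ranked = {
    "art1": ["i3", "i1", "i2"],
    "art2": ["i1", "i3", "i2"],
}
truth = {"art1": "i1", "art2": "i2"}
recall = recall_at_k(ranked, truth, k=2)
```

In this toy case the true image of the first article is within the top 2 and that of the second is not, giving a recall of one half.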
  • 20. Future Work: Investigate better fusion methods. Conduct a thorough ablation study of the proposed methods. Enhance the dataset to allow thorough evaluation with information retrieval metrics such as NDCG.
  • 21. Bibliography, References I:
    Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. 2016. arXiv: 1409.0473 [cs.CL].
    Ben-Baruch, Emanuel et al. “Asymmetric Loss For Multi-Label Classification”. In: arXiv preprint arXiv:2009.14119 (2020).
    Bevilacqua, Michele and Roberto Navigli. “Breaking through the 80% glass ceiling: Raising the state of the art in Word Sense Disambiguation by incorporating knowledge graph information”. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, pp. 2854–2864.
    Bollacker, Kurt et al. “Freebase: a collaboratively created graph database for structuring human knowledge”. In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data. 2008, pp. 1247–1250.
    Chan, Branden, Timo Möller, Malte Pietsch, and Tanay Soni. “Model from https://huggingface.co/bert-base-german-cased”. 2020.
    Chan, Branden, Stefan Schweter, and Timo Möller. German’s Next Language Model. 2020. arXiv: 2010.10906 [cs.CL].
    Chollet, François. “Xception: Deep learning with depthwise separable convolutions”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, pp. 1251–1258.
    dbmdz. “Model from https://huggingface.co/dbmdz/bert-base-german-uncased”. 2020.
    Devlin, Jacob et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. arXiv: 1810.04805 [cs.CL].
    Geitgey, Adam. Face Recognition. 2018. url: https://github.com/ageitgey/face_recognition.
    He, Kaiming et al. “Identity mappings in deep residual networks”. In: European conference on computer vision. Springer. 2016, pp. 630–645.
    Hoffer, Elad and Nir Ailon. Deep metric learning using Triplet network. 2018. arXiv: 1412.6622 [cs.LG].
    Hossain, MD Zakir et al. “A comprehensive survey of deep learning for image captioning”. In: ACM Computing Surveys (CSUR) 51.6 (2019), pp. 1–36.
    Huang, Gao et al. “Densely connected convolutional networks”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, pp. 4700–4708.
  • 22. Bibliography References II Ke, Guolin et al. “LightGBM: A Highly Efficient Gradient Boosting Decision Tree”. In: Advances in Neural Information Processing Systems. Ed. by I. Guyon et al. Vol. 30. Curran Associates, Inc., 2017, pp. 3146–3154. url: https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf. Kille, Benjamin, Andreas Lommatzsch, and Özlem Özgöbek. “News Images in MediaEval 2020”. In: Proc. of the MediaEval 2020 Workshop. Online. 2020. King, Davis E. dlib-models. 2018. url: https://github.com/davisking/dlib-models. Kuznetsova, Alina et al. “The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale”. In: IJCV (2020). Lau, Jey Han and Timothy Baldwin. An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation. 2016. arXiv: 1607.05368 [cs.CL]. Lopez, Patrice. Entity Fishing. 2020. url: https://github.com/kermitt2/entity-fishing. “Model from https://huggingface.co/german-nlp-group/electra-base-german-uncased”. In: (2020). “Model from https://huggingface.co/T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb”. In: (2020). Navigli, Roberto and Simone Paolo Ponzetto. “BabelNet: Building a very large multilingual semantic network”. In: Proceedings of the 48th annual meeting of the association for computational linguistics. 2010, pp. 216–225. Oostdijk, NHJ et al. “The Connection between the Text and Images of News Articles: New Insights for Multimedia Analysis”. In: (2020). Ridnik, Tal et al. “TResNet: High Performance GPU-Dedicated Architecture”. In: arXiv preprint arXiv:2003.13630 (2020). Sandler, Mark et al. “Mobilenetv2: Inverted residuals and linear bottlenecks”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018, pp. 4510–4520. Simonyan, Karen and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition”. In: arXiv preprint arXiv:1409.1556 (2014). T. 
Nguyen-Quang et al. HCMUS at MediaEval 2020 December 14-15, 2020 18 / 19
  • 23. Bibliography References III Szegedy, Christian et al. “Inception-v4, inception-resnet and the impact of residual connections on learning”. In: arXiv preprint arXiv:1602.07261 (2016). Tan, Mingxing and Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. 2020. arXiv: 1905.11946 [cs.LG]. Xu, Kelvin et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. 2016. arXiv: 1502.03044 [cs.LG]. Zoph, Barret et al. “Learning transferable architectures for scalable image recognition”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018, pp. 8697–8710. T. Nguyen-Quang et al. HCMUS at MediaEval 2020 December 14-15, 2020 19 / 19