Slides for the paper "Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation", accepted and presented at the 31st ACM International Conference on Multimedia (MM'23).
Paper: https://dl.acm.org/doi/10.1145/3581783.3613458
Code: https://github.com/sisinflab/Ducho
[MM2023] Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation
1. Ducho: A Unified Framework for the Extraction of
Multimodal Features in Recommendation
Daniele Malitesta (1), Giuseppe Gassi (2), Claudio Pomo (1), Tommaso Di Noia (1)
Politecnico di Bari, Bari (Italy)
email: (1) firstname.lastname@poliba.it, (2) g.gassi@studenti.poliba.it
The 31st ACM International Conference on Multimedia
Ottawa, ON, Canada, November 1, 2023
Open Source Track
2. Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation
The 31st ACM International Conference on Multimedia (Ottawa, October 29 - November 03, 2023)
Outline
● Introduction and motivations
● Architecture
● Extraction pipeline
● Ducho as Docker application
● Demonstrations
● Conclusion and future work
Recommendation systems leveraging multimodal data
Multimodal-aware recommender systems [Malitesta et al.] exploit multimodal (i.e., audio, visual, textual) content data to augment the representation of items, thus tackling known issues such as dataset sparsity and the inexplicable nature of users’ actions (i.e., views, clicks) on online platforms.
[Figure: pipeline diagram with the blocks Input, Modalities, Multimodal Feature Extractor, Multimodal Representation (Joint, Coordinate), Multimodal Fusion (Early Fusion, Late Fusion), and Inference, annotated with the questions Which? How? When?]
[Malitesta et al.] 2023. Formalizing Multimedia Recommendation through Multimodal Deep Learning. Under review at TORS. Available online at: arXiv:2309.05273.
The multimodal recommendation pipeline
Despite being the initial stage in the multimodal recommendation pipeline, the extraction of meaningful multimodal features is paramount in delivering high-quality recommendations [Deldjoo et al.].
[Figure: the multimodal recommendation pipeline, with the blocks Input, Modalities, Multimodal Feature Extractor, Multimodal Representation (Joint, Coordinate), Multimodal Fusion (Early Fusion, Late Fusion), and Inference, annotated with the questions Which? How? When?]
[Deldjoo et al.] 2021. A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems. In CVPR Workshops. Computer Vision Foundation / IEEE, 3961–3967.
Current issues in multimodal feature extraction
However, diverse multimodal extraction procedures are currently used in the literature. This poses limitations:
• difficult interdependencies across various multimodal recommendation frameworks 👎
• no shared interfaces among popular libraries for the extraction of pre-trained deep learning features 👎
We present Ducho!
We present Ducho, our unified framework for the extraction of multimodal features in recommendation.
To this day, Ducho:
✓ integrates widely-adopted deep learning libraries (i.e., TensorFlow, PyTorch, and Transformers) by establishing a shared interface 🙂
✓ extracts and processes audio, visual, and textual features 😊
✓ allows items and user-item interactions [Anelli et al.] as extraction sources 😃
✓ offers an easily configurable extraction pipeline through a YAML-based file 🤩
Modalities   Sources                 Backends
             Items   Interactions    TensorFlow   PyTorch   Transformers
Audio        ✓       ✓               -            ✓         ✓
Visual       ✓       ✓               ✓            ✓         -
Textual      ✓       ✓               -            -         ✓
[Anelli et al.] 2022. Reshaping Graph Recommendation with Edge Graph Collaborative Filtering and Customer Reviews. In DL4SR@CIKM (CEUR Workshop Proceedings, Vol. 3317). CEUR-WS.org.
The overall framework
Dataset modules
• Manages the loading and processing of the input
• A general, shared schema, with three separate implementations for Audio, Visual, and Textual datasets
• Image and audio data require a folder path; text requires a TSV file
• Two sources for the modalities: items or user-item interactions
• Handles the pre-processing of data
• Saves the multimodal features in NumPy array format
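The shared schema described above can be pictured as an abstract base class with one subclass per modality. The sketch below is only an illustration under stated assumptions: the class names BaseDataset and TextualDataset, the method names, and the two-column TSV layout are hypothetical and are not taken from Ducho's code base.

```python
import csv
import io
from abc import ABC, abstractmethod


class BaseDataset(ABC):
    """Hypothetical shared schema: every modality loads samples and pre-processes them."""

    def __init__(self, source):
        # The slide names two sources for the modalities: items or user-item interactions.
        if source not in ("items", "interactions"):
            raise ValueError(f"unknown source: {source}")
        self.source = source
        self.samples = self.load()

    @abstractmethod
    def load(self):
        """Return a list of (identifier, raw_sample) pairs."""

    @abstractmethod
    def preprocess(self, raw_sample):
        """Turn one raw sample into the form an extractor would consume."""

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        identifier, raw_sample = self.samples[index]
        return identifier, self.preprocess(raw_sample)


class TextualDataset(BaseDataset):
    """Textual data comes from a TSV file (an in-memory string here, for brevity)."""

    def __init__(self, tsv_text, source="items"):
        self.tsv_text = tsv_text
        super().__init__(source)

    def load(self):
        # Assumed layout: identifier in the first column, text in the second.
        reader = csv.reader(io.StringIO(self.tsv_text), delimiter="\t")
        return [(row[0], row[1]) for row in reader]

    def preprocess(self, raw_sample):
        # Illustrative pre-processing only: trim whitespace and lowercase.
        return raw_sample.strip().lower()


dataset = TextualDataset("item_1\t  Red Cotton Shirt \nitem_2\tBlue Jeans\n")
first_id, first_text = dataset[0]
print(len(dataset), first_id, first_text)
```

Audio and visual subclasses would follow the same pattern, reading from a folder path instead of a TSV file.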
Extractor modules
• Builds an extraction model from a pre-trained network
• Provides three implementations, one for each modality
• Exposes a wide range of pre-trained models for the three backends
• The user should indicate the (list of) extraction layers and the pre-trained model, following the backend's official naming/indexing scheme
• For the textual modality, the user can indicate the task the model is pre-trained on (e.g., sentiment analysis)
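To illustrate what "indicating the extraction layers" means in practice, here is a minimal sketch that runs an input through an ordered list of named layers and keeps the outputs of the requested ones. Everything in it is a stand-in: extract_at is a hypothetical helper, and the toy layers are plain Python functions rather than a real pre-trained network. Only the layer name classifier.3 is borrowed from the configuration example in these slides.

```python
def extract_at(layers, output_layers, x):
    """Run `x` through `layers` in order and collect the outputs of the requested layers."""
    wanted = set(output_layers)
    unknown = wanted - {name for name, _ in layers}
    if unknown:
        raise KeyError(f"unknown layer(s): {sorted(unknown)}")
    features = {}
    for name, fn in layers:
        x = fn(x)
        if name in wanted:
            features[name] = x
    return features


# Toy "network": each layer is a (name, function) pair; NOT a real backend model.
toy_model = [
    ("features", lambda v: [2 * e for e in v]),
    ("classifier.0", lambda v: [e + 1 for e in v]),
    ("classifier.3", lambda v: [e * e for e in v]),
    ("classifier.6", lambda v: [sum(v)]),
]

features = extract_at(toy_model, ["classifier.3"], [1, 2, 3])
print(features)
```

With a real backend, the same idea is usually implemented through that library's own mechanism for reading intermediate activations, which is why the layer names must follow its naming/indexing scheme.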
Runner
• Orchestrator of Ducho
• Instantiates, calls, and manages all modules
• Triggers the complete extraction pipeline
• Customized through the Configuration component
• A YAML-based file is used to override (some of) the default settings
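Overriding "some of" the defaults can be sketched as a recursive merge of two nested dictionaries. This is an assumption about how such an override could work, not a description of Ducho's Configuration component: the function override and the DEFAULTS values are hypothetical, and user_config stands for the YAML file after parsing.

```python
import copy

# Hypothetical defaults, NOT Ducho's actual default settings.
DEFAULTS = {
    "dataset_path": "./data",
    "visual": {
        "items": {"input_path": "images", "output_path": "visual_embeddings"},
    },
}


def override(defaults, user):
    """Return a copy of `defaults` where every key present in `user` wins."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections so sibling keys keep their defaults.
            merged[key] = override(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# What a parsed YAML file might look like when it overrides two settings only.
user_config = {
    "dataset_path": "./local/data/demo1",
    "visual": {"items": {"output_path": "my_embeddings"}},
}

config = override(DEFAULTS, user_config)
print(config)
```

The settings the user did not mention (input_path here) keep their default values, and DEFAULTS itself is left untouched.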
Runner (configuration file)
dataset_path: ./local/data/demo1
gpu list: 0
visual:
    items:
        input_path: images
        output_path: visual_embeddings
        model: [
            { name: VGG19, output_layers: classifier.3, ...},
            { name: Xception, output_layers: avg_pool, ...},
        ]
0. Pipeline configuration
1. Load and preprocess step
2. Build of the extraction model
3. Output save
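The four steps can be read as a small function pipeline. The sketch below is a toy under loud assumptions: all function names are hypothetical, the "extractor" merely counts characters and words instead of running a pre-trained model, one output file per item is an assumption, and JSON files stand in for the NumPy arrays that Ducho writes.

```python
import json
import tempfile
from pathlib import Path


def load_and_preprocess(raw_items):
    # Step 1: normalise each raw sample (illustrative: strip and lowercase text).
    return {item_id: text.strip().lower() for item_id, text in raw_items.items()}


def build_extractor(config):
    # Step 2: return a callable standing in for a pre-trained model.
    def extract(sample):
        return [float(len(sample)), float(len(sample.split()))]
    return extract


def save_output(features, output_dir):
    # Step 3: write one file per item (JSON replaces NumPy arrays in this sketch).
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for item_id, vector in features.items():
        (output_dir / f"{item_id}.json").write_text(json.dumps(vector))
    return sorted(p.name for p in output_dir.iterdir())


def run(config, raw_items):
    # Step 0 is the configuration itself, passed in as a plain dict.
    samples = load_and_preprocess(raw_items)
    extractor = build_extractor(config)
    features = {item_id: extractor(sample) for item_id, sample in samples.items()}
    return save_output(features, config["output_path"])


with tempfile.TemporaryDirectory() as tmp:
    config = {"output_path": str(Path(tmp) / "textual_embeddings")}
    written = run(config, {"item_1": " Red Shirt ", "item_2": "Jeans"})
    stored = json.loads((Path(tmp) / "textual_embeddings" / "item_1.json").read_text())

print(written, stored)
```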
Dockerization of Ducho
To fully exploit the GPU-speedup capabilities of the selected backends, we dockerize Ducho into an out-of-the-box Docker image, which provides:
✓ CUDA 11.8
✓ cuDNN 8
✓ Ubuntu 22.04
✓ Python 3.8
✓ Pip
✓ All needed Python packages
Demo 1: visual + textual item features
Task: fashion recommendation
Input data: fashion data with images (visual) and item metadata (textual)
Extraction: VGG19 and Xception (visual), Sentence-BERT pre-trained for semantic textual similarity (textual)
Output: NumPy arrays for both visual and textual features
Demo 2: audio + textual item features
Task: song recommendation
Input data: music genres dataset with songs (audio) and music genre (textual)
Extraction: Hybrid Demucs (audio) and Sentence-BERT pre-trained for semantic textual similarity (textual)
Output: audio features may require some time…
Demo 3: textual item/interaction features
Task: product recommendation
Input data: Amazon recommendation dataset with reviews (textual interactions) and product descriptions (textual items)
Extraction: Multilingual BERT-based model pre-trained on customers’ reviews for the task of sentiment analysis (textual)
Output: NumPy arrays (for the interactions, they are mapped to the user-item pair)
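The difference between item features and interaction features is mainly one of indexing: the former are keyed by item, the latter by the (user, item) pair that produced the review. The sketch below illustrates that mapping under stated assumptions: the three-column TSV layout is hypothetical, and toy_embed is a placeholder for the BERT-based model, returning the text length rather than a real embedding.

```python
import csv
import io

# Assumed layout: user identifier, item identifier, review text.
reviews_tsv = (
    "user_1\titem_9\tGreat quality\n"
    "user_2\titem_9\tToo small\n"
    "user_1\titem_4\tLoved it\n"
)


def toy_embed(text):
    # Placeholder for a pre-trained textual extractor.
    return [float(len(text))]


interaction_features = {}
for user_id, item_id, review in csv.reader(io.StringIO(reviews_tsv), delimiter="\t"):
    # One feature vector per user-item pair, since a review belongs to both.
    interaction_features[(user_id, item_id)] = toy_embed(review)

print(interaction_features)
```

Note that item_9 appears twice with different vectors, which is exactly what keying by item alone could not represent.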
Conclusion
● Ducho, a unified framework for the extraction of multimodal features in recommendation
● Three main modules: Dataset, Extractor, and Runner
● Multimodal pipeline highly configurable through a YAML-based file
● Dockerization of Ducho into an out-of-the-box application
● Three demonstrations to show all of Ducho’s functionalities
Future work
● Adopt all available backends for all modalities
● Implement a general extraction interface to use the same naming/indexing scheme
● Integrate the extraction of low-level features
Useful resources
Wondering why we called our framework Ducho?
Check out the Italian TV series “Boris” 🤓
Don’t forget to check out our theoretical/experimental survey