Ducho: A Unified Framework for the Extraction of
Multimodal Features in Recommendation
Daniele Malitesta¹, Giuseppe Gassi², Claudio Pomo¹, Tommaso Di Noia¹
Politecnico di Bari, Bari (Italy)
email: firstname.lastname@poliba.it, g.gassi@studenti.poliba.it
The 31st ACM International Conference on Multimedia
Ottawa, ON, Canada, 11-01-2023
Open Source Track
Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation
The 31st ACM International Conference on Multimedia (Ottawa, October 29 - November 03, 2023)
Outline
● Introduction and motivations
● Architecture
● Extraction pipeline
● Ducho as Docker application
● Demonstrations
● Conclusion and future work
Introduction and motivations
Recommendation systems leveraging multimodal data
Multimodal-aware recommender systems [Malitesta et al.] exploit multimodal (i.e., audio, visual, textual) content
data to augment the representation of items, thus tackling known issues such as dataset sparsity and the inexplicable
nature of users’ actions (e.g., views, clicks) on online platforms.
[Figure: the multimodal recommendation pipeline. Blocks: input (user u, item i, modalities m1, m2, m3, ...); multimodal feature extractor φ_m(·); multimodal representation, either joint μ(·) or coordinate μ_m(·); multimodal fusion, either early γ_e(·) or late γ_l(·); inference ρ(·) producing the output r. Guiding questions: Which? How? When?]
[Malitesta et al.] 2023. Formalizing Multimedia Recommendation through Multimodal Deep Learning. Under review at TORS. Available online at: arXiv:2309.05273.
The multimodal recommendation pipeline
[Figure: the multimodal recommendation pipeline, from the input (user u, item i, modalities m1, m2, m3, ...) through the multimodal feature extractor φ_m(·), the multimodal representation (joint μ(·) or coordinate μ_m(·)), and the multimodal fusion (early γ_e(·) or late γ_l(·)) to the inference ρ(·) and the output r. Guiding questions: Which? How? When?]
Despite being the initial stage in the multimodal recommendation pipeline, the extraction of meaningful
multimodal features is paramount in delivering high-quality recommendations [Deldjoo et al.].
[Deldjoo et al.] 2021. A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems. In CVPR Workshops. Computer Vision Foundation / IEEE, 3961–3967.
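As a rough illustration of the early and late fusion stages named on this slide, the sketch below contrasts the two on toy feature vectors. Everything in it is an assumption made for illustration: the function names, the choice of concatenation and averaging, and the dot-product scorer are placeholders, not the formalization of [Malitesta et al.] and not something Ducho provides (Ducho covers only the extraction stage).

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as a stand-in inference function."""
    return sum(x * y for x, y in zip(a, b))


def early_fusion_score(user: Dict[str, Vector], item: Dict[str, Vector]) -> float:
    """Fuse the modality features first (concatenation), then score once."""
    modalities = sorted(item)
    fused_user = [x for m in modalities for x in user[m]]
    fused_item = [x for m in modalities for x in item[m]]
    return dot(fused_user, fused_item)


def late_fusion_score(user: Dict[str, Vector], item: Dict[str, Vector]) -> float:
    """Score each modality separately, then fuse the predictions (mean)."""
    scores = [dot(user[m], item[m]) for m in sorted(item)]
    return sum(scores) / len(scores)


# Toy per-modality features; the values are made up.
item = {"visual": [1.0, 0.0], "textual": [0.5, 0.5]}
user = {"visual": [2.0, 1.0], "textual": [1.0, 1.0]}

early = early_fusion_score(user, item)
late = late_fusion_score(user, item)
```

Both variants consume the same extracted features, which is why the quality of the extraction step affects everything downstream.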
Current issues in multimodal feature extraction
However, diverse multimodal extraction procedures are currently used in the literature.
This poses limitations:
• difficult interdependencies across various multimodal recommendation frameworks 👎
• no shared interfaces among popular libraries for the extraction of pre-trained deep learning features 👎
We present Ducho!
We present Ducho, our unified framework for the extraction of multimodal
features in recommendation.
To this day, Ducho:
✓ integrates widely adopted deep learning libraries (i.e., TensorFlow, PyTorch, and Transformers) by establishing a shared interface 🙂
✓ extracts and processes audio, visual, and textual features 😊
✓ allows items and user-item interactions [Anelli et al.] as extraction sources 😃
✓ offers an easily configurable extraction pipeline through a YAML-based file 🤩

Supported combinations (✓ = available, - = not available):

Modality   Sources                 Backends
           Items   Interactions    TensorFlow   PyTorch   Transformers
Audio      ✓       ✓               -            ✓         ✓
Visual     ✓       ✓               ✓            ✓         -
Textual    ✓       ✓               -            -         ✓
[Anelli et al.] 2022. Reshaping Graph Recommendation with Edge Graph Collaborative Filtering and Customer Reviews. In DL4SR@CIKM (CEUR Workshop Proceedings, Vol. 3317). CEUR-WS.org.
Architecture
The overall framework
Dataset modules
• Manages the loading and processing of
the input
• A general, shared schema, with three
separate implementations for Audio,
Visual, and Textual datasets
• Images and audio require a folder path; text
requires a TSV file
• Two sources for the modalities: items
or user-item interactions
• Handles the pre-processing of data
• Saves the multimodal features in
NumPy array format
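The loading step for textual data and its two sources can be sketched as follows. This is a minimal illustration and not Ducho's actual Dataset code: the function name `read_textual_tsv`, the column order, the header row, and the lowercase/strip pre-processing are all assumptions of the sketch.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple, Union

# An item id, or a (user id, item id) pair for interactions.
Key = Union[str, Tuple[str, str]]


def read_textual_tsv(handle: TextIO, source: str) -> Iterator[Tuple[Key, str]]:
    """Yield (key, text) pairs from a tab-separated file with a header row.

    Assumed layout (hypothetical): items -> item id, text;
    interactions -> user id, item id, text.
    """
    if source not in ("items", "interactions"):
        raise ValueError(f"unknown source: {source}")
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    for row in reader:
        if source == "items":
            key, text = row[0], row[1]
        else:
            key, text = (row[0], row[1]), row[2]
        # Toy pre-processing: trim whitespace and lowercase.
        yield key, text.strip().lower()


# In-memory stand-ins for TSV files on disk.
items_tsv = io.StringIO("item_id\tdescription\nB01\t  Red Cotton Shirt \n")
reviews_tsv = io.StringIO("user_id\titem_id\treview\nU7\tB01\tFits well\n")

items = list(read_textual_tsv(items_tsv, "items"))
reviews = list(read_textual_tsv(reviews_tsv, "interactions"))
```

Keying interaction texts by the (user, item) pair keeps each extracted feature tied to the interaction it came from.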
Extractor modules
• Builds an extraction model from a pre-trained
network
• Provides three different implementations, one for
each modality
• Exposes a wide range of pre-trained models
for the three backends
• The user should indicate the (list of)
extraction layers and the pre-trained model,
following the official naming/indexing scheme
• For the textual modality, the user can indicate
the task the model is pre-trained on (e.g.,
sentiment analysis)
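The idea of selecting extraction layers by name can be sketched without any deep learning backend. The toy "network" below is an ordered mapping of hypothetical layer names to plain Python functions. It only illustrates the mechanism of running a model and keeping the outputs of the requested layers; it is not how Ducho or any of the three backends implements it, and real layer names follow each backend's own scheme.

```python
from collections import OrderedDict
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def build_extractor(layers: "OrderedDict[str, Layer]", output_layers: List[str]):
    """Return a function that runs the network and keeps selected layer outputs."""
    missing = [name for name in output_layers if name not in layers]
    if missing:
        raise KeyError(f"unknown layer(s): {missing}")

    def extract(x: Vector) -> Dict[str, Vector]:
        captured: Dict[str, Vector] = {}
        for name, layer in layers.items():
            x = layer(x)  # forward pass through one layer
            if name in output_layers:
                captured[name] = x
        return captured

    return extract


# Toy stand-in for a pre-trained network; the layer names are hypothetical.
toy_network = OrderedDict(
    [
        ("scale", lambda v: [2.0 * x for x in v]),
        ("relu", lambda v: [max(0.0, x) for x in v]),
        ("pool", lambda v: [sum(v) / len(v)]),
    ]
)

extract = build_extractor(toy_network, ["relu", "pool"])
features = extract([1.0, -3.0])
```

Validating the requested names before running the model surfaces a misspelled layer early, which matters when each backend uses a different naming scheme.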
Runner
• Orchestrator of Ducho
• Instantiates, calls, and manages all modules
• Triggers the complete extraction pipeline
• Customized through the Configuration
component
• A YAML-based file is used to override
(some of) the default settings
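Overriding only some of the default settings amounts to a recursive merge of two nested mappings. The sketch below shows one way to do that on plain dictionaries. The helper name `override_defaults` and the default values are hypothetical, and parsing the YAML file itself is left out so that the example needs only the standard library.

```python
import copy
from typing import Any, Dict


def override_defaults(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `defaults` where keys present in `overrides` win.

    Nested dictionaries are merged key by key instead of being replaced whole.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = override_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults, not Ducho's actual ones.
DEFAULTS = {
    "dataset_path": "./local/data",
    "visual": {"items": {"input_path": "images", "output_path": "visual_embeddings"}},
}

# What a user-provided file might contain once parsed.
user_config = {
    "dataset_path": "./local/data/demo1",
    "visual": {"items": {"input_path": "photos"}},
}

config = override_defaults(DEFAULTS, user_config)
```

Settings the user does not mention, such as `output_path` here, keep their default values.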
Runner (configuration file)
dataset_path: ./local/data/demo1
gpu list: 0
visual:
    items:
        input_path: images
        output_path: visual_embeddings
        model: [
            { name: VGG19, output_layers: classifier.3, ...},
            { name: Xception, output_layers: avg_pool, ...},
        ]
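To show how such a configuration could drive the extraction, the sketch below walks an already-parsed version of the file above and lists one job per modality, source, model, and layer. The dictionary mirrors the slide's example with the elided fields and the GPU setting left out; the function `extraction_jobs` and the tuple layout are assumptions of this sketch, not Ducho's API.

```python
from typing import Any, Dict, Iterator, Tuple

# Parsed form of the example configuration (elided fields omitted).
config: Dict[str, Any] = {
    "dataset_path": "./local/data/demo1",
    "visual": {
        "items": {
            "input_path": "images",
            "output_path": "visual_embeddings",
            "model": [
                {"name": "VGG19", "output_layers": "classifier.3"},
                {"name": "Xception", "output_layers": "avg_pool"},
            ],
        }
    },
}

MODALITIES = ("audio", "visual", "textual")
SOURCES = ("items", "interactions")


def extraction_jobs(cfg: Dict[str, Any]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (modality, source, model name, layer) for every configured model."""
    for modality in MODALITIES:
        for source in SOURCES:
            section = cfg.get(modality, {}).get(source)
            if not section:
                continue  # this modality/source pair is not configured
            for model in section["model"]:
                layers = model["output_layers"]
                if isinstance(layers, str):
                    layers = [layers]  # accept a single layer or a list
                for layer in layers:
                    yield modality, source, model["name"], layer


jobs = list(extraction_jobs(config))
```

Modalities or sources missing from the file are simply skipped, so a configuration only needs to mention what should be extracted.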
Extraction pipeline
0. Pipeline configuration
1. Load and preprocess step
2. Build of the extraction model
3. Output save
Ducho as Docker application
Dockerization of Ducho
To fully exploit the GPU acceleration offered by the selected backends, we dockerize Ducho into an
out-of-the-box Docker image, which provides:
✓ CUDA 11.8
✓ cuDNN 8
✓ Ubuntu 22.04
✓ Python 3.8
✓ pip
✓ All needed Python packages
Demonstrations
Demo 1: visual + textual item features
Task: fashion recommendation
Input data: fashion data with images (visual) and item metadata (textual)
Extraction: VGG19 and Xception (visual), Sentence-BERT pre-trained for semantic textual similarity (textual)
Output: NumPy arrays for both visual and textual features
Demo 2: audio + textual item features
Task: song recommendation
Input data: music genres dataset with songs (audio) and music genre (textual)
Extraction: Hybrid Demucs (audio) and Sentence-BERT pre-trained for semantic textual similarity (textual)
Output: audio features may require some time…
Demo 3: textual item/interaction features
Task: product recommendation
Input data: Amazon recommendation dataset with reviews (textual interactions) and product descriptions (textual items)
Extraction: Multilingual BERT-based model pre-trained on customers’ reviews for the task of sentiment analysis (textual)
Output: NumPy arrays (for the interactions, they are mapped to the user-item pair)
Conclusion and future work
Conclusion
● Ducho, a unified framework for the extraction of multimodal features in recommendation
● Three main modules: Dataset, Extractor, and Runner
● Multimodal pipeline highly configurable through a YAML-based file
● Dockerization of Ducho into an out-of-the-box application
● Three demonstrations to show all of Ducho’s functionalities
Future work
● Adopt all available backends for all modalities
● Implement a general extraction interface to use the same naming/indexing scheme
● Integrate the extraction of low-level features
Useful resources
Wondering why we called our
framework Ducho?
Check out the Italian TV series
“Boris” 🤓
Don’t forget to check out our theoretical/experimental survey
28

More Related Content

Similar to [MM2023] Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation

Buerger - W3C Media Annotation Working Group @EUscreen Mykonos
Buerger - W3C Media Annotation Working Group @EUscreen MykonosBuerger - W3C Media Annotation Working Group @EUscreen Mykonos
Buerger - W3C Media Annotation Working Group @EUscreen MykonosEUscreen
 
e-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCubee-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCubeFAO
 
Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Sandro D'Elia
 
Semantic Media Project Introduction - Mark Sandler (Barbican Arts Centre, Oct...
Semantic Media Project Introduction - Mark Sandler (Barbican Arts Centre, Oct...Semantic Media Project Introduction - Mark Sandler (Barbican Arts Centre, Oct...
Semantic Media Project Introduction - Mark Sandler (Barbican Arts Centre, Oct...sebastianewert
 
Collaborative Science: Technologies & Examples - Cameron Kiddle, Grid Researc...
Collaborative Science: Technologies & Examples - Cameron Kiddle, Grid Researc...Collaborative Science: Technologies & Examples - Cameron Kiddle, Grid Researc...
Collaborative Science: Technologies & Examples - Cameron Kiddle, Grid Researc...Cybera Inc.
 
Working together with SURF Raymond Oonk Annette Langedijk SURF
Working together with SURF Raymond Oonk Annette Langedijk SURFWorking together with SURF Raymond Oonk Annette Langedijk SURF
Working together with SURF Raymond Oonk Annette Langedijk SURFCommunicatieSURF
 
U-Boot community analysis
U-Boot community analysisU-Boot community analysis
U-Boot community analysisxulioc
 
EuroHPC AI in DAPHNE and Text Summarization
EuroHPC AI in DAPHNE and Text SummarizationEuroHPC AI in DAPHNE and Text Summarization
EuroHPC AI in DAPHNE and Text SummarizationUniversity of Maribor
 
On the Navigability of Social Tagging Systems
On the Navigability of Social Tagging SystemsOn the Navigability of Social Tagging Systems
On the Navigability of Social Tagging SystemsMarkus Strohmaier
 
3D ICONS Guidelines and Case Studies, Anthony Corns, Discovery Programme
3D ICONS Guidelines and Case Studies, Anthony Corns, Discovery Programme3D ICONS Guidelines and Case Studies, Anthony Corns, Discovery Programme
3D ICONS Guidelines and Case Studies, Anthony Corns, Discovery Programme3D ICONS Project
 
Cloud computing and bioinformatics
Cloud computing and bioinformaticsCloud computing and bioinformatics
Cloud computing and bioinformaticsEnis Afgan
 
Deep-linking into Media Assets at the Fragment Level SMAM 2013
Deep-linking into Media Assets at the Fragment Level SMAM 2013Deep-linking into Media Assets at the Fragment Level SMAM 2013
Deep-linking into Media Assets at the Fragment Level SMAM 2013Raphael Troncy
 
Mediamixer – Community set-up and networking for the reMIXing of online MEDIA...
Mediamixer – Community set-up and networking for the reMIXing of online MEDIA...Mediamixer – Community set-up and networking for the reMIXing of online MEDIA...
Mediamixer – Community set-up and networking for the reMIXing of online MEDIA...The Open Education Consortium
 
A Distributed Audio Personalization Framework over Android
A Distributed Audio Personalization Framework over AndroidA Distributed Audio Personalization Framework over Android
A Distributed Audio Personalization Framework over AndroidUniversity of Piraeus
 
Repositorio de Datos LAGO
Repositorio de Datos LAGORepositorio de Datos LAGO
Repositorio de Datos LAGORodrigo Torrens
 
OCCIware & Linked Data prototype OW2Con@POSS
OCCIware & Linked Data prototype OW2Con@POSSOCCIware & Linked Data prototype OW2Con@POSS
OCCIware & Linked Data prototype OW2Con@POSSMarc Dutoo
 
D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
D4Science:An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...D4Science:An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...FAO
 

Similar to [MM2023] Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation (20)

Buerger - W3C Media Annotation Working Group @EUscreen Mykonos
Buerger - W3C Media Annotation Working Group @EUscreen MykonosBuerger - W3C Media Annotation Working Group @EUscreen Mykonos
Buerger - W3C Media Annotation Working Group @EUscreen Mykonos
 
e-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCubee-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCube
 
Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708
 
Harvesting&Metadata Enrich Project EVA 2009
Harvesting&Metadata Enrich Project   EVA 2009Harvesting&Metadata Enrich Project   EVA 2009
Harvesting&Metadata Enrich Project EVA 2009
 
Semantic Media Project Introduction - Mark Sandler (Barbican Arts Centre, Oct...
Semantic Media Project Introduction - Mark Sandler (Barbican Arts Centre, Oct...Semantic Media Project Introduction - Mark Sandler (Barbican Arts Centre, Oct...
Semantic Media Project Introduction - Mark Sandler (Barbican Arts Centre, Oct...
 
EuroHPC AI in DAPHNE
EuroHPC AI in DAPHNEEuroHPC AI in DAPHNE
EuroHPC AI in DAPHNE
 
Collaborative Science: Technologies & Examples - Cameron Kiddle, Grid Researc...
Collaborative Science: Technologies & Examples - Cameron Kiddle, Grid Researc...Collaborative Science: Technologies & Examples - Cameron Kiddle, Grid Researc...
Collaborative Science: Technologies & Examples - Cameron Kiddle, Grid Researc...
 
Working together with SURF Raymond Oonk Annette Langedijk SURF
Working together with SURF Raymond Oonk Annette Langedijk SURFWorking together with SURF Raymond Oonk Annette Langedijk SURF
Working together with SURF Raymond Oonk Annette Langedijk SURF
 
U-Boot community analysis
U-Boot community analysisU-Boot community analysis
U-Boot community analysis
 
H2020-AHTOOLS Use Case 3 Functional Design
H2020-AHTOOLS Use Case 3 Functional DesignH2020-AHTOOLS Use Case 3 Functional Design
H2020-AHTOOLS Use Case 3 Functional Design
 
EuroHPC AI in DAPHNE and Text Summarization
EuroHPC AI in DAPHNE and Text SummarizationEuroHPC AI in DAPHNE and Text Summarization
EuroHPC AI in DAPHNE and Text Summarization
 
On the Navigability of Social Tagging Systems
On the Navigability of Social Tagging SystemsOn the Navigability of Social Tagging Systems
On the Navigability of Social Tagging Systems
 
3D ICONS Guidelines and Case Studies, Anthony Corns, Discovery Programme
3D ICONS Guidelines and Case Studies, Anthony Corns, Discovery Programme3D ICONS Guidelines and Case Studies, Anthony Corns, Discovery Programme
3D ICONS Guidelines and Case Studies, Anthony Corns, Discovery Programme
 
Cloud computing and bioinformatics
Cloud computing and bioinformaticsCloud computing and bioinformatics
Cloud computing and bioinformatics
 
Deep-linking into Media Assets at the Fragment Level SMAM 2013
Deep-linking into Media Assets at the Fragment Level SMAM 2013Deep-linking into Media Assets at the Fragment Level SMAM 2013
Deep-linking into Media Assets at the Fragment Level SMAM 2013
 
Mediamixer – Community set-up and networking for the reMIXing of online MEDIA...
Mediamixer – Community set-up and networking for the reMIXing of online MEDIA...Mediamixer – Community set-up and networking for the reMIXing of online MEDIA...
Mediamixer – Community set-up and networking for the reMIXing of online MEDIA...
 
A Distributed Audio Personalization Framework over Android
A Distributed Audio Personalization Framework over AndroidA Distributed Audio Personalization Framework over Android
A Distributed Audio Personalization Framework over Android
 
Repositorio de Datos LAGO
Repositorio de Datos LAGORepositorio de Datos LAGO
Repositorio de Datos LAGO
 
OCCIware & Linked Data prototype OW2Con@POSS
OCCIware & Linked Data prototype OW2Con@POSSOCCIware & Linked Data prototype OW2Con@POSS
OCCIware & Linked Data prototype OW2Con@POSS
 
D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
D4Science:An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...D4Science:An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
 

Recently uploaded

SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 

[MM2023] Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation

  • 1. Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation Daniele Malitesta1, Giuseppe Gassi2, Claudio Pomo1, Tommaso Di Noia1 Politecnico di Bari, Bari (Italy) email: firstname.lastname@poliba.it, g.gassi@studenti.poliba.it The 31st ACM International Conference on Multimedia Ottawa, ON, Canada, 11-01-2023 Open Source Track 1 2
  • 2. Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation The 31st ACM International Conference on Multimedia (Ottawa, October 29 - November 03, 2023) ● Introduction and motivations ● Architecture ● Extraction pipeline ● Ducho as Docker application ● Demonstrations ● Conclusion and future work Outline 2
  • 4. Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation The 31st ACM International Conference on Multimedia (Ottawa, October 29 - November 03, 2023) Multimodal-aware recommender systems [Malitesta et al.] exploit multimodal (i.e., audio, visual, textual) content data to augment the representation of items, thus tackling known issues such as dataset sparsity and the inexplicable nature of users’ actions (i.e., views, clicks) on online platforms. 4 Recommendation systems leveraging multimodal data ࢛ ࢏ MODALITIES ࢓૚ ࢓૛ ࢓૜ . . . . . . MULTIMODAL FEATURE EXTRACTOR ࣐࢓ሺ‫ڄ‬ሻ MULTIMODAL REPRESENTATION JOINT ࣆሺ‫ڄ‬ሻ COORDINATE ࣆ࢓ ‫ڄ‬ . . . INFERENCE ࣋ሺ‫ڄ‬ሻ EARLY FUSION ࢽࢋሺ‫ڄ‬ሻ LATE FUSION ࢽ࢒ሺ‫ڄ‬ሻ (1) (2) (a) (b) MULTIMODAL FUSION (3) (a) (b) (4) ࢘ Which? How? When? INPUT [Malitesta et al.] 2023. Formalizing Multimedia Recommendation through Multimodal Deep Learning. Under review at TORS. Available online at: arXiv:2309.05273.
  • 5. Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation The 31st ACM International Conference on Multimedia (Ottawa, October 29 - November 03, 2023) ࢛ ࢏ MODALITIES ࢓૚ ࢓૛ ࢓૜ . . . . . . MULTIMODAL FEATURE EXTRACTOR ࣐࢓ሺ‫ڄ‬ሻ MULTIMODAL REPRESENTATION JOINT ࣆሺ‫ڄ‬ሻ COORDINATE ࣆ࢓ ‫ڄ‬ . . . INFERENCE ࣋ሺ‫ڄ‬ሻ EARLY FUSION ࢽࢋሺ‫ڄ‬ሻ LATE FUSION ࢽ࢒ሺ‫ڄ‬ሻ (1) (2) (a) (b) MULTIMODAL FUSION (3) (a) (b) (4) ࢘ Which? How? When? INPUT Despite being the initial stage in the multimodal recommendation pipeline, the extraction of meaningful multimodal features is paramount in delivering high-quality recommendations [Deldjoo et al.]. 5 The multimodal recommendation pipeline [Deldjoo et al.] 2021. A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems. In CVPR Workshops. Computer Vision Foundation / IEEE, 3961–3967.
  • 6. Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation The 31st ACM International Conference on Multimedia (Ottawa, October 29 - November 03, 2023) However, diverse multimodal extraction procedures are currently used in the literature. This poses limitations: • difficult interdependencies across various multimodal recommendation frameworks 👎 • no shared interfaces among popular libraries for the extraction of pre-trained deep learning features 👎 6 Current issues in multimodal feature extraction
  • 7. We present Ducho!
      We present Ducho, our unified framework for the extraction of multimodal features in recommendation. To this day, Ducho:
      ✓ integrates widely-adopted deep learning libraries (i.e., TensorFlow, PyTorch, and Transformers) by establishing a shared interface 🙂
      ✓ is useful to extract/process audio, visual, and textual features 😊
      ✓ allows items and user-item interactions [Anelli et al.] as extraction sources 😃
      ✓ offers an easily configurable extraction pipeline through a YAML-based file 🤩
      [Table: sources (Items, Interactions) and backends (TensorFlow, PyTorch, Transformers) supported for each modality. Audio: four check marks; Visual: four check marks; Textual: three check marks.]
      [Anelli et al.] 2022. Reshaping Graph Recommendation with Edge Graph Collaborative Filtering and Customer Reviews. In DL4SR@CIKM (CEUR Workshop Proceedings, Vol. 3317). CEUR-WS.org.
  • 9. The overall framework
  • 10. Dataset modules
      • Manage the loading and processing of the input
      • Follow a general, shared schema, with three separate implementations for Audio, Visual, and Textual datasets
      • Images and audio require a folder path; text requires a tsv file
      • Support two sources for the modalities: items or user-item interactions
      • Handle the pre-processing of data
      • Save the multimodal features in numpy array format
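The shared schema with one implementation per modality could be sketched as follows. This is a minimal illustration and not Ducho's actual code: the class names, the `list_samples` method, and the assumption that the tsv file starts with a header row are all hypothetical.

```python
import csv
from abc import ABC, abstractmethod
from pathlib import Path


class BaseDataset(ABC):
    """Hypothetical shared schema: every modality exposes the same interface."""

    def __init__(self, input_path):
        self.input_path = Path(input_path)

    @abstractmethod
    def list_samples(self):
        """Return the samples to extract features from, in a stable order."""


class FolderDataset(BaseDataset):
    """Images and audio are read from a folder, one file per sample."""

    def list_samples(self):
        return sorted(p.name for p in self.input_path.iterdir() if p.is_file())


class VisualDataset(FolderDataset):
    pass


class AudioDataset(FolderDataset):
    pass


class TextualDataset(BaseDataset):
    """Text is read from a tsv file, one row per sample (header row assumed)."""

    def list_samples(self):
        with open(self.input_path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle, delimiter="\t"))
```

Because the three classes share `list_samples`, the code that drives the extraction does not need to know which modality it is handling.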
  • 11. Extractor modules
      • Builds an extraction model from a pre-trained network
      • Provides three different implementations for each modality
      • Exposes a wide range of pre-trained models for the three backends
      • The user should indicate the (list of) extraction layers and the pre-trained model, following the official naming/indexing scheme
      • For the textual modality, the user can indicate the task the model is pre-trained on (e.g., sentiment analysis)
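A layer name such as `classifier.3` (used in the configuration example of this talk) is consistent with PyTorch's dotted naming of submodules. The toy sketch below uses plain Python objects instead of a real network, so it runs without any deep learning library, and shows how such a dotted name can be resolved step by step. `resolve_layer` and the `Toy*` classes are hypothetical and are not part of Ducho.

```python
class ToySequential:
    """Stand-in for an ordered container of layers, indexed by position."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def __getitem__(self, index):
        return self.layers[index]


class ToyModel:
    """Stand-in for a pre-trained network with named blocks."""

    def __init__(self):
        self.features = ToySequential("conv", "relu", "pool")
        self.classifier = ToySequential("fc1", "relu", "dropout", "fc2")


def resolve_layer(model, dotted_name):
    """Walk a dotted name: attribute names for blocks, integers for positions."""
    current = model
    for part in dotted_name.split("."):
        current = current[int(part)] if part.isdigit() else getattr(current, part)
    return current


print(resolve_layer(ToyModel(), "classifier.3"))  # fc2
```

On a real PyTorch model, `torch.nn.Module.get_submodule` accepts the same kind of dotted path, so a hand-written resolver like this one would not be needed there.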
  • 12. Runner
      • Orchestrator of Ducho
      • Instantiates, calls, and manages all modules
      • Triggers the complete extraction pipeline
      • Customized through the Configuration component
      • A YAML-based file is used to override (some of) the default settings
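Overriding only some of the default settings amounts to a recursive merge of two nested dictionaries. The sketch below illustrates the idea under stated assumptions: the `override` helper and the default values are hypothetical, and `user_settings` stands for whatever a YAML parser would return for the user's file. It is not Ducho's own Configuration component.

```python
import copy


def override(defaults, user_settings):
    """Return a copy of `defaults` where `user_settings` wins, key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in user_settings.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = override(merged[key], value)  # recurse into sections
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical default values, for illustration only.
defaults = {
    "dataset_path": "./data",
    "visual": {"items": {"input_path": "images", "output_path": "visual_embeddings"}},
}
# Settings as they would look after parsing the user's YAML file.
user_settings = {
    "dataset_path": "./local/data/demo1",
    "visual": {"items": {"input_path": "pictures"}},
}

config = override(defaults, user_settings)
print(config["visual"]["items"])
# {'input_path': 'pictures', 'output_path': 'visual_embeddings'}
```

Merging section by section keeps every default the user did not mention, which is what allows a short configuration file.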
  • 13. Runner (configuration file)
      dataset_path: ./local/data/demo1
      gpu list: 0
      visual:
        items:
          input_path: images
          output_path: visual_embeddings
          model: [
            { name: VGG19, output_layers: classifier.3, ...},
            { name: Xception, output_layers: avg_pool, ...},
          ]
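One way to read this configuration is as a list of extraction jobs, one per modality, source, model, and layer. The sketch below walks the slide's configuration, written as the Python dictionary a YAML parser would return, and enumerates those jobs. It assumes that `items` nests under `visual` and holds the paths and the model list. The fields elided on the slide are left out, and `extraction_jobs` is a hypothetical helper, not a Ducho function.

```python
# The configuration from the slide as a Python dictionary; fields that the
# slide elides ("...") are simply left out here.
config = {
    "dataset_path": "./local/data/demo1",
    "gpu list": 0,
    "visual": {
        "items": {
            "input_path": "images",
            "output_path": "visual_embeddings",
            "model": [
                {"name": "VGG19", "output_layers": "classifier.3"},
                {"name": "Xception", "output_layers": "avg_pool"},
            ],
        }
    },
}

MODALITIES = ("audio", "visual", "textual")
SOURCES = ("items", "interactions")


def extraction_jobs(config):
    """Yield one (modality, source, model name, layer) tuple per extraction."""
    for modality in MODALITIES:
        for source in SOURCES:
            section = config.get(modality, {}).get(source)
            if section is None:
                continue
            for model in section["model"]:
                layers = model["output_layers"]
                # Accept either a single layer name or a list of layer names.
                for layer in layers if isinstance(layers, list) else [layers]:
                    yield modality, source, model["name"], layer


for job in extraction_jobs(config):
    print(job)
# ('visual', 'items', 'VGG19', 'classifier.3')
# ('visual', 'items', 'Xception', 'avg_pool')
```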
  • 15. 0. Pipeline configuration
  • 16. 1. Load and preprocess step
  • 17. 2. Build of the extraction model
  • 18. 3. Output save
  • 19. Ducho as Docker application
  • 20. Dockerization of Ducho
      To fully exploit the GPU-speedup capabilities of the selected backends, we dockerize Ducho into an out-of-the-box Docker image, which provides:
      ✓ CUDA 11.8
      ✓ cuDNN 8
      ✓ Ubuntu 22.04
      ✓ Python 3.8
      ✓ Pip
      ✓ All needed Python packages
  • 22. Demo 1: visual + textual items features
      Task: fashion recommendation
      Input data: fashion data with images (visual) and item metadata (textual)
      Extraction: VGG19 and Xception (visual), Sentence-BERT pre-trained for semantic textual similarity (textual)
      Output: numpy arrays for both visual and textual features
  • 23. Demo 2: audio + textual items features
      Task: song recommendation
      Input data: music genres dataset with songs (audio) and music genre (textual)
      Extraction: Hybrid Demucs (audio) and Sentence-BERT pre-trained for semantic textual similarity (textual)
      Output: audio features may require some time…
  • 24. Demo 3: textual items/interactions features
      Task: product recommendation
      Input data: Amazon recommendation dataset with reviews (textual interactions) and product descriptions (textual items)
      Extraction: Multilingual BERT-based model pre-trained on customers’ reviews for the task of sentiment analysis (textual)
      Output: numpy arrays (for the interactions, they are mapped to the user-item pair)
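Mapping interaction features to the user-item pair can be pictured as a dictionary keyed by (user, item). The sketch below is only an illustration: the column names of the reviews file are assumptions, and `fake_embedding` is a placeholder that stands in for a real textual extractor such as the BERT-based model used in the demo.

```python
import csv
import io

# Hypothetical reviews file: the column names are assumptions for illustration.
REVIEWS_TSV = "user\titem\treview\nu1\ti1\tGreat shoes\nu2\ti1\tToo small\n"


def fake_embedding(text):
    """Placeholder for a real textual extractor; returns a tiny numeric vector."""
    return [len(text), text.count(" ") + 1]


def extract_interaction_features(tsv_text):
    """Map each (user, item) pair to the features of its review."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["user"], row["item"]): fake_embedding(row["review"]) for row in reader}


features = extract_interaction_features(REVIEWS_TSV)
print(features[("u1", "i1")])  # [11, 2]
```

Item-level features, by contrast, would be keyed by the item alone, since a product description does not depend on the user.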
  • 26. Conclusion
      ● Ducho, a unified framework for the extraction of multimodal features in recommendation
      ● Three main modules: Dataset, Extractor, and Runner
      ● Multimodal pipeline highly configurable through a YAML-based file
      ● Dockerization of Ducho into an out-of-the-box application
      ● Three demonstrations to show all of Ducho’s functionalities
      Future work
      ● Adopt all available backends for all modalities
      ● Implement a general extraction interface to use the same naming/indexing scheme
      ● Integrate the extraction of low-level features
  • 27. Useful resources
      Wondering why we called our framework Ducho? Check out the Italian TV series “Boris” 🤓
  • 28. Don’t forget to check out our theoretical/experimental survey