Day 4 Lecture 4
Multimodal Deep Learning
Xavier Giró-i-Nieto
[course site]
Multimedia
Text
Audio
Vision
and ratings, geolocation, time stamps...
3
Multimedia
Text
Audio
Vision
and ratings, geolocation,
time stamps...
4
5
Language & Vision: Encoder-Decoder
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Representation or Embedding
See lectures D3L4 & D4L2 by Marta Ruiz on Neural Machine Translation.
Captioning: DeepImageSent
(Slides by Marc Bolaños.) Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR 2015.
Its Multimodal Recurrent Neural Network takes the image features into account only in the first hidden state.
Captioning: Show & Tell
Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption generator." CVPR 2015.
Captioning: Show, Attend & Tell
Xu, Kelvin, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention." ICML 2015.
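A minimal numpy sketch of the soft attention step, under illustrative sizes and a simplified linear scoring function (the paper uses a small MLP): the previous decoder state weighs the 14×14 = 196 CNN region vectors, and the decoder consumes their weighted average.

```python
import numpy as np

L, D, H = 196, 512, 1024                 # regions, feature dim, LSTM hidden dim
a = np.random.randn(L, D)                # CNN annotation vectors (one per region)
h_prev = np.random.randn(H)              # decoder hidden state from step t-1

# Illustrative parameters; in the real model these are learned, and the
# scoring function is an MLP rather than this single linear map.
W_a = np.random.randn(D, 1) * 0.01
W_h = np.random.randn(H, 1) * 0.01

scores = a @ W_a + h_prev @ W_h          # (L, 1): relevance of each region
alpha = np.exp(scores) / np.exp(scores).sum()   # softmax attention weights
z = (alpha * a).sum(axis=0)              # (D,): context vector fed to the LSTM

print(alpha.sum(), z.shape)              # weights sum to 1; context is 512-d
```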
Captioning: LSTM with image & video
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. "Long-term Recurrent Convolutional Networks for Visual Recognition and Description." CVPR 2015. [code]
Captioning (+ Detection): DenseCap
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "DenseCap: Fully convolutional localization networks for dense captioning." CVPR 2016.
Example DenseCap outputs:
XAVI: “man has short hair”, “man with short hair”
AMAIA: “a woman wearing a black shirt”
BOTH: “two men wearing black glasses”
Captioning (+ Retrieval): DenseCap
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "DenseCap: Fully convolutional localization networks for dense captioning." CVPR 2016.
Captioning: HRNE
(Slides by Marc Bolaños.) Pingbo Pan, Zhongwen Xu, Yi Yang, Fei Wu, and Yueting Zhuang. "Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning." CVPR 2016.
[Figure: a second-layer LSTM unit reads the image sequence over time from t = 1 to t = T; its hidden state at t = T summarizes the first chunk of data.]
Visual Question Answering
[Figure: the inputs are encoded as [z1, z2, … zN] and [y1, y2, … yM] (question: “Is economic growth decreasing?”), and a decoder produces the answer (“Yes”).]
Visual Question Answering
[Pipeline: extract visual features and embed them; embed the question (“What object is flying?”); merge both; predict the answer (“Kite”).]
Slide credit: Issey Masuda
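A hypothetical Keras sketch of this baseline pipeline (all sizes are assumptions, and the element-wise product is just one common merge choice): encode the image with pre-extracted CNN features, the question with an LSTM, merge both embeddings, and classify over a fixed answer vocabulary.

```python
from keras.layers import Input, Dense, Embedding, LSTM, multiply
from keras.models import Model

IMG_DIM, VOCAB, EMB, HID, N_ANSWERS = 4096, 10000, 300, 512, 1000

img = Input(shape=(IMG_DIM,))              # pre-extracted CNN features
v = Dense(HID, activation='tanh')(img)     # visual embedding

q = Input(shape=(None,), dtype='int32')    # question token ids
t = LSTM(HID)(Embedding(VOCAB, EMB)(q))    # question embedding

merged = multiply([v, t])                  # merge: element-wise product
answer = Dense(N_ANSWERS, activation='softmax')(merged)  # e.g. "kite"

model = Model([img, q], answer)
model.compile('adam', 'categorical_crossentropy')
```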
Visual Question Answering
Noh, H., Seo, P. H., and Han, B. "Image question answering using convolutional neural network with dynamic parameter prediction." CVPR 2016.
Dynamic Parameter Prediction Network (DPPnet)
Visual Question Answering: Dynamic
(Slides and slidecast by Santi Pascual.) Xiong, Caiming, Stephen Merity, and Richard Socher. "Dynamic Memory Networks for Visual and Textual Question Answering." ICML 2016.
Main idea: split the image into local regions and treat each region as equivalent to a sentence.
Local region feature extraction with a CNN (VGG-19):
(1) Rescale the input to 448×448.
(2) Take the output of the last pooling layer → D = 512×14×14 → 196 local region vectors of 512 dimensions.
Visual feature embedding: a matrix W projects the image features into the “q” textual space.
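The extraction recipe above maps directly to code; here is a sketch using the stock Keras VGG-19 (the projection matrix W, its tanh nonlinearity, and the question-space size are illustrative assumptions, learned in the real model).

```python
import numpy as np
from keras.applications.vgg19 import VGG19, preprocess_input
from keras.models import Model

# A 448x448 input gives a 14x14x512 last pooling map = 196 region vectors.
base = VGG19(weights='imagenet', include_top=False, input_shape=(448, 448, 3))
extractor = Model(base.input, base.get_layer('block5_pool').output)

img = np.random.rand(1, 448, 448, 3) * 255.0     # stand-in for a real image
feat = extractor.predict(preprocess_input(img))  # shape (1, 14, 14, 512)
regions = feat.reshape(196, 512)                 # 196 local region vectors

Q = 512                                          # question-space size (assumed)
W = np.random.randn(512, Q) * 0.01               # learned in the real model
regions_q = np.tanh(regions @ W)                 # regions in the "q" space
```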
Visual Question Answering: Grounded
(Slides and screencast by Issey Masuda.) Zhu, Yuke, Oliver Groth, Michael Bernstein, and Li Fei-Fei. "Visual7W: Grounded Question Answering in Images." CVPR 2016.
Challenges: Visual Question Answering
[VQA Challenge results, accuracy from 0% to 100%:]
Humans: 83.30%
UC Berkeley & Sony: 66.47%
Baseline LSTM & CNN: 54.06%
Baseline nearest neighbor: 42.85%
Baseline prior per question type: 37.47%
Baseline all “yes”: 29.88%
I. Masuda-Mora, “Open-Ended Visual Question-Answering”, BSc thesis, ETSETB 2016 [Keras]: 53.62%
Vision and language: Embeddings
Perronnin, F., "Output embedding for LSVR", CVPR’14 Tutorial on LSVR.
Vision and language: Embeddings
Krizhevsky, A., Sutskever, I., and Hinton, G. E. "ImageNet classification with deep convolutional neural networks." NIPS 2012.
Vision and language: Embeddings
See lecture D2L4 by Toni Bonafonte on Word Embeddings.
Christopher Olah, “Visualizing Representations”.
Vision and language: DeViSE
One-hot encoding vs. embedding space.
Vision and language: DeViSE
Krizhevsky, A., Sutskever, I., and Hinton, G. E. "ImageNet classification with deep convolutional neural networks." NIPS 2012.
Vision and language: DeViSE
Frome, Andrea, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, and Tomas Mikolov. "DeViSE: A deep visual-semantic embedding model." NIPS 2013.
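A minimal numpy sketch of the DeViSE idea under assumed dimensions: a learned linear map sends CNN image features into a pretrained word-embedding space, trained with a hinge rank loss so the true label's word vector scores above wrong ones.

```python
import numpy as np

D_IMG, D_TXT, MARGIN = 4096, 300, 0.1
M = np.random.randn(D_TXT, D_IMG) * 0.01     # learned linear projection

def hinge_rank_loss(img_feat, w_true, w_wrong_list):
    """Push the true label's word vector above wrong ones by a margin."""
    v = M @ img_feat                          # image mapped into word space
    s_true = w_true @ v                       # similarity to the true label
    return sum(max(0.0, MARGIN - s_true + w @ v) for w in w_wrong_list)

img = np.random.randn(D_IMG)                  # e.g. CNN features of a cat photo
w_cat = np.random.randn(D_TXT); w_cat /= np.linalg.norm(w_cat)
w_dog = np.random.randn(D_TXT); w_dog /= np.linalg.norm(w_dog)
print(hinge_rank_loss(img, w_cat, [w_dog]))
```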
Vision and language: Embedding
Víctor Campos, Dèlia Fernàndez, Jordi Torres, Xavier Giró-i-Nieto, Brendan Jou, and Shih-Fu Chang (work in progress).
Learn more
Julia Hockenmaier (UIUC), “Vision to Language” (talk @ Microsoft Research).
Multimedia
Text
Audio
Vision
and ratings, geolocation, time stamps...
Audio and Video: SoundNet
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.
Object and scene recognition in videos by analyzing only the audio track.
Videos for training are unlabeled; SoundNet relies on CNNs trained on labeled images.
Hidden layers of SoundNet are used to train a standard SVM classifier that outperforms the state of the art.
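That evaluation recipe is simple enough to sketch; here `soundnet_features` is a stand-in for a trained SoundNet's hidden-layer activations, and the data are fake.

```python
import numpy as np
from sklearn.svm import LinearSVC

def soundnet_features(waveform):
    """Placeholder for a hidden-layer activation of a trained SoundNet."""
    return np.random.randn(1024)             # e.g. pooled conv features

X = np.stack([soundnet_features(None) for _ in range(200)])   # 200 fake clips
y = np.random.randint(0, 10, size=200)       # fake scene labels

clf = LinearSVC(C=1.0).fit(X, y)             # linear SVM on frozen features
print(clf.score(X, y))                       # training accuracy on fake data
```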
Visualization of the 1D filters over raw audio in conv1.
Visualization of the video frames associated with the sounds that activate some of the last hidden units (conv7).
Audio and Video: Sonorization
Owens, Andrew, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, and William T. Freeman. "Visually indicated sounds." CVPR 2016.
Learns to synthesize sounds from videos of people hitting objects with a drumstick.
Not trained end-to-end.
Speech and Video: Vid2Speech
Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech reconstruction from silent video." ICASSP 2017.
[Pipeline: a frame from a silent video → CNN (VGG) → audio feature.]
Not trained end-to-end.
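A toy Keras sketch of the pipeline in the figure, with an assumed input size and number of acoustic features per frame (not the paper's architecture): a VGG-style CNN regresses audio features from a single frame.

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

N_AUDIO = 8                                  # acoustic features per frame (assumed)
base = VGG16(weights=None, include_top=False, input_shape=(128, 128, 3))
x = Flatten()(base.output)
x = Dense(512, activation='relu')(x)
audio = Dense(N_AUDIO)(x)                    # linear output: regression, not classes

model = Model(base.input, audio)
model.compile('adam', 'mse')                 # L2 loss on the audio features
```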
Speech and Video: LipNet
Assael, Yannis M., Brendan Shillingford, Shimon Whiteson, and Nando de Freitas. "LipNet: Sentence-level Lipreading." arXiv preprint arXiv:1611.01599 (2016).
Speech and Video: Watch, Listen, Attend & Spell
Chung, Joon Son, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. "Lip reading sentences in the wild." arXiv preprint arXiv:1611.05358 (2016).
Conclusions
[course site]
Learn more
Russ Salakhutdinov, “Multimodal Machine Learning” (NIPS 2015 Workshop).
59
Thanks ! Q&A ?
Follow me at
https://imatge.upc.edu/web/people/xavier-giro
@DocXavi
/ProfessorXavi
