DEEP NEURAL ENCODING MODELS
OF THE HUMAN VISUAL CORTEX
TO PREDICT FMRI RESPONSES
TO NATURAL VISUAL SCENES
University of Milano-Bicocca
Department of Informatics, Systems and Communication
Master's Degree in Data Science
Academic Year 2022-2023
Master’s Degree Thesis by:
Giorgio Carbone
ID 811974
Supervisor:
Prof. Simone Bianco
Co-supervisor:
Prof. Paolo Napoletano
[ INTRODUCTION ]
/ Visual Encoding in Neuroscience
Visual Neural Encoding
▪ humans make sense of complex visual stimuli
▪ visual information is represented as neural activations in the visual cortex
▪ neural activations (or responses) ➨ patterns of measurable electrical activity
Visual Encoding Models [1]
▪ mimic the human visual system
▪ explain the relationship between natural visual stimuli and neural activations
▪ provide a structured system to test biological hypotheses about the visual pathways
[Figure: stimulus ➨ visual encoding model ➨ neural responses, alongside a brain scan of the visual cortex]
[1] Naselaris et al. (2011). Encoding and decoding in fMRI. NeuroImage 56.
/ The Algonauts Project 2023: Challenge and Dataset
Algonauts 2023 Challenge goals:
▪ promote artificial intelligence and computational
neuroscience interdisciplinary research
▪ develop cutting-edge image-fMRI encoding models of
the visual brain
Natural Scenes Dataset (NSD) [2]:
▪ fMRI responses to ~73,000 images from MS COCO
▪ each of the eight subjects was shown: ~9000-10000
training images and ~150-400 test images
▪ measured the fMRI activity in the 39,548 voxels of the
visual cortex
▪ betas ➨ single-value response estimates for each voxel and image
▪ functional Region of Interest (ROI) label for each voxel
[2] Allen et al. (2021). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience 25.
[Figure: stimulus images and the corresponding fMRI betas for the 39,548 voxels (color scale from -2 to 2) on the left (LH) and right (RH) hemispheres; functional classes of regions of interest (ROIs): early retinotopic, body-selective, face-selective, place-selective and word-selective ROIs]
/ Evaluation Metric
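Median Noise Normalized Squared Correlation (MNNSC) ➨ voxel-wise accuracy metric across 𝑁 voxels:
$$\mathrm{MNNSC} = \operatorname{median}_{v = 1, \dots, N} \left( \frac{R_v^2}{NC_v} \right)$$
➨ $R_v^2$: squared Pearson's correlation between the predicted responses and the true responses for voxel 𝑣
➨ $NC_v$: noise ceiling for voxel 𝑣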
[Figure: evaluation pipeline for subjects S1-S8: test images ➨ fMRI scan ➨ measured responses (voxel-wise true betas vectors); test images ➨ visual encoder ➨ predicted responses (voxel-wise prediction vectors); squared Pearson's correlation per voxel ➨ median ➨ MNNSC]
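As a minimal sketch, the MNNSC can be computed in NumPy as follows, assuming the predicted and measured betas are arrays of shape (n_images, n_voxels) and that the noise ceiling is given as a fraction in (0, 1]. The function names are illustrative, not the challenge's official evaluation code.

```python
import numpy as np


def voxelwise_pearson(pred, true):
    """Pearson's r between predicted and true responses, one value per voxel.

    pred, true: arrays of shape (n_images, n_voxels).
    """
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return num / den  # constant voxels would give NaN here


def mnnsc(pred, true, noise_ceiling):
    """Median noise-normalized squared correlation across voxels.

    noise_ceiling: array of shape (n_voxels,), assumed to be a fraction in (0, 1].
    Returns the median and the voxel-wise NNSC values.
    """
    r = voxelwise_pearson(pred, true)
    nnsc = r ** 2 / noise_ceiling
    return float(np.median(nnsc)), nnsc
```

Voxel-wise NNSC can exceed 1 when the squared correlation is larger than the (noisy) noise-ceiling estimate, which is why the median rather than the mean is a robust summary.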
/ Research Goals
Main goal:
develop subject-specific image-fMRI encoders of
the visual cortices of the eight subjects
▪ based on deep neural networks and transfer
learning
▪ characterised by high stimulus compatibility
▪ mappability
▪ high predictivity across the entire visual cortex
Research Questions:
1. how well can variations in neural activity be
predicted given the stimulus that evoked them?
2. how relevant are the visual features extracted
from pre-trained DNNs for the neural encoding
task?
3. is there a similarity between the visual processing
in the DNNs and the visual cortex?
[ METHODS ]
/ A Two-Step Voxel-Based Deep Visual Encoder
1. Non-Linear Feature Mapping using a pre-trained DNN backbone
▪ input image ➨ low-level, mid-level and high-level visual features (the diagram shows example class outputs: Bird, Cow, Face, Ship)
▪ output layer(s) selection
▪ flattened and concatenated feature maps ➨ visual features
2. Linear Activity Mapping
▪ dimensionality reduction
▪ voxel-based linear regression ➨ predicted response for voxel 𝒗
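A minimal NumPy sketch of the two-step encoder follows. It assumes the visual features have already been extracted, flattened and concatenated into a matrix of shape (n_images, n_features); plain PCA via SVD stands in for Incremental PCA, and one closed-form ridge solution is fitted for all voxels at once. The class and function names are hypothetical.

```python
import numpy as np


def fit_pca(x, n_components):
    """PCA via SVD of the centred feature matrix (a stand-in for Incremental PCA)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]  # components: (n_components, n_features)


def pca_transform(x, mean, components):
    return (x - mean) @ components.T


def fit_ridge(z, y, alpha):
    """Closed-form ridge regression for all voxels, with an unpenalised intercept."""
    z_mean, y_mean = z.mean(axis=0), y.mean(axis=0)
    zc, yc = z - z_mean, y - y_mean
    k = zc.shape[1]
    w = np.linalg.solve(zc.T @ zc + alpha * np.eye(k), zc.T @ yc)  # (k, n_voxels)
    b = y_mean - z_mean @ w
    return w, b


class TwoStepEncoder:
    """Step 1 output (features) -> PCA -> voxel-wise linear (ridge) regression."""

    def __init__(self, n_components=300, alpha=1.0):
        self.n_components = n_components
        self.alpha = alpha

    def fit(self, features, betas):
        self.mean_, self.components_ = fit_pca(features, self.n_components)
        z = pca_transform(features, self.mean_, self.components_)
        self.w_, self.b_ = fit_ridge(z, betas, self.alpha)
        return self

    def predict(self, features):
        z = pca_transform(features, self.mean_, self.components_)
        return z @ self.w_ + self.b_
```

Because every voxel shares the same PCA projection, the ridge weights for all voxels come from a single linear solve; a different α per voxel would require one solve per distinct α value.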
/ Activity Mapping Methods
Goal
▪ find the activity mapping method that maximises the
10-fold cross-validation accuracy on Subject 1
▪ feature mapping ➨ pre-trained AlexNet
Dimensionality reduction ➨ 300-component Incremental PCA
Linear regression
▪ Ordinary Least Squares (OLS)
▪ Ridge Regression with optimization of the α parameter
Non-linear regression
▪ Regression Trees (RTs)
▪ Support Vector Regression (SVR)
Regression Model              MNNSC on Subject 1
Linear: OLS Regression        0.35
Linear: Ridge Regression      0.45
Non-linear: RTs               0.15
Non-linear: SVR               0.08
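The optimization of α can be sketched as a 10-fold cross-validated grid search. This is an assumption about the procedure: the selection score below is the mean voxel-wise Pearson's r on the validation folds, used instead of the MNNSC because it needs no noise ceiling, and the α grid is illustrative.

```python
import numpy as np


def ridge_fit_predict(z_train, y_train, z_test, alpha):
    """Closed-form ridge fitted on the training fold, applied to the test fold."""
    zm, ym = z_train.mean(axis=0), y_train.mean(axis=0)
    zc = z_train - zm
    w = np.linalg.solve(zc.T @ zc + alpha * np.eye(zc.shape[1]), zc.T @ (y_train - ym))
    return (z_test - zm) @ w + ym


def mean_voxel_r(pred, true):
    """Mean of the voxel-wise Pearson's correlations (NaNs from flat voxels ignored)."""
    pc, tc = pred - pred.mean(0), true - true.mean(0)
    r = (pc * tc).sum(0) / np.sqrt((pc ** 2).sum(0) * (tc ** 2).sum(0))
    return float(np.nanmean(r))


def select_alpha(z, y, alphas, n_folds=10, seed=0):
    """Return the alpha with the best mean k-fold validation score, plus all scores."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(z)), n_folds)
    scores = []
    for alpha in alphas:
        fold_scores = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            pred = ridge_fit_predict(z[train], y[train], z[val], alpha)
            fold_scores.append(mean_voxel_r(pred, y[val]))
        scores.append(np.mean(fold_scores))
    best = int(np.argmax(scores))
    return alphas[best], scores
```

Here `z` would be the 300 PCA components of the AlexNet features and `y` the betas of Subject 1; OLS corresponds to the limit α → 0.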
/ Feature Mapping Methods
Goals:
1. find the overall and ROI-wise best-performing
feature mapping methods on Subject 1
2. compare pre-trained DNNs with:
▪ different architectures and depths
▪ different training parameters (learning tasks,
learning methods and datasets)
▪ output layer(s) at varying depths
3. test a fused features approach
Pre-trained Convolutional Neural Networks (CNNs)
Architecture              Learning task/method    Dataset
AlexNet                   Image classification    ImageNet-1K
ZFNet                     Image classification    ImageNet-1K
VGG-16/19                 Image classification    ImageNet-1K
EfficientNet-B2           Image classification    ImageNet-1K
ResNet-50                 Image classification    ImageNet-1K
ResNet-50 (DINOv1) [3]    Self-supervised         ImageNet-1K
RetinaNet                 Object detection        MS COCO
Pre-trained Vision Transformers (ViTs)
Architecture              Learning task/method    Dataset
ViT-S/14 (DINOv2)         Self-supervised         LVD-142M
ViT-B/14 (DINOv2)         Self-supervised         LVD-142M
ViT-L/14 (DINOv2)         Self-supervised         LVD-142M
ViT-B/16-GPT2             Image captioning        MS COCO
[3] M. Caron et al. (2021). Emerging Properties in Self-Supervised Vision Transformers. IEEE/CVF ICCV.
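The fused features approach can be sketched as flattening each selected layer's output per image and concatenating the results, as in the 5+6+7 ViT-S/14 (DINOv2) configuration compared in the appendix. The layer names and the dictionary of layer outputs below are hypothetical; how the outputs are collected from the backbone (for example with forward hooks) is not shown.

```python
import numpy as np


def fuse_features(layer_outputs, layers):
    """Flatten each selected layer's output per image and concatenate them.

    layer_outputs: dict {layer_name: array of shape (n_images, ...)}.
    layers: ordered list of layer names to fuse.
    Returns an array of shape (n_images, total_flattened_features).
    """
    flat = [
        layer_outputs[name].reshape(layer_outputs[name].shape[0], -1)
        for name in layers
    ]
    return np.concatenate(flat, axis=1)
```

The fused matrix can be much wider than any single layer, which is one reason the dimensionality reduction step comes before the regression.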
[Figure: ResNet-50 and ViT-L/14 (DINOv2); panel (a): contribution rate (%) to the highest voxel-wise accuracy vs. layer index; panel (b): ROI-wise accuracy (MNNSC) vs. layer index; legend: early visual, body-selective, face-selective, place-selective and word-selective ROIs]
Similarity between DNNs and the human visual cortex: feature extraction from output layers at increasing depths.
(a) contribution rate (%) to the highest voxel-wise accuracy for each ROI class
(b) ROI-wise MNNSC for each ROI class
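The slide does not spell out how the contribution rate is computed. One plausible reading, sketched below as an assumption, is: for every voxel, find the layer whose features give the highest voxel-wise accuracy, then report, for each ROI class, the percentage of voxels for which each layer wins.

```python
import numpy as np


def contribution_rates(acc, roi_labels):
    """Percentage of voxels in each ROI class best predicted by each layer.

    acc: (n_layers, n_voxels) voxel-wise accuracy (e.g. NNSC) obtained with each layer.
    roi_labels: (n_voxels,) ROI class label of each voxel.
    Returns {roi_class: array of shape (n_layers,) summing to 100}.
    """
    best_layer = np.argmax(acc, axis=0)  # winning layer index per voxel
    n_layers = acc.shape[0]
    rates = {}
    for roi in np.unique(roi_labels):
        counts = np.bincount(best_layer[roi_labels == roi], minlength=n_layers)
        rates[roi] = 100.0 * counts / counts.sum()
    return rates
```

Under this reading, a shift of the peak towards deeper layers when moving from early visual ROIs to category-selective ROIs would indicate a hierarchical correspondence between the network and the cortex.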
/ A Mixed and ROI-wise Encoding Model
Proposed architecture: a mixed (multi-layer and multi-network) subject-specific encoding model
[Figure: one branch per ROI 𝑗 = 1, ..., 𝐽: image pre-processing ➨ pre-trained feature extractor ➨ output layer(s) selection ➨ visual features ➨ PCA (𝒏 comp.) ➨ voxel-based Ridge (α) regression on the ROI 𝑗 voxels mask ➨ ROI 𝑗 voxels responses; the ROI-wise responses are combined into the responses of all voxels]
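The ROI-wise mixed model can be sketched as an independent model selection per ROI: each candidate feature set (a backbone plus its selected layers) is scored on a validation split, the best one is refitted, and its predictions are written into that ROI's voxels. This is a simplified, hypothetical sketch: it uses a single train/validation split and the median squared correlation instead of the 10-fold cross-validated MNNSC, and it keeps 𝒏 and α fixed rather than tuning them per ROI.

```python
import numpy as np


def fit_predict(x_train, y_train, x_test, n_components, alpha):
    """PCA (via SVD) followed by closed-form ridge; returns test-set predictions."""
    mu = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mu, full_matrices=False)
    comps = vt[:n_components]
    z_train, z_test = (x_train - mu) @ comps.T, (x_test - mu) @ comps.T
    ym = y_train.mean(axis=0)
    k = z_train.shape[1]
    w = np.linalg.solve(z_train.T @ z_train + alpha * np.eye(k), z_train.T @ (y_train - ym))
    return z_test @ w + ym


def median_r2(pred, true):
    pc, tc = pred - pred.mean(0), true - true.mean(0)
    r = (pc * tc).sum(0) / np.sqrt((pc ** 2).sum(0) * (tc ** 2).sum(0))
    return float(np.median(r ** 2))


def roi_wise_encoder(feature_sets, betas, roi_masks, train_idx, val_idx, test_idx,
                     n_components=50, alpha=1.0):
    """feature_sets: {config_name: (n_images, n_features)}, one per candidate.
    roi_masks: {roi_name: boolean array (n_voxels,)}.
    Returns test predictions for all voxels (zeros outside every mask)
    and the configuration chosen for each ROI."""
    pred = np.zeros((len(test_idx), betas.shape[1]))
    chosen = {}
    fit_idx = np.concatenate([train_idx, val_idx])
    for roi, mask in roi_masks.items():
        y = betas[:, mask]
        scores = {
            name: median_r2(
                fit_predict(x[train_idx], y[train_idx], x[val_idx], n_components, alpha),
                y[val_idx],
            )
            for name, x in feature_sets.items()
        }
        best = max(scores, key=scores.get)
        chosen[roi] = best
        x = feature_sets[best]
        pred[:, mask] = fit_predict(x[fit_idx], y[fit_idx], x[test_idx], n_components, alpha)
    return pred, chosen
```

Since each ROI is encoded independently, different ROIs can end up using different backbones and layers, which is what makes the model "mixed".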
[ RESULTS ]
/ Best ROI-wise Encoder: All Subjects Cross-Validation
[Figure (a): distributions of the voxel-wise NNSC for each functional ROI class (early retinotopic visual, body-selective, face-selective, place-selective and word-selective ROIs), split by left and right hemisphere]
[Figure (b): voxel-wise NNSC on a common cortical surface]
Backbones shown in the figure legend: ResNet-50 (DINOv1), RetinaNet, ViT-L/14 (DINOv2), ViT-B/16-GPT2
(a) distributions of the voxel-wise accuracies (NNSC) across all subjects, conditioned on the hemisphere and the ROI
(b) voxel-wise prediction accuracies (NNSC) across all subjects, visualized on a common cortical surface
MNNSC for all subjects and for each subject:
All subjects  Subj 1  Subj 2  Subj 3  Subj 4  Subj 5  Subj 6  Subj 7  Subj 8
0.62          0.64    0.66    0.65    0.68    0.65    0.54    0.57    0.60
/ Best ROI-wise Encoder: Test Set Performance
[Figure: MNNSC for all voxels and for each functional ROI class (early visual, body-selective, face-selective, place-selective and word-selective ROIs), per subject (1-8); proposed ROI-wise encoding model vs. baseline encoding model (AlexNet-based, not ROI-wise)]
Overall and subject-specific MNNSC:
Model                                      All Subjects  Subj. 1  Subj. 2  Subj. 3  Subj. 4  Subj. 5  Subj. 6  Subj. 7  Subj. 8
Proposed ROI-wise Encoder                  0.52          0.53     0.51     0.56     0.54     0.50     0.59     0.40     0.57
Baseline Encoder                           0.41          0.39     0.39     0.47     0.42     0.37     0.44     0.32     0.47
State-of-the-art Pure Neural Encoder [4]   0.64
[4] Adeli et al. (2023). Predicting brain activity using transformers. Preprint at bioRxiv.
[ CONCLUSIONS AND FUTURE WORK ]
/ Conclusions and Future Work
Conclusions:
▪ effectiveness of transfer learning-based image-fMRI encoding
▪ generalizability of visual features extracted from computer vision models, particularly those pre-trained in a self-supervised manner
▪ functional alignment between DNNs and the human visual cortex
▪ a mixed (multi-layer and multi-network) and independent encoding of each ROI provides mappability and high predictivity over the entire visual cortex
Future Work:
• apply a voxel-wise encoding optimization strategy to voxels that exhibit poor performance
• integrate auxiliary input data: physiological data, eye-tracking data and COCO annotations
• develop a pure neural encoder trained end-to-end for the image-fMRI task
{ Thank you for your attention }
/ Bibliography
[1] Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56.
[2] Allen, E. J., St-Yves, G., Wu, Y., et al. (2021). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience.
[3] Caron, M., et al. (2021). Emerging Properties in Self-Supervised Vision Transformers. IEEE/CVF ICCV.
[4] Adeli, H., Minni, S., & Kriegeskorte, N. (2023). Predicting brain activity using transformers. Preprint at bioRxiv.
[5] Gifford, A. T., Lahner, B., Saba-Sadiya, S., Vilas, M. G., Lascelles, A., Oliva, A., Kay, K., Roig, G., & Cichy, R. M. (2023). The Algonauts Project 2023 Challenge: How the Human Brain Makes Sense of Natural Scenes. Preprint at arXiv.
[6] Yamins, D. L. K., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3).
[7] Dwivedi, K., Bonner, M. F., Cichy, R. M., & Roig, G. (2021). Unveiling functions of the visual cortex using task-specific deep neural networks. PLOS Computational Biology, 17(8).
/ Natural Scenes Dataset: Details
Distribution of the Algonauts Project 2023 Challenge dataset
images in the full training and test sets across the eight subjects,
and in the training and validation subsets defined in the 10-fold
cross-validation phase:
Number of vertices composing the cortical challenge surface and
the cortical fsaverage surface, considering the right and left
hemispheres of the eight subjects:
Lists of the ROIs belonging to each functional ROI class:
• Early retinotopic visual regions: V1v, V1d, V2v, V2d, V3v, V3d, hV4 (V4).
• Body-selective regions: EBA, FBA-1, FBA-2, mTL-bodies.
• Face-selective regions: OFA, FFA-1, FFA-2, mTL-faces, aTL-faces.
• Place-selective regions: OPA, PPA, RSC.
• Word-selective regions: OWFA, VWFA-1, VWFA-2, mfs-words, mTL-words.
/ Evaluation Metric: Details
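Median Noise-Normalized Squared Correlation (MNNSC) over N voxels:
$$\mathrm{MNNSC} = \operatorname{median}_{v = 1, \dots, N} \left( \frac{R_v^2}{NC_v} \right) \qquad (1)$$
Voxel-wise Pearson's correlation between the vector of predicted (P) responses for voxel v and the corresponding ground-truth (G) vector (t is the index of the stimulus image):
$$R_v = \frac{\sum_t \left(G_{v,t} - \bar{G}_v\right)\left(P_{v,t} - \bar{P}_v\right)}{\sqrt{\sum_t \left(G_{v,t} - \bar{G}_v\right)^2}\,\sqrt{\sum_t \left(P_{v,t} - \bar{P}_v\right)^2}} \qquad (2)$$
Noise Ceiling for voxel v, computed from the corresponding noise ceiling signal-to-noise ratio (considering the responses to m = A + B + C images, of which A responses are averaged over three trials, B over two trials, and C over one trial):
$$NC_v = \frac{\mathrm{ncsnr}_v^2}{\mathrm{ncsnr}_v^2 + \frac{A/3 + B/2 + C/1}{A + B + C}} \qquad (3)$$
Here NC is written as a fraction; multiplying it by 100 expresses it as a percentage.
(4) Noise Ceiling (NC) and (5) noise ceiling signal-to-noise ratio (ncsnr) formal definitions, for responses averaged over n trials:
$$NC_v = \frac{\mathrm{ncsnr}_v^2}{\mathrm{ncsnr}_v^2 + 1/n} \qquad (4) \qquad\qquad \mathrm{ncsnr}_v = \frac{\sigma_{\mathrm{signal},v}}{\sigma_{\mathrm{noise},v}} \qquad (5)$$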
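Equation (3) translates directly into code. The sketch below assumes the voxel-wise ncsnr values are already available and returns the noise ceiling as a fraction; the function name is illustrative.

```python
import numpy as np


def noise_ceiling(ncsnr, n_three, n_two, n_one):
    """Voxel-wise noise ceiling (fraction) from ncsnr and trial counts.

    ncsnr: scalar or array of voxel-wise noise ceiling SNR values.
    n_three, n_two, n_one: number of images whose responses are averaged over
    three (A), two (B) and one (C) trial(s), respectively.
    """
    ncsnr = np.asarray(ncsnr, dtype=float)
    m = n_three + n_two + n_one
    # Average residual noise variance factor across the m images
    noise_term = (n_three / 3 + n_two / 2 + n_one / 1) / m
    return ncsnr ** 2 / (ncsnr ** 2 + noise_term)
```

When all m responses are averaged over the same number of trials n, the noise term reduces to 1/n, recovering equation (4).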
/ Non-Linear Activity Mapping Methods: Details
Supervised Regression Trees (RTs) learning approach, tested and
chosen parameters:
• Split criterion: Mean Squared Error (MSE)
• maximum depth of the tree [5, 10, 15] ➨ 5
• minimum number of samples required to split an internal
node [2,3] ➨ 2
• minimum number of samples needed to define a node as
a leaf node [1,2] ➨ 1
• number of features considered when searching for the
best split ➨ number of PCA components
Support Vector Regression (SVR) learning approach, chosen parameters:
• tube width ε (maximum distance between predicted and true values within which no penalty is added to the loss function) ➨ 0.1
• regularization parameter C (high values lead to more accurate fits on the training data but increase the sensitivity of the model to noise) ➨ 1.0
• kernel ➨ Radial Basis Function (RBF), i.e. a Gaussian kernel
• gamma parameter (how far the influence of individual training examples reaches) ➨ 1 / number of PCA components
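The γ = 1 / number of PCA components setting controls the width of the RBF (Gaussian) kernel, k(x, x') = exp(-γ‖x - x'‖²). The sketch below computes that kernel matrix directly in NumPy to show what γ does; it is illustrative and is not the SVR itself.

```python
import numpy as np


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq_dists = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2 * x @ y.T
    # Clip tiny negative values caused by floating-point rounding
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))
```

If scikit-learn was used (an assumption; the slides do not say), these settings correspond to `SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=1 / n_components)`, which is the same as `gamma="auto"` (1 / number of input features) when the inputs are the PCA components. A larger γ makes the kernel more local, so each training image influences a smaller neighbourhood of the feature space.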
/ Feature Mapping Methods: Details
Summary of the properties of the pre-trained models used as
feature extractors:
Summary of the different sets of image pre-processing steps
applied to image inputs:
/ Comparing Fused Feature and Single Layer Approaches
[Figure: panels (a), (b), (c): single layer vs. fused features accuracy; panels (d), (e), (f): distributions of the voxel-wise accuracy differences]
Comparison of the voxel-wise prediction accuracy for Subject 1, between an encoding model based on fused feature mapping (ViT-S/14
(DINOv2) 5+6+7), and encoding models using a single feature layer approach (ViT-S/14 (DINOv2) with output layers 5, 6 or 7):
• randomized permutation test to determine a minimum threshold MNNSC value significantly different from zero ➨ 0.19 (p < 0.001)
• (a, b, c): NNSC (abscissae) of the single feature models and NNSC (ordinates) of the 5+6+7 fused feature encoding model
• (d, e, f): distributions of voxel-wise differences between the accuracy of the fused feature model and the single feature layer models
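One way to obtain such a threshold, sketched here as an assumption about the procedure, is to shuffle the image order of the predictions (breaking the stimulus-response pairing), recompute the MNNSC many times, and take the (1 - p) quantile of this null distribution. The function name and defaults are hypothetical.

```python
import numpy as np


def permutation_threshold(pred, true, noise_ceiling, n_perm=1000, p=0.001, seed=0):
    """(1 - p) quantile of the MNNSC under shuffled stimulus-response pairings.

    pred, true: (n_images, n_voxels); noise_ceiling: (n_voxels,) as a fraction.
    The same row permutation is applied to all voxels in each iteration.
    """
    rng = np.random.default_rng(seed)
    tc = true - true.mean(axis=0)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = pred[rng.permutation(pred.shape[0])]
        pc = shuffled - shuffled.mean(axis=0)
        r = (pc * tc).sum(0) / np.sqrt((pc ** 2).sum(0) * (tc ** 2).sum(0))
        null[i] = np.median(r ** 2 / noise_ceiling)
    return float(np.quantile(null, 1 - p))
```

With n_perm permutations the smallest resolvable p is about 1/n_perm, so a threshold at p < 0.001 needs well over 1,000 permutations.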
/ Comparing CNNs Pre-Trained with Different Training Tasks and Learning Methods
Comparison of the voxel-wise prediction accuracy for Subject 1 between the two best configurations of the ROI-wise mixed encoding
models based respectively on the:
• (a) pre-trained ResNet-50 (self-supervised DINOv1, ImageNet-1K) and ResNet-50 (image classification, ImageNet-1K) models
• (b) pre-trained RetinaNet (object detection, MS COCO) and ResNet-50 (image classification, ImageNet-1K) models
• (c) pre-trained RetinaNet (object detection, MS COCO) and ResNet-50 (self-supervised DINOv1, ImageNet-1K) models
[Figure: voxel-wise accuracy comparisons (a), (b), (c)]
/ Best ROI-wise Encoder: Cross-Validation Details
Overall and functional ROI class-specific 10-fold cross-validation accuracies (MNNSC) considering the voxels of all subjects:
ROI-specific 10-fold cross-validation accuracies (MNNSC) considering the voxels of all subjects:
/ Best ROI-wise and Baseline Models: Cross-Validation
Proposed ROI-wise and mixed encoding model
Baseline encoding model
/ Best Subject 1 and 2 Encoders: Cross-Validation Details
S1
S2
/ Best Subject 3 and 4 Encoders: Cross-Validation Details
S3
S4
/ Best Subject 5 and 6 Encoders: Cross-Validation Details
S5
S6
/ Best Subject 7 and 8 Encoders: Cross-Validation Details
S7
S8
/ DNNs/Visual Cortex Similarity: AlexNet - ZFNet
AlexNet ZFNet
/ DNNs/Visual Cortex Similarity: VGG-19
VGG-19 VGG-19 (BN)
/ DNNs/Visual Cortex Similarity: ResNet-50
ResNet-50 (image classification) ResNet-50 (DINOv1)
/ DNNs/Visual Cortex Similarity: RetinaNet – EfficientNet-B2
RetinaNet EfficientNet-B2
/ DNNs/Visual Cortex Similarity: ViT-S/14 - ViT-B/14 (DINOv2)
ViT-S/14 (DINOv2) ViT-B/14 (DINOv2)
/ DNNs/Visual Cortex Similarity: ViT-L/14 (DINOv2) - ViT-B/16-GPT2
ViT-L/14 (DINOv2)
ViT-B/16-GPT2

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 

Recently uploaded (20)

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 

Master's Thesis - Data Science - Presentation

  • 1. DEEP NEURAL ENCODING MODELS OF THE HUMAN VISUAL CORTEX TO PREDICT FMRI RESPONSES TO NATURAL VISUAL SCENES
    University of Milano-Bicocca, Department of Informatics, Systems and Communication
    Master's Degree in Data Science, Academic Year 2022-2023
    Master's Degree Thesis by: Giorgio Carbone (ID 811974)
    Supervisor: Prof. Simone Bianco
    Co-supervisor: Prof. Paolo Napoletano
  • 3. / Visual Encoding in Neuroscience
    Visual Neural Encoding
    ▪ humans understand complex visual stimuli
    ▪ visual information is represented as neural activations in the visual cortex
    ▪ neural activations (or responses) ➨ patterns of measurable electrical activity
    Visual Encoding Models [1]
    ▪ mimic the human visual system
    ▪ explain the natural visual stimulus ⬌ neural activations relationship
    ▪ provide a structured system to test biological hypotheses about the visual pathways
    [Figure: Stimulus, Visual cortex, Brain scan, Neural responses and Visual Encoding model]
    [1] Naselaris et al. (2011). Encoding and decoding in fMRI. NeuroImage 56.
  • 4. / The Algonauts Project 2023: Challenge and Dataset
    Algonauts 2023 Challenge goals:
    ▪ promote interdisciplinary research between artificial intelligence and computational neuroscience
    ▪ develop cutting-edge image-fMRI encoding models of the visual brain
    Natural Scenes Dataset [2]:
    ▪ fMRI responses to ~73,000 images from MS COCO
    ▪ each of the eight subjects was shown ~9,000-10,000 training images and ~150-400 test images
    ▪ fMRI activity measured in the 39,548 voxels of the visual cortex
    ▪ betas ➨ single-value response estimates
    ▪ functional Region of Interest (ROI) label for each voxel
    [Figure: stimulus images and the corresponding fMRI betas (color scale from -2 to 2) for the 39,548 voxels of the left (LH) and right (RH) hemispheres; functional classes of regions of interest (ROIs): early retinotopic, body-selective, face-selective, place-selective and word-selective ROIs]
    [2] Allen et al. (2021). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience 25.
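  • 5. / Evaluation Metric
    Median Noise-Normalized Squared Correlation (MNNSC) ➨ voxel-wise accuracy metric aggregated across N voxels:
    $$\mathrm{MNNSC} = \operatorname{median}_{v = 1, \dots, N} \frac{r_v^2}{\mathrm{NC}_v}$$
    ▪ r_v: Pearson's correlation between the predicted and the true responses of voxel v (squared in the metric)
    ▪ NC_v: noise ceiling of voxel v
    [Figure: the test images are shown to subjects S1-S8 during an fMRI scan (measured responses: voxel-wise true beta vectors) and are fed to the visual encoder (predicted responses: voxel-wise prediction vectors); the squared Pearson's correlations between the two are aggregated with the median into the MNNSC]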
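As an illustration of the metric above, here is a minimal NumPy sketch that computes voxel-wise Pearson correlations, divides their squares by the noise ceiling and takes the median. It assumes that predictions and measured betas are arrays of shape (n_images, n_voxels) and that the noise ceiling is given as a fraction; the function names are hypothetical and this is not the official challenge code.

```python
import numpy as np


def voxelwise_pearson(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Pearson's r per voxel; pred and true have shape (n_images, n_voxels)."""
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return num / den


def mnnsc(pred: np.ndarray, true: np.ndarray, noise_ceiling: np.ndarray):
    """Median noise-normalized squared correlation over all voxels.

    noise_ceiling holds NC_v as a fraction in (0, 1] (assumption: if it is
    stored as a percentage, divide it by 100 first).
    Returns the median score and the voxel-wise NNSC values.
    """
    r = voxelwise_pearson(pred, true)
    nnsc = r ** 2 / noise_ceiling
    return float(np.median(nnsc)), nnsc


rng = np.random.default_rng(0)
true = rng.standard_normal((200, 6))                  # synthetic betas: 200 test images, 6 voxels
pred = true + 0.5 * rng.standard_normal(true.shape)   # noisy synthetic predictions
score, per_voxel = mnnsc(pred, true, noise_ceiling=np.full(6, 0.9))
```

Because Pearson's r ignores scale and offset, an encoder whose predictions are a linear transform of the measured betas still reaches r_v = 1, and a voxel can score above 1 when NC_v < 1.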
  • 6. / Research Goals
    Main goal: develop subject-specific image-fMRI encoders of the visual cortices of the eight subjects
    ▪ based on deep neural networks and transfer learning
    ▪ characterised by high stimulus compatibility
    ▪ mappability
    ▪ high predictivity across the entire visual cortex
    Research Questions:
    1. how well can variations in neural activity be predicted given the stimulus that evoked them?
    2. how relevant are the visual features extracted from pre-trained DNNs for the neural encoding task?
    3. is there a similarity between the visual processing in the DNNs and the visual cortex?
  • 8. / A Two-Step Voxel-Based Deep Visual Encoder
    1. Non-Linear Feature Mapping using a pre-trained DNN backbone
    [Figure: input image ➨ pre-trained DNN backbone extracting low-, mid- and high-level visual features (example output classes: Bird, Cow, Face, Ship) ➨ output layer(s) selection ➨ flattened and concatenated feature maps ➨ visual features]
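The feature-mapping step (output layer(s) selection, then flattening and concatenation) can be sketched in NumPy as follows. The layer names and shapes are hypothetical stand-ins for the activations a pre-trained backbone would return; how those activations are extracted from the network is left out of this sketch.

```python
import numpy as np


def fuse_layer_features(layer_outputs, selected):
    """Flatten each selected layer's feature maps and concatenate them per image.

    layer_outputs maps a (hypothetical) layer name to an array whose first axis
    indexes images, e.g. (n_images, channels, height, width) for a CNN block.
    """
    blocks = [layer_outputs[name].reshape(layer_outputs[name].shape[0], -1) for name in selected]
    return np.concatenate(blocks, axis=1)  # (n_images, total_n_features)


# Hypothetical activations for 4 images from two layers of a CNN backbone.
rng = np.random.default_rng(0)
outputs = {
    "block3": rng.standard_normal((4, 8, 6, 6)),   # (images, channels, height, width)
    "block4": rng.standard_normal((4, 16, 3, 3)),
}
features = fuse_layer_features(outputs, ["block3", "block4"])
print(features.shape)  # (4, 432): 8*6*6 + 16*3*3 features per image
```

Selecting a single layer gives the single-layer approach; listing several gives a fused-features vector, whose width grows quickly with spatial resolution, which is one reason a dimensionality reduction step follows.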
  • 9. / A Two-Step Voxel-Based Deep Visual Encoder
    1. Non-Linear Feature Mapping using a pre-trained DNN backbone
    2. Linear Activity Mapping: dimensionality reduction of the visual features, followed by voxel-based linear regression
    [Figure: input image ➨ pre-trained DNN backbone (low-, mid- and high-level visual features; example output classes: Bird, Cow, Face, Ship) ➨ output layer(s) selection ➨ flattened and concatenated feature maps ➨ visual features ➨ dimensionality reduction ➨ voxel-based linear regression ➨ predicted response for voxel v]
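A minimal sketch of the second, linear step, assuming the visual features have already been extracted: PCA computed with a plain SVD (a stand-in for Incremental PCA, which gives the same kind of projection in mini-batches) followed by ridge regression onto every voxel. The class name and default values are hypothetical.

```python
import numpy as np


class TwoStepEncoder:
    """PCA on visual features, then a ridge regression for every voxel."""

    def __init__(self, n_components=300, alpha=1.0):
        self.n_components = n_components
        self.alpha = alpha

    def fit(self, features, betas):
        # features: (n_images, n_features); betas: (n_images, n_voxels)
        self.mean_ = features.mean(axis=0)
        centred = features - self.mean_
        # PCA via SVD of the centred features (at most min(n_images, n_features) axes).
        _, _, vt = np.linalg.svd(centred, full_matrices=False)
        self.components_ = vt[: self.n_components]
        z = centred @ self.components_.T  # PCA scores, zero-mean by construction
        # Ridge with an unpenalised intercept. Solving all voxel columns at once
        # equals independent per-voxel fits that share the same alpha.
        self.y_mean_ = betas.mean(axis=0)
        gram = z.T @ z + self.alpha * np.eye(z.shape[1])
        self.coef_ = np.linalg.solve(gram, z.T @ (betas - self.y_mean_))
        return self

    def predict(self, features):
        z = (features - self.mean_) @ self.components_.T
        return z @ self.coef_ + self.y_mean_


rng = np.random.default_rng(0)
X = rng.standard_normal((120, 50))                   # synthetic visual features, 120 images
W = rng.standard_normal((50, 10))
Y = X @ W + 0.1 * rng.standard_normal((120, 10))     # synthetic betas for 10 voxels
enc = TwoStepEncoder(n_components=20, alpha=10.0).fit(X[:100], Y[:100])
pred = enc.predict(X[100:])                          # shape (20, 10)
```

The ridge solution separates across output columns, so a shared α lets one linear solve serve all voxels; a per-voxel α would need a separate solve for each voxel or group.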
  • 10. / Activity Mapping Methods
    Goal
    ▪ find the activity mapping method that maximises the 10-fold cross-validation accuracy on Subject 1
    ▪ feature mapping ➨ pre-trained AlexNet
    ▪ dimensionality reduction ➨ Incremental PCA with 300 components
    Linear regression
    ▪ Ordinary Least Squares (OLS)
    ▪ Ridge Regression with optimization of the α parameter
    Non-linear regression
    ▪ Regression Trees (RTs)
    ▪ Support Vector Regression (SVR)
    MNNSC on Subject 1 by regression model:
    ▪ Linear: OLS Regression ➨ 0.35
    ▪ Linear: Ridge Regression ➨ 0.45
    ▪ Non-linear: RTs ➨ 0.15
    ▪ Non-linear: SVR ➨ 0.08
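The α optimization with 10-fold cross-validation can be sketched as a grid search. Assumptions: a single α shared by all voxels, candidate values supplied by the user, and the median squared correlation used as the selection score because noise ceilings are not passed in; the thesis may have selected α differently, and all names here are hypothetical.

```python
import numpy as np


def ridge_fit_predict(z_train, y_train, z_test, alpha):
    """Closed-form ridge with an unpenalised intercept; one column per voxel."""
    z_mean, y_mean = z_train.mean(axis=0), y_train.mean(axis=0)
    zc = z_train - z_mean
    coef = np.linalg.solve(zc.T @ zc + alpha * np.eye(zc.shape[1]), zc.T @ (y_train - y_mean))
    return (z_test - z_mean) @ coef + y_mean


def median_r2(pred, true):
    """Median over voxels of the squared Pearson correlation."""
    pc, tc = pred - pred.mean(axis=0), true - true.mean(axis=0)
    r = (pc * tc).sum(axis=0) / np.sqrt((pc ** 2).sum(axis=0) * (tc ** 2).sum(axis=0))
    return float(np.median(r ** 2))


def select_alpha_kfold(z, y, alphas, n_folds=10, seed=0):
    """Return the alpha with the best mean validation score across k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(z)), n_folds)
    mean_scores = []
    for alpha in alphas:
        fold_scores = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            pred = ridge_fit_predict(z[train], y[train], z[val], alpha)
            fold_scores.append(median_r2(pred, y[val]))
        mean_scores.append(float(np.mean(fold_scores)))
    return alphas[int(np.argmax(mean_scores))], mean_scores


rng = np.random.default_rng(1)
Z = rng.standard_normal((200, 30))                                    # e.g. PCA scores
Y = Z @ rng.standard_normal((30, 8)) + rng.standard_normal((200, 8))  # synthetic betas
best_alpha, cv_scores = select_alpha_kfold(Z, Y, alphas=[0.1, 1.0, 10.0, 100.0, 1000.0])
```

Swapping `ridge_fit_predict` for an OLS, tree or SVR fit inside the same loop is one way to compare activity mapping methods under an identical cross-validation split.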
  • 11. / Feature Mapping Methods
    Goals:
    1. find the overall and ROI-wise best-performing feature mapping methods on Subject 1
    2. compare pre-trained DNNs with:
       ▪ different architectures and depths
       ▪ different training parameters (learning tasks, learning methods and datasets)
       ▪ output layer(s) at varying depths
    3. test a fused features approach
    Pre-trained Convolutional Neural Networks (CNNs) (architecture: learning task/method, dataset):
    ▪ AlexNet: image classification, ImageNet-1K
    ▪ ZFNet: image classification, ImageNet-1K
    ▪ VGG-16/19: image classification, ImageNet-1K
    ▪ EfficientNet-B2: image classification, ImageNet-1K
    ▪ ResNet-50: image classification, ImageNet-1K
    ▪ ResNet-50 (DINOv1) [3]: self-supervised, ImageNet-1K
    ▪ RetinaNet: object detection, MS COCO
    Pre-trained Vision Transformers (ViTs) (architecture: learning task/method, dataset):
    ▪ ViT-S/14 (DINOv2): self-supervised, LVD-142M
    ▪ ViT-B/14 (DINOv2): self-supervised, LVD-142M
    ▪ ViT-L/14 (DINOv2): self-supervised, LVD-142M
    ▪ ViT-B/16-GPT2: image captioning, MS COCO
    [3] M. Caron et al. (2021). Emerging Properties in Self-Supervised Vision Transformers. IEEE/CVF ICCV.
  • 12. Similarity between DNNs and the human visual cortex: feature extraction from output layers at increasing depths
    [Figure: ResNet-50 and ViT-L/14 (DINOv2); both panels are plotted against the layer index for each ROI class (early visual, body-selective, face-selective, place-selective and word-selective ROIs)]
    (a) contribution rate (%) to the highest voxel-wise accuracy for each ROI class
    (b) ROI-wise MNNSC for each ROI class
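One possible reading of the "contribution rate" in panel (a), sketched here as an assumption: for each ROI class, the percentage of its voxels whose highest accuracy is obtained with features from a given layer. The inputs (a per-layer NNSC matrix and per-voxel ROI-class labels) and the function name are hypothetical.

```python
import numpy as np


def layer_contribution_rate(acc_per_layer, roi_labels):
    """Percentage of each ROI class's voxels whose best accuracy comes from each layer.

    acc_per_layer: (n_layers, n_voxels) voxel-wise NNSC obtained with features from each layer.
    roi_labels:    (n_voxels,) ROI-class label of every voxel.
    """
    best_layer = np.argmax(acc_per_layer, axis=0)  # index of the best layer for each voxel
    n_layers = acc_per_layer.shape[0]
    rates = {}
    for roi in np.unique(roi_labels):
        counts = np.bincount(best_layer[roi_labels == roi], minlength=n_layers)
        rates[str(roi)] = 100.0 * counts / counts.sum()
    return rates


acc = np.array([[0.9, 0.7, 0.2, 0.1],    # NNSC with features from layer 0
                [0.1, 0.2, 0.8, 0.9]])   # NNSC with features from layer 1
labels = np.array(["early", "early", "face", "face"])
rates = layer_contribution_rate(acc, labels)
```

Under this reading, a peak at shallow layers for early retinotopic ROIs and at deeper layers for category-selective ROIs would be the pattern consistent with a hierarchical alignment between DNNs and the visual cortex.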
  • 13. / A Mixed and ROI-wise Encoding Model
    Proposed architecture: a mixed (multi-layer and multi-network) subject-specific encoding model
    [Figure: for each ROI j = 1, ..., J: image pre-processing ➨ pre-trained feature extractor ➨ output layer(s) selection ➨ visual features ➨ PCA (n comp.) ➨ voxel-based Ridge (α) regression on the ROI j voxels mask ➨ ROI j voxels responses; the responses of all ROIs are combined into the responses of all voxels]
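The ROI-wise assembly can be sketched as follows: each ROI receives its own PCA-reduced features (possibly from a different network and layers) and its own α, and its predictions are written back into a full-cortex array through the ROI mask. The configuration keys and function names are hypothetical; voxels not covered by any mask stay NaN here, so a complete model would need a mask for every voxel.

```python
import numpy as np


def ridge_fit(z, y, alpha):
    """Ridge regression with an unpenalised intercept (closed form)."""
    z_mean, y_mean = z.mean(axis=0), y.mean(axis=0)
    zc = z - z_mean
    coef = np.linalg.solve(zc.T @ zc + alpha * np.eye(zc.shape[1]), zc.T @ (y - y_mean))
    return z_mean, coef, y_mean


def fit_predict_roi_wise(roi_configs, betas_train, n_voxels):
    """Fit one ridge model per ROI and scatter its predictions into a full-cortex array.

    Each config (hypothetical keys) holds a boolean voxel 'mask', the ROI's own
    PCA-reduced 'z_train' / 'z_test' features and its 'alpha'. If masks overlap,
    later ROIs overwrite earlier ones; uncovered voxels stay NaN.
    """
    n_test = roi_configs[0]["z_test"].shape[0]
    pred = np.full((n_test, n_voxels), np.nan)
    for cfg in roi_configs:
        mask = cfg["mask"]
        z_mean, coef, y_mean = ridge_fit(cfg["z_train"], betas_train[:, mask], cfg["alpha"])
        pred[:, mask] = (cfg["z_test"] - z_mean) @ coef + y_mean
    return pred


rng = np.random.default_rng(0)
n_train, n_test, n_voxels = 80, 10, 6
betas_train = rng.standard_normal((n_train, n_voxels))
early = np.array([True, True, True, False, False, False])  # hypothetical ROI masks
configs = [
    {"mask": early, "z_train": rng.standard_normal((n_train, 5)),
     "z_test": rng.standard_normal((n_test, 5)), "alpha": 1.0},
    {"mask": ~early, "z_train": rng.standard_normal((n_train, 7)),
     "z_test": rng.standard_normal((n_test, 7)), "alpha": 10.0},
]
pred = fit_predict_roi_wise(configs, betas_train, n_voxels)
```

Because every ROI is fitted independently, each one can use a different feature extractor, layer set, number of PCA components and α without affecting the others.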
  • 15. / Best ROI-wise Encoder: All Subjects Cross-Validation
    MNNSC: all subjects 0.62; Subj 1 0.64; Subj 2 0.66; Subj 3 0.65; Subj 4 0.68; Subj 5 0.65; Subj 6 0.54; Subj 7 0.57; Subj 8 0.60
    (a) distributions of the voxel-wise accuracies (NNSC) across all subjects, conditioned on the hemisphere and the ROI
    (b) voxel-wise prediction accuracies (NNSC) across all subjects, visualized on a common cortical surface
    [Figure legend: early retinotopic visual, body-selective, face-selective, place-selective and word-selective ROIs; left and right hemispheres; pre-trained networks ResNet-50 (DINOv1), RetinaNet, ViT-L/14 (DINOv2) and ViT-B/16-GPT2]
  • 16. / Best ROI-wise Encoder: Test Set Performance
    [Figure: test-set MNNSC for all voxels and for the early visual, body-selective, face-selective, place-selective and word-selective ROIs, per subject (1-8), for the proposed ROI-wise encoding model and for the baseline encoding model (AlexNet-based, not ROI-wise)]
    Overall and subject-specific MNNSC:
    Model                        All   S1    S2    S3    S4    S5    S6    S7    S8
    Proposed ROI-wise encoder    0.52  0.53  0.51  0.56  0.54  0.50  0.59  0.40  0.57
    Baseline encoder             0.41  0.39  0.39  0.47  0.42  0.37  0.44  0.32  0.47
    State-of-the-art pure neural encoder [4]: 0.64 (all subjects)
    [4] Adeli et al. (2023). Predicting brain activity using transformers. Preprint at bioRxiv.
  • 17. [ CONCLUSIONS AND FUTURE WORK ]
  • 18. / Conclusions and Future Work
    Conclusions:
    ▪ effectiveness of transfer learning-based image-fMRI encoding
    ▪ generalizability of visual features extracted from computer vision models, particularly those pre-trained in a self-supervised manner
    ▪ functional alignment between DNNs and the human visual cortex
    ▪ a mixed (multi-layer and multi-network) and independent encoding of each ROI guarantees mappability and high predictivity over the entire visual cortex
    Future Work:
    ▪ apply a voxel-wise encoding optimization strategy to voxels that exhibit poor performance
    ▪ integrate auxiliary input data: physiological data, eye-tracking data and COCO annotations
    ▪ develop a pure neural encoder trained end-to-end for the image-fMRI task
  • 19. { Thank you for your attention }
    University of Milano-Bicocca, Department of Informatics, Systems and Communication
    Master's Degree in Data Science, Academic Year 2022-2023
    Master's Degree Thesis by: Giorgio Carbone (ID 811974)
    Supervisor: Prof. Simone Bianco
    Co-supervisor: Prof. Paolo Napoletano
  • 20. / Bibliography
    [1] Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56.
    [2] Allen, E. J., St-Yves, G., Wu, Y., et al. (2021). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience.
    [3] Caron, M., et al. (2021). Emerging Properties in Self-Supervised Vision Transformers. IEEE/CVF ICCV.
    [4] Adeli, H., Minni, S., & Kriegeskorte, N. (2023). Predicting brain activity using transformers. Preprint at bioRxiv.
    [5] Gifford, A. T., Lahner, B., Saba-Sadiya, S., Vilas, M. G., Lascelles, A., Oliva, A., Kay, K., Roig, G., & Cichy, R. M. (2023). The Algonauts Project 2023 Challenge: How the Human Brain Makes Sense of Natural Scenes. Preprint at arXiv.
    [6] Yamins, D. L. K., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3).
    [7] Dwivedi, K., Bonner, M. F., Cichy, R. M., & Roig, G. (2021). Unveiling functions of the visual cortex using task-specific deep neural networks. PLOS Computational Biology, 17(8).
  • 21. / Natural Scenes Dataset: Details
    Distribution of the Algonauts Project 2023 Challenge dataset images in the full training and test sets across the eight subjects, and in the training and validation subsets defined in the 10-fold cross-validation phase: [table]
    Number of vertices composing the cortical challenge surface and the cortical fsaverage surface, considering the right and left hemispheres of the eight subjects: [table]
    ROIs belonging to each functional ROI class:
    ▪ Early retinotopic visual regions: V1v, V1d, V2v, V2d, V3v, V3d, hV4 (V4)
    ▪ Body-selective regions: EBA, FBA-1, FBA-2, mTL-bodies
    ▪ Face-selective regions: OFA, FFA-1, FFA-2, mTL-faces, aTL-faces
    ▪ Place-selective regions: OPA, PPA, RSC
    ▪ Word-selective regions: OWFA, VWFA-1, VWFA-2, mfs-words, mTL-words
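  • 22. / Evaluation Metric: Details
    (1) Median Noise-Normalized Squared Correlation (MNNSC) over N voxels:
    $$\mathrm{MNNSC} = \operatorname{median}_{v = 1, \dots, N} \frac{r_v^2}{\mathrm{NC}_v}$$
    (2) Pearson's correlation between the vector of predicted responses P_v and the ground-truth vector G_v of voxel v (t is the index of the stimulus image):
    $$r_v = \frac{\sum_t (G_{v,t} - \bar{G}_v)(P_{v,t} - \bar{P}_v)}{\sqrt{\sum_t (G_{v,t} - \bar{G}_v)^2}\,\sqrt{\sum_t (P_{v,t} - \bar{P}_v)^2}}$$
    (3) Noise ceiling for voxel v from the corresponding noise ceiling signal-to-noise ratio, considering the responses to m images, of which A responses are averaged over three trials, B over two trials and C over one trial:
    $$\mathrm{NC}_v = \frac{\mathrm{ncsnr}_v^2}{\mathrm{ncsnr}_v^2 + \frac{A/3 + B/2 + C/1}{m}}, \qquad m = A + B + C$$
    (4) Noise ceiling (NC) formal definition, for responses averaged over n trials:
    $$\mathrm{NC}_v = \frac{\sigma_{\mathrm{signal},v}^2}{\sigma_{\mathrm{signal},v}^2 + \sigma_{\mathrm{noise},v}^2 / n}$$
    (5) Noise ceiling signal-to-noise ratio (ncsnr) formal definition:
    $$\mathrm{ncsnr}_v = \frac{\sigma_{\mathrm{signal},v}}{\sigma_{\mathrm{noise},v}}$$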
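The noise ceiling built from ncsnr and the trial counts A, B and C translates directly into code. This sketch assumes the ncsnr values are already available per voxel and returns the noise ceiling as a fraction rather than a percentage; the function name is hypothetical.

```python
import numpy as np


def noise_ceiling(ncsnr, n_three, n_two, n_one):
    """NC_v = ncsnr^2 / (ncsnr^2 + (A/3 + B/2 + C/1) / m), with m = A + B + C.

    n_three, n_two and n_one play the roles of A, B and C: the number of images
    whose response is averaged over three, two or one trial(s).
    """
    m = n_three + n_two + n_one
    noise_term = (n_three / 3 + n_two / 2 + n_one / 1) / m
    ncsnr = np.asarray(ncsnr, dtype=float)
    return ncsnr ** 2 / (ncsnr ** 2 + noise_term)


nc = noise_ceiling([0.5, 1.0, 2.0], n_three=100, n_two=50, n_one=10)
```

With ncsnr = 1, averaging every response over three trials gives NC = 1 / (1 + 1/3) = 0.75, while single-trial responses give 0.5: fewer repetitions leave more noise and lower the attainable correlation.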
  • 23. / Non-Linear Activity Mapping Methods: Details
    Supervised Regression Trees (RTs) learning approach, tested and chosen parameters:
    ▪ split criterion: Mean Squared Error (MSE)
    ▪ maximum depth of the tree [5, 10, 15] ➨ 5
    ▪ minimum number of samples required to split an internal node [2, 3] ➨ 2
    ▪ minimum number of samples needed to define a node as a leaf node [1, 2] ➨ 1
    ▪ number of features considered when searching for the best split ➨ number of PCA components
    Support Vector Regression (SVR) learning approach, chosen parameters:
    ▪ tube width ε (maximum distance between predicted and true values within which no penalty is added to the loss function) ➨ 0.1
    ▪ regularization parameter C (high values lead to more accurate fits on the training data but increase the sensitivity of the model to noise) ➨ 1.0
    ▪ kernel ➨ Radial Basis Function (RBF), i.e. a Gaussian kernel
    ▪ gamma parameter (how far the influence of individual training examples reaches) ➨ 1 / number of PCA components
  • 24. / Feature Mapping Methods: Details
    Summary of the properties of the pre-trained models used as feature extractors: [table]
    Summary of the different sets of image pre-processing steps applied to image inputs: [table]
  • 25. / Comparing Fused Feature and Single Layer Approaches
    [Figure: panels (a)-(f) comparing the single layer and fused features models]
    Comparison of the voxel-wise prediction accuracy for Subject 1 between an encoding model based on fused feature mapping (ViT-S/14 (DINOv2), layers 5+6+7) and encoding models using a single feature layer (ViT-S/14 (DINOv2) with output layer 5, 6 or 7):
    ▪ randomized permutation test to determine a minimum threshold MNNSC value significantly different from zero ➨ 0.19 (p < 0.001)
    ▪ (a, b, c): NNSC of the single feature layer models (abscissae) versus NNSC of the 5+6+7 fused feature encoding model (ordinates)
    ▪ (d, e, f): distributions of the voxel-wise differences between the accuracy of the fused feature model and that of the single feature layer models
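A randomized permutation test of this kind can be sketched by shuffling the image order of the predictions, which breaks the stimulus-response pairing, and reading off a high quantile of the resulting null NNSC values. Pooling the null values across voxels, the number of permutations, the synthetic data and the function name are assumptions, not details given on the slide.

```python
import numpy as np


def permutation_threshold(pred, true, noise_ceiling, n_perm=1000, p=0.001, seed=0):
    """(1 - p) quantile of voxel-wise NNSC values under a shuffled-image null.

    pred, true: (n_images, n_voxels); noise_ceiling: (n_voxels,) as fractions.
    """
    rng = np.random.default_rng(seed)
    true_c = true - true.mean(axis=0)
    null_scores = []
    for _ in range(n_perm):
        shuffled = pred[rng.permutation(pred.shape[0])]  # break image-response pairing
        pred_c = shuffled - shuffled.mean(axis=0)
        r = (pred_c * true_c).sum(axis=0) / np.sqrt(
            (pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0)
        )
        null_scores.append(r ** 2 / noise_ceiling)
    return float(np.quantile(np.concatenate(null_scores), 1.0 - p))


rng = np.random.default_rng(1)
true = rng.standard_normal((50, 5))   # synthetic betas: 50 images, 5 voxels
pred = rng.standard_normal((50, 5))   # synthetic predictions
threshold = permutation_threshold(pred, true, np.ones(5), n_perm=200)
```

Voxels whose observed score exceeds the threshold are unlikely (at level p) to reach that accuracy by chance; with fewer test images the null distribution widens and the threshold rises.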
  • 26. / Comparing CNNs Pre-Trained with Different Training Tasks and Learning Methods
    Comparison of the voxel-wise prediction accuracy for Subject 1 between the two best configurations of the ROI-wise mixed encoding models based, respectively, on:
    ▪ (a) pre-trained ResNet-50 (self-supervised DINOv1, ImageNet-1K) and ResNet-50 (image classification, ImageNet-1K) models
    ▪ (b) pre-trained RetinaNet (object detection, MS COCO) and ResNet-50 (image classification, ImageNet-1K) models
    ▪ (c) pre-trained RetinaNet (object detection, MS COCO) and ResNet-50 (self-supervised DINOv1, ImageNet-1K) models
  • 27. / Best ROI-wise Encoder: Cross-Validation Details
    Overall and functional ROI class-specific 10-fold cross-validation accuracies (MNNSC), considering the voxels of all subjects: [table]
    ROI-specific 10-fold cross-validation accuracies (MNNSC), considering the voxels of all subjects: [table]
  • 28. / Best ROI-wise and Baseline Models: Cross-Validation Proposed ROI-wise and mixed encoding model Baseline encoding model
  • 29. / Best Subject 1 and 2 Encoders: Cross-Validation Details S1 S2
  • 30. / Best Subject 3 and 4 Encoders: Cross-Validation Details S3 S4
  • 31. / Best Subject 5 and 6 Encoders: Cross-Validation Details S5 S6
  • 32. / Best Subject 7 and 8 Encoders: Cross-Validation Details S7 S8
  • 33. / DNNs/Visual Cortex Similarity: AlexNet - ZFNet AlexNet ZFNet
  • 34. / DNNs/Visual Cortex Similarity: VGG-19 VGG-19 VGG-19 (BN)
  • 35. / DNNs/Visual Cortex Similarity: ResNet-50 ResNet-50 (image classification) ResNet-50 (DINOv1)
  • 36. / DNNs/Visual Cortex Similarity: RetinaNet – EfficientNet-B2 RetinaNet EfficientNet-B2
  • 37. / DNNs/Visual Cortex Similarity: ViT-S/14 - ViT-B/14 (DINOv2) ViT-S/14 (DINOv2) ViT-B/14 (DINOv2)
  • 38. / DNNs/Visual Cortex: ViT-L/14 (DINOv2) - ViT-B/16-GPT2 ViT-L/14 (DINOv2) ViT-B/16-GPT2