1. Brain Maps Like Mine
semantic and computational image comparison
methods for meta-analysis and reproducibility of brain
statistical maps
Vanessa Sochat
Research In Progress
October 20, 2015
3. Outline
Background
Why do we want to compare images?
Computational Image Comparison
Impact of Image Thresholding on Similarity Metrics
4. Outline
Background
Why do we want to compare images?
Computational Image Comparison
Impact of Image Thresholding on Similarity Metrics
Semantic Image Comparison
Ontological and Graph Based Methods
5. Meta-analysis to synthesize understanding of human
cognition and reproducibility of brain statistical maps.
Why Compare Images?
8. Why Compare Images?
1. For Reproducibility
We do not know what a replication looks like.
2. For Meta Analysis
What does all the research say about “anxiety?”
9. Outline
Background
Why do we want to compare images?
Computational Image Comparison
Impact of Image Thresholding on Similarity Metrics
Semantic Image Comparison
Ontological and Graph Based Methods
10. Result
What if there is data missing?
Should I transform the images first?
What am I trying to optimize?
How similar are these results?
11. Goal: assess influence of different degrees of
image thresholding on the outcome of pairwise
image comparison
12. Goal: assess influence of different degrees of
image thresholding on the outcome of pairwise
image comparison
VARIABLES
thresholds
metrics
optimization
15. Methods Result
3. Define our similarity metrics
Pearson’s R
"Correlation coefficient" by Kiatdd - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons
Spearman’s Rank Correlation Coefficient
17. Methods Result
Data
465 single subjects
7 tasks
47 contrast images
working memory
animals
gambling
language
relational
emotion
social
motor
“cat” vs. “dog”
18. Methods Result
Subsampling Procedure
For each of 500 subsamples:
Subset data to unrelated groups A and B
For each unthresholded map, A i in A
Apply each threshold in Z = +/- 0 to 13 (two-sided), and Z = 0 to 13 (positive only), to all of B
Calculate similarity for each of B to A i with CCA or SVI
Assign correct classification if contrast A i most similar to equivalent contrast in B
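The subsampling loop above can be sketched in Python. This is a minimal illustration, not the published implementation: it uses pairwise deletion (complete case analysis) with Pearson's correlation as the similarity metric, and the function and variable names are mine.

```python
import numpy as np

def pairwise_deletion_corr(a, b):
    """Pearson correlation over voxels that are nonzero and finite in BOTH
    maps -- the 'complete case analysis' masking strategy."""
    mask = (a != 0) & (b != 0) & np.isfinite(a) & np.isfinite(b)
    if mask.sum() < 2:
        return 0.0
    return float(np.corrcoef(a[mask], b[mask])[0, 1])

def classification_accuracy(maps_a, maps_b, threshold):
    """For each contrast map in group A, threshold every map in group B at
    |Z| >= threshold, and score a correct classification when the matching
    contrast in B is the most similar. maps_a / maps_b map contrast name to
    a flattened voxel array."""
    correct = 0
    for name_a, map_a in maps_a.items():
        scores = {}
        for name_b, map_b in maps_b.items():
            thresholded = np.where(np.abs(map_b) >= threshold, map_b, 0.0)
            scores[name_b] = pairwise_deletion_corr(map_a, thresholded)
        if max(scores, key=scores.get) == name_a:
            correct += 1
    return correct / len(maps_a)
```

In the actual procedure this is repeated over 500 subsamples and the thresholds listed above.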
23. Methods Result
Conclusions
1. More data is not always better
minimum degree of thresholding improves accuracy
random field theory may be too much
2. Question the choice of metric, threshold, etc.
complete case analysis, Pearson, worked for us...
3. “What is the quantitative language that we should
use to compare two images?”
24. Outline
Background
Why do we want to compare images?
Computational Image Comparison
Impact of Image Thresholding on Similarity Metrics
Semantic Image Comparison
Ontological and Graph Based Methods
27. The Approach:
- “graph” based similarity
- “probabilistic” based similarity
- compare the two to spatial similarity
Semantic Similarity: Overview
Is semantic comparison of images useful to classify cognitive states?
Goals:
- completely automated
- assess predictive power of semantic similarity
- relate to computational (spatial) similarity
30. ONTOLOGY
Tools
Semantic Similarity: Method
Is semantic comparison of images useful to classify cognitive states?
Goals:
- completely automated
- Cognitive Atlas, NeuroVault, Pybraincompare
graph similarity(cat image, dog image)
IMAGE DATA METHODS
31. Data:
- 93 brain maps tagged in NeuroVault with contrast → concept
- programmatically retrieve data, run methods, and output results and visualizations.
General Workflow
- publish interesting results
- tag with a contrast, and associated cognitive concepts
- assess semantic similarity
- graph based
- probabilistic
Semantic Similarity: Method
Is semantic comparison of images useful to classify cognitive states?
34. Graph Similarity: Method
visual feline recognition
visual canine recognition
Wang’s Method
- aggregates semantic contributions of
ancestor terms
1. We start with associated concepts.
(diagram: "visual canine recognition" is a kind of "animal recognition", which is a kind of "recognition"; "canine fear response" is linked by "is part of")
35. Wang’s Method
- aggregates semantic contributions of
ancestor terms
2. We then take the weights at the intersection.
(diagram: matching ancestor trees for "visual canine recognition" and "visual feline recognition", each a kind of "animal recognition", which is a kind of "recognition"; "canine fear response" and "feline fear response" are linked by "is part of")
S(A, B) = sum(intersected weights) / sum(all weights)
Graph Similarity: Method
39. Reverse Inference
for image classification and concept validation
a new result
P(cognitive process | a spatial map)
P(mental process | activation) = P(activation | mental process) * P(mental process) / [P(activation | mental process) * P(mental process) + P(activation | ~mental process) * P(~mental process)]
40. What does a high score say about the cognitive concept?
A high P(concept | new image) says the new result contributes evidence for the concept.
41. Data:
93 brain maps tagged in NeuroVault with contrast → concept
programmatically retrieve data, run methods, and output results and visualizations.
For each of 93 brainmaps, as query image:
For each of 140 concept nodes, node, in Cognitive Atlas:
calculate P(node|query image)
Assign correct classification if P(node|query image) > 0.5
Probabilistic Similarity
Method
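The classification rule above, P(node | query image) > 0.5, can be sketched as a small evaluation loop. The function and variable names here are hypothetical; `posterior` stands in for whatever reverse-inference calculation produces the probability.

```python
def classify_images(query_images, concept_nodes, posterior):
    """For each query image, keep every concept node whose posterior
    P(node | query image) exceeds 0.5 -- the rule on this slide.
    `posterior` is any function (node, image) -> probability."""
    results = {}
    for image in query_images:
        results[image] = [node for node in concept_nodes
                          if posterior(node, image) > 0.5]
    return results
```

A classification is scored correct when the node actually tagged on the image survives the 0.5 cutoff.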
45. Summary
Image comparison is essential for
meta-analysis and reproducibility
A small amount of image thresholding
aids to find images of similar contrast
46. Summary
Image comparison is essential for
meta-analysis and reproducibility
A small amount of image thresholding
aids to find images of similar contrast
Semantic Image Comparison
is a promising strategy to assess reproducibility
47. Summary
Image comparison is essential for
meta-analysis and reproducibility
A small amount of image thresholding
aids to find images of similar contrast
Semantic Image Comparison
is a promising strategy to assess reproducibility
48. Acknowledgements
INCF/ Nidash
Satra Ghosh
Nolan Nichols
Jessica Turner
Tom Nichols
JB Poline
David Keator
Collaborators
Tal Yarkoni
Nipy
Funding
Microsoft Research
SGF and NSF
Poldracklab
Russ Poldrack
Chris Gorgolewski
Craig Moodie
Sanmi Koyejo
Patrick Bissett
Joke Durnez
Ian Eisenberg
Mac Shine
Joe Wexler
BMI
Daniel Rubin
Russ Altman
Mark Musen
Rebecca Sawyer
Mary Jeanne
Nancy
Steven Bagley
John DiMario
50. Coordinate-Based Approaches
column 1: raw data: big black dots show the local maxima that are reported; the dotted line is the "true" simulated signal; the thick black line is that signal with added noise.
column 2: shows the results of ALE: the
result is more of a curve because the
“ALE statistic” reflects a probability value
that at least one peak is within r mm of
each voxel, so the highest values of
course correspond to actual peaks.
column 3: kernel density analysis (KDA)
gives us a value at each voxel that
represents the number of peaks within r
mm of that voxel. If we divide by voxel
resolution we can turn that into a
“density”
column 4: is MULTI kernel density
analysis, which is the same as KDA, but
the procedure is done for each
study. The resulting “contrast indicator
maps” are either 1 (yes, there is a peak
within r mm) or 0 (nope).
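The KDA and MKDA computations described above can be sketched in one dimension. This is an illustrative toy, not the published implementations; the coordinates and radius in the example are made up.

```python
import numpy as np

def kda_counts(peak_coords, voxel_coords, r):
    """KDA (1D sketch): at each voxel, count reported peaks within r mm.
    Dividing by voxel volume would turn the counts into a density."""
    peaks = np.asarray(peak_coords, dtype=float)
    voxels = np.asarray(voxel_coords, dtype=float)
    return np.array([(np.abs(peaks - v) <= r).sum() for v in voxels])

def mkda_indicator(study_peaks, voxel_coords, r):
    """MKDA: build one 0/1 'contrast indicator map' per study (is ANY peak
    from that study within r mm of the voxel?), then sum across studies."""
    maps = [(kda_counts(p, voxel_coords, r) > 0).astype(int)
            for p in study_peaks]
    return np.sum(maps, axis=0)
```

The key difference shows up directly: a study that reports two nearby peaks contributes 2 to the KDA count but only 1 to the MKDA sum.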
(figure axis labels: ALE, KDA, MKDA)
53. CONCEPTS: visual canine recognition, visual feline recognition
RELATIONSHIPS: visual canine recognition is a kind of animal recognition; visual feline recognition is a kind of animal recognition
GRAPH SIMILARITY: 0.8
Today I am going to be talking about “brain maps like mine” content-based image comparison and retrieval for interactive visualization and meta-analysis of brain statistical maps
Here is the path that we are going to follow today – first I am going to convince you that meta-analysis in the field of neuroimaging is important. Then we are going to identify the current opportunity to do this meta analysis in a way that is more in line with the changing data landscape. Once we understand the problem, the opportunity, we are going to jump right into methods – what am I doing to address this, and then talk about some preliminary results.
We want to synthesize results across many neuroimaging studies
Reproducibility in neuroimaging requires many things: documenting the workflow, sharing the data. But given that we have all the pieces of inputs, the critical question is: how do I know I have a replication when I see it? How do I compare an image A to an image B? So when we are...
So let's start with computational image comparison. This was my first question for my thesis work with Poldracklab - what is the impact of different transformations of a brain map on image comparison?
And now we have a simple problem. Let’s say, here is my statistical map result, this is what happens in the brain when people look at cats. And let’s make it really simple, I don’t even want to compare my image against all the maps in NeuroVault to find ones that are similar for a meta analysis. I just have ONE MAP. Here is the brain when people look at pictures of dogs. So how to do this? Well, we could of course compare every single value here, to every single value here. But a lot of these values are tiny, essentially zero, and further, we could have missing data between the maps so comparison isn’t even possible. And so there are two questions here:
This was my first question too, specifically:
Next we define our strategies to test. Given that we have some transformations of brain maps from step 1.
We define “brain mask” strategy as all voxels within a standard template brain mask
We define pairwise inclusion as all nonzero, non-NaN voxels in either of the maps, like a union.
We define pairwise deletion as the intersection of nonzero, non-NaN values in the maps. This last one has the most certainty that we are comparing features that exist to features that exist, but the feature set is much smaller. And at either of the more inclusive levels, introducing faux zeros might only serve to diffuse signal.
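The three masking strategies just described can be written down directly with NumPy. This is a sketch; the function names are mine.

```python
import numpy as np

def valid(m):
    """Voxels that carry data: nonzero and not NaN/inf."""
    return np.isfinite(m) & (m != 0)

def brain_mask_strategy(a, b, brain_mask):
    """All voxels inside a standard template brain mask."""
    return brain_mask.astype(bool)

def pairwise_inclusion(a, b):
    """Union: voxels that are nonzero and non-NaN in EITHER map."""
    return valid(a) | valid(b)

def pairwise_deletion(a, b):
    """Intersection (complete case analysis): voxels valid in BOTH maps."""
    return valid(a) & valid(b)
```

The returned boolean mask selects the voxels that enter the similarity computation, so pairwise deletion always yields the smallest (but most trustworthy) feature set.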
You could imagine an entire project related to just developing some new metric toward this goal, but for this work we wanted to hold this variable relatively constant and use metrics that are reasonable and widely used. So we are going to use Pearson's R correlation. I don't need to reiterate Pearson's for everyone, but what we are looking at is: the correlation coefficient as the mean of the products of the standard scores for each of our datasets, X and Y. And the standard score is of course subtracting the mean and dividing by the standard deviation. What this measures is the linear relationship between the two datasets.
The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson's correlation requires that each dataset be normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases. The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.
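The standard-score definition given above is easy to check numerically. A minimal sketch:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r as the mean of the products of standard scores,
    matching the definition in the text: z = (v - mean) / std."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return float(np.mean(zx * zy))
```

Using the population standard deviation (ddof=0) makes the mean-of-products form agree exactly with `np.corrcoef`, since the normalization cancels.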
This is the hardest question! When I first proposed this work back in March, the optimization I was working on was computational - defining some “gold standard” base image, and then comparing to that. But that doesn’t fly, because who is to say that the gold standard is an unthresholded map, from this scanner type, this many subjects, etc. We also need an optimization strategy that reflected what people in neuroimaging actually want to do when they are comparing images. And that is, they want to find an image for which the study is querying the same experimental paradigm. If my map is about cats, I want to find other catmaps. If it is about dogs, I want other dogmaps.
We found that, for our dataset, using complete case analysis at a threshold of 1.0 including positive and negative values maximizes accuracy at about 98%.
Given a pair of brain maps for which one of the
maps is thresholded, we show that an analysis using the intersection of nonzero voxels across
images at a threshold of Z = +/- 1.0 maximizes accuracy for retrieval of a list of maps of the same
contrast, and thresholding up to Z = +/- 2.0 can increase accuracy as compared to comparison
using unthresholded maps. Finally, maps can be thresholded up to Z = +/- 3.0 (corresponding
to 25% of voxels non-empty within a standard brain mask) and still maintain a lower bound of
90% accuracy. Our results suggest that a small degree of thresholding may improve the accuracy
of image similarity computations, and that robust meta-analytic image similarity comparisons can
be obtained using thresholded images
At this threshold, we had almost perfect classification across many of our contrast images, and what we saw is larger error bars with regard to contrasts from specific tasks, which are the different colors we see here. So the biggest errors we see with this working memory task, and the only difference in the contrast images is that we are looking at tools vs. body parts.
And we have an interactive web portal where you can dynamically explore our result, and for example this shows two contrasts from the memory task that were misclassified, and you can see how incredibly similar the images are. This suggests that the brain areas being queried aren’t really that different spatially or in the value of the maps when it comes to memory, and might suggest that there is some redundancy in having the different contrast images. All of our code is available along with a subset of sampled data so you can try out your own things.
And this has immediate application - my first contribution to NeuroVault was adding this "find similar" button, which will implement the optimal comparison strategy to find maps that are similar, and then you can click another button to see an interactive scatterplot and select/hide by region, etc. But, what is missing here? If you notice for these images, they are from the same study, meaning that the contrasts are similar. This is semantic data, and this kind of strategy doesn't take any of this into account at all. And this transitions us into one of my current projects.
This is the next question - can we use ontology and graph based methods over image semantics for a semantic-based image comparison?
We are back to this scenario, where we have statistical brain maps that are associated with an experimental paradigm, the person was looking at dogs or cats. But now, instead of the data taking prime position in the spotlight, what if our metric also integrated something about the tasks themselves?
Our broad goal is to assess image similarity with semantic terms. I’m going to be discussing two approaches, a basic “graph,” and a more complicated probabilistic approach that calculates something called “reverse inference,” which I will explain in more detail.
The goals of these analyses are to be completely automated. This means that we get our cognitive concepts from the Cognitive Atlas, our data from the NeuroVault database, and our methods from my Python package called "pybraincompare."
all will be possible by way of the Cognitive Atlas
We have a very basic question - is semantic comparison of images useful to classify cognitive states?
For example, if I have this brain map here, can I tell you if it’s a person experiencing “dog” or “cat”
and what if we flip it? if you tell me “dog” or “cat” - can I tell you what the brain map looks like?
And further, how does some kind of semantic comparison relate to spatial comparison? None of these questions are really answered.
What do I mean when I say completely automated? I mean that all data, methods, and standards are publicly available so if/when this work gets published, anyone can reproduce the entire analysis with the click of a button, and further, use the same data, method, or standard for a different analysis.
In other terms: the Cognitive Atlas is our ontology; NeuroVault, run by our lab, is a database of statistical maps; in the last year we have added REST APIs with Python wrappers to both; and finally, pybraincompare is the Python module I am developing, where all the code for my image comparison methods is published.
And all of these things are intimately linked. In the past few months we've added infrastructure and the manual work to tag images in NeuroVault with their appropriate contrasts that link to cognitive concepts in the Cognitive Atlas. And pybraincompare has a branch of methods that uses both these APIs to do the analyses I'm about to talk about.
Specifically, we are starting with 93 images in NeuroVault tagged with cognitive atlas concepts. And the general workflow of the methods I will discuss is as follows.
A researcher publishes interesting results.
He or she uploads the data to NeuroVault and tags it with a contrast; the contrast is associated with concepts by way of the ontology.
He or she then runs some method in pybraincompare to assess semantic similarity, and we hope it's useful.
Now is a good time to talk about the different kinds of semantic similarity.
This is a toy example of our graph based comparison. We are back to this scenario, where we have statistical brain maps that are associated with an experimental paradigm, the person was looking at dogs or cats. If we tag with this contrast, dog vs cat, we can then walk up the tree and figure out how similar these two things are based on distance, and the relationships between the different nodes. But now, instead of the single label that this is a “dog” task or a “cat” task taking prime position in the spotlight, what if our metric also integrated something about the tasks themselves?
So our graph-based method is very simple. If each contrast image, dog or cat, is nothing more than a collection of concepts, we can figure out how similar the images are just based on those concepts, and how they relate in the tree.
You can think of the one or more concepts associated with an image like a gene set. We are going to use a method that is common in genomic analysis to assess the similarity of these concept sets. Some of you are likely familiar with this method: it is Wang's metric.
Wang’s method aggregates the semantic contribution of ancestor terms, including the specific term. We are going to basically generate a list of parent concepts and weights for each contrast, find the overlap in those lists, and then use those weights. As follows:
We start with associated concepts. This means starting at a base node, walking up the tree, and appending a list of all the "is_a" and "part_of" relationships. Each of these links on the tree has a weight: 0.8 for "is_a" and 0.6 for "part_of". We calculate the weight of each node by multiplying edge weights. So the first "is_a" is 0.8, then we multiply by 0.8 to get 0.64, and the concept weights or "scores" decrease as we go up the tree toward the root node.
We stop at the root node.
We then have, for each set of contrasts, a complete list of associated concepts, and weights. Given multiple concepts, you can imagine that concepts might overlap. In this case, we take the maximum weight for each one. We then take the weights at the intersection of each list from above.
The similarity score is sum(intersected weights) / sum(all weights)
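The walk-up-and-intersect procedure can be sketched as follows. This is a toy implementation of the slide's variant of Wang's metric: the 0.8/0.6 edge weights and the max-on-overlap rule follow the talk, while the ontology encoding and function names are illustrative.

```python
# Ontology encoded as: node -> list of (parent, relation).
# Edge weights per the talk: 0.8 for "is_a", 0.6 for "part_of".
EDGE_WEIGHTS = {"is_a": 0.8, "part_of": 0.6}

def ancestor_weights(node, ontology):
    """Walk up the tree from `node`, multiplying edge weights along the way;
    keep the maximum weight when a node is reachable by more than one path."""
    weights = {node: 1.0}
    frontier = [(node, 1.0)]
    while frontier:
        current, w = frontier.pop()
        for parent, relation in ontology.get(current, []):
            pw = w * EDGE_WEIGHTS[relation]
            if pw > weights.get(parent, 0.0):
                weights[parent] = pw
                frontier.append((parent, pw))
    return weights

def wang_similarity(concept_a, concept_b, ontology):
    """S = sum of weights at the intersection of both ancestor lists,
    divided by the sum of all weights in both lists."""
    wa = ancestor_weights(concept_a, ontology)
    wb = ancestor_weights(concept_b, ontology)
    shared = set(wa) & set(wb)
    intersected = sum(wa[n] + wb[n] for n in shared)
    total = sum(wa.values()) + sum(wb.values())
    return intersected / total
```

For the cat/dog toy tree, the two concepts share "animal recognition" (weight 0.8) and "recognition" (weight 0.64) but not each other, which gives a similarity strictly between 0 and 1.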
Here we have our preliminary result for this analysis:
- it shows the sparsity of relationships
- contrasts within the same task (with similar contrasts tagged) have obvious similarity
- this was our first message that the relationships within the ontology need better definition, so that is what we are working on
This idea of reverse inference is what is the probability of a cognitive process, given a pattern of spatial activation, which is a brain map. To put this in simpler terms, let’s say we have a database of images, each of this is a result from a laboratory about what happens in the brain when we look at either cats or dogs.
Here we have a new result, a brain map, and actually we don’t know what it is.
If you think about it, this can serve as both a classifier, and a form of meta analysis and validation for different cognitive concepts.
We can calculate the posterior, or the probability of this cat or dog concept given our new result, a statistical brain map that says what is going on in the brain during the experience of cat or dog. This is a Bayesian framework, and it's going to do two things: tell us something about the concept, so "cats" vs. "dogs," and about the new result.
So what would for example, a high score say about the cognitive concept? It would say ah,ha! This new result contributes evidence for this cognitive concept. That is in line with validation and reproducibility because it says that what this person found is well matched to what others have found before. If the score is very low, it obviously says that the image does not contribute evidence for the concept, and that could be that the image isn’t really measuring the same concept, or it could be that the likelihood of the concept, period, isn’t very good.
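The posterior described here is just Bayes' rule applied to a binary activation event. A minimal numerical sketch (the probabilities in the example are made up, not estimates from the actual database):

```python
def reverse_inference(p_act_given_proc, p_proc, p_act_given_not_proc):
    """Posterior P(mental process | activation) via Bayes' rule, matching
    the formula in the slides:
    P(A|P) * P(P) / (P(A|P) * P(P) + P(A|~P) * P(~P))."""
    numerator = p_act_given_proc * p_proc
    denominator = numerator + p_act_given_not_proc * (1.0 - p_proc)
    return numerator / denominator
```

A posterior well above 0.5 means the new map contributes evidence for the concept; a posterior near zero means it does not (either the image measures something else, or the prior on the concept is weak).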
Add that we can of course vary the threshold to produce an ROC curve, and given that we have multiple contrast tags per image, we would need a multi-class procedure.
this is just an example for a single concept node, we need multiple nodes
A small amount of image thresholding aids in finding images of a similar contrast, and toward this goal we must consider these factors when we are doing any kind of classification task.
What we really want to do is assess similarity not based on some single tag that names the task, but based on the cognitive concepts that the task elicits. So what we first do for this work is take our dog and cat contrasts and associate each with one or more concepts from the Cognitive Atlas. For this toy example, we will just assign one concept to each. We then want to define relationships, or make assertions that one concept is related to a parent concept in some way. In the Cognitive Atlas we have two kinds of relationships: "is a kind of," which implies that one term is parent to another term, and "is part of," which implies a less strong, different kind of relationship. So if we look at this example here, each of these nodes is a concept, and each concept can be related to the other concepts in the tree. And each of these links is a kind of relationship. And we can assign weights to the relationships, and then do different operations over the tree to computationally define how "similar" the concepts are by way of ontology.
Pearson Correlation: measuring linear relationship to determine how proportional values are to one another
It is covariance divided by product of standard deviations
The most widely-used type of correlation coefficient is Pearson r (Pearson, 1896), also called linear or product-moment correlation (the term correlation was first used by Galton, 1888). Using non technical language, we can say that the correlation coefficient determines the extent to which values of two variables are "proportional" to each other. The value of the correlation (i.e., correlation coefficient) does not depend on the specific measurement units used; for example, the correlation between height and weight will be identical regardless of whether inches and pounds, or centimeters and kilograms are used as measurement units. Proportional means linearly related; that is, the correlation is high if it can be approximated by a straight line (sloped upwards or downwards). This line is called the regression line or least squares line, because it is determined such that the sum of the squared distances of all the data points from the line is the lowest possible. Pearson correlation assumes that the two variables are measured on at least interval scales. The Pearson product moment correlation coefficient is calculated as follows: