Reattribution of Semantic Verse
Feature Group Meeting
Vanessa Sochat
July 18, 2013
Hypothesis
Reattribution of terms of well known semantic verse can
be useful in expressing an agreed upon, mutual sentiment
shared by a common cohort.
Background
Hypothesis
Reattribution of terms of well known semantic verse can
be useful in expressing an agreed upon, mutual sentiment
shared by a common cohort.
“We, the Feature Group”
“Present with a lunchbox of wisdom”
“Dan, we are going to miss you!”
Conclusions
On the 18th day of July the Feature group
gave to me...
18 horse power car,
17 ounce jail rock
16 hairy eyeballs
15 inch party-stick
14 Matlab hotshots
13 neon straw things
12 ounces caffeine
11 invisible friends
10 ounces beard cream
9 months of planning
8 wheeled Caltrain
7 shiny pens
6 hour hand warmth
5 (plus one!) GOLDEN OREOS!
4 desktop friends
3 barf bags
2 heavy duty sponges
and a lunch box packed with all this wisdom!
Data Driven Neuropsychiatric Profiling
Qualifying Exam Presentation
Vanessa Sochat
August 19, 2013
Outline
• Background
• Hypothesis and Specific Aims
• Methods
• Conclusion
– Biomedical Contribution
– Informatics Contribution
Autism Spectrum Disorder:
A childhood development disorder
• Afflicts 1 in 100 children
• Economic burden of $126 billion annually
• Social, communication, and cognitive deficits, repetitive
behaviors and interests
Unsolved Problem:
data-driven subtyping of autism spectrum disorders for
early diagnosis and tailored, effective treatment
Autism Spectrum Disorder:
Our knowledge is limited
• Genetics
• Behavior
• Neuroimaging
Challenges:
Results not reproducible
No clinical applicability
Hypothesis
Structuring and mining behavioral and imaging data will
define clinically-useful disease subtypes of ASD better
than currently possible using DSM alone.
Specific Aims
Aim 1: to develop a computational representation of ASD
phenotypes based on imaging and behavioral data
Aim 2: to develop informatics methods to identify subtypes of
ASD patients
Aim 3: to evaluate the methods
Big Picture
Current approach (ASD MRI vs. HC MRI)
1. Start with groups
2. Collect data
3. Find differences
4. Inconsistent results
Proposed approach (MRI plus behavior & cognition)
1. Collect data
2. Standardize behavior
3. Local brain phenotype
4. Relate
5. Patterns of relation = subtypes
Aim 1: develop a computational
representation of ASD phenotypes based
on imaging and behavioral data
A Standard Representation of behavior
and cognition
• Structure data
• Query
• Cognitive phenotype
C1 C2 C3 …. CN
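A minimal sketch of how scores from non-overlapping metrics could be placed on a common scale and combined per concept to give a cognitive phenotype (C1 ... CN). The metric names, the metric-to-concept mapping, and the choice of z-scoring are all assumptions made for illustration; they are not taken from the Cognitive Atlas or from the proposal itself.

```python
from statistics import mean, pstdev

# Hypothetical mapping from an assessment metric to the concept it measures.
METRIC_TO_CONCEPT = {
    "wasi_verbal_iq": "verbal_intelligence",
    "other_verbal_score": "verbal_intelligence",
    "anxiety_scale_a": "anxiety",
}


def z_scores(values):
    """Standardize a list of scores to mean 0 and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def cognitive_phenotypes(raw):
    """raw: {metric: {person: score}} -> {person: {concept: mean z-score}}."""
    per_person = {}
    for metric, scores in raw.items():
        concept = METRIC_TO_CONCEPT[metric]
        people = sorted(scores)
        standardized = z_scores([scores[p] for p in people])
        for person, z in zip(people, standardized):
            per_person.setdefault(person, {}).setdefault(concept, []).append(z)
    # Average all standardized metrics that measure the same concept.
    return {
        person: {concept: mean(zs) for concept, zs in concepts.items()}
        for person, concepts in per_person.items()
    }


# Toy data: not every person has every metric (non-overlapping studies).
raw_scores = {
    "wasi_verbal_iq": {"p1": 90, "p2": 100, "p3": 110},
    "other_verbal_score": {"p1": 40, "p3": 60},
    "anxiety_scale_a": {"p2": 10, "p3": 30},
}
phenotypes = cognitive_phenotypes(raw_scores)
```

A person missing a metric simply has fewer scores contributing to that concept, which is one simple way to handle studies whose batteries only partly overlap.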
Data driven identification of Local
Differences in Brain Structure
$X = AS$ (observed data = mixing matrix × original data)
$S = A^{-1}X$
Dimensions: $X$ is n x m, $A$ is n x n, $S$ is n x m
fMRI data ($X$): time x space
Mixing matrix ($A$): time x components
Spatial maps ($S$): components x space
Independent Component Analysis
Of fMRI to define functional networks
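A small sketch of the mixing model on this slide, under the assumption that the mixing matrix is already known. It shows only the algebra (mix with A, unmix with the inverse of A); estimating A from data, which is the actual job of ICA, is not shown. The helper names and the toy numbers are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m):
    """Invert a 2 x 2 matrix; raises if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Original data S (n x m): two toy source signals over five samples.
S = [
    [1.0, 0.0, -1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, -0.5, -0.5],
]
# Mixing matrix A (n x n), assumed known here for illustration.
A = [
    [2.0, 1.0],
    [1.0, 3.0],
]

X = matmul(A, S)                         # observed data: X = A S
S_recovered = matmul(inverse_2x2(A), X)  # unmixing: S = A^-1 X
```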
sMRI data ($X$, n x m): brains x space
Mixing matrix ($A$, n x n): brains x components
Spatial maps ($S$, n x m): components x space
Independent Component Analysis
Of sMRI to discover structural patterns
Each row of the mixing matrix is a set of weights belonging to one person,
each one telling us the relative contribution of the person’s brain to a
particular pattern of brain structure
Aim 2: develop informatics methods to
identify subtypes of ASD patients
C1 C2 C3 …. CN
cognitive phenotype + brain phenotype = neuropsychiatric
profile
Aim 2: develop informatics methods to
identify subtypes of ASD patients: ideas
cognitive phenotype + brain phenotype = neuropsychiatric
profile
Goal
find specific patterns of brain structure that can predict a personality trait, or an
intelligence metric.
Decision Support Means
subtype diagnosis based on brain structure
Aim 2: develop informatics methods to
identify subtypes of ASD patients: ideas
cognitive phenotype + brain phenotype = neuropsychiatric
profile
1. Do a combined autism + healthy control decomposition, apply some threshold to define a group for each component, and evaluate these groups.
2. Do a combined autism + healthy control decomposition, apply some threshold to define a group for each component, do a second decomposition to get a “cleaner” result, and evaluate these groups.
3. Start by splitting the data based on some behavioral metric, do a decomposition for each group, somehow compare the output, and evaluate the groups.
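The thresholding step shared by the first two ideas could look roughly like the sketch below. It assumes each person already has one weight per component (a row of the mixing matrix from a combined decomposition). The subject IDs, the weights, the cutoff of 0.5, and the use of the absolute value are all hypothetical choices made for illustration.

```python
def groups_from_mixing_matrix(weights, threshold):
    """weights: {person: [w_0, w_1, ...]} -> {component index: [persons]}.

    A person joins the group for a component when the magnitude of their
    weight on that component reaches the threshold.
    """
    n_components = len(next(iter(weights.values())))
    groups = {k: [] for k in range(n_components)}
    for person in sorted(weights):
        for k, w in enumerate(weights[person]):
            if abs(w) >= threshold:
                groups[k].append(person)
    return groups


# Toy rows of a mixing matrix: one row per person, one column per component.
mixing_rows = {
    "sub01": [0.9, 0.0, 0.1],
    "sub02": [0.8, 0.7, 0.0],
    "sub03": [0.0, 0.6, 0.0],
    "sub04": [0.1, 0.0, 0.2],
}
groups = groups_from_mixing_matrix(mixing_rows, threshold=0.5)
```

Note that a person can land in more than one group, or in none, which matches the idea that a component's contribution to a given brain may be zero.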
Aim 2: develop informatics methods to
identify subtypes of ASD patients: ideas
cognitive phenotype + brain phenotype = neuropsychiatric
profile
• How do we classify a new case? Need to do ICA again?
• Can weights / spatial maps have meaning outside of a decomposition?
• Ideal: make comparisons between different decompositions
• Not ideal: running ICA all over again with entire data + new dataset
Aim 3: evaluate the method: ideas
1. Demonstrate that the subtypes of ASD defined by our methods have greater homogeneity among individuals within the subtypes than subtypes defined by the current gold standard DSM
2. Two sample T-test with my groups to assess voxel-wise differences in structure, compared to the same test with DSM labels
3. Want to see our groups have clusters of just ASD or just HC
4. Validation by producing known results from literature
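One possible way to put a number on "greater homogeneity within subtypes" is the mean pairwise distance between people who share a label, computed once for the data-driven groups and once for a reference labeling. This is only a sketch: the distance measure, the toy feature vectors, and both labelings are assumptions, and the reference labels here merely stand in for DSM labels.

```python
from itertools import combinations
from math import dist


def mean_within_group_distance(features, labels):
    """Average Euclidean distance over all pairs of people sharing a label.

    Lower values mean more homogeneous groups.
    """
    distances = [
        dist(features[a], features[b])
        for a, b in combinations(sorted(features), 2)
        if labels[a] == labels[b]
    ]
    if not distances:
        raise ValueError("no group has two or more members")
    return sum(distances) / len(distances)


# Toy feature vectors (for example, two normalized profile features per person).
features = {
    "s1": (0.0, 0.0),
    "s2": (0.0, 1.0),
    "s3": (5.0, 5.0),
    "s4": (5.0, 6.0),
}
data_driven_labels = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
reference_labels = {"s1": "X", "s2": "Y", "s3": "X", "s4": "Y"}

data_driven_score = mean_within_group_distance(features, data_driven_labels)
reference_score = mean_within_group_distance(features, reference_labels)
```

In this toy case the data-driven grouping scores lower (more homogeneous) than the reference grouping; real data would of course need a proper statistical comparison.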
Conclusion
• Informatics Contributions
– Extending Big Data paradigms to neuroscience
– Novel KR of behavioral and cognitive metrics
– Novel KR of local brain phenotype
– Methods to make inferences over these KR
• Biological Contributions
– Discovery of biomarkers of disorder
– Definition of disorder subtypes
– Decision support about treatment
Acknowledgements
Advisors and Panel
Daniel Rubin
Russ Altman
Mark Musen
Antonio Hardan
Colleagues
Kaustubh Supekar
Feature Group
The MIND Institute
Support Staff
John DiMario
Mary Jeanne & Nancy
Funding
Microsoft Research
SGF and NSF
Friends and Fellow BMI
Rebecca Sawyer
Linda Szabo
Katie Planey
Tiffany Ting Lu
Francisco Gimenez
Diego Munoz
Luke Yancy Jr.
Jonathan Mortensen
The M&Ms previously known as first years
Thank you!

Quals Practice Presentation


Editor's Notes

  • #4 Everyone, this is Feature Group. And this is how feature group feels because of an unpredictable, recent development. Well, everyone except for one person.
  • #5 Everyone, this is Dan. He is leaving us, and we are sad, and have been exploring methods to communicate our appreciation for the time that he has been with us. So to start, we’ve done some background work to first identify what makes Dan so awesome. Working at NASA. Because it really is rocket science. Having a mini version of yourself that you can carry around with you. It’s a strategy for future propagation of your genetic awesomeness. Awesome beardage. We have found that having a beard is positively correlated with people being afraid of you. And lastly, the most salient predictor of awesomeness is…
  • #6 A well packed Trader Joe’s lunch box. And we have observed that Dan is always prepared with his yellow Trader Joe’s lunchbox. And so, we have
  • #14 Here is what I will be talking about today. I’ll first give you some background about a particular neuropsychiatric disorder that my work will focus on, and identify gaps in our knowledge. I will then share my proposal for using large data to address these gaps, and in methods will talk about how I plan to carry this out, and finally will close up summarizing the biomedical and informatics contributions of this work.
  • #16 Autism Spectrum Disorder is a family of heterogeneous childhood developmental disorders that afflicts 1 in 100 children. It is characterized by deficits in socialization and communication, sometimes cognitive difficulties, and repetitive behaviors and extremely narrow interests. Autism is highly comorbid with other developmental, psychiatric, and medical disorders, and providing care, special schools, and programs for these children leads to an estimated economic burden of $126 billion annually. The problem is that both prevalence and cost are increasing, and we don’t have a clue about the cause or, within this family, what the real subtypes are. This is reflected in the recent release of the DSM 5, which has gotten rid of all of the previous ASD subclasses. Better, more personalized treatment hinges on our ability to relate specific, well-defined levels of behavioral deficit to treatment outcome, so you can see that there is a huge opportunity here to develop data driven methods for subtyping to allow for this early diagnosis and tailored treatment.
  • #17 This is obviously a huge problem, and it’s not the case that no work has been done. What do we know? In terms of genetics, there is a small subset of cases that have a clearly identified set of genes or copy number variants; however, what has become clear is that ASD is not a disorder that is going to be tied to a reliable set of genetic biomarkers. In terms of behavioral data, there is no standard set of traits that has been defined to characterize this disorder. A huge and varied set of behavioral and cognitive metrics has been used for both clinical and research assessment in these disorders, and most of them are biased based on who is answering the questions. This is what makes imaging so appealing. This picture nicely summarizes the most prominent findings from neuroimaging. Structurally, pathology has been localized to the frontal lobes, superior temporal cortex, parietal cortex, the amygdala and cerebellum. We also know that ASD individuals have significantly larger brains and abnormal brain development, along with increased cortical thickness across the entire cerebral cortex. It has been hypothesized that ASD is a result of underconnectivity across the entire brain, and that the ASD brain is characterized by small world networks, or higher localized connectivity. So, what is the problem with this work, or why has imaging, to this point, failed to find a reliable set of biomarkers? It’s because a result is published and it’s not reproducible, and I believe it is because our gold standard isn’t good enough. The DSM labels do not capture the heterogeneity of this disease.
  • #20 So we need to let the data do the talking, and this is where informatics comes in. Our novel methods will produce a computational representation—a neuropsychiatric profile of ASD patients — to derive subtypes of ASD. We hypothesize that structuring and mining behavioral and imaging data will define clinically useful disease subtypes of ASD better than currently possible using DSM alone. This work will harness large databases of largely unused cognitive and behavioral assessments and imaging data that could ultimately enable decision support through data-driven recognition of subtypes of ASD.
  • #21 Specifically, we will first use imaging and behavioral data to develop a computational representation of ASD phenotypes. We will then use these knowledge representations to develop methods to identify subtypes of patients. And we will evaluate our findings by demonstrating that our subtypes reproduce known findings in the literature, and can better detect differences between types than the current DSM classes.
  • #22 BIG PICTURE The way that we look for group differences now is to start with our groups, collect imaging and behavioral data, do something like a 2 sample T test to find differences, maybe throw in a behavioral metric as a covariate, and largely this produces inconsistent results. Instead, we will equivalently collect our data, standardize and normalize the behavioral metrics so we use all of them to define a neuropsychiatric profile, extract representations of local brain phenotype, and then make specific connections between brain phenotype and behavior. For example, the size of the amygdala might predict an increased level of anxiety, as represented by two scales of anxiety and one behavioral measure of it. And what I think that we will find is that common patterns of these behavior – brain phenotype observations will emerge, and those patterns we could call a subtype. Then a new person could go to a clinician, be assessed for these relationships, and from that be matched to a similar group, and knowing these groups would allow for better choice of treatment and early diagnosis. And further, we could possibly predict behavioral outcome from the imaging data alone. We would want to look at the local structure of their brain, and say “ok, based on having these local features, we can predict these specific behavioral outcomes, and that treatment X is best.”
  • #25 We will develop a computational representation of behavior and brain phenotype. What this really means is combining many non-overlapping behavioral and cognitive metrics into a standard representation, and modeling structural and functional brain differences as features of a brain phenotype. Our biggest challenge here is that these data that characterize an individual’s behavior and cognitive style are highly structured and complex. Let’s start with behavioral data. Here is an example of what we are encumbered with. Endless numbers of text files of common metrics collected during neuropsychiatric assessment, the majority of which are not used. This is the WASI to measure several metrics of intelligence, and the problem is that 1) we aren’t using this data and 2) between different studies there is no overlap.
  • #26 This is why we need methods that can maximally use these metrics that we have, and intelligently combine metrics across studies that may not completely overlap. If there are two metrics that both reflect verbal intelligence, for example, we should be able to compare them. And we are going to do that by way of ontology. This is the Cognitive Atlas, a growing ontology of behavioral and cognitive assessments. It models concepts, such as intelligence, anxiety, different traits… tasks, these are the tests themselves, and then collections, which are groups or batteries of tasks.
  • #27 From this we can understand a metric based on the underlying traits that it is measuring, allowing us to make comparisons across metrics. For example, the data that we saw previously was the WASI, which is an intelligence metric commonly seen in these research studies. From the ontology we learn that the WASI measures these kinds of intelligence, as represented by these scores. Now let’s focus on a particular concept, logical reasoning. The ontology can tell us other tasks that also measure this concept, so we could compare non overlapping metrics. We will first develop methods to give structure to raw data, allowing it to be linked to this ontology. We will then develop methods to query this structured data and, based on all of the behavioral assessments in a database of many people, develop scoring and normalization methods for metrics defined in the NIH Toolbox Battery, a growing standard of behavioral assessments used in neuroscience research, as well as metrics used by the National Database for Autism Research. We will develop methods that permit comparison of traits extracted from non-overlapping cognitive and behavioral assessments by generating trait scores on a standard numerical dimension. Write up plan here for creating “object”. Then discuss ideal solution, and likely solution. At the end of the day, if we have normalized metrics for these broad categories, then we can make comparisons between studies (not currently possible), and specific findings about brain structure.
  • #28 So first we will develop methods to give structure to this raw data, allowing it to be linked to this ontology. We will then develop methods to query this structured data, and normalize the metrics onto a common scale. This will allow for comparison of traits extracted from non-overlapping cognitive and behavioral assessments. An individual’s normalized scale across these different concepts represents their cognitive phenotype.
  • #29 So we have our behavioral data, how are we going to incorporate brain imaging? You can think of brain imaging as just another measure of phenotype. How are we going to discover local aberrancy? The problem is pretty simple. We have a big group of brains, including different levels of ASD and then healthy control, and we want to find groups defined by similar local structure within that population. Most research of this type will have a priori information about group membership, namely the DSM label, and will do something like a 2 sample T test to identify significant differences between the groups. Instead, I would like to propose an unsupervised method to both identify these group subsets, and localize brain differences. We will use independent component analysis.
  • #30 Independent component analysis is incredibly simple and beautiful. The idea is that you start with some mixed up data, and that data can be decomposed into independent signals, and if we apply some matrix of weights, called the mixing matrix, to these independent signals, the result is the mixed data. Applied in the domain of neuroimaging, this method is used in fact for functional data. We start with a task oriented or a resting BOLD scan, if you can imagine taking a 3D image of someone’s brain over time to make a 4D image, and then flattening each 3D image into a row of numbers, so on the x axis we have timepoints, and on the Y axis we have individual points in space. Then when we break this observed data into independent signals, what emerges is functional networks. And the way that we solve for these independent signals is by taking the inverse of the mixing matrix, called the unmixing matrix, multiplied by our observed data. And we solve for this unmixing matrix with maximum likelihood estimation. I DON’T UNDERSTAND THIS So I don’t personally think that ICA with functional data is a reliable metric for developing biomarkers of disorder because something like having a coffee before going in the scanner can influence the strength of the networks that emerge. However, I have a new proposal for the method applied to my particular problem.
  • #31 So instead of having functional scans from the same person stacked here, each of these will be a 3D image of a different person’s brain. The idea is that there are some underlying components, each of which is a particular pattern of brain structure that is shared across our participants. Each person’s brain is a mixture of these components. So if we look at our mixing matrix, instead of looking at a column as the timecourse for a particular component, if we look at the row, what we see is essentially a set of weights belonging to one person, each one telling us the relative contribution of the person’s brain to a particular pattern of brain structure. And remember that it’s completely feasible that the contribution for a person is zero. So by thresholding this matrix we can define subgroups of people within the larger group that share a pattern of local structure. This method will thus produce a particular hypothesis about a subset of individuals having commonality in a particular brain region. And specifically for ASD I would want to do this for maps of white matter volume and density.
  • #32 Now we have methods for giving structure to an individual’s normalized cognitive phenotype, and we have a method for discovering local patterns of brain structure and a normalized value that represents the salience of that pattern to each individual. We have three ways of identifying groups from this data. 1. Cluster cognitive phenotypes – find groups of similarly behaved people. 2. Cluster brain phenotypes – find groups of people with overall similar structure. 3. But it’s more likely that people might share similarity of one behavioral feature, or one brain region. What we really want to do is find specific patterns of brain structure that can predict a personality trait, or an intelligence metric. Problem: how do we classify a new case? Need to do ICA again? Now our goal is to find patterns of brain structure that predict behavioral outcomes. So now I want you to think of the cognitive phenotype Idea: save ICA regions extracted, and weights for a person? These are features, but how do we classify a new person? Idea 2: identify groups based on nearest neighbors, personality wise, then do ICA to find brain regions? What is the evaluation? STOPPED HERE We will also create an imaging processing pipeline to extract robust morphometric features about regional sizes, shape, and spatial relationships for white and gray matter from Diffusion Tensor Imaging (DTI) and anatomical images. Each feature will be given a semantic label to allow for human interpretability, and these features and labels will be combined with the cognitive and behavioral traits into a feature vector as a novel representation of an individual’s neuropsychiatric profile (disease phenotype). We will create these profiles for individuals with autism from the National Database for Autism Research (over 3,000 datasets with DTI and MRI) and for healthy controls with data from the Human Connectome Database. These resources have substantial imaging and phenotypic data for both ASD and healthy control. 
phenotype, and we want to use these data to identify subtypes of disorder. So we want a method that does not rely on some a priori label. We should be able to hand this method healthy controls and individuals with disorder, and see groups emerge. How are we going to go about this? 1. Define people as similar based on behavioral metrics? 2. Find nearest neighbors, then run ICA across structural maps? 3. Will this ID subgroups? Shouldn’t all people go into ICA? We start with these groups of people defined by local structure, and each person has normalized behavioral and cognitive metrics. Now we are going to use this abstraction of behavior, and abstraction of local structure, to identify groups of ASD patients. IDEALLY: we find some consistent patterns across patients. POSSIBLY: we find no patterns across patients (then what will I do?). We now have people defined by normalized behavioral and cognitive metrics, and for each person, we have also identified (something about local brain aberrancy?). We will use feature subset selection with EM clustering to concurrently identify subsets of features and the groups of patients they distinguish, in order to define subtypes of ASD. This unsupervised machine learning algorithm will be employed with a wrapper approach, meaning that the method folds feature selection into the unsupervised clustering algorithm, and subsets of features are evaluated based on how well they cluster the data. We will employ cross validation to choose a value of K. The output will be groups of similar individuals, each with a set of features that represent components of a profile that define a subtype of ASD. These features are important to the clinician because they provide a human-interpretable assessment of the particular behavioral traits associated with a subtype. Evaluation of the clustering is addressed in Aim 3.
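A rough sketch of the wrapper idea, where each candidate feature subset is scored by how well EM clustering fits it. Several things here are my own assumptions and not part of the proposal: the diagonal-covariance Gaussian mixture, the deterministic initialization, the exhaustive search over subsets (only workable for a handful of features), and the score itself, which is the per-point, per-feature log-likelihood gain of K clusters over a single Gaussian. K is fixed at 2 for the toy data; the cross validation over K is not shown.

```python
import itertools
import math
import random


def fit_diag_gmm(X, k, iters=60, var_floor=1e-3):
    """Fit a diagonal-covariance Gaussian mixture by EM.

    Returns the log-likelihood computed in the last E-step.
    """
    n, d = len(X), len(X[0])
    # Deterministic init: sort rows by their sum, take evenly spaced rows as means.
    order = sorted(range(n), key=lambda i: sum(X[i]))
    means = [list(X[order[int((j + 0.5) * n / k)]]) for j in range(k)]
    col_mean = [sum(row[f] for row in X) / n for f in range(d)]
    col_var = [max(sum((row[f] - col_mean[f]) ** 2 for row in X) / n, var_floor)
               for f in range(d)]
    variances = [list(col_var) for _ in range(k)]
    weights = [1.0 / k] * k
    loglik = float("-inf")
    for _ in range(iters):
        # E-step: responsibilities via log-sum-exp for numerical stability.
        resp, loglik = [], 0.0
        for row in X:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for f in range(d):
                    v = variances[j][f]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (row[f] - means[j][f]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            total = top + math.log(sum(math.exp(l - top) for l in logs))
            loglik += total
            resp.append([math.exp(l - total) for l in logs])
        # M-step: update weights, means, and floored variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            for f in range(d):
                mu = sum(r[j] * row[f] for r, row in zip(resp, X)) / nj
                var = sum(r[j] * (row[f] - mu) ** 2 for r, row in zip(resp, X)) / nj
                means[j][f] = mu
                variances[j][f] = max(var, var_floor)
    return loglik


def subset_score(X, subset, k):
    """Hypothetical criterion: log-likelihood gain of k clusters over one
    Gaussian, normalized by number of points and number of features."""
    sub = [[row[f] for f in subset] for row in X]
    gain = fit_diag_gmm(sub, k) - fit_diag_gmm(sub, 1)
    return gain / (len(sub) * len(subset))


def select_features(X, k):
    """Wrapper search: score every non-empty feature subset, keep the best."""
    d = len(X[0])
    subsets = [s for r in range(1, d + 1)
               for s in itertools.combinations(range(d), r)]
    return max(subsets, key=lambda s: subset_score(X, s, k))


# Toy data: feature 0 separates two groups, feature 1 is pure noise.
rng = random.Random(0)
X = [[rng.gauss(-3 if i < 20 else 3, 0.5), rng.gauss(0, 1)] for i in range(40)]
best = select_features(X, k=2)
print(best)
```

The normalization by subset size is there because raw likelihoods are not comparable across subsets of different dimension; a different criterion (for example a separability measure) could be swapped in without changing the wrapper structure.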
  • #33 Here is the data that we have, and the goal is to find specific patterns of brain structure that can predict a personality trait or an intelligence metric. And this is really set up for the purposes of the evaluation – we want biomarkers from neuroimaging that can stand alone to give decision support, so making the association with behavioral data is really a step to say that there is some meaning in the groups identified by imaging. Because what decision support should mean is processing imaging data, evaluating somehow for patterns of local structure, and using those patterns to place the person in a group of similar patients.
  • #34 So here are some of my ideas. (go over three ideas). And of course the challenge for each of these is, how would I classify a new case?
  • #35 The goal is to be able to process new data and match to a group based on patterns of local structure. It’s not clear to me how I would classify a new case – if the weights and associated maps have meaning outside of a decomposition, then possibly I wouldn’t need to run ICA all over again with one more dataset. If I can’t compare between different decompositions, then maybe I could use the spatial map as an ROI to extract features from the original data? It would be ideal to be able to make comparisons between different decompositions. So the step that I’m at – I have an idea of what my starting data is going to look like, and I need to figure out, from this data, what exactly are the features that would go into some other unsupervised algorithm to define groups of local structure and behavioral variables that would be more correct to call a subtype. In my quals proposal I said that I would do feature subset selection using EM clustering to do this, and thinking about it now, I think I would be just as happy to move away from the goal of finding people that share many patterns, and just identify one pattern of local structure associated with one behavioral metric. That strategy seems to be more human-interpretable for a clinician, at least to start.
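One way the "spatial map as an ROI" idea could look, as a sketch only. It assumes the saved component maps and the new person's image are flattened to the same voxel order, and that thresholding the map at an absolute value of 2.0 gives a sensible region; the function names, the cutoff, and the numbers are all hypothetical.

```python
def roi_mask(component_map, cutoff):
    """Binary mask of voxels where the saved component map is strong."""
    return [abs(v) >= cutoff for v in component_map]


def roi_mean(image, mask):
    """Average of the image values inside the mask."""
    values = [v for v, keep in zip(image, mask) if keep]
    if not values:
        raise ValueError("mask is empty; lower the cutoff")
    return sum(values) / len(values)


def features_for_new_case(image, component_maps, cutoff=2.0):
    """One feature per saved component, without rerunning the decomposition."""
    return [roi_mean(image, roi_mask(m, cutoff)) for m in component_maps]


# Made-up example: two saved component maps over four voxels.
component_maps = [
    [3.0, 2.5, 0.1, 0.0],
    [0.2, 0.0, -2.8, 2.2],
]
new_image = [0.5, 0.75, 0.25, 0.5]

features = features_for_new_case(new_image, component_maps)
print(features)
```

The resulting feature vector could then be compared with the existing groups, for example by nearest neighbor, which sidesteps the question of whether weights from different decompositions are comparable.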
  • #36 Evaluation is the most challenging part of this proposal, because we would normally use some label as a “gold standard,” and in this case, our gold standard is the DSM, which we know isn’t sufficient in the first place. So again, this is hugely based on what I decide to do in Aim 2, but I have three general ideas. First, I could demonstrate that the groups defined by my methods have greater homogeneity among individuals than subtypes defined by the current gold standard, the DSM. That would mean calculating similarity metrics, based on behavioral features, for all groups defined with imaging, and then comparing the output. I would generally want to see my groups have some clustering of just HC or just ASD, as defined by DSM labels. Second, I could take groups defined by my method and do a two-sample t-test to assess voxel-wise brain differences in structure, and compare to an equivalent analysis using DSM labels. I would want to show that my groups have more significant differences than DSM labels.
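A small sketch of the two comparisons described in these notes. The choices here are assumptions for illustration: homogeneity is measured as mean pairwise Euclidean distance within a group (lower means more homogeneous), and the two-sample test is Welch's t statistic for a single voxel. The data are made up, and a real voxel-wise analysis would also need multiple-comparison correction.

```python
import math
from itertools import combinations
from statistics import mean, variance


def within_group_dispersion(features, labels):
    """Mean pairwise Euclidean distance between individuals sharing a label."""
    dists = [
        math.dist(features[i], features[j])
        for i, j in combinations(range(len(features)), 2)
        if labels[i] == labels[j]
    ]
    return mean(dists)


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Made-up behavioral feature vectors for four individuals.
behavior = [[1.0], [1.2], [5.0], [5.1]]
method_labels = [0, 0, 1, 1]   # groups from the imaging-based method
dsm_labels = [0, 1, 0, 1]      # groups from DSM labels

method_disp = within_group_dispersion(behavior, method_labels)
dsm_disp = within_group_dispersion(behavior, dsm_labels)
print(method_disp < dsm_disp)  # True would favor the imaging-based groups

# Made-up values of one voxel in two groups.
t = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 4))
```

Running the same two functions once with the method's groups and once with DSM labels gives the side-by-side comparison the notes call for.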
  • #37 Here is what I will be talking about today. I’ll first give you some background about a particular neuropsychiatric disorder that my work will focus on, and identify gaps in our knowledge. I will then share my proposal for using large data to address these gaps, and in methods I will talk about how I plan to carry this out. Finally, I will close by summarizing the biomedical and informatics contributions of this work.
  • #39 In a nutshell, I am proposing an extension of big data methods to neuroscience. We will start with novel knowledge representations for behavioral and imaging data, and use unsupervised machine learning algorithms to make inferences over these data. With these neuropsychiatric profiles, we can first find specific brain phenotypes that are indicative of specific behavioral outcomes, what I would call a biomarker, and then patterns of these biomarkers can define a disorder subtype. This method would also allow for the emergence of a novel combination of biomarkers that could define a new disorder subtype. And being better able to match an individual to a group will drive decision support to advise treatment. Although this work will focus on redefining the current nosology of ASD, it will be generalizable to many other neuropsychiatric disorders.
  • #41 Things to know really well: principal component analysis, factor analysis, and projection pursuit; ICA, choosing number of components, infomax; structuring data with XML; querying XML data (sparql?). When I have cog and brain phenotype – how are each structured, related to original metrics? What am I going to do with ICA to represent brain phenotypes? How am I going to relate brain and behavioral data? Things that could go wrong. IDEALLY: we find some consistent patterns across patients. POSSIBLY: we find no patterns across patients (then what will I do?)