Predicting Outcomes When Your Outcomes are Graphs (or functions)
Bill Shannon, PhD, MBA
Co-Founder and CEO, BioRankings
Professor Emeritus of Biostatistics in Medicine, WUSM
bill@biorankings.com, 314-704-8725
With big data come new complex data formats – data as graphs
Functional MRI Data
• Brains are inserted into MRI scanner
• 30 gigabytes raw data
• Parcellation
• Networks
– Nodes are regions of the brain
– Edges are the correlations between pairs of nodes
Connectome Graph
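The connectome construction described above (brain regions as nodes, pairwise correlations as edges) can be sketched as follows. This is a minimal sketch, assuming the regional time series have already been parcellated into a (timepoints × regions) array and that an edge is drawn when the absolute Pearson correlation reaches a cutoff. The function name `connectome` and the 0.5 threshold are hypothetical; the deck does not say how its correlations were thresholded.

```python
import numpy as np

def connectome(ts: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binary connectome from regional time series (hypothetical helper).

    ts: array of shape (n_timepoints, n_regions), one column per parcel.
    Two regions are linked when |Pearson r| >= threshold (assumed cutoff).
    """
    r = np.corrcoef(ts, rowvar=False)         # (n_regions, n_regions) correlations
    adj = (np.abs(r) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)                  # no self-loops
    return adj

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Simulated data: regions 0-2 share a common signal, region 3 is independent noise
ts = np.hstack([base + 0.3 * rng.normal(size=(200, 3)), rng.normal(size=(200, 1))])
A = connectome(ts)
print(A)
```

Thresholding is only one design choice; keeping the weighted correlation matrix is another, and the choice changes which distance metric makes sense later.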
With big data come new complex data formats – data as graphs
Microbiome Data
• Samples from human, animal, field (soil), environment
• Next Generation Sequencing (write once, read never data)
• Genomic analysis processing
– Annotation to taxonomic label (e.g., genus, species)
Microbiome Tree
Statistics infers properties of a whole population from a sample
Sample to Population Inference
• Collect a sample of graphs – 1 per subject
• Plot graphs
• Estimate mean and variance (or g* and τ)
• Does this plot teach us how the graphs are distributed and what their central tendency is?
Does this plot teach us anything?
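Estimating g* and τ from a sample of graphs can be sketched in closed form under two assumptions that the deck does not state: the distance d is the Hamming distance (number of edges on which two graphs disagree), and the graph set is every simple undirected graph on a fixed set of n nodes. Under those assumptions the normalizing constant does not depend on g*, so the maximum-likelihood g* is the edge-wise majority vote, and each edge independently disagrees with g* with probability e^(−τ)/(1 + e^(−τ)), which gives τ̂ = log((m − d̄)/d̄), where m = n(n − 1)/2 and d̄ is the mean distance to ĝ*. The helper names below are hypothetical, and this is not necessarily the estimator used in the talk.

```python
import numpy as np

def upper(adj):
    """Vector of the m = n(n-1)/2 possible edges (upper triangle)."""
    i, j = np.triu_indices(adj.shape[0], k=1)
    return adj[i, j]

def fit_gibbs_hamming(graphs):
    """MLE of (g*, tau) for a Hamming-distance Gibbs model (hypothetical helper)."""
    E = np.array([upper(g) for g in graphs])       # (N, m) edge indicators
    N, m = E.shape
    g_star = (E.mean(axis=0) > 0.5).astype(int)    # edge-wise majority (ties -> no edge)
    d = np.abs(E - g_star).sum(axis=1)             # Hamming distance of each graph to g*
    d_bar = d.mean()
    if not 0 < d_bar < m / 2:
        raise ValueError("tau estimate is undefined for this sample")
    tau = np.log((m - d_bar) / d_bar)
    return g_star, tau

# Simulate 500 connectomes on 10 nodes around a known central graph
rng = np.random.default_rng(1)
n, tau_true = 10, 2.0
m = n * (n - 1) // 2
true_edges = rng.integers(0, 2, size=m)
p_flip = np.exp(-tau_true) / (1 + np.exp(-tau_true))
iu = np.triu_indices(n, k=1)
graphs = []
for _ in range(500):
    e = np.where(rng.random(m) < p_flip, 1 - true_edges, true_edges)
    A = np.zeros((n, n), dtype=int)
    A[iu] = e
    graphs.append(A + A.T)                         # symmetric adjacency matrix

g_hat, tau_hat = fit_gibbs_hamming(graphs)
print(tau_hat)
```

With enough subjects, ĝ* recovers the central graph and τ̂ lands near the true dispersion; a larger τ means the graphs cluster more tightly around g*.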
Graphs are too complex – let’s simplify
Network metrics
Average connectivity
Small world network
Species diversity
Taxa counts
Enterotype
Many-to-one mapping is not necessarily a good way to simplify data for analysis
Simplifying in fMRI and Microbiome
fMRI
• Average Node Connectivity
• Consider two brain scans
– Patient 1
• Right half ANC = 10
• Left half ANC = 0
– Patient 2
• Right half ANC = 5
• Left half ANC = 5
• Both whole brain ANC = 5
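The two fMRI patients above can be reproduced numerically. This sketch reads ANC as the average node degree and assumes each hemisphere has the same number of nodes (12 here) with no edges between hemispheres; both are assumptions, since the term "average node connectivity" also has other definitions. The helpers `circulant`, `anc`, and `two_halves` are hypothetical.

```python
import numpy as np

def circulant(n, offsets):
    """Circulant graph: node i links to i +/- k (mod n) for each offset k."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for k in offsets:
            A[i, (i + k) % n] = A[(i + k) % n, i] = 1
    return A

def anc(A):
    """Average node degree (assumed reading of ANC)."""
    return A.sum(axis=1).mean()

def two_halves(right, left):
    """Block-diagonal brain: right-half nodes first, no cross-hemisphere edges."""
    n = right.shape[0]
    A = np.zeros((2 * n, 2 * n), dtype=int)
    A[:n, :n], A[n:, n:] = right, left
    return A

n = 12
p1 = two_halves(circulant(n, [1, 2, 3, 4, 5]), np.zeros((n, n), dtype=int))  # degrees 10 | 0
p2 = two_halves(circulant(n, [1, 2, 6]), circulant(n, [1, 2, 6]))            # degrees 5 | 5
for A in (p1, p2):
    print(anc(A[:n, :n]), anc(A[n:, n:]), anc(A))
```

Both patients collapse to the same whole-brain value of 5 even though their adjacency matrices differ everywhere, which is the information loss the slide warns about.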
Microbiome
• Species Diversity
• Consider two samples
– Patient 1
• Proportion Taxa A, B, C = 1/3
• Proportion Taxa D, E, F = 0
– Patient 2
• Proportion Taxa A, B, C = 0
• Proportion Taxa D, E, F = 1/3
• Both have Simpson index (Σp²) = 1/3 ≈ 0.33
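The microbiome example can be checked with exact fractions. The sketch assumes the Simpson index is Σp², the probability that two reads drawn with replacement come from the same taxon; the Gini–Simpson form 1 − Σp² would give 2/3 for both patients instead, so the two samples tie under either convention. The variable names are illustrative.

```python
from fractions import Fraction

def simpson_index(props):
    """Simpson's index: sum of squared taxon proportions."""
    return sum(p * p for p in props)

third = Fraction(1, 3)
patient1 = {"A": third, "B": third, "C": third, "D": 0, "E": 0, "F": 0}
patient2 = {"A": 0, "B": 0, "C": 0, "D": third, "E": third, "F": third}

s1 = simpson_index(patient1.values())
s2 = simpson_index(patient2.values())
print(s1, s2, 1 - s1)  # 1/3 1/3 2/3

# Overlap in composition: the two samples share no taxa at all
shared = sum(min(patient1[t], patient2[t]) for t in patient1)
print(shared)          # 0
```

The diversity summary is identical while the overlap is zero, so the many-to-one mapping hides a complete difference in community membership.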
We analyze graphical data the same way as we analyze columns of data
Gibbs distribution
• Let $\mathcal{G}$ be a finite set of graphs and denote the elements of $\mathcal{G}$ by $g$. Let $d$ be an arbitrary distance metric on $\mathcal{G}$. The Gibbs distribution on the graphs $\mathcal{G}$ is denoted by

$$\mathbb{P}(g; g^*, \tau) = c(g^*, \tau)\, \exp\{-\tau\, d(g^*, g)\}, \quad \forall g \in \mathcal{G},$$

with parameters $g^*$, the central or average graph, and $\tau$, a non-negative number that measures the dispersion of the observed connectome data around $g^*$. $c(g^*, \tau)$ is the normalizing constant, $c(g^*, \tau) = \big[\sum_{g \in \mathcal{G}} \exp\{-\tau\, d(g^*, g)\}\big]^{-1}$.
$\mathbb{P}(g_i; g^*, \tau)$ is the probability of observing a specific graph $g_i$ given the parameters $g^*$ and $\tau$.
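For a small graph set the Gibbs distribution can be computed by brute force. This sketch assumes 𝒢 is every undirected graph on 4 nodes (6 possible edges, 64 graphs, each stored as a tuple of edge indicators) and that d is the Hamming distance; the deck allows any metric. Enumerating 𝒢 gives c(g*, τ) directly, and under these assumptions it should match the closed form (1 + e^(−τ))^(−m). The function names are hypothetical.

```python
import itertools
import math

def hamming(a, b):
    """Number of edges on which two graphs disagree."""
    return sum(x != y for x, y in zip(a, b))

def gibbs_pmf(g_star, tau, m):
    """Return ({graph: P(graph; g*, tau)}, c) over all 2**m edge vectors."""
    G = list(itertools.product((0, 1), repeat=m))        # the finite graph set
    weights = {g: math.exp(-tau * hamming(g_star, g)) for g in G}
    c = 1.0 / sum(weights.values())                      # normalizing constant c(g*, tau)
    return {g: c * w for g, w in weights.items()}, c

m = 6                                                    # 4 nodes -> 6 possible edges
g_star = (1, 1, 0, 0, 1, 0)
pmf, c = gibbs_pmf(g_star, tau=1.5, m=m)
print(max(pmf, key=pmf.get) == g_star)                   # the mode is g*
print(c, (1 + math.exp(-1.5)) ** -m)                     # enumeration vs closed form
```

Enumeration grows as 2^m, so for realistic connectomes the normalizing constant has to come from structure in the metric (as with Hamming distance) or from approximation.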
Statistics on Graphs
We analyze graphical data the same way as we analyze columns of data
Recursive partitioning
• Regress the graphs on covariates
• In this example of Parkinson's disease
– Y = connectome
– X = group, sex, age
• RP splits the connectomes into homogeneous groups based on the likelihood of the Gibbs distribution
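One splitting step of likelihood-based recursive partitioning can be sketched as below. Assumptions not in the deck: d is the Hamming distance over all graphs on a fixed node set (so a Gibbs model with one τ is equivalent to each edge disagreeing with g* with probability p = e^(−τ)/(1 + e^(−τ))), covariates are binary, and a split is scored by the gain in maximized log-likelihood. The covariate names and helper functions are hypothetical; a full tree would repeat the search inside each child until a stopping rule is met.

```python
import numpy as np

def gibbs_loglik(E):
    """Maximized log-likelihood of a Hamming-distance Gibbs model.

    E: (N, m) edge indicators. Uses P(g) = p**d * (1 - p)**(m - d), which
    equals c(g*, tau) * exp(-tau * d) with p = e^-tau / (1 + e^-tau).
    """
    N, m = E.shape
    g_star = (E.mean(axis=0) > 0.5).astype(int)  # edge-wise majority
    D = int(np.abs(E - g_star).sum())            # total Hamming distance to g*
    p = D / (N * m)                              # MLE of the disagreement probability
    ll = 0.0
    if D > 0:
        ll += D * np.log(p)
    if N * m - D > 0:
        ll += (N * m - D) * np.log(1 - p)
    return ll

def best_split(E, X):
    """Binary covariate whose split most increases the Gibbs log-likelihood."""
    parent = gibbs_loglik(E)
    best = (None, 0.0)
    for name, mask in X.items():
        if mask.all() or (~mask).all():
            continue                             # split would leave a child empty
        gain = gibbs_loglik(E[mask]) + gibbs_loglik(E[~mask]) - parent
        if gain > best[1]:
            best = (name, gain)
    return best

# Simulated connectomes: the two groups have different central graphs, sex is noise
rng = np.random.default_rng(2)
N, m = 200, 45
group = np.arange(N) < N // 2
sex = rng.random(N) < 0.5
centers = rng.integers(0, 2, size=(2, m))
idx = group.astype(int)
flip = rng.random((N, m)) < 0.1
E = np.where(flip, 1 - centers[idx], centers[idx])
print(best_split(E, {"group": group, "sex": sex}))
```

Continuous covariates such as age would be handled by scanning candidate cut points and treating each as a binary mask.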
What else can be analyzed with graphical OODA (object oriented data analysis)?
IoT
Blockchain
Cybersecurity
What about data that are functional objects?
Untargeted Metabolomics
• Liquid chromatography and mass spectrometry – LC/MS
• Retention time (RT) × m/z plots
• Which peaks correspond to metabolites (known or unknown), and which peaks are different in patients who live and die?
RT × m/z plots are too complex – let's simplify
Looking for peaks that look different and then testing those same peaks statistically is wrong – the P values don't mean anything in these cases, because the data used to pick the peaks are reused to test them.
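A small null simulation shows why selecting peaks that look different and then testing them invalidates the P values. This is a sketch under simplifying assumptions: no peak truly differs between patients who live and die, intensities are standard normal with known variance (so a z-test applies), and the counts of peaks, patients, and simulated studies are hypothetical. A peak named before seeing the data is rejected about 5% of the time; the most extreme of 50 peaks is rejected far more often.

```python
import math
import random

def z_pvalue(x, y, sigma=1.0):
    """Two-sided z-test p-value for a difference in means with known sigma."""
    z = (sum(x) / len(x) - sum(y) / len(y)) / (sigma * math.sqrt(1 / len(x) + 1 / len(y)))
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(n_sims=300, n_peaks=50, n=10, seed=3):
    rng = random.Random(seed)
    hits_prespecified = hits_selected = 0
    for _ in range(n_sims):
        # Null world: every peak has the same distribution in both groups
        pvals = [z_pvalue([rng.gauss(0, 1) for _ in range(n)],
                          [rng.gauss(0, 1) for _ in range(n)])
                 for _ in range(n_peaks)]
        hits_prespecified += pvals[0] < 0.05   # peak chosen before looking
        hits_selected += min(pvals) < 0.05     # peak chosen because it "looks different"
    return hits_prespecified / n_sims, hits_selected / n_sims

pre, sel = simulate()
print(pre, sel)
```

With 50 independent null peaks the chance that the best-looking one reaches p < 0.05 is about 1 − 0.95^50 ≈ 0.92, so its nominal P value says little about a real difference.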
Why not analyze functions using functional OODA?
Projects in object oriented data analysis
• Microbiome
– Enabling technology: Next generation sequencing
– Bioinformatics: Assembly, annotation, chimera checking
– Exploratory analysis: Cluster analysis, multidimensional scaling, heatmaps
– Translational statistics: Dirichlet-multinomial for taxa counts, Gibbs distribution for taxonomic trees
• Brain Imaging
– Enabling technology: Functional MRI (fMRI)
– Bioinformatics: Image registration, parcellation
– Exploratory analysis: Generalized linear models with multiple testing adjustment, graph metrics
– Translational statistics: Gibbs distribution for connectome
• Metabolomics
– Enabling technology: LC/MS
– Bioinformatics: Peak detection, centering
– Exploratory analysis: Mass univariate testing with multiple testing adjustment
– Translational statistics: Functional data analysis, Gibbs distribution, Co-Inertia, and the Exploratory-Validation Model for experimental design
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
 

Recently uploaded (20)

Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 

Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017

  • 1. Predicting Outcomes When Your Outcomes are Graphs (or functions)
    Bill Shannon, PhD, MBA
    Co-Founder and CEO, BioRankings
    Professor Emeritus of Biostatistics in Medicine, WUSM
    bill@biorankings.com, 314-704-8725
  • 2. With big data come new complex data formats – data as graphs
    Functional MRI Data
    • Brains are inserted into MRI scanner
    • 30 gigabytes raw data
    • Parcellation
    • Networks
      – Nodes are regions of the brain
      – Edges are the correlations between pairs of nodes
  • 4. With big data come new complex data formats – data as graphs
    Microbiome Data
    • Sample from human, animal, field (soil), environment
    • Next Generation Sequencing ("write once, read never" data)
    • Genomic analysis processing
      – Annotation to taxonomic label (e.g., genus, species)
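Going from per-read taxonomic labels to a microbiome tree can be sketched as aggregating lineages into a count tree, where each node holds the number of reads assigned at or below it. The simplified (phylum, genus, species) lineages and the `build_count_tree` helper are hypothetical illustrations, not part of any specific annotation pipeline.

```python
from collections import Counter


def build_count_tree(lineages):
    """Aggregate read lineages (root-to-leaf label tuples) into node counts.

    Returns a Counter keyed by lineage prefix, so ("Firmicutes",) counts every
    read whose lineage starts with Firmicutes. The empty tuple is the root.
    """
    counts = Counter()
    for lineage in lineages:
        for depth in range(len(lineage) + 1):
            counts[tuple(lineage[:depth])] += 1
    return counts


# Hypothetical annotated reads, with simplified lineages.
reads = [
    ("Firmicutes", "Lactobacillus", "L. acidophilus"),
    ("Firmicutes", "Lactobacillus", "L. reuteri"),
    ("Firmicutes", "Clostridium"),  # annotated only to genus
    ("Bacteroidetes", "Bacteroides", "B. fragilis"),
]
tree = build_count_tree(reads)
print(tree[()])                               # 4
print(tree[("Firmicutes",)])                  # 3
print(tree[("Firmicutes", "Lactobacillus")])  # 2
```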
  • 6. Statistics infers properties of a whole population from a sample
    Sample to Population Inference
    • Collect a bunch of graphs – 1 per subject
    • Plot graphs
    • Estimate mean and variance (or, for graphs, g* and τ)
    • Does this plot teach us about the graphs in terms of how they are distributed and what the central tendency is?
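Under one concrete choice, estimating g* and τ has a closed form. Assume (our assumption; the slides leave the distance d arbitrary) that the set of graphs is every undirected graph on n labelled nodes and d is the Hamming distance, i.e. the number of the m = n(n-1)/2 possible edges on which two graphs disagree. Then the maximum-likelihood g* is the edge-wise majority graph, and τ̂ = log((m - d̄)/d̄), where d̄ is the mean distance from the sample to ĝ*. The helper names are hypothetical.

```python
import math
from itertools import combinations


def hamming(g1: frozenset, g2: frozenset) -> int:
    """Number of node pairs that are an edge in exactly one of the two graphs."""
    return len(g1 ^ g2)


def fit_gibbs_hamming(graphs: list[frozenset], n_nodes: int) -> tuple[frozenset, float]:
    """MLE of (g*, tau) for a Gibbs model with Hamming distance.

    Assumption: the support is every undirected graph on n_nodes labelled nodes.
    """
    pairs = list(combinations(range(n_nodes), 2))
    m, n = len(pairs), len(graphs)
    # g*: keep an edge if it appears in more than half of the sample (ties dropped).
    g_star = frozenset(p for p in pairs if 2 * sum(p in g for g in graphs) > n)
    d_bar = sum(hamming(g_star, g) for g in graphs) / n
    if d_bar == 0:
        return g_star, math.inf  # every graph equals g*: no dispersion at all
    # Solve E[d] = m * e^{-tau} / (1 + e^{-tau}) = d_bar for tau.
    tau = math.log((m - d_bar) / d_bar)
    return g_star, tau


sample = [
    frozenset({(0, 1), (1, 2)}),
    frozenset({(0, 1), (1, 2), (0, 2)}),
    frozenset({(0, 1)}),
    frozenset({(0, 1), (1, 2)}),
]
g_star, tau = fit_gibbs_hamming(sample, n_nodes=3)
print(sorted(g_star), round(tau, 3))  # [(0, 1), (1, 2)] 1.609
```

Because ĝ* is the edge-wise majority, each edge disagrees with it in at most half the sample, so d̄ ≤ m/2 and τ̂ is never negative.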
  • 7. Does this plot teach us anything?
  • 8. Graphs are too complex – let's simplify
    • Network metrics: average connectivity, small world network
    • Species diversity, taxa counts, enterotype
  • 9. Many-to-one mapping is not necessarily a good way to simplify data for analysis
  • 10. Simplifying in fMRI and Microbiome
    fMRI
    • Average Node Connectivity (ANC)
    • Consider two brain scans
      – Patient 1: right half ANC = 10, left half ANC = 0
      – Patient 2: right half ANC = 5, left half ANC = 5
    • Both have whole-brain ANC = 5 (assuming the two halves have equal numbers of nodes)
    Microbiome
    • Species Diversity
    • Consider two samples
      – Patient 1: proportion of taxa A, B, C = 1/3 each; taxa D, E, F = 0
      – Patient 2: proportion of taxa A, B, C = 0; taxa D, E, F = 1/3 each
    • Both have Simpson index (Σp²) = 0.33
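The many-to-one problem on this slide can be checked numerically. The sketch below takes ANC to mean the average node degree (an assumption; the slide does not define it), gives each brain half two nodes, and computes the Simpson index Σp² for the two microbiome samples. Helper names are hypothetical. Exact fractions avoid rounding noise.

```python
from fractions import Fraction


def average_degree(degrees: list[int]) -> Fraction:
    """Average node connectivity, taken here to mean mean node degree (an assumption)."""
    return Fraction(sum(degrees), len(degrees))


def simpson_index(proportions: list[Fraction]) -> Fraction:
    """Simpson's index: sum of squared proportions (chance two draws with replacement share a taxon)."""
    return sum((p * p for p in proportions), Fraction(0))


# fMRI: node degrees listed as (right half..., left half...), two nodes per half.
patient1_deg = [10, 10, 0, 0]
patient2_deg = [5, 5, 5, 5]
print(average_degree(patient1_deg), average_degree(patient2_deg))  # 5 5

# Microbiome: proportions of taxa A, B, C, D, E, F.
third = Fraction(1, 3)
patient1_taxa = [third, third, third, 0, 0, 0]
patient2_taxa = [0, 0, 0, third, third, third]
print(simpson_index(patient1_taxa), simpson_index(patient2_taxa))  # 1/3 1/3
```

In both cases the summaries agree exactly while the underlying data share nothing, which is the information the summary throws away.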
  • 11. We analyze graphical data the same way as we analyze columns of data
    Statistics on Graphs: Gibbs distribution
    • Let 𝒢 be a finite set of graphs and denote the elements of 𝒢 by g. Let d be an arbitrary distance metric on 𝒢. The Gibbs distribution on the graphs 𝒢 is

    $$\mathbb{P}(g;\, g^*, \tau) = c(g^*, \tau)\, \exp\{-\tau\, d(g^*, g)\}, \qquad \forall g \in \mathcal{G},$$

    with parameters g*, the central or average graph, and τ, a non-negative number that measures how tightly the observed connectome data concentrate around g* (larger τ means less dispersion). The normalizing constant is c(g*, τ) = [Σ_{g ∈ 𝒢} exp{-τ d(g*, g)}]⁻¹, and ℙ(g_i; g*, τ) is the probability of observing a specific graph g_i given the parameters g* and τ.
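For a small set of graphs the normalizing constant can be computed by brute force. The sketch below assumes 𝒢 is all 2³ = 8 undirected graphs on three labelled nodes and d is the Hamming distance (both choices are ours; the slide leaves d arbitrary), and evaluates ℙ(g; g*, τ) by enumerating 𝒢. For this choice c(g*, τ) = (1 + e^{-τ})^{-3}, which the enumeration reproduces.

```python
import math
from itertools import chain, combinations

NODES = 3
PAIRS = list(combinations(range(NODES), 2))  # the m = 3 possible edges


def all_graphs():
    """Every undirected simple graph on NODES labelled nodes, as frozensets of edges."""
    subsets = chain.from_iterable(combinations(PAIRS, k) for k in range(len(PAIRS) + 1))
    return [frozenset(s) for s in subsets]


def hamming(g1, g2):
    """Assumed distance d: number of node pairs that are an edge in exactly one graph."""
    return len(g1 ^ g2)


def gibbs_pmf(g_star, tau):
    """Return {g: P(g; g*, tau)} with c(g*, tau) computed by summing over all graphs."""
    graphs = all_graphs()
    weights = {g: math.exp(-tau * hamming(g_star, g)) for g in graphs}
    c = 1.0 / sum(weights.values())  # normalizing constant c(g*, tau)
    return {g: c * w for g, w in weights.items()}


g_star = frozenset({(0, 1), (1, 2)})
pmf = gibbs_pmf(g_star, tau=1.0)
print(round(sum(pmf.values()), 12))  # 1.0
print(pmf[g_star], (1 + math.exp(-1.0)) ** -3)  # enumeration vs. closed-form c(g*, 1)
```

Enumeration only works for tiny graphs: the number of graphs grows as 2^m, so realistic connectomes need a closed form (as here) or an approximation of c(g*, τ).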
  • 12. We analyze graphical data the same way as we analyze columns of data
    Statistics on Graphs: Recursive partitioning
    • Regress the graphs on covariates
    • In this example of Parkinson's disease
      – Y = connectome
      – X = group, sex, age
    • RP splits the connectomes into homogeneous groups based on the Gibbs likelihood
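One split of recursive partitioning can be sketched by comparing the maximized Gibbs log-likelihood of the pooled connectomes with the sum over two child groups, and choosing the covariate cut-point with the largest gain. As a hypothetical choice, the sketch assumes Hamming distance over all undirected graphs on n labelled nodes; then the maximized log-likelihood reduces to D log p + (N - D) log(1 - p), with D the total distance to the group's ĝ* (edge-wise majority), N = (number of graphs) × m, and p = D/N. Function names, ages, and graphs are made up, and a full tree would recurse on each child with a stopping rule.

```python
import math
from itertools import combinations


def max_loglik(graphs, n_nodes):
    """Maximized Gibbs log-likelihood, assuming Hamming distance over all graphs on n_nodes."""
    pairs = list(combinations(range(n_nodes), 2))
    n = len(graphs)
    g_star = frozenset(p for p in pairs if 2 * sum(p in g for g in graphs) > n)
    D = sum(len(g_star ^ g) for g in graphs)  # total Hamming distance to g*
    N = n * len(pairs)                         # total number of edge slots
    p = D / N
    # Bernoulli-form log-likelihood; 0 * log(0) terms are taken as 0.
    ll = 0.0
    if D > 0:
        ll += D * math.log(p)
    if N - D > 0:
        ll += (N - D) * math.log(1 - p)
    return ll


def best_split(graphs, covariate, n_nodes):
    """Return (cut, gain) for the split covariate <= cut; cut is None if nothing improves."""
    pooled = max_loglik(graphs, n_nodes)
    best = (None, 0.0)
    for cut in sorted(set(covariate))[:-1]:  # cut-points that leave both children non-empty
        left = [g for g, x in zip(graphs, covariate) if x <= cut]
        right = [g for g, x in zip(graphs, covariate) if x > cut]
        gain = max_loglik(left, n_nodes) + max_loglik(right, n_nodes) - pooled
        if gain > best[1]:
            best = (cut, gain)
    return best


# Hypothetical connectomes on 3 regions: younger subjects share one pattern, older another.
ages = [50, 55, 60, 70, 75, 80]
young = frozenset({(0, 1)})
old = frozenset({(1, 2), (0, 2)})
graphs = [young, young, young, old, old, old]
cut, gain = best_split(graphs, ages, n_nodes=3)
print(cut, round(gain, 3))
```

The gain can never be negative, because the pooled parameters are always a feasible fit for each child; in practice a minimum gain or child size would stop the recursion.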
  • 13. What else can be analyzed with graphical OODA (object oriented data analysis)? IoT, blockchain, cybersecurity
  • 14. What about data which are functional objects?
    Untargeted Metabolomics
    • Liquid chromatography/mass spectrometry (LC/MS)
    • RT × m/z plots (retention time by mass-to-charge ratio)
    • Which peaks correspond to metabolites (known or unknown), and which peaks differ between patients who live and patients who die?
  • 15. RT × m/z plots are too complex – let's simplify
    Looking for things that look different and then testing them statistically is wrong: the P values don't mean anything in these cases, because the hypotheses were chosen after looking at the same data.
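A quick simulation makes this point concrete. Assume (hypothetically) 100 peaks measured in 10 patients who live and 10 who die, with no true difference in any peak and unit-variance noise. Picking the peak whose group means look most different and then testing only that peak rejects at the 5% level far more often than 5%, while testing one peak fixed in advance stays near 5%. A two-sample z-test with known variance stands in for a t-test only to keep to the Python standard library.

```python
import math
import random


def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming known unit variance."""
    se = math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rates(n_peaks=100, n_per_group=10, reps=500, alpha=0.05, seed=1):
    """Fraction of null experiments called 'significant' with and without peak selection."""
    rng = random.Random(seed)
    selected_hits = prespecified_hits = 0
    for _ in range(reps):
        # Every peak is pure noise: no real live/die difference anywhere (assumption).
        peaks = [([rng.gauss(0, 1) for _ in range(n_per_group)],
                  [rng.gauss(0, 1) for _ in range(n_per_group)]) for _ in range(n_peaks)]
        p_values = [z_test_p(live, die) for live, die in peaks]
        # "Looks most different" = smallest p-value, then tested as if chosen in advance.
        selected_hits += min(p_values) < alpha
        prespecified_hits += p_values[0] < alpha  # peak 0 fixed before seeing data
    return selected_hits / reps, prespecified_hits / reps


selected, prespecified = rejection_rates()
print(f"after selection: {selected:.3f}, prespecified: {prespecified:.3f}")
```

With 100 independent null peaks, the chance that at least one falls below 0.05 is 1 - 0.95¹⁰⁰ ≈ 0.99, which is roughly what the "after selection" rate shows.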
  • 16. Why not analyze functions using functional OODA?
  • 17. Why not analyze functions using functional OODA?
  • 18. Projects in object oriented data analysis

| Field | Enabling technology | Bioinformatics | Exploratory analysis | Translational statistics |
|---|---|---|---|---|
| Microbiome | Next generation sequencing | Assembly, annotation, chimera checking | Cluster analysis, multidimensional scaling, heatmaps | Dirichlet-multinomial for taxa counts; Gibbs distribution for taxonomic trees |
| Brain imaging | Functional MRI (fMRI) | Image registration, parcellation | Generalized linear models with multiple testing adjustment, graph metrics | Gibbs distribution for connectome |
| Metabolomics | LC/MS | Peak detection, centering | Mass univariate testing with multiple testing adjustment | Functional data analysis, Gibbs distribution, Co-Inertia, and the Exploratory-Validation Model for experimental design |
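The Dirichlet-multinomial listed for taxa counts is a standard overdispersed model for a count vector x = (x_1, …, x_K) with total n and parameters α_k > 0: P(x | α) = [n! Γ(A) / Γ(n + A)] Π_k Γ(x_k + α_k) / (x_k! Γ(α_k)), with A = Σ α_k. A minimal sketch of its log-probability using only `math.lgamma` is below; the α values are made up, and fitting α to data is not shown.

```python
import math
from itertools import product


def dm_log_pmf(x, alpha):
    """log P(x | alpha) for the Dirichlet-multinomial with total count n = sum(x)."""
    n, a_sum = sum(x), sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a_sum) - math.lgamma(n + a_sum)
    for xk, ak in zip(x, alpha):
        out += math.lgamma(xk + ak) - math.lgamma(xk + 1) - math.lgamma(ak)
    return out


# Sanity check: probabilities over every count vector with total n sum to 1.
alpha = [0.5, 1.0, 2.0]  # hypothetical parameters for three taxa
n = 4
total = sum(math.exp(dm_log_pmf(x, alpha))
            for x in product(range(n + 1), repeat=len(alpha)) if sum(x) == n)
print(round(total, 10))  # 1.0
```

Working on the log scale keeps the computation stable for the large read counts typical of sequencing data.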