Monkeying Around: Automatically Analyzing Malaria Infections in Rhesus Macaques
Lindsay Hexter
Motivation
★ Why Malaria?
○ 3.2 billion people at risk globally
★ Why begin at E04?
○ To automate the analysis in Joyner et al. 2016, as a foundation for other experiments
○ Compare human vs. machine analysis
Motivation: Why Automation?
★ High-dimensional data
★ Precision of data analysis
★ Time: the same analysis can be re-applied to new data
★ Discover something new
★ Non-experts can still draw conclusions from the data
Purpose
★ Automatic analysis of E04
○ Come up with a framework to study small, differently shaped datasets
★ Apply this framework to other experiments
○ Generalizable to other parasite strains? How and why do the results change?
Overview of E04 Experiment
★ P. cynomolgi as a model for P. vivax
★ Rhesus macaques as a model for humans
★ Provide data for a better understanding of the infection, leading to better treatments!
Dataset
★ 5 monkeys: 2 non-severe, 1 severe, 1 very severe, 1 lethal
★ Clinical parameters measured daily over the course of the experiment; the analysis focused specifically on blood-related parameters
Techniques: Data preprocessing / Normalization
★ Scaling, missing data?
★ Normalize data so that every parameter contributes comparably to distance-metric computations
Figures: Unit vector normalization | Min-max normalization
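The two normalizations named above can be sketched as follows. This is a minimal illustration, assuming each clinical parameter's daily series is normalized on its own; the slides do not say which axis was used, and the function names are hypothetical.

```python
import numpy as np

def unit_vector_normalize(x):
    """Scale a series to unit Euclidean (L2) norm; the curve's shape is kept,
    only its overall magnitude changes."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x

def min_max_normalize(x):
    """Rescale a series linearly onto [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

# Hypothetical daily values for one clinical parameter (illustrative only).
series = [3.0, 4.0, 0.0]
print(unit_vector_normalize(series))
print(min_max_normalize(series))
```

Both remove absolute scale, which is what makes distance metrics comparable across parameters. It is also why, as the next slide notes, normalization can erase magnitude differences that carry real phase information.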
Normalization reduced important variation
★ Used to reduce noise, but in this case it removed important information about the malaria phases
Techniques: Iterations on Curve-fitting
★ Two ways of thinking:
○ Guess x Gaussians from x peaks, then divide the window evenly into x chunks and fit each chunk (e.g., 3 peaks: divide the window into 3 equal ranges and fit each one); this was the original logic behind an "n_gaussians" function
○ Guess x Gaussians from x peaks, but divide the windows according to the peak ranges instead, and fit a single function over the whole interval
★ Parameter search space?
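The first way of thinking (split the window evenly, fit one Gaussian per chunk) can be sketched like this. It is a hypothetical reconstruction of the idea behind an `n_gaussians` function, not the author's code; the initial-guess heuristic (tallest point, width of a quarter of the chunk) is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma):
    """Single Gaussian bump: amp * exp(-(x - mu)^2 / (2 sigma^2))."""
    return amp * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def n_gaussians(x, y, n):
    """Approach 1 (sketch): split the window into n equal chunks and fit one
    Gaussian to each chunk independently. Returns a list of (amp, mu, sigma)."""
    fits = []
    for xs, ys in zip(np.array_split(x, n), np.array_split(y, n)):
        # Hypothetical initial guess: tallest point in the chunk,
        # width of about a quarter of the chunk.
        p0 = [ys.max(), xs[np.argmax(ys)], max((xs[-1] - xs[0]) / 4, 1.0)]
        popt, _ = curve_fit(gaussian, xs, ys, p0=p0, maxfev=10000)
        fits.append(popt)
    return fits

# Synthetic two-peak curve (illustrative, not experimental data).
days = np.arange(40.0)
curve = gaussian(days, 5.0, 10.0, 3.0) + gaussian(days, 3.0, 30.0, 4.0)
for amp, mu, sigma in n_gaussians(days, curve, n=2):
    print(f"amp={amp:.2f}, mu={mu:.2f}, sigma={abs(sigma):.2f}")
```

The weakness of this approach is visible in the design: equal chunks ignore where the peaks actually sit, so a peak near a chunk boundary gets clipped, which motivates dividing the windows by peak ranges instead.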
Techniques: Iterations on Curve-fitting
★ Two approaches: minimizing a user-defined residual between fit and data vs. relying on a built-in loss
○ Curve-fitting by minimizing a user-defined loss function: scipy.optimize minimize, leastsq, fmin_slsqp
○ Curve-fitting with a built-in (least-squares) loss function: scipy.optimize curve_fit
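The two fitting approaches above can be contrasted on one model: a sum of Gaussians fit over the whole interval (the second way of thinking), with initial centers taken from detected peaks. This is a sketch, assuming a squared-error loss for the user-defined case; the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit, minimize

def sum_of_gaussians(x, *p):
    """Sum of len(p) // 3 Gaussians, p = [amp1, mu1, sigma1, amp2, mu2, sigma2, ...]."""
    y = np.zeros_like(x, dtype=float)
    for amp, mu, sigma in zip(p[0::3], p[1::3], p[2::3]):
        y = y + amp * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return y

def fit_builtin_loss(x, y, p0):
    """Built-in loss: curve_fit minimizes the sum of squared residuals internally."""
    popt, _ = curve_fit(sum_of_gaussians, x, y, p0=p0, maxfev=20000)
    return popt

def fit_user_loss(x, y, p0):
    """User-defined loss: we write the residual ourselves and hand it to minimize,
    so the loss can be swapped (e.g. absolute error, phase weights) freely."""
    def loss(p):
        return np.sum((sum_of_gaussians(x, *p) - y) ** 2)
    return minimize(loss, p0).x

# Synthetic two-peak curve; p0 centers would normally come from a peak finder.
days = np.arange(40.0)
curve = sum_of_gaussians(days, 5.0, 10.0, 3.0, 3.0, 30.0, 4.0)
p0 = [4.0, 12.0, 2.0, 2.0, 28.0, 3.0]
print(fit_builtin_loss(days, curve, p0))
print(fit_user_loss(days, curve, p0))
```

Because `sum_of_gaussians` takes `*p`, `curve_fit` cannot infer the number of parameters, so `p0` is required; its length also fixes how many Gaussians are fit.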
Techniques: Iterations on Curve-fitting
★ Fitting window? Peakutils, scipy peak-finding...
Result: Curve fitting / peak finding
Figures: My peak function (includes plateaux!) | Peakutils (results in poor fit)
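Handling plateaux is the key difference highlighted above. As a swapped-in illustration (not the author's peak function, and not Peakutils), SciPy's `find_peaks` can report flat-topped peaks when `plateau_size` is set; the series below is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical parasitemia-like series with one flat-topped peak (illustrative only).
series = np.array([0, 1, 3, 3, 3, 1, 0, 2, 5, 2, 0], dtype=float)

# plateau_size=1 makes find_peaks report the extent of every peak, flat or not;
# for a flat peak the middle index is returned (rounded down).
peaks, props = find_peaks(series, plateau_size=1)
for idx, left, right in zip(peaks, props["left_edges"], props["right_edges"]):
    print(f"peak at day {idx}, plateau spans days {left}-{right}")
```

The plateau edges give a natural fitting window for each Gaussian, which is what the window-by-peak-range approach needs.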
Data is cleaned, scaled, and normalized, and represented mathematically as concatenated Gaussian functions; now on to the analysis...
Analysis Roadmap: Goal + Technique
★ Relationship among clinical parameters ⇒ Regression modeling
★ Representation of monkeys in vector space ⇒ Residual matrices
★ Automatic grouping of clinical parameters in this vector space ⇒ Clustering
★ Minimizing biological noise to increase similarity among monkeys of similar phenotype ⇒ Bayesian optimization
Analyses we can find automatically
Joyner et al. 2016 finding ⇒ Automated analyses
★ # Reticulocytes - possible indicator of lethal phenotype ⇒ ✅
★ Anemic phenotype worsens with severity ⇒ ✅
★ Relationship between hemoglobin and: parasitemia kinetics, mean corpuscular volume (red blood cell size) ⇒ ✅
★ Role of thrombocytopenia (platelet deficiency) not well understood ⇒ ✅ + insight?
★ Lower parasitemia in non-severe phenotype ⇒ ✅
Techniques: Regression
★ Ridge regression
★ Combined model: Stochastic Gradient Descent Regressor, with the weight for each monkey as a measure of predictor significance
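Ridge regression fits per-monkey coefficients whose signs and sizes can then be compared across monkeys. A minimal closed-form sketch, assuming rows are days, columns are predictor clinical parameters, and the target is one other parameter (the slides do not name the target, so the setup here is hypothetical).

```python
import numpy as np

def ridge_coefficients(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data:
    beta = (X^T X + alpha * I)^(-1) X^T y.
    Centering absorbs the intercept, so it is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)

# Hypothetical single-monkey data: 60 days x 3 predictor parameters (synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=60)
print(ridge_coefficients(X, y, alpha=1.0))
```

scikit-learn's `sklearn.linear_model.Ridge(alpha=...)` solves the same penalized least-squares problem. Fitting it once per monkey and comparing `coef_` vectors is one way to reproduce the "similar coefficients for non-severe monkeys" comparison.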
Results: Regression
★ Coefficients for the two non-severe monkeys are much more similar to each other than to those of the other monkeys
★ Suggests a 'normal' phenotype, with sick monkeys as anomalies
Results: Regression
★ # Reticulocytes has the largest positive coefficient for the lethal phenotype: a possible indicator
★ MCV (red blood cell size) as another distinguishing factor?
# Reticulocytes - possible indicator of lethal phenotype ✅
Anemic phenotype worsens with severity ✅
Results: Regression
★ Hgb shows the same relationship within the two non-severe monkeys and within the other group
★ The Hgb coefficient is negative in the other group, as compared to positive in the non-severe monkeys
★ Hgb unrelated to MCV, as found in the paper
Relationship between Hgb and: parasitemia, MCV ✅
Results: Combined Regression
★ Possible representation of the non-severe phenotype via low regression weights
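One reading of the combined model is that a target monkey's series is predicted as a weighted combination of the other monkeys' series, with the learned weight per monkey read as that monkey's significance. The sketch below assumes that reading and uses plain stochastic gradient descent on squared error; the function name and data are hypothetical.

```python
import numpy as np

def combined_weights_sgd(others, target, lr=0.01, epochs=50, seed=0):
    """Plain SGD on squared error for target ≈ others @ w.
    others: (n_samples, n_monkeys) array, one column per non-target monkey.
    Returns one weight per non-target monkey."""
    rng = np.random.default_rng(seed)
    others = np.asarray(others, dtype=float)
    target = np.asarray(target, dtype=float)
    w = np.zeros(others.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(target)):
            err = others[i] @ w - target[i]   # prediction error on one sample
            w -= lr * err * others[i]         # gradient step on 0.5 * err**2
    return w

# Synthetic data: the target is an exact mix of three other "monkeys".
rng = np.random.default_rng(1)
others = rng.normal(size=(200, 3))
target = others @ np.array([0.5, 0.3, 0.2])
print(combined_weights_sgd(others, target))
```

scikit-learn's `SGDRegressor` implements the same idea with a regularization penalty and learning-rate schedule by default, so its weights would be slightly shrunk compared with this unregularized version.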
Results: Phased Regression
★ Did not improve mean squared error over the whole interval: peak-finding worked well for Gaussian fitting, but not for finding phases automatically
★ Also, because evaluation was done pre-shifting, some of the phases may not have aligned (resulting in larger errors for the combined models, shown in the table)
Target monkey        Sum of MSEs of all phases
RIc14 (non-severe)   34.288
RSb14 (non-severe)   29.817
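The table's metric, the sum of MSEs over all phases, can be computed as below. This sketch assumes phases are given as day indices where a new phase starts; how the phases were actually detected is not shown here.

```python
import numpy as np

def sum_of_phase_mses(y_true, y_pred, boundaries):
    """Split both series at the given day indices (phase boundaries) and
    return the sum of the per-phase mean squared errors.
    Boundaries must lie strictly inside the series so no phase is empty."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    total = 0.0
    for t, p in zip(np.split(y_true, boundaries), np.split(y_pred, boundaries)):
        total += np.mean((t - p) ** 2)
    return total

print(sum_of_phase_mses([1, 2, 3, 4], [1, 2, 4, 6], [2]))  # two phases of two days
print(sum_of_phase_mses([1, 2, 3, 4], [1, 2, 4, 6], []))   # whole interval, one phase
```

Summing per-phase MSEs is not the same as one MSE over the whole interval: here the phased sum is 2.5 while the whole-interval MSE is 1.25, so short, poorly fit phases weigh more heavily in the phased score.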
Techniques: Clustering
★ Clustering, e.g., k-means (also tried Gaussian means (G-means), agglomerative hierarchical, spectral, and BIRCH methods)
★ Evaluation via silhouette score: s = (b - a) / max(a, b)
○ a = avg dissimilarity (distance) to the other points in its own cluster
○ b = avg dissimilarity with the nearest neighboring cluster
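The silhouette score defined above can be computed directly from pairwise distances. A from-scratch sketch using Euclidean distance; `sklearn.metrics.silhouette_score` computes the same quantity and is the practical choice for comparing k-means, spectral, or BIRCH labelings.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all points, where
    a_i = mean distance to the other points in i's own cluster and
    b_i = mean distance to the points of the nearest other cluster.
    Points in singleton clusters get s_i = 0 (scikit-learn's convention).
    Requires at least two clusters."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # self-distance is 0, so just exclude it from the count
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Two well-separated 1-D clusters give a score close to 1.
print(silhouette_score([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1]))
```

Scores near 1 mean tight, well-separated clusters; scores near 0 mean points sit between clusters, which is how k (e.g. k = 4) can be chosen.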
Techniques: Residual Matrices - representing Monkeys in Vector Space
★ Construct residual matrices where each clinical parameter is characterized by the residual between two monkeys
Figure: matrix of monkey pairwise residuals / sign match, with one column per monkey pair (m1 vs. m2 ... m4 vs. m5) and one row per clinical parameter (gran, lymph, ..., wbc)
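The residual matrix above can be built with one row per clinical parameter and one column per monkey pair; with 5 monkeys that gives 10 columns, m1 vs. m2 through m4 vs. m5. This sketch assumes the residual is the sum of squared daily differences; the slide also mentions a sign-match variant whose exact definition is not given, so it is not implemented here.

```python
import itertools
import numpy as np

def residual_matrix(data):
    """data: dict mapping clinical parameter name -> array (n_monkeys, n_days).
    Returns (parameter names, monkey-index pairs, matrix) where entry [r, c] is
    the sum of squared differences between the two monkeys of pair c for
    parameter r."""
    params = list(data)
    n_monkeys = np.asarray(next(iter(data.values()))).shape[0]
    pairs = list(itertools.combinations(range(n_monkeys), 2))
    M = np.zeros((len(params), len(pairs)))
    for r, name in enumerate(params):
        series = np.asarray(data[name], dtype=float)
        for c, (i, j) in enumerate(pairs):
            M[r, c] = np.sum((series[i] - series[j]) ** 2)
    return params, pairs, M

# Hypothetical: 3 monkeys, 2 days, one parameter.
params, pairs, M = residual_matrix({"wbc": np.array([[0, 0], [1, 1], [0, 2]])})
print(pairs)
print(M)
```

Each row is then a point in "monkey-pair space", which is the vector space the clustering step operates on.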
Techniques: Bayesian Optimization
★ Bayesian Optimization
○ Motivation: derive insights from very complex functions
○ E.g., 7^20 * 7^20 = 7^40 ≈ 6.4 × 10^33 (!!!)
■ extremely computationally heavy to search exhaustively
○ Optimality of each guessed result is judged by a loss function (in our case, the residual between two monkeys)
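To make the idea concrete, here is a minimal Bayesian optimization loop over a single time shift between two monkeys, using the residual as the loss. It is an illustrative sketch, not the thesis implementation: the Gaussian-process surrogate with an RBF kernel (length scale 2), expected improvement as the acquisition function, and the overlap-based residual are all assumptions.

```python
import numpy as np
from scipy.stats import norm

def shift_residual(a, b, s):
    """Mean squared difference between series a and b after shifting b by s days
    (only the overlapping days are compared)."""
    n = len(a)
    diff = a[: n - s] - b[s:] if s >= 0 else a[-s:] - b[: n + s]
    return float(np.mean(diff ** 2))

def rbf(x1, x2, length=2.0):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / length ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """GP posterior mean/std at x_new, on standardized targets."""
    mu_y, sd_y = y_obs.mean(), (y_obs.std() or 1.0)
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_new)
    mean = mu_y + sd_y * (K_s.T @ np.linalg.solve(K, (y_obs - mu_y) / sd_y))
    var = np.clip(1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0), 1e-12, None)
    return mean, sd_y * np.sqrt(var)

def expected_improvement(mean, std, best):
    """EI for minimization: expected amount by which a point beats `best`."""
    z = (best - mean) / std
    return (best - mean) * norm.cdf(z) + std * norm.pdf(z)

def bayes_opt_shift(a, b, candidates, n_init=3, n_iter=8, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.asarray(list(candidates), dtype=float)
    tried = [int(i) for i in rng.choice(len(candidates), size=n_init, replace=False)]
    ys = [shift_residual(a, b, int(candidates[i])) for i in tried]
    for _ in range(n_iter):
        untried = [i for i in range(len(candidates)) if i not in tried]
        if not untried:
            break
        y_arr = np.array(ys)
        mean, std = gp_posterior(candidates[tried], y_arr, candidates[untried])
        nxt = untried[int(np.argmax(expected_improvement(mean, std, y_arr.min())))]
        tried.append(nxt)
        ys.append(shift_residual(a, b, int(candidates[nxt])))
    best = int(np.argmin(ys))
    return int(candidates[tried[best]]), ys[best]

# Synthetic monkeys: b peaks 3 days after a.
t = np.arange(40.0)
a = np.exp(-((t - 15.0) ** 2) / 18.0)
b = np.exp(-((t - 18.0) ** 2) / 18.0)
print(bayes_opt_shift(a, b, range(-5, 6)))
```

With 11 candidate shifts and a budget of 11 evaluations, this toy search is effectively exhaustive. The payoff of Bayesian optimization comes when the budget is far smaller than the search space, as with the 7^40 combinations above.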
RSb14 (non-severe) and RIc14 (non-severe)
Results: Residual Matrices + Shifting
★ Shifting helped elucidate the trend between the non-severe monkeys
Reduced residual from ~57 to ~12
Lower parasitemia in non-severe phenotype ✅
Results: Residual Matrices + Shifting
★ Shifting helped elucidate the trend between the non-severe monkeys
Relationship between hemoglobin and: parasitemia kinetics ✅
Figures: Pre-shifting | Post-shifting
Results: Residual Matrices + Shifting
★ Monocytes: innate immune cells that also help activate the adaptive immune response, so important in the first phase? (prognostic of long-term survival?)
Results: Clustering + Shifting
Post-shifting (up to day 23), parasites / µL clustered with monocytes and reticulocytes, as previously mentioned
# Reticulocytes - possible indicator of lethal phenotype ✅
Results: Clustering + Shifting
★ Parameters clustered together over all days, k = 4: how are they related?
○ Granulocytes, lymphocytes, monocytes, platelets, # reticulocytes, reticulocyte concentration, white blood cell total count (immune response?)
○ Parasites / µL
○ Red blood cells / volume, hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular hemoglobin, red blood cell volume, mean platelet volume, total red blood cell count, red blood cell distribution width (red blood cell-related parameters?)
○ % reticulocytes (proportional to total red blood cell count)
Results: Normalization in clustering - tradeoff
Role of thrombocytopenia (platelet
deficiency) not well understood
✅ + insight?
Can we apply this methodology to NEW experiments?
E03: P. coatneyi Hackeri (i.e., a different parasite)
★ Different representation in vector space
★ Reticulocytes even further from the other clinical data (more significant in this parasite?)
★ More clusters: white blood cells / red blood cells again separated
E23: Iterative P. cynomolgi, new monkeys
★ Similar representation in vector space to E04
★ Thus, a way to characterize the malaria parasite?
Conclusions & Contributions & Future Work!
★ Comprehensive framework to analyze malaria experiments automatically, characterizing relationships among monkeys (severity phenotype?) and among clinical parameters (which are similarly important?), and characterizing the malaria parasite
○ Such that non-experts can analyze the data
○ Applicable to other experiments
○ Reduce TIME spent studying these results and provide more precise analysis
★ Future
○ Expand the existing framework with new methods
○ Apply the FULL framework to other experiments and build a more generalized model
○ More in-depth consideration of biological profiles
○ More comprehensive data storage to help automate the entire process from start to finish (especially Bayesian optimization)
★ I’ve learned a lot!
Thank you ...
★ Dr. Galinski and co. for running the experiments!
★ Thesis committee members - Dr. Eisen, Dr. Fossati, Dr. Prinz
★ My friends for being here!!
★ Dr. Choi for guiding me through the process and pushing me throughout my CS career!
Questions?
E23 combined model weights?
Monkey   Coefficient
RBg14    0.456473
ROh14    0.431664
RAd14    0.085915
RJn13    0.080195
ROc14    0.010533
RIb13    0.044158
