Jessica Minnier, OHSU,
Lewis & Clark College Mathematics Colloquium, 3.19.14
Math, Stats and CS
in Public Health and Medical Research
“Biostatistics (a portmanteau of biology and
statistics; sometimes referred to as biometry or
biometrics) is the application of statistics to a
wide range of topics in biology.” – Wikipedia
or, “What is Biostatistics?”
“Bioinformatics is an interdisciplinary scientific field
that develops methods for storing, retrieving,
organizing and analyzing biological data” – Wikipedia
“Computational biology involves the development
and application of data-analytical and theoretical
methods, mathematical modeling and computational
simulation techniques to the study of biological,
behavioral, and social systems.” – Wikipedia
Sample (n = 1)
¨  L&C mathematics major (2007), CS minor
¨  PhD in Biostatistics (2007-2012)
¤  “Inference and Prediction for High Dimensional Data via
Penalized Regression and Kernel Machine Methods”
¨  Postdoc (2012-2013)
¤  Cancer risk prediction with gene-environment
interactions
¨  Assistant Professor (2013-now)
v  Division of Biostatistics
v  Department of Public Health & Preventive Medicine
v  School of Medicine (soon to be School of Public Health)
v  Oregon Health & Science University
Outline
¨  Biostatistics and Bioinformatics/
Computational Biology
¤  More interesting definitions, research examples,
case studies
¤  Types of careers
¨  My trajectory
¤  LC math to grad school to jobs
¨  Resources and advice
Biostatistics, in the news.
Comics from Jim Borgman; XKCD; also fun:
http://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon
In summary:
A poor understanding of statistics makes everyone look bad.
Biostatistics, in the news
Forbes
Biostatistics, in the news
Applied math?
¨  Applied mathematics often studies deterministic
models (engineering and mechanics, population
models, cryptography)
¨  Some questions can’t be solved by deterministic
models, but a partial answer can be given with
statistics
¤  Does smoking cause lung cancer? (inference from
observational studies)
¤  Is it going to rain tomorrow? (stochastic model)
¤  Do statins lower cholesterol? (randomized trial)
Rafa Irizarry’s math major talk: https://www.youtube.com/watch?v=gXeWdvHKTQQ
Example data
¨  Collection of measurements from a sampled
population
¨  Measurements of a lab experiment
¨  Medical images of subjects’ brains over time
¨  Results of a clinical trial
¨  Gene expression from different types of cultured
tissue
¨  Simulated data modeling HIV progression
¨  Values from electronic medical records sampled
retrospectively
¨  3 million genetic mutations from 20,000 subjects
Brian Caffo’s MOOC: Biostatistics Bootcamp I, lecture 1
Inform medical decisions
¨  A large clinical trial in 2002 by the Women’s
Health Initiative was stopped early due to
preliminary data showing that hormone
replacement therapy had a negative health
impact.
¨  This data contradicted prior evidence on the
efficacy of HRT for postmenopausal women.
¨  Statistical decision to end the trial, prevent
further harm
Brian Caffo’s MOOC: Biostatistics Bootcamp I, lecture 1; JAMA 2002;288(3):321-333
Inform medical decisions
¨  Guidelines for mammogram screening
based on probabilities of false positives and
negatives, cost-benefit analyses, survival
analysis
¨  Analysis of adverse effects in a clinical trial
determines drug safety, dosage,
subpopulations
¨  Even general public must make decisions
about risk when making their own medical
decisions
¨  Experts cannot make decisions without data
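To make the false positive point concrete, here is a minimal sketch of Bayes' rule for a screening test. The prevalence, sensitivity, and specificity values are hypothetical numbers chosen for illustration; they are not taken from any mammography guideline.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test), computed with Bayes' rule."""
    true_positive = prevalence * sensitivity
    false_positive = (1.0 - prevalence) * (1.0 - specificity)
    return true_positive / (true_positive + false_positive)


def negative_predictive_value(prevalence, sensitivity, specificity):
    """P(no disease | negative test), computed with Bayes' rule."""
    true_negative = (1.0 - prevalence) * specificity
    false_negative = prevalence * (1.0 - sensitivity)
    return true_negative / (true_negative + false_negative)


if __name__ == "__main__":
    # Hypothetical screening scenario: rare disease, fairly accurate test.
    ppv = positive_predictive_value(0.01, 0.90, 0.90)
    npv = negative_predictive_value(0.01, 0.90, 0.90)
    print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

With a rare condition, most positive results are false positives even when the test is fairly accurate, which is one reason screening guidelines lean on these probabilities.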
Bioinformatics & Computational
biology
¨  Sequencing the human genome (aligning,
matching, searching)
¨  Algorithms for turning massive information from
electronic medical records into useful predictors
of disease progression
¨  Machine learning algorithms for risk prediction
models with large and complex data (imaging,
genetic)
¨  Analysis of networks (protein interactions,
genetic pathways, social behavior influencing
health outcomes)
¨  Simulation of complex data (methylation
patterns in the genome)
Biomathematics
¨  Mathematical models to study infectious
disease progression (in a population or in a
body’s cells)
¨  Steady-state simulations of cancer cell
growth
¨  Usually in joint biostatistics/biomathematics
or applied mathematics departments, some
epidemiology
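As one illustration of a deterministic infectious disease model, here is a minimal sketch of the classic SIR (susceptible, infected, recovered) compartment model, integrated with a simple Euler step. The function name and all parameter values are assumptions made for this example, not results from any study.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Euler integration of the SIR equations.

    beta  : transmission rate per day (hypothetical value in the demo)
    gamma : recovery rate per day (hypothetical value in the demo)
    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    result = simulate_sir(population=1000, initial_infected=1,
                          beta=0.5, gamma=0.1, days=100)
    peak_time, _, peak_infected, _ = max(result, key=lambda row: row[2])
    print(f"Peak of about {peak_infected:.0f} infected around day {peak_time:.0f}")
```

The same inputs always give the same curve, which is what makes the model deterministic; a stochastic version would draw the new infections and recoveries at random.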
Where do we work?
(non-random sample = my classmates)
¨  Assistant professors: OHSU School of Medicine, UNC School of Medicine, UIUC
Statistics Dept, University of New Mexico School of Medicine
¨  Consultant/Manager, Analysis Group
¨  Assistant Member, RAND Corporation
¤  Nonprofit global policy think tank
¨  Computational Biologist, Genentech
¨  Instructors: UPenn School of Medicine, Harvard School of Public Health
¨  Research Associate, Dana Farber Cancer Institute
¨  Statistician, Partners Health Care
¨  Other possibilities:
¤  Government: National Institutes of Health, Food and Drug Administration, Centers for Disease
Control and Prevention, WHO, health departments in foreign countries
¤  Google, Intel, etc.
¤  Liberal arts colleges or smaller universities focused on teaching
¤  Pharma, Consulting, Labs, Hospitals, Hospital Research Centers, Research Institutes,
Universities
Real data, please?
¨  Two examples…
Case study 1: RNA-Seq Data
¨  RNA sequencing (RNA-Seq) uses
Next Generation
Sequencing (NGS) to
measure the presence
and quantity of RNA in a
biological sample at a
moment in time
¨  Studies the dynamic
transcriptome of a cell
¨  The problem: Compare
expressions of genes in
heart vs. brain tissues?
Which genes are turned
off in heart and on in
brain?
Case study 1: RNA-Seq Data
¨  Step 1: Biologists collect samples, send to lab
for sequencing
¨  Step 2: Genetic material is transformed into
millions of ‘reads’
¤  AACTAGACCTGG
¨  Step 3: The reads are mapped to the genome,
transformed into counts for each gene
¨  Step 4: The distribution of gene counts for
different tissues is compared
RNA-seq: Step 3
¨  Step 3: The reads are mapped to the genome,
transformed into counts for each gene
¨  Computational biologists developed fast
searching algorithms to map a short read
(likely containing errors) to a genome with
millions of base pairs, much repetition, some
variability (SNPs)
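A toy sketch of Step 3, assuming exact matching only: each read is located in a tiny reference string and counted toward the gene whose interval contains it. The reference sequence, the gene coordinates, and the function names are all hypothetical, and this naive search is not how real aligners work; they must tolerate errors and scale to genomes with millions of base pairs.

```python
from collections import Counter

# Hypothetical toy genome: 12 bases of "geneA", a 3-base spacer, 15 bases of "geneB".
REFERENCE = "AACTAGACCTGG" + "TTT" + "ACGCGATCGATTACA"
# Hypothetical annotation as half-open intervals [start, end).
GENES = {"geneA": (0, 12), "geneB": (15, 30)}


def map_read(read, reference):
    """Return the start positions of every exact occurrence of the read."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def count_reads_per_gene(reads, reference, genes):
    """Count uniquely mapped reads that fall entirely inside each gene."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        hits = map_read(read, reference)
        if len(hits) != 1:
            continue  # skip unmapped and ambiguously mapped reads
        position = hits[0]
        for name, (start, end) in genes.items():
            if start <= position and position + len(read) <= end:
                counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    reads = ["AACTAG", "GACCTG", "CGATCG", "GGGGGG", "ATTACA"]
    print(count_reads_per_gene(reads, REFERENCE, GENES))
```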
RNA-seq: Step 3
¨  Bowtie (Langmead 2009
Genome Biology)
incorporated the Burrows-Wheeler
indexing algorithm to shorten
the mapping to less than a day
(it used to take days, if not
months)
http://www.cs.jhu.edu/~langmea/resources/lecture_notes/bwt_and_fm_index.pdf
¨  TopHat (Trapnell 2009
Bioinformatics) can detect
splicing junctions where
certain genes code for
multiple proteins via
alternatively spliced mRNA
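For intuition about the Burrows-Wheeler idea, here is a minimal textbook sketch of the transform, its inverse, and a backward search that counts exact pattern occurrences. It is not Bowtie's implementation: it sorts all rotations in memory and recounts characters at every step, which is only reasonable for toy strings. It assumes the terminator character "$" sorts before every other character in the text.

```python
from collections import Counter


def bwt(text, terminator="$"):
    """Burrows-Wheeler transform: last column of the sorted rotations."""
    if terminator in text:
        raise ValueError("text must not contain the terminator")
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def first_column_offsets(transformed):
    """Row index where each character's block starts in the sorted first column."""
    offsets, total = {}, 0
    counts = Counter(transformed)
    for ch in sorted(counts):
        offsets[ch] = total
        total += counts[ch]
    return offsets


def inverse_bwt(transformed, terminator="$"):
    """Recover the original text by walking backwards with the LF-mapping."""
    offsets = first_column_offsets(transformed)
    seen, ranks = Counter(), []
    for ch in transformed:
        ranks.append(seen[ch])  # how many times ch appeared before this row
        seen[ch] += 1
    row, chars = 0, []          # row 0 is the rotation starting with the terminator
    ch = transformed[row]
    while ch != terminator:
        chars.append(ch)
        row = offsets[ch] + ranks[row]
        ch = transformed[row]
    return "".join(reversed(chars))


def count_occurrences(transformed, pattern):
    """Backward search: number of exact occurrences of pattern in the text."""
    offsets = first_column_offsets(transformed)
    low, high = 0, len(transformed)  # half-open range of matching rows
    for ch in reversed(pattern):
        if ch not in offsets:
            return 0
        low = offsets[ch] + transformed[:low].count(ch)
        high = offsets[ch] + transformed[:high].count(ch)
        if low >= high:
            return 0
    return high - low


if __name__ == "__main__":
    transformed = bwt("AACTAGACCTGG")
    print(transformed, inverse_bwt(transformed), count_occurrences(transformed, "CT"))
```

The search touches the pattern one character at a time rather than scanning the whole text, which is the property that index-based aligners build on.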
RNA-seq: Step 4
¨  Step 4: The distribution of gene counts for
different tissues is compared
¨  Bioinformaticians and biostatisticians clean the
data, normalize the data, and conduct statistical
tests to determine if certain genes are
expressed in one tissue differently than another
¨  Tests based on models: negative binomial
distribution of counts, likelihood ratio tests
¨  Clustering algorithms
¨  Study genetic pathway enrichment, up- or down-
regulated genes
¨  Biologists then study these genes more closely
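A minimal sketch of a likelihood ratio test for one gene under a negative binomial model. It makes strong simplifying assumptions that real RNA-Seq tools do not: the counts are already normalized to comparable library sizes, and the dispersion (the `size` parameter) is treated as known rather than estimated. The counts in the demo are made up.

```python
import math


def nb_log_pmf(count, mean, size):
    """Log probability of a count under a negative binomial with the given
    mean and size (variance = mean + mean**2 / size). Requires mean > 0."""
    return (math.lgamma(count + size) - math.lgamma(count + 1) - math.lgamma(size)
            + size * math.log(size / (size + mean))
            + count * math.log(mean / (size + mean)))


def nb_log_likelihood(counts, mean, size):
    return sum(nb_log_pmf(count, mean, size) for count in counts)


def likelihood_ratio_test(group_a, group_b, size):
    """Test whether two groups share one mean, with the size parameter fixed.

    For fixed size, the maximum likelihood estimate of the mean is the
    sample mean, so no numerical optimization is needed here.
    Returns (test statistic, p-value from a chi-square with 1 df).
    """
    pooled = list(group_a) + list(group_b)
    means = [sum(g) / len(g) for g in (group_a, group_b, pooled)]
    if min(means) <= 0:
        raise ValueError("each group needs at least one nonzero count")
    log_lik_alt = (nb_log_likelihood(group_a, means[0], size)
                   + nb_log_likelihood(group_b, means[1], size))
    log_lik_null = nb_log_likelihood(pooled, means[2], size)
    statistic = max(0.0, 2.0 * (log_lik_alt - log_lik_null))
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    heart = [5, 8, 6, 7]         # hypothetical counts for one gene
    brain = [95, 110, 102, 99]   # hypothetical counts for the same gene
    print(likelihood_ratio_test(heart, brain, size=10.0))
```

In practice this test is run for thousands of genes at once, so the resulting p-values also need a multiple testing correction.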
Heatmap and
dendrogram from
cluster algorithm
comparing genes
in cultured mouse
heart and brain
tissues
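A dendrogram like the one described here records the order in which a hierarchical clustering algorithm merges items. Below is a minimal sketch of agglomerative clustering with average linkage and Euclidean distance; the linkage and distance used for the original figure are not stated, so both are assumptions, and the expression profiles are invented.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clustering(profiles):
    """profiles maps an item name (a gene or a sample) to a list of values.

    Repeatedly merges the two closest clusters and returns the merge
    history as (members_a, members_b, distance) tuples, which is the
    information a dendrogram draws.
    """
    clusters = [[name] for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                distance = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or distance < best[0]:
                    best = (distance, a, b)
        distance, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), distance))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return merges


if __name__ == "__main__":
    # Hypothetical log-expression values for three genes in four samples.
    toy = {
        "heart_1": [8.1, 0.2, 5.0],
        "heart_2": [7.9, 0.4, 5.1],
        "brain_1": [0.3, 9.0, 5.2],
        "brain_2": [0.5, 8.8, 4.9],
    }
    for members_a, members_b, distance in hierarchical_clustering(toy):
        print(members_a, "+", members_b, f"at distance {distance:.2f}")
```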
Case study 2:
Electronic Medical Records
¨  Medical and health records are
becoming increasingly digitized
¨  EMR can contain records of health
measurements (blood pressure),
diagnoses (depression), treatments
prescribed (statins), family history
information, and even detailed
descriptions of doctor visits (clinician
notes)
¨  Thousands of patients can have
dozens of records each; some have
just 2
¨  Question: How to select subjects with
bipolar disorder from a large pool of
patients?
Case study 2:
Electronic Medical Records
¨  Step 1: All the records must be collected, stored, put
in a database, managed, tracked
¨  Step 2: A small subset must be read by a team of
clinicians and scored as “case” versus “control”
¨  Step 3: Transform codes and paragraphs of words
into predictors of disease
¨  Step 4: Determine important predictors of disease
and build a prediction model with these variables
¨  Step 5: Validate the model, assess its performance
¨  Step 6: Implement the model in larger pool of
subjects to select the bipolar cases for a future
genetic study
EMR: Step 1
¨  Step 1: All the records must be collected,
stored, put in a database, managed, tracked
¨  Computer scientists and bioinformaticians
must perform these steps (SQL, anyone?
MUMPS? Python, perl…)
¨  Efficiency in this setting is no small task
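A small sketch of what Step 1 can look like with SQL, using Python's built-in sqlite3 module and an in-memory database. The table layout, column names, codes, and notes are hypothetical; real EMR systems are far larger and have their own schemas.

```python
import sqlite3


def build_database(records):
    """Load (patient_id, visit_date, code, note) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE visits (
               patient_id INTEGER NOT NULL,
               visit_date TEXT NOT NULL,
               code TEXT,
               note TEXT
           )"""
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", records)
    # An index on patient_id keeps per-patient lookups fast as the table grows.
    conn.execute("CREATE INDEX idx_visits_patient ON visits (patient_id)")
    conn.commit()
    return conn


def records_per_patient(conn):
    """Return {patient_id: number of records}."""
    rows = conn.execute(
        "SELECT patient_id, COUNT(*) FROM visits "
        "GROUP BY patient_id ORDER BY patient_id"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    toy_records = [  # made-up records with placeholder codes
        (1, "2013-01-05", "DX_A", "Patient reports reduced sleep"),
        (1, "2013-02-11", "DX_A", "Follow-up visit"),
        (1, "2013-06-30", "DX_B", "Medication adjusted"),
        (2, "2013-03-02", "DX_C", "Routine visit"),
        (2, "2013-09-14", "DX_C", "Routine visit"),
    ]
    connection = build_database(toy_records)
    print(records_per_patient(connection))
    connection.close()
```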
EMR: Step 3
¨  Step 3: Transform codes and paragraphs of
words into predictors of disease
¨  Natural language processing (NLP) is used
by bioinformaticians to mine the paragraphs
of data for terms that occur often in cases and
less often in controls
¨  Certain words in a doctor’s note become
possible predictors of disease
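A minimal sketch of the idea of finding terms that occur often in cases and less often in controls. It only lowercases the text, splits it into words, and compares the fraction of notes containing each word; real clinical NLP also handles negation, abbreviations, and medical vocabularies. The notes and the 0.5 threshold are invented for illustration.

```python
import re
from collections import Counter


def tokenize(note):
    """Set of lowercase alphabetic words that appear in one note."""
    return set(re.findall(r"[a-z]+", note.lower()))


def document_frequency(notes):
    """Number of notes in which each word appears at least once."""
    counts = Counter()
    for note in notes:
        counts.update(tokenize(note))
    return counts


def enriched_terms(case_notes, control_notes, min_difference=0.5):
    """Words whose note-level rate in cases exceeds the rate in controls
    by at least min_difference, sorted by the size of that gap."""
    case_counts = document_frequency(case_notes)
    control_counts = document_frequency(control_notes)
    results = []
    for term in case_counts:
        case_rate = case_counts[term] / len(case_notes)
        control_rate = control_counts[term] / len(control_notes)
        if case_rate - control_rate >= min_difference:
            results.append((term, case_rate, control_rate))
    return sorted(results, key=lambda item: (-(item[1] - item[2]), item[0]))


if __name__ == "__main__":
    cases = [  # hypothetical clinician notes for chart-reviewed cases
        "Patient reports manic episode and reduced sleep",
        "History of manic episodes; started lithium",
        "Mood swings, manic symptoms, lithium continued",
    ]
    controls = [  # hypothetical clinician notes for controls
        "Patient reports seasonal allergies",
        "Routine visit, blood pressure normal",
        "Reports reduced sleep due to work stress",
    ]
    for term, case_rate, control_rate in enriched_terms(cases, controls):
        print(f"{term}: cases {case_rate:.2f}, controls {control_rate:.2f}")
```

Each selected word can then become a 0/1 or count column in the prediction model's input.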
EMR: Step 4-6
¨  Step 4-6: Determine important predictors of
disease, build a prediction model with these
variables, assess/validate performance,
implement model
¨  Biostatisticians develop
¤  high dimensional regression methods or
machine learning methods
¤  to select important predictors and build models
¤  to predict outcomes based on a large number of
variables (e.g., LASSO, support vector
machines)
Regularized logistic regression with NLP predictors
Solution path for coefficients of predictors
based on adaptive LASSO
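A minimal sketch of L1-penalized (LASSO) logistic regression fit by proximal gradient descent, to show how the penalty sets some coefficients exactly to zero. This is the plain LASSO, not the adaptive LASSO used for the figure, and the step size, penalty values, and toy data are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def soft_threshold(value, threshold):
    """Shrink toward zero; values within the threshold become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(features, labels, penalty, step=0.5, iterations=2000):
    """Minimize mean log-loss + penalty * sum(|coefficients|).

    The intercept is not penalized. Returns (intercept, coefficients).
    """
    n, p = len(features), len(features[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(iterations):
        residuals = []
        for row, label in zip(features, labels):
            linear = intercept + sum(c * x for c, x in zip(coefs, row))
            residuals.append(sigmoid(linear) - label)
        grad_intercept = sum(residuals) / n
        grads = [sum(r * row[j] for r, row in zip(residuals, features)) / n
                 for j in range(p)]
        intercept -= step * grad_intercept
        # Gradient step on the smooth loss, then the L1 proximal step.
        coefs = [soft_threshold(c - step * g, step * penalty)
                 for c, g in zip(coefs, grads)]
    return intercept, coefs


if __name__ == "__main__":
    # Hypothetical indicators: column 0 = note mentions "manic", column 1 = noise term.
    X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0], [1, 1]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    for penalty in (1.0, 0.1, 0.01):
        print(penalty, fit_l1_logistic(X, y, penalty))
```

Refitting over a grid of penalty values and plotting each coefficient against the penalty gives a solution path like the one on this slide.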
Back to me.
¨  Began with Yung-Pin’s research project on
CpG islands (related to new field of
epigenetics)
¨  Enjoyed journal clubs/biostatistics meetings
at OHSU
¨  Pure math vs. applied math vs. something
else
¨  Did you want to be a doctor? Do you want to
help people?
¨  Ended up in grad school, what did I learn?
Biostatistics grad school
¨  Statistics ≠ pure math!
¨  A master's degree would have helped with intuition,
but those are not usually funded
¨  Research universities ≠ Lewis & Clark!
¨  Depend on self-teaching, your classmates,
and especially the T.A.’s to get by (when
interviewing, meet the students!)
¨  Light teaching load, (hopefully) heavy
collaborative/consulting load
¨  Lots of women in public health (like LC)!
¨  Grad school is always hard.
Bioinformatics grad school
¨  So far mostly the same
¨  More focused on biology
¨  Incorporating more biology training, wet
labs
¨  Software/Bioconductor/R package
development
¨  Diverging from traditional biostat?
Helpful classes
¨  Statistics and probability (obviously)
¨  All the computer science classes, ever (Python,
more C!)
¨  Linear algebra
¨  Genetics (molecular biology would have been
nice, though no biology required for biostat)
¨  Advanced calculus/real analysis (for theoretical
classes such as Prob II and Inference II and
writing my thesis, not always required)
¨  Discrete
¨  Abstract Algebra (don’t worry, not required
either)
¨  Liberal arts education in general
Helpful skills
¨  LaTeX
¨  R
¨  Python or Perl
¨  Unix, cluster/cloud computing
¨  Teaching/tutoring
¨  Research experience!
¨  Programming, software development
¨  C, Fortran
¨  GitHub
¨  You must enjoy talking to people, collaborating,
explaining math/stat/cs to non-mathematical
people!
Pros & Cons
Pros
¨  Interesting & meaningful research problems
¨  Always in demand, more so every day
¨  Collaborate with clinicians, biologists,
researchers of all kinds
¨  Salary isn’t too shabby
Cons
¨  Soft money :(
¨  Grants, grants, always grants (but not
necessarily our own)
Last thoughts
¨  Consider Epidemiology
¨  Applied vs. theoretical research
¨  My day: mostly programming and writing
code (cleaning data + analysis, simulations),
lots of meetings, a bit of pen-and-paper
research and thinking of new grants, reading
articles, reading clinical trial protocols,
sample size and power calculations
¨  This will vary depending on where you work
¨  Master's vs. PhD
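As an example of the sample size and power calculations mentioned above, here is a sketch of the standard normal-approximation formula for comparing two group means with equal group sizes and a common standard deviation, n per group = 2 (z_{1-alpha/2} + z_{power})^2 sd^2 / effect^2. The effect size and standard deviation in the demo are hypothetical, and a real protocol would usually refine this with a t-distribution, dropout, and other design details.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect, sd, alpha=0.05, power=0.80):
    """Subjects needed per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)


def power_for_sample_size(n_per_group, effect, sd, alpha=0.05):
    """Approximate power (ignores the negligible chance of rejecting in the wrong tail)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return NormalDist().cdf(abs(effect) / standard_error - z_alpha)


if __name__ == "__main__":
    n = sample_size_per_group(effect=0.5, sd=1.0)
    print(n, round(power_for_sample_size(n, effect=0.5, sd=1.0), 3))
```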
More talks like this
¨  Excellent overview of bioinformatics & computational biology fields and
careers in medicine by Dr. Shannon McWeeney (http://www.biodevlab.org/) at OHSU:
https://ohsu.adobeconnect.com/_a46054336/p61byw86754/?launcher=false&fcsContent=true&pbMode=normal
¨  Rafa Irizarry’s (at HSPH http://rafalab.dfci.harvard.edu/) math major talk:
https://www.youtube.com/watch?v=gXeWdvHKTQQ
¨  Plenty of interesting talks at JSM, the big statistical meeting/conference
(http://www.amstat.org/meetings/jsm.cfm); it will be nearby in Seattle in August 2015.
This year it is in Boston: http://www.amstat.org/meetings/jsm/2014/index.cfm
Learning resources
¨  Summer Institute for Training in Biostatistics (for undergrads)
http://www.nhlbi.nih.gov/funding/training/redbook/sibsweb.htm
¤  U Wisc at Madison, Columbia, Emory, Boston U, NC State, U of Iowa, U of Minnesota, U
of Pittsburgh (All of the websites have “What is Biostatistics?” pages)
¨  MOOCs (Massive Open Online Courses)
¤  Learn R
http://www.flaviobarros.net/2014/03/14/online-multimedia-resources-learn-r
¤  Learn biostats https://www.coursera.org/course/biostats
¤  Learn statistical learning
https://class.stanford.edu/courses/HumanitiesScience/StatLearning/Winter2014/about
¤  Learn bioinformatics http://www.langmead-lab.org/teaching-materials/ and
http://rosalind.info/problems/list-view/
¨  UW’s Summer Institutes (scholarships for students)
¤  Statistical Genetics; Statistics and Modeling in Infectious Diseases; Statistics for
Clinical Research
¨  Comprehensive list of job postings for statistics/biostatistics/bioinformatics:
http://www.stat.ufl.edu/jobs/
The internet
¨  YouTube
¤  Rafa Irizarry’s YouTube channel (especially
http://youtu.be/gXeWdvHKTQQ)
¨  Simply Statistics blog (http://simplystatistics.org/)
¨  R-bloggers
¨  Getting Genetics Done blog
(http://gettinggeneticsdone.blogspot.com/ )
¨  FiveThirtyEight (http://fivethirtyeight.com/)
¨  Neat summary measure of types of research
done in various departments (biased toward
east coast) https://muschellij2.shinyapps.io/ENAR_Over_Time/
Questions?
¨  minnier@ohsu.edu

More Related Content

What's hot

NLP tutorial at AIME 2020
NLP tutorial at AIME 2020NLP tutorial at AIME 2020
NLP tutorial at AIME 2020Rui Zhang
 
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
 
MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE
MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASEMISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE
MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASEIJDKP
 
Very brief overview of AI in drug discovery
Very brief overview of AI in drug discoveryVery brief overview of AI in drug discovery
Very brief overview of AI in drug discoveryDr. Gerry Higgins
 
September Journal Club -Aishwarya
September Journal Club -AishwaryaSeptember Journal Club -Aishwarya
September Journal Club -AishwaryaRSG Luxembourg
 
Analysis of Imbalanced Classification Algorithms A Perspective View
Analysis of Imbalanced Classification Algorithms A Perspective ViewAnalysis of Imbalanced Classification Algorithms A Perspective View
Analysis of Imbalanced Classification Algorithms A Perspective Viewijtsrd
 
Srge most important publications 2020
Srge most important  publications 2020Srge most important  publications 2020
Srge most important publications 2020Aboul Ella Hassanien
 
The Role of Statistician in Personalized Medicine: An Overview of Statistical...
The Role of Statistician in Personalized Medicine: An Overview of Statistical...The Role of Statistician in Personalized Medicine: An Overview of Statistical...
The Role of Statistician in Personalized Medicine: An Overview of Statistical...Setia Pramana
 
American Statistical Association October 23 2009 Presentation Part 1
American Statistical Association October 23 2009 Presentation Part 1American Statistical Association October 23 2009 Presentation Part 1
American Statistical Association October 23 2009 Presentation Part 1Double Check ĆŐNSULTING
 
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...CSCJournals
 
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...IJECEIAES
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifePeea Bal Chakraborty
 
Nomograms why when what Congres CURy 2009
Nomograms why when what Congres CURy 2009Nomograms why when what Congres CURy 2009
Nomograms why when what Congres CURy 2009Vincent H. Hupertan
 
Machine Learning Based Approaches for Prediction of Parkinson's Disease
Machine Learning Based Approaches for Prediction of Parkinson's Disease  Machine Learning Based Approaches for Prediction of Parkinson's Disease
Machine Learning Based Approaches for Prediction of Parkinson's Disease mlaij
 
DENGUE DETECTION AND PREDICTION SYSTEM USING DATA MINING WITH FREQUENCY ANALYSIS
DENGUE DETECTION AND PREDICTION SYSTEM USING DATA MINING WITH FREQUENCY ANALYSISDENGUE DETECTION AND PREDICTION SYSTEM USING DATA MINING WITH FREQUENCY ANALYSIS
DENGUE DETECTION AND PREDICTION SYSTEM USING DATA MINING WITH FREQUENCY ANALYSIScsandit
 
Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...Andrei KUCHARAVY
 
FunGen JC Presentation - Mostafavi et al. (2019)
FunGen JC Presentation - Mostafavi et al. (2019)FunGen JC Presentation - Mostafavi et al. (2019)
FunGen JC Presentation - Mostafavi et al. (2019)BrianSchilder
 

What's hot (19)

NLP tutorial at AIME 2020
NLP tutorial at AIME 2020NLP tutorial at AIME 2020
NLP tutorial at AIME 2020
 
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
 
AI for drug discovery
AI for drug discoveryAI for drug discovery
AI for drug discovery
 
MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE
MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASEMISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE
MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE
 
Very brief overview of AI in drug discovery
Very brief overview of AI in drug discoveryVery brief overview of AI in drug discovery
Very brief overview of AI in drug discovery
 
September Journal Club -Aishwarya
September Journal Club -AishwaryaSeptember Journal Club -Aishwarya
September Journal Club -Aishwarya
 
Analysis of Imbalanced Classification Algorithms A Perspective View
Analysis of Imbalanced Classification Algorithms A Perspective ViewAnalysis of Imbalanced Classification Algorithms A Perspective View
Analysis of Imbalanced Classification Algorithms A Perspective View
 
Srge most important publications 2020
Srge most important  publications 2020Srge most important  publications 2020
Srge most important publications 2020
 
The Role of Statistician in Personalized Medicine: An Overview of Statistical...
The Role of Statistician in Personalized Medicine: An Overview of Statistical...The Role of Statistician in Personalized Medicine: An Overview of Statistical...
The Role of Statistician in Personalized Medicine: An Overview of Statistical...
 
American Statistical Association October 23 2009 Presentation Part 1
American Statistical Association October 23 2009 Presentation Part 1American Statistical Association October 23 2009 Presentation Part 1
American Statistical Association October 23 2009 Presentation Part 1
 
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...
Evaluation of Logistic Regression and Neural Network Model With Sensitivity A...
 
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
 
Nomograms why when what Congres CURy 2009
Nomograms why when what Congres CURy 2009Nomograms why when what Congres CURy 2009
Nomograms why when what Congres CURy 2009
 
Machine Learning Based Approaches for Prediction of Parkinson's Disease
Machine Learning Based Approaches for Prediction of Parkinson's Disease  Machine Learning Based Approaches for Prediction of Parkinson's Disease
Machine Learning Based Approaches for Prediction of Parkinson's Disease
 
DENGUE DETECTION AND PREDICTION SYSTEM USING DATA MINING WITH FREQUENCY ANALYSIS
DENGUE DETECTION AND PREDICTION SYSTEM USING DATA MINING WITH FREQUENCY ANALYSISDENGUE DETECTION AND PREDICTION SYSTEM USING DATA MINING WITH FREQUENCY ANALYSIS
DENGUE DETECTION AND PREDICTION SYSTEM USING DATA MINING WITH FREQUENCY ANALYSIS
 
Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...
 
FunGen JC Presentation - Mostafavi et al. (2019)
FunGen JC Presentation - Mostafavi et al. (2019)FunGen JC Presentation - Mostafavi et al. (2019)
FunGen JC Presentation - Mostafavi et al. (2019)
 
PMED Undergraduate Workshop - Modeling and Estimating Biological Heterogeneit...
PMED Undergraduate Workshop - Modeling and Estimating Biological Heterogeneit...PMED Undergraduate Workshop - Modeling and Estimating Biological Heterogeneit...
PMED Undergraduate Workshop - Modeling and Estimating Biological Heterogeneit...
 

Viewers also liked

REALIDAD AUMENTADA EVARISTO GLEZ PORTAS
REALIDAD AUMENTADA EVARISTO GLEZ PORTASREALIDAD AUMENTADA EVARISTO GLEZ PORTAS
REALIDAD AUMENTADA EVARISTO GLEZ PORTASVani González Portas
 
explain what values and attitudes are and describe their impact on managerial...
explain what values and attitudes are and describe their impact on managerial...explain what values and attitudes are and describe their impact on managerial...
explain what values and attitudes are and describe their impact on managerial...evangeline jumalon
 
Ensayo cientifico accesibilidad y circulacion peatonal. electiva iv.
Ensayo cientifico accesibilidad y circulacion peatonal. electiva iv.Ensayo cientifico accesibilidad y circulacion peatonal. electiva iv.
Ensayo cientifico accesibilidad y circulacion peatonal. electiva iv.erika acuña noriega
 
WEE-Nepal Updates_Final_8 Oct 2015
WEE-Nepal Updates_Final_8 Oct 2015WEE-Nepal Updates_Final_8 Oct 2015
WEE-Nepal Updates_Final_8 Oct 2015Keshab Bahadur Thapa
 
Minor disorder of pregnancy ppt
Minor disorder of pregnancy pptMinor disorder of pregnancy ppt
Minor disorder of pregnancy pptpinal darji
 

Viewers also liked (10)

Metodos de demanda vehicular
Metodos de demanda vehicularMetodos de demanda vehicular
Metodos de demanda vehicular
 
REALIDAD AUMENTADA EVARISTO GLEZ PORTAS
REALIDAD AUMENTADA EVARISTO GLEZ PORTASREALIDAD AUMENTADA EVARISTO GLEZ PORTAS
REALIDAD AUMENTADA EVARISTO GLEZ PORTAS
 
About-Callbox
About-CallboxAbout-Callbox
About-Callbox
 
Ing transito (1)
Ing transito (1)Ing transito (1)
Ing transito (1)
 
explain what values and attitudes are and describe their impact on managerial...
explain what values and attitudes are and describe their impact on managerial...explain what values and attitudes are and describe their impact on managerial...
explain what values and attitudes are and describe their impact on managerial...
 
Subdrenajes
SubdrenajesSubdrenajes
Subdrenajes
 
2016 Middle Market M&A Activity
2016 Middle Market M&A Activity2016 Middle Market M&A Activity
2016 Middle Market M&A Activity
 
Ensayo cientifico accesibilidad y circulacion peatonal. electiva iv.
Ensayo cientifico accesibilidad y circulacion peatonal. electiva iv.Ensayo cientifico accesibilidad y circulacion peatonal. electiva iv.
Ensayo cientifico accesibilidad y circulacion peatonal. electiva iv.
 
WEE-Nepal Updates_Final_8 Oct 2015
WEE-Nepal Updates_Final_8 Oct 2015WEE-Nepal Updates_Final_8 Oct 2015
WEE-Nepal Updates_Final_8 Oct 2015
 
Minor disorder of pregnancy ppt
Minor disorder of pregnancy pptMinor disorder of pregnancy ppt
Minor disorder of pregnancy ppt
 

Similar to Math, Stats and CS in Public Health and Medical Research

Amia tb-review-15
Amia tb-review-15Amia tb-review-15
Amia tb-review-15Russ Altman
 
biostatistics-220223232107.pdf
biostatistics-220223232107.pdfbiostatistics-220223232107.pdf
biostatistics-220223232107.pdfBagalanaSteven
 
Amia tbi-14-final
Amia tbi-14-finalAmia tbi-14-final
Amia tbi-14-finalRuss Altman
 
INBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria LópezINBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria LópezINBIOMEDvision
 
Amia tb-review-13
Amia tb-review-13Amia tb-review-13
Amia tb-review-13Russ Altman
 
Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei LinChien-Wei Lin
 
Curriculum_Vitae_Mark_Ebbert-modern
Curriculum_Vitae_Mark_Ebbert-modernCurriculum_Vitae_Mark_Ebbert-modern
Curriculum_Vitae_Mark_Ebbert-modernMark Ebbert
 
The Clinical Genome Conference 2014
The Clinical Genome Conference 2014The Clinical Genome Conference 2014
The Clinical Genome Conference 2014Nicole Proulx
 
Role of bioinformatics of drug designing
Role of bioinformatics of drug designingRole of bioinformatics of drug designing
Role of bioinformatics of drug designingDr NEETHU ASOKAN
 
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance
 
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...Human Variome Project
 
Amia tb-review-12
Amia tb-review-12Amia tb-review-12
Amia tb-review-12Russ Altman
 
BIOINFORMATICS Applications And Challenges
BIOINFORMATICS Applications And ChallengesBIOINFORMATICS Applications And Challenges
BIOINFORMATICS Applications And ChallengesAmos Watentena
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataChirag Patel
 
CV of Rong Chen
CV of Rong ChenCV of Rong Chen
CV of Rong ChenRong Chen
 

Similar to Math, Stats and CS in Public Health and Medical Research (20)

Amia tb-review-15
Amia tb-review-15Amia tb-review-15
Amia tb-review-15
 
biostatistics-220223232107.pdf
biostatistics-220223232107.pdfbiostatistics-220223232107.pdf
biostatistics-220223232107.pdf
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Amia tbi-14-final
Amia tbi-14-finalAmia tbi-14-final
Amia tbi-14-final
 
Qiu_CV_Feb12_2017
Qiu_CV_Feb12_2017Qiu_CV_Feb12_2017
Qiu_CV_Feb12_2017
 
JALANov2000
JALANov2000JALANov2000
JALANov2000
 
INBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria LópezINBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria López
 
Amia tb-review-13
Amia tb-review-13Amia tb-review-13
Amia tb-review-13
 
Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei Lin
 
Curriculum_Vitae_Mark_Ebbert-modern
Curriculum_Vitae_Mark_Ebbert-modernCurriculum_Vitae_Mark_Ebbert-modern
Curriculum_Vitae_Mark_Ebbert-modern
 
The Clinical Genome Conference 2014
The Clinical Genome Conference 2014The Clinical Genome Conference 2014
The Clinical Genome Conference 2014
 
Bioinformatics .pptx
Bioinformatics .pptxBioinformatics .pptx
Bioinformatics .pptx
 
Role of bioinformatics of drug designing
Role of bioinformatics of drug designingRole of bioinformatics of drug designing
Role of bioinformatics of drug designing
 
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier Datathon
 
Computational biology
Computational biologyComputational biology
Computational biology
 
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
 
Amia tb-review-12
Amia tb-review-12Amia tb-review-12
Amia tb-review-12
 
BIOINFORMATICS Applications And Challenges
BIOINFORMATICS Applications And ChallengesBIOINFORMATICS Applications And Challenges
BIOINFORMATICS Applications And Challenges
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big data
 
CV of Rong Chen
CV of Rong ChenCV of Rong Chen
CV of Rong Chen
 

Math, Stats and CS in Public Health and Medical Research

  • 1. Jessica Minnier, OHSU, Lewis & Clark College Mathematics Colloquium, 3.19.14 Math, Stats and CS in Public Health and Medical Research
  • 2. “Biostatistics (a portmanteau of biology and statistics; sometimes referred to as biometry or biometrics) is the application of statistics to a wide range of topics in biology.” – Wikipedia or,“What is Biostatistics?” “Bioinformatics is an interdisciplinary scientific field that develops methods for storing, retrieving, organizing and analyzing biological data” – Wikipedia “Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.” – Wikipedia
  • 3. Sample (n = 1) ¨  L&C mathematics major (2007), CS minor ¨  PhD in Biostatistics (2007-2012) ¤  “Inference and Prediction for High Dimensional Data via Penalized Regression and Kernel Machine Methods” ¨  Postdoc (2012-2013) ¤  Cancer risk prediction with gene-environment interactions ¨  Assistant Professor (2013-now) v  Division of Biostatistics v  Department of Public Health & Preventive Medicine v  School of Medicine (soon to be School of Public Health) v  Oregon Health & Science University
  • 4. Outline ¨  Biostatistics and Bioinformatics/ Computational Biology ¤  More interesting definitions, research examples, case studies ¤  Types of careers ¨  My trajectory ¤  LC math to grad school to jobs ¨  Resources and advice
  • 5. Biostatistics, in the news. Comics from Jim Borgman; XKCD; also fun: http://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon In summary: A poor understanding of statistics makes everyone look bad.
  • 6. Biostatistics, in the news Forbes
  • 8. Applied math? ¨  Applied mathematics often studies deterministic models (engineering and mechanics, population models, cryptography) ¨  Some questions can’t be solved by deterministic models, but a partial answer can be given with statistics ¤  Does smoking cause lung cancer? (inference from observational studies) ¤  Is it going to rain tomorrow? (stochastic model) ¤  Do statins lower cholesterol? (randomized trial) Rafa Irizarry’s math major talk: https://www.youtube.com/watch?v=gXeWdvHKTQQ
  • 9. Example data ¨  Collection of measurements from a sampled population ¨  Measurements of a lab experiment ¨  Medical images of subjects’ brains over time ¨  Results of a clinical trial ¨  Gene expression from different types of cultured tissue ¨  Simulated data modeling HIV progression ¨  Values from electronic medical records sampled retrospectively ¨  3 million genetic mutations from 20,000 subjects Brian Caffo’s MOOC: Biostatistics Bootcamp I, lecture 1
  • 10. Inform medical decisions ¨  A large clinical trial in 2002 by the Women’s Health Initiative was stopped early due to preliminary data showing that hormone replacement therapy had a negative health impact. ¨  These data contradicted prior evidence on the efficacy of HRT for postmenopausal women. ¨  Statistical decision to end the trial, prevent further harm Brian Caffo’s MOOC: Biostatistics Bootcamp I, lecture 1; JAMA 2002;288(3):321-333
  • 11. Inform medical decisions ¨  Guidelines for mammogram screening based on probabilities of false positives and negatives, cost-benefit analyses, survival analysis ¨  Analysis of adverse effects in a clinical trial determines drug safety, dosage, subpopulations ¨  Even the general public must weigh risk when making their own medical decisions ¨  Experts cannot make decisions without data
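The role of false positives in screening guidelines can be illustrated with Bayes' rule. The following is a minimal sketch: the prevalence, sensitivity, and specificity values are hypothetical numbers chosen for illustration, not figures from the talk or from any mammography study.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' rule."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs, for illustration only.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.90, specificity=0.90)
print(f"Probability of disease given a positive screen: {ppv:.3f}")
```

With a rare condition, most positive screens are false positives even when the test is fairly accurate, which is one reason screening guidelines depend on these probabilities.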
  • 12. Bioinformatics & Computational biology ¨  Sequencing the human genome (aligning, matching, searching) ¨  Algorithms for turning massive information from electronic medical records into useful predictors of disease progression ¨  Machine learning algorithms for risk prediction models with large and complex data (imaging, genetic) ¨  Analysis of networks (protein interactions, genetic pathways, social behavior influencing health outcomes) ¨  Simulation of complex data (methylation patterns in the genome)
  • 13. Biomathematics ¨  Mathematical models to study infectious disease progression (in a population or in a body’s cells) ¨  Steady-state simulations of cancer cell growth ¨  Usually in joint biostatistics/biomathematics or applied mathematics departments, some epidemiology
  • 14. Where do we work? (non-random sample = my classmates) ¨  Assistant professors: OHSU School of Medicine, UNC School of Medicine, UIUC Statistics Dept, University of New Mexico School of Medicine ¨  Consultant/Manager, Analysis Group ¨  Assistant Member, RAND Corporation ¤  Nonprofit global policy think tank ¨  Computational Biologist, Genentech ¨  Instructors: UPenn School of Medicine, Harvard School of Public Health ¨  Research Associate, Dana-Farber Cancer Institute ¨  Statistician, Partners HealthCare ¨  Other possibilities: ¤  Government: National Institutes of Health, Food & Drug Administration, Centers for Disease Control and Prevention, WHO, health departments in foreign countries ¤  Google, Intel, etc. ¤  Liberal arts colleges or smaller universities focused on teaching ¤  Pharma, Consulting, Labs, Hospitals, Hospital Research Centers, Research Institutes, Universities
  • 15. Real data, please? ¨  Two examples…
  • 16. Case study 1: RNA-Seq Data ¨  RNA sequencing uses Next Generation Sequencing (NGS) to detect and quantify RNA in a genetic sample at a moment in time ¨  Studies the dynamic transcriptome of a cell ¨  The problem: How does gene expression differ between heart and brain tissue? Which genes are turned off in heart and on in brain?
  • 17. Case study 1: RNA-Seq Data ¨  Step 1: Biologists collect samples, send to lab for sequencing ¨  Step 2: Genetic material is transformed into millions of ‘reads’ ¤  AACTAGACCTGG ¨  Step 3: The reads are mapped to the genome, transformed into counts for each gene ¨  Step 4: The distribution of gene counts for different tissues is compared
  • 18. RNA-seq: Step 3 ¨  Step 3: The reads are mapped to the genome, transformed into counts for each gene ¨  Computational biologists developed fast searching algorithms to map a short read (likely containing errors) to a genome with millions of base pairs, much repetition, some variability (SNPs)
  • 19. RNA-seq: Step 3 ¨  Bowtie (Langmead 2009 Genome Biology) incorporated the Burrows-Wheeler indexing algorithm to cut mapping time to less than a day (it used to take days, if not months) http://www.cs.jhu.edu/~langmea/resources/lecture_notes/bwt_and_fm_index.pdf ¨  TopHat (Trapnell 2009 Bioinformatics) can detect splicing junctions where certain genes code for multiple proteins via alternatively spliced mRNA
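The idea behind Burrows-Wheeler indexing can be sketched in a few lines. This is a toy illustration, not Bowtie's implementation: it builds the transform by sorting all rotations (far too slow for a real genome), handles exact matches only, and uses a made-up 12-base "genome". The function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    text += "$"  # sentinel; sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_matches(bwt_text, pattern):
    """Count exact occurrences of pattern with FM-index backward search."""
    symbols = sorted(set(bwt_text))
    # first[c]: number of characters in the text that sort before c
    first, total = {}, 0
    for c in symbols:
        first[c] = total
        total += bwt_text.count(c)
    # occ[c][i]: number of times c appears in bwt_text[:i]
    occ = {c: [0] for c in symbols}
    for ch in bwt_text:
        for c in symbols:
            occ[c].append(occ[c][-1] + (ch == c))
    lo, hi = 0, len(bwt_text)  # current range of matching sorted rotations
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


genome = "ACGTACGTTACG"  # made-up sequence for illustration
index = bwt(genome)
print(count_matches(index, "ACG"))
print(count_matches(index, "TT"))
```

The search walks the read from right to left and narrows a range in the index one character at a time, so its cost depends on the read length rather than the genome length. Real aligners add compressed occurrence tables and tolerance for mismatches.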
  • 20. RNA-seq: Step 4 ¨  Step 4: The distribution of gene counts for different tissues is compared ¨  Bioinformaticians and biostatisticians clean the data, normalize the data, and conduct statistical tests to determine if certain genes are expressed in one tissue differently than another ¨  Tests based on models: negative binomial distribution of counts, likelihood ratio tests ¨  Clustering algorithms ¨  Study genetic pathway enrichment, up- or down-regulated genes ¨  Biologists then study these genes more closely
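A likelihood ratio test on gene counts can be sketched as follows. To keep the example short it assumes a Poisson model for a single gene in two samples rather than the negative binomial model named on the slide, so it ignores overdispersion and biological replicates. The counts and library sizes are invented, and `poisson_lrt` is a hypothetical helper name.

```python
import math


def poisson_lrt(x1, n1, x2, n2):
    """Test whether one gene has the same expression rate in two samples.

    x1, x2: read counts for the gene; n1, n2: total mapped reads per sample.
    Returns the likelihood ratio statistic and a chi-square (1 df) p-value.
    """
    pooled_rate = (x1 + x2) / (n1 + n2)  # rate under the null hypothesis
    stat = 0.0
    for x, n in ((x1, n1), (x2, n2)):
        if x > 0:  # a zero count contributes nothing to the statistic
            stat += 2 * x * math.log(x / (n * pooled_rate))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square tail, 1 df
    return stat, p_value


# Invented counts: 100 reads in "heart" vs. 10 in "brain", equal library sizes.
stat, p_value = poisson_lrt(100, 1_000_000, 10, 1_000_000)
print(f"LRT statistic = {stat:.1f}, p-value = {p_value:.2e}")
```

Dividing by the library size is the simplest form of normalization. In practice thousands of genes are tested at once, so the p-values also need a multiple-testing correction.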
  • 21. Heatmap and dendrogram from a clustering algorithm comparing genes in cultured mouse heart and brain tissues
  • 23. Case study 2: Electronic Medical Records ¨  Medical and health records are becoming increasingly digitized ¨  EMR can contain records of health measurements (blood pressure), diagnoses (depression), treatments prescribed (statins), family history information, and even detailed descriptions of doctor visits (clinician notes) ¨  Thousands of patients can have dozens of records, some can have just 2 ¨  Question: How to select subjects with bipolar disorder from a large pool of patients?
  • 24. Case study 2: Electronic Medical Records ¨  Step 1: All the records must be collected, stored, put in a database, managed, tracked ¨  Step 2: A small subset must be read by a team of clinicians and scored as “case” versus “control” ¨  Step 3: Transform codes and paragraphs of words into predictors of disease ¨  Step 4: Determine important predictors of disease and build a prediction model with these variables ¨  Step 5: Validate the model, assess its performance ¨  Step 6: Implement the model in a larger pool of subjects to select the bipolar cases for a future genetic study
  • 25. EMR: Step 1 ¨  Step 1: All the records must be collected, stored, put in a database, managed, tracked ¨  Computer scientists and bioinformaticians must perform these steps (SQL, anyone? MUMPS? Python, Perl…) ¨  Efficiency in this setting is no small task
  • 26. EMR: Step 3 ¨  Step 3: Transform codes and paragraphs of words into predictors of disease ¨  Natural language processing (NLP) is used by bioinformaticians to mine the paragraphs of data for terms that occur often in cases and less often in controls ¨  Certain words in a doctor’s note become possible predictors of disease
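The term-mining step can be sketched as a comparison of how often each word appears in case notes versus control notes. This is a deliberately simple illustration: the notes below are invented, the helper names are hypothetical, and real clinical NLP also handles negation, misspellings, abbreviations, and multi-word concepts.

```python
import re
from collections import Counter


def document_frequency(notes):
    """Count how many notes contain each lower-cased word at least once."""
    counts = Counter()
    for note in notes:
        counts.update(set(re.findall(r"[a-z]+", note.lower())))
    return counts


def enriched_terms(case_notes, control_notes, min_diff=0.5):
    """Return (term, difference) pairs for terms far more common in cases."""
    case_df = document_frequency(case_notes)
    control_df = document_frequency(control_notes)
    results = []
    for term, n_case in case_df.items():
        # Share of case notes with the term minus share of control notes.
        diff = n_case / len(case_notes) - control_df[term] / len(control_notes)
        if diff >= min_diff:
            results.append((term, diff))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# Invented notes, for illustration only.
cases = [
    "patient reports manic episode",
    "manic symptoms and lithium started",
    "lithium level checked",
]
controls = ["patient reports cough", "routine visit no complaints"]
for term, diff in enriched_terms(cases, controls):
    print(f"{term}: {diff:.2f}")
```

Each surviving term then becomes a candidate predictor column (present or absent in a patient's notes) for the modeling steps that follow.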
  • 27. EMR: Steps 4-6 ¨  Steps 4-6: Determine important predictors of disease, build a prediction model with these variables, assess/validate performance, implement model ¨  Biostatisticians develop ¤  high dimensional regression methods or machine learning methods ¤  to select important predictors and build models ¤  to predict outcomes based on a large number of variables (e.g., LASSO, support vector machines)
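How the LASSO selects predictors can be shown with a small coordinate-descent sketch. It makes several simplifying assumptions: squared-error loss rather than the logistic loss a case/control outcome would call for, no intercept, centered predictor columns that are not all zero, and a fixed number of sweeps instead of a convergence check. The data are made up so that only the first predictor matters.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_sweeps=200):
    """Minimize (1/2n) * sum((y - X b)^2) + lam * sum(|b|) by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # correlation of column j with the partial residual
            norm = 0.0  # squared norm of column j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Made-up data: y depends on the first column only.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-6, -3, 0, 3, 6]
print(lasso(X, y, lam=0.5))
```

The soft-thresholding step is what sets weak coefficients exactly to zero, so variable selection and model fitting happen together. Larger values of `lam` remove more predictors.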
  • 28. Regularized logistic regression with NLP predictors Solution path for coefficients of predictors based on adaptive LASSO
  • 29. Back to me. ¨  Began with Yung-Pin’s research project on CpG islands (related to new field of epigenetics) ¨  Enjoyed journal clubs/biostatistics meetings at OHSU ¨  Pure math vs. applied math vs. something else ¨  Did you want to be a doctor? Do you want to help people? ¨  Ended up in grad school, what did I learn?
  • 30. Biostatistics grad school ¨  Statistics ≠ pure math! ¨  A master’s would have helped with intuition, but is not usually funded ¨  Research universities ≠ Lewis & Clark! ¨  Depend on self-teaching, your classmates, and especially the TAs to get by (when interviewing, meet the students!) ¨  Light teaching load, (hopefully) heavy collaborative/consulting load ¨  Lots of women in public health (like LC)! ¨  Grad school is always hard.
  • 31. Bioinformatics grad school ¨  So far mostly the same ¨  More focused on biology ¨  Incorporating more biology training, wet labs ¨  Software/Bioconductor/R package development ¨  Diverging from traditional biostat?
  • 32. Helpful classes ¨  Statistics and probability (obviously) ¨  All the computer science classes, ever (Python, more C!) ¨  Linear algebra ¨  Genetics (molecular biology would have been nice, though no biology required for biostat) ¨  Advanced calculus/real analysis (for theoretical classes such as Prob II and Inference II and writing my thesis, not always required) ¨  Discrete mathematics ¨  Abstract Algebra (don’t worry, not required either) ¨  Liberal arts education in general
  • 33. Helpful skills ¨  LaTeX ¨  R ¨  Python or Perl ¨  Unix, cluster/cloud computing ¨  Teaching/tutoring ¨  Research experience! ¨  Programming, software development ¨  C, Fortran ¨  GitHub ¨  You must enjoy talking to people, collaborating, explaining math/stat/CS to non-mathematical people!
  • 34. Pros & Cons Pros ¨  Interesting & meaningful research problems ¨  Always in demand, more so every day ¨  Collaborate with clinicians, biologists, researchers of all kinds ¨  Salary isn’t too shabby Cons ¨  Soft money ☹ ¨  Grants, grants, always grants (but not necessarily our own)
  • 35. Last thoughts ¨  Consider Epidemiology ¨  Applied vs. theoretical research ¨  My day: mostly programming and writing code (cleaning data + analysis, simulations), lots of meetings, a bit of pen & pencil research and thinking of new grants, reading articles, reading clinical trial protocols, sample size and power calculations ¨  This will vary depending on where you work ¨  Master’s vs. PhD
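A sample size calculation of the kind mentioned above can be sketched with the standard normal-approximation formula for comparing two group means, n per group = 2(z_{1-α/2} + z_{power})² σ² / δ². The effect size and standard deviation below are hypothetical, and the function name is made up for this example. Exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Subjects needed per arm to detect a mean difference delta (two-sided test)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = standard_normal.inv_cdf(power)          # quantile for desired power
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical design: detect a difference of half a standard deviation.
print(n_per_group(delta=0.5, sigma=1.0))
```

Halving the detectable difference roughly quadruples the required sample size, which is why this calculation comes up in nearly every clinical trial protocol.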
  • 36. More talks like this ¨  Excellent overview of bioinformatics & computational biology fields and careers in medicine by Dr. Shannon McWeeney (http://www.biodevlab.org/) at OHSU https://ohsu.adobeconnect.com/_a46054336/p61byw86754/?launcher=false&fcsContent=true&pbMode=normal ¨  Rafa Irizarry’s (at HSPH http://rafalab.dfci.harvard.edu/) math major talk: https://www.youtube.com/watch?v=gXeWdvHKTQQ ¨  Plenty of interesting talks at JSM, the big statistical meeting/conference, it will be nearby in Seattle in August of 2015 http://www.amstat.org/meetings/jsm/2014/index.cfm (in Boston this year); http://www.amstat.org/meetings/jsm.cfm
  • 37. Learning resources ¨  Summer Institute for Training in Biostatistics (for undergrads) http://www.nhlbi.nih.gov/funding/training/redbook/sibsweb.htm ¤  U Wisc at Madison, Columbia, Emory, Boston U, NC State, U of Iowa, U of Minnesota, U of Pittsburgh (All of the websites have “What is Biostatistics?” pages) ¨  MOOCs (Massive Open Online Courses) ¤  Learn R http://www.flaviobarros.net/2014/03/14/online-multimedia-resources-learn-r ¤  Learn biostats https://www.coursera.org/course/biostats ¤  Learn statistical learning https://class.stanford.edu/courses/HumanitiesScience/StatLearning/Winter2014/about ¤  Learn bioinformatics http://www.langmead-lab.org/teaching-materials/ and http://rosalind.info/problems/list-view/ ¨  UW’s Summer Institutes (scholarships for students) ¤  Statistical Genetics; Statistics and Modeling in Infectious Diseases; Statistics for Clinical Research ¨  Comprehensive list of job postings for statistics/biostatistics/bioinformatics: http://www.stat.ufl.edu/jobs/
  • 38. The internet ¨  YouTube ¤  Rafa Irizarry’s YouTube channel (especially http://youtu.be/gXeWdvHKTQQ) ¨  Simply Statistics blog (http://simplystatistics.org/) ¨  R-bloggers ¨  Getting Genetics Done blog (http://gettinggeneticsdone.blogspot.com/) ¨  FiveThirtyEight (http://fivethirtyeight.com/) ¨  Neat summary measure of types of research done in various departments (biased toward east coast) https://muschellij2.shinyapps.io/ENAR_Over_Time/