Genome Wide Association Studies (GWAS)
Miguel Gutierrez
Md Rayhanul Masud
March 3, 2022
1
Agenda for Presentation
01 Intro to GWAS
02 Pros/Cons of GWAS
03 GRAIL
04 Signal Prioritization with GenoWAP
05 Conclusion
06 Future Research
2
What is GWAS (Genome Wide Association Studies)
❖ A statistical study
❖ Finds associations between genetic variation (genotype) and
observable traits (phenotype, e.g. high blood pressure)
❖ Tags a region of genetic variants that includes the causal ones
❖ About 99.9 percent of the roughly 3 billion (3B) genetic letters are identical in every human
❖ The remaining 0.1 percent accounts for the diversity of humankind
3
Ref: Five Years of GWAS Discovery, Peter M. Visscher et al, 2012
2005: HTRA1 Promoter Polymorphism in Wet Age-Related Macular Degeneration
2007: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
2009: Common polygenic variation contributes to risk of schizophrenia that overlaps with bipolar disorder
2015: Statistical Framework to Predict Functional Non-Coding Regions in the Human Genome Through Integrated Analysis of Annotation Data
2017: LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis
Beyond: Post GWAS, ML and whole-genome sequencing (WGS) studies
4
Real Life Example
DNA Alphabet: A T C G
Codons: three-letter groups such as ATC, CTG
Gene: codons organized into exons and introns, ending with a stop codon
Variations:
ATG CTC GTT AAG TAA
ATG CTC GTT TAG TAA
Identifying the genetic association with observable behaviors (e.g. traits/diseases)
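A minimal sketch of how the two example sequences on this slide can be compared letter by letter to locate the variation. The function name `find_variants` is hypothetical and used only for illustration; real pipelines work on aligned reads, not short strings.

```python
def find_variants(reference, alternate):
    """Return (position, reference_base, alternate_base) for every mismatch.

    Positions are 0-based. Both sequences must have the same length,
    because this sketch does not handle insertions or deletions.
    """
    if len(reference) != len(alternate):
        raise ValueError("sequences must have the same length")
    return [
        (i, r, a)
        for i, (r, a) in enumerate(zip(reference, alternate))
        if r != a
    ]


# The two sequences from the slide, with the spaces between codons removed.
reference = "ATG CTC GTT AAG TAA".replace(" ", "")
alternate = "ATG CTC GTT TAG TAA".replace(" ", "")

variants = find_variants(reference, alternate)
for position, ref_base, alt_base in variants:
    codon_index = position // 3  # 0-based index of the affected codon
    print(f"position {position}: {ref_base} -> {alt_base} (codon {codon_index})")
```

The single difference falls in the fourth codon, where AAG becomes TAG.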
5
What is SNP (Single Nucleotide Polymorphism) in GWAS
❖ The order of genetic letters in human genomes varies at specific locations
❖ Such a single-letter variation is called a SNP (pronounced ‘snip’)
❖ But not all SNPs have effects on traits
❖ GWAS uses SNPs to find associations between genes and traits
Image Source: https://www.nutrigeneticsspecialists.com/single-post/2017/03/27/what-is-a-snpc
6
High Level: GWAS Mechanism (Case/Control Study)
Case: population with disease
Control: population without disease
Genotyping: genome → finding SNPs → SNP with disease / SNP w/o disease
for each SNP:
    compute its frequency in the populations with and w/o disease
    compute the odds ratio
7
Ref: Designing candidate gene and genome-wide case-control association studies, Krina T. Zondervan et al, 2007
High Level: GWAS Mechanism (Case/Control Study)
SNP       A     T     Total
Case      50    150   200
Control   100   100   200
Total     150   250   400
At some position, a variation is found with alleles A/T
T is found to be more associated with the disease than A
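The counts in this table can be turned into an odds ratio and a simple association test. The sketch below uses only the standard library; the choice of a Pearson chi-square test with one degree of freedom is an assumption made for illustration, since the slide only mentions frequencies and the odds ratio.

```python
import math


def odds_ratio(case_t, case_a, control_t, control_a):
    """Odds of carrying T versus A in cases, divided by the same odds in controls."""
    return (case_t / case_a) / (control_t / control_a)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


def chi_square_p_value_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Counts from the slide: cases carry A 50 times and T 150 times,
# controls carry A 100 times and T 100 times.
or_t = odds_ratio(case_t=150, case_a=50, control_t=100, control_a=100)
chi2 = chi_square_2x2(50, 150, 100, 100)
p_value = chi_square_p_value_1df(chi2)

print(f"odds ratio (T vs A): {or_t:.2f}")
print(f"chi-square: {chi2:.2f}, p-value: {p_value:.2e}")
```

An odds ratio above 1 for T matches the slide's statement that T is more associated with the disease than A.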
8
Manhattan Plot based on GWAS Study
https://en.wikipedia.org/wiki/Genome-wide_association_study
9
Where to find GWAS data
National Center for Biotechnology Information (NCBI)
10
Curve of learning speed (GWAS Study and Publication)
Image Source: https://mobile.twitter.com/GWASCatalog/status/1360288750132150272/photo/1
11
GWAS Pros
01 GWAS can lead to the discovery of novel biological mechanisms
GWAS can implicate genes of unknown function, and experimental follow-up on loci can lead to the discovery of novel biological mechanisms underlying disease
02 GWAS are relevant to the study of low-frequency and rare variants
Arrays informed by data from large reference panels enable many low-frequency and rare variants to be directly genotyped
03 GWAS based on SNP arrays use reliable genotyping technology
Contemporary genome-wide SNP arrays achieve call rates, HapMap concordance, Mendelian consistency and reproducibility of >99.7%
04 GWAS can provide insight into ethnic variation of complex traits
GWAS in diverse ethnic groups can reveal heterogeneity in genetic susceptibility to disease
05 GWAS data are easily shared, and publicly available data facilitate novel discoveries
Availability of GWAS summary statistics has increased dramatically in recent years (UK Biobank; Kaiser Permanente’s Research Pro.)
06 GWAS based on SNP arrays are cost-effective for identifying risk loci
Genome-wide SNP arrays, like the Illumina Infinium Global Screening Array or the Thermo Fisher Axiom Precision Medicine Research Array, cost approximately US$40 per sample
Tam et al, 2019
12
GWAS Cons
01 Lack of diversity
Large-scale GWAS efforts have disproportionately focused on European ancestry populations, with only ~10% of all GWAS participants being of non-European descent (Loos, R. 2020)
02 GWAS are penalized by an important multiple testing burden
Usually handled with a Bonferroni correction to maintain a genome-wide false-positive rate of 5% (assuming 1 million independent tests for common genetic variation)
03 GWAS have limited clinical predictive value
Only a modest proportion of heritability is explained
04 GWAS based on SNP arrays rely on pre-existing genetic variant reference panels
SNP-array-based GWAS depend on the completeness of the sequencing studies and resulting reference panels that inform genotyping array design
05 GWAS signals may be due to cryptic population stratification
This can result in spurious associations if not properly accounted for
06 GWAS explain only a modest fraction of the missing heritability
The variants that GWAS identify as associated with a trait/disease account for only a modest proportion of the estimated heritability of most complex traits
Tam et al, 2019
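The multiple testing burden mentioned above can be made concrete with a short calculation. This is a minimal sketch using the slide's figures (a 5% false-positive rate and 1 million independent tests); the function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha=0.05, n_tests=1_000_000):
    """Per-test significance threshold that keeps the family-wise rate at alpha."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, alpha=0.05, n_tests=1_000_000):
    """True if a single SNP's P-value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold()
# Manhattan plots show -log10(p), so the threshold becomes a horizontal line.
threshold_on_plot = -math.log10(threshold)

print(f"per-SNP threshold: {threshold:.1e}")
print(f"-log10 threshold: {threshold_on_plot:.2f}")
print(is_genome_wide_significant(1e-9), is_genome_wide_significant(1e-5))
```

With these inputs the per-SNP threshold is 0.05 divided by 1,000,000, which is why a SNP with a P-value of 1e-5 is not counted as significant even though it would pass a single-test 5% cutoff.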
13
The more we learn the more we realize how little we know - R. Buckminster Fuller
● The literature provides a number of disease regions
● Each region may have a number of SNPs
● Not all SNPs may be responsible
● How can we find the causal SNPs from GWAS studies?
● Is it possible to leverage existing studies to find the causal genes?
14
GRAIL (Gene Relationships Across Implicated Loci)
Given a collection of disease regions, GRAIL identifies a subset of genes that are
more highly related than expected by chance
Input: a list of disease regions identified by GWAS and a list of their publications
Output: degree of relatedness of the genes with the disease
Motivation
● Association does not imply causation
● Identifying causal relationships helps in better understanding the disease
15
Ref: Identifying relationships among genomic disease regions: Predicting genes at Pathogenic SNP associations and rare
deletions, Raychaudhuri et al, 2009
GRAIL works in 4 steps!
identify the overlapping genes from the list of disease regions
for each overlapping gene:
    rank all other genes based on their relatedness to it
    count the regions having at least one highly related gene
    assign a p-value to the count
select the most connected gene in each region
16
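The counting step in the outline above can be sketched as follows. Everything here is illustrative: the gene and region names are made up, the similarity table is hypothetical, and "highly related" is modeled as a simple similarity cutoff, which is an assumption rather than the rule used by GRAIL. The p-value assignment is not shown.

```python
def count_related_regions(key_gene, key_region, regions, similarity, threshold):
    """Count the other regions that hold at least one gene highly related to key_gene."""
    count = 0
    for region, genes in regions.items():
        if region == key_region:
            continue  # a gene's own region does not count as support
        if any(similarity(key_gene, gene) >= threshold for gene in genes):
            count += 1
    return count


# Hypothetical disease regions and the genes overlapping each of them.
regions = {
    "region1": ["GENE_A"],
    "region2": ["GENE_B", "GENE_C"],
    "region3": ["GENE_D"],
}

# Hypothetical pairwise relatedness scores between genes.
scores = {
    frozenset(("GENE_A", "GENE_B")): 0.9,
    frozenset(("GENE_A", "GENE_C")): 0.1,
    frozenset(("GENE_A", "GENE_D")): 0.2,
}


def similarity(gene_x, gene_y):
    """Look up a symmetric score; unknown pairs are treated as unrelated."""
    return scores.get(frozenset((gene_x, gene_y)), 0.0)


related = count_related_regions("GENE_A", "region1", regions, similarity, threshold=0.5)
print(f"regions with a gene highly related to GENE_A: {related}")
```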
Step 1: Define the overlapping region
Image Source: GRAIL, Raychaudhuri et al, 2009
17
Let's look at gene 1 next
Background of next step
Published Article / Document
Doc1: GWAS study is important because of its ability to identify associations between disease and related genes. GWAS result can be used for inventing treatment and medication of the disease
Doc2: GWAS of BMI helps researchers know about more information regarding obesity.
Word Frequency in Doc 1
word1    GWAS      2
word2    study     1
……..
wordN    disease   2
Document Frequency of words
word1    GWAS      2
word2    BMI       1
……..
wordN    obesity   1
18
Background of next step
Weight of word i in document j combines two factors:
Word frequency (freq. of word i in document j): more frequent, more weight
Inverse document frequency (# of documents relative to the document freq. of word i): fewer documents, more weight
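The word weighting described on this slide can be sketched with the two example documents from the previous slide. The exact formula is not recoverable from the slide, so the damped form used below, (1 + log tf) * log(N / df), is an assumption; the tokenizer and all function names are also illustrative.

```python
import math
import re
from collections import Counter

# The two example documents from the previous slide.
docs = {
    "doc1": (
        "GWAS study is important because of its ability to identify "
        "associations between disease and related genes. GWAS result can be "
        "used for inventing treatment and medication of the disease"
    ),
    "doc2": "GWAS of BMI helps researchers know about more information regarding obesity.",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


# Word frequency: how often each word occurs in each document.
term_freq = {name: Counter(tokenize(text)) for name, text in docs.items()}
# Document frequency: in how many documents each word occurs.
doc_freq = Counter(word for counts in term_freq.values() for word in counts)
n_docs = len(docs)


def weight(word, doc):
    """Assumed TF-IDF weight of a word in a document (0 if the word is absent)."""
    tf = term_freq[doc][word]
    if tf == 0:
        return 0.0
    return (1.0 + math.log(tf)) * math.log(n_docs / doc_freq[word])


for word, doc in [("gwas", "doc1"), ("disease", "doc1"), ("obesity", "doc2")]:
    print(f"{word} in {doc}: {weight(word, doc):.3f}")
```

In this toy corpus "GWAS" occurs in both documents, so its inverse document frequency is zero and it carries no weight, while "disease" and "obesity" each occur in only one document and receive positive weights.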
19
Background of next step
Weighted count of word i for gene k: computed over all documents/publications referring to gene k, taking into account the # of genes referred to in each document j
For a gene k, calculating g for all words i in the vocabulary provides a gene vector for gene k
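A minimal sketch of how gene vectors might be built and compared. Two assumptions are made here and are not confirmed by the slide: each document's word weights are divided by the number of genes the document refers to, and relatedness between genes is measured by cosine similarity. The publications, genes and weights are toy values.

```python
import math
from collections import defaultdict

# Hypothetical word weights per publication (for example from a TF-IDF step).
doc_weights = {
    "pub1": {"insulin": 1.2, "glucose": 0.8},
    "pub2": {"insulin": 0.5, "obesity": 1.0},
    "pub3": {"retina": 1.5},
}
# Hypothetical mapping from each publication to the genes it refers to.
doc_genes = {
    "pub1": ["GENE_A", "GENE_B"],
    "pub2": ["GENE_B"],
    "pub3": ["GENE_C"],
}


def gene_vector(gene):
    """Sum word weights over documents referring to the gene, split across genes."""
    vector = defaultdict(float)
    for doc, genes in doc_genes.items():
        if gene in genes:
            for word, w in doc_weights[doc].items():
                vector[word] += w / len(genes)
    return dict(vector)


def cosine(u, v):
    """Cosine similarity between two sparse word vectors."""
    dot = sum(u[word] * v[word] for word in u if word in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_related(gene, candidates):
    """Order candidate genes from most to least related to the given gene."""
    target = gene_vector(gene)
    return sorted(candidates, key=lambda g: cosine(target, gene_vector(g)), reverse=True)


ranking = rank_related("GENE_A", ["GENE_C", "GENE_B"])
print(ranking)
```

GENE_B shares a publication with GENE_A and therefore ranks above GENE_C, whose only publication uses unrelated vocabulary.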
20
Step 2: Ranking the gene against the other genes
Image Source: GRAIL, Raychaudhuri et al, 2009
21
Step 3: Counting regions with related genes
Image Source: GRAIL, Raychaudhuri et al, 2009
22
Step 4: Assign p-value of key gene to the region
Image Source: GRAIL, Raychaudhuri et al, 2009
23
Some SNPs obtained higher p-values than others
Image Source: GRAIL, Raychaudhuri et al, 2009
24
Specific Problems with GWAS
● Linkage disequilibrium (LD) sometimes leads to misinterpretation of association results.
● The Bonferroni-corrected significance threshold is too conservative and leads to missing heritability.
● Coding-region-based tools are not sufficient for GWAS signal prioritization.
…This brings us to GenoWAP!
25
GenoWAP: GWAS signal prioritization through integrated
analysis of genomic functional annotation
● The goal of GWAS signal prioritization is to assign each SNP a
new score that measures its importance.
● GenoWAP is a GWAS signal prioritization method that integrates genomic
functional annotation and GWAS test statistics:
○ GenoCanyon functional prediction
○ GWAS P-values
26
GenoWAP: GenoCanyon (Lu et al, 2015)
● What: an unsupervised statistical framework
● Why: to predict functional non-coding regions in the human genome
● How: through integrated analysis of multiple biochemical signals and genomic conservation measures
● For each SNP in a GWAS dataset, the mean GenoCanyon functional score of its surrounding 10,000 base pairs is used as the prior probability P(Z = 1)
● Partition all the SNPs into functional (Z = 1) and nonfunctional (Z = 0) subgroups based on the calculated mean
27
GenoWAP: Statistical Model
● For every SNP we define Z to be the indicator of general functionality, and Z_D to be the indicator of disease-specific functionality.
● If a SNP or its surrounding region is active in any genomic functional pathway, then Z equals 1. If the SNP or its surrounding region is involved in the disease pathway, then Z_D equals 1.
● Each SNP has an associated p, the P-value obtained from the standard GWAS analysis.
● The quantity of interest is the conditional probability of being disease-specific functional given the P-value, i.e. P(Z_D = 1 | p)
28
GenoWAP: Statistical Model
● To calculate a marker’s conditional probability given its P-value, we must know a few things:
○ the prior probability of being functional
○ the P-value density for disease-specific functional markers
○ the P-value density for markers that are not related to the disease
○ the conditional probability of being disease-specific functional given that the marker is functional in the general sense
● Partition all the SNPs into functional (Z = 1) and nonfunctional (Z = 0) subgroups based on the calculated mean
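The four ingredients listed above can be combined with Bayes' rule. The sketch below assumes that a disease-specific functional SNP is also functional in the general sense, so that P(Z_D = 1) = P(Z_D = 1 | Z = 1) * P(Z = 1). The two densities are passed in as functions; the Beta-shaped density for disease-related P-values, the uniform density for unrelated ones, and all numeric values are hypothetical placeholders, not estimates from any dataset.

```python
def disease_posterior(p, prior_functional, disease_given_functional,
                      density_disease, density_unrelated):
    """Posterior probability P(Z_D = 1 | p) for one SNP via Bayes' rule.

    p                         GWAS P-value of the SNP
    prior_functional          P(Z = 1), e.g. a mean functional score
    disease_given_functional  P(Z_D = 1 | Z = 1)
    density_disease           P-value density for disease-specific functional SNPs
    density_unrelated         P-value density for SNPs not related to the disease
    """
    # Assumption: Z_D = 1 implies Z = 1, so the two probabilities multiply.
    prior_disease = disease_given_functional * prior_functional
    numerator = density_disease(p) * prior_disease
    denominator = numerator + density_unrelated(p) * (1.0 - prior_disease)
    return numerator / denominator


ALPHA = 0.2  # hypothetical shape parameter; smaller values favor small P-values


def beta_density(p):
    """Density of Beta(ALPHA, 1), which piles up mass near p = 0."""
    return ALPHA * p ** (ALPHA - 1.0)


def uniform_density(p):
    """Density of P-values that carry no disease signal."""
    return 1.0


strong = disease_posterior(1e-6, 0.5, 0.1, beta_density, uniform_density)
weak = disease_posterior(0.5, 0.5, 0.1, beta_density, uniform_density)
low_prior = disease_posterior(1e-6, 0.01, 0.1, beta_density, uniform_density)

print(f"small P-value, functional region: {strong:.4f}")
print(f"large P-value, functional region: {weak:.4f}")
print(f"small P-value, low functional prior: {low_prior:.4f}")
```

Two SNPs with the same P-value can end up with different scores when their functional priors differ, which is the point of combining annotation with GWAS test statistics.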
29
GenoWAP: Contribution
● Compared to the top loci ranked on P-values only, top ranked
loci after prioritization tend to show substantially stronger
signals in large GWAS studies.
● Within each locus, we are able to distinguish true signals
among highly correlated SNPs.
30
Next Frontier: Machine Learning
Future Areas of Research
GENERALIZED LINEAR MODELS: generate a line of best “fit” through input data in the form of a classification line/boundary
DECISION TREES: trees built around yes/no rules developed from specific features
NEURAL NETWORKS: interconnected neurons evaluate and weigh input data based on features produced by the previously connected neurons
31
Thank you…
Questions?
32

More Related Content

What's hot

Correlation and Path analysis in breeding experiments
Correlation and Path analysis in breeding experimentsCorrelation and Path analysis in breeding experiments
Correlation and Path analysis in breeding experimentsankit dhillon
 
Genome wide Association studies.pptx
Genome wide Association studies.pptxGenome wide Association studies.pptx
Genome wide Association studies.pptxAkshitaAwasthi3
 
Measures of Linkage Disequilibrium
Measures of Linkage DisequilibriumMeasures of Linkage Disequilibrium
Measures of Linkage DisequilibriumAwais Khan
 
Report- Genome wide association studies.
Report- Genome wide association studies.Report- Genome wide association studies.
Report- Genome wide association studies.Varsha Gayatonde
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomicsAmol Kunde
 
genome mapping
genome mappinggenome mapping
genome mappingSuresh San
 
Genomics Assisted Breeding for Resilient Rice: Progress and Prospects
Genomics Assisted Breeding for Resilient Rice: Progress and ProspectsGenomics Assisted Breeding for Resilient Rice: Progress and Prospects
Genomics Assisted Breeding for Resilient Rice: Progress and ProspectsSenthil Natesan
 
Marker Assisted Gene Pyramiding for Disease Resistance in Rice
Marker Assisted Gene Pyramiding for Disease Resistance in RiceMarker Assisted Gene Pyramiding for Disease Resistance in Rice
Marker Assisted Gene Pyramiding for Disease Resistance in RiceIndrapratap1
 
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesBack to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesGolden Helix Inc
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisUniversity of California, Davis
 
mapping population
mapping populationmapping population
mapping populationHarsh Mishra
 
Association mapping approaches for tagging quality traits in maize
Association mapping approaches for tagging quality traits in maizeAssociation mapping approaches for tagging quality traits in maize
Association mapping approaches for tagging quality traits in maizeSenthil Natesan
 

What's hot (20)

Correlation and Path analysis in breeding experiments
Correlation and Path analysis in breeding experimentsCorrelation and Path analysis in breeding experiments
Correlation and Path analysis in breeding experiments
 
Genome wide Association studies.pptx
Genome wide Association studies.pptxGenome wide Association studies.pptx
Genome wide Association studies.pptx
 
Genotyping in Breeding programs
Genotyping in Breeding programsGenotyping in Breeding programs
Genotyping in Breeding programs
 
Measures of Linkage Disequilibrium
Measures of Linkage DisequilibriumMeasures of Linkage Disequilibrium
Measures of Linkage Disequilibrium
 
Report- Genome wide association studies.
Report- Genome wide association studies.Report- Genome wide association studies.
Report- Genome wide association studies.
 
Snp
SnpSnp
Snp
 
Qtl mapping
 Qtl mapping  Qtl mapping
Qtl mapping
 
Genomic Data Analysis
Genomic Data AnalysisGenomic Data Analysis
Genomic Data Analysis
 
NGS: Mapping and de novo assembly
NGS: Mapping and de novo assemblyNGS: Mapping and de novo assembly
NGS: Mapping and de novo assembly
 
Tree building 2
Tree building 2Tree building 2
Tree building 2
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
genome mapping
genome mappinggenome mapping
genome mapping
 
Genomics Assisted Breeding for Resilient Rice: Progress and Prospects
Genomics Assisted Breeding for Resilient Rice: Progress and ProspectsGenomics Assisted Breeding for Resilient Rice: Progress and Prospects
Genomics Assisted Breeding for Resilient Rice: Progress and Prospects
 
Marker Assisted Gene Pyramiding for Disease Resistance in Rice
Marker Assisted Gene Pyramiding for Disease Resistance in RiceMarker Assisted Gene Pyramiding for Disease Resistance in Rice
Marker Assisted Gene Pyramiding for Disease Resistance in Rice
 
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesBack to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
mapping population
mapping populationmapping population
mapping population
 
Association mapping approaches for tagging quality traits in maize
Association mapping approaches for tagging quality traits in maizeAssociation mapping approaches for tagging quality traits in maize
Association mapping approaches for tagging quality traits in maize
 
Microsatellites Markers
Microsatellites  MarkersMicrosatellites  Markers
Microsatellites Markers
 
Basics of association_mapping
Basics of association_mappingBasics of association_mapping
Basics of association_mapping
 

Similar to GWAS Study.pdf

A New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicA New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicJoaquin Dopazo
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingJoaquin Dopazo
 
Personalized Medicine in Diagnosis and Treatment of Cancer
Personalized Medicine in Diagnosis and Treatment of Cancer Personalized Medicine in Diagnosis and Treatment of Cancer
Personalized Medicine in Diagnosis and Treatment of Cancer Maryam Rafati
 
Intro to Biomedical Informatics 701
Intro to Biomedical Informatics 701 Intro to Biomedical Informatics 701
Intro to Biomedical Informatics 701 Chirag Patel
 
Genome Wide Association Studies in Psychiatry
Genome Wide Association Studies in PsychiatryGenome Wide Association Studies in Psychiatry
Genome Wide Association Studies in PsychiatryDr.Guru S Gowda
 
Contribution of genome-wide association studies to scientific research: a pra...
Contribution of genome-wide association studies to scientific research: a pra...Contribution of genome-wide association studies to scientific research: a pra...
Contribution of genome-wide association studies to scientific research: a pra...Mutiple Sclerosis
 
헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가?헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가? Hyung Jin Choi
 
The emerging picture of host genetic control of susceptibility and outcome in...
The emerging picture of host genetic control of susceptibility and outcome in...The emerging picture of host genetic control of susceptibility and outcome in...
The emerging picture of host genetic control of susceptibility and outcome in...Meningitis Research Foundation
 
Japanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven EJapanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven EChirag Patel
 
Ong et al._Translational utility of next-generation sequencing_2013_Genomics
Ong et al._Translational utility of next-generation sequencing_2013_GenomicsOng et al._Translational utility of next-generation sequencing_2013_Genomics
Ong et al._Translational utility of next-generation sequencing_2013_GenomicsFrank Ong, MD, CPI
 
QMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyQMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyTom Kelly
 
Gene hunting strategies
Gene hunting strategiesGene hunting strategies
Gene hunting strategiesAshfaq Ahmad
 
ACMG guidelines 2015: How to interpret DNA variants? [Today's paper]
ACMG guidelines 2015: How to interpret DNA variants? [Today's paper]ACMG guidelines 2015: How to interpret DNA variants? [Today's paper]
ACMG guidelines 2015: How to interpret DNA variants? [Today's paper]HeonjongHan
 
Transcriptional signaling pathways inversely regulated in alzheimer's disease...
Transcriptional signaling pathways inversely regulated in alzheimer's disease...Transcriptional signaling pathways inversely regulated in alzheimer's disease...
Transcriptional signaling pathways inversely regulated in alzheimer's disease...Elsa von Licy
 
Science 2016 Poster 2016.10.20 FINAL
Science 2016 Poster 2016.10.20 FINALScience 2016 Poster 2016.10.20 FINAL
Science 2016 Poster 2016.10.20 FINALAndrew Warburton
 
Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation SequencingShelomi Karoon
 
HEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHM
HEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHMHEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHM
HEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHMPoojaSri45
 
Genetic polymorphisms
Genetic polymorphisms Genetic polymorphisms
Genetic polymorphisms cindyzeta
 
Genetic polymorphisms pptx
Genetic polymorphisms pptxGenetic polymorphisms pptx
Genetic polymorphisms pptxcindyzeta
 

Similar to GWAS Study.pdf (20)

A New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicA New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The Clinic
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene finding
 
Personalized Medicine in Diagnosis and Treatment of Cancer
Personalized Medicine in Diagnosis and Treatment of Cancer Personalized Medicine in Diagnosis and Treatment of Cancer
Personalized Medicine in Diagnosis and Treatment of Cancer
 
Intro to Biomedical Informatics 701
Intro to Biomedical Informatics 701 Intro to Biomedical Informatics 701
Intro to Biomedical Informatics 701
 
Genome Wide Association Studies in Psychiatry
Genome Wide Association Studies in PsychiatryGenome Wide Association Studies in Psychiatry
Genome Wide Association Studies in Psychiatry
 
Contribution of genome-wide association studies to scientific research: a pra...
Contribution of genome-wide association studies to scientific research: a pra...Contribution of genome-wide association studies to scientific research: a pra...
Contribution of genome-wide association studies to scientific research: a pra...
 
헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가?헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가?
 
The emerging picture of host genetic control of susceptibility and outcome in...
The emerging picture of host genetic control of susceptibility and outcome in...The emerging picture of host genetic control of susceptibility and outcome in...
The emerging picture of host genetic control of susceptibility and outcome in...
 
NGS-report-amir.pdf
NGS-report-amir.pdfNGS-report-amir.pdf
NGS-report-amir.pdf
 
Japanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven EJapanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven E
 
Ong et al._Translational utility of next-generation sequencing_2013_Genomics
Ong et al._Translational utility of next-generation sequencing_2013_GenomicsOng et al._Translational utility of next-generation sequencing_2013_Genomics
Ong et al._Translational utility of next-generation sequencing_2013_Genomics
 
QMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyQMB_Poster_Tom_Kelly
QMB_Poster_Tom_Kelly
 
Gene hunting strategies
Gene hunting strategiesGene hunting strategies
Gene hunting strategies
 
ACMG guidelines 2015: How to interpret DNA variants? [Today's paper]
ACMG guidelines 2015: How to interpret DNA variants? [Today's paper]ACMG guidelines 2015: How to interpret DNA variants? [Today's paper]
ACMG guidelines 2015: How to interpret DNA variants? [Today's paper]
 
Transcriptional signaling pathways inversely regulated in alzheimer's disease...
Transcriptional signaling pathways inversely regulated in alzheimer's disease...Transcriptional signaling pathways inversely regulated in alzheimer's disease...
Transcriptional signaling pathways inversely regulated in alzheimer's disease...
 
Science 2016 Poster 2016.10.20 FINAL
Science 2016 Poster 2016.10.20 FINALScience 2016 Poster 2016.10.20 FINAL
Science 2016 Poster 2016.10.20 FINAL
 
Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation Sequencing
 
HEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHM
HEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHMHEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHM
HEART DISEASES PREDICTION USING MACHINE LEARNING ALGORITHM
 
Genetic polymorphisms
Genetic polymorphisms Genetic polymorphisms
Genetic polymorphisms
 
Genetic polymorphisms pptx
Genetic polymorphisms pptxGenetic polymorphisms pptx
Genetic polymorphisms pptx
 

Recently uploaded

_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Recently uploaded (20)

Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 

GWAS Study.pdf

  • 1. Genome Wide Association Studies (GWAS). Miguel Gutierrez, Md Rayhanul Masud. March 3, 2022
  • 2. Agenda for Presentation: 01 Intro to GWAS; 02 Pros/Cons of GWAS; 03 GRAIL; 04 Signal Prioritization with GenoWAP; 05 Conclusion; 06 Future Research
  • 3. What is GWAS (Genome Wide Association Studies)? ❖ A statistical study ❖ Associates genetic variation (genotype) with observable traits (phenotype, e.g. high blood pressure) ❖ Tags a region of variants that includes the causal ones ❖ 99.9 percent of the roughly 3 billion genetic letters are identical in every human ❖ The remaining 0.1 percent accounts for the diversity of humankind. Ref: Five Years of GWAS Discovery, Peter M. Visscher et al., 2012
  • 4. 2005: HTRA1 Promoter Polymorphism in Wet Age-Related Macular Degeneration; 2007: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls; 2009: Common polygenic variation contributes to risk of schizophrenia that overlaps with bipolar disorder; 2015: Statistical Framework to Predict Functional Non-Coding Regions in the Human Genome Through Integrated Analysis of Annotation Data; 2017: LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Post GWAS. Beyond: ML and whole-genome sequencing (WGS) studies
  • 5. Real Life Example. DNA alphabet: A, T, C, G. Codons: e.g. ATC, CTG. Gene: exons, introns, codons and a stop codon. Variations: ATG CTC GTT AAG TAA versus ATG CTC GTT TAG TAA. Identifying the genetic association with observable behaviors (e.g. traits/diseases)
  • 6. What is a SNP (Single Nucleotide Polymorphism) in GWAS? ❖ The order of genetic letters in human genomes varies at specific locations ❖ Such a variation is called a SNP (pronounced ‘snip’) ❖ Not all SNPs have effects on traits ❖ GWAS uses SNPs to find associations between genes and traits. Image Source: https://www.nutrigeneticsspecialists.com/single-post/2017/03/27/what-is-a-snpc
  • 7. High Level: GWAS Mechanism (Case/Control Study). Case: population with the disease. Control: population without the disease. Genotype the genomes and find the SNPs; then, for each SNP, compute its frequency in cases and in controls and compute the odds ratio. Ref: Designing candidate gene and genome-wide case-control association studies, Krina T. Zondervan et al., 2007
  • 8. High Level: GWAS Mechanism (Case/Control Study)
      SNP allele   A     T     Total
      Case         50    150   200
      Control      100   100   200
      Total        150   250   400
      At some position, a variation is found with alleles A/T. T is found to be more associated with the disease than A.
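The allele counts on this slide can be turned into an odds ratio and a test statistic. The following is a minimal sketch rather than the presenters' code: the function names are illustrative, and the Pearson chi-square test with 1 degree of freedom is one standard choice that the slide itself does not specify.

```python
import math

def odds_ratio(case_risk, case_ref, control_risk, control_ref):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    return (case_risk / case_ref) / (control_risk / control_ref)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Allele counts from the slide: cases A=50, T=150; controls A=100, T=100.
or_t = odds_ratio(case_risk=150, case_ref=50, control_risk=100, control_ref=100)
chi2 = chi_square_2x2(50, 150, 100, 100)
p_value = chi_square_p_value_1df(chi2)
print(or_t, chi2, p_value)
```

An odds ratio above 1 for T means the odds of carrying T are higher among cases than among controls, which is the sense in which T is "more associated" with the disease.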
  • 9. Manhattan Plot based on a GWAS. Image Source: https://en.wikipedia.org/wiki/Genome-wide_association_study
  • 10. Where to find GWAS data: National Center for Biotechnology Information (NCBI)
  • 11. Curve of learning speed (GWAS Studies and Publications). Image Source: https://mobile.twitter.com/GWASCatalog/status/1360288750132150272/photo/1
  • 12. GWAS Pros
      01 GWAS can lead to the discovery of novel biological mechanisms: GWAS implicate genes of unknown function, and experimental follow-up on loci can lead to the discovery of novel biological mechanisms underlying disease.
      02 GWAS are relevant to the study of low-frequency and rare variants: SNP arrays informed by data from large reference panels enable many low-frequency and rare variants to be directly genotyped.
      03 GWAS based on SNP arrays use reliable genotyping technology: contemporary genome-wide SNP arrays achieve call rates, HapMap concordance, Mendelian consistency and reproducibility of >99.7%.
      04 GWAS can provide insight into ethnic variation of complex traits: GWAS in diverse ethnic groups can reveal heterogeneity in genetic susceptibility to disease.
      05 GWAS data are easily shared, and publicly available data facilitate novel discoveries: the availability of GWAS summary statistics has increased dramatically in recent years (UK Biobank; Kaiser Permanente’s Research Pro.).
      06 GWAS based on SNP arrays are cost-effective for identifying risk loci: genome-wide SNP arrays, like the Illumina Infinium Global Screening Array or the Thermo Fisher Axiom Precision Medicine Research Array, cost approximately US$40 per sample.
      Source: Tam et al., 2019
  • 13. GWAS Cons
      01 Lack of diversity: large-scale GWAS efforts have disproportionately focused on European-ancestry populations, with only ~10% of all GWAS participants being of non-European descent (Loos, R. 2020).
      02 GWAS are penalized by an important multiple-testing burden: a Bonferroni correction is used to maintain the genome-wide false-positive rate at 5% (assuming 1 million independent tests for common genetic variation).
      03 GWAS have limited clinical predictive value: only a modest proportion of heritability is explained.
      04 GWAS based on SNP arrays rely on pre-existing genetic variant reference panels: SNP-array-based GWAS depend on the completeness of the sequencing studies and resulting reference panels that inform genotyping array design.
      05 GWAS signals may be due to cryptic population stratification, which can result in spurious associations if not properly accounted for.
      06 GWAS explain only a modest fraction of the missing heritability: the variants that GWAS identify as associated with a trait/disease account for only a modest proportion of the estimated heritability of most complex traits.
      Source: Tam et al., 2019
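The multiple-testing burden in item 02 can be made concrete. Under a Bonferroni correction, and taking the slide's assumption of 1 million independent tests at face value, the per-SNP significance threshold works out as follows (the symbols are introduced here for illustration):

```latex
\alpha_{\text{per-SNP}} = \frac{\alpha_{\text{genome-wide}}}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```

A SNP therefore has to reach a P-value below 5 × 10⁻⁸ to be declared genome-wide significant under these assumptions.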
  • 14. "The more we learn, the more we realize how little we know" - R. Buckminster Fuller ● The literature provides a number of disease regions ● Each region may have a number of SNPs ● Not all SNPs may be responsible ● How do we find the causal SNPs from GWAS? ● Is it possible to leverage the existing studies to find the causal genes?
  • 15. GRAIL (Gene Relationships Across Implicated Loci): given a collection of disease regions, identify a subset of genes that are more highly related than expected by chance. Input: a list of disease regions identified by GWAS and a list of their publications. Output: degree of relatedness of the genes with the disease. Motivation: ● Association does not imply causation ● Identifying causal relationships helps us better understand the disease. Ref: Identifying relationships among genomic disease regions: Predicting genes at pathogenic SNP associations and rare deletions, Raychaudhuri et al., 2009
  • 16. GRAIL works in 4 steps: identify the overlapping genes from the list of disease regions; then, for each overlapping gene, rank all other genes by their relatedness to it, count the regions having at least one highly related gene, and assign a p-value to the count. Finally, select the most connected gene in the region.
  • 17. Step 1: Define the overlapping region. Let's look at gene 1 next. Image Source: GRAIL, Raychaudhuri et al., 2009
  • 18. Background of next step. Published article/document Doc1: "GWAS study is important because of its ability to identify associations between disease and related genes. GWAS result can be used for inventing treatment and medication of the disease". Word frequency in Doc1: word1 GWAS: 2; word2 study: 1; ...; wordN disease: 2. Doc2: "GWAS of BMI helps researchers know about more information regarding obesity." Document frequency of words: word1 GWAS: 2; word2 BMI: 1; ...; wordN obesity: 1
  • 19. Background of next step. The weight of word i in document j combines two factors. Word frequency weight: the more frequent word i is in document j, the more weight it gets. Inverse document frequency weight: the fewer documents contain word i (its document frequency, relative to the total number of documents), the more weight it gets.
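A minimal sketch of this word-weighting idea, applied to the two example documents from slide 18. The exact formula on the slide is not recoverable from the transcript; the form used below, (1 + log tf) × log(N / df), is one common TF-IDF variant and is an assumption here, as are the names `tokenize` and `tf_idf_weights`.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf_weights(docs):
    """Return one {word: weight} dict per document.

    Assumed weighting: (1 + log(tf)) * log(N / df), where tf is the count of
    the word in the document, df the number of documents containing it, and
    N the total number of documents.
    """
    term_counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())  # each document counts a word once
    return [
        {w: (1 + math.log(tf)) * math.log(n_docs / doc_freq[w])
         for w, tf in counts.items()}
        for counts in term_counts
    ]

docs = [
    "GWAS study is important because of its ability to identify associations "
    "between disease and related genes. GWAS result can be used for inventing "
    "treatment and medication of the disease",
    "GWAS of BMI helps researchers know about more information regarding obesity.",
]
weights = tf_idf_weights(docs)
print(weights[0]["disease"], weights[0]["gwas"], weights[1]["bmi"])
```

Under this weighting, a word that occurs in every document (GWAS here) gets weight 0, while a word confined to one document gets a positive weight that grows when the word is repeated.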
  • 20. Background of next step. For a gene k, the weighted count of word i is built from the word weights in all documents/publications referring to gene k, together with the number of genes referred to in each document j. Calculating this weighted count for every word i in the vocabulary provides a gene vector for gene k.
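A sketch of how per-document word weights might be pooled into a gene vector and then compared across genes. The transcript lists the ingredients (the documents referring to gene k and the number of genes each document refers to) but not the formula, so dividing each document's weights by its gene count is an assumption, as is the use of cosine similarity as the relatedness measure. All names and toy data below are hypothetical.

```python
import math
from collections import defaultdict

def gene_vector(gene, doc_weights, doc_genes):
    """Sum word weights over the documents that mention `gene`.

    Assumption: each document's contribution is divided by the number of
    genes that document refers to, so broad surveys count for less.
    """
    vector = defaultdict(float)
    for doc_id, genes in doc_genes.items():
        if gene in genes:
            for word, weight in doc_weights[doc_id].items():
                vector[word] += weight / len(genes)
    return dict(vector)

def cosine(u, v):
    """Cosine similarity between two sparse word vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy data: word weights per document and the genes each mentions.
doc_weights = {
    "doc1": {"insulin": 2.0, "obesity": 1.0},
    "doc2": {"insulin": 1.0, "glucose": 1.0},
    "doc3": {"retina": 3.0},
}
doc_genes = {"doc1": {"GENE_A"}, "doc2": {"GENE_A", "GENE_B"}, "doc3": {"GENE_C"}}

vec_a = gene_vector("GENE_A", doc_weights, doc_genes)
vec_b = gene_vector("GENE_B", doc_weights, doc_genes)
vec_c = gene_vector("GENE_C", doc_weights, doc_genes)
print(cosine(vec_a, vec_b), cosine(vec_a, vec_c))
```

Genes whose publications share vocabulary (GENE_A and GENE_B) score as related, while genes with disjoint vocabulary (GENE_A and GENE_C) score 0; a ranking of all other genes against a key gene could be built by sorting on such a score.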
  • 21. Step 2: Ranking the gene against other genes. Image Source: GRAIL, Raychaudhuri et al., 2009
  • 22. Step 3: Counting regions with related genes. Image Source: GRAIL, Raychaudhuri et al., 2009
  • 23. Step 4: Assign the p-value of the key gene to the region. Image Source: GRAIL, Raychaudhuri et al., 2009
  • 24. Some SNPs receive higher p-values than others. Image Source: GRAIL, Raychaudhuri et al., 2009
  • 25. Specific Problems with GWAS: linkage disequilibrium (LD) sometimes leads to misinterpretation of association results; the Bonferroni-corrected significance threshold is too conservative and leads to missing heritability; coding-region-based tools are not sufficient for GWAS signal prioritization. This brings us to GenoWAP!
  • 26. GenoWAP: GWAS signal prioritization through integrated analysis of genomic functional annotation ● The goal of GWAS signal prioritization is to assign each SNP a new score that measures its importance ● GenoWAP is a GWAS signal prioritization method that integrates genomic functional annotation and GWAS test statistics: ○ GenoCanyon functional prediction ○ GWAS P-values
  • 27. GenoWAP: GenoCanyon (Lu et al., 2015) ● What: an unsupervised statistical framework ● Why: to predict functional non-coding regions in the human genome ● How: through integrated analysis of multiple biochemical signals and genomic conservation measures ● For each SNP in a GWAS dataset, the mean GenoCanyon functional score of its surrounding 10,000 base pairs is used as the prior probability P(Z = 1) ● All the SNPs are partitioned into functional (Z = 1) and nonfunctional (Z = 0) subgroups based on the calculated mean
  • 28. GenoWAP: Statistical Model ● For every SNP we define Z to be the indicator of general functionality and Z_D to be the indicator of disease-specific functionality ● If a SNP or its surrounding region is active in any genomic functional pathway, then Z equals 1. If the SNP or its surrounding region is involved in the disease pathway, then Z_D equals 1 ● Each SNP has an associated p denoting its P-value obtained from the standard GWAS analysis ● The quantity of interest is the conditional probability of being disease-specific functional given the P-value, i.e. P(Z_D = 1 | p)
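The conditional probability named in the last bullet can be written out with Bayes' theorem. This is a sketch in the slide's notation, not a formula recovered from the slide: f(p | ·) denotes a P-value density (a symbol introduced here), and the second identity assumes that a disease-specific functional SNP is also functional in the general sense, so that Z_D = 1 implies Z = 1.

```latex
P(Z_D = 1 \mid p)
  = \frac{f(p \mid Z_D = 1)\, P(Z_D = 1)}
         {f(p \mid Z_D = 1)\, P(Z_D = 1) + f(p \mid Z_D = 0)\, \bigl(1 - P(Z_D = 1)\bigr)},
\qquad
P(Z_D = 1) = P(Z_D = 1 \mid Z = 1)\, P(Z = 1)
```

The four quantities on the right-hand side correspond to the four ingredients listed on slide 29.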
  • 29. GenoWAP: Statistical Model ● To calculate a marker’s conditional probability given its P-value, we must know a few things: ○ the prior probability of being functional ○ the P-value density for disease-specific functional markers ○ the P-value density for markers that are not related to the disease ○ the conditional probability of being disease-specific functional given that the marker is functional in the general sense ● All the SNPs are partitioned into functional (Z = 1) and nonfunctional (Z = 0) subgroups based on the calculated mean
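A minimal numerical sketch of combining the four ingredients above with Bayes' theorem. The densities are placeholders: a Beta(alpha, 1) density for disease-specific markers and a uniform density for unrelated markers are assumptions chosen for illustration, and the function name and all numbers are hypothetical rather than estimates produced by GenoWAP.

```python
def posterior_disease_functional(p, prior_functional,
                                 prob_disease_given_functional, alpha=0.2):
    """P(Z_D = 1 | p) via Bayes' theorem.

    Assumptions (illustrative only):
      * f(p | Z_D = 1) is a Beta(alpha, 1) density, alpha * p**(alpha - 1),
        which concentrates mass near p = 0 when 0 < alpha < 1;
      * f(p | Z_D = 0) is uniform on (0, 1], i.e. equal to 1;
      * Z_D = 1 implies Z = 1, so P(Z_D = 1) = P(Z_D = 1 | Z = 1) * P(Z = 1).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    prior_disease = prob_disease_given_functional * prior_functional
    density_disease = alpha * p ** (alpha - 1.0)
    density_null = 1.0
    numerator = density_disease * prior_disease
    return numerator / (numerator + density_null * (1.0 - prior_disease))

# Two SNPs with the same P-value but different functional priors P(Z = 1).
same_p = 1e-6
high_prior = posterior_disease_functional(
    same_p, prior_functional=0.9, prob_disease_given_functional=0.1)
low_prior = posterior_disease_functional(
    same_p, prior_functional=0.1, prob_disease_given_functional=0.1)
print(high_prior, low_prior)
```

With the P-value held fixed, the SNP in the region with the higher functional prior receives the higher posterior, which is the sense in which functional annotation re-ranks SNPs that a P-value alone cannot separate.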
  • 30. GenoWAP: Contribution ● Compared to the top loci ranked on P-values only, the top-ranked loci after prioritization tend to show substantially stronger signals in large GWAS ● Within each locus, we are able to distinguish true signals among highly correlated SNPs
  • 31. Next Frontier: Machine Learning. Future Areas of Research. GENERALIZED LINEAR MODELS: generate a line of best "fit" through the input data in the form of a classification line/boundary. DECISION TREES: trees built around yes/no rules developed from specific features. NEURAL NETWORKS: interconnected neurons evaluate and weigh input data based on features produced by the previously connected neurons.