A brief introduction to epistasis detection in GWAS
2014. 01. 27. 
Hyun-hwan Jeong
Agenda 
• Introduction 
• Problem definition 
• Computational detection methods 
• Challenges 
Introduction
Single Nucleotide Polymorphism (SNP)
• A single-letter change in the DNA sequence
• DNA sequences of any two people are about 99.9% identical
• A common type of genetic variation
• The variant is present in ≥ 1% of the general population
…ATTCGCCGGCTGCAACGTTAGA…
…ATTCGCCGGCTGCAGCGTTAGA…
…ATTCGCCGGCTGCATCGTTAGA…
Genotype, phenotype and allele
[Figure: illustration of phenotype, genotype and allele. Source: http://en.wikipedia.org/wiki/Phenotype]
Genome-Wide Association Study (GWAS)
• Tests the association between each single SNP and disease
[Figure: Manhattan plot of the GWAS of the discovery cohort comprising 2,346 SSc cases and 5,193 healthy controls. Nature Genetics 42, 426–429 (2010)]
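Why is detecting epistasis needed in GWAS?
• Single-SNP tests cannot find SNPs that affect disease only in combination, with no marginal effect
[Figure: an illustration of an interaction pattern between two SNPs with no marginal effect. Bioinformatics 26, 30–37 (2010)]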
Problem Definition 
Problem definition 
Epistasis detection problem 
• Objective
  • Detect the causative SNPs for a disease
  • Find the SNP combination that maximizes a defined association measure
• Dataset
  • 0.5M ~ 1M SNPs
  • 4,000 ~ 5,000 subjects
  • Binary disease status (case/control)
  • 100MB ~ 1GB genotype data file
Problem definition – Data format 
SNP0  SNP1  SNP2  SNP3  SNP4  SNP5  SNP6  SNP7  SNP8  SNP9  CLASS
1     1     0     0     0     0     1     0     1     1     1
0     0     1     0     0     0     1     1     0     2     1
0     0     0     0     0     0     1     0     0     0     1
1     1     0     0     0     0     0     1     0     2     1
0     0     0     0     0     0     0     1     0     0     1
0     0     0     0     0     0     0     0     0     1     0
1     1     0     1     0     0     0     1     1     1     0
0     0     0     1     1     1     0     1     1     1     0
0     1     0     2     0     0     0     1     0     1     0
0     0     0     1     0     0     1     2     1     0     0
• 3 values for SNP columns: 0 (AA), 1 (Aa/aA), 2 (aa)
• Binary values for CLASS: 1 (case/affected subject), 0 (control/normal)
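As a minimal sketch of how this format could be read, the snippet below parses the ten example rows and counts subjects per genotype combination and class for one SNP pair. The function names `parse` and `contingency` and the choice of SNP3 and SNP6 are illustrative assumptions, not part of the slides.

```python
from collections import Counter

# Toy genotype matrix copied from the slide: 10 SNP columns followed by CLASS.
ROWS = """\
1 1 0 0 0 0 1 0 1 1 1
0 0 1 0 0 0 1 1 0 2 1
0 0 0 0 0 0 1 0 0 0 1
1 1 0 0 0 0 0 1 0 2 1
0 0 0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 0 0 1 0
1 1 0 1 0 0 0 1 1 1 0
0 0 0 1 1 1 0 1 1 1 0
0 1 0 2 0 0 0 1 0 1 0
0 0 0 1 0 0 1 2 1 0 0
"""


def parse(text):
    """Split whitespace-separated rows into (genotypes, labels)."""
    genotypes, labels = [], []
    for line in text.splitlines():
        values = [int(v) for v in line.split()]
        genotypes.append(values[:-1])  # SNP columns, each 0, 1 or 2
        labels.append(values[-1])      # CLASS column: 1 = case, 0 = control
    return genotypes, labels


def contingency(genotypes, labels, snps):
    """Count subjects per (genotype combination, class) for the chosen SNP columns."""
    table = Counter()
    for row, label in zip(genotypes, labels):
        combo = tuple(row[s] for s in snps)
        table[(combo, label)] += 1
    return table


genotypes, labels = parse(ROWS)
table = contingency(genotypes, labels, snps=(3, 6))
for (combo, label), count in sorted(table.items()):
    print(combo, "case" if label == 1 else "control", count)
```

The resulting counts are the cells of the contingency table on which the association measures in the following slides are computed.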
Problem definition – measure(1/3) 
• Measures are computed on a contingency table
• Popular measures in epistasis detection
  • $\chi^2$ test
  • Mutual information
Genotype  AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb   sum
Case        39    91    95    92    14    31    63     4    71   500
Control    100    15    55     5    22   150    50    93    10   500
sum        139   106   150    97    36   181   113    97    81  1000
Problem definition – measure(2/3) 
• $\chi^2$ test based on the $\chi^2$ distribution
• $H_0$: no association between SNPs and disease status
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ and $E_i$ are the observed and expected counts of cell $i$
Observed contingency table
Genotype  AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb   sum
Case        39    91    95    92    14    31    63     4    71   500
Control    100    15    55     5    22   150    50    93    10   500
sum        139   106   150    97    36   181   113    97    81  1000
Expected contingency table
Genotype  AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb   sum
Case      69.5    53    75  48.5    18  90.5  56.5  48.5  40.5   500
Control   69.5    53    75  48.5    18  90.5  56.5  48.5  40.5   500
sum        139   106   150    97    36   181   113    97    81  1000
$\chi^2$ value: 379.07, p-value = $2.76 \times 10^{-77}$
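The statistic can be checked with a few lines of plain Python. This is a sketch under the assumption that expected counts are derived from the row and column sums, as in the expected table above; the function name `chi_square` is illustrative, and the p-value is not recomputed here.

```python
# Observed counts from the slide's contingency table (columns AABB ... aabb).
case = [39, 91, 95, 92, 14, 31, 63, 4, 71]
control = [100, 15, 55, 5, 22, 150, 50, 93, 10]


def chi_square(observed):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / total  # expected count under H0
            stat += (o - e) ** 2 / e
    return stat


chi2 = chi_square([case, control])
print(round(chi2, 2))
```

Because cases and controls are balanced (500 each), every expected count is half of its column sum, which is why both rows of the expected table are identical.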
Problem definition – measure(3/3) 
• Mutual information(1/3) 
• Non-parametric measure 
Problem definition – measure(3/3) 
• Mutual information(2/3) 
Contingency table
Genotype   AABB   AABb   AAbb   AaBB   AaBb   Aabb   aaBB   aaBb   aabb    sum
Case         39     91     95     92     14     31     63      4     71    500
Control     100     15     55      5     22    150     50     93     10    500
sum         139    106    150     97     36    181    113     97     81   1000
Frequency table
Genotype   AABB   AABb   AAbb   AaBB   AaBb   Aabb   aaBB   aaBb   aabb    sum
Case      0.039  0.091  0.095  0.092  0.014  0.031  0.063  0.004  0.071  0.500
Control   0.100  0.015  0.055  0.005  0.022  0.150  0.050  0.093  0.010  0.500
sum       0.139  0.106  0.150  0.097  0.036  0.181  0.113  0.097  0.081  1.000
Problem definition – measure(3/3) 
• Mutual information(3/3) 
Entropy table (each entry is $-p \log_2 p$ of the corresponding frequency)
Genotype   AABB   AABb   AAbb   AaBB   AaBb   Aabb   aaBB   aaBb   aabb    sum
Case      0.183  0.315  0.323  0.317  0.086  0.155  0.251  0.032  0.271  0.500
Control   0.332  0.091  0.230  0.038  0.121  0.411  0.216  0.319  0.066  0.500
sum       0.396  0.343  0.411  0.326  0.173  0.446  0.355  0.326  0.294
$I(\text{genotype}; \text{disease}) = H(\text{genotype}) + H(\text{disease}) - H(\text{genotype}, \text{disease})$
$= 3.07 + 1.00 - 3.76$
$= 0.31$
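The three entropies can be recomputed directly from the counts. The sketch below assumes base-2 logarithms, which is consistent with the entropy table entries; the helper name `entropy` is illustrative.

```python
from math import log2

# Observed counts from the slide's contingency table (columns AABB ... aabb).
case = [39, 91, 95, 92, 14, 31, 63, 4, 71]
control = [100, 15, 55, 5, 22, 150, 50, 93, 10]


def entropy(counts, total):
    """Shannon entropy in bits of a distribution given as raw counts."""
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


total = sum(case) + sum(control)
genotype_counts = [a + b for a, b in zip(case, control)]

h_genotype = entropy(genotype_counts, total)            # marginal over the 9 genotypes
h_disease = entropy([sum(case), sum(control)], total)   # marginal over case/control
h_joint = entropy(case + control, total)                # joint over all 18 cells
mi = h_genotype + h_disease - h_joint

print(round(h_genotype, 2), round(h_disease, 2), round(h_joint, 2), round(mi, 2))
```

Each term of `h_joint` corresponds to one cell of the Case and Control rows of the entropy table, and each term of `h_genotype` to one entry of its sum row.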
Methods to detect epistasis
Methods – Computational Approaches
• Multifactor Dimensionality Reduction (Ritchie et al. 2002)
• SNPHarvester (Yang et al. 2009)
• SNPRuler (Wan et al. 2010)
• Mutual Information With Clustering (Leem et al. 2014)
Methods 
Multifactor dimensionality reduction(1/2) 
Methods 
Multifactor dimensionality reduction(2/2) 
• Model-free, non-parametric method
• Pattern-based method
  • An association rule for each combination of SNP genotypes and phenotype
  • e.g. if SNP10 = 0 and SNP13 = 4 then class = 1
• Exhaustive search
  • Heavy computational burden
• Cross-validation consistency
  • Used to select the best model
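The core idea can be sketched as follows, assuming the usual MDR step of labelling each multi-locus genotype cell as high-risk or low-risk by its case/control ratio and then scoring the resulting classifier. This is a simplified illustration on the toy data from the data format slide: cross-validation is omitted, and the names `mdr_accuracy` and `best_pair` and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy data in the slide's format: each row is 10 SNP genotypes followed by CLASS.
DATA = [
    [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 1, 0, 2, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 1, 0, 2, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 2, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 2, 1, 0, 0],
]


def mdr_accuracy(data, snps, threshold=1.0):
    """Label each genotype cell high- or low-risk, then score the induced classifier."""
    cases = defaultdict(int)
    controls = defaultdict(int)
    for row in data:
        cell = tuple(row[s] for s in snps)
        if row[-1] == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    correct = 0
    for cell in set(cases) | set(controls):
        high_risk = cases[cell] >= threshold * controls[cell]
        # Subjects in a high-risk cell are predicted as cases, the rest as controls.
        correct += cases[cell] if high_risk else controls[cell]
    return correct / len(data)


def best_pair(data):
    """Exhaustively evaluate every SNP pair and return (accuracy, pair) of the best one."""
    n_snps = len(data[0]) - 1
    return max((mdr_accuracy(data, pair), pair) for pair in combinations(range(n_snps), 2))


accuracy, pair = best_pair(DATA)
print(accuracy, pair)
```

The exhaustive loop in `best_pair` is where the computational burden comes from: the number of evaluated combinations grows combinatorially with the number of SNPs and the interaction order.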
Methods 
SNPHarvester(1/2) 
Methods 
SNPHarvester(2/2) 
• Local search 
• Local optima problem 
• PathSeeker algorithm 
• Successive Runs 
• Score function: $\chi^2$ value
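The general pattern of a local search with successive runs can be sketched as below. This is a generic hill-climbing illustration with random restarts and a $\chi^2$ score on the toy data, not the published PathSeeker algorithm; the names `chi_square_score`, `local_search` and `successive_runs` and all parameter defaults are assumptions.

```python
import random
from collections import Counter

# Toy data in the slide's format: each row is 10 SNP genotypes followed by CLASS.
DATA = [
    [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 1, 0, 2, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 1, 0, 2, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 2, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 2, 1, 0, 0],
]


def chi_square_score(data, snps):
    """Pearson chi-square of the genotype combination of the chosen SNPs versus CLASS."""
    cells = Counter((tuple(row[s] for s in snps), row[-1]) for row in data)
    combos = Counter(tuple(row[s] for s in snps) for row in data)
    classes = Counter(row[-1] for row in data)
    n = len(data)
    score = 0.0
    for combo, combo_count in combos.items():
        for label, class_count in classes.items():
            expected = combo_count * class_count / n
            score += (cells[(combo, label)] - expected) ** 2 / expected
    return score


def local_search(data, k, rng):
    """Hill-climb from a random k-SNP set by swapping one SNP at a time."""
    n_snps = len(data[0]) - 1
    current = rng.sample(range(n_snps), k)
    best = chi_square_score(data, current)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n_snps):
                if candidate in current:
                    continue
                trial = current[:pos] + [candidate] + current[pos + 1:]
                score = chi_square_score(data, trial)
                if score > best:  # accept only strict improvements, so the loop terminates
                    current, best, improved = trial, score, True
    return best, tuple(sorted(current))


def successive_runs(data, k=2, runs=5, seed=0):
    """Restart the local search several times to reduce the risk of a poor local optimum."""
    rng = random.Random(seed)
    return max(local_search(data, k, rng) for _ in range(runs))


best_score, best_snps = successive_runs(DATA)
print(best_score, best_snps)
```

Each run evaluates far fewer SNP sets than an exhaustive search, at the price that a single run may stop at a local optimum; repeating the search from different random starting sets is one way to mitigate that.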
Methods 
SNPRuler 
• Pattern-based method 
• Predictive rule 
• Branch-and-bound algorithm 
• Pruning uses an upper bound on the $\chi^2$ value (d.f. = 1)
Methods 
Mutual Information With Clustering(1/2) 
[Figure: SNPs grouped around three centroids (Centroid 1, Centroid 2, Centroid 3). Labels in the figure: SNPs; causative SNPs; distances d1 and d2 with Score = d1 + d2; "3 SNPs with the highest mutual information value"; m candidates per cluster.]
Methods 
Mutual Information With Clustering(2/2) 
• Mutual information
  • Used as the distance measure for clustering
• K-means clustering algorithm
• Candidate selection
  • Reduces the search space dramatically
• Can detect high-order epistatic interactions
• Shows better performance (power, execution time) than previous methods
Challenges in epistasis detection
Challenges 
• Reducing the computational burden
  • Filtering
  • Parallel processing
• Detecting higher-order epistatic interactions
  • Interactions among more than 2 SNPs
• Novel measures of association between SNPs and disease
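To make the computational burden concrete, the sketch below counts how many SNP combinations an exhaustive search would have to score for the dataset sizes quoted in the problem definition (0.5M to 1M SNPs). The helper name `n_combinations` is illustrative; only the standard-library function `math.comb` is used.

```python
from math import comb


def n_combinations(n_snps, order):
    """Number of distinct SNP sets of the given interaction order."""
    return comb(n_snps, order)


for n_snps in (500_000, 1_000_000):
    for order in (2, 3):
        count = n_combinations(n_snps, order)
        print(f"{n_snps:>9,} SNPs, order {order}: {count:.3e} combinations")
```

Moving from pairs to triples multiplies the number of candidates by roughly a third of the SNP count, which is why filtering, parallel processing, and non-exhaustive search strategies matter for higher-order interactions.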
