Master of science dissertation defense
"Do you not see that Allah sends down water from the sky, and with it We bring forth fruits of various colors? And in the mountains are streaks, white and red, of varying shades, and others raven-black. And among people, animals, and livestock there are likewise various colors. Only those of His servants who have knowledge truly fear Allah. Indeed, Allah is Almighty, All-Forgiving." [Surah Fatir 35:27-28]
Minia University
Faculty of Engineering
Biomedical Engineering Department
Fatma Sayed Ibrahim
Master of science thesis defense
Wednesday, January 27, 2021
Algorithm Implementation of Genetic Association Analysis
for Rheumatoid Arthritis Data Based on Haplotype Blocks
Acknowledgement
Supervisors
Prof. Dr. Hesham Fathy A. Hamed
Former Dean of Faculty of Engineering, Minia University
Professor at Egyptian-Russian University
Dr. Ashraf Mahroos Said
Associate Professor
Biomedical Engineering Department, Minia University,
Dr. Mohamed Nagy Saad
Assistant Professor
Biomedical Engineering Department, Minia University
Thesis committee members
Dr. Muhammad Ali M. Rushdi
Biomedical Engineering and Systems Department
Faculty of Engineering, Cairo University
Dr. Essam Halim Houssein
Vice-Dean for Postgraduate Studies and Research Affairs
Faculty of Computers and Information, Minia University
Prof. Dr. Hesham Fathy A. Hamed
Former Dean of Faculty of Engineering, Minia University
Professor at Egyptian-Russian University
Dr. Ashraf Mahroos Said
Associate Professor
Biomedical Engineering Department, Minia University
Outline
1. Introduction
2. Literature review
3. Data description
4. Pre-processing
5. Methods
6. Results
7. Conclusion
Introduction
• Background
• Motivation
• Research Objectives
Introduction Literature review Data description pre-processing Methods Results Conclusion
Background
Human DNA is 99.9% identical between individuals; the remaining 0.1% accounts for the differences.
[Figure: two aligned DNA sequences that differ at a single nucleotide position, labeled SNP]
A single-nucleotide polymorphism (SNP) is a variation at a single nucleotide position; each nucleotide that can occur at that position is called an allele.
[Figure: a SNP on a chromosome pair (allele 1 and allele 2), illustrating the major homozygous, minor homozygous, and heterozygous genotypes]
Genotypes
• Major homozygous genotype: GG
• Minor homozygous genotype: AA
• Heterozygous genotype: GA
SNPs in population
[Figure: genotypes of 13 individuals at one SNP (7 GG, 3 AA, 3 GA), used to introduce the minor allele frequency (MAF)]
The minor allele frequency (MAF)
Ten sequences from a population (SNP1 is the 4th base shown, SNP2 the 9th):
...ATGTCACACGTACTT...
...ATGTCACACGTACTT...
...ATGACACAGGTACTT...
...ATGTCACAGGTACTT...
...ATGTCACAGGTACTT...
...ATGACACAGGTACTT...
...ATGTCACAGGTACTT...
...ATGTCACAGGTACTT...
...ATGACACACGTACTT...
...ATGACACAGGTACTT...

                     SNP1   SNP2
Allele 1             T      C
Allele 2             A      G
Allele 1 count       6      3
Allele 2 count       4      7
Allele 1 frequency   60%    30%
Allele 2 frequency   40%    70%
Major allele         T      G
Minor allele         A      C
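The allele counts and frequencies for SNP1 and SNP2 can be reproduced with a few lines of Python. This is a minimal sketch rather than the thesis code: the function names are hypothetical, and the SNP positions (0-based indices 3 and 8 of the displayed fragment) are read off the ten sequences shown.

```python
from collections import Counter

# The ten sequence fragments shown on the slide (flanking "..." removed).
SEQUENCES = [
    "ATGTCACACGTACTT",
    "ATGTCACACGTACTT",
    "ATGACACAGGTACTT",
    "ATGTCACAGGTACTT",
    "ATGTCACAGGTACTT",
    "ATGACACAGGTACTT",
    "ATGTCACAGGTACTT",
    "ATGTCACAGGTACTT",
    "ATGACACACGTACTT",
    "ATGACACAGGTACTT",
]


def allele_frequencies(sequences, position):
    """Return {allele: frequency} at a 0-based position across all sequences."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def minor_allele_frequency(sequences, position):
    """Frequency of the least common allele; 0.0 if the site is monomorphic."""
    freqs = allele_frequencies(sequences, position)
    if len(freqs) < 2:
        return 0.0
    return min(freqs.values())


snp1 = allele_frequencies(SEQUENCES, 3)  # SNP1: T vs A
snp2 = allele_frequencies(SEQUENCES, 8)  # SNP2: C vs G
```

Here the MAF of SNP1 is the frequency of A and the MAF of SNP2 is the frequency of C, matching the major/minor assignment on the slide.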
Linkage Disequilibrium (LD)
LD is the nonrandom association of alleles at different sites.
[Figure: haplotypes at two loci with alleles A/a and B/b. At equilibrium all four haplotypes (AB, Ab, aB, ab) occur; under disequilibrium only AB and ab are observed]
Low LD
Individuals (SNP 1, SNP 2): A C, A T, A C, A T, G C, G T, G C, G T
High LD
Individuals (SNP 1, SNP 2): A C, A C, A C, A C, G T, G T, G T, G T
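The difference between the low-LD and high-LD examples can be quantified with the usual pairwise measures D, |D'|, and r². The sketch below is illustrative only: it assumes each listed individual contributes one phased haplotype (allele at SNP 1, allele at SNP 2), and the function name `ld_measures` is hypothetical.

```python
def ld_measures(haplotypes):
    """Return (D, |D'|, r2) for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples.
    The alleles of the first haplotype are used as the reference alleles.
    """
    n = len(haplotypes)
    ref1, ref2 = haplotypes[0]
    p1 = sum(1 for a, _ in haplotypes if a == ref1) / n
    p2 = sum(1 for _, b in haplotypes if b == ref2) / n
    p12 = sum(1 for h in haplotypes if h == (ref1, ref2)) / n

    d = p12 - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        # At least one SNP is monomorphic, so LD is undefined; report zeros.
        return d, 0.0, 0.0

    # D is normalized by its maximum possible magnitude given the frequencies.
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    return d, abs(d) / d_max, d * d / denom


LOW_LD = [("A", "C"), ("A", "T"), ("A", "C"), ("A", "T"),
          ("G", "C"), ("G", "T"), ("G", "C"), ("G", "T")]
HIGH_LD = [("A", "C")] * 4 + [("G", "T")] * 4
```

For the low-LD sample every allele combination is equally frequent, so D is zero; for the high-LD sample knowing the allele at SNP 1 fully determines the allele at SNP 2.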
Measuring the LD
[Figure: pairwise LD heatmaps for six SNPs (SNP1 to SNP6), one colored by |D'| and one by r², each with a color key from 0 to 1]
The interpretation of LD
Haplotype blocks
[Figure: three SNPs (SNP1, SNP2, SNP3) on a chromosome across several individuals, with the observed allele combinations ATA, GTA, and GAT]
[Figure: the same three SNPs grouped into a haplotype block whose haplotypes are ATA, GTA, and GAT; haplotype blocks are separated by recombination hotspots]
Haplotype blocks partitioning and visualization
[Figure: eight SNPs (SNP1 to SNP8) along a chromosome partitioned into Block 1 and Block 2]
The role of SNPs and haplotype blocks in diseases
[Figure: spectrum of diseases between the environmental component and the genetic component: infectious diseases, complex diseases, hereditary diseases]
• SNP-based association studies
• Haplotype-based association studies
Samples from population: cases and controls
Data acquisition: genotyping
Data: SNP array dataset
• Single SNP-based association studies approach
• Haplotype block-based association studies approach
Single SNP-based association studies approach
A single point mutation (single SNP) results in a disease.
Haplotype block-based association studies approach
Multiple SNPs contribute to disease susceptibility.
Why haplotype blocks?
• Dimensionality reduction
• Perform better in complex diseases
• Multiple SNPs can each have a moderate effect
• Consider the interrelationship between linked SNPs
• Useful for evolutionary studies
Motivations
Why this point?
• Genetic variation influences our predisposition to disease, and every disease has a genetic component, even infectious diseases.
• Complex diseases are very common. Chronic conditions in particular greatly reduce a person's productivity and quality of life.
• Haplotype blocks are much more effective and powerful in such cases.
• There is a knowledge gap in this field, especially regarding MAF, and many questions remain unanswered.
• Few Arab researchers work in this field, especially in Minia.
Research Objectives
• Practically implement computational algorithms to partition
genotyped data based on the haplotype blocks.
• Find the best haplotype partitioning method applied for the
whole-genome case-control dataset to reduce the number
of SNPs in the association study.
The input and the output
The input
[Figure: genotype matrix with individuals (IDs) as rows and SNPs as columns; adjacent SNP columns are grouped into Block 1 and Block 2]
G_G  A_C  G_G  A_C  G_G  G_G  .    .
A_G  A_A  A_G  A_A  A_G  A_G  .    .
.    .    .    .    .    .    .    .
G_G  A_C  G_G  A_C  G_G  G_G  .    A_A
The output
Start index   End index   Start rsID    End rsID
                          rs22572xxx    rs225722xxx
                          rs74307xxx    rs198574xxx
Major findings
• Practical data exploration that uncovered interesting observations
• Investigation of partitioning methods through a literature review and an empirical comparative study (biomarker reduction with high SNP correlation)
• A sequence of data preprocessing steps in R (we hope to release it as an R package)
• The role of MAF in haplotype block partitioning
Biomarker reduction
545,080 SNPs are reduced to between 73,743 (min) and 101,915 (max) haplotype blocks (biomarkers).
Biomarker reduction percentage: 13.5% (min) to 18.7% (max)
• The value varies from one method to another
• MAF has an impact on the results
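The two percentages follow from dividing the number of haplotype blocks by the original number of SNPs. The sketch below checks that arithmetic; the reading of "reduction percentage" as blocks remaining relative to SNPs is an interpretation that happens to reproduce the slide's figures, and the names are hypothetical.

```python
TOTAL_SNPS = 545_080   # SNPs in the dataset
MIN_BLOCKS = 73_743    # fewest haplotype blocks reported
MAX_BLOCKS = 101_915   # most haplotype blocks reported


def remaining_percentage(n_blocks, n_snps=TOTAL_SNPS):
    """Haplotype blocks as a percentage of the original SNP count."""
    return 100.0 * n_blocks / n_snps


min_pct = round(remaining_percentage(MIN_BLOCKS), 1)
max_pct = round(remaining_percentage(MAX_BLOCKS), 1)
```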
Literature review
The main projects in genomics
Human Genome Project (HGP)
• 1990: the announcement of the HGP
• 2001: the initial HGP sequencing
• 2003: the completion of the HGP
International HapMap Project
• 2002: the announcement of the International HapMap Project
• 2005: HapMap Phase I completion
• 2007: HapMap Phase II completion
• 2009: HapMap Phase III completion
The 1000 Genomes Project (1KGP)
• 2008: the announcement of the 1KGP
• 2010: the completion of the pilot phase
• 2012: the 1KGP fulfills its goal
• 2015: the 1KGP's completion
The 100,000 Genomes Project (milestones in 2012, 2015, and 2019)
• The announcement of the 100,000 Genomes Project
• Northern Ireland and Scotland join the project
• Beginning of the initiative to involve the public in genomic research
Literature review: haplotype block partitioning methods
Survey from 2001 to 2021, in four periods: 2001, 2001-2003, 2005-2013, and 2014-2020
Hidden Markov model (HMM): Mark Daly, 2001
From 2001 to 2003
• 2001: Hidden Markov model (HMM), Mark Daly
• 2002: Greedy algorithm (GA), Nila Patil
• 2002: Dynamic programming (DP), Kui Zhang
• 2002: Confidence interval (CI), Stacey Gabriel
• 2002: Four-gamete test (FGT), Ning Wang
• 2003: Minimum description length (MDL), Mikko Koivisto
From 2005 to 2013
• 2005: Solid spine of LD (SSLD)
• 2008: Markov chain Monte Carlo (MCMC) algorithm
• 2009: Xor-genotypes
• 2011: Wavelet transforms
• 2013: GA-SVM algorithm
Authors named on the slide: Jeffrey Barrett, Pattaro
From 2014 to 2020
• 2014: MIG++
• 2015: S-MIG++
• 2018: Big-LD
• 2020: Neutrosophic c-means (NCM) algorithm
• 2020: LDBlockShow
Authors named on the slide: Daniel Taliun, Sunah Kim
Data description and exploration
NARAC dataset
• Description
  • Map file
  • SNP array data file: missing data, alleles distribution, genotype distribution
  • Participants: % cases (% male, % female) and % controls (% male, % female)
• Visualization based on MAF viewpoint
  • MAF for each SNP: very rare SNPs, rare SNPs, low-frequency SNPs, common SNPs
  • SNP annotation
Participants
Sample of the SNP array data
ID         Affection  Sex  DRB1_1  DRB1_2  SENum  SEStatus  Anti-CCP  RFUW  rs10439884
D0024949   0          F    0101    0401    SS     yes       ?         ?     G_G
D0024302   0          F    0101    7       SN     yes       ?         ?     G_G
D0023151   0          F    0101    11      SN     yes       ?         ?     G_G
D0022042   0          F    0101    2       SN     yes       ?         ?     G_G
D0021275   0          F    0101    7       SN     yes       ?         ?     G_G
D0021163   0          F    0101    0403    SN     yes       ?         ?     G_G
D0020795   0          F    0101    3       SN     yes       ?         ?     G_G
6045201    1          F    0101    7       SN     yes       31.3      142   G_G
D0023027   0          M    0101    3       SN     yes       ?         ?     G_G
1015200    1          M    0101    0403    SN     yes       112.9     405   ?_?
D0015941   0          F    2       7       NN     no        ?         ?     A_G
D0016405   0          F    0101    7       SN     yes       ?         ?     ?_?
KNH763243  1          M    0404    0301    SN     yes       99        ?     G_G
Chromosome rsID Position
1 rs3094315 792429
1 rs12562034 808311
11 rs3802985 188510
11 rs3741411 189256
21 rs2821850 13693682
21 rs2257226 13695103
Sample of the map file from the North American Rheumatoid Arthritis Consortium (NARAC) dataset
SNP array data file: participants
          Cases (RA)  Controls  Total
Male      227         342       569
Female    641         852       1,493
Total     868         1,194     2,062
[Figure: the percentage of the missing data]
SNP classes by MAF
• Common SNPs: MAF > 0.05
• Low-frequency SNPs: 0.01 < MAF < 0.05
• Rare SNPs: 0.001 < MAF < 0.01
• Very rare SNPs: MAF < 0.001
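The four MAF classes can be written as a small classifier. This is a sketch under stated assumptions: the low-frequency class is taken to be the range between 0.01 and 0.05, and because the slide uses strict inequalities, assigning a value that falls exactly on a boundary to the lower class is a choice made here, not something the source specifies. The function name is hypothetical.

```python
def classify_maf(maf):
    """Map a minor allele frequency to one of the four classes on the slide."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    if maf > 0.05:
        return "common"
    if maf > 0.01:
        return "low frequency"
    if maf > 0.001:
        return "rare"
    return "very rare"


example_classes = [classify_maf(m) for m in (0.3, 0.02, 0.005, 0.0005)]
```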
MAF distribution for the data from chromosome 1 to chromosome 22
[Figure: MAF distribution per chromosome, with SNPs classed as very rare, rare, low frequency, and common]
The common SNPs are 90.5%.
[Figure: the histogram of the MAF distribution]
The SNPs' effect (annotation)
Ensembl Variant Effect Predictor (VEP)
[Figure: variant effects at the DNA, RNA, and protein levels]
All consequence regions (shown for very rare, rare, low-frequency, and common SNPs)
• Non-coding transcript exon variant
• TF-binding site variant
• 3-prime UTR variant
• Missense variant
• Synonymous variant
• Intron variant
• Non-coding transcript variant
• Downstream gene variant
• Upstream gene variant
• Intergenic variant
• NMD transcript variant
• Regulatory region variant
• Others
Coding consequence regions (shown for very rare, rare, low-frequency, and common SNPs)
• Missense variant
• Synonymous variant
• Others (stop gained, start lost, stop lost, coding sequence variant, incomplete terminal codon variant)
Pre-processing
Our data consist of 545,080 SNPs for 2,062 individuals (cases and controls).
Matrix size = 2,062 × 545,080 = 1,123,954,960 genotypes,
or about 5,619,774,800 letters (each genotype, homozygous or heterozygous, is stored as a string).
SNPs from chromosome 1 to chromosome 22:
531,689 SNPs × 2,062 participants = 1,096,342,718 genotypes
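The matrix sizes quoted for the dataset can be checked directly. In this sketch the factor of 5 letters per genotype is not stated on the slide; it is inferred from the ratio of the two totals and should be read as an assumption.

```python
N_INDIVIDUALS = 2_062
N_SNPS_TOTAL = 545_080       # all SNPs in the map file
N_SNPS_AUTOSOMAL = 531_689   # SNPs on chromosomes 1 to 22
LETTERS_PER_GENOTYPE = 5     # assumption: implied by the slide's two totals

full_matrix_cells = N_INDIVIDUALS * N_SNPS_TOTAL
full_matrix_letters = full_matrix_cells * LETTERS_PER_GENOTYPE
autosomal_matrix_cells = N_INDIVIDUALS * N_SNPS_AUTOSOMAL
```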
Reading and cropping
Start by reading the genotyped data and removing the first 9 columns.
rs10439884 rs2260810 rs1296971 rs2257224
G_G A_A A_A G_G
G_G A_A A_A G_G
G_G A_A A_A G_G
G_G A_A A_A A_A
G_G A_A A_A G_G
G_G A_A A_A G_G
G_G A_A A_A G_G
G_G G_G C_C A_G
A_G A_G A_C A_G
A_G A_G A_C G_G
?_? A_G A_C A_G
G_G A_G A_C G_G
The same sample after reformatting (separator removed; the missing genotype ?_? becomes NA):
rs10439884 rs2260810 rs1296971 rs2257224
GG AA AA GG
GG AA AA GG
GG AA AA GG
GG AA AA AA
GG AA AA GG
GG AA AA GG
GG AA AA GG
GG GG CC AG
AG AG AC AG
AG AG AC GG
NA AG AC AG
GG AA AA GG
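The cropping and reformatting steps (dropping the nine metadata columns, removing the "_" separator, and marking "?_?" as missing) can be sketched per row as follows. The thesis performs these steps in R; this Python version is only an illustration, and the function and constant names are hypothetical.

```python
# The nine leading metadata columns in the SNP array data:
# ID, Affection, Sex, DRB1_1, DRB1_2, SENum, SEStatus, Anti-CCP, RFUW
N_METADATA_COLUMNS = 9


def crop_and_reformat(row):
    """Drop the metadata columns and convert 'G_G' to 'GG', '?_?' to 'NA'."""
    genotypes = row[N_METADATA_COLUMNS:]
    return ["NA" if "?" in g else g.replace("_", "") for g in genotypes]


# Two rows taken from the sample of the SNP array data.
sample_row = ["D0015941", "0", "F", "2", "7", "NN", "no", "?", "?", "A_G"]
missing_row = ["1015200", "1", "M", "0101", "0403", "SN", "yes", "112.9", "405", "?_?"]

reformatted = crop_and_reformat(sample_row)
reformatted_missing = crop_and_reformat(missing_row)
```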
Convert the genotype matrices and their map file into gp.data form in preparation for the codeGeno function.
Imputation
using marginal allele distribution
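Imputation from the marginal allele distribution fills each missing genotype by drawing alleles at random according to the allele frequencies observed for that SNP. The sketch below illustrates the idea only; it is not the R package's implementation. Drawing the two alleles independently and writing the imputed pair in alphabetical order are assumptions, and the function name is hypothetical.

```python
import random


def impute_marginal(genotypes, rng):
    """Fill missing genotypes (None) for one SNP.

    genotypes: list of two-letter genotype strings such as 'GG', or None.
    rng: a random.Random instance, passed in so results are reproducible.
    """
    # Pool of observed alleles; sampling from it follows the marginal
    # allele frequencies of this SNP.
    alleles = [allele for g in genotypes if g is not None for allele in g]
    if not alleles:
        raise ValueError("cannot impute a SNP with no observed genotypes")

    imputed = []
    for g in genotypes:
        if g is None:
            pair = sorted(rng.choice(alleles) for _ in range(2))
            g = "".join(pair)
        imputed.append(g)
    return imputed


column = ["GG", "GG", "AG", None, "GG"]
filled = impute_marginal(column, random.Random(0))
```

Observed genotypes are left untouched; only the missing entry is replaced.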
Imputed data in bi-allelic format
After imputation and recoding using the synbreed R package
The output of imputation and recoding (pre-processed dataset)
0 = homozygous for the reference (major) allele
1 = heterozygous
2 = homozygous for the minor allele
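The recoding step turns each genotype into the number of copies of the minor allele it carries. The following is a minimal sketch of that coding, not the package's own code; breaking a tie between equally frequent alleles alphabetically is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def recode_snp(genotypes):
    """Recode two-letter genotypes for one SNP as minor-allele counts (0, 1, 2)."""
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        # Monomorphic SNP: every genotype is homozygous for the only allele.
        return [0] * len(genotypes)
    # Least frequent allele; ties are resolved alphabetically (assumption).
    minor = min(sorted(counts), key=lambda allele: counts[allele])
    return [g.count(minor) for g in genotypes]


# Column rs2257224 from the reformatted sample.
rs2257224 = ["GG", "GG", "GG", "AA", "GG", "GG", "GG", "AG", "AG", "GG", "AG", "GG"]
codes = recode_snp(rs2257224)
```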
From bi-allelic format to 1,2,3,4 format
A🡪1
C🡪2
G🡪3
T🡪4
Preparing data for Haploview
Columns: Family ID, ID, P ID (paternal ID), M ID (maternal ID), sex, aff (affection status), followed by the SNPs (SNP 1, ...)
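A row in this layout can be assembled as shown below. The sketch follows the standard linkage pedigree conventions (sex 1 = male, 2 = female; affection 1 = unaffected, 2 = affected; 0 = unknown parent or missing allele) together with the A, C, G, T to 1, 2, 3, 4 coding. Using the individual ID as the family ID is an assumption made for illustration, and the function name is hypothetical.

```python
ALLELE_CODES = {"A": "1", "C": "2", "G": "3", "T": "4"}
SEX_CODES = {"M": "1", "F": "2"}


def ped_line(family_id, individual_id, sex, affected, genotypes):
    """Build one pedigree-format line.

    genotypes: list of two-letter genotype strings, or None when missing.
    """
    affection = "2" if affected else "1"
    # Parents are unknown for unrelated cases and controls, hence "0 0".
    fields = [family_id, individual_id, "0", "0", SEX_CODES[sex], affection]
    for g in genotypes:
        if g is None:
            fields += ["0", "0"]
        else:
            fields += [ALLELE_CODES[g[0]], ALLELE_CODES[g[1]]]
    return " ".join(fields)


line = ped_line("D0015941", "D0015941", "F", False, ["AG", None, "CC"])
```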
Minor allele frequency (MAF) quality control
Why study the effect of MAF?
• In 2019, Saad et al.: an LD block of the same size held 8 SNPs in the CIT and 12 SNPs in SSLD.
• A different MAF threshold discards significant SNPs while the size of the block stays the same.
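MAF quality control removes SNPs whose minor allele frequency falls below a chosen threshold before block partitioning. The sketch below applies the five thresholds used in this work to genotypes coded as 0, 1, 2; keeping SNPs whose MAF is equal to the threshold is an assumption, and the SNP names and function names are hypothetical.

```python
MAF_THRESHOLDS = (0.001, 0.01, 0.02, 0.05, 0.1)


def maf_from_codes(codes):
    """MAF of one SNP from 0/1/2 genotype codes (copies of one allele)."""
    p = sum(codes) / (2 * len(codes))
    return min(p, 1 - p)


def filter_by_maf(snps, threshold):
    """Keep SNPs whose MAF is at least the threshold.

    snps: dict mapping an SNP name to its list of 0/1/2 codes.
    """
    return {name: codes for name, codes in snps.items()
            if maf_from_codes(codes) >= threshold}


toy_snps = {
    "snp_a": [0, 0, 0, 2, 0, 0, 0, 1, 1, 0, 1, 0],  # MAF = 5/24
    "snp_b": [0] * 19 + [1],                        # MAF = 1/40
    "snp_c": [0] * 20,                              # monomorphic, MAF = 0
}
kept_per_threshold = {t: sorted(filter_by_maf(toy_snps, t)) for t in MAF_THRESHOLDS}
```

Raising the threshold discards more SNPs, which is the trade-off examined in the results.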
Methods and workflow
The proposed methods for haplotype block partitioning
Applied haplotype partitioning methods
• FGT (2002)
• CIT (2002)
• SSLD (2005)
• BigLD (2018)
Flowchart and system description (chromosome 21)
1. Start with the NARAC genotype dataset and the NARAC map file positions for chromosome 21.
2. Reformat the dataset.
3. Impute using ImputR.
4. Biomarker check at MAF = 0.001, 0.01, 0.02, 0.05, and 0.1.
5. For each MAF threshold, partition with Haploview and with R (BigLD).
6. Comparison and calculations.
Flowchart and system description (22 chromosomes)
1. Input files: NARAC genomic data (2,062 individuals) and NARAC map file (545,080 SNPs).
2. Perl: chromosome-separated data and map files (ch1 to ch22).
3. R pre-processing: imputation and recoding, then reformatting for Haploview and for BigLD.
4. MAF thresholds: 0.001, 0.01, 0.02, 0.05, and 0.1.
5. Haplotype block partitioning: FGT, CIT, and SSLD in Haploview; BigLD in R.
6. Output: haplotype blocks for 22 chromosomes using 4 methods and 5 MAF thresholds.
The measured parameters
Proposed Method Based on Interval Graph Modeling (BigLD)
Heatmap for the haplotype blocks detected by interval graph modeling of clusters for a portion of chromosome 21 from 9,993,822 bp to 14,137,685 bp.
• Confidence interval test (CIT)
• Four-gamete test (FGT)
• Solid spine of linkage disequilibrium (SSLD)
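Of the three Haploview methods, the four-gamete test is the simplest to illustrate: for two biallelic SNPs, observing all four possible gametes (allele pairs) is taken as evidence of historical recombination between them, so a block is extended only while at most three gametes are seen. The sketch below is not Haploview's code; the `min_freq` parameter, which ignores gametes at or below a given frequency, and the function name are assumptions for illustration.

```python
from collections import Counter


def four_gamete_test(haplotypes, min_freq=0.0):
    """Return True if all four gametes are observed for a pair of biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples.
    min_freq: gametes with frequency at or below this value are ignored.
    """
    counts = Counter(haplotypes)
    n = len(haplotypes)
    observed = [gamete for gamete, count in counts.items() if count / n > min_freq]
    return len(observed) >= 4


# All four gametes present: evidence of recombination, the pair splits a block.
recombinant = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")] * 2
# Only two gametes present: the pair can share a block.
linked = [("A", "C")] * 4 + [("G", "T")] * 4
# Two gametes are rare (1% each); they count only when min_freq is below 1%.
mostly_linked = [("A", "C")] * 50 + [("G", "T")] * 48 + [("A", "T")] + [("G", "C")]
```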
Haploview
Solid spine of linkage disequilibrium (SSLD)
Implementation on Haploview
BLOCK 1. MARKERS: 4 5 6
132 (0.473) |0.277 0.124 0.074|
311 (0.467) |0.262 0.114 0.092|
112 (0.036) |0.011 0.017 0.008|
Multiallelic D prime: 0.054
BLOCK 2. MARKERS: 22 23
33 (0.564) |0.365 0.122 0.076|
13 (0.259) |0.103 0.052 0.104|
31 (0.175) |0.083 0.075 0.017|
Multiallelic D prime: 0.238
BLOCK 3. MARKERS: 26 27
33 (0.551) |0.242 0.174 0.136|
13 (0.252) |0.124 0.078 0.049|
11 (0.197) |0.081 0.077 0.038|
Multiallelic D prime: 0.068
BLOCK 4. MARKERS: 34 35
33 (0.446) |0.313 0.040 0.093|
31 (0.328) |0.209 0.064 0.055|
11 (0.223) |0.120 0.096 0.006|
Multiallelic D prime: 0.189
BLOCK 5. MARKERS: 39 40
31 (0.644) |0.322 0.323 0.000|
33 (0.201) |0.015 0.003 0.183|
13 (0.155) |0.155 0.000 0.000|
Multiallelic D prime: 0.669
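Block listings in this format can be post-processed into a start index and end index per block. The sketch below parses only the "BLOCK n. MARKERS: ..." header lines exactly as they appear in the listing; the output field names are hypothetical, and mapping the indices to rsIDs would additionally need the map file.

```python
import re

# Matches header lines such as "BLOCK 1. MARKERS: 4 5 6".
BLOCK_HEADER = re.compile(r"^BLOCK\s+(\d+)\.\s+MARKERS:\s+(.+)$")


def parse_blocks(text):
    """Extract block number, first and last marker index, and SNP count."""
    blocks = []
    for line in text.splitlines():
        match = BLOCK_HEADER.match(line.strip())
        if match is None:
            continue  # haplotype rows and D prime lines are skipped
        markers = [int(token) for token in match.group(2).split()]
        blocks.append({
            "block": int(match.group(1)),
            "start_index": markers[0],
            "end_index": markers[-1],
            "n_snps": len(markers),
        })
    return blocks


sample_output = """BLOCK 1. MARKERS: 4 5 6
132 (0.473) |0.277 0.124 0.074|
Multiallelic D prime: 0.054
BLOCK 2. MARKERS: 22 23
33 (0.564) |0.365 0.122 0.076|
Multiallelic D prime: 0.238
"""
parsed = parse_blocks(sample_output)
```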
[Figure: Haploview input and Haploview output]
Post-processing of the Haploview output into a standardized output (shown for chromosome 1 and chromosome 2)
Results and discussion
1) The total number of haplotype blocks
[Figure: total number of haplotype blocks for FGT, CIT, SSLD, and BigLD]
The smaller the number of haplotype blocks, the greater the reduction rate.
The MAF and total number of haplotype blocks
[Figure: total number of haplotype blocks at MAF thresholds 0.1, 0.05, 0.02, 0.01, and 0.001]
2) Total number of blocks, counting the singletons
[Figure: total number of blocks including singletons for CIT, FGT, BigLD, and SSLD]
3) Total number of SNPs in all blocks
[Figure: total number of SNPs in all blocks for SSLD, BigLD, FGT, and CIT]
4) The total length of all blocks (bp)
[Figure: total length of all blocks (SSLD labeled) at MAF thresholds 0.1, 0.05, 0.02, 0.01, and 0.001]
5) Mean number of SNPs in blocks
• BigLD and SSLD have a higher mean number of SNPs in blocks than FGT and CIT.
• MAF has little effect on the mean number of SNPs in blocks.
• The highest mean number of SNPs in blocks, 6.637, occurs in chromosome 6 using SSLD with MAF = 0.1.
• BigLD has almost the same mean number of SNPs in blocks for MAF thresholds from 0.001 to 0.05, and a higher mean at MAF = 0.1.
• For SSLD, the mean number of SNPs in blocks increases as the MAF threshold increases.
6) The mean block length in base pairs
[Figure: mean block length for SSLD, BigLD, CIT, and FGT at MAF thresholds 0.1, 0.05, 0.02, 0.01, and 0.001]
7) The mean r² within blocks
The mean correlation (r²) within blocks is generally highest in BigLD.
[Figure: mean r² within blocks for BigLD, CIT, FGT, and SSLD at MAF thresholds 0.1, 0.05, 0.02, 0.01, and 0.001]
In contrast, for the BigLD method the mean r² within a block decreases as the MAF threshold increases.
8) The mean r² between consecutive blocks (without considering the singleton blocks)
[Figure: mean r² between consecutive blocks for FGT, CIT, BigLD, and SSLD at MAF thresholds 0.1, 0.05, 0.02, 0.01, and 0.001]
9) The mean r² between all consecutive blocks (with singletons)
[Figure: mean r² between all consecutive blocks for FGT, CIT, SSLD, and BigLD]
10) The intersection percentage
Agreement, in percent, between the haplotype blocks produced by the compared methods:
Methods                        Matching percentage
FGT, CIT, and SSLD             67%
FGT, CIT, SSLD, and Big-LD     57.45%
FGT and Big-LD                 78.6%
CIT and Big-LD                 76.7%
SSLD and Big-LD                71.92%
Index bp FGT CIT SSLD Big_LD
1 9993822
2 13562271 ✓ ✓
3 13609442 ✓ 1
4 13690214 ✓ ✓ ✓ ✓
5 13693682 ✓ ✓ ✓ ✓
6 13695103 ✓ ✓ ✓ ✓
7 13707489 ✓
8 13729007
9 13769165 ✓
10 13823791 ✓ ✓
11 13823972 ✓ ✓
12 13865210 ✓
13 13866986 ✓ ✓ ✓
14 13879844 ✓ ✓
15 13895773 ✓
16 13950406
17 14027356 ✓
18 14059449 ✓
19 14070504
20 14087640 ✓ ✓
21 14088675 ✓ ✓
22 14092050 ✓ ✓ ✓ ✓
23 14095717 ✓ ✓ ✓
24 14121682
25 14128555 ✓
26 14136579 ✓ ✓ ✓ ✓
27 14137685 ✓ ✓ ✓ ✓
28 14191954
29 14197852
30 14258290 ✓ ✓
31 14261969 ✓ ✓
32 14292734
33 14297378
34 14322489 ✓ ✓ ✓
35 14334270 ✓ ✓ ✓
36 14334702 ✓ ✓ ✓
37 14367339
Intersection among the methods
Plot of a sample of chromosome 21 haplotype blocks produced by FGT, CIT, SSLD, and Big-LD.
The comparison between Big-LD, FGT, CIT, and SSLD haplotype block partitioning methods in chromosome 21
Compared parameters                     Big-LD      FGT         CIT         SSLD
Max. No. of SNPs in each block          26          17          20          27
Total No. of blocks                     1,182       1,562       1,464       1,378
Max. block size (bp)                    190,708     140,491     178,064     218,644
Min. block size (bp)                    34          4           2           12
Percentage of uncovered SNPs            14.5%       12.8%       22.1%       4.9%
Median No. of SNPs within each block    4           4           3           4
Median block size (bp)                  9,830       7,551       6,783       10,870
Total block size (bp)                   23,932,662  23,452,817  23,696,256  23,696,256
[Figure: Max. No. of SNPs in each block]
Columns: Chromosome; Init. SNPs; No. SNPs; No. blocks; Max block size (bp); Min block size (bp); Max block size (SNPs); Min block size (SNPs); Median block size (bp); Median No. SNPs within each block; % uncovered SNPs
Ch 1 40929 39000 6586 192010 10 41 2 13040 4 12.54103
Ch 2 44090 42033 6586 192010 10 41 2 13040 4 12.57583
Ch 3 36690 35078 5502 214260 13 62 2 12425 4 13.22766
Ch 4 32628 31123 5018 222084 14 37 2 13234 4 13.33098
Ch 5 33612 32220 4992 314333 13 49 2 12636.5 4 13.48852
Ch 6 35574 34140 5020 216439 11 55 2 12732 4 11.90393
Ch 7 29244 28120 4315 260965 7 44 2 12220 4 13.58464
Ch 8 30990 28120 4315 260965 7 44 2 12220 4 13.58464
Ch 9 26128 25095 3741 190762 17 44 2 9934 4 12.38892
Ch 10 28331 27070 3968 206622 8 57 2 11668.5 4 12.60805
Ch 11 26477 25333 3901 214818 11 61 2 11601 4 12.91596
Ch 12 26365 25229 3879 190743 15 37 2 11500 4 13.68267
Ch 13 20242 19380 2842 191785 12 39 2 12827.5 4 12.52322
Ch 14 17951 17243 2648 201362 22 32 2 11740 4 13.50693
Ch 15 16166 15470 2648 201362 22 32 2 11740 4 13.23206
Ch 16 16460 15780 2442 237308 10 37 2 9532.5 4 14.02408
Ch 17 14027 13538 2273 247297 10 30 2 10013 4 16.49431
Ch 18 16450 15708 2406 250370 12 61 2 10374 4 13.38172
Ch 19 9236 8973 1596 134312 15 44 2 9875 3 19.20205
Ch 20 13843 13310 2056 144803 11 36 2 10054 4 13.78663
Ch 21 8051 7786 1185 190709 34 32 2 9587 4 13.52427
Ch 22 8205 7887 1240 176594 7 44 2 7769 4 14.42881
Total 531689 507636 79159 4651913 291 959 44 249763 87 299.9369
Conclusion
• The allele distribution and description.
• The percentage of SNPs at each physical location in the chromosomes is affected by the SNPs' MAF.
• Genotype imputation and preprocessing are crucial steps in HBP, and we produced a preprocessing sequence that facilitates many genetic analyses.
• HBP reduces the biomarkers to about 13% of the original number.
• The Big-LD method provided robust block partitioning in terms of block size and genomic coverage.
• There is about 70% agreement among most HBP methods; Big-LD matched FGT most closely.
• FGT produced modest results in terms of correlation and biomarker reduction.
• BigLD produced large haplotype blocks and showed high r² between blocks and the lowest r² between blocks when the singleton blocks are considered.
• In terms of computation, BigLD takes less than half the computational time of the Haploview methods.
• MAF quality control has a strong effect on haplotype block partitioning.
• We recommend taking the MAF into consideration when applying haplotype block partitioning. However, there is a trade-off: a higher MAF threshold produces a higher correlation within the blocks, but it truncates a portion of the data that could be significant.
• In terms of correlation, we recommend a high MAF threshold with the Haploview methods and a low or moderate MAF threshold with the BigLD method.
• We could answer the question related to MAF: the number of blocks does not necessarily affect the number of SNPs within blocks.
• At the same MAF, the number of SNPs within blocks is highest in SSLD, due to its block size, and lowest in CIT.
• At the same block size, the number of SNPs within blocks decreases as MAF increases.
Publications and Future work

More Related Content

Similar to Algorithm Implementation of Genetic Association ‎Analysis for Rheumatoid Arthritis Data Based on ‎Haplotype Blocks

Microarray
MicroarrayMicroarray
Microarray
ruchibioinfo
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp Leiden
GenomeInABottle
 
Genome editing comes of age
Genome editing comes of ageGenome editing comes of age
Genome editing comes of age
Jan Hryca
 
Cnv and a analysis strategies
Cnv and a analysis strategiesCnv and a analysis strategies
Algorithm Implementation of Genetic Association Analysis for Rheumatoid Arthritis Data Based on Haplotype Blocks

  • 1. Master of Science dissertation defense
  • 2. "Do you not see that Allah sends down water from the sky, and with it We bring forth fruits of varying colors? And in the mountains are streaks, white and red, of varying hues, and others raven-black. And likewise among people, beasts, and livestock there are varying colors. Only those of His servants who have knowledge fear Allah. Indeed, Allah is Almighty, Forgiving." [Quran, Surah Fatir 35:27-28]
  • 3. Minia University, Faculty of Engineering, Biomedical Engineering Department. Fatma Sayed Ibrahim. Master of Science thesis defense, Wednesday, January 27, 2021. Algorithm Implementation of Genetic Association Analysis for Rheumatoid Arthritis Data Based on Haplotype Blocks
  • 5. Supervisors: Prof. Dr. Hesham Fathy A. Hamed (Former Dean of Faculty of Engineering, Minia University; Professor at Egyptian-Russian University); Dr. Ashraf Mahroos Said (Associate Professor, Biomedical Engineering Department, Minia University); Dr. Mohamed Nagy Saad (Assistant Professor, Biomedical Engineering Department, Minia University)
  • 6. Thesis committee members: Dr. Muhammad Ali M. Rushdi (Biomedical Engineering and Systems Department, Faculty of Engineering, Cairo University); Dr. Essam Halim Houssein (Vice-Dean for Postgraduate Studies and Research Affairs, Faculty of Computers and Information, Minia University); Prof. Dr. Hesham Fathy A. Hamed (Former Dean of Faculty of Engineering, Minia University; Professor at Egyptian-Russian University); Dr. Ashraf Mahroos Said (Associate Professor, Biomedical Engineering Department, Minia University, Minia)
  • 7. Outline: 1. Introduction 2. Literature review 3. Data description 4. Pre-processing 5. Methods 6. Results 7. Conclusion
  • 8. Introduction: Background, Motivation, Research Objectives (section bar: Introduction, Literature review, Data description, Pre-processing, Methods, Results, Conclusion)
  • 9. Background
  • 14. Human DNA is 99.9% identical between individuals; the remaining 0.1% accounts for the differences.
  • 15. SNP: a single-nucleotide polymorphism (SNP) is a variation at a single nucleotide position, where each nucleotide type that can occur at that position is called an allele. (Figure: aligned sequences that differ at one position.)
  • 16. (Figure: Allele 1 and Allele 2, A or T, carried on a pair of chromosomes, illustrating the major homozygous, heterozygous, and minor homozygous genotypes.)
  • 20. The minor allele frequency (MAF). Ten example sequences:
      ...ATGTCACACGTACTT...
      ...ATGTCACACGTACTT...
      ...ATGACACAGGTACTT...
      ...ATGTCACAGGTACTT...
      ...ATGTCACAGGTACTT...
      ...ATGACACAGGTACTT...
      ...ATGTCACAGGTACTT...
      ...ATGTCACAGGTACTT...
      ...ATGACACACGTACTT...
      ...ATGACACAGGTACTT...
      SNP    Allele 1  Allele 2  Allele 1 frequency  Allele 2 frequency  Major  Minor
      SNP1   T         A         6 (60%)             4 (40%)             T      A
      SNP2   C         G         3 (30%)             7 (70%)             G      C
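The allele counts on the MAF slide can be reproduced with a short script. This is a minimal sketch, not the thesis code: it assumes SNP1 and SNP2 are the two columns that vary across the ten sequences (0-based positions 3 and 8), and the names `SEQUENCES`, `SNP_POSITIONS`, and `allele_summary` are made up for illustration.

```python
from collections import Counter

# The ten example sequences from the slide (the flanking "..." is dropped).
SEQUENCES = [
    "ATGTCACACGTACTT",
    "ATGTCACACGTACTT",
    "ATGACACAGGTACTT",
    "ATGTCACAGGTACTT",
    "ATGTCACAGGTACTT",
    "ATGACACAGGTACTT",
    "ATGTCACAGGTACTT",
    "ATGTCACAGGTACTT",
    "ATGACACACGTACTT",
    "ATGACACAGGTACTT",
]

# Assumption: the SNPs are the two variable columns (0-based indices).
SNP_POSITIONS = {"SNP1": 3, "SNP2": 8}


def allele_summary(sequences, position):
    """Count the alleles at one position and report major allele, minor allele, and MAF."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    # Bi-allelic site assumed: the two most common alleles are the only ones.
    (major, _major_n), (minor, minor_n) = counts.most_common(2)
    return {"major": major, "minor": minor, "maf": minor_n / total, "counts": dict(counts)}


summary = {name: allele_summary(SEQUENCES, pos) for name, pos in SNP_POSITIONS.items()}
for name, info in summary.items():
    print(name, info)
```

The MAF is simply the frequency of the less common allele, so it can never exceed 0.5 for a bi-allelic SNP.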
  • 21. Linkage disequilibrium (LD): LD is the nonrandom association of alleles at different sites.
  • 22. (Figure: a population of haplotypes in equilibrium, where AB, Ab, aB, and ab all occur, compared with a population in disequilibrium, where only AB and ab occur.)
  • 24. Low LD versus high LD, shown as the alleles at (SNP 1, SNP 2) for eight individuals. Low LD: A C, A T, A C, A T, G C, G T, G C, G T. High LD: A C, A C, A C, A C, G T, G T, G T, G T.
  • 26. Measuring the LD. (Figure: pairwise LD heatmap for six SNPs, with an r² color key running from 0 to 1.)
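The r² measure named on the LD slides can be illustrated with the low-LD and high-LD examples from slide 24. The sketch below uses the standard definition r² = D² / (p_A (1 − p_A) p_B (1 − p_B)) with D = p_AB − p_A p_B. It assumes each individual contributes one phased haplotype, which is how the slide draws the example; the function name `r_squared` is hypothetical.

```python
def r_squared(haplotypes):
    """r^2 between two bi-allelic SNPs, given phased (SNP 1 allele, SNP 2 allele) pairs."""
    n = len(haplotypes)
    # Pick the alleles of the first haplotype as the reference; r^2 does not depend on this choice.
    a, b = haplotypes[0]
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == (a, b) for h in haplotypes) / n
    d = p_ab - p_a * p_b  # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when one of the SNPs is monomorphic")
    return d * d / denom


# Allele pairs copied from the slide.
LOW_LD = [("A", "C"), ("A", "T"), ("A", "C"), ("A", "T"),
          ("G", "C"), ("G", "T"), ("G", "C"), ("G", "T")]
HIGH_LD = [("A", "C")] * 4 + [("G", "T")] * 4

print("low LD  r^2 =", r_squared(LOW_LD))
print("high LD r^2 =", r_squared(HIGH_LD))
```

In the low-LD example the allele at SNP 2 is independent of the allele at SNP 1, so D is zero; in the high-LD example knowing one allele fully determines the other.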
  • 35. (Figure: eight SNPs, SNP1 to SNP8, partitioned into two haplotype blocks, Block 1 and Block 2.)
  • 36. The role of SNPs and haplotype blocks in diseases
  • 37. (Figure: diseases placed between an environmental component and a genetic component: infectious diseases, complex diseases, hereditary diseases.)
  • 39. Data acquisition: samples from the population (cases and controls), genotyping, and the resulting SNP array dataset.
  • 40. Two approaches to association studies: the single SNP-based approach and the haplotype block-based approach.
  • 41. Single SNP-based association studies approach: a single point mutation (single SNP) results in a disease.
  • 42. Haplotype block-based association studies approach: multiple SNPs contribute to disease susceptibility.
  • 43. Why haplotype blocks? • Dimensionality reduction • They perform better in complex diseases • Multiple SNPs may each have a moderate effect • They consider the interrelationship between linked SNPs • Evolutionary studies
  • 44. Introduction: Background, Motivation, Research Objectives
  • 46. Motivations: why this topic? • Genetic variations influence our predisposition to diseases, and any disease has a genetic component, even infectious diseases. • Complex diseases are very common in societies. In particular, chronic conditions greatly affect a person's productivity and quality of life. • Haplotype blocks are much more effective and powerful in such cases.
  • 47. Why this topic? • There is a knowledge gap in this field (especially regarding MAF), and many questions remain unanswered. • Few Arab researchers work in this field, especially in Minia.
  • 48. Research objectives • Practically implement computational algorithms to partition genotyped data based on haplotype blocks. • Find the best haplotype partitioning method for the whole-genome case-control dataset in order to reduce the number of SNPs in the association study.
  • 49. (Figure: genotype matrix with the SNPs as columns and entries such as G_G, A_C, A_G, and A_A.)
  • 52. The input. (Figure: genotype matrix with the SNPs as columns and entries such as G_G, A_C, A_G, and A_A.)
  • 53. (Figure: the same genotype matrix with further SNP columns shown.)
  • 54. (Figure: the genotype matrix with individuals' IDs as rows and SNPs as columns, partitioned into Block 1 and Block 2.)
  • 55. The output: a table of haplotype blocks with the columns Start index, End index, Start rsID, and End rsID (example rsIDs: rs22572xxx, rs225722xxx, rs74307xxx, rs198574xxx).
  • 56. Major findings • Practical data exploration that uncovered interesting observations • An investigation of partitioning methods through a literature review and an empirical comparative study (biomarker reduction with high SNP correlation) • A sequence of data pre-processing steps in R (with the hope of turning it into an R package) • The role of MAF in haplotype block partitioning
  • 59. Biomarker reduction percentage (values shown: 13.5 and 18.7) • The value varies from one method to another • MAF has an impact on the results
  • 60. Outline: 1. Introduction 2. Literature review 3. Data description 4. Pre-processing 5. Methods 6. Results 7. Conclusion
  • 61. Literature review
  • 62. Literature review: the main projects in genomics; haplotype block partitioning methods
  • 63. Timeline of the main projects in genomics. Human Genome Project (HGP): announcement, 1990; initial sequencing, 2001; completion, 2003. International HapMap Project: announcement, 2002; Phase I completion, 2005; Phase II completion, 2007; Phase III completion, 2009. The 1000 Genomes Project (1KGP): announcement, 2008; completion of the pilot phase, 2010; the project fulfills its goal, 2012; completion, 2015. The 100,000 Genomes Project: announcement, 2012; Northern Ireland and Scotland join the project, 2015; start of the initiative to involve the public in genomic research, 2019.
  • 64. Literature review: a survey of haplotype block partitioning from 2001 to 2021
  • 68. Hidden Markov model (HMM), 2001 • Greedy algorithm (GA), 2002 • Dynamic programming (DP), 2002 • Confidence interval (CI), 2002 • Four-gamete test (FGT), 2002 • Minimum description length (MDL), 2003. Authors named: Ning Wang, Kui Zhang, Stacey Gabriel, Nila Patil, Mark Daly, Mikko Koivisto
  • 69. From 2005 to 2013: Solid spine of LD (SSLD), 2005 • Markov chain Monte Carlo (MCMC) algorithm, 2008 • Xor-genotypes, 2009 • Wavelet transforms, 2011 • GA-SVM algorithm, 2013. Authors named: Jeffrey Barrett, Pattaro
  • 70. From 2014 to 2020: MIG++, 2014 • S-MIG++, 2015 • Big-LD, 2018 • Neutrosophic c-means (NCM) algorithm, 2020 • LDBlockShow, 2020. Authors named: Daniel Taliun, Sunah Kim
  • 71. Data description and exploration
  • 73. Data description and exploration of the NARAC dataset. (Diagram) Description: map file; SNP array data file. Participants: % cases (% male, % female); % controls (% male, % female). Missing data. Alleles distribution. Genotype distribution. SNP annotation. Visualization based on the MAF viewpoint: MAF for each SNP; very rare SNPs; rare SNPs; low-frequency SNPs; common SNPs.
  • 74. Sample of the SNP array data:
      Participant ID  Affection  Sex  DRB1_1  DRB1_2  SENum  SEStatus  Anti-CCP  RFUW  rs10439884
      D0024949        0          F    0101    0401    SS     yes       ?         ?     G_G
      D0024302        0          F    0101    7       SN     yes       ?         ?     G_G
      D0023151        0          F    0101    11      SN     yes       ?         ?     G_G
      D0022042        0          F    0101    2       SN     yes       ?         ?     G_G
      D0021275        0          F    0101    7       SN     yes       ?         ?     G_G
      D0021163        0          F    0101    0403    SN     yes       ?         ?     G_G
      D0020795        0          F    0101    3       SN     yes       ?         ?     G_G
      6045201         1          F    0101    7       SN     yes       31.3      142   G_G
      D0023027        0          M    0101    3       SN     yes       ?         ?     G_G
      1015200         1          M    0101    0403    SN     yes       112.9     405   ?_?
      D0015941        0          F    2       7       NN     no        ?         ?     A_G
      D0016405        0          F    0101    7       SN     yes       ?         ?     ?_?
      KNH763243       1          M    0404    0301    SN     yes       99        ?     G_G
  • 75. Sample of the map file:
      Chromosome  rsID        Position
      1           rs3094315   792429
      1           rs12562034  808311
      11          rs3802985   188510
      11          rs3741411   189256
      21          rs2821850   13693682
      21          rs2257226   13695103
  • 76. North American Rheumatoid Arthritis Consortium (NARAC) dataset:
              Cases (RA)  Controls  Total
      Male    227         342       569
      Female  641         852       1,493
      Total   868         1,194     2,062
  • 77. SNP array data file
  • 78. The percentage of the missing data
  • 82. SNP categories by MAF: common SNPs, MAF > 0.05; low-frequency SNPs, 0.01 < MAF < 0.05; rare SNPs, 0.001 < MAF < 0.01; very rare SNPs, MAF < 0.001
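The MAF categories above translate directly into a small classifier. This is only an illustrative sketch: the slide uses strict inequalities and does not say where the exact boundary values (0.001, 0.01, 0.05) belong, so assigning each boundary to the higher-frequency category is an assumption, and `maf_category` is a hypothetical name.

```python
def maf_category(maf):
    """Map a minor allele frequency to the category names used on the slide."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf < 0.001:
        return "very rare"
    if maf < 0.01:       # assumption: 0.001 itself counts as rare
        return "rare"
    if maf < 0.05:       # assumption: 0.01 itself counts as low frequency
        return "low frequency"
    return "common"      # assumption: 0.05 itself counts as common


for value in (0.0005, 0.005, 0.02, 0.3):
    print(value, "->", maf_category(value))
```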
  • 85. MAF distribution for the data from chromosome 1 to chromosome 22. (Figure legend: very rare, rare, low frequency, common.)
  • 86. The histogram of the MAF distribution. The common SNPs are 90.5%. (Figure legend: very rare, rare, low frequency, common.)
  • 87. The SNPs' effect (annotation)
  • 90. All consequence regions, shown for very rare, rare, low-frequency, and common SNPs. (Figure legend: non-coding transcript exon variant, TF binding site variant, 3-prime UTR variant, missense variant, synonymous variant, others, intron variant, non-coding transcript variant, downstream gene variant, upstream gene variant, intergenic variant, NMD transcript variant, regulatory region variant.)
  • 91. Coding consequence regions, shown for very rare, rare, low-frequency, and common SNPs. (Figure legend: missense variant, synonymous variant, others: stop gained, start lost, stop lost, coding sequence variant, incomplete terminal codon variant.)
  • 94. Outline: 1. Introduction 2. Literature review 3. Data description 4. Pre-processing 5. Methods 6. Results 7. Conclusion
  • 97. Our data consist of about 545,080 SNPs for 2,062 individuals (cases and controls). Matrix size = 2,062 × 545,080 = 1,123,954,960 genotypes, or about 5,619,774,800 letters (taking into account homozygous and heterozygous genotypes in string format).
  • 98. SNPs from chromosome 1 to chromosome 22: 531,689 SNPs × 2,062 participants = 1,096,342,718 genotypes.
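The matrix sizes quoted on these two slides can be checked with a few lines of arithmetic. In this sketch the factor of 5 characters per genotype is inferred from the ratio of the two figures on the slide; it is not stated explicitly in the source, and the constant names are made up for illustration.

```python
N_INDIVIDUALS = 2_062
N_SNPS_ALL = 545_080          # all SNPs in the map file
N_SNPS_AUTOSOMAL = 531_689    # SNPs on chromosomes 1 to 22

# Assumption: 5 characters per genotype cell, implied by the slide's two totals.
CHARS_PER_GENOTYPE = 5

cells_all = N_INDIVIDUALS * N_SNPS_ALL
letters_all = cells_all * CHARS_PER_GENOTYPE
cells_autosomal = N_INDIVIDUALS * N_SNPS_AUTOSOMAL

print(f"{cells_all:,} genotypes in the full matrix")
print(f"{letters_all:,} characters in string format")
print(f"{cells_autosomal:,} genotypes on chromosomes 1 to 22")
```

At this scale the string representation alone runs to several gigabytes, which is one practical reason to recode genotypes as small integers before partitioning.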
  • 99. Reading and cropping: start by reading the genotyped data and removing the first 9 columns.
  • 100. Genotypes after cropping:
      rs10439884  rs2260810  rs1296971  rs2257224
      G_G         A_A        A_A        G_G
      G_G         A_A        A_A        G_G
      G_G         A_A        A_A        G_G
      G_G         A_A        A_A        A_A
      G_G         A_A        A_A        G_G
      G_G         A_A        A_A        G_G
      G_G         A_A        A_A        G_G
      G_G         G_G        C_C        A_G
      A_G         A_G        A_C        A_G
      A_G         A_G        A_C        G_G
      ?_?         A_G        A_C        A_G
      G_G         A_G        A_C        G_G
  • 101. Genotypes after removing the separator and marking missing values as NA:
      rs10439884  rs2260810  rs1296971  rs2257224
      GG          AA         AA         GG
      GG          AA         AA         GG
      GG          AA         AA         GG
      GG          AA         AA         AA
      GG          AA         AA         GG
      GG          AA         AA         GG
      GG          AA         AA         GG
      GG          GG         CC         AG
      AG          AG         AC         AG
      AG          AG         AC         GG
      NA          AG         AC         AG
      GG          AA         AA         GG
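The cropping and cleaning steps on slides 99 to 101 can be sketched as follows. The thesis pipeline does this in R; the Python below is only an illustration of the same transformation. The function names are hypothetical, `None` stands in for R's `NA`, and the example row is assembled from values on slides 74 and 100 purely for demonstration.

```python
N_METADATA_COLUMNS = 9  # the slide says the first 9 columns are removed


def clean_genotype(raw):
    """Turn 'G_G' into 'GG'; return None for missing calls such as '?_?'."""
    left, sep, right = raw.partition("_")
    if sep != "_" or "?" in (left, right):
        return None
    return left + right


def crop_and_clean(row):
    """Drop the metadata columns, then clean every remaining genotype cell."""
    return [clean_genotype(cell) for cell in row[N_METADATA_COLUMNS:]]


# Illustrative row: 9 metadata fields followed by 4 genotype calls.
example_row = ["D0016405", "0", "F", "0101", "7", "SN", "yes", "?", "?",
               "?_?", "A_G", "A_C", "A_G"]

print(crop_and_clean(example_row))
```

Note that only genotype cells are checked for `?`; the metadata columns also contain `?` values, which is why cropping happens before cleaning.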
  • 102. Convert the genotype matrices and their map files into gp.data form in preparation for the codeGeno function.
  • 103. Imputation using the marginal allele distribution
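One way to read "imputation using the marginal allele distribution" is that each missing genotype is filled by drawing two alleles at random according to the allele frequencies observed for that SNP. The sketch below illustrates that idea only; it is not the Synbreed implementation, and the function name, the `None` marker for missing values, and the example column are assumptions.

```python
import random


def impute_column(genotypes, rng):
    """Fill missing genotypes (None) by sampling two alleles from the observed allele pool."""
    # Every observed allele appears once in the pool, so a uniform draw from the
    # pool follows the marginal allele frequencies of this SNP.
    allele_pool = [allele for g in genotypes if g is not None for allele in g]
    if not allele_pool:
        raise ValueError("cannot impute a SNP with no observed genotypes")
    imputed = []
    for g in genotypes:
        if g is None:
            # Sort so that a heterozygote is always written the same way (e.g. 'AG').
            g = "".join(sorted(rng.choice(allele_pool) for _ in range(2)))
        imputed.append(g)
    return imputed


column = ["GG", "GG", None, "AG", "GG"]
result = impute_column(column, random.Random(0))  # fixed seed for repeatability
print(result)
```

Observed genotypes are left untouched, so imputation changes only the cells that were missing.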
  • 105. Imputed data in bi-allelic format
  • 106. After imputation and recoding using the Synbreed R package: 0 = homozygous for the reference (major) allele; 1 = heterozygous; 2 = homozygous for the minor allele
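The 0/1/2 recoding amounts to counting the copies of the minor allele in each genotype. The following is a minimal sketch of that counting rule, assuming imputed two-letter genotypes with no missing values; it does not call the Synbreed package, and `recode_column` is a hypothetical name.

```python
from collections import Counter


def recode_column(genotypes):
    """Recode two-letter genotypes as the number of minor-allele copies (0, 1, or 2)."""
    allele_counts = Counter(allele for g in genotypes for allele in g)
    if len(allele_counts) > 2:
        raise ValueError("expected a bi-allelic SNP")
    major = allele_counts.most_common(1)[0][0]
    # Every allele that is not the major allele is a copy of the minor allele.
    return [sum(allele != major for allele in g) for g in genotypes]


column = ["GG", "GG", "AG", "AA", "GG"]
print(recode_column(column))
```

With this coding, the sum of a column divided by twice the number of individuals gives the minor allele frequency directly.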
  • 107. The output of imputation and recoding: the pre-processed dataset
  • 108. From bi-allelic format to 1, 2, 3, 4 format: A → 1, C → 2, G → 3, T → 4
  • 109. Preparing data for Haploview. Columns: family ID, individual ID, paternal ID (P ID), maternal ID (M ID), sex, affection status (aff), followed by the SNPs (SNP 1, ...)
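The two reformatting steps for Haploview (the A/C/G/T to 1/2/3/4 mapping and the six leading pedigree columns) can be sketched together. The numeric conventions used here (sex 1 = male, 2 = female; affection 1 = unaffected, 2 = affected; 0 for missing alleles and unknown parents) follow the linkage pedigree format as commonly documented and are not stated on the slides, so treat them as assumptions. The function name and the family ID `F1` are hypothetical.

```python
ALLELE_CODE = {"A": "1", "C": "2", "G": "3", "T": "4"}  # mapping given on slide 108
MISSING = "0"  # assumption: 0 marks a missing allele or an unknown parent


def ped_line(family_id, individual_id, sex, affected, genotypes,
             father_id=MISSING, mother_id=MISSING):
    """Build one whitespace-separated pedigree line for a single individual."""
    sex_code = {"M": "1", "F": "2"}[sex]
    affection_code = "2" if affected else "1"
    fields = [family_id, individual_id, father_id, mother_id, sex_code, affection_code]
    for g in genotypes:
        if g is None:
            fields += [MISSING, MISSING]
        else:
            fields += [ALLELE_CODE[g[0]], ALLELE_CODE[g[1]]]
    return " ".join(fields)


line = ped_line("F1", "6045201", "F", True, ["GG", "AG", None])
print(line)
```

Each SNP contributes two columns, one per allele, so a dataset with N SNPs yields lines with 6 + 2N fields.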
  • 110. Minor allele frequency (MAF) quality control
  • 111. Why study the effect of MAF? In 2019, Saad et al.: different MAF thresholds discarded significant SNPs while the size of the LD block stayed the same (8 SNPs in the CIT, 12 SNPs in SSLD).
  • 113. Outline: 1. Introduction 2. Literature review 3. Data description 4. Pre-processing 5. Methods 6. Results 7. Conclusion
  • 114. Methods and workflow: the proposed methods for haplotype block partitioning
  • 115. Applied haplotype partitioning methods: FGT (2002), CIT (2002), SSLD (2005), BigLD (2018)
  • 116. Flowchart and system description: start → NARAC genotype dataset (ch21) and NARAC map file (position, ch21) → reformatting the dataset → imputation using ImputR → biomarker check at MAF = 0.001, 0.01, 0.02, 0.05, and 0.1 → Haploview and R (BigLD) at each threshold → comparison and calculations
  • 117. Flowchart and system description: NARAC 22 chromosomes input files, namely NARAC genomic data (2,062 individuals) and NARAC map file (545,080 SNPs) → Perl: chromosome-separated data and map files (Ch1 to ch22) → R pre-processing: imputation and recoding, reformatting → MAF thresholds 0.001, 0.01, 0.02, 0.05, and 0.1 → reformatting for Haploview (FGT, CIT, SSLD) and reformatting for BigLD (R) → haplotype block partitioning: haplotype blocks for 22 chromosomes using 4 methods and 5 MAF thresholds
  • 118. Ch1 to ch22 data and map files → R pre-processing: imputation and recoding, reformatting → MAF thresholds 0.001, 0.01, 0.02, 0.05, and 0.1 → reformatting for Haploview (FGT, CIT, SSLD) and reformatting for BigLD (R) → haplotype block partitioning: haplotype blocks for 22 chromosomes using 4 methods and 5 MAF thresholds
  • 120. The measured evaluation parameters
  • 121. Flowchart and system description: NARAC 22 chromosomes input files, namely NARAC genomic data (2,062 individuals) and NARAC map file (545,080 SNPs) → Perl: chromosome-separated data and map files (Ch1 to ch22) → R pre-processing: imputation and recoding, reformatting → MAF thresholds 0.001, 0.01, 0.02, 0.05, and 0.1 → reformatting for Haploview (FGT, CIT, SSLD) and reformatting for BigLD (R) → haplotype block partitioning: haplotype blocks
  • 122. Proposed Method Based on Interval Graph Modeling (BigLD)
  • 124. Heatmap for the haplotype blocks detected by interval graph modeling of clusters for a portion of chromosome 21 from 9,993,822 bp to 14,137,685 bp.
  • 125. Haploview: • Confidence interval test (CIT) • Four-gamete test (FGT) • Solid spine of linkage disequilibrium (SSLD)
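Of the three Haploview methods listed above, the four-gamete test is the simplest to illustrate: two bi-allelic SNPs show evidence of recombination when all four possible gametes are observed. The Python sketch below assumes phased haplotypes coded 0/1; the function name and the `min_freq` cutoff parameter are illustrative and are not Haploview's own interface.

```python
from collections import Counter

def four_gametes_observed(hap_a, hap_b, min_freq=0.0):
    """Return True if all four gametes (00, 01, 10, 11) of two bi-allelic SNPs
    occur with a frequency above `min_freq` among the phased haplotypes."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two non-empty haplotype vectors of equal length")
    n = len(hap_a)
    gametes = Counter(zip(hap_a, hap_b))
    frequent = [g for g, count in gametes.items() if count / n > min_freq]
    return len(frequent) == 4
```

Under this test a block is extended over consecutive SNPs for as long as no pair in it shows all four gametes.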
  • 126. Solid spine of linkage disequilibrium (SSLD)
  • 127. Implementation on Haploview
  • 128. Haploview block output:
      BLOCK 1. MARKERS: 4 5 6
      132 (0.473) |0.277 0.124 0.074|
      311 (0.467) |0.262 0.114 0.092|
      112 (0.036) |0.011 0.017 0.008|
      Multiallelic D prime: 0.054
      BLOCK 2. MARKERS: 22 23
      33 (0.564) |0.365 0.122 0.076|
      13 (0.259) |0.103 0.052 0.104|
      31 (0.175) |0.083 0.075 0.017|
      Multiallelic D prime: 0.238
      BLOCK 3. MARKERS: 26 27
      33 (0.551) |0.242 0.174 0.136|
      13 (0.252) |0.124 0.078 0.049|
      11 (0.197) |0.081 0.077 0.038|
      Multiallelic D prime: 0.068
      BLOCK 4. MARKERS: 34 35
      33 (0.446) |0.313 0.040 0.093|
      31 (0.328) |0.209 0.064 0.055|
      11 (0.223) |0.120 0.096 0.006|
      Multiallelic D prime: 0.189
      BLOCK 5. MARKERS: 39 40
      31 (0.644) |0.322 0.323 0.000|
      33 (0.201) |0.015 0.003 0.183|
      13 (0.155) |0.155 0.000 0.000|
      Multiallelic D prime: 0.669
  • 130. Post-processing for a standardized output
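Post-processing of this kind can start by parsing the Haploview block text into a uniform structure. The Python sketch below assumes one record per line, as in the block listing on the Haploview implementation slide; the regular expressions and the dictionary layout are illustrative choices, not the thesis code.

```python
import re

BLOCK_RE = re.compile(r"BLOCK (\d+)\.\s+MARKERS:\s+([\d ]+)")
HAP_RE = re.compile(r"^(\d+)\s+\(([\d.]+)\)")
DPRIME_RE = re.compile(r"Multiallelic D ?prime:\s+([\d.]+)")

SAMPLE = """BLOCK 1. MARKERS: 4 5 6
132 (0.473) |0.277 0.124 0.074|
311 (0.467) |0.262 0.114 0.092|
112 (0.036) |0.011 0.017 0.008|
Multiallelic D prime: 0.054
BLOCK 2. MARKERS: 22 23
33 (0.564) |0.365 0.122 0.076|
13 (0.259) |0.103 0.052 0.104|
31 (0.175) |0.083 0.075 0.017|
Multiallelic D prime: 0.238"""

def parse_blocks(text):
    """Parse block text into dicts with marker indices, haplotypes, and D'."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        match = BLOCK_RE.match(line)
        if match:
            blocks.append({"block": int(match.group(1)),
                           "markers": [int(x) for x in match.group(2).split()],
                           "haplotypes": [], "dprime": None})
            continue
        if not blocks:
            continue  # ignore anything before the first BLOCK header
        match = HAP_RE.match(line)
        if match:  # haplotype string and its frequency
            blocks[-1]["haplotypes"].append((match.group(1), float(match.group(2))))
            continue
        match = DPRIME_RE.match(line)
        if match:
            blocks[-1]["dprime"] = float(match.group(1))
    return blocks

blocks = parse_blocks(SAMPLE)
```

Once every method's output is in this form, block counts, sizes, and coverage can be computed with the same code regardless of which tool produced the blocks.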
  • 133. 1) The total number of haplotype blocks (FGT, CIT, SSLD, BigLD)
  • 134. 1) The total number of haplotype blocks: the smaller the number of haplotype blocks, the greater the reduction rate.
  • 135. The MAF and total number of haplotype blocks
  • 136. 1) The total number of haplotype blocks (MAF = 0.1, 0.05, 0.02, 0.01, 0.001)
  • 137. 2) Total number of blocks, including singletons (CIT, FGT, BigLD, SSLD)
  • 138. 2) Total number of blocks, including singletons
  • 139. 3) Total number of SNPs in all blocks (SSLD, BigLD, FGT, CIT)
  • 140. 3) Total number of SNPs in all blocks
  • 141. 4) The total length of all blocks (bp) (SSLD)
  • 142. 4) The total length of all blocks (bp) (MAF = 0.1, 0.05, 0.02, 0.01, 0.001)
  • 143. 5) Mean number of SNPs in blocks: BigLD and SSLD have a higher mean number of SNPs in blocks than FGT and CIT.
  • 144. 5) Mean number of SNPs in blocks • MAF has little effect on the mean number of SNPs in blocks. • The highest mean number of SNPs in blocks, 6.637, occurs in chromosome 6 using SSLD with MAF = 0.1.
  • 145. 5) Mean number of SNPs in blocks: BigLD has almost the same mean number of SNPs in blocks for MAF thresholds from 0.001 to 0.05 and a higher mean at MAF = 0.1. With SSLD, the mean number of SNPs in blocks increases as the MAF threshold increases.
  • 146. 6) The mean block length in base pairs (SSLD, BigLD, CIT, FGT)
  • 147. 6) The mean block length in base pairs (MAF = 0.1, 0.05, 0.02, 0.01, 0.001)
  • 148. 7) The mean r² within blocks (BigLD, CIT, FGT, SSLD): the mean r² within blocks is generally higher in BigLD.
  • 149. 7) The mean r² within blocks (MAF = 0.1, 0.05, 0.02, 0.01, 0.001): in contrast, in the BigLD method the mean r² within blocks decreases as the MAF threshold increases.
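For reference, the pairwise r² behind these averages can be computed from haplotype frequencies as r² = D² / (pA(1 - pA) pB(1 - pB)), with D = pAB - pA pB. The Python sketch below assumes phased 0/1 haplotypes and averages over all SNP pairs in a block; how the thesis pairs SNPs for the average is not stated on the slides, so that choice is an assumption.

```python
from itertools import combinations

def r_squared(hap_a, hap_b):
    """Squared correlation between two bi-allelic SNPs from phased 0/1 haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return d * d / denom

def mean_r2_within_block(haplotype_columns):
    """Average r² over all SNP pairs of one block (assumed definition)."""
    pairs = list(combinations(haplotype_columns, 2))
    if not pairs:
        return 0.0
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)
```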
  • 150. 8) The mean r² between consecutive blocks, excluding singleton blocks (FGT, CIT, BigLD, SSLD; MAF = 0.1, 0.05, 0.02, 0.01, 0.001)
  • 151. 8) The mean r² between consecutive blocks, excluding singleton blocks
  • 152. 8) The mean r² between consecutive blocks, excluding singleton blocks (FGT, CIT, BigLD, SSLD)
  • 153. 9) The mean r² between all consecutive blocks, including singletons (FGT, CIT, SSLD, BigLD)
  • 154. 9) The mean r² between all consecutive blocks, including singletons
  • 155. 10) The intersection percentage
  • 156. The results of agreement in percentage of haplotype blocks produced by the compared methods:
      Methods | Matching percentage
      FGT, CIT, and SSLD | 67%
      FGT, CIT, SSLD, and Big-LD | 57.45%
      FGT and Big-LD | 78.6%
      CIT and Big-LD | 76.7%
      SSLD and Big-LD | 71.92%
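The slides do not spell out how the matching percentage is computed. One plausible reading, shown here purely as an assumption, is the share of SNP positions placed in a block by every compared method among the positions placed in a block by at least one of them. The function name and the toy positions are hypothetical.

```python
def matching_percentage(*covered_sets):
    """Percentage of positions covered by all methods among positions covered
    by at least one method (assumed definition, not taken from the thesis)."""
    union = set().union(*covered_sets)
    if not union:
        return 0.0
    common = set.intersection(*(set(s) for s in covered_sets))
    return 100.0 * len(common) / len(union)

# Toy example: positions (bp) that two hypothetical methods place inside blocks.
method_1 = {100, 200, 300}
method_2 = {200, 300, 400}
agreement = matching_percentage(method_1, method_2)
```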
  • 157. Index bp FGT CIT SSLD Big_LD 1 9993822 2 13562271 ✓ ✓ 3 13609442 ✓ 1 4 13690214 ✓ ✓ ✓ ✓ 5 13693682 ✓ ✓ ✓ ✓ 6 13695103 ✓ ✓ ✓ ✓ 7 13707489 ✓ 8 13729007 9 13769165 ✓ 10 13823791 ✓ ✓ 11 13823972 ✓ ✓ 12 13865210 ✓ 13 13866986 ✓ ✓ ✓ 14 13879844 ✓ ✓ 15 13895773 ✓ 16 13950406 17 14027356 ✓ 18 14059449 ✓ 19 14070504 20 14087640 ✓ ✓ 21 14088675 ✓ ✓ 22 14092050 ✓ ✓ ✓ ✓ 23 14095717 ✓ ✓ ✓ 24 14121682 25 14128555 ✓ 26 14136579 ✓ ✓ ✓ ✓ 27 14137685 ✓ ✓ ✓ ✓ 28 14191954 29 14197852 30 14258290 ✓ ✓ 31 14261969 ✓ ✓ 32 14292734 33 14297378 34 14322489 ✓ ✓ ✓ 35 14334270 ✓ ✓ ✓ 36 14334702 ✓ ✓ ✓ 37 14367339 Intersection among the methods
  • 158. Plot of a sample of chromosome 21 haplotype blocks produced by FGT, CIT, SSLD, and Big-LD.
  • 160. The comparison between Big-LD, FGT, CIT, and SSLD haplotype block partitioning methods in chromosome 21:
      Compared parameters | Big-LD | FGT | CIT | SSLD
      Max. No. of SNPs in each block | 26 | 17 | 20 | 27
      Total No. of blocks | 1,182 | 1,562 | 1,464 | 1,378
      Max. block size (in bp) | 190,708 | 140,491 | 178,064 | 218,644
      Min. block size (in bp) | 34 | 4 | 2 | 12
      Percentage of uncovered SNPs | 14.5% | 12.8% | 22.1% | 4.9%
      Median No. of SNPs within each block | 4 | 4 | 3 | 4
      Median block size (in bp) | 9,830 | 7,551 | 6,783 | 10,870
      Total block size (in bp) | 23,932,662 | 23,452,817 | 23,696,256 | 23,696,256
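Summary parameters like those in this comparison can be derived from a list of blocks. The Python sketch below assumes each block is a sorted list of SNP positions in bp and that block size in bp is the last position minus the first; both the data layout and that size definition are assumptions for illustration.

```python
from statistics import median

def summarize_blocks(blocks, snp_positions):
    """Summary statistics for one partitioning result.
    blocks: list of sorted SNP-position lists; snp_positions: all SNPs analysed."""
    if not blocks or not snp_positions:
        raise ValueError("need at least one block and one SNP")
    sizes_bp = [block[-1] - block[0] for block in blocks]
    sizes_snps = [len(block) for block in blocks]
    covered = sum(sizes_snps)
    return {
        "total_blocks": len(blocks),
        "max_snps": max(sizes_snps),
        "max_bp": max(sizes_bp),
        "min_bp": min(sizes_bp),
        "median_snps": median(sizes_snps),
        "median_bp": median(sizes_bp),
        "total_bp": sum(sizes_bp),
        "pct_uncovered": 100.0 * (len(snp_positions) - covered) / len(snp_positions),
    }

# Toy data: two blocks over eight SNPs (positions are made up).
toy_blocks = [[100, 150, 200], [400, 410]]
toy_snps = [100, 150, 200, 300, 400, 410, 500, 600]
summary = summarize_blocks(toy_blocks, toy_snps)
```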
  • 161. Max. No. SNPs in each block
  • 162. Per-chromosome summary:
      Chromosome | Init. SNPs | No. SNPs | No. blocks | Max block size (bp) | Min block size (bp) | Max block size (SNPs) | Min block size (SNPs) | Median block size (bp) | Median No. SNPs within each block | % uncovered SNPs
      Ch 1 | 40929 | 39000 | 6586 | 192010 | 10 | 41 | 2 | 13040 | 4 | 12.54103
      Ch 2 | 44090 | 42033 | 6586 | 192010 | 10 | 41 | 2 | 13040 | 4 | 12.57583
      Ch 3 | 36690 | 35078 | 5502 | 214260 | 13 | 62 | 2 | 12425 | 4 | 13.22766
      Ch 4 | 32628 | 31123 | 5018 | 222084 | 14 | 37 | 2 | 13234 | 4 | 13.33098
      Ch 5 | 33612 | 32220 | 4992 | 314333 | 13 | 49 | 2 | 12636.5 | 4 | 13.48852
      Ch 6 | 35574 | 34140 | 5020 | 216439 | 11 | 55 | 2 | 12732 | 4 | 11.90393
      Ch 7 | 29244 | 28120 | 4315 | 260965 | 7 | 44 | 2 | 12220 | 4 | 13.58464
      Ch 8 | 30990 | 28120 | 4315 | 260965 | 7 | 44 | 2 | 12220 | 4 | 13.58464
      Ch 9 | 26128 | 25095 | 3741 | 190762 | 17 | 44 | 2 | 9934 | 4 | 12.38892
      Ch 10 | 28331 | 27070 | 3968 | 206622 | 8 | 57 | 2 | 11668.5 | 4 | 12.60805
      Ch 11 | 26477 | 25333 | 3901 | 214818 | 11 | 61 | 2 | 11601 | 4 | 12.91596
      Ch 12 | 26365 | 25229 | 3879 | 190743 | 15 | 37 | 2 | 11500 | 4 | 13.68267
      Ch 13 | 20242 | 19380 | 2842 | 191785 | 12 | 39 | 2 | 12827.5 | 4 | 12.52322
      Ch 14 | 17951 | 17243 | 2648 | 201362 | 22 | 32 | 2 | 11740 | 4 | 13.50693
      Ch 15 | 16166 | 15470 | 2648 | 201362 | 22 | 32 | 2 | 11740 | 4 | 13.23206
      Ch 16 | 16460 | 15780 | 2442 | 237308 | 10 | 37 | 2 | 9532.5 | 4 | 14.02408
      Ch 17 | 14027 | 13538 | 2273 | 247297 | 10 | 30 | 2 | 10013 | 4 | 16.49431
      Ch 18 | 16450 | 15708 | 2406 | 250370 | 12 | 61 | 2 | 10374 | 4 | 13.38172
      Ch 19 | 9236 | 8973 | 1596 | 134312 | 15 | 44 | 2 | 9875 | 3 | 19.20205
      Ch 20 | 13843 | 13310 | 2056 | 144803 | 11 | 36 | 2 | 10054 | 4 | 13.78663
      Ch 21 | 8051 | 7786 | 1185 | 190709 | 34 | 32 | 2 | 9587 | 4 | 13.52427
      Ch 22 | 8205 | 7887 | 1240 | 176594 | 7 | 44 | 2 | 7769 | 4 | 14.42881
      Total | 531689 | 507636 | 79159 | 4651913 | 291 | 959 | 44 | 249763 | 87 | 299.9369
  • 164. Conclusion • The allele distribution and description. • The percentage of SNPs appearing at physical locations along the chromosomes is affected by the SNPs' MAF. • Genotype imputation and preprocessing are crucial steps in HBP, and we produced a preprocessing sequence that facilitates any subsequent genetic analysis.
  • 165. Conclusion • HBP reduces the number of biomarkers to about 13%. • The Big-LD method provided robust block partitioning in terms of block size and genomic coverage.
  • 166. Conclusion • There is about 70% intersection agreement among most HBP methods; Big-LD matched FGT most closely. • FGT produces modest results in terms of correlation and biomarker reduction. • BigLD produced large haplotype blocks and showed a high r² between blocks and the lowest r² between blocks when singleton blocks are considered. • In terms of computation, BigLD takes less than half the computational time of the Haploview methods.
  • 167. Conclusion • MAF quality control has a strong effect on haplotype block partitioning. • We recommend taking the MAF into consideration when applying haplotype block partitioning. However, there is a tradeoff: a higher MAF threshold produces a higher correlation within the blocks but truncates a portion of the data that could be significant. • In terms of correlation, we recommend a high MAF threshold with the Haploview methods and a low or moderate MAF threshold with the BigLD method.
  • 168. Conclusion • We could answer the question related to MAF: the number of blocks does not necessarily affect the number of SNPs within blocks. • At the same MAF, the number of SNPs within blocks is highest in SSLD, due to its block size, and lowest in CIT. • At the same block size, the number of SNPs within blocks decreases as the MAF increases.