Genome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI).pdf

Genome-wide Association Study (GWAS) Analysis Guide in TASSEL Software (GUI)
REZA DYSTA SATRIA
Translated from Indonesian to English - www.onlinedoctranslator.com

SNP Quality Control
SNP quality control (QC) is the process of evaluating and cleaning single
nucleotide polymorphism (SNP) data before genetic analysis. A SNP is a genetic
variant in which one nucleotide at a specific position in the genome is replaced by
another. Storing the genotypes in HapMap format makes this quality check
straightforward in TASSEL.
In TASSEL, HapMap refers to a tab-delimited genotype format (".hmp.txt") with one
row per SNP, 11 columns of marker information (such as the alleles, chromosome, and
position), and one column per taxon. Working from this format, TASSEL users can assess
relationships among individuals, genomic structure, and associations between genetic
polymorphisms and phenotypic traits in their samples.
To illustrate, this guide analyzes the HapMap-format example file
"mdp_genotype.hmp.txt".
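To see what TASSEL reads from a HapMap file, the sketch below converts a tiny, hypothetical HapMap table into numeric minor-allele dosages (0, 1, 2) in base R. It assumes biallelic SNPs, the 11 standard metadata columns, single-letter IUPAC genotype calls, and "N" for missing data; the helper name hapmap_to_dosage is illustrative and not part of TASSEL.

```r
# Tiny hypothetical HapMap table: 11 metadata columns followed by one column per taxon
hapmap_text <- paste(
  "rs#\talleles\tchrom\tpos\tstrand\tassembly#\tcenter\tprotLSID\tassayLSID\tpanelLSID\tQCcode\tT1\tT2\tT3\tT4",
  "s1\tA/G\t1\t100\t+\tNA\tNA\tNA\tNA\tNA\tNA\tA\tG\tR\tA",
  "s2\tC/T\t1\t250\t+\tNA\tNA\tNA\tNA\tNA\tNA\tC\tC\tN\tT",
  sep = "\n"
)
# Read everything as character so calls such as "T" are not turned into logical values
hmp <- read.delim(text = hapmap_text, colClasses = "character", check.names = FALSE)

# Heterozygous IUPAC codes and the two (alphabetically sorted) alleles they represent
iupac_het <- c(R = "AG", Y = "CT", S = "CG", W = "AT", K = "GT", M = "AC")

hapmap_to_dosage <- function(hmp) {
  geno <- as.matrix(hmp[, -(1:11), drop = FALSE])
  dos <- matrix(NA_real_, nrow(geno), ncol(geno),
                dimnames = list(hmp[["rs#"]], colnames(geno)))
  for (i in seq_len(nrow(geno))) {
    al <- strsplit(hmp$alleles[i], "/", fixed = TRUE)[[1]]  # e.g. c("A", "G")
    het_code <- names(iupac_het)[iupac_het == paste(sort(al), collapse = "")]
    calls <- geno[i, ]
    dos[i, which(calls == al[1])] <- 0           # homozygous for the first allele
    dos[i, which(calls == al[2])] <- 2           # homozygous for the second allele
    dos[i, which(calls %in% het_code)] <- 1      # heterozygote
  }                                              # any other call (e.g. "N") stays NA
  # Re-orient each site so the dosage counts copies of the minor allele
  p2 <- rowMeans(dos, na.rm = TRUE) / 2
  flip <- !is.na(p2) & p2 > 0.5
  dos[flip, ] <- 2 - dos[flip, ]
  dos
}

dos <- hapmap_to_dosage(hmp)
print(dos)
```

The minor-allele orientation is a design choice that makes the later summary and filtering sketches simpler; TASSEL keeps its own internal allele encoding.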

Open the file with "File" > "Open As...", select the "HapMap" format, and tick "Sort Positions". Then click "OK".
The loaded dataset appears as shown below:

To summarize the genotypes at each locus in the sample, select the dataset, open
the "Data" menu, and click "Geno Summary". The summary contains the following
tables:
1. mdp_genotype_OverallSummary

This table reports:
A. Number of Taxa: the total number of taxa (individuals or lines) in the dataset
B. Number of Sites: the number of SNP sites (markers) in the dataset
C. Sites x Taxa: the total number of genotype calls (number of sites multiplied by number of taxa)
D. Number Not Missing: the number of genotype calls that are present
E. Proportion Not Missing: the fraction of genotype calls that are present
F. Number Missing: the number of missing genotype calls
G. Proportion Missing: the fraction of genotype calls that are missing
H. Number of Gametes: the total number of gametes (two per genotype call in a diploid)
I. Gametes Not Missing: the number of gametes that are not missing
J. Proportion Gametes Not Missing: the fraction of gametes that are not missing
K. Gametes Missing: the number of missing gametes
L. Proportion Gametes Missing: the fraction of gametes that are missing
M. Number Heterozygous: the number of heterozygous genotype calls
N. Proportion Heterozygous: the fraction of genotype calls that are heterozygous
O. Average Minor Allele Frequency: the minor allele frequency averaged over all sites
2. mdp_genotype_SiteSummary
This table should be reviewed before running a GWAS. Select it and choose "Chart"
from the "Results" menu to display histograms of the distribution of per-site
statistics in the genotype dataset.
The histograms that can be displayed include:

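The statistics in both summary tables can be reproduced on a small example to make their definitions concrete. The sketch below assumes a diploid genotype matrix already coded as minor-allele dosages (0, 1, 2) with NA for missing calls; the data are hypothetical, and TASSEL's exact denominators (for example, for the heterozygous proportion) may differ from the choices made here.

```r
# Hypothetical toy genotype matrix: rows = sites, columns = taxa,
# values = minor-allele dosage (0/1/2), NA = missing call
geno <- matrix(c(0,  2,  1, 0,
                 0,  0, NA, 2,
                 1, NA, NA, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("s", 1:3), paste0("T", 1:4)))

# Overall summary (compare with the OverallSummary table)
n_calls <- length(geno)                   # Sites x Taxa
n_missing <- sum(is.na(geno))
n_het <- sum(geno == 1, na.rm = TRUE)
p <- rowMeans(geno, na.rm = TRUE) / 2     # allele frequency per site
maf <- pmin(p, 1 - p)                     # minor allele frequency per site
overall <- c(
  taxa = ncol(geno),
  sites = nrow(geno),
  sites_x_taxa = n_calls,
  not_missing = n_calls - n_missing,
  prop_missing = n_missing / n_calls,
  gametes = 2 * n_calls,                  # diploid: two gametes per call
  gametes_missing = 2 * n_missing,        # assumes a missing call loses both gametes
  heterozygous = n_het,
  prop_heterozygous = n_het / (n_calls - n_missing),  # over non-missing calls
  avg_maf = mean(maf, na.rm = TRUE)
)
print(overall)

# Per-site summary (compare with the SiteSummary table) and MAF histogram bin counts
site_summary <- data.frame(
  site = rownames(geno),
  prop_missing = rowMeans(is.na(geno)),
  maf = maf,
  prop_het = rowSums(geno == 1, na.rm = TRUE) / rowSums(!is.na(geno))
)
maf_hist <- hist(site_summary$maf, breaks = seq(0, 0.5, by = 0.1), plot = FALSE)
print(site_summary)
print(maf_hist$counts)
```

Calling hist() without plot = FALSE draws the same kind of MAF histogram that TASSEL's chart shows.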
To obtain more reliable results and improve the consistency of the analysis,
missing genotype calls should be replaced with imputed values. Select
"mdp_genotype", open the "Impute" menu, and choose "LD KNNi Imputation". A pop-up
window then appears, as shown below:
The default settings are sufficient to replace the missing values. Click "OK" to
run the imputation; when it finishes, an updated (imputed) dataset is produced.
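As described for LD-KNNi, neighbouring taxa are chosen using markers in high linkage disequilibrium with the site being imputed. As a much simpler illustration of the k-nearest-neighbour idea only (not TASSEL's algorithm), the sketch below fills each missing call with the majority genotype among the k taxa most similar across all sites. The function name knn_impute, the distance measure, k = 2, and the toy data are assumptions chosen for illustration.

```r
knn_impute <- function(geno, k = 2) {   # geno: sites x taxa, dosages 0/1/2, NA = missing
  imputed <- geno
  n_taxa <- ncol(geno)
  for (j in seq_len(n_taxa)) {
    miss <- which(is.na(geno[, j]))
    if (length(miss) == 0) next
    # Distance to every other taxon: mean absolute dosage difference over shared sites
    d <- sapply(seq_len(n_taxa), function(m) {
      shared <- !is.na(geno[, j]) & !is.na(geno[, m])
      if (m == j || !any(shared)) return(Inf)
      mean(abs(geno[shared, j] - geno[shared, m]))
    })
    neighbours <- order(d)[seq_len(min(k, n_taxa - 1))]
    for (i in miss) {
      votes <- geno[i, neighbours]
      votes <- votes[!is.na(votes)]
      if (length(votes) == 0) next       # no neighbour has a call: leave it missing
      # Majority vote; ties go to the call of the nearest neighbour
      tab <- table(factor(votes, levels = unique(votes)))
      imputed[i, j] <- as.numeric(names(tab)[which.max(tab)])
    }
  }
  imputed
}

# Hypothetical data: T4 matches T1 and T2 at every observed site
geno <- matrix(c(0, 0, 2, 0,
                 0, 0, 2, NA,
                 2, 2, 0, 2,
                 2, 2, 0, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("s", 1:4), paste0("T", 1:4)))
imp <- knn_impute(geno, k = 2)
print(imp)
```

Here the missing call of T4 at s2 is filled from its two nearest neighbours, T1 and T2, which both carry dosage 0.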
The imputed dataset gives higher-quality input for the analysis. To make it more
reliable still, filter its sites with "Filter Genotype Table Sites", using the
settings shown below:

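Site filtering amounts to keeping only markers that pass frequency and completeness thresholds. The sketch below assumes a sites x taxa dosage matrix; the function name filter_sites and its thresholds (minimum minor allele frequency 0.05, minimum call rate 0.8) are hypothetical, since the guide's actual settings appear only in its screenshot.

```r
filter_sites <- function(geno, min_maf = 0.05, min_call_rate = 0.8) {  # geno: sites x taxa
  p <- rowMeans(geno, na.rm = TRUE) / 2          # allele frequency per site
  maf <- pmin(p, 1 - p)                          # minor allele frequency
  call_rate <- rowMeans(!is.na(geno))            # proportion of taxa with a call
  keep <- !is.na(maf) & maf >= min_maf & call_rate >= min_call_rate
  geno[keep, , drop = FALSE]
}

# Hypothetical data
geno <- matrix(c(0,  2,  1, 0,     # polymorphic, complete       -> kept
                 0,  0,  0, 0,     # monomorphic (MAF = 0)       -> removed
                 1, NA, NA, 0,     # only half of the taxa called -> removed
                 2,  2,  0, 0),    # MAF = 0.5, complete         -> kept
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("s", 1:4), paste0("T", 1:4)))
filt <- filter_sites(geno)
print(rownames(filt))
```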
This filtering produces a new dataset named
"mdp_genotype_KNNimp_Filtered_QD". As before, run "Geno Summary" on this dataset.
To check the effect of filtering and imputation, select
"mdp_genotype_KNNimp_Filtered_QD_SiteSummary" and choose "Chart" in the "Results"
menu, as in the previous step. Some of the resulting histograms are shown below:

This dataset is now ready for the association analyses. Save it in "Hapmap"
format to a folder of your choice so that it can be reused in the next steps.
GLM analysis
GLM (General Linear Model) analysis is a statistical method that models the
relationship between a response variable (here, a phenotypic trait) and predictor
variables (a marker plus any covariates). This step requires the filtered genotype
file created in the "SNP Quality Control" section and the "mdp_traits" file.

First, run "PCA" on the "Filtered_QCount" dataset from the "Relatedness" submenu
of the "Analysis" menu, and click "OK" to accept the settings in the pop-up window.
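TASSEL's PCA summarizes the genotype matrix into a few principal components (PCs) that capture population structure. A rough base-R equivalent, assuming a complete taxa x markers dosage matrix filled with hypothetical random data, applies prcomp to centred marker columns; TASSEL's own options (number of components, handling of missing data) are not reproduced here.

```r
set.seed(1)
# Hypothetical genotypes: 10 taxa x 20 markers, dosages 0/1/2, no missing values
geno <- matrix(sample(0:2, 10 * 20, replace = TRUE), nrow = 10,
               dimnames = list(paste0("T", 1:10), paste0("m", 1:20)))

# Centre each marker; scale. = FALSE avoids errors from monomorphic markers
pca <- prcomp(geno, center = TRUE, scale. = FALSE)

# First three PCs as covariates, one row per taxon
pcs <- data.frame(Taxa = rownames(geno), pca$x[, 1:3], row.names = NULL)

# Proportion of genotypic variance explained by each of the first three PCs
var_explained <- (pca$sdev^2 / sum(pca$sdev^2))[1:3]
print(pcs)
print(round(var_explained, 3))
```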
Next, select the datasets to combine: the original "Filtered_QCount" genotype
file, the new "PC_Filtered_QCount" file produced by PCA, and the "mdp_traits" file.
Open the "Data" menu and select "Intersect Join", as shown below:
This produces a new dataset, "PC_Filtered_QCount + Filtered_QCount + mdp_traits",
which is analyzed in the next step, as shown in the image below.
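An intersect join keeps only the taxa present in every selected dataset and lines their rows up by taxon name. Assuming each table has a Taxa column (the data below are hypothetical), the same operation is a chain of inner joins with merge() in base R.

```r
# Hypothetical inputs, each keyed by taxon name
traits <- data.frame(Taxa = c("T1", "T2", "T3", "T5"), EarHT = c(60.1, 55.3, 70.2, 48.0))
pcs    <- data.frame(Taxa = c("T1", "T2", "T3", "T4"), PC1 = c(0.5, -0.2, 0.1, -0.4))
geno   <- data.frame(Taxa = c("T2", "T1", "T3", "T4"), m1 = c(0, 2, 1, 2))

# Inner join on Taxa: only taxa present in every table are kept (T4 and T5 drop out)
joined <- Reduce(function(a, b) merge(a, b, by = "Taxa"), list(geno, traits, pcs))
print(joined)
```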

Run the GLM on the joined dataset: open the "Analysis" menu, choose
"Association", and click "GLM".
Configure the settings as shown below:

Create a new folder to store the results, for example "GLM stats", then click
"OK".
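For each marker, the GLM tests whether adding the marker to a model that already contains the covariates (here, the PCs) explains significantly more phenotypic variance. The sketch below shows this idea as an F-test between nested lm() fits on simulated data; the additive 0/1/2 coding, the function name glm_scan, and the simulated trait are assumptions, and TASSEL's GLM output contains more columns than this.

```r
glm_scan <- function(y, geno, covars) {   # geno: taxa x markers; covars: data frame of PCs
  base <- data.frame(y = y, covars)
  reduced <- lm(y ~ ., data = base)       # covariates only
  res <- t(sapply(colnames(geno), function(m) {
    # Covariates plus one marker, coded additively as 0/1/2
    full <- lm(y ~ ., data = data.frame(base, marker = geno[, m]))
    a <- anova(reduced, full)             # F-test for the added marker
    c(F = a$F[2], p = a[["Pr(>F)"]][2])
  }))
  data.frame(Marker = rownames(res), res, row.names = NULL)
}

set.seed(42)
n <- 50
geno <- matrix(sample(0:2, n * 3, replace = TRUE), nrow = n,
               dimnames = list(NULL, c("m1", "m2", "m3")))
covars <- data.frame(PC1 = rnorm(n), PC2 = rnorm(n))
# Simulated trait: m1 has a true additive effect, m2 and m3 do not
y <- 60 + 2 * covars$PC1 + 3 * geno[, "m1"] + rnorm(n)
res <- glm_scan(y, geno, covars)
print(res)
```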
The GLM produces two output tables. To view the results, select the GLM
statistics table and choose "Manhattan Plot" from the "Results" menu, as shown
below.

MLM (PCA + Kinship)
MLM (Mixed Linear Model) with PCA (Principal Component Analysis) and
kinship is a statistical approach for testing associations between genotype
(genetic data) and phenotype (observed characteristics of an organism). The PCs
account for population structure and the kinship matrix accounts for relatedness
among individuals, both of which can otherwise produce spurious associations. Open
the filtered and imputed "HapMap" file and the "mdp_traits" file.
Run PCA on the "Filtered_QCount" dataset from the "Analysis" menu, then compute
a kinship matrix with the "Kinship" method, as shown in the image below.

This creates a kinship matrix named "Centered_IBS_Filtered_QCount".
Next, select the three datasets shown in the image: the genotype, PCA, and trait datasets.
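A centred kinship matrix measures how similar two taxa are relative to what the allele frequencies alone would predict. The sketch below computes one common centred estimator (VanRaden's method 1) on hypothetical data; TASSEL's Centered_IBS may use a different scaling and also handles missing data, which this sketch does not.

```r
centered_kinship <- function(geno) {   # geno: taxa x markers, dosages 0/1/2, no missing values
  p <- colMeans(geno) / 2              # allele frequency per marker
  Z <- sweep(geno, 2, 2 * p)           # centre each marker at its mean dosage 2p
  tcrossprod(Z) / (2 * sum(p * (1 - p)))  # Z Z' scaled by 2 * sum p(1 - p)
}

# Hypothetical data: T1 and T2 are genetically identical
geno <- rbind(T1 = c(0, 2, 1, 2, 0),
              T2 = c(0, 2, 1, 2, 0),
              T3 = c(2, 0, 1, 0, 2),
              T4 = c(1, 1, 0, 2, 1))
K <- centered_kinship(geno)
print(round(K, 3))
```

Because every marker is centred, each row of K sums to zero, so "kinship" here is relative to the average pair of taxa in the sample.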
Combine these three datasets with "Intersect Join". Then select the joined
dataset together with the kinship matrix, as shown below, and run "MLM" from
"Association" in the "Analysis" menu.

When the MLM run finishes, select
"MLM_statistics_for_Filtered_QCount + mdp_traits + PC_Filtered_Qcount" and view it
with "Manhattan Plot" in the "Results" menu.
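The MLM adds a random polygenic effect whose covariance is proportional to the kinship matrix, so related taxa are expected to have correlated phenotypes. The sketch below follows the EMMAX/P3D idea: estimate the variance ratio once under the null model (by maximum likelihood), then test each marker with generalized least squares using that fixed ratio. The function name mlm_scan, the simulated data, and the choice of ML rather than REML are assumptions; this is not TASSEL's implementation.

```r
mlm_scan <- function(y, geno, covars, K) {   # geno: taxa x markers; K: taxa x taxa kinship
  X <- cbind(1, as.matrix(covars))           # intercept + fixed covariates (e.g. PCs)
  eig <- eigen(K, symmetric = TRUE)
  U <- eig$vectors
  d <- pmax(eig$values, 0)                   # clip tiny negative eigenvalues
  # Rotating by U makes the covariance sigma_g^2 * (K + delta * I) diagonal
  y_r <- drop(crossprod(U, y))
  X_r <- crossprod(U, X)
  n <- length(y)
  # Null-model ML log-likelihood as a function of log(delta), delta = sigma_e^2 / sigma_g^2
  loglik <- function(log_delta) {
    v <- d + exp(log_delta)
    fit <- lm.wfit(X_r, y_r, 1 / v)
    rss <- sum(fit$residuals^2 / v)
    -0.5 * (n * log(2 * pi * rss / n) + sum(log(v)) + n)
  }
  log_delta <- optimize(loglik, c(-10, 10), maximum = TRUE)$maximum
  w <- 1 / (d + exp(log_delta))              # GLS weights, fixed for all markers
  reduced <- lm(y_r ~ 0 + X_r, weights = w)
  res <- t(sapply(colnames(geno), function(m) {
    m_r <- drop(crossprod(U, geno[, m]))
    full <- lm(y_r ~ 0 + X_r + m_r, weights = w)
    a <- anova(reduced, full)                # F-test for the added marker
    c(F = a$F[2], p = a[["Pr(>F)"]][2])
  }))
  data.frame(Marker = rownames(res), res, delta = exp(log_delta), row.names = NULL)
}

set.seed(7)
n <- 60
geno <- matrix(sample(0:2, n * 200, replace = TRUE), nrow = n,
               dimnames = list(NULL, paste0("m", 1:200)))
# Centred kinship from markers m4..m200, so the tested markers m1..m3 are not part of K
p <- colMeans(geno[, 4:200]) / 2
Z <- sweep(geno[, 4:200], 2, 2 * p)
K <- tcrossprod(Z) / (2 * sum(p * (1 - p)))
covars <- data.frame(PC1 = rnorm(n))
y <- 10 + covars$PC1 + 3 * geno[, "m1"] + rnorm(n)   # m1 is the simulated causal marker
res <- mlm_scan(y, geno[, 1:3], covars, K)
print(res)
```

Estimating the variance ratio only once keeps the scan fast; re-estimating it for every marker is more exact but much slower.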
PLOT OF GWAS RESULTS IN RSTUDIO
Open RStudio, create a new script, and set the working directory with "Set Working
Directory" in the "Session" menu. Choose the folder that contains the exported TASSEL
results; in this project, that is "GLM stats".
- Make sure the qqman and dplyr packages are installed in RStudio. If they are not,
install them from the "Tools" menu ("Install Packages...") or with install.packages(c("qqman", "dplyr")).
- Code inside the script:
library(qqman)
library(dplyr)

# Import the TASSEL results (tab-delimited text exported from TASSEL)
TASSEL_MLM_Out <- read.table("Tasel out2.txt", header = TRUE, sep = "\t")
# Drop rows without a p-value (for example, a model-summary row)
TASSEL_MLM_Out <- TASSEL_MLM_Out %>% filter(!is.na(p))

# List the traits in the file
head(unique(TASSEL_MLM_Out$Trait))

# Each plot is made for one trait; the first trait (EarHT) is used as an example
Trait1 <- TASSEL_MLM_Out %>% filter(Trait == "EarHT")

# Bonferroni-corrected genome-wide threshold on the -log10(p) scale
nmrk <- nrow(Trait1)
(GWAS_Bonn_corr_threshold <- -log10(0.05 / nmrk))

# Manhattan plot (column names as in TASSEL's statistics output; adjust if yours differ)
manhattan(
  Trait1,
  chr = "Chr",
  bp = "Pos",
  snp = "Marker",
  p = "p",
  col = c("red", "blue"),
  annotateTop = TRUE,
  genomewideline = GWAS_Bonn_corr_threshold,
  suggestiveline = FALSE
)

# QQ plot
qq(Trait1$p)

# Manhattan and QQ plots arranged in 1 row and 2 columns
old_par <- par(mfrow = c(1, 2))
manhattan(
  Trait1,
  chr = "Chr",
  bp = "Pos",
  snp = "Marker",
  p = "p",
  col = c("red", "blue"),
  annotateTop = TRUE,
  genomewideline = GWAS_Bonn_corr_threshold,
  suggestiveline = FALSE,
  main = "EarHT"  # trait name
)
qq(Trait1$p, main = "EarHT")
par(old_par)  # restore the previous plot layout
- Results of the analysis:

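Besides the plots, it is often useful to list the markers that pass the Bonferroni threshold. Assuming a single-trait table with Marker, Chr, Pos, and p columns like TASSEL's statistics output (the values below are hypothetical), this can be done in base R:

```r
# Hypothetical stand-in for one trait's rows of a TASSEL statistics table
Trait1 <- data.frame(Marker = paste0("m", 1:5),
                     Chr = c(1, 1, 2, 3, 3),
                     Pos = c(100, 2500, 300, 4200, 9000),
                     p = c(1e-3, 2e-8, 0.4, 5e-6, 0.02))

# Bonferroni threshold on the -log10(p) scale: -log10(0.05 / number of markers)
nmrk <- nrow(Trait1)
threshold <- -log10(0.05 / nmrk)

# Keep markers above the threshold, strongest association first
hits <- Trait1[-log10(Trait1$p) > threshold, ]
hits <- hits[order(hits$p), ]
print(hits)
```

With five markers the threshold is -log10(0.01) = 2, so markers with p below 0.01 are reported.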