ABSTRACT
Advances in next-generation sequencing (NGS) methods have allowed for the cost-effective
sequencing of large portions of the genome, allowing NGS to be used in clinical practice.
Kidney diseases contribute significantly to the global disease burden and are
associated with an elevated risk of morbidity and death. A wide range of primary renal diseases
can result in chronic kidney disease. One in every five nephrology patients does not
have a primary disease diagnosis. Moreover, new research reveals that the clinical diagnosis
may be inaccurate in a considerable proportion of cases. The absence of a diagnosis, as well as
an inaccurate diagnosis, may have therapeutic consequences. Genetic testing may improve
diagnostic accuracy in individuals with nephrotic disease, particularly in those with unknown
aetiology.
This study focuses on identifying the genetic cause of disease in individuals with nephrotic
diseases using targeted NGS panels.
Based on ACMG-AMP standards, the analysis identified rare pathogenic variants
that were compatible with the clinical diagnosis in 65% of the samples included in this study.
Similarly, 12% of patients had rare pathogenic/likely pathogenic variants in nephrotic
disease-related genes.
1. INTRODUCTION
Genetics plays a key role, to a lesser or greater extent, in all diseases. Variations in DNA
and their impact on the function of the encoded proteins, alone or together with epigenetic
modifications, lead to disease processes. WHO recommends a prevalence of less than 10
cases/10,000 persons for designating rare illnesses. The prevalence criteria used to
classify rare diseases vary among jurisdictions and range from 1 to 6 cases/10,000
people. One study concluded that efforts to harmonise the disparate definitions should
concentrate on standardising quantitative criteria like prevalence thresholds and minimise
qualitative descriptors like illness severity. An individual's genetic constitution plays a key
role in all disease processes, including both rare and common disorders, because of these
variations. Such differences, together or individually, may make an individual more susceptible
to one disorder (for example, a type of cancer) while making the same
individual less susceptible to an unrelated disorder (for example, diabetes). The
instructions for the generation of a human being are encoded in the DNA
present in the cells: the human genome, which contains approximately 3 billion bp.
Scientists all over the world joined the ‘Human Genome Project’ to produce the first DNA
sequence of the whole human genome, with additions and corrections being made in
the subsequent years (reference). Most of the DNA is present inside the nucleus in
the form of chromosomes. A small amount of DNA is also present in mitochondria and is known as
mitochondrial DNA (mtDNA). Most individuals have 23 pairs of chromosomes; each chromosome
is present in two copies, one inherited from the mother and the other
from the father. The human nuclear genome contains roughly 20,000 protein-coding genes,
which consist of coding sequences known as
exons and non-coding sequences known as introns. A further 22,000 genes of the human genome
encode RNA molecules only; a small proportion of these RNAs (rRNA, tRNA) form part
of the translation machinery, while many others perform a variety of roles within
the cell or regulate the expression of other genes. In fact, it is now believed that as much
as 80% of our genome has biological activity that may influence structure and function. (1)
The human genome also contains over 14,000 ‘pseudogenes’; these are imperfect copies of
protein-coding genes that have lost the ability to code for protein. (1) Some pseudogenes
have applications in gene therapy through gene-editing approaches and can be used to
generate functional genes. One important finding is that genes are distributed unequally between
chromosomes: chromosome 19 is particularly gene-dense, while the autosomes for
which trisomy is viable (13, 18, 21) are relatively gene-poor.
It was recognised from the outset of the Human Genome Project that there was a significant
degree of DNA sequence variation amongst healthy individuals, and so there is no such thing
as a 'normal' human DNA sequence. However, changes to the DNA sequence must be described
relative to some baseline, which is the human reference genome sequence.
2. REVIEW OF LITERATURE
With over 1.3 billion people, India is the world's second most populous country, accounting
for 17% of the global population. With over 4500 anthropologically distinct populations, the
country is extremely diverse. (18) These populations are divided into castes, tribes, and
religious groups, which differ in cultural practices, geographic areas, climatic
conditions, physical characteristics, marriage practices, language, and genetic architecture.
(18-19)
Despite its rich genetic variation, India has been under-represented in global genome research. (26)
Furthermore, the Indian population is divided into several large endogamous clusters and is
characterised by consanguineous marriages. Consequently, recessive alleles are common
within Indian cohorts. In the absence of large whole-genome studies from India, certain
subpopulation-specific genetic variants are not sufficiently captured and catalogued in the
worldwide scientific literature. (20)
The emergence of next-generation sequencing and its expanding availability over the last
decade has revolutionised our understanding of the genetic architecture and history of various
communities around the globe. (21) Various worldwide population datasets, including the 1000
Genomes Project, (22) ExAC, (23) ESP6500 (https://evs.gs.washington.edu/EVS/), and
gnomAD, (24) have produced reference and patient genome datasets from communities across
different regions as part of this effort. Even though these datasets include Indian genomes, their
number is negligible compared with the genetic variation and heterogeneity of the Indian
population. (25)
2.1 Genetic Alterations
Genes code for proteins that act as pigments, enzymes, hormones, antibodies, and regulate
other proteins, as well as all metabolic pathways. Replication (DNA makes DNA), transcription
(DNA makes RNA), RNA processing (capping, splicing, tailing, and RNA translocation to
cytoplasm), translation (RNA makes protein), and protein processing, folding, transport, and
incorporation are all steps in the transmission of genetic information (Figure 1). If the DNA
sequence is mutated and not repaired by the cell, subsequent replications propagate the mutation.
Mutations can be caused by a variety of mechanisms, ranging from a single nucleotide change
to the loss, duplication, or rearrangement of chromosomes. Single-gene, chromosomal, and
multifactorial disorders are the three major types of genetic diseases. Multifactorial disorders,
such as congenital heart disease, most types of cleft lip/palate, club foot, and neural tube
defects, are caused by a combination of genes and environmental factors. (2)
Figure 1: Steps in the transmission of genetic information
2.2 Variation
A variation occurs when the DNA sequence of an organism changes. Variation can arise
from errors in DNA replication during cell division, mutagen exposure, or viral
infection.
Variants can be classified depending on cell type (Table 1):
GERMLINE VARIATION | SOMATIC VARIATION
Any variation that occurs in the germline lineage and is heritable is known as germline variation. | Any variation that occurs in somatic cells and results in a mosaic individual is known as somatic variation.
Also known as heritable variation. | Also known as acquired variation.
Occurs at different stages of gametogenesis. | Occurs in body cells such as skin and liver cells.
These variations can be passed on from one generation to another. | These variations are not passed on to future generations.
Through natural selection, these variations have an effect on evolution. | These variations do not have any effect on evolution.
Variants can be classified depending on type of alteration:
Single nucleotide variant (SNV)
A single nucleotide is substituted for another. An SNV may be uncommon in one population but
more common in another. SNVs are often referred to as single nucleotide
polymorphisms (SNPs), but the terms are not interchangeable: a variant must be present in
approximately 1% of the population or more to qualify as an SNP.
Within coding regions, SNPs may be silent (synonymous) or non-synonymous; non-synonymous
changes can result in phenotypic alterations, as seen in Figure 2 below.
Figure 2: Types of variations
A variant found in a protein-coding region may be classified as a silent, missense, or
nonsense variant, or as an indel.
2.2.1 Silent Variation
When a nucleotide is replaced by another nucleotide in a way that still results in the same
amino acid being encoded, this is known as a silent mutation. A silent mutation produces a new
codon for the same native amino acid.
For instance, the codons GAG and GAA both encode glutamic acid, so changing the final G of
this codon to an A leaves the amino acid unchanged.
2.2.2 Missense Variation
In a missense mutation, a nucleotide substitution generates a codon for a different amino acid,
but the new codon is not a stop codon. A missense mutation may alter or abolish the function of
the protein.
In other words, the altered codon specifies a different amino acid, so the resulting protein
differs from the original and may be functionally different.
2.2.3 Nonsense Variation
A nonsense mutation occurs when an amino-acid-coding codon is replaced by a stop codon.
A stop codon is a triplet codon that marks the completion of protein synthesis;
the three stop codons are UAA, UAG, and UGA. In an equivalent manner, the start codon
initiates protein synthesis.
When a stop codon is introduced at an unexpected location in a coding sequence, protein
synthesis ends prematurely, resulting in a truncated protein or the loss of a functional
protein.
Examples of silent, missense and nonsense mutations are shown in Figure 3.
Figure 3: Illustration of silent, nonsense and missense mutations
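As an illustrative sketch only (it is not part of the study's analysis pipeline), the three
substitution classes described above can be distinguished by translating the reference and
altered codons with the standard genetic code. The function name classify_substitution and the
compact construction of the codon table are assumptions introduced here for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Hypothetical helper: label the effect of replacing one codon with another."""
    # Accept RNA (U) or DNA (T) spelling of the codons.
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if alt_aa == ref_aa:
        return "silent"      # same amino acid (or stop) is encoded
    if ref_aa == "*":
        return "stop-loss"   # a stop codon became a coding codon
    if alt_aa == "*":
        return "nonsense"    # a coding codon became a stop codon
    return "missense"        # a different amino acid is encoded


print(classify_substitution("GAG", "GAA"))  # the silent example from the text
print(classify_substitution("GAG", "GTG"))
print(classify_substitution("GAG", "UAG"))
```

The GAG to GAA change from Section 2.2.1 is reported as silent because both codons encode
glutamic acid; the other two calls show a missense and a nonsense change at the same codon.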
2.2.4 Indel
The term indel combines insertion and deletion mutations. It describes a length difference
between two “alleles” where it is unknown whether a sequence insertion or deletion originally
generated the difference. An indel within a protein-coding region is also a frameshift mutation
if the number of inserted or deleted nucleotides is not divisible by three.
Current genome sequencing research has found indels in various disease types, indicating that
indels can be harmful and might increase disease susceptibility.
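The divisibility rule for frameshifts can be expressed in a few lines. This is a minimal sketch
that assumes the indel lies entirely within a coding region; the function name indel_effect and
the example alleles are hypothetical.

```python
def indel_effect(ref_allele: str, alt_allele: str) -> str:
    """Hypothetical helper: classify a coding indel by its net length change."""
    length_change = len(alt_allele) - len(ref_allele)
    if length_change == 0:
        return "not an indel"
    kind = "insertion" if length_change > 0 else "deletion"
    # A net change that is a multiple of three keeps the reading frame intact.
    frame = "in-frame" if length_change % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


print(indel_effect("A", "ATG"))    # two bases inserted
print(indel_effect("ATCG", "A"))   # three bases deleted
```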
Structural Variation
The operational definition of structural variation has recently been broadened to cover events
>50 bp in length. Formerly, structural variation was described as insertions, deletions,
duplications, translocations, and inversions (as shown in Figure 4) larger than 1 kb in length.
The accurate description of the copy number, content, and structure of genomic variants should
be the primary goal of structural variant (SV) investigations. (27)
Figure 4: Types of structural variation: insertions, deletions, inversions, translocation,
ring chromosome and fragile site.
Human genome comparisons reveal that structural variation, particularly copy number
variation, results in far more base pair changes than do point mutations. (27)
Copy Number Variation (CNV)
The term "copy number variation" denotes a mid-scale genetic change, best characterised as
segments longer than 1,000 base pairs but typically smaller than about 5 Mb, the cytogenetic
level of resolution. CNVs include both losses of genetic material (deletions) and extra copies
of a sequence (duplications). CNVs, along with inversions and translocations, are typically
categorised as types of genome structural variation because they alter the structure of the
genome. Recently, researchers have realised how much of human diversity can be attributed to
CNVs.
In general, scientists classify CNVs into one of two basic groups, depending on the length of
the affected sequence. Copy number polymorphisms (CNPs), which have a prevalence of more than
1% in the general population, form the first group. Most CNPs are under 10
kb in length, and they are frequently enriched for genes encoding proteins vital to
immunity and drug detoxification. A portion of these CNPs exhibit significant copy number
variation. (30)
The second class of CNVs comprises relatively uncommon variants that are substantially
longer than CNPs, ranging from a few hundred thousand base pairs
to over a million base pairs. These variants, which are often referred to as microdeletions and
microduplications, typically have a more recent ancestry within a family. (29) These CNVs
might have arisen during the development of the egg or sperm that gave rise to a specific
person, or they might have been passed down within a family for only a few generations. A
disproportionate number of patients with intellectual disability, developmental delay,
schizophrenia, and autism have been shown to carry these large and rare structural
variants. (31)
Numerical Variations
Chromosome defects include many types of numerical anomalies. These abnormalities
arise when the body's cells contain a different number of chromosomes than
is typically observed. There may therefore be 45 or 47 chromosomes in each cell of the body
rather than the usual 46. Chromosome imbalances may result in health issues or birth
abnormalities.
Numerical aberrations include Edwards syndrome (trisomy 18), Down syndrome (trisomy 21), Patau
syndrome (trisomy 13), Klinefelter syndrome (XXY syndrome), Turner syndrome (monosomy
X), and trisomy X, as shown in Figure 5. Chromosomal errors in oocytes
increase dramatically with maternal age.
Figure 5: Numerical chromosomal aberrations
Mendelian Inheritance Pattern
In the mid-nineteenth century, Mendel discovered a set of heredity principles; these
principles are used to determine characteristic patterns of inheritance. Single gene disorders are
grouped as autosomal dominant (AD), autosomal recessive (AR), X-linked recessive (XR),
X-linked dominant, and Y-linked (holandric), as shown in Figure 6.
In autosomal dominant disorders, alteration of one allele of a gene pair is sufficient to cause
disease; for example, a mutation in the FGFR3 gene can result in achondroplasia. (4,5)
Figure 6: Types of pedigree based on inheritance patterns
A parent who has an autosomal dominant disorder has a 50% chance of passing the disease on
to his or her child. (6) Some diseases show a wide range of signs and symptoms in different
people (variable expressivity). For example, some people with Marfan syndrome (FBN1
mutation) have only mild symptoms (such as being tall and thin with long, slender fingers),
while others have life-threatening complications involving the heart and blood vessels. (6)
An autosomal recessive disorder is one in which both alleles of a gene must be altered (loss of
function) for the defect to manifest; the affected person typically inherits one abnormal allele
from each heterozygous parent. In this type of disorder, heterozygous parents have a 25%
chance of having an affected offspring. When it comes to common autosomal recessive
disorders or traits.
The mutated gene in an X-linked disorder is found on the X chromosome. The disease can be
caused by a recessive mutation. Because males carry a single X chromosome, one mutated copy of
the gene is sufficient to cause the condition; thus, an X-linked recessive disorder is typically
transmitted by carrier females but tends to affect males.
Some genetic conditions, for example mitochondrial diseases, trinucleotide expansion disorders,
and genomic imprinting disorders, have non-Mendelian or non-traditional inheritance
patterns. (6,7)
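The recurrence risks quoted in this section (50% for an affected parent with an autosomal
dominant disorder, 25% for two heterozygous carriers of an autosomal recessive disorder) follow
from enumerating the allele combinations of a one-gene cross. The sketch below assumes a single
biallelic autosomal gene with full penetrance; the function name offspring_genotypes and the
two-letter genotype notation are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Enumerate a one-gene Punnett square; each parent is a two-letter genotype."""
    # Each parent contributes one of its two alleles with equal probability.
    combos = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(combos.values())
    return {genotype: Fraction(count, total) for genotype, count in combos.items()}


# Autosomal recessive: two heterozygous carriers (A = normal allele, a = altered allele).
print(offspring_genotypes("Aa", "Aa")["aa"])   # affected fraction

# Autosomal dominant: one affected heterozygous parent (D = altered allele).
print(offspring_genotypes("Dd", "dd")["Dd"])   # affected fraction
```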
Technique To Detect Mutations.
The term "next-generation sequencing" (NGS) refers to newer technologies that have been
developed for DNA sequencing on a large scale. NGS technologies enable high speed
and throughput, as well as both qualitative and quantitative sequence data, allowing for the
quick completion of genome sequencing projects. (42,43) Numerous sequencing methods are
available with NGS systems, such as whole-genome sequencing (WGS), whole exome
sequencing (WES), transcriptome sequencing, methylome sequencing, etc. (11-12)
The genome's coding sequences make up around 30 Mb, or 1%, of its total size. Nevertheless,
85% of disease-causing mutations in Mendelian illnesses are found in coding
regions, and more than 95% of exons are covered by WES. (10)
Therefore, sequencing of the entire coding region (exome) may be able to identify the
variants causing rare, primarily monogenic, genetic disorders in addition to predisposing
variants in common disorders and cancer. (10)
This project included whole-genome sequencing, clinical exome sequencing,
whole mitochondrial genome sequencing, and panel analysis of 50 Indian cohorts to study the
inheritance pattern of the Indian population. The project is primarily concerned with data on
variant allele frequency, allele number, allele count, and the total number of heterozygous and
homozygous individuals identified in this study. As a result, clinicians and researchers
will be better able to distinguish between pathogenic and benign variants in the context of the
Indian population when querying variants for various medical applications.
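The quantities listed above are related by simple counting. As a hedged sketch, assuming a
biallelic autosomal variant and VCF-style genotype codes, allele count is the number of
alternate alleles observed, allele number is the number of alleles called, and allele frequency
is their ratio. The function name allele_summary and the example cohort are hypothetical and do
not reproduce the study's data.

```python
def allele_summary(genotypes: list) -> dict:
    """Summarise one biallelic autosomal variant across a cohort.

    Genotypes use VCF-style codes: '0/0' (homozygous reference),
    '0/1' or '1/0' (heterozygous), '1/1' (homozygous alternate).
    Any other value (for example './.') is treated as a missing call.
    """
    het = sum(g in ("0/1", "1/0") for g in genotypes)
    hom = sum(g == "1/1" for g in genotypes)
    called = sum(g in ("0/0", "0/1", "1/0", "1/1") for g in genotypes)
    allele_count = het + 2 * hom          # alternate alleles observed
    allele_number = 2 * called            # two alleles per called diploid sample
    allele_frequency = allele_count / allele_number if allele_number else 0.0
    return {
        "heterozygous": het,
        "homozygous": hom,
        "allele_count": allele_count,
        "allele_number": allele_number,
        "allele_frequency": allele_frequency,
    }


# Made-up cohort: 46 reference, 3 heterozygous and 1 homozygous individuals.
cohort = ["0/0"] * 46 + ["0/1"] * 3 + ["1/1"]
print(allele_summary(cohort))
```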
Novel DNA sequencing techniques, known as "next-generation" sequencing (NGS), provide
high speed and throughput, making it possible to produce a huge volume of sequence with a wide
range of applications in research and diagnostic settings.
The throughput required of DNA sequencing increased by an unanticipated amount with
the objective of deciphering the human genome, spurring innovations like automated
capillary electrophoresis. Laboratory automation and parallel processing led to
the establishment of sequencing centres, which house hundreds of sequencing instruments
operated by teams of personnel.
NGS can also be used for in-depth analysis of either genomic DNA, to detect genetic
variants, or RNA, to report differences in gene expression. NGS has been used
effectively to characterise types of cancer genetically (14) by identifying novel
disease-associated translocations (16) and changes in miRNA abundance. (15)
The four chief advantages of NGS are as follows:
2.2.5 The sample sizes
NGS is significantly less expensive and faster, requires less DNA, and is more accurate and
consistent than Sanger sequencing. Every Sanger sequencing read requires a substantial
amount of template DNA. Because a strand terminated at each base is required to
build a sequence, multiple strands of template DNA are required for each base being
sequenced (i.e., for a 100 bp sequence, many copies; for a 1000 bp sequence,
many hundreds of copies). In NGS, a sequence can be produced from a single strand. In both
types of sequencing, multiple staggered copies are captured for contig building and sequence
verification.
2.2.6 The speed
NGS is faster than Sanger sequencing in two ways. First, in some types of NGS,
the chemical reaction and signal detection are combined, whereas in Sanger sequencing these
are two separate steps. Second, and more importantly, Sanger sequencing can take only one
read (maximum 1 kb) at a time, whereas NGS is massively parallel, permitting 300 Gb
of DNA to be read in a single run on a single chip.
2.2.7 The cost
Because NGS requires less time, manpower, and sample preparation, the costs are
significantly lower. The first human genome sequence cost around £300 million. A complete
human genome would still cost £6 million if modern Sanger sequencing methodologies were
used, aided by data from the known sequence. A human genome sequence with Illumina
would presently cost just under £1,000.
2.2.8 The accuracy
Repeats are inherent in NGS: every read is amplified before sequencing, and because the method
depends on many short overlapping reads, every portion of DNA or RNA is sequenced numerous
times. Furthermore, because NGS is so much faster and less expensive, it is possible to
perform many more repeats than with Sanger sequencing. More repeats mean more coverage,
which results in a more reliable and accurate sequence, even if individual NGS reads are less
accurate.
Sanger sequencing can produce much longer individual sequence reads. However, because NGS
is parallel, very long sequences can be assembled from many overlapping short reads.
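The link between the number of reads and coverage can be made concrete with the usual
back-of-envelope estimate: mean depth is approximately the total number of sequenced bases
divided by the size of the target. The sketch below assumes that reads are spread uniformly over
the target, which real data only approximate; the function name and the example figures are
illustrative and are not taken from this study.

```python
def mean_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Approximate mean depth: total sequenced bases divided by target size."""
    return num_reads * read_length / target_size


# Illustrative figures: 600 million reads of 150 bp over a 3-billion-base genome.
depth = mean_coverage(num_reads=600_000_000, read_length=150, target_size=3_000_000_000)
print(f"Estimated mean coverage: {depth:.0f}x")
```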
3. MATERIALS AND METHODS
3.1 Materials
• Varminer - variant interpretation tool
Medgenome internal software that helps in the interpretation of variants based on the patient
phenotype mentioned in the TRF.
• TRF - test request form
This form contains all the clinical information about the patient, such as age, gender,
consanguinity, family history of affected siblings (if present), age of manifestation of
disease, type of test to be performed, and clinical indications.
• IGV - Integrative Genomics Viewer
It is a powerful desktop application for interactively visualising various genomic data.
IGV offers real-time interaction at all scales of genome complexity, from the
entire genome down to base pairs, even for very large data sets. (33) IGV is intended to be
useful to a wide range of users, including bench scientists and bioinformaticians.
While more experienced users can benefit from the many advanced features
and options, novice users particularly value the user-friendly and easy-to-use interface.
(34)
Figure 7: IGV
• Population databases (ExAC, 1000G, gnomAD)
Reference population databases are an essential tool in variant and gene interpretation.
Their use guides the identification of pathogenic variants amidst the sea of benign
variation present in every human genome and supports the discovery of new disease-gene
relationships. (35)
ExAC - For clinical and biomedical applications, the ExAC browser offers gene- and
transcript-centric presentations of variation. For each reported variant, the browser offers a
variant display that shows population frequencies, functional annotation, and short-read support
information. The browser is free to use and open source. (37)
1000G - Through the 1000 Genomes website, users can explore genotype calls,
variant calls, supporting sequence read alignments from the 1000 Genomes Project, and
associated variants in dbSNP.
gnomAD - The Genome Aggregation Database (gnomAD) was created by an international coalition
of researchers to aggregate and harmonise exome and genome
sequencing data gathered from a variety of large-scale sequencing projects and to
make summary data available to the scientific community. (36)
• In-silico Prediction Tools (SIFT, LRT, MutationTaster2)
In silico predictive software allows the effect of amino acid substitutions on the
structure or function of a protein to be assessed without conducting functional studies. (38)
SIFT - Using sequence similarity and the physicochemical properties of amino acids, SIFT
predicts whether an amino acid substitution will affect protein function. Both naturally
occurring nonsynonymous polymorphisms and missense variants created in the laboratory can be
analysed using SIFT.
LRT - The likelihood ratio test identifies variants that disrupt amino acids that are highly
conserved within protein-coding sequences and are therefore likely to be
deleterious. (39)
MutationTaster2 - Evaluates whether a DNA sequence variant is likely to have a
deleterious, disease-causing effect. (39)
PolyPhen2 - Predicts the impact of an amino acid substitution on the structure and function of
human proteins using physical and comparative considerations. (40)
OMIM database - A continuously updated database of human genes, genetic diseases, and traits.
This database specifically focuses on the relationship between genotype and phenotype.
UCSC genome browser - The Genome Browser (genome.ucsc.edu) of the University of
California Santa Cruz (UCSC) is a well-known web-based application for rapidly displaying
a requested region of a genome at any scale, together with several aligned annotation "tracks".
The annotations include gene predictions, mRNA and expressed sequence tag alignments,
simple nucleotide polymorphisms, expression and regulatory data, phenotype and
variation data, and comparative genomics data; they were
produced by the UCSC Genome Bioinformatics Group and external
collaborators.
Ensembl - Ensembl is a system for producing and disseminating genome annotation, such as
genes, variation, regulation, and comparative genomics, across the vertebrate
subphylum and important model species. The Ensembl annotation pipeline can combine
experimental and reference data from various providers into a single unified resource.
METHODOLOGY
1. Sample Registration
The methodology begins with sample collection and registration. The first step is to ascertain
the patient's identity. Acceptable identifiers include the patient's name, date of birth, and
hospital number, among others.
2. Library preparation
The process of library preparation depends on the test booked by the physician.
(Whole exome sequencing, clinical exome sequencing, whole mitochondrial genome
sequencing)
3. Sequencing
All NGS platforms sequence millions of small fragments of DNA in parallel.
Bioinformatics analyses are then used to piece these fragments together by
mapping individual reads to the human reference genome.
Each of the human genome's three billion bases is sequenced multiple times, providing
sufficient depth to deliver accurate data and insight into unexpected DNA variation. NGS can
be used to sequence entire genomes or specific areas of interest, such as all 22,000 coding genes
(a whole exome) or a small number of individual genes.
4. Bioinformatics analysis
This step is performed by clinical bioinformaticians. They check the quality of the
reads, gender confirmation, CNV and SNV annotation, coverage and depth of the reads, total
data generated, and on-target percentage.
5. Interpretation and Reporting
The genome analyst is responsible for checking the sample quality control (QC) metrics before
proceeding with the analysis and for diligently following the checklist and SOP for both the
reporting and proofreading processes.
This step uses the Varminer software, which contains comprehensive information
about the variants. The specific variant is determined from parameters such as MAF, in silico
prediction tools, an OMIM phenotype that matches the clinical phenotype of the patient, and IGV
visualisation of the variant, which shows its depth and coverage.
Figure 8: The flow chart represents the overall workflow.
Variant interpretation and reporting can further be categorized as follows:
1. Analysis: As mentioned earlier, before the actual analysis, the samples must
pass checks such as the QC check and a review of the patient's phenotype mentioned
in the TRF.
Figure 9: Steps in variant interpretation and reporting
2. Sample allotment: Following bioinformatics analysis, samples are assigned to genome
analysts for analysis and reporting. Once the report has been generated, the samples are
assigned for rechecking. Rechecked samples are sent to the clinical geneticist for review.
Approved reports are assigned for proofreading. The primary genome analyst revises the
proofread reports. The person in charge distributes the reports using internal software. The
genome analyst must generate, proofread, and distribute approved reports within the time frame
specified by the software.
3. Report generation: The mode of inheritance is determined from the pedigree, family history,
or phenotype.
If a family history such as a pedigree is provided, it indicates the mode (autosomal dominant,
recessive, or X-linked); otherwise, disease-specific inheritance based on the diagnosis is used.
For example, cystic fibrosis is inherited in an autosomal recessive mode.
Minor allele frequency: Based on minor allele frequencies (MAF) in population databases (1000
Genomes, ExAC, internal database), variants detected in clinically relevant genes are
systematically prioritised to distinguish baseline polymorphisms from clinically significant
variants.
The threshold value is determined by the prevalence of the disease condition.
For example, if a variant is more common than the disease itself, it is less likely to
cause the disease. In contrast, a variant that is rare in the general population is more likely
to be clinically significant.
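The frequency-based prioritisation described above can be sketched as a simple filter. This is
an illustration under stated assumptions, not the logic of the internal software: the function
name passes_maf_filter, the database keys, and the example threshold are all hypothetical, and
the threshold would in practice be chosen from the prevalence of the disease in question.

```python
def passes_maf_filter(population_mafs: dict, threshold: float) -> bool:
    """Keep a variant only if it is at or below the threshold in every database.

    population_mafs maps a database name to the variant's minor allele
    frequency, or to None when the variant is absent from that database.
    """
    observed = [maf for maf in population_mafs.values() if maf is not None]
    # A variant absent from all databases has no observed frequency and is kept.
    return all(maf <= threshold for maf in observed)


# Hypothetical frequencies for two variants and an assumed threshold of 0.1%.
rare_variant = {"1000G": None, "ExAC": 0.00002, "internal": 0.0}
common_variant = {"1000G": 0.12, "ExAC": 0.09, "internal": 0.15}
print(passes_maf_filter(rare_variant, threshold=0.001))
print(passes_maf_filter(common_variant, threshold=0.001))
```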
Supporting evidence from in-silico predictions: To evaluate the significance of a variant,
we use various predictive methods, including SIFT, PolyPhen2, MutationTaster2, and LRT.
These tools evaluate the impact of single nucleotide changes on the structure and function
of proteins.
Three or more tools must agree for the predictions to count as supporting evidence.
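The concordance rule can be sketched as a count of tools that call a variant damaging. The label
strings below are assumptions made for illustration (each tool reports its result in its own
vocabulary, and the exact strings depend on the annotation source), as are the function name
in_silico_support and the example predictions.

```python
# Assumed set of labels that are treated as a "damaging" call in this sketch.
DAMAGING_LABELS = {
    "damaging",
    "deleterious",
    "disease causing",
    "probably damaging",
    "possibly damaging",
}


def in_silico_support(predictions: dict, min_tools: int = 3) -> bool:
    """Return True when at least min_tools tools predict a damaging effect."""
    damaging = sum(
        str(label).strip().lower() in DAMAGING_LABELS
        for label in predictions.values()
    )
    return damaging >= min_tools


# Hypothetical predictions for a single missense variant.
example = {
    "SIFT": "Damaging",
    "PolyPhen2": "Probably damaging",
    "MutationTaster2": "Disease causing",
    "LRT": "Neutral",
}
print(in_silico_support(example))
```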
The variant should be classified as pathogenic, likely pathogenic, or a variant of uncertain
significance by following the guidelines of the American College of Medical Genetics and
Genomics and the Association for Molecular Pathology (ACMG-AMP). Assigning pathogenicity status
to patient DNA sequence variants is difficult, and the ACMG-AMP recommendations were
created with the purpose of promoting standardisation. Nevertheless, a high rate of discordance
is observed, with inconsistent application of the criteria being a common source.
Depending on the healthcare system, variants of uncertain significance (VUS) are not usually
disclosed to doctors, precluding any detailed interpretation in the scenario of a patient already
known to have the illness. To avoid a VUS classification, numerous criteria demonstrating
pathogenicity must be met, as shown in Figure 10. If the variant is enriched in patients vs.
controls in case-control studies, or is detected in numerous unrelated probands, this is
strong evidence of pathogenicity (PS4). Such unrelated probands may not exist for the
rarest alleles or may be unavailable in the scientific literature. In these scenarios, only the
moderate allele frequency criterion PM2 ("missing from controls or at extremely low frequency")
is fulfilled. (41)
ClinGen proposed downgrading PM2 from moderate to supporting evidence in 2020 to account for
genome-wide evidence of allele rarity.
Figure 10: ACMG-AMP criteria that may be applied to a single patient with a rare
disorder
The prioritised variants are examined for genotype-phenotype association to determine the
relevance of each variant based on all the criteria mentioned above.
Before the final report is generated, it passes through several further proofreading stages,
which are listed in Figure 11.
Figure 11: Stages of report before release.
4. RESULTS
During the dissertation, I analysed 55 samples and interpreted the variants causative of
various clinical phenotypes.
The samples were analysed using different NGS tests: disease-specific targeted panels, clinical
exome sequencing, whole exome sequencing and whole mitochondrial genome sequencing.
The variants were prioritised based on genotype-phenotype correlation, age of manifestation of
disease, consanguinity status and the ACMG guidelines.
The prioritised variants belonged to different variant classes, i.e., missense, nonsense,
frameshift, intronic, and in-frame insertions and deletions.
4.1 Gender distribution of patients
The samples analysed came from both male and female patients: of the 55 samples, 40 were from
males and 15 were from females. Patient sex matters for the interpretation of X-linked variants:
a variant that appears homozygous in an X-linked gene is expected only in male samples, which
are hemizygous. In female samples, such a variant was accepted only if it had been previously
reported, and homozygous variants in autosomal dominant genes were likewise not considered
suitable unless reported. The gender distribution of the samples is shown in Figure 12.
Figure 12: Chart shows gender distribution among the samples analysed
4.2 Age distribution of the sample
As shown in Figure 13, the largest group of samples (25 of 55) belongs to the 16 years and
above category, which suggests that the disease manifested at 16 years of age or later in a
large share of the patients.
Figure 13: The graph shows age and gender distribution of the samples analysed.
[Figure 12 chart, "Gender distribution of samples": male 73%, female 27%.]
[Figure 13 chart, "Age distribution in samples": below 5 years, 12; 5-10 years, 8; 11-15 years,
10; 16 and above, 25.]
4.3 NGS – Targeted Panel
Various targeted panels were booked for analysis of the samples based on the test suggested by
clinicians, depending on the patients’ phenotypes. Figure 14 shows the spectrum of panels
analysed.
Clinical exome sequencing was used for 71% of the samples. The clinical exome test covers a
smaller number of genes than whole exome sequencing, which covers a broader set of genes. Panel
testing, on the other hand, tests only the particular genes in the specified panel and
accounted for 13% of the total samples.
Figure 14: Distribution of the targeted panel analysed
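The percentages quoted for each test type follow directly from the sample counts in the chart.
A minimal sketch of the arithmetic is given below; the dictionary keys follow the chart legend
and the function name is illustrative.

```python
# Sample counts per test type, as given in the chart for Figure 14.
panel_counts = {"CES": 39, "Panel": 7, "WES": 4, "Combo": 5}


def shares(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


print(sum(panel_counts.values()))  # 55
print(shares(panel_counts))        # {'CES': 71, 'Panel': 13, 'WES': 7, 'Combo': 9}
```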
Variant class
The significant variants belonged to several variant classes such as nonsense, missense,
frameshift, splice, and copy number variations (CNVs).
[Figure 14 chart, "Targeted panel": CES, 39 (71%); Panel, 7 (13%); WES, 4 (7%); Combo, 5 (9%).]
Of the 55 samples analysed, the variant in 31 samples was a missense variant, a class that
requires evidence from the literature to be established as pathogenic or likely pathogenic. Six
samples contained a frameshift variant and one contained a nonsense variant, while in 17
samples no variant causative of the disease phenotype was identified.
Figure 15: Distribution of samples based on variant class
4.4 Disease Causing Variants
Variants of uncertain significance (VUS) formed the largest group among the samples
analysed: 24 samples carried a VUS, nine carried a pathogenic (P) variant responsible for
disease manifestation, three carried a likely pathogenic (LP) variant, and the remaining 19
samples had no significant variant matching the disease phenotype.
The results show that most null variants (frameshift, nonsense) were classified as
pathogenic or likely pathogenic, whereas very few missense variants were; only missense
variants reported in disease databases and supported by relevant functional evidence
received a pathogenic or likely pathogenic classification.
[Figure 15 chart, "Variant class": frameshift, 6; nonsense, 1; missense, 31; none, 17.]
Figure 16: Significance of variants
[Figure 16 chart, "Significance of the variants": pathogenic, 9; likely pathogenic, 3; variant
of uncertain significance, 24; none, 19.]
5. DISCUSSION AND CONCLUSION
The purpose of this study was to examine the use of next-generation sequencing (NGS) in
understanding the molecular basis of kidney disease in individuals. While genetic testing has
been shown to aid in providing more accurate diagnoses in patients with heritable kidney
disease, such as polycystic kidney disease, Alport syndrome and Fabry disease, its application
to patients with nephrotic diseases, as well as its utility in identifying unsuspected kidney
disease in nephrotic patients, has been limited. (42) To examine this further, a unique gene
panel of 345 kidney disease-related genes was used to perform targeted NGS in seven samples
with kidney disease.
According to the analysis, approximately 65% of the nephrotic disease patients included in this
study had diagnostic variants that were compatible with their clinical diagnosis, and one-third
of nephrotic disease patients were found to have pathogenic or likely pathogenic variants in
kidney disease-related genes.
Of the genes found to contain pathogenic variants in these patients, PKD1, SLC4A1, NPHP1,
AVPR2, NPHS2, CTNS, GRHPR and COL4A3 have previously been linked to nephrotic disease.
COL4A3 is a gene that encodes a significant structural component of the glomerular basement
membrane and has been linked to heritable nephropathies. The rare COL4A3 variant was found in
two individuals in the study. Variants in CFH, PAX2, CRB2 and various other genes included in
the study were classified as VUS because no literature evidence for them was found.
The study demonstrates the use of genetic testing in determining the underlying molecular
aetiology of disease in patients with varied kinds of nephrotic syndrome. In addition to
assigning genetic diagnoses that validated numerous nephrotic syndrome patients' clinical
diagnoses, the study was able to find rare variants in a subgroup of patients that suggested a
re-evaluation of their diagnosis was necessary. For these individuals, genetic analysis may
have assisted in the detection of undiagnosed clinical signs and better guided their care
management.
The study has limitations that must be addressed. First, the study comprised only fifty-five
samples. Despite the small sample size, rare variants were identified in the genes studied,
highlighting the importance of genetic screening in identifying precise molecular diagnoses.
Second, while we were able to detect both coding variants and structural rearrangements using
targeted sequencing, we did not evaluate variants in non-coding regions that could affect RNA
expression and/or processing (e.g., those affecting transcriptional repression, exon skipping
and intron inclusion). Because our proprietary capture technique was limited to the coding
regions and exon-intron boundaries of the targeted genes, any variants outside of these
regions, such as those in the promoter regions and other non-coding or regulatory regions,
could not be identified. Third, to improve specificity, the ACMG-AMP criteria and guidelines
were followed for variant interpretation; as a result, many of the variants discovered in the
study were designated as VUSs and were not associated with clinical phenotypes. Because VUSs
cannot be completely ruled out as benign, the sensitivity of the investigation may have been
compromised. Computational prediction techniques combined with functional investigations may
aid in better understanding the pathogenicity of VUSs and prioritising them for future
investigation.
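The second limitation can be made concrete with a short sketch that tests whether a variant
position falls inside the captured region, taken here to be each exon plus a few flanking
bases at the exon-intron boundary. The exon coordinates and the padding of 10 bases are
hypothetical values chosen for illustration; the actual capture design is proprietary and is
not described here.

```python
from typing import List, Tuple

# 1-based, inclusive (start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def in_capture_region(pos: int, exons: List[Interval], padding: int = 10) -> bool:
    """Return True if `pos` lies within any exon extended by `padding` bases."""
    return any(start - padding <= pos <= end + padding for start, end in exons)


# Hypothetical two-exon gene.
exons = [(1000, 1200), (5000, 5150)]

print(in_capture_region(1100, exons))  # True  (coding position)
print(in_capture_region(1205, exons))  # True  (exon-intron boundary)
print(in_capture_region(3000, exons))  # False (deep intronic, not captured)
```

Variants that return False under such a check, including deep intronic and promoter variants,
are invisible to a capture design of this kind regardless of their clinical relevance.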
The objective was to use NGS to better determine the genetic basis of kidney disease. Although
most samples in the analysis did not have pathogenic/likely pathogenic variants in kidney
disease-related genes, a significant minority possessed variants that were thought to
contribute to their disorder. The findings suggest that rare variants in genes not previously
linked to polycystic kidney disease susceptibility (e.g., ciliopathy genes) may contribute to
nephropathy, and that genetic screening in chronic kidney disease patients can help provide a
molecular diagnosis, which could lead to improved precision diagnostics and help in the
prognosis and long-term management of their kidney disease.
6. REFERENCES
1. Smith RA, Andrews KS, Brooks D, Fedewa SA, Manassaram-Baptiste D, Saslow D,
Brawley OW, Wender RC. Cancer screening in the United States, 2017: a review of current
American Cancer Society guidelines and current issues in cancer screening. CA: a cancer
journal for clinicians. 2017 Mar;67(2):100-21.
2. Mahdieh N, Rabbani B. An overview of mutation detection methods in genetic disorders.
Iranian journal of pediatrics. 2013 Aug;23(4):375.
3. Usdin K. The biological effects of simple tandem repeats: lessons from the repeat
expansion diseases. Genome research. 2008 Jul 1;18(7):1011-9.
4. Richette P, Bardin T, Stheneur C. L’achondroplasie: du génotype au phénotype. Revue du
rhumatisme. 2008 May 1;75(5):405-11.
5. Su N, Sun Q, Li C, Lu X, Qi H, Chen S, Yang J, Du X, Zhao L, He Q, Jin M. Gain-of-
function mutation in FGFR3 in mice leads to decreased bone mass by affecting both
osteoblastogenesis and osteoclastogenesis. Human molecular genetics. 2010 Apr
1;19(7):1199-210.
6. Mahdieh N, Rabbani B. An overview of mutation detection methods in genetic disorders.
Iranian journal of pediatrics. 2013 Aug;23(4):375.
7. Van Heyningen V, Yeyati PL. Mechanisms of non-Mendelian inheritance in genetic
disease. Human molecular genetics. 2004 Oct 1;13(suppl_2):R225-33.
8. Hunt PA, Hassold TJ. Human female meiosis: what makes a good egg go bad? Trends in
Genetics. 2008 Feb 1;24(2):86-93.
9. Soutar AK, Naoumova RP. Mechanisms of disease: genetic causes of familial
hypercholesterolemia. Nature clinical practice Cardiovascular medicine. 2007
Apr;4(4):214-25.
10. Ku CS, Cooper DN, Polychronakos C, Naidoo N, Wu M, Soong R. Exome sequencing:
dual role as a discovery and diagnostic tool. Annals of neurology. 2012 Jan;71(1):5-14.
11. Rabbani B, Mahdieh N, Hosomichi K, Nakaoka H, Inoue I. Next-generation sequencing:
impact of exome sequencing in characterizing Mendelian disorders. Journal of human
genetics. 2012 Oct;57(10):621-32.
12. Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, Senthivel V, Divakar MK,
Rophina M, Jolly B, Batra A. IndiGenomes: a comprehensive resource of genetic variants
from over 1000 Indian genomes. Nucleic acids research. 2021 Jan 8;49(D1):D1225-32.
13. Schuster SC. Next-generation sequencing transforms today's biology. Nature methods.
2008 Jan;5(1):16-8.
14. Simpson AJ. Sequence-based advances in the definition of cancer-associated gene
mutations. Current Opinion in Oncology. 2009 Jan 1;21(1):47-52.
15. Chen W, Kalscheuer V, Tzschach A, Menzel C, Ullmann R, Schulz MH, Erdogan F, Li N,
Kijas Z, Arkesteijn G, Pajares IL. Mapping translocation breakpoints by next-generation
sequencing. Genome research. 2008 Jul 1;18(7):1143-9.
16. Kuchenbauer F, Morin RD, Argiropoulos B, Petriv I, Griffith M, Heuser M, Yung E, Piper
J, Delaney A, Prabhu AL, Zhao Y. In-depth characterization of the microRNA
transcriptome in a leukemia progression model. Genome research. 2008 Nov
1;18(11):1787-97.
17. Mastana SS. Unity in diversity: an overview of the genomic anthropology of India. Annals
of human biology. 2014 Jul 1;41(4):287-99.
18. Chaubey G, Metspalu M, Kivisild T, Villems R. Peopling of South Asia: investigating the
caste–tribe continuum in India. Bioessays. 2007 Jan;29(1):91-100.
19. Sivasubbu S, Scaria V. Genomics of rare genetic diseases—experiences from India. Human
genomics. 2019 Dec;13(1):1-8.
20. "Whole-genome sequence variation, population structure and demographic history of the
Dutch population." Nature genetics 46, no. 8 (2014): 818-825.
21. Fakhro KA, Staudt MR, Ramstetter MD, Robay A, Malek JA, Badii R, Al-Marri AA, Khalil
CA, Al-Shakaki A, Chidiac O, Stadler D. The Qatar genome: a population-specific tool for
precision medicine in the Middle East. Human genome variation. 2016 Jun 30;3(1):1-7.
22. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P,
De Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and
population-based linkage analyses. The American journal of human genetics. 2007 Sep
1;81(3):559-75.
23. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria
AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T. Analysis of protein-coding genetic
variation in 60,706 humans. Nature. 2016 Aug;536(7616):285-91.
24. Poszewiecka B, Pienkowski VM, Nowosad K, Robin JD, Gogolewski K, Gambin A.
TADeus2: a web server facilitating the clinical diagnosis by pathogenicity assessment of
structural variations disarranging 3D chromatin structure. Nucleic Acids Research. 2022
May 7.
25. Miller NA, Farrow EG, Gibson M, Willig LK, Twist G, Yoo B, Marrs T, Corder S,
Krivohlavek L, Walter A, Petrikin JE. A 26-hour system of highly sensitive whole genome
sequencing for emergency management of genetic diseases. Genome medicine. 2015
Dec;7(1):1-6.
26. Hou L, Kember RL, Roach JC, O’Connell JR, Craig DW, Bucan M, Scott WK, Pericak-Vance M,
Haines JL, Crawford MH, Shuldiner AR. A population-specific reference panel
empowers genetic studies of Anabaptist populations. Scientific reports. 2017 Jul
20;7(1):1-9.
27. Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping.
Nature reviews genetics. 2011 May;12(5):363-76.
28. Eichler EE. Copy number variation and human disease. Nat Educ. 2008;1(3):1.
29. Mefford HC, Sharp AJ, Baker C, Itsara A, Jiang Z, Buysse K, Huang S, Maloney VK,
Crolla JA, Baralle D, Collins A. Recurrent rearrangements of chromosome 1q21.1 and
variable pediatric phenotypes. New England Journal of Medicine. 2008 Oct
16;359(16):1685-99.
30. Hollox EJ, Huffmeier U, Zeeuwen PL, Palla R, Lascorz J, Rodijk-Olthuis D, van de
Kerkhof P, Traupe H, De Jongh G, Heijer MD, Reis A. Psoriasis is associated with
increased β-defensin genomic copy number. Nature genetics. 2008 Jan;40(1):23-5.
31. Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW,
Li PW, Eichler EE. Recent segmental duplications in the human genome. Science. 2002
Aug 9;297(5583):1003-7.
32. Hollox EJ, Detering JC, Dehnugara T. An integrated approach for measuring copy number
variation at the FCGR3 (CD16) locus. Human mutation. 2009 Mar;30(3):477-84.
33. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov
JP. Integrative genomics viewer. Nature biotechnology. 2011 Jan;29(1):24-6.
34. Milne I, Bayer M, Cardle L, Shaw P, Stephen G, Wright F, Marshall D. Tablet—next
generation sequence assembly visualization. Bioinformatics. 2010 Feb 1;26(3):401-2.
35. Chen X. Statistical Modeling of Next Generation Sequencing Data. Yale University; 2014.
36. Souza Junior ML, de Sousa JV, Guerreiro JF. Analysis of coding variants in the human
FTO gene from the gnomAD database. PloS one. 2022 Jan 6;17(1):e0248610.
37. Pallares LF. Genetic Variation: Searching for solutions to the missing heritability problem.
Elife. 2019 Dec 4;8:e53018.
38. Wu J, Jiang R. Prediction of deleterious nonsynonymous single-nucleotide polymorphism
for human diseases. The Scientific World Journal. 2013 Oct;2013.
39. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov
AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nature
methods. 2010 Apr;7(4):248-9.
40. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL.
BLAST+: architecture and applications. BMC bioinformatics. 2009 Dec;10(1):1-9.
41. Davieson CD, Joyce KE, Sharma L, Shovlin CL. DNA variant classification–reconsidering
“allele rarity” and “phenotype” criteria in ACMG/AMP guidelines. European Journal of
Medical Genetics. 2021 Oct 1;64(10):104312.
42. Lazaro-Guevara J, Fierro-Morales J, Wright AH, Gunville R, Simeone C, Frodsham SG,
Pezzolesi MH, Zaffino CA, Al-Rabadi L, Ramkumar N, Pezzolesi MG. Targeted Next-
Generation Sequencing Identifies Pathogenic Variants in Diabetic Kidney Disease.
American journal of nephrology. 2021;52(3):239-49.

thesis_final dhwani.docx

  • 1.
    ABSTRACT Advances in next-generationsequencing (NGS) methods have allowed for the cost-effective sequencing of huge portions of the genome, allowing NGS to be used in clinical practise. Diseases related to the kidney contributes significantly to the global disease burden and is associated with an elevated risk of morbidity and death. A wide range of primary renal diseases can result in chronic kidney disease. One in every five nephology disease patients does not have a primary disease diagnosis. Moreover, new research reveal that the clinical diagnosis may be inaccurate in a considerable proportion of cases. The absence of a diagnosis, as well as an inaccurate diagnosis, might have therapeutic consequences. Genetic testing may improve diagnosis accuracy in individuals with nephrotic disease, particularly in people with unknown aetiology. The study mainly focuses on the genetic cause of disease in individuals with nephrotic diseases using targeted NGS panels. Based on ACMG-AMP standards, the analysis discovered uncommon pathogenic variations that were compatible with the clinical diagnosis of 65% of the samples included in this study. Similarly, 12% of patients had rare pathogenic/likely pathogenic mutations in nephrotic disease-related genes.
  • 2.
    1. INTRODUCTION Genetics playsa key role, to a lesser or greater extent, in all diseases. Variations in the DNA and their impact in functions of their protein products by itself or together with epigenetic modifications leads to disease processes. WHO recommends a prevalence of less than 10 cases/10,000 persons for designating rare illnesses. The typical prevalence criteria used to classify rare diseases vary among various jurisdictions and range from 1 to 6 cases/10,000 people. The study concluded that efforts to harmonise the disparate definitions should concentrate on standardising quantitative criteria like prevalence thresholds and minimise qualitative descriptors like illness severity. The constitution of genetics plays a key role, in all disease processes, including both rare and common disorders, because of the variations. These differences, either together or individually, might proffer an individual more liable to cause a single disorder (for example, a type of cancer), but at same time could manifest the same individual less liable to come about an unrelated disorder (for example, diabetes). The instructions for the generation of a human being are encoded in the substance called DNA present in the cells: the genome of the human, which contains approximately 3 billion bp. Scientists all over the world joined in the ‘Human Genome Project’ to initiate the first DNA sequences of whole human genome which contained additions and corrections being made in the subsequent years (reference). The major mass of the DNA is present inside the nucleus in the form of chromosomes. A small amount of DNA is also present in mitochondria known as mitochondrial DNA (mt DNA) Most of the individuals have 23 pairs of chromosomes and the is present in form of 2 copies that is one copy of chromosome is from our mother and the other copy is from our father. 
The nuclear genome of the Human contains roughly 20,000 genes which are protein coding, and mainly consists of both protein-coding sequences known as Exons and non-coding sequences are known as Introns. 22000 genes of the Human genome
  • 3.
    encode RNA moleculesonly; out of which a small part of RNAs (rRNA, tRNA) form the parts of the translation machinery also there are many RNAs which perform a variety of roles within the cell or in expression of other genes via regulation. In fact, it is now believed that as much as 80% of our genome has biological activity that may influence structure and function. (1) The human genome also contains over 14000 ‘pseudogenes’; these are imperfect copies of protein-coding genes that have lost the ability to code for protein. (1) Some of the pseudogenes have their application in gene therapy by gene editing approaches furthermore can be used to generate functional genes. One of the important findings is the gene distribution between the chromosomes is unequal: chromosome 19 is particularly gene-dense, while the autosomes for which trisomy is viable (13, 18, 21) are relatively gene-poor. It was recognised from the outset of the Human Genome Project that there was a significant degree of DNA sequence variation amongst healthy individuals, and so there is no such thing as a 'normal' human DNA sequence. However, if to explain changes to the DNA sequence, it must be done in relation to some baseline, which is the human reference genome sequence.
  • 4.
    2. REVIEW OFLITERATURE With over 1.3 billion people, India is the world's second most populous country, accounting for 17% of the global population. With over 4500 anthropologically distinct populations, the country is extremely diverse. (18) These populations have been divided into castes, tribes, and religious groups, which differ in terms of cultural practises, geographic areas, climatic conditions, physical characteristics, marriage practises, linguistics, and genetic architecture. (18-19) Despite its rich genetic variation, India was under in global genome research. (26) Furthermore, the Indian population is divided into several huge endogamous clusters and is characterised by consanguineous marriages. Consequently, recessive alleles are common with in Indian cohort. In the absence of significant whole-genome studies from India, those certain subpopulation-specific genetic variants are not sufficiently captured and categorised in the worldwide scientific literature. (20) The emergence of next generation sequencing and its expanding availability over the last decade has revolutionised our understanding of the genetic architectural history of various communities around the globe. (21) Various worldwide population datasets, including the 1000 genome sequencing, (22) ExAC, (23) ESP6500 (https://evs.gs.washington.edu/EVS/), and gnomAD (24) have produced guide and patient genome sets of data from communities across the regions as part of this effort. Even though these data - sets include Indian genomes, the number is negligible compared to the genetic variation and heterogeneous nature of the Indian population. (25) 2.1 Genetic Alterations Genes code for proteins that act as pigments, enzymes, hormones, antibodies, and regulate other proteins, as well as all metabolic pathways. Replication (DNA makes DNA), transcription
  • 5.
    (DNA makes RNA),RNA processing (capping, splicing, tailing, and RNA translocation to cytoplasm), translation (RNA makes protein), and protein processing, folding, transport, and incorporation are all steps in the transmission of genetic information Figure 1. If the DNA sequence is mutated and not repaired by the cell, subsequent replications replicate the mutation. Mutations can be caused by a variety of mechanisms, ranging from a single nucleotide change to the loss, duplication, or rearrangement of chromosomes. Single-gene, chromosomal, and multifactorial disorders are the three major types of genetic diseases. Multifactorial disorders, such as congenital heart disease, most types of cleft lip/palate, club foot, and neural tube defects, are caused by a combination of genes and environmental factors. (2) Figure 1: Steps in the transmission of genetic information
  • 6.
    2.2 Variation A variationoccurs when the sequence of DNA of an organism change. Variation can occur because of errors in Replication of DNA throughout cell division, mutagen exposure, or viral infection. Variants can be classified depending on cell type: Table 1 GERMLINE VARIATION SOMATIC VARIATION Any variation occurring in germline lineage and is heritable is known as germline variation. Any variation occurring in the somatic cells and would result in mosaic individual is known as somatic variation. Also known as heritable variation Also known as acquired variation Occurs in different stages of gametogenesis. Occurs in body cells like skin liver etc. These variation can be passed on from one generation to another. These variation do not pass to future generations. Through natural selections, these variation have an effect on evolution. These variation do not have any affect on evolution. Variants can be classified depending on type of alteration: Single nucleotide variant (SNV) A single nucleotide is substituted for another. A SNV may be uncommon in one population but can be more common in another. SNVs are often referred to as single nucleotide
  • 7.
    polymorphisms (SNPs), butthe terms are not exchangeable. The variant must be present in approximately 1% of the population or above to meet criteria as an SNP. The coding gene area exclusively contains silent (non-synonymous) SNPs that result in phenotypic alterations, as seen in the Figure 2 below. Figure 2: Types of variations If an SNV is found in a protein-coding region, it may cause either a: Silent Variation /missense/nonsense/indel. 2.2.1 Silent Variation When a nucleotide is swapped out for another nucleotide in a way that still results in the same amino acid being generated, this is known as a silent mutation. Silent mutation produces a new codon for the same native amino acid. For instance, the codes GAG and GAA both represent glutamic acid. The same amino acid form results from changing the G at this specific codon to an A. Silent mutation is the term used to describe this kind of mutation.
  • 8.
    When a nucleotideis swapped out for another nucleotide in a way that still results in the same amino acid being generated, this is known as a silent mutation. Silent mutation produces a new codon for the same native amino acid. For instance, the codes GAG and GAA both represent glutamic acid. The same amino acid form results from changing the G at this specific codon to an A. Silent mutation is the term used to describe this kind of mutation. 2.2.2 Missense Variation Nucleotide substitution causes a different codon to be generated in the missense mutation, but the new codon is not a stop codon. A missense mutation causes a protein to lose its function or change completely. To define clearly, when the idea of protein creation leads to functionally different amino acids, then a different protein is created overall. 2.2.3 Nonsense Variation When a stop codon is replaced for an amino acid coding codon, nonsense mutation occurs. A stop codon, a particular kind of triplet codon that marks the completion of protein synthesis, is made up of the amino acids UAA, UAG, and UGA. The start codon initiates the synthesis of protein in an equivalent manner. The premature end of amino acid synthesis or loss of a functional protein results from the stop codon being inserted at an unexpected location in an amino acid sequence. The term "nonsense mutation" refers to this kind of mutation. An example of silent, missense and nonsense mutations are described in the Figure 3.
  • 9.
    Figure 3: Illustrationof silent, nonsense and missense mutations 2.2.4 Indel A combination of insertion and deletion mutation. It describes a length discrepancy between two “alleles” in which it is unknown whether a sequence insertion or deletion originally generated the disparity. It is also a Frame - shift Mutation if the insertion or deletion has a nucleotide count that is not divisible by three and is present within a protein-coding area. As several indels were found in various disease types by current genome sequencing research, indels can be harmful and might increase disease susceptibility. Structural Variation The functional spectra of structural variation have recently been expanded to cover events >50 bp in length. Formerly, structural variation was described as insertions, deletions, duplication, translocations, and inversions (as shown in Figure 4) larger than 1 kb in length. The accurate description of the copy, content, as well as structure of genomic variations should be the primary goal of structural variant (SV) investigations. (27)
  • 10.
    Figure 4: Structuralvariation was described as insertions, deletions, inversions, translocation, ring chromosome and fragile site. Human genome comparisons reveal that structural variation, particularly copy number variation, results in far more base pair changes than do point mutations. (27) Copy Number Variation (CNV) 2.4 COPY NUMBER VARIATION (CNV) The term "copy number variation" denotes to a mid-scale genetic change, best characterized as segments longer than 1,000 base pairs but often only about 5 MB, the cytogenetic level of specificity. CNVs contain both genetic material losses (deletion) and extra copies of the sequence (duplications). CNVs, along with inversions and translocations, are typically categorised as types of genome structural variation because they alter the structure of the genome. Recently, researchers have realised how much of human diversity can be attributed to CNVs.
  • 11.
    In general, dependingon length of the damaged sequence, scientists classify CNVs into one of two basic groups. Copy number polymorphisms (CNPs), which have a prevalence of more than 1% overall in the general population, are included in the first group. Most CNPs are under 10 kb in size in length, and they are frequently enriched for genes which produce proteins vital to immunity and drug detoxification. A portion of these CNPs exhibit significant copy number variation. (30) The size of the second class of CNVs, which ranges from a few hundred thousand base pairs to over a million base pairs, comprises relatively uncommon variants that are substantially longer than CNPs. These variants, which are often referred to as microdeletions and microduplications, typically have a more recent ancestry within a family. (29) These CNVs might have developed during the development of the egg or sperm that gave rise to a specific person, or they might have been passed down within a family for only a brief time. A disproportionate number of patients with mental retardation, developmental delay, schizophrenia, and autism have been shown to have these significant and unusual structural variations. (31) Numerical Variations Chromosome defects include many types of numerical anomalies. These kinds of birth abnormalities happen when the body's cells contain a different number of chromosomes than is typically observed. There may therefore be 45 or 47 chromosomes in each cell of the body rather than the customary 46. Chromosome imbalances might result in health issues or birth abnormalities. Numerical aberrations include Edwards syndrome and Down syndrome (trisomy 21). Patau syndrome (trisomy 13), Klinefelter syndrome (XXY syndrome), Turner syndrome (monosomy
  • 12.
    X), and trisomyX as shown in Figure 5 with maternal age, chromosomal errors in oocytes increase dramatically. Figure 5: Structural variation was described as insertions, deletions, inversions, translocation, ring chromosome and fragile site. Mendelian Inheritance Pattern In the mid-nineteenth century, Mendel discovered a collection of heredity principles; these principles are used to determine characteristic patterns of inheritance. Single gene disorders are grouped as autosomal dominant (AD), autosomal recessive (AR), X-linked recessive (XR), X-linked dominant, and Y-linked (holandric) as shown in Figure 6. For example, a mutation in the FGFR3 gene can result in achondroplasia (4,5) similarly, damage to one allele of a pair of genes causes a deficiency in autosomal dominant disorders. Figure 6: Types of pedigree based on inheritance patterns
  • 13.
    A parent whohas an autosomal dominant disorder has a 50% chance of passing the disease on to her or his child. (6) Some diseases have a wide range of signs and symptoms in different people (variable expressivity), for example, some people with Marfan syndrome (FBN1 mutation) have only mild symptoms (such as being tall and thin with long, slender fingers), while others have life-threatening complications involving the heart and blood vessels. (6) An autosomal recessive disorder, in which the affected person inherits one abnormal allele from a heterozygous parent, is one in which both alleles of a gene must be altered (loss of function) for the defect to manifest. In this type of disorder, heterozygous parents have a 25% chance of having an affected offspring. When it comes to common autosomal recessive disorders or traits. The mutated gene in an X-linked disorder is found on the X chromosome. The disease can be caused by a recessive mutation. To cause the condition, the gene on chromosome X must be mutated; thus, an X-linked recessive disorder is passed by females but tends to affect males. Some genetic conditions have non-Mendelian or non-traditional inheritance patterns; for e.g., Figure 6: Types of pedigree based on inheritance patterns
  • 14.
    Mitochondrial diseases, trinucleotide expansion disorders, and genomic imprinting disorders are examples of conditions with non-Mendelian or non-traditional inheritance patterns. (6,7)
    Technique To Detect Mutations
    The term "next-generation sequencing" (NGS) refers to newer technologies that have been developed for large-scale DNA sequencing. NGS technologies enable high speed and throughput, as well as both qualitative and quantitative sequence data, allowing genome sequencing projects to be completed quickly. (42,43) Numerous sequencing methods are available on NGS systems, such as whole-genome sequencing (WGS), whole exome sequencing (WES), transcriptome sequencing, and methylome sequencing. (11-12) The coding sequences make up around 30 Mb, or about 1%, of the genome. About 85% of disease-causing mutations in Mendelian disorders are found in coding regions, and WES covers more than 95% of exons. (10) Therefore, sequencing of the entire coding region (exome) may identify the mutations causing rare, primarily monogenic, disorders, in addition to predisposing variants in common disorders and cancer. (10)
    This project included whole-genome sequencing, clinical exome sequencing, whole mitochondrial genome sequencing, and sequencing panel analysis of 50 Indian cohorts to study inheritance patterns in the Indian population. The project is primarily concerned with data on variant allele frequency, allele number, allele count, and the total number of heterozygous and homozygous individuals identified in this study. As a result, clinicians and researchers will be better able to distinguish between pathogenic and benign variants in the context of the Indian population when querying variants for various medical applications.
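The quantities named above (allele count, allele number, allele frequency, and the numbers of heterozygous and homozygous individuals) are related by simple arithmetic. The following is a minimal sketch, assuming an autosomal, biallelic site in diploid individuals with no missing genotypes; the function name and the toy counts are hypothetical and are not data from this study.

```python
def allele_summary(n_hom_ref: int, n_het: int, n_hom_alt: int) -> dict:
    """Summarise one biallelic autosomal variant from genotype counts.

    AC (allele count)  = copies of the alternate allele observed
    AN (allele number) = total chromosomes genotyped (2 per individual)
    AF (allele frequency) = AC / AN
    """
    n_individuals = n_hom_ref + n_het + n_hom_alt
    allele_number = 2 * n_individuals
    allele_count = n_het + 2 * n_hom_alt
    allele_frequency = allele_count / allele_number if allele_number else 0.0
    return {
        "AC": allele_count,
        "AN": allele_number,
        "AF": allele_frequency,
        "n_het": n_het,
        "n_hom_alt": n_hom_alt,
    }


# Toy example: 45 reference homozygotes, 4 heterozygotes, 1 alternate homozygote.
print(allele_summary(45, 4, 1))
```

For variants on the X chromosome or in the mitochondrial genome the allele number is not simply twice the number of individuals, so this sketch would need to be adapted.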
  • 15.
    Novel DNA sequencing techniques, known as "next-generation" sequencing (NGS), provide high speed and throughput, producing a huge volume of sequence data with a wide range of applications in research and diagnostic settings. The goal of deciphering the human genome raised the throughput required of DNA sequencing by an unanticipated amount, spurring innovations such as automated capillary electrophoresis. Laboratory automation and process parallelisation led to the establishment of sequencing centres, which house hundreds of DNA sequencing instruments operated by teams of personnel. NGS can also be used for in-depth study of either genomic DNA, to detect genetic variation, or RNA, to analyse differences in gene expression. NGS has been used effectively to genetically characterise types of cancer (14) by identifying novel disease-associated translocations (15) and changes in miRNA abundance. (16) Compared with Sanger sequencing, NGS has four chief advantages: it is significantly less expensive, faster, requires less DNA, and is more accurate and consistent.
    2.2.5 The sample size
    Every Sanger sequencing read requires a substantial amount of template DNA. Because a strand terminated at each base is needed to build a sequence, multiple strands of template DNA are required for each base being sequenced (i.e., many copies for a 100 bp sequence and many hundreds of copies for a 1000 bp sequence). In NGS, a sequence can be produced from a single strand. In both types of sequencing, multiple staggered copies are captured for contig building and sequence verification.
  • 16.
    2.2.6 The speed
    NGS is faster than Sanger sequencing in two ways. First, in some types of NGS the chemical reaction and the signal detection are combined, whereas in Sanger sequencing these are two separate processes. Second, and more importantly, Sanger sequencing can only take one read (maximum about 1 kb) at a time, whereas NGS is massively parallel, permitting 300 Gb of DNA to be read in a single run on a chip.
    2.2.7 The cost
    Because NGS requires less time, manpower, and sample preparation, its costs are significantly lower. The first human genome sequence cost around £300 million. A complete human genome would still cost about £6 million if modern Sanger sequencing methods were used, even with the help of data from the known sequence. A human genome sequence on an Illumina platform presently costs just under £1,000.
    2.2.8 The accuracy
    Repeats are inherent in NGS: every read is amplified before sequencing, and because the method relies on many short overlapping reads, every portion of DNA or RNA is sequenced numerous times. Furthermore, because NGS is so much faster and less expensive, it is feasible to perform many more repeats than with Sanger sequencing. More repeats mean more coverage, which results in a more reliable and accurate sequence, even if individual NGS reads are less accurate. Sanger sequencing can produce much longer individual reads. However, because NGS is parallel, very long sequences can be assembled from many contiguous short reads.
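The link between repeated reads, coverage and accuracy can be made concrete with a small calculation. The sketch below is illustrative only: it assumes reads are spread evenly over the target, that read errors are independent, and, as a deliberately pessimistic simplification, that all erroneous reads agree on the same wrong base. The function names and the 1% per-read error rate are assumptions made for this example, not measured values.

```python
from math import comb


def mean_depth(total_bases_sequenced: int, target_size_bases: int) -> float:
    """Average number of times each target base is read."""
    return total_bases_sequenced / target_size_bases


def majority_call_error(per_read_error: float, depth: int) -> float:
    """Probability that more than half of `depth` independent reads are wrong.

    This is the upper tail of a binomial distribution and serves as a rough
    bound on the error of a simple majority-vote base call.
    """
    first_losing_count = depth // 2 + 1
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(first_losing_count, depth + 1)
    )


# 300 Gb of sequence over a genome of three billion bases.
print(mean_depth(300_000_000_000, 3_000_000_000))

# A single read with a 1% error rate versus a majority call from 30 reads.
print(majority_call_error(0.01, 1))
print(majority_call_error(0.01, 30))
```

Under these assumptions, the error of the majority call falls steeply as depth grows, which is the sense in which more coverage compensates for less accurate individual reads.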
  • 18.
    3. MATERIALS AND METHODS
    3.1 Materials
    Varminer (variant interpretation tool): MedGenome internal software that helps in interpreting variants based on the patient phenotype mentioned in the TRF.
    TRF (test request form): This form contains all the clinical information about the patient, such as age, gender, consanguinity, family history of affected siblings (if present), age of manifestation of disease, type of test to be performed, and clinical indications.
    IGV (Integrative Genomics Viewer): A powerful desktop programme for interactively visualising diverse genomic data. IGV offers real-time interaction at all scales of the genome, from the entire genome down to individual base pairs, even for very large data sets. (33) IGV is intended to be useful to a wide range of users, including bench scientists and bioinformaticians. While more experienced users can benefit from its many advanced features and options, novice users particularly value the user-friendly interface. (34)
  • 19.
    Figure 7: IGV Population databases (ExAc, 1000G, gnomAD) Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome and supports the discovery of new disease– gene relationships. (35) ExAc - For clinical and biomedical applications, the ExAC browser offers gene- and transcript- centric presentations of variation. This browser offers a variant display for the reported variant that shows population frequencies, functional annotation, and shorter read support information. It is free and open source to use this browser. (37)
  • 20.
    1000G - Through the 1000 Genomes website, users can investigate genotype calls, variant calls, supporting alignments of sequence reads from the 1000 Genomes Project, and associated variants in dbSNP.
    gnomAD - The Genome Aggregation Database (gnomAD) was created by a global alliance of researchers to compile and harmonise exome and genome sequencing data gathered from a variety of large-scale sequencing projects and to provide summary data to the scientific community. (36)
    In-silico Prediction Tools (SIFT, LRT, MutationTaster2)
    In-silico predictive software allows the effect of amino acid substitutions on the structure or function of a protein to be assessed without conducting functional studies. (38)
    SIFT - Using sequence homology and the physicochemical properties of amino acids, SIFT predicts whether an amino acid substitution will affect protein function. Both naturally occurring nonsynonymous polymorphisms and laboratory-induced missense mutations can be analysed using SIFT.
    LRT - Identifies mutations that disrupt amino acids that are highly conserved within protein-coding sequences and are therefore likely to be deleterious. (39)
  • 21.
    MutationTaster2 - Evaluates whether a DNA sequence variant is likely to have a deleterious effect. (39)
    PolyPhen2 - Predicts the impact of an amino acid substitution on the structure and function of human proteins using physical and comparative considerations. (40)
    OMIM database - A continuously updated database of human genes, genetic diseases, and traits. This database focuses specifically on the relationship between genotype and phenotype.
    UCSC genome browser - The Genome Browser (genome.ucsc.edu) of the University of California Santa Cruz (UCSC) is a well-known web-based application for rapidly displaying a requested region of a genome at any scale, together with several aligned annotation "tracks". The annotations, which were produced by the UCSC Genome Bioinformatics Group and external collaborators, display gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data.
    Ensembl - Ensembl is a system for producing and disseminating genome annotation, such as genes, variation, regulation, and comparative genomics, across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline can create a single unified resource from experimental and reference data from various providers.
  • 22.
    METHODOLOGY
    1. Sample Registration
    The methodology begins with sample collection and registration. The first step is to ascertain the patient's identity. Acceptable identifiers include the patient's name, date of birth, and hospital number, among others.
    2. Library preparation
    The library preparation process depends on the test booked by the physician (whole exome sequencing, clinical exome sequencing, or whole mitochondrial genome sequencing).
    3. Sequencing
    All NGS platforms sequence millions of small fragments of DNA in parallel. Bioinformatics analyses are then used to piece these fragments together by mapping the individual reads to the human reference genome. Each of the three billion bases of the human genome is sequenced multiple times, providing sufficient depth to deliver accurate data and insight into unexpected DNA variation. NGS can be used to sequence entire genomes or specific areas of interest, such as all 22,000 coding genes (a whole exome) or a small number of individual genes.
    4. Bioinformatics analysis
    This step is performed by clinical bioinformaticians. They check the quality of the reads, gender confirmation, CNV and SNV annotation, coverage and depth of the reads, total data generated, and on-target percentage.
  • 23.
    5. Interpretation and Reporting
    The genome analyst is responsible for checking the sample quality control (QC) metrics before proceeding with the analysis, and for diligently following the checklist and SOP for both the reporting and proofreading processes. This step uses the Varminer software, which contains comprehensive information about the variants. The specific variant is determined from parameters such as minor allele frequency (MAF), in-silico predictions, agreement between the OMIM phenotype and the patient's clinical phenotype, and the IGV view of the variant, which shows its depth and coverage.
    Figure 8: The flow chart represents the overall workflow.
    Variant interpretation and reporting can further be categorised as follows:
    1. Analysis: As mentioned earlier, before the actual analysis, each sample must pass checks such as the QC check and a review of the patient's phenotype mentioned in the TRF.
  • 24.
    Figure 9: Steps in variant interpretation and reporting
    2. Sample allotment: Following bioinformatics analysis, samples are assigned to genome analysts for analysis and reporting. After the report is generated, the samples are assigned for rechecking. Rechecked samples are sent to the clinical geneticist for review. Approved reports are assigned for proofreading. The primary genome analyst revises the proofread reports. The person in charge distributes the reports using internal software. The genome analyst must generate, proofread, and distribute approved reports within the time frame specified by the software.
    3. Report generation: The mode of inheritance is determined from the pedigree, family history, or phenotype: from the pedigree if a family history is provided (autosomal dominant, autosomal recessive, or X-linked), or from the disease-specific inheritance implied by the diagnosis; cystic fibrosis, for example, is inherited in an autosomal recessive mode.
  • 25.
    Minor allele frequency: Based on minor allele frequencies in population databases (1000 Genomes, ExAC, internal database), variants detected in clinically relevant genes are systematically prioritised to distinguish baseline polymorphisms from clinically significant variants. The threshold value is determined by the prevalence of the disease condition. For example, if a variant is more common than the disease, it is less likely to cause the disease. In contrast, a variant that is rare in the general population is more likely to be significant.
    Supporting data from in-silico predictions: To evaluate the significance of a variant, we use several predictive methods, including SIFT, PolyPhen2, MutationTaster2, and LRT. These tools evaluate the impact of single nucleotide changes on the structure and function of proteins. Three or more tools must agree for the prediction to count as supporting evidence.
    The variant should be classified as pathogenic, likely pathogenic, or variant of uncertain significance by following the guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP). Assigning pathogenicity status to patient DNA sequence variants is difficult, and the ACMG-AMP recommendations were created to promote standardisation. A high rate of discordance is nevertheless observed, with inconsistent application of the criteria being a common source. Depending on the healthcare system, variants of uncertain significance (VUS) are not usually disclosed to doctors, precluding any detailed interpretation in the context of a patient already known to have the illness. To avoid a VUS classification, numerous criteria demonstrating pathogenicity must be met, as shown in Figure 10. If the variation is enriched in patients vs.
  • 26.
    controls in case-control studies, or is detected in numerous unrelated probands, this is convincing evidence of pathogenicity (PS4). For the rarest alleles, such unrelated probands may not exist or may not be reported in the scientific literature. In these scenarios, only the moderate allele frequency criterion PM2 ("missing from controls or at extremely low frequency") is fulfilled. (41) In 2020, ClinGen proposed reducing PM2 from moderate to supporting evidence to account for genome-wide evidence of allele rarity.
    Figure 10: ACMG-AMP criteria that might be applied to a single patient suffering from a rare disorder
    The prioritised variants are examined for genotype-phenotype association to determine the relevance of each variant based on all the criteria mentioned above. Before the final report is generated, several further proofreading stages take place, which are listed in Figure 11.
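The two filters described in this section, a frequency cut-off tied to disease prevalence and agreement among at least three in-silico tools, can be sketched as follows. This is a hedged illustration and not the Varminer implementation: the `Variant` class, the function names, the variant labels, the frequencies and the prevalence value are all hypothetical, and the tool outputs are simplified to a damaging/not-damaging flag.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Variant:
    name: str                      # hypothetical label, not a real variant
    maf: float                     # minor allele frequency in a population database
    damaging_calls: Dict[str, bool] = field(default_factory=dict)  # tool -> predicted damaging?


def rarer_than_disease(variant: Variant, disease_prevalence: float) -> bool:
    """A variant more common than the disease is unlikely to cause it."""
    return variant.maf <= disease_prevalence


def has_insilico_support(variant: Variant, min_tools: int = 3) -> bool:
    """Supporting evidence requires agreement of at least `min_tools` tools."""
    return sum(variant.damaging_calls.values()) >= min_tools


def prioritise(variants: List[Variant], disease_prevalence: float) -> List[Variant]:
    """Drop variants that are too common, then list supported ones first."""
    kept = [v for v in variants if rarer_than_disease(v, disease_prevalence)]
    return sorted(kept, key=lambda v: (not has_insilico_support(v), v.maf))


candidates = [
    Variant("var_common", 0.05,
            {"SIFT": True, "PolyPhen2": True, "MutationTaster2": True, "LRT": True}),
    Variant("var_rare_supported", 0.0001,
            {"SIFT": True, "PolyPhen2": True, "MutationTaster2": True, "LRT": False}),
    Variant("var_rare_unsupported", 0.00005,
            {"SIFT": True, "PolyPhen2": False, "MutationTaster2": False, "LRT": False}),
]

for v in prioritise(candidates, disease_prevalence=0.001):
    print(v.name, has_insilico_support(v))
```

In-silico agreement is treated here only as a ranking aid; under ACMG-AMP it is one supporting line of evidence and does not by itself decide the classification.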
  • 27.
    Figure 11: Stages of report before release.
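Two of the quality metrics checked in the bioinformatics analysis step, depth of coverage and on-target percentage, can be computed from simple counts. The sketch below is a minimal illustration under stated assumptions: per-base depths and read counts are supplied as plain numbers, the function name is hypothetical, and the 20x minimum depth is an assumed threshold for the example rather than a value taken from this study.

```python
from typing import Dict, Sequence


def qc_metrics(
    per_base_depths: Sequence[int],
    reads_on_target: int,
    reads_total: int,
    min_depth: int = 20,  # assumed threshold for illustration only
) -> Dict[str, float]:
    """Summarise coverage and capture efficiency for one sample."""
    n_bases = len(per_base_depths)
    mean_depth = sum(per_base_depths) / n_bases
    # Fraction of target bases read at least `min_depth` times.
    fraction_covered = sum(d >= min_depth for d in per_base_depths) / n_bases
    # Share of sequenced reads that map to the captured target regions.
    on_target_percent = 100.0 * reads_on_target / reads_total
    return {
        "mean_depth": mean_depth,
        "fraction_at_min_depth": fraction_covered,
        "on_target_percent": on_target_percent,
    }


# Toy sample: four target bases and 100 reads, 80 of them on target.
print(qc_metrics([10, 20, 30, 40], reads_on_target=80, reads_total=100))
```

In practice these numbers would be derived from the alignment files by dedicated tools; the function only shows how the reported metrics relate to the underlying counts.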
  • 28.
    4. RESULTS
    During the dissertation, I analysed 55 samples and interpreted the variants causative of various clinical phenotypes. The samples were analysed under different NGS targeted panels, which include disease-specific panels, clinical exome sequencing, whole exome sequencing and whole mitochondrial genome sequencing. The variants were prioritised based on genotype-phenotype correlation, age of manifestation of disease, consanguinity status, and the ACMG guidelines. The prioritised variants belonged to different variant classes, i.e., missense, nonsense, frameshift, intronic, and in-frame insertions and deletions.
    4.1 Gender distribution of patients
    The samples analysed included both male and female patients: of the 55 samples analysed, 40 were male and 15 were female. Homozygous X-linked dominant variants can be considered only in male samples; in female samples they are accepted only if the variant has been previously reported, because homozygous dominant variants are not considered suitable candidates unless reported. The distribution of samples by gender is shown in Figure 12.
  • 29.
    Figure 12: Chart shows gender distribution among the samples analysed
    [Chart data, gender distribution of samples: male 73%; female 27%]
    4.2 Age distribution of the sample
    As shown in Figure 13, the largest group of samples analysed (25 of 55) belongs to the 16 years and above category, suggesting that disease most commonly manifested above 16 years of age in this cohort.
    Figure 13: The graph shows age and gender distribution of the samples analysed.
    [Chart data, age distribution in samples: below 5 years, 12; 5-10 years, 8; 11-15 years, 10; 16 and above, 25]
  • 30.
    4.3 NGS - Targeted Panel
    Various targeted panels were booked for analysis of the samples, based on the test suggested by clinicians according to the patients' phenotypes. Figure 14 shows the spectrum of panels analysed. Most samples (71%) were analysed by clinical exome sequencing (CES), which covers a smaller number of genes than whole exome sequencing (WES); a broader set of genes is covered in whole exome testing. Panel testing, on the other hand, tests only the particular genes in the specified panel and accounted for 13% of the samples.
    Figure 14: Distribution of the targeted panel analysed
    [Chart data, targeted panel: CES 39 (71%); PANEL 7 (13%); WES 4 (7%); COMBO 5 (9%)]
    Variant class
    The significant variants belonged to several variant classes, such as nonsense, missense, frameshift, splice, and copy number variations (CNVs).
  • 31.
    Of the 55 samples analysed, the variant in 31 samples belonged to the missense category, which requires evidence from the literature to be established as pathogenic or likely pathogenic; in 17 samples no variants causative of the disease phenotype were identified; 6 samples contained a frameshift variant and 1 a nonsense variant.
    Figure 15: Distribution of samples based on variant class
    [Chart data, variant class: frameshift 6; nonsense 1; missense 31; none 17]
    4.4 Disease Causing Variants
    Most of the variants identified in the samples analysed were variants of uncertain significance (VUS): there were 24 VUS, nine pathogenic (P) variants responsible for disease manifestation, and 3 likely pathogenic (LP) variants, with no significant variant matching the disease phenotype in the remaining 19 samples. The metrics below show that most null variants (frameshift, nonsense) are classified as pathogenic or likely pathogenic, whereas very few missense variants are, with only those reported in disease databases and supported by relevant functional evidence receiving a pathogenic or likely pathogenic classification.
  • 32.
    Figure 16: Significance of variants
    [Chart data, significance of the variants: pathogenic 9; likely pathogenic 3; variant of uncertain significance 24; none 19]
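The percentages quoted in this section can be rechecked from the raw counts shown in the charts. The snippet below is a small sketch that takes the counts reported for the targeted panels and for variant significance and converts them to rounded percentages of the total; the function and dictionary names are chosen only for this illustration.

```python
from typing import Dict


def percentage_shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Convert category counts into percentages of the total, rounded to integers."""
    total = sum(counts.values())
    return {category: round(100 * n / total) for category, n in counts.items()}


# Counts as reported in the charts for Figures 14 and 16.
targeted_panel = {"CES": 39, "PANEL": 7, "WES": 4, "COMBO": 5}
significance = {"P": 9, "LP": 3, "VUS": 24, "NONE": 19}

print(sum(targeted_panel.values()), percentage_shares(targeted_panel))
print(sum(significance.values()), percentage_shares(significance))
```

Both sets of counts sum to the 55 samples analysed, and the panel shares reproduce the 71% and 13% figures given for clinical exome and panel testing.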
  • 33.
    5. DISCUSSION AND CONCLUSION
    The purpose of this study was to examine the use of next-generation sequencing (NGS) in understanding the molecular basis of kidney disease in individuals. While genetic testing has been shown to aid in providing more accurate diagnoses in patients with heritable kidney disease, such as polycystic kidney disease, Alport syndrome, and Fabry disease, its application to patients with nephrotic diseases, as well as its utility in identifying unsuspected kidney disease in nephrotic patients, has been limited. (42) To examine this further, a unique gene panel of 345 kidney disease-related genes was used to perform targeted NGS in seven samples with kidney disease. According to the analysis, approximately 65% of the nephrotic disease patients included in this study had diagnostic variants that were compatible with their clinical diagnosis, while one-third of nephrotic disease patients were found to have pathogenic or likely pathogenic variants in kidney disease-related genes. PKD1, SLC4A1, NPHP1, AVPR2, NPHS2, CTNS, GRHPR and COL4A3, among the genes found to contain pathogenic mutations in patients, have previously been linked to nephrotic disease. COL4A3 encodes a significant structural component of the glomerular basement membrane and has been linked to heritable nephropathies. The rare COL4A3 variant was found in two individuals in the study. Variants in genes such as CFH, PAX2, CRB2 and various other genes included in the study were classified as VUS because no literature evidence for the variants was found. The study demonstrates the use of genetic testing in determining the underlying molecular aetiology of disease in patients with varied kinds of nephrotic syndrome. In addition to assigning genetic diagnoses that validated the clinical diagnoses of numerous nephrotic syndrome patients, the analysis was able to find rare variants in a subgroup of patients that suggested re-evaluation
  • 34.
    of their diagnosis was necessary. For these individuals, genetic analysis may have assisted in the detection of undiagnosed clinical signs and better guided their care management. The study has limitations that must be addressed. First, the study comprised only fifty-five samples. Despite the small sample size, a number of the samples were found to contain rare variants in the genes, highlighting the importance of genetic screening in identifying precise molecular diagnoses. Second, while we were able to detect both coding variants and structural rearrangements using targeted sequencing, we did not evaluate variants in non-coding regions that could affect RNA expression and/or processing (e.g., those affecting transcriptional repression, exon skipping, and intron inclusion). Because our proprietary capture technique was limited to the coding regions and exon-intron boundaries of the genes in the NGS panel, any variants outside these regions, such as those in promoter regions and other non-coding or regulatory regions, could not be identified. Third, to improve specificity, the ACMG-AMP criteria and guidelines were followed for variant interpretation; as a result, many of the variants discovered in the study were designated as VUSs and were not associated with clinical phenotypes. Because VUSs cannot be completely ruled out as benign, the sensitivity of the investigation may have been compromised. Computational prediction techniques combined with functional investigations may aid in better understanding the pathogenicity of VUSs and in prioritising them for future investigation. The objective was to use NGS to better determine the genetic basis of kidney disease. Although most of the samples in the analysis did not have pathogenic/likely pathogenic mutations in kidney disease-related genes, a significant minority possessed variants that were thought to contribute to their disorder.
The findings suggest that rare variants in genes not previously linked to polycystic kidney disease susceptibility (e.g., ciliopathy genes) may contribute to nephropathy, and that genetic screening in chronic kidney disease patients can help provide a
  • 35.
    molecular diagnosis, which could lead to improved precision diagnostics and help in the prognosis and long-term management of their kidney disease.
  • 36.
    6. REFERENCES
    1. Smith RA, Andrews KS, Brooks D, Fedewa SA, Manassaram-Baptiste D, Saslow D, Brawley OW, Wender RC. Cancer screening in the United States, 2017: a review of current American Cancer Society guidelines and current issues in cancer screening. CA: a cancer journal for clinicians. 2017 Mar;67(2):100-21.
    2. Mahdieh N, Rabbani B. An overview of mutation detection methods in genetic disorders. Iranian journal of pediatrics. 2013 Aug;23(4):375.
    3. Usdin K. The biological effects of simple tandem repeats: lessons from the repeat expansion diseases. Genome research. 2008 Jul 1;18(7):1011-9.
    4. Richette P, Bardin T, Stheneur C. L’achondroplasie: du génotype au phénotype. Revue du rhumatisme. 2008 May 1;75(5):405-11.
    5. Su N, Sun Q, Li C, Lu X, Qi H, Chen S, Yang J, Du X, Zhao L, He Q, Jin M. Gain-of-function mutation in FGFR3 in mice leads to decreased bone mass by affecting both osteoblastogenesis and osteoclastogenesis. Human molecular genetics. 2010 Apr 1;19(7):1199-210.
    6. Mahdieh N, Rabbani B. An overview of mutation detection methods in genetic disorders. Iranian journal of pediatrics. 2013 Aug;23(4):375.
    7. Van Heyningen V, Yeyati PL. Mechanisms of non-Mendelian inheritance in genetic disease. Human molecular genetics. 2004 Oct 1;13(suppl_2):R225-33.
    8. Hunt PA, Hassold TJ. Human female meiosis: what makes a good egg go bad? Trends in Genetics. 2008 Feb 1;24(2):86-93.
    9. Soutar AK, Naoumova RP. Mechanisms of disease: genetic causes of familial hypercholesterolemia. Nature clinical practice Cardiovascular medicine. 2007 Apr;4(4):214-25.
  • 37.
    10. Ku CS, Cooper DN, Polychronakos C, Naidoo N, Wu M, Soong R. Exome sequencing: dual role as a discovery and diagnostic tool. Annals of neurology. 2012 Jan;71(1):5-14.
    11. Rabbani B, Mahdieh N, Hosomichi K, Nakaoka H, Inoue I. Next-generation sequencing: impact of exome sequencing in characterizing Mendelian disorders. Journal of human genetics. 2012 Oct;57(10):621-32.
    12. Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, Senthivel V, Divakar MK, Rophina M, Jolly B, Batra A. IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes. Nucleic acids research. 2021 Jan 8;49(D1):D1225-32.
    13. Schuster SC. Next-generation sequencing transforms today's biology. Nature methods. 2008 Jan;5(1):16-8.
    14. Simpson AJ. Sequence-based advances in the definition of cancer-associated gene mutations. Current Opinion in Oncology. 2009 Jan 1;21(1):47-52.
    15. Chen W, Kalscheuer V, Tzschach A, Menzel C, Ullmann R, Schulz MH, Erdogan F, Li N, Kijas Z, Arkesteijn G, Pajares IL. Mapping translocation breakpoints by next-generation sequencing. Genome research. 2008 Jul 1;18(7):1143-9.
    16. Kuchenbauer F, Morin RD, Argiropoulos B, Petriv I, Griffith M, Heuser M, Yung E, Piper J, Delaney A, Prabhu AL, Zhao Y. In-depth characterization of the microRNA transcriptome in a leukemia progression model. Genome research. 2008 Nov 1;18(11):1787-97.
    17. Mastana SS. Unity in diversity: an overview of the genomic anthropology of India. Annals of human biology. 2014 Jul 1;41(4):287-99.
    18. Chaubey G, Metspalu M, Kivisild T, Villems R. Peopling of South Asia: investigating the caste–tribe continuum in India. Bioessays. 2007 Jan;29(1):91-100.
    19. Sivasubbu S, Scaria V. Genomics of rare genetic diseases—experiences from India. Human genomics. 2019 Dec;13(1):1-8.
  • 38.
    20. "Whole-genome sequence variation, population structure and demographic history of the Dutch population." Nature genetics 46, no. 8 (2014): 818-825.
    21. Fakhro KA, Staudt MR, Ramstetter MD, Robay A, Malek JA, Badii R, Al-Marri AA, Khalil CA, Al-Shakaki A, Chidiac O, Stadler D. The Qatar genome: a population-specific tool for precision medicine in the Middle East. Human genome variation. 2016 Jun 30;3(1):1-7.
    22. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, De Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American journal of human genetics. 2007 Sep 1;81(3):559-75.
    23. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug;536(7616):285-91.
    24. Poszewiecka B, Pienkowski VM, Nowosad K, Robin JD, Gogolewski K, Gambin A. TADeus2: a web server facilitating the clinical diagnosis by pathogenicity assessment of structural variations disarranging 3D chromatin structure. Nucleic Acids Research. 2022 May 7.
    25. Miller NA, Farrow EG, Gibson M, Willig LK, Twist G, Yoo B, Marrs T, Corder S, Krivohlavek L, Walter A, Petrikin JE. A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases. Genome medicine. 2015 Dec;7(1):1-6.
    26. Hou L, Kember RL, Roach JC, O’Connell JR, Craig DW, Bucan M, Scott WK, Pericak-Vance M, Haines JL, Crawford MH, Shuldiner AR. A population-specific reference panel empowers genetic studies of Anabaptist populations. Scientific reports. 2017 Jul 20;7(1):1-9.
  • 39.
    27. Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nature reviews genetics. 2011 May;12(5):363-76.
    28. Eichler EE. Copy number variation and human disease. Nat Educ. 2008;1(3):1.
    29. Mefford HC, Sharp AJ, Baker C, Itsara A, Jiang Z, Buysse K, Huang S, Maloney VK, Crolla JA, Baralle D, Collins A. Recurrent rearrangements of chromosome 1q21.1 and variable pediatric phenotypes. New England Journal of Medicine. 2008 Oct 16;359(16):1685-99.
    30. Hollox EJ, Huffmeier U, Zeeuwen PL, Palla R, Lascorz J, Rodijk-Olthuis D, van de Kerkhof P, Traupe H, De Jongh G, Heijer MD, Reis A. Psoriasis is associated with increased β-defensin genomic copy number. Nature genetics. 2008 Jan;40(1):23-5.
    31. Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE. Recent segmental duplications in the human genome. Science. 2002 Aug 9;297(5583):1003-7.
    32. Hollox EJ, Detering JC, Dehnugara T. An integrated approach for measuring copy number variation at the FCGR3 (CD16) locus. Human mutation. 2009 Mar;30(3):477-84.
    33. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative genomics viewer. Nature biotechnology. 2011 Jan;29(1):24-6.
    34. Milne I, Bayer M, Cardle L, Shaw P, Stephen G, Wright F, Marshall D. Tablet—next generation sequence assembly visualization. Bioinformatics. 2010 Feb 1;26(3):401-2.
    35. Chen X. Statistical Modeling of Next Generation Sequencing Data. Yale University; 2014.
    36. Souza Junior ML, de Sousa JV, Guerreiro JF. Analysis of coding variants in the human FTO gene from the gnomAD database. PloS one. 2022 Jan 6;17(1):e0248610.
    37. Pallares LF. Genetic Variation: Searching for solutions to the missing heritability problem. Elife. 2019 Dec 4;8:e53018.
  • 40.
    38. Wu J, Jiang R. Prediction of deleterious nonsynonymous single-nucleotide polymorphism for human diseases. The Scientific World Journal. 2013 Oct;2013.
    39. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nature methods. 2010 Apr;7(4):248-9.
    40. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC bioinformatics. 2009 Dec;10(1):1-9.
    41. Davieson CD, Joyce KE, Sharma L, Shovlin CL. DNA variant classification–reconsidering “allele rarity” and “phenotype” criteria in ACMG/AMP guidelines. European Journal of Medical Genetics. 2021 Oct 1;64(10):104312.
    42. Lazaro-Guevara J, Fierro-Morales J, Wright AH, Gunville R, Simeone C, Frodsham SG, Pezzolesi MH, Zaffino CA, Al-Rabadi L, Ramkumar N, Pezzolesi MG. Targeted Next-Generation Sequencing Identifies Pathogenic Variants in Diabetic Kidney Disease. American journal of nephrology. 2021;52(3):239-49.