This document outlines a new genetic diagnosis protocol using whole exome capture and massively parallel DNA sequencing. The protocol reduces the amount of DNA sequencing and its cost while achieving high read depth and better detection of heterozygous bases. It was used to diagnose a patient with congenital chloride diarrhea: the genetic finding was unexpected and was then confirmed clinically. The protocol has implications for discovering disease genes involved in Mendelian traits, by identifying de novo mutations or homozygous segments, and for complex traits, by sequencing subjects initially and then selectively sequencing the most promising genes.
24. Implications of the new Protocol
● Decrease in the amount of DNA sequencing
● Decrease in the cost of detection of exonic mutations
● High read depth, increase in detection of heterozygous bases (97.2%)
● More reads per lane expected in the future
25. Clinical Implications
● New method: unexpected genetic diagnosis (in a patient with an undiagnosed illness) +
clinical diagnosis
-Congenital Chloride Diarrhea
-Mutation in SLC26A3
-Confirmation
● Parsing a large amount of genetic data to produce clinically useful information, using
the new method plus hints from the clinical condition to arrive at a correct diagnosis.
27. Mendelian traits
❖ Dominant traits (Alleles that have been difficult to map via linkage analysis):
➢ Traits with substantial locus heterogeneity
➢ Solution: Identify a significant excess number of independent mutations in the
same gene, using mapping data that constrains the location of the disease locus.
➢ Alleles that impair reproductive fitness, such that affected subjects harbor de
novo mutations.
➢ Solution: Finding the individual de novo mutations will provide evidence of
disease
28. Mendelian traits
❖ Recessive Traits (Diseases due to consanguineous union):
➢ Disease locus is homozygous within a segment that is homozygous by descent
from a recent ancestor. Information search is therefore constrained to 10% of
the exome for first cousins.
➢ HNV detection in such segments has high efficiency even at low levels of
per-base coverage (5X).
➢ Number of protein-altering variants in these segments is very low (~29)
➢ Solution: KO in animal models (loss of function)
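The point about low coverage can be illustrated with a simple binomial model. This is only a sketch under stated assumptions: reads are error-free, a homozygous site shows the variant in every read, a heterozygous site shows it in half the reads on average, and a variant is "detected" when at least `min_reads` reads carry it. The function name and the threshold of 2 reads are hypothetical, not taken from the paper.

```python
from math import comb

def p_variant_seen(depth, min_reads, variant_fraction):
    """Probability that at least `min_reads` of `depth` reads show the variant.

    variant_fraction is the chance that a single read carries the variant:
    1.0 for a homozygous site, 0.5 for a heterozygous site (assumed model).
    """
    return sum(
        comb(depth, k) * variant_fraction**k * (1 - variant_fraction)**(depth - k)
        for k in range(min_reads, depth + 1)
    )

depth = 5       # the 5X per-base coverage mentioned above
min_reads = 2   # hypothetical detection threshold

hom = p_variant_seen(depth, min_reads, 1.0)
het = p_variant_seen(depth, min_reads, 0.5)
print(f"homozygous site detected with probability   {hom:.4f}")
print(f"heterozygous site detected with probability {het:.4f}")
```

Under this model the homozygous site is always detected at 5X, while a heterozygous site is missed in a noticeable share of cases, which is why heterozygous calls need deeper coverage.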
29. Complex Traits
❖ Non-Mendelian: derived from multiple genes and exhibiting a variety of phenotypes
➢ The sample size needed for loci identification is very large
➢ Apply the new method to all subjects or a subset, then apply selective
sequencing to the most promising genes.
❖ Missing some non-coding regions with rare variants that have large effects is
inevitable.
30. Conclusion
Whole-exome sequencing will make broad contributions to understanding the genes
and pathways that contribute to rare and common human diseases, as well as to clinical
practice.
Now it is time for the results of the experiment. I will show you a lot of tables and figures, so bear with me. I want you to actually learn something, and after the presentation I want to leave you with some applicable knowledge. We worked hard to understand what the paper is saying and to make it simple enough, so hopefully you will get something new today.
Before we get into the tables, I need to explain a couple of terms. I believe many of you know some of these, but I see a benefit in explaining them because they appear in the next table.
SNV stands for single-nucleotide variant, and it means a variation in a single nucleotide. The main difference between a SNP and an SNV is that an SNV has no frequency limitation. A SNP needs to occur in at least 1% of the population, but an SNV can be much rarer than that.
If this variation changes the resulting amino acid, we call it missense.
Alleles are different forms of the same gene. If a person carries two copies of the same allele, we call that homozygous; otherwise, heterozygous.
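If it helps to see these terms as rules, here is a minimal sketch. The 1% cutoff is the one from the definition above; the function names and the example values are made up for illustration.

```python
def is_snp(population_frequency, threshold=0.01):
    """A single-nucleotide variant counts as a SNP if at least 1% of the population has it."""
    return population_frequency >= threshold

def zygosity(allele_one, allele_two):
    """Two copies of the same allele are homozygous, two different ones heterozygous."""
    return "homozygous" if allele_one == allele_two else "heterozygous"

# Hypothetical examples
print(is_snp(0.05))        # common variant: a SNP
print(is_snp(0.0001))      # rare variant: an SNV, but not a SNP
print(zygosity("A", "A"))
print(zygosity("A", "G"))
```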
After this very short lecture, what does this table tell us? These are the whole-exome sequencing results of 6 patients. At the top are the codes for the patients; you can think of them as the scientific names for the people who took part in the experiment. The most interesting statistics belong to the first one. He has the largest number of homozygous SNVs and the smallest number of heterozygous SNVs. There is a reason for that: he is the product of a consanguineous union, which basically means his parents were relatives. His parents shared many of the same alleles, so more of his genome is homozygous, and that leads to a lot of homozygous SNVs.
This is the family tree of subject 1. As you can see, there is no affected person in the whole family before the last generation. The mother and father have a total of 18 siblings, but the disease did not appear until our subject 1. He had a sibling who died when she was 4 days old and was presumably affected. Triangles mean unknown gender and disease status, because there were 2 abortions. So what does this tell us? Because there was no affected member before subject 1, we can say that even though the parents didn't have the disease, they were both carriers. Their child inherited the affected allele from both of them. The parents were heterozygous and the child is homozygous. This is called autosomal recessive inheritance.
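The carrier reasoning can be written out as a small Punnett square. This is just an illustrative sketch: "A" stands for the normal allele and "a" for the disease allele, and the function name is made up.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(mother, father):
    """Return the probability of each child genotype, given the two parental genotypes."""
    # Each parent passes on one of their two alleles with equal chance.
    combos = ["".join(sorted(m + f)) for m, f in product(mother, father)]
    counts = Counter(combos)
    return {genotype: Fraction(n, len(combos)) for genotype, n in counts.items()}

# Both parents are unaffected carriers (heterozygous).
probs = offspring_genotypes("Aa", "Aa")
for genotype, p in sorted(probs.items()):
    print(genotype, p)
```

With two carrier parents, each child has a 1 in 4 chance of being homozygous for the disease allele ("aa"), a 1 in 2 chance of being a carrier, and a 1 in 4 chance of carrying no disease allele.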
Even though the accuracy of sequencing machines is quite high, sometimes even higher than 99%, it is not enough for a truly accurate result, because 99.5% accuracy still means thousands of errors when sequencing millions of nucleotides. That is why each nucleotide is sequenced multiple times. The number of times a nucleotide is sequenced is called coverage. This is used to reduce the number of incorrect nucleotides in whole-exome capture data and to increase the sequencing accuracy.
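To put rough numbers on this, here is a small sketch. It assumes that errors in different reads are independent and that the final base is decided by a simple majority vote; as a pessimistic simplification, it also treats all wrong reads as agreeing on the same wrong base. The function names are made up, and real variant callers are more sophisticated than this.

```python
from math import comb

def expected_errors(bases_sequenced, accuracy):
    """Expected number of wrong bases when every base is read only once."""
    return bases_sequenced * (1 - accuracy)

def consensus_error(coverage, per_read_error):
    """Chance that more than half of the reads covering one base are wrong."""
    return sum(
        comb(coverage, k) * per_read_error**k * (1 - per_read_error)**(coverage - k)
        for k in range(coverage // 2 + 1, coverage + 1)
    )

# One million bases at 99.5% accuracy: roughly five thousand errors.
print(round(expected_errors(1_000_000, 0.995)))

# The chance of a wrong consensus base falls quickly as coverage goes up.
for coverage in (1, 5, 15):
    print(coverage, consensus_error(coverage, 0.005))
```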
You can see in this table that coverage increases as the percentage decreases. If the sequencing makes more errors, they sequence more to get rid of those errors.
Do you remember this from the long video you just saw? Now we have the real gene sequence. At the top you can see the reference. The reads below it are aligned to this reference. Forward reads are shown in capital letters, reverse reads in lowercase letters. There is a homozygous missense mutation here.
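Reading one column of such a figure can be sketched in a few lines. The convention of capital letters for forward reads and lowercase for reverse reads is the one just described; the function name, the 90% and 20% cutoffs, and the example bases are all hypothetical and only meant to show the idea.

```python
from collections import Counter

def call_site(ref_base, read_bases, hom_fraction=0.9, het_fraction=0.2):
    """Call one position from the bases the aligned reads show there."""
    forward = Counter(b for b in read_bases if b.isupper())          # forward reads
    reverse = Counter(b.upper() for b in read_bases if b.islower())  # reverse reads
    total = forward + reverse
    depth = sum(total.values())
    if depth == 0:
        return "no coverage"
    non_ref = {base: n for base, n in total.items() if base != ref_base.upper()}
    if not non_ref:
        return "reference"
    alt, alt_count = max(non_ref.items(), key=lambda item: item[1])
    fraction = alt_count / depth
    if fraction >= hom_fraction:
        return f"homozygous {alt}"
    if fraction >= het_fraction:
        return f"heterozygous {ref_base.upper()}/{alt}"
    return "reference"

# Made-up columns: the reference base is G in both cases.
print(call_site("G", "AAAaaaAa"))   # every read shows A
print(call_site("G", "GAgaGaAg"))   # half the reads show A
```

When every read on both strands shows the same non-reference base, as in the first example, the site is called homozygous for the variant.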
We can see single-nucleotide polymorphisms here. The minimum is two, the maximum is six. In this particular example, chromosomes generally have two.
And this is the primer sequence for where the mutation for this disease occurred. The number of exons is also given. Exons together form the exome, which is the protein-coding part of the genome and makes up about 1% of it.