Genetic Variations Resources
Developed by: Ansuman Chattopadhyay, Ph.D., Information Specialist in Molecular Biology and Genetics, Health Sciences Library System, University of Pittsburgh [email_address]
Introduction
Scientists expect that a comparison of genomic sequences taken from two unrelated individuals will reveal that they are 99.9% identical. The 0.1% difference is due to genetic variation, mainly one form of variation called single nucleotide polymorphisms. These polymorphisms are considered one of the key factors that make each of us different. They can have a major impact on how we respond to diseases, to environmental insults such as bacteria, viruses and chemicals, and to drugs and other therapies. This makes genetic variations of great value for biomedical research and for developing pharmaceutical products and medical diagnostics. This module focuses on human genetic variation and mainly covers single nucleotide polymorphisms (SNPs).
At the end of this module, participants will be able to:
Understand the basic concepts behind different forms of genetic variation
Understand the terminology used by researchers studying genetic variation
Identify genetic variation databases and interpret database
Understand and use online resources for functional analysis of variation information
Understand the significance of the International HapMap Project
Questions: A Few Examples
Participants will be able to answer questions like:
Mutations in the BRCA1 gene have been reported to be associated with the early onset of breast cancer. Retrieve all non-synonymous, validated coding SNPs for BRCA1 from dbSNP.
What disorders are caused by mutations in the gene HFE? Do all known substitutions in this gene cause disease? How many SNPs have been located in the HFE gene?
A gene variant found primarily in African Americans slightly increases the risk of developing an irregular heartbeat, known as arrhythmia. The variant occurs in the cardiac sodium channel gene SCN5A and changes the amino acid at position 1102 from serine to tyrosine (S to Y). Can you predict the effect of this non-synonymous SNP (rs7626962)?
Human Genetic Variations
Two main types of mutation event create all forms of genetic variation:
Single-base mutations, which substitute one nucleotide for another
-- Single Nucleotide Polymorphisms (SNPs)
Insertions or deletions of one or more nucleotides
-- Tandem Repeat Polymorphisms and Insertion/Deletion (INDEL) Polymorphisms
Single Nucleotide Polymorphisms
Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C or G) in the genome sequence is altered. For example, a SNP might change the DNA sequence AAGGCTAA to ATGGCTAA. SNPs are the most common class of polymorphism.
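The minimal sketch below reports the single-base differences between two sequences, using the AAGGCTAA/ATGGCTAA example. It assumes the two sequences are already aligned and of equal length, with no insertions or deletions. Real SNP discovery also needs alignment and filtering for sequencing errors. The function name `find_snps` is hypothetical.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for every
    single-base difference between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


print(find_snps("AAGGCTAA", "ATGGCTAA"))  # [(2, 'A', 'T')]
```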
Tandem Repeat Polymorphisms
Tandem repeats, or variable number of tandem repeats (VNTRs), are a very common class of polymorphism. They consist of sequence motifs of varying length that are repeated in tandem in a variable copy number.
VNTRs are subdivided into two subgroups based on the size of the tandem repeat unit: microsatellites (short repeat units) and minisatellites (longer repeat units).
Example: Spinocerebellar ataxia type 10 (SCA10; OMIM +603516) is caused by the largest tandem repeat expansion seen in the human genome. Unaffected individuals carry 10-22 copies of the pentanucleotide repeat ATTCT in intron 9 of the SCA10 gene, whereas SCA10 patients carry 800-4,500 repeat units. This makes the disease allele up to 22.5 kb larger than the normal one.
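The repeat arithmetic in the SCA10 example can be checked with a short sketch. It counts the longest run of a motif in a sequence and converts repeat units to base pairs. The sequence used is a made-up toy, not real SCA10 sequence, and the helper names are hypothetical. A 4,500-unit ATTCT tract spans 22,500 bp, which matches the 22.5 kb figure.

```python
import re


def longest_tandem_run(seq: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `seq`."""
    motif = motif.upper()
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def repeat_tract_bp(units: int, motif_len: int = 5) -> int:
    """Length in bp of a tract of `units` copies of a `motif_len`-bp motif."""
    return units * motif_len


# Toy sequence (hypothetical, not SCA10 sequence) with 15 copies of ATTCT.
toy_allele = "GGC" + "ATTCT" * 15 + "TTA"
print(longest_tandem_run(toy_allele, "ATTCT"))      # 15
print(repeat_tract_bp(4500))                        # 22500 bp tract
print(repeat_tract_bp(4500) - repeat_tract_bp(22))  # 22390 bp more than 22 units
```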
Insertion/Deletion Polymorphisms
Insertion/deletion (INDEL) polymorphisms are quite common and widely distributed throughout the human genome. Sequence repetitiveness in the form of direct or inverted tandem repeats has been shown to predispose DNA to localized rearrangements between homologous repeats. Such rearrangements are thought to be one of the mechanisms that create INDEL polymorphisms. Example: An association between coronary heart disease and a 287 bp INDEL polymorphism located in intron 16 of the angiotensin-converting enzyme (ACE) gene has been reported (OMIM 106180). This INDEL, known as ACE I/D, is responsible for 50% of the inter-individual variability of plasma ACE concentration.
Chromosome Aberrations
Gross chromosomal aberrations, such as deletions, inversions or translocations involving large segments of DNA sequence, were thought to be quite rare. Even so, numerous clinically characterized genomic syndromes have been reported to be associated with chromosomal aberrations. Example: Velocardiofacial syndrome (VCFS) is characterized by features such as cleft palate, cardiac anomalies and learning disabilities. It is associated with a deletion on chromosome 22q11.2 (OMIM 192430).
SNPs appear at average intervals of 0.3-1 kb. Given that the entire human genome is about 3 × 10^9 bp, the total number of SNPs scales up to 5-10 million (Altshuler et al., 2000).
In silico estimates put the number of potentially polymorphic VNTRs at over 100,000 across the human genome.
Short insertions/deletions are very difficult to quantify; their number is likely to fall between those of SNPs and VNTRs.
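The back-of-the-envelope estimate above can be reproduced directly. The sketch assumes a genome size of 3 × 10^9 bp and divides it by the two ends of the 0.3-1 kb spacing range. The result, roughly 3 to 10 million, is the same order of magnitude as the 5-10 million cited. The exact lower bound depends on the spacing assumed.

```python
GENOME_BP = 3e9  # approximate human genome size (~3 x 10^9 bp)


def expected_snp_count(spacing_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Number of SNPs if one occurs every `spacing_bp` bases on average."""
    return genome_bp / spacing_bp


low = expected_snp_count(1000)  # one SNP per 1 kb
high = expected_snp_count(300)  # one SNP per 0.3 kb
print(f"{low:,.0f} to {high:,.0f} SNPs")  # 3,000,000 to 10,000,000 SNPs
```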
Polymorphisms and Disease Markers
Very few of these polymorphisms have a direct impact that produces a deleterious phenotype.
Non-disease-causing polymorphisms, when mapped to the genome, may serve as markers to identify and map other genes that do cause disease when mutated.
If these non-disease-causing variations are found to be inherited with a particular trait, but do not cause the trait, they may provide evidence of where the trait's gene is located in the genome.
Single Nucleotide Polymorphisms (SNP)
Common Terminology
Allele: An alternative form of a genetic locus; one allele for each locus is inherited from each parent.
Polymorphism: A difference in DNA sequence among individuals.
Linkage Disequilibrium (LD): If alleles at two loci tend to be inherited together more often than would be predicted from their individual frequencies, the alleles are in linkage disequilibrium.
Haplotype: The set of alleles on one particular chromosome. Each person has two haplotypes in a given region, and each haplotype is typically passed on as a complete unit.
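LD is commonly quantified with the standard measures D and r². These measures are not defined on this slide. D is the observed frequency of a two-allele haplotype minus the frequency expected if the loci were independent. The sketch below assumes two biallelic loci whose haplotype and allele frequencies are already known.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and of allele B on their own
    """
    d = p_ab - p_a * p_b  # observed minus expected under independence
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


print(linkage_disequilibrium(0.25, 0.5, 0.5))  # (0.0, 0.0): no LD
print(linkage_disequilibrium(0.5, 0.5, 0.5))   # (0.25, 1.0): complete LD
```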
Transitions and Transversions
SNPs include single base substitutions such as:
Transitions: change of one purine (A,G) for a purine,
or a pyrimidine (C,T) for a pyrimidine;
Transversions: change of a purine (A,G) for a pyrimidine
(C,T), or vice versa.
In CpG dinucleotides, the cytosine is first methylated and then deaminated, converting CpG to either CpA or TpG.
The resulting G>A and C>T transitions account for 25% of all SNPs in the human genome.
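The purine/pyrimidine rule above translates directly into a classifier. The minimal sketch below labels a single-base substitution as a transition or a transversion. The function name `substitution_type` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution ref>alt."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) = transition
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for change in ["C>T", "G>A", "A>C", "G>T"]:
    print(change, substitution_type(*change.split(">")))
```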
SNPs and Mutations
Terminology for variation at a single nucleotide position is defined by allele frequency.
A single-base change occurring in a population at a frequency of >1% is termed a single nucleotide polymorphism (SNP).
When a single-base change occurs at a frequency of <1%, it is considered to be a mutation.
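To apply the 1% threshold, the allele frequency is usually computed from diploid genotype counts. Each homozygous-minor individual carries two copies of the minor allele and each heterozygote carries one. The sketch below uses invented counts. The definitions above do not cover a frequency of exactly 1%, so grouping it with "mutation" is an assumption of this sketch.

```python
def minor_allele_frequency(hom_major: int, het: int, hom_minor: int) -> float:
    """Frequency of the allele carried by the `hom_minor` genotype class."""
    n_alleles = 2 * (hom_major + het + hom_minor)  # diploid: two alleles each
    return (het + 2 * hom_minor) / n_alleles


def classify_variant(freq: float) -> str:
    # Assumption: exactly 1% is grouped with "mutation".
    return "SNP" if freq > 0.01 else "mutation"


maf = minor_allele_frequency(90, 9, 1)    # 11 / 200
print(maf, classify_variant(maf))         # 0.055 SNP
rare = minor_allele_frequency(995, 5, 0)  # 5 / 2000
print(rare, classify_variant(rare))       # 0.0025 mutation
```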
Life Cycle of SNPs and Mutations
Classification of SNPs
SNPs may occur at any position in a gene's structure. Based on their location, they can be classified as intronic, exonic, promoter-region, and so on.
Coding SNPs can be further subdivided into two groups:
Synonymous: when single base substitutions do not cause a change
in the resultant amino acid
Non-synonymous: when single base substitutions cause a change
in the resultant amino acid.
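Whether a coding SNP is synonymous or non-synonymous can be decided by translating the codon before and after the substitution. The sketch below builds the standard genetic code table, with `*` marking stop codons, and compares the two amino acids. The helper name is hypothetical. The second example, a serine codon mutated to a tyrosine codon, shows the kind of change discussed in Question 4.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def coding_snp_effect(codon: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` within a DNA codon."""
    codon = codon.upper()
    mutant = codon[:pos] + alt.upper() + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    kind = "synonymous" if before == after else "non-synonymous"
    return f"{codon}>{mutant} ({before}>{after}): {kind}"


print(coding_snp_effect("GAA", 2, "G"))  # GAA>GAG (E>E): synonymous
print(coding_snp_effect("TCT", 1, "A"))  # TCT>TAT (S>Y): non-synonymous
```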
Coding SNPs Image from the Geospiza Green Arrow (TM) tutorial by Sandra Porter, Ph.D. on SNP or Sequencing Error
Genetic Variations Databases
Human Genome Variation Database (HGVbase)
TSC: The SNP Consortium
The Single Nucleotide Polymorphism database (dbSNP) is a public-domain archive for a broad collection of simple genetic polymorphisms.
This collection of polymorphisms includes:
Single-base nucleotide substitutions (also known as single nucleotide polymorphisms or SNPs)
Small-scale multi-base deletions or insertions (also called deletion insertion polymorphisms or DIPs)
Microsatellite repeat variations (also called short tandem repeats or STRs)
RefSNPs are similar to records in RefSeq, as they are curated by NCBI staff.
Ref SNP Clusters define a "non-redundant set of
When a SNP is first submitted by a researcher, it is given an ss number (ss#).
Non-redundant SNPs are then assigned to a new, unique RefSNP cluster.
Submitted SNPs that represent redundant data are instead deposited into the matching RefSNP cluster.
RefSNP clusters are identified by rs numbers.
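The ss-to-rs bookkeeping described above can be illustrated with a toy model. The sketch below is hypothetical and is not dbSNP's actual clustering code. For simplicity it treats two submissions as redundant when they map to the same chromosome and position, whereas dbSNP's real criteria are based on sequence and map alignment. The class name and the IDs it issues are also hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SnpArchive:
    """Toy model of ss -> rs clustering (hypothetical, not dbSNP's code)."""
    clusters: dict = field(default_factory=dict)  # (chrom, pos) -> rs ID
    members: dict = field(default_factory=dict)   # rs ID -> list of ss IDs
    next_rs: int = 1

    def submit(self, ss_id: str, chrom: str, pos: int) -> str:
        key = (chrom, pos)  # simplifying assumption: same location = redundant
        if key not in self.clusters:  # non-redundant: open a new RefSNP cluster
            self.clusters[key] = f"rs{self.next_rs}"
            self.next_rs += 1
        rs_id = self.clusters[key]    # redundant: join the existing cluster
        self.members.setdefault(rs_id, []).append(ss_id)
        return rs_id


archive = SnpArchive()
print(archive.submit("ss1", "17", 1000))  # rs1
print(archive.submit("ss2", "17", 2000))  # rs2
print(archive.submit("ss3", "17", 1000))  # rs1 (redundant with ss1)
print(archive.members)  # {'rs1': ['ss1', 'ss3'], 'rs2': ['ss2']}
```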
The dbSNP Build Cycle (image from The NCBI Handbook)
Ref SNP Graphic Summary
dbSNP Search Options
The NCBI Handbook
Entrez SNP
dbSNP is now part of the Entrez integrated information retrieval system. It may be searched using either qualifiers (aliases) or a combination of 28 different search fields. A complete list of the qualifiers and search fields can be found on the Entrez SNP site.
Entrez SNP: Limit Options
The extensive Limits screen in Entrez SNP covers a variety of features, including:
Function Class (coding nonsynonymous, intron, etc.)
Chromosome (including W and Z for non-mammals)
Observed Alleles (using IUPAC, International Union of Pure and Applied Chemistry, codes)
Map Weight (how many times the SNP maps to the genome)
"Created" and "Updated" Builds
Records with links to other NCBI data domains (OMIM, Nucleotide, Protein, Structure, PubMed)
Type of validation
Success Rate (likelihood that the SNP is real; = 1 minus false positive rate)
SNP Class and Method Class
Q1. Find SNPs for a gene
Mutations in the BRCA1 gene have been reported to be associated with the early onset of breast cancer. Retrieve all non-synonymous, validated coding reference SNPs for BRCA1 from dbSNP.
dbSNP - Search dbSNP to find SNP records for a gene
Step By Step Guide
Enter "BRCA1 [Gene Name]" in the search box
Click on "Limits"
Go to "Function class" and select "coding nonsynonymous"
Go to "Organism(s)" and select "Homo sapiens"
Go to "Validation" and select all options except "no info"
Click on "Details" and review "Query Translation"
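The search built through the web form in these steps can also be written as a single Entrez query string. The sketch below assembles one. The `[Gene Name]` field comes from step 1. The `[Organism]` and `[Function Class]` field tags are assumptions, and they should be checked against the "Query Translation" box from the last step. The validation limit is left out because its field tag is not given here. The helper `build_entrez_query` is hypothetical.

```python
def build_entrez_query(terms: list[tuple[str, str]]) -> str:
    """Join (value, field) pairs into one Entrez query, combined with AND.
    Multi-word values are quoted so Entrez treats them as a phrase."""
    parts = [f'"{value}"[{field}]' if " " in value else f"{value}[{field}]"
             for value, field in terms]
    return " AND ".join(parts)


query = build_entrez_query([
    ("BRCA1", "Gene Name"),
    ("Homo sapiens", "Organism"),                # assumed field tag
    ("coding nonsynonymous", "Function Class"),  # assumed field tag
])
print(query)
# BRCA1[Gene Name] AND "Homo sapiens"[Organism] AND "coding nonsynonymous"[Function Class]
```

With Biopython, a string like this could be passed to `Entrez.esearch(db="snp", term=query)`. That call needs network access and an `Entrez.email` value set beforehand.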
GeneView for All SNPs: display of all known RefSNPs overlaid on the gene structure
Genome-Oriented SNP Visualization RefSNP Summary Info
Map Viewer Icon for SNPs (1)
Map Viewer Icon for SNPs (2)
Map Viewer Icon for SNPs (3)
Map Viewer Icon for SNPs (4)
NCBI and dbSNP
Question 2: Genome-oriented SNP visualization
Mutations in the dopamine receptor D5 (DRD5) gene have been observed in patients with various neurological disorders. Search dbSNP and find how many RefSNP records have been reported for DRD5. Show all RefSNPs in the context of a chromosome.
Entrez Gene - Search the Entrez Gene database to find gene-centered information, and use the links to access dbSNP and Map Viewer
Entrez SNP - Find SNP information for a gene
Map Viewer - Display all SNPs in the context of a chromosome
Step By Step Guide
1. Entrez Gene
Enter "DRD5" in the search box
Click on "Limits"
Select "Gene Name" from the "To limit your search to a specific field" drop-down list
Go to "Limit by Taxonomy" and select "Homo sapiens"
Click on "DRD5" in the Entrez Gene search results to view the gene record
Click on "Links" and select "SNP" to retrieve all SNP records from dbSNP
Click on "Links" and select "GeneView in dbSNP" to find
location of SNPs on the gene
Click on "Links" and select "Map Viewer" to display all
SNPs in Map Viewer
Step By Step Guide
2. Map Viewer
Click on "Map and Options" (appears at left side bar)
A new pop-up window will appear. Select "Variation", available under "Sequence Maps" in the "Available Maps" section, and click on the "ADD" button to include it in the "Maps Displayed (left to right)" box
Select "Variation" in "Maps Displayed (left to right)" box and
Click on "Make Master/Move to Bottom" button
Click on "Apply" button
Map Viewer Display Option
SNP: Genome View
SNP: Chromosome Report http://www.ncbi.nlm.nih.gov/SNP/maplists/maplist-newmap.html
NCBI Map Viewer offers integration of variation data from
several data sources. The list includes
OMIM (Morbid/Disease)
Question 3: How to create a genetic variation map
Generate an integrated variation map with reference SNPs, Mitelman breakpoints and OMIM diseases for chromosome 17, region 7,773,000-7,792,000 bp. What gene(s) have you found in this region?
Step By Step Guide
Click on "Homo sapiens (Human)", which appears under the "mammals" node in the tree diagram
Click on chromosome 17
Specify the region by entering "7773000" and "7792000" respectively in the "Region Shown" boxes that appear in the left side bar
Click on the "Go" button
Click on "Map and Options" (appears at left side bar)
A new pop-up window will appear. Select "Variation", "Mitelman Breakpoints" and "OMIM/Morbid Diseases" from the "Available Maps" section
Click on "ADD" button to include the selected map options into
"Maps Displayed (left to right)" box.
Select "Variation" in "Maps Displayed (left to right)" box
and Click on "Make Master/Move to Bottom" button
Click on "Apply" button
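Finding which genes fall in the chosen region is an interval-overlap test. A feature overlaps the region if it starts before the region ends and ends after the region starts. The sketch below applies that test to a made-up annotation track. The gene names and coordinates are hypothetical placeholders, not Map Viewer data.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlapping(features: list[Feature], start: int, end: int) -> list[str]:
    """Names of features sharing at least one base with [start, end]."""
    return [f.name for f in features if f.start <= end and f.end >= start]


# Hypothetical annotation track; real coordinates come from Map Viewer.
track = [
    Feature("GENE_A", 7_700_000, 7_760_000),
    Feature("GENE_B", 7_770_000, 7_795_000),
    Feature("GENE_C", 7_790_000, 7_800_000),
]
print(overlapping(track, 7_773_000, 7_792_000))  # ['GENE_B', 'GENE_C']
```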
Integrated Variations Map
Functional Analysis of Polymorphisms
SNPs and The Structure of a Gene
A Decision Tree for SNP Analysis
Exonic Splicing Enhancer/Silencer
Question 4: Functional analysis of a SNP
A gene variant found primarily in African Americans slightly increases the risk of developing an irregular heartbeat, known as arrhythmia. The variant occurs in the cardiac sodium channel gene SCN5A and changes the amino acid at position 1102 from serine to tyrosine (S to Y). Can you predict the effect of this non-synonymous SNP (rs7626962)?
Flow Chart
1. Entrez SNP - Search Entrez SNP by RefSNP ID to find SNP information
2. Entrez Protein - Find protein information, including the amino acid sequence and the presence of functional domains
3. NCBI Amino Acid Explorer - Compare amino acids in terms of physico-chemical properties
4. NCBI Mutation Analyzer - Predict the effect of the amino acid change on the protein structure
5. TMHMM Server v. 2.0 - Predict the presence of transmembrane helices in a protein sequence
6. Russell et al., Amino Acid Properties Table - Predict the effect of the amino acid change on the protein structure
Step By Step Guide
1. Entrez SNP
Enter "rs7626962" in the search box and click on "GO"
Click on "GeneView" and note the amino acid change at position 1102
Click on "NP_000326", which appears in the Protein column in GeneView, to view the protein information for SCN5A in Entrez Protein
2. Entrez Protein
Select "FASTA" from "Display" drop-down menu and click on
"Display" to get amino acid sequence for SCN5A
Click on "Domains" to see the presence of conserved
domains in the protein sequence
Select the domain that covers amino acid position 1102 and click on the domain to view the sequence alignment. Check whether "Ser" at position 1102 is conserved among the family members
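The Entrez Protein steps come down to two small operations. The first reads the residue at a 1-based position of the FASTA sequence. The second measures how conserved one column of the domain alignment is. The sketch below uses a hypothetical five-residue toy protein and toy alignment, not SCN5A data. An alignment column index differs from a sequence position whenever gaps occur before it.

```python
def residue_at(sequence: str, position: int) -> str:
    """Residue at a 1-based position of an ungapped protein sequence."""
    return sequence[position - 1]


def column_conservation(aligned: list[str], column: int) -> float:
    """Fraction of aligned sequences sharing the most common residue at a
    0-based alignment column (a gap '-' is counted like any residue)."""
    residues = [seq[column] for seq in aligned]
    most_common = max(set(residues), key=residues.count)
    return residues.count(most_common) / len(residues)


toy_protein = "MKSLY"  # hypothetical
toy_alignment = ["MKSLY", "MRSLF", "MKSIY", "MKTLY"]  # hypothetical family
print(residue_at(toy_protein, 3))                # S
print(column_conservation(toy_alignment, 2))     # 0.75 (S in 3 of 4)
```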
3. NCBI Amino Acid Explorer
Go to the "Compare" option that appears in the left side bar, select "S-Ser" to "Y-Tyr" and click "Compare"
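A property comparison like the one the Amino Acid Explorer shows can be sketched offline. The values below are standard textbook figures: the molecular weight of the free amino acid in g/mol and the Kyte-Doolittle hydropathy. They are not taken from the NCBI tool, which may report other properties or scales.

```python
# Textbook values (assumed here, not from the NCBI tool):
# free amino acid molecular weight (g/mol) and Kyte-Doolittle hydropathy.
PROPERTIES = {
    "S": {"name": "Ser", "mw": 105.09, "hydropathy": -0.8, "aromatic": False, "hydroxyl": True},
    "Y": {"name": "Tyr", "mw": 181.19, "hydropathy": -1.3, "aromatic": True, "hydroxyl": True},
}


def compare(a: str, b: str) -> dict:
    """Summarize how the properties change when residue a is replaced by b."""
    pa, pb = PROPERTIES[a], PROPERTIES[b]
    return {
        "mw_change": round(pb["mw"] - pa["mw"], 2),
        "hydropathy_change": round(pb["hydropathy"] - pa["hydropathy"], 2),
        "gains_aromatic_ring": pb["aromatic"] and not pa["aromatic"],
        "keeps_hydroxyl": pa["hydroxyl"] and pb["hydroxyl"],
    }


print(compare("S", "Y"))
# {'mw_change': 76.1, 'hydropathy_change': -0.5, 'gains_aromatic_ring': True, 'keeps_hydroxyl': True}
```

Ser to Tyr keeps the side-chain hydroxyl but adds a bulky aromatic ring. Size is therefore the main change to consider at this position.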
4. NCBI Mutation Analyzer
Select "ser" to "tyr" and click on the "Mutate" button
In the "results of mutating serine to tyrosine" page note
the color which indicates the amino acid substitution score
based on BLOSUM62 matrix
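The color in the Mutation Analyzer reflects a BLOSUM62 score. The sketch below looks up that score from a small excerpt of the matrix, which contains only the entries used here. BLOSUM62 scores are log-odds values, so a negative score such as Ser/Tyr (-2) means the substitution is seen less often than expected by chance in conserved protein blocks. For the full matrix, Biopython provides `Bio.Align.substitution_matrices.load("BLOSUM62")`.

```python
# Small excerpt of BLOSUM62 (the matrix is symmetric).
BLOSUM62_EXCERPT = {
    ("S", "S"): 4, ("Y", "Y"): 7, ("S", "Y"): -2,
    ("S", "T"): 1, ("Y", "F"): 3,
}


def blosum62(a: str, b: str) -> int:
    """Score for substituting a with b; raises KeyError outside the excerpt."""
    a, b = a.upper(), b.upper()
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]


print(blosum62("S", "Y"))  # -2: a disfavored substitution
print(blosum62("F", "Y"))  # 3: a conservative aromatic swap
```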
5. TMHMM Server v. 2.0
Copy the FASTA-formatted sequence for SCN5A from the Entrez Protein step and paste it into the sequence submission box
Select the output format "Extensive, with graphics" and submit the sequence
Find the topology (transmembrane helix/inside/outside) around position 1102
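Reading the topology at one residue from the prediction can be automated. This sketch assumes that each predicted region appears on its own whitespace-separated line with five fields: sequence ID, program version, region label (inside, outside or TMhelix), start and end. It also assumes comment lines begin with "#". That layout is an assumption and should be checked against real server output. The sample text is a hypothetical toy, not an SCN5A result.

```python
from typing import Optional


def topology_at(tmhmm_output: str, position: int) -> Optional[str]:
    """Region label covering a 1-based residue position, or None if no
    region line (in the assumed 5-field layout) covers it."""
    for line in tmhmm_output.splitlines():
        fields = line.split()
        if line.startswith("#") or len(fields) != 5:
            continue  # skip comments and lines not in the assumed layout
        _, _, region, start, end = fields
        if int(start) <= position <= int(end):
            return region
    return None


# Hypothetical toy prediction in the assumed layout.
sample = """# toy Length: 120
toy\tTMHMM2.0\tinside\t1\t40
toy\tTMHMM2.0\tTMhelix\t41\t63
toy\tTMHMM2.0\toutside\t64\t120"""
print(topology_at(sample, 50))  # TMhelix
```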
6. Russell et al., Amino Acid Properties Table
Click on "S" present in the "Overview of Amino Acid
Check the substitution score for Ser to Tyr
SNP Gene View for SCN5A
Amino Acid Comparison Text View NCBI Amino Acid Explorer
Amino Acid Comparison Graphic View NCBI Amino Acid Explorer
Pharmacogenomics Pharmacogenomics is a science that examines the inherited variations in genes that dictate drug response and explores the ways these variations can be used to predict whether a patient will have a good response to a drug, a bad response to a drug, or no response at all. SOURCE: NCBI A Science Primer
Pharmacogenomics: Example
PharmGKB The Pharmacogenetics and Pharmacogenomics Knowledge Base: URL: http://www.pharmgkb.org/index.jsp
PharmGKB: Search Example
Search PharmGKB for Albuterol:
Whole-genome genotyping of 10 million SNPs
Researchers are trying to downsize the problem of genome-wide genotyping by studying haplotypes.
A haplotype is a contiguous, linear set of SNP alleles along a genome that is inherited as a block.
Genetic Terminology
Allele: An alternative form of a genetic locus; one allele for each locus is inherited from each parent.
Genotype: Each person has two copies of every chromosome except the sex chromosomes. The set of alleles that a person has is called a genotype. The term genotype can refer to the SNP alleles that a person has at a particular SNP, or at many SNPs across the genome.
Genotyping: A method that determines what genotype a person has is called genotyping.
Sets of nearby SNPs on the same chromosome are inherited in blocks, called haplotype blocks, which are 12 kb or more in length.
Between 65% and 85% of the human genome is organized
in haplotype blocks.
Each block comes in three or four common versions that capture
the majority of genetic diversity throughout the entire human
Blocks may contain a large number of SNPs, but a few SNPs
are enough to uniquely identify the haplotypes in a block.
The specific SNPs that identify the haplotypes are called tag SNPs .
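The idea that a few SNPs are enough to identify every haplotype in a block can be shown with a brute-force search. The sketch finds the smallest set of SNP columns whose alleles still tell all haplotypes apart. It is a minimal illustration on four hypothetical haplotypes, not the method used by the HapMap Project. The search is exponential in block size, so it only suits small blocks.

```python
from itertools import combinations


def find_tag_snps(haplotypes: list[str]) -> tuple[int, ...]:
    """Smallest set of SNP columns (0-based) whose alleles distinguish every
    distinct haplotype in the block. Brute force: small blocks only."""
    n_snps = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_snps + 1):
        for cols in combinations(range(n_snps), size):
            patterns = {tuple(h[c] for c in cols) for h in haplotypes}
            if len(patterns) == n_distinct:
                return cols
    return tuple(range(n_snps))


# Four hypothetical haplotypes over four SNPs.
block = ["ACGT", "ACTT", "GTGA", "GTTA"]
print(find_tag_snps(block))  # (0, 2): SNPs 0 and 2 tag all four haplotypes
```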
International HapMap Project
The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation. The HapMap will be a tool that allows researchers to find genes and genetic variations that affect health and disease. The HapMap Home Page URL: http://www.hapmap.org/index.html.en
HapMap Project: Population and Sample
The DNA samples for the HapMap will come from a total of 270 people:
Yoruba people in Ibadan, Nigeria (30 trios, each consisting of both parents and an adult child)
Japanese in Tokyo (45 unrelated individuals)
Han Chinese in Beijing (45 unrelated individuals)
Centre d'Etude du Polymorphisme Humain (CEPH) (30 trios; residents with ancestry from Northern and Western Europe)
SOURCE: International HapMap Project
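The total sample size follows from the panel list, since each trio contributes three people. The sketch below repeats that arithmetic using the counts listed above.

```python
# (panel, number of trios, number of unrelated individuals), as listed above
panels = [
    ("Yoruba, Ibadan", 30, 0),
    ("Japanese, Tokyo", 0, 45),
    ("Han Chinese, Beijing", 0, 45),
    ("CEPH (Northern/Western European ancestry)", 30, 0),
]

total = sum(3 * trios + unrelated for _, trios, unrelated in panels)
print(total)  # 270
```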
HapMap Project: Scientific Strategy
To develop the HapMap, the samples will be genotyped for at least 1 million SNPs across the human genome. When the Project started, 2.8 million SNPs were in the public database dbSNP. However, many chromosome regions had too few SNPs, and many SNPs were too rare to be useful, so millions of additional SNPs were needed to develop the HapMap. The Project discovered another 2.8 million SNPs by September 2003, and SNP discovery continues.
Participating Centers: Canada, China, Japan, the United Kingdom and the United States
The Project initially will produce a map of 600,000 SNPs evenly spaced across the genome, a density of one SNP every 5,000 bases.
SOURCE: International HapMap Project http://www.hapmap.org/abouthapmap.htm
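The stated density follows from dividing the genome size by the number of SNPs on the map. The sketch assumes a 3 × 10^9 bp genome. It also shows the average spacing for the 1-million-SNP genotyping target.

```python
GENOME_BP = 3_000_000_000  # approximate human genome size (assumed)


def mean_spacing(n_snps: int, genome_bp: int = GENOME_BP) -> float:
    """Average distance in bp between evenly spaced SNPs."""
    return genome_bp / n_snps


print(mean_spacing(600_000))    # 5000.0 -> one SNP every 5 kb
print(mean_spacing(1_000_000))  # 3000.0 -> one SNP every 3 kb
```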
Question 5: Identify disease-causing mutations
What disorders are caused by mutations in the gene HFE? Do all known substitutions in this gene cause disease? How many SNPs have been located in the HFE gene?
Question 6: Analysis of exonic SNPs
Germline mutations in the BRCA1 gene predispose carriers to breast and ovarian cancer. A single point mutation, a G to T substitution in exon 18 at nucleotide 5199 (codon 1694), has been observed in a group of breast and ovarian cancer patients. This mutation changes a glutamic acid codon to a stop codon (Glu1694Ter). Further study revealed something unexpected. Instead of expressing a transcript whose exon 18 contains the stop codon, the mutant allele produces only mRNA in which the entire exon 18 has been skipped. Explain the cause of this exon skipping phenomenon.
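The first part of this question, the conversion of Glu to a stop codon, can be checked codon by codon. The text does not state which codon position carries the substitution. The sketch shows that a G>T change at the first position turns either glutamic acid codon (GAA, GAG) into a stop codon, while a G>T change at the third position of GAG gives GAT (Asp) instead. The helper name is hypothetical. The exon skipping part of the question is not modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def substitute(codon: str, pos: int, alt: str) -> str:
    """Replace the base at 0-based position `pos` of a codon."""
    return codon[:pos] + alt + codon[pos + 1:]


for glu_codon in ("GAA", "GAG"):  # the two codons for glutamic acid
    mutant = substitute(glu_codon, 0, "T")  # G>T at the first codon position
    print(glu_codon, "->", mutant, "stop" if mutant in STOP_CODONS else "sense")
# GAA -> TAA stop
# GAG -> TAG stop
```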
Question 7: Find SNPs in a given base-pair range on an assembled genome
Have any SNPs been discovered on mouse chromosome 5 between chromosome positions 38,000,000 and 39,000,000? Which of these SNPs have observed alleles of A/G? Can these SNPs be viewed on a map?
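Once SNP records have been retrieved, this question comes down to a range filter plus an allele filter. The sketch below applies both to a few hypothetical records; the IDs and positions are placeholders, not real dbSNP entries. Alleles are compared as sets, so "G/A" matches "A/G".

```python
from typing import NamedTuple, Optional


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int
    alleles: str  # observed alleles, e.g. "A/G"


def select(snps: list[Snp], chrom: str, start: int, end: int,
           alleles: Optional[str] = None) -> list[str]:
    """IDs of SNPs on `chrom` within [start, end], optionally restricted to
    an observed-allele set (order-insensitive)."""
    wanted = set(alleles.split("/")) if alleles else None
    return [s.snp_id for s in snps
            if s.chrom == chrom and start <= s.pos <= end
            and (wanted is None or set(s.alleles.split("/")) == wanted)]


records = [  # hypothetical records
    Snp("example1", "5", 38_100_000, "A/G"),
    Snp("example2", "5", 38_500_000, "C/T"),
    Snp("example3", "5", 39_500_000, "A/G"),
    Snp("example4", "4", 38_200_000, "G/A"),
]
print(select(records, "5", 38_000_000, 39_000_000))         # ['example1', 'example2']
print(select(records, "5", 38_000_000, 39_000_000, "A/G"))  # ['example1']
```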