testing123
  • Hemoglobin is a tetramer of 2 alpha and 2 beta subunits. Structure on left from: Daniel J. Harrington, Kazuhiko Adachi, William E. Royer, Jr., "The High Resolution Crystal Structure of Deoxyhemoglobin S", The Journal of Molecular Biology 272(3):398-407, September 1997. http://web.wi.mit.edu/proteins/pub/BOA-2000/left.htm
  • Start off this slide by defining SNPs. 529 SNPs/Mb in exons; 921 SNPs/Mb in introns. Nucleotide diversity.
  • 40% of proteins belong to a family; 70% have at least one other match.
  • (information content without correction) “In general,” “there are some exceptions”
  • Pseudocounts are based on prior knowledge of the most common amino acid distributions observed in a database of many protein alignments. Probabilities are calculated for every amino acid at every position. Position vs. aa allowed: 5 Y; all 20.
  • Ideal case: a variety of amino acids have had the time to evolve at positions not important for function. Low-confidence case: many highly identical sequences, e.g. viral proteins, Ig's (can be fixed by going to a smaller database).
  • 1764 substitutions that affect function; 2240 substitutions that give no phenotype. Mention that intermediate is grouped with null. 15% better total prediction accuracy; 10% increase in experimental prediction accuracy.
  • White: positions that tolerate >= 6 substitutions in the assay. Red: positions with high false positive error.
  • Which genes are in the Whitehead SNPs? Candidate genes for coronary artery disease, type II diabetes, schizophrenia.
  • Substitutions were first identified in patients and then deposited into dbSNP. Thus it makes sense that the substitutions should be predicted as damaging.
  • When the purine repressor, a LacI paralogue, was used for prediction on LacI, Variagenics correctly predicted as damaging only 19% of the substitutions that have an effect.
  • There are two genetic approaches that make use of the variation around genes to find disease loci. Haplotypes may be stronger predictors of phenotype (mirvana, chakravarti). A haplotype is a set of alleles grouped together, i.e. a group of SNPs that are linked together. tagSNPs are the most informative. Neil Risch: reduced positive with direct approach.
  • Is the direct approach possible? Hoogendoorn, Bastiaan used reporter gene assays in cell lines. Abstract: "We have used denaturing high performance liquid chromatography to screen the first 500 bp of the 5' flanking region of 170 opportunistically selected genes identified from the Eukaryotic Promoter Database (EPD) for common polymorphisms. Using a screening set of 16 chromosomes, single-nucleotide polymorphisms were found in approximately 35% of genes. It was attempted to clone each of these promoters into a T-vector constructed from the reporter gene vector pGL3. The relative ability of each promoter haplotype to promote transcription of the luciferase gene was tested in each of three human cell lines (HEK293, JEG and TE671) using a co-transfected SEAP-CMV plasmid as a control. The findings suggest that around a third of promoter variants may alter gene expression to a functionally relevant extent."
  • The causal variant may not have been identified. 80% of common SNPs identified in Europeans, 50% in Africans (Nickerson). Rare variants. Some genes have no coverage: there may be no nsSNP, or it has not yet been identified and deposited in dbSNP.
  • dbSNP 120: 3.5 million double-hit SNPs and SNPs with known frequencies

Presentation Transcript

  • Sifting the human genome for functional polymorphisms Pauline C. Ng, PhD
  • From genotype to phenotype
    • humans are ~99.9% identical to each other
    • genetic variation causes different phenotypes
  • Variation around genes is most likely to contribute to phenotype
    • Gene regions: upstream, 5'UTR, coding, 3'UTR
    • Coding: nonsynonymous SNPs, variation that causes an amino acid substitution
    • Change in protein function?
  • Amino acid substitutions can cause disease
    • gene lesions responsible for disease: aa substitutions ~50% (Human Mutation 15:45-51)
    • Example: hemoglobin E6V, sickle-cell anemia
    • 1 SNP / 1000 bp
    • Protein:
      • 1:1 synonymous:nonsynonymous (1:2 expected)
      • 1:1 conservative:nonconservative (1:2 expected)
    • Nat. Genetics 22:231-238, 22:239-247; Science 293:489-93
    • nsSNPs in humans are selected against, so some of the observed nsSNPs may be involved in disease
  • Predicting the effect of an amino acid substitution
    • Applications
      • Nonsynonymous SNPs
      • Large-scale random mutagenesis projects
    • Cheap and quick for suggesting experiments
  • Computational Tools for Predicting AA Substitution Effects
    • 1) SIFT (Sorts Intolerant From Tolerant)
      • uses sequence
      • Genome Research 11:863-874, 12:436-446
    • 2) EMBL
      • uses structure + sequence + annotation
      • Human Mol. Gen. 10:591-597
    • 3) Variagenics
      • uses structure + sequence
      • J. Mol. Biol. 307:683-706
  • Sequence conservation correlated with intolerance to substitutions
    • Conservation $= \log_2 20 + \sum_{aa} f_{aa} \log_2 f_{aa}$, where $f_{aa}$ is the frequency of amino acid $aa$ at the position
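The conservation measure on this slide can be sketched for a single alignment column as follows. This is a minimal illustration, assuming raw observed frequencies with no pseudocount correction (the speaker notes describe this as information content without correction); the function name `conservation` and the example columns are hypothetical.

```python
import math
from collections import Counter

def conservation(column):
    """Conservation in bits for one alignment column (a string of residues).

    Assumes the column is non-empty and contains only amino acid letters.
    """
    counts = Counter(column)
    total = len(column)
    # log2(20) is the maximum entropy over 20 amino acids; the sum below is
    # the negative entropy of the observed frequencies at this position.
    return math.log2(20) + sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )

fully_conserved = conservation("LLLLLLLL")  # a single residue at the position
mixed = conservation("LLLLIIVV")            # three residues at the position
all_twenty = conservation("ACDEFGHIKLMNPQRSTVWY")  # every residue once
```

A fully conserved position scores log2(20), about 4.32 bits, and a position where all 20 amino acids are equally frequent scores 0.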
  • SIFT
    • Query protein
    • Choosing sequences: a) database search, b) choose closely related sequences
    • Obtain alignment with related proteins
    • For each position, calculate scaled probabilities for each amino acid substitution
    • < cutoff: affects function; > cutoff: tolerated
  • SIFT: Choosing sequences [Figure; axis: # of sequences, 1 to 14]
  • SIFT: Calculating probabilities
    • $p_x / p_{max} < 0.05 \Rightarrow$ x affects function
    • [Figure: alignment with per-position numeric annotations]
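The cutoff rule on this slide, p_x / p_max < 0.05, can be sketched as below. This is not SIFT's actual probability model: as a simplifying assumption, raw residue counts in one alignment column stand in for the probabilities (the notes say SIFT uses pseudocounts derived from a database of alignments). The names `scaled_probabilities` and `predict`, and the example column, are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CUTOFF = 0.05  # threshold from the slide: p_x / p_max < 0.05

def scaled_probabilities(column):
    """Scale each amino acid's count by the count of the most common one.

    Assumption: raw counts replace SIFT's pseudocount-corrected
    probabilities. The ratio p_x / p_max is the same for counts and for
    frequencies. The column must contain at least one amino acid letter.
    """
    counts = Counter(column)
    raw = {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}
    p_max = max(raw.values())
    return {aa: value / p_max for aa, value in raw.items()}

def predict(column, substitution):
    """Apply the slide's cutoff to one substituted amino acid."""
    scaled = scaled_probabilities(column)
    return "affects function" if scaled[substitution] < CUTOFF else "tolerated"

# Hypothetical column: 30 sequences with Y, 2 with F.
column = "Y" * 30 + "F" * 2
prediction_f = predict(column, "F")  # 2/30 is above the cutoff
prediction_w = predict(column, "W")  # never observed, so 0 is below the cutoff
```

With raw counts, any amino acid never seen in the column is predicted to affect function, which is one reason a real implementation would use pseudocounts.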
  • SIFT output
    Substitution  Probability  Prediction       Confidence
    M24S          0.04         Affect Function  Low
    S82T          0.36         Tolerated        High
    V247A         0.03         Affect Function  High !!!
  • Confidence is determined by the diversity of sequences in the alignment
    • Ideal case: diverse set of orthologous proteins
    • Low confidence examples: many highly identical sequences; few sequences available
  • Case Study: LacI
    • [Figure: lac operon regulation by LacI; labels: normal state, lactose present, repressed, expressed]
    • 4000 single amino acid substitutions assayed:
      • throughout entire protein
      • both neutral and affected phenotypes
    • TIBS 22:334-339
  • Prediction on LacI substitutions
    • Substitutions that affect protein function: 63% predicted to affect function, 37% predicted to be tolerated (false -)
    • Substitutions that give no phenotype: 28% predicted to affect function (false +), 72% predicted to be tolerated
    • Total prediction accuracy 68% (2726/4004)
    • Pr(observe affected phenotype | predicted to be damaging) 63%
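The LacI figures can be cross-checked with a little arithmetic. The sketch below assumes the class sizes given in the speaker notes (1764 substitutions that affect function, 2240 that give no phenotype) and the rounded rates from the slide (63% and 72%); the variable names are illustrative only.

```python
affect_total = 1764   # substitutions that affect function (speaker notes)
neutral_total = 2240  # substitutions that give no phenotype (speaker notes)
sensitivity = 0.63    # affecting substitutions predicted to affect function
specificity = 0.72    # neutral substitutions predicted to be tolerated

# Expected counts implied by the rounded percentages.
true_pos = sensitivity * affect_total
true_neg = specificity * neutral_total
false_pos = neutral_total - true_neg

# Total prediction accuracy over all 4004 substitutions.
accuracy = (true_pos + true_neg) / (affect_total + neutral_total)

# Pr(observe affected phenotype | predicted to be damaging).
precision = true_pos / (true_pos + false_pos)
```

Because the inputs are rounded percentages, the results only approximate the slide: accuracy comes out near 68%, matching 2726/4004, and the conditional probability lands within about a point of the 63% quoted.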
  • False negative error: positions not conserved among paralogues; dimer & sugar interface not conserved
  • False positive error in LacI: surface with unknown function?
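  • SIFTing human variant databases
    • Substitutions involved in disease: 7397 subst., 606 proteins from SWISS-PROT
      • Predicted on 76% proteins, 71% subst.
      • 69% predicted to affect function, 31% predicted to be tolerated
    • Putative polymorphisms: 5780 nsSNPs, 3005 proteins from dbSNP
      • Predicted on 60% prot., 53% subst.
      • 25% predicted to affect function, 75% predicted to be tolerated
    • nsSNPs in normal individuals: 185 nsSNPs, 69 proteins from Whitehead Institute
      • Predicted on 77% prot., 62% subst.
      • 19% predicted to affect function, 81% predicted to be tolerated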
    • On functionally neutral substitutions, expected false positive error ~20%
    • dbSNP (putative polymorphisms): 25%; Whitehead Institute (nsSNPs in normal individuals): 19%
    • suggests that most nsSNPs are functionally neutral
    • What accounts for the 5% difference?
  • Account for 5% difference in dbSNP
    • 16 genes with a high fraction of dbSNP variants predicted to affect function
      • 1) Substitutions found in patients
      • 2) Substitutions mapped to nonfunctional genes/regions
      • 3) Substitutions detected in error
    • Supports SIFT as a prediction tool
  • Mutations in MSHR increase skin cancer
    • Mutations associated with cutaneous malignant melanoma (CMM) (ref. 1)
    • Mutations not associated with CMM (refs. 1-3)
    • [Two tables with columns Substitution and Prediction (Tolerated / Affect function); substitutions listed: L60V, R151C, D294H, R160W and L60V, R163Q, D84E]
    • References: 1 Am. J. Hum. Genet. 66:176-186; 2 J. Invest. Dermatol. 116:224-229; 3 J. Invest. Dermatol. 112:512-513
  • Mutations in PPARα, a candidate gene for diabetes
    • Mutations in diabetics (ref. 1): R127Q, R409T, D304N
    • Mutations in nondiabetics (refs. 1-5): V227A, A268V, L162V ***
    • *** In diabetics and controls, but increases cholesterol levels in diabetics and perhaps nondiabetics (refs. 2-4)
    • SIFT will detect what has been selected against in evolution; an inappropriate assay may fail to detect it
    • [Table columns: Substitution, Prediction (Tolerated / Affect function)]
    • References: 1 Am. J. Hum. Genet. 63:abs997; 2 Diabetologia 43:673-680; 3 Diabetes Metab. 26:393-401; 4 J. Lipid Res. 41:945-952; 5 J. Hum. Genet. 46:285-288
  • Mutations in MTHFR
    • Mutations with diminished enzyme activity (refs. 1-5): A222V, E429A
    • Unknown effect: R68Q
    • Common
    • Under balancing selection
    • Increases neural tube defects
    • Reduces risk for some types of leukemia
    • Found by contig comparison
    • [Table columns: Substitution, Prediction (Tolerated / Affect function)]
    • References: 1 Nat. Genet. 10:111-113; 2 PNAS 96:12810-12815; 3 PNAS 98:4004-4009; 4 Cancer Res. 57:1098-1102; 5 Mol. Genet. Metab. 64:169-172
  • dbSNP variants from patients
    • Can distinguish patients from controls
    • Individuals with disease: 18/22 predicted to be damaging
    • Control individuals: 9/10 predicted to be functionally neutral
    • SIFT detects what’s selected against in evolution & is independent of assay
    • Example: PPARα
    • Detect substitutions that are deleterious in the context of the protein, not the organism
      • Can detect nsSNPs with minor effects on phenotype
    • genes increase risk of skin cancer, diabetes, cholesterol levels
    • The protein need not be essential because SIFT predicts on the substitution.
      • Can detect nsSNPs under balancing selection
    • Example : MTHFR
  • 16 genes with a high fraction of dbSNP variants predicted to affect function
    • 1) Substitutions found in patients: changes found in patients; confirms SIFT prediction and its sensitivity
    • 2) Substitutions mapped to nonfunctional genes/regions: unlikely to affect human health
    • 3) Substitutions detected in error: irrelevant to human health
  • Comparison of Prediction Tools
    • [Figure: bar charts comparing SIFT, EMBL, and Variagenics on substitutions that affect function (disease subst., LacI), substitutions that do not affect function (LacI), and polymorphisms (SNP databases, normal individuals)]
    • SIFT values: disease subst. 69% predicted to affect function; LacI 63% of affecting and 28% of non-affecting substitutions predicted to affect function; SNP databases 25%; normal individuals 19%
    • SIFT has similar prediction accuracy to tools that use structure
    • http://blocks.fhcrc.org/sift/SIFT.html
    • SIFT, a prediction tool for the effect of substitutions
    • prediction is based only on sequence
    • Detect damaging nsSNPs on a large scale
  • Association studies for finding disease loci
    • Direct approach
      • SNPs likely to affect gene function
      • Association leads directly to candidate gene
      • Fewer SNPs to genotype
    • Indirect approach using haplotypes
      • tagSNPs to identify common haplotypes in a region
      • Relies on LD with causal variant
      • Genotype 200K-1 million SNPs
    AATACGAT AATACGAT AATACGAT GATACAAC GATACAAC GATACAAC
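The two haplotypes shown on this slide illustrate why the indirect approach needs fewer genotyped SNPs: sites that always vary together carry the same information. The sketch below is a deliberately simplified illustration that groups SNPs in perfect LD; it is not a real tagSNP selection method, and the function names are hypothetical.

```python
# Haplotypes as shown on the slide: three copies of each.
haplotypes = ["AATACGAT"] * 3 + ["GATACAAC"] * 3

def variant_positions(haps):
    """0-based positions where the haplotypes do not all agree."""
    return [i for i, column in enumerate(zip(*haps)) if len(set(column)) > 1]

def pick_tag_snps(haps):
    """Keep one SNP per distinct pattern of alleles across chromosomes.

    Assumes biallelic SNPs. Two SNPs that split the chromosomes the same
    way are in perfect LD, so only the first one is kept as a tag.
    """
    tags, seen_patterns = [], set()
    for pos in variant_positions(haps):
        alleles = [h[pos] for h in haps]
        # Encode which chromosomes share the first chromosome's allele,
        # so the pattern does not depend on the nucleotide letters.
        pattern = tuple(a == alleles[0] for a in alleles)
        if pattern not in seen_patterns:
            seen_patterns.add(pattern)
            tags.append(pos)
    return tags

snps = variant_positions(haplotypes)
tags = pick_tag_snps(haplotypes)
```

Here the haplotypes differ at three sites, but all three split the chromosomes identically, so genotyping a single tag SNP is enough to tell the two haplotypes apart.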
  • Feasibility of direct approach
    • Have we identified all the causative variants?
    • common variant, common disease hypothesis
        • 80% of common SNPs in Europeans in dbSNP, 50% of common SNPs in Africans. Nat Genet. 33:518-21
    • What types of variants are involved in disease?
      • nsSNPs & splicing variants account for a large proportion of Mendelian disease
      • regulatory variation has a role in disease
        • ~50% genes show allele-specific expression Science 297 :1143; Hum. Genet. 113: 149–153
        • ~1/3 of promoter variants may alter gene expression Hum. Mol. Genet. 12: 2249–2254
  • SNPs in and near genes: possible effect?
    • 1. nsSNPs: protein function
    • 2. SNPs near intron/exon boundary: splicing
    • 3. UTRs and promoter region: regulation
    • 4. synonymous SNPs: in LD with causative variant?
  • Double-hit and known-frequency SNPs in genes
    • covers 20,024 genes
  • Non-genic regions could potentially harbor disease variants
    • ~70% of bases in conserved sequences are noncoding (Genome Res. 13:2507-18)
    • regulatory elements
    • noncoding RNAs
    • unknown genes
    • 41,193 SNPs in noncoding conserved regions
    • >= 80% identity with mouse
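The ">= 80% identity with mouse" criterion can be sketched as a percent-identity check on a pairwise alignment. The slides do not say how identity was computed, so the gap handling here (gapped columns are skipped) is an assumption, and the function name and example sequences are hypothetical.

```python
IDENTITY_THRESHOLD = 80.0  # percent, from the slide

def percent_identity(seq_a, seq_b):
    """Percent of ungapped aligned columns where the two sequences match.

    Assumption: both inputs are rows of the same alignment, with '-' for
    gaps; columns with a gap in either row are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)

# Hypothetical aligned human and mouse fragments (one mismatch in ten).
human = "ACGTACGTAC"
mouse = "ACGTACGAAC"
identity = percent_identity(human, mouse)
is_conserved = identity >= IDENTITY_THRESHOLD
```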
  • Adding SNPs in conserved regions improves SNP density
  • Focusing on variation in functional regions
    • Large # of SNPs makes direct approach possible
      • If causative variant is not in set, may be in LD with another SNP in the functional region
    • Concentrating on functional regions allows interesting experiments
      • genotyping
      • DNA copy number
      • allele-expression differences
    • Complementary to the indirect approach using haplotypes
  • Acknowledgments
    • FHCRC
    • Steve Henikoff
    • Jorja Henikoff
    • Henikoff Lab