Snp


  1. SNP Analysis. Vipin Kumar. Csci 8980 - DMBIO. October 13, 2008.
  2. Messages: Single-SNP methods do not capture multi-locus interactions. Multi-SNP methods can, but they cannot handle high dimensionality. Our work on myeloma data.
  3. Single Nucleotide Polymorphism: A SNP is a single-nucleotide variation that occurs at an appreciable frequency (1% to 5%). There are 12 million SNPs on the human genome, spread uniformly across the genome. SNPs contribute to 90% of the genetic variation in the human genome. Example (the variation is at the eleventh base): AGCGTGCATCAGTC (individual 1), AGCGTGCATCTGTC (individual 2), AGCGTGCATCAGTC (individual 3), AGCGTGCATCAGTC (individual 4).
  4. Linkage Disequilibrium (LD): The probability that no recombination occurs between two alleles. Regions that are close tend to stay together during recombination. Regions that are far apart have low LD; regions that are close together have high LD. Figure: Crossover between chromosomes.
  5. Example LD plot for a sample set of SNPs.
  6. Single SNP approaches: Each SNP is tested for its association with the phenotype. The most prevalent tests for SNP association are the chi-squared test, Fisher's exact test, the Cochran-Armitage test, etc. These tests give the probability of observing the association by chance. Figure labels: Subjects, SNPs, Phenotype.
  7. Chi-square test
     Observed matrix:
                     MM      Mm      mm    Row sum
       Affected       8      27      65      100
       Unaffected    70      20      10      100
       Column sum    78      47      75      200
     Expected matrix:
                     MM      Mm      mm    Row sum
       Affected      39    23.5    37.5      100
       Unaffected    39    23.5    37.5      100
       Column sum    78      47      75      200
  8. Chi-square test: Expected value $E = \frac{\text{column sum} \times \text{row sum}}{\text{total samples}}$. Test statistic $\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over all cells of the table. Degrees of freedom $= (m - 1) \times (n - 1)$ for an $m \times n$ table. Using $\chi^2$ and the degrees of freedom, look up the probability $p$ of finding the observed matrix by chance. $-\log(p)$ is often used for convenience. Each SNP whose $-\log(p)$ exceeds the significance threshold is considered associated with the phenotype.
  9. GWAS of Coronary Artery Disease (Samani et al. 2007): 1926 cases (subjects with artery disease before 66 years of age), 2938 controls, 377,857 SNPs. Found strong association with SNPs on chromosome 9.
  10. GWAS for lung cancer (Amos et al.): 1,154 ever-smoking lung cancer cases, 1,137 ever-smoking controls, 317,498 SNPs.
  11. Multi SNP approaches: Multiple SNPs are tested together for association with the phenotype. Most suitable for complex diseases. Combinatorial methods used include Multifactor Dimensionality Reduction, the Combinatorial Partitioning Method, etc. Figure labels: Subjects, SNPs, Phenotype.
  12. Multifactor Dimensionality Reduction
  13. Application: MDR reveals higher-order interaction in sporadic breast cancer (Ritchie et al., Am. J. Hum. Genet. 2001). 200 women with sporadic breast cancer; age-matched controls (patients with other illnesses); 9 SNPs were considered. Found a 4-locus genotype with the highest cross-validation consistency.
  14. Combinatorial Partitioning Method (for QTL)
  15. SNP Data Set for finding Associations: Each pixel is either MM (green), Mm (red) or mm (blue).
  16. Statistical Significance
  17. Classification Based Approaches. Figure: cases and controls are divided into a train set and a test set; a classifier is trained on the train set, and the resulting model is tested on the test set to give an accuracy.
  18. Using Location Information. Categories: Non-synonymous, Introns, Synonymous, Admixture, UTR, Other. Accuracy: 66.43, 58.74, 51.74, 72.72, 71.33, 54.54, 69.9. Nonsyn + Promolign (Syn + Introns): 75.75%.
  19. Statistical Significance
  20. Messages: Single-SNP methods do not capture multi-locus interactions. Multi-SNP methods can, but they cannot handle high dimensionality. Our work on myeloma data.
  21. Questions?
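
Worked example for the chi-square test on slides 7 and 8. This is a minimal sketch, not code from the slides: the function names `chi_square` and `p_value_two_dof` are made up for illustration, the logarithm is assumed to be base 10 (the slides only say -log(p)), and the closed-form p-value used here holds only for 2 degrees of freedom, which is the case for a 2 x 3 genotype table. The observed counts are the ones given on slide 7.

```python
import math

# Observed genotype counts from slide 7.
# Rows: affected, unaffected. Columns: MM, Mm, mm.
observed = [
    [8, 27, 65],
    [70, 20, 10],
]


def chi_square(table):
    """Return (statistic, degrees of freedom, expected matrix) for a contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    # E = column sum * row sum / total samples, computed for every cell.
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    # chi^2 = sum over all cells of (O - E)^2 / E.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


def p_value_two_dof(stat):
    """Upper-tail chi-square probability. Valid ONLY for 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


stat, dof, expected = chi_square(observed)
assert dof == 2, "the closed-form p-value below assumes 2 degrees of freedom"
p = p_value_two_dof(stat)
neg_log10_p = -math.log10(p)  # base 10 is an assumption, see the note above

print("expected matrix:", expected)
print("chi-square statistic:", round(stat, 3), "with", dof, "degrees of freedom")
print("-log10(p):", round(neg_log10_p, 2))
```

For tables with other dimensions, the p-value would come from a chi-square distribution table or a statistics library rather than from the 2-degree-of-freedom shortcut.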
