Perspectives of identifying Korean genetic variations

Single Nucleotide Polymorphism (SNP) is the most frequent type of genetic variation in the human genome. SNPs are among the best-characterized genetic markers and are useful both for research on human disease genomics and for studying human population stratification. More recently, a type of structural variation in the genome, Copy Number Variation (CNV), has received attention in the hope that it can provide genetic information that SNPs cannot.

To gain insight into Korean-specific genetic markers, we analyzed 54,794 SNPs from 159 individuals in 10 regions of Korea (CheonAn, NaJu, GimJe, UlSan, Jeju, YeonCheon, JeCheon, GoRyeong, GyeongJu, PyeongChang), together with data from 1,629 individuals in the Pan-Asian (70 populations) data set. In addition, we analyzed a considerable number of CNVs typed in 16 pairs of twins in Korea.

In our study, we identified several informative SNP markers that distinguish Koreans from other ethnic groups. In addition, the distribution of identity by descent (IBD) distance within a large Korean family provided a way to examine the relationships between individuals. Another interesting finding is the difference in CNV patterns between identical twins. A further potential benefit of this study is the use of genotype data to predict individual phenotypes (such as pigmentation, eye color, hair color, height, and blood type), in the hope of reconstructing the appearance of, or even identifying, individuals (such as criminal suspects) from genotype data in the near future.

In this presentation, I will discuss results from this study, which may represent the most comprehensive characterization of the Korean genome to date.

  1. Perspectives of identifying Korean genetic variations. Chang Bum Hong, Center for Genome Science, KNIH, KCDC, Dec 1, 2009
  2. Contents: • Population Structure Based on SNP Genotypes • SNP Based Kinship • Identify Monozygotic Twins using CNV • SNP Based Physical Traits
  3. Population Structure Based on SNP Genotypes
  4. Objectives • Basic study of Korean population stratification • Evidence of gene flow between Korea and neighboring countries • Informative markers of East Asian ancestry
  5. Individuals differ at about 0.1% of their genome; the population carries more than 10M SNPs. Example quantitative traits: body mass index, waist-hip ratio, height, blood pressure, pulse rate, bone density
  6. Confounding in genetic studies
  7. East Asia: public genotype data
       Dataset           SNPs        Individuals   Populations
       PASNP (a)         54,794      1,928         75
       HGDP              2,834~      1,056         52
       HapMap            1,481,135   1,397         11
       SGVP (b)          268,667     292           3
       Korean            58,625      159           10
       China (Yanbian)   58,625      16            1
       Japan (Kobe)      58,625      5             1
       Korea-Japan       58,625      6             1
       Vietnam           58,625      16            1
       Korean-Vietnam    58,625      8             1
       Cambodia          58,625      16            1
       Mongol            58,625      16            1
       a. Pan-Asian SNP Consortium (http://www4a.biotec.or.th/PASNP)
       b. Singapore Genome Variation Project (http://www.nus-cme.org.sg/SGVP)
  8. HGDP (Human Genome Diversity Project)
  9. PASNP (Pan-Asian SNP Consortium)
  10. Korean data: 10 regions (YeonCheon, PyeongChang, JeCheon, CheonAn, GyeongJu, GoRyeong, UlSan, GimJe, NaJu, Jeju), about 16 individuals per region; participants averaged over 70 years of age and were long-term residents of their region; genotyped with the Affymetrix 50K Xba array (58,960 SNPs). Other samples: China (Yanbian), Japan (Kobe), Korea-Japan, Vietnam, Korean-Vietnam, Cambodia, Mongol
  11. Quality control (flowchart)
       Start: 58,960 SNPs, n = 242 (Korean n = 159)
       Step 1: autosomal SNPs only: 54,794 SNPs; then remove
         individuals with a high missing genotype rate (> 3%, mind 0.03),
         SNPs with a high missing genotype rate (> 4%, geno 0.04),
         SNPs with low MAF (< 0.01, maf 0.01),
         SNPs failing the Hardy-Weinberg test (p < 1×10^-6, hwe 0.000001);
         result: 46,559 SNPs, n = 230 (Korean n = 153)
       Step 2: join HapMap CHB: 26,189 SNPs, n = 367 (230 + 137); join HapMap JPT: 25,796 SNPs, n = 480 (367 + 113)
  12. Missing genotype individuals (before QC, 58,960 SNPs): GimJe, GoRyeong, GyeongJu; panels: All Asian, Korean
  13. Quality control (All Asian)
       Population        SNPs     Individuals   After QC
       Korean            46,559   159           153
       China (Yanbian)   46,559   16            16
       Japan (Kobe)      46,559   5             2
       Korea-Japan       46,559   6             4
       Vietnam           46,559   16            16
       Korean-Vietnam    46,559   8             8
       Cambodia          46,559   16            16
       Mongol            46,559   16            15
       Total                      242           230
  14. Relatedness between the 153 Korean individuals (10 regions: YeonCheon, PyeongChang, JeCheon, CheonAn, GyeongJu, UlSan, GimJe, GoRyeong, NaJu, Jeju): PCA using 46,559 autosomal SNP markers (n = 153, Korean)
  15. LD-based SNP pruning: generate a subset of SNPs in approximate linkage equilibrium. First step: calculate LD (r²) within a sliding window of 50 SNPs and select representative SNPs with low LD (r² ≤ 0.2). Second step: shift the window by 5 SNPs and repeat.
  16. PCA using pruned SNPs: PCA using 46,559 SNP markers (n = 153) vs. PCA using 23,290 pruned SNP markers (n = 153)
  17. Fst of population. Fst (fixation index): a measure of genetic differentiation (in allele frequency) among subpopulations (Tishkoff SA and Kidd KK (2004). Nature Genetics Supplement 36:S21-S27). 0 ≤ Fst ≤ 0.05: negligible; Fst ≥ 0.25: great genetic differentiation; Fst = 1: complete isolation
  18. Pairwise Fst values for Korean population groups (0 ≤ Fst ≤ 0.05: negligible; Fst ≥ 0.25: great genetic differentiation; Fst = 1: complete isolation)
  19. Differences between Korea (9 regions) and Jeju: SNPs showing significant differences in genotype frequencies between Korea and Jeju. SNPs with P < 10^-3 are listed. a. P values for the Cochran-Armitage trend test of genotype frequencies. b. The KARE are indicated.
  20. Substructure of East Asian descent: PCA using 46,559 SNP markers (n = 230); populations: YanBian, Mongol, Korea, Kobe, Vietnam, Korea-Vietnam, Korea-Japan, Cambodia
  21. International HapMap: HapMap 3, Release 3
       POP   Num_samples   Num_SNPs_QC   Num_SNPs_QC_poly
       ASW   87            1623986       1543115
       CEU   165           1623122       1397814
       CHB   137           1626122       1341772
       CHD   109           1620198       1311767
       GIH   101           1630857       1408904
       JPT   113           1634041       1294406
       LWK   110           1625159       1526783
       MEX   86            1604948       1453054
       MKK   184           1611733       1532002
       TSI   102           1632607       1419970
       YRI   203           1625669       1493761
  22. Substructure with HapMap: PCA using 25,796 SNPs (n = 480) and PCA using 8,347 pruned SNPs (n = 480); populations: YanBian, Vietnam, Mongol, Korea-Vietnam, Jeju, Korea-Japan, Kobe, Cambodia, JPT-HapMap, CHB-HapMap
  23. PCA analysis of East Asian descent, with an illustration of the geographic locations corresponding to each ethnic group: Mongol, Yanbian, Kobe, Jeju, JPT-HapMap, CHB-HapMap, Vietnam, Cambodia, Korea-Vietnam, Korea-Japan
  24. Relationship between eigenvector values and latitude: linear fit y = 36.65 + 166.33x (R² = 0.8621)
  25. EAS-AIMs (East Asian Ancestry Informative Markers): calculate In (informativeness for assignment) using infocalc. 1) All populations (KOR, CHB, JPT, MON, CAM): top 300 SNPs; 2) Korean and Japanese: top 900 SNPs; 3) Korean and Chinese: top 900 SNPs; 4) Korean and Vietnamese: top 900 SNPs. Result: 3,000 East Asian ancestry informative markers; best performance with 1,500 SNPs, evaluated using PCA
  26. 3,000 East Asian AIMs: list of East Asian ancestry informative markers (a). a. All Asian (Korea, China, Vietnam, Cambodia, Mongol). In, informativeness for assignment; Ia, informativeness for ancestry coefficients; ORCA, optimal rate of correct assignment
  27. AIM sets for determining East Asian ancestry: PCA using 1,500 AIMs vs. PCA using 1,500 random SNPs
  28. KDGV (Korean Database of Genomic Variants): http://ksnp.cdc.go.kr
  29. Wiki-based SNP annotation: A, Human Genome Diversity Project; B, SNP information with allele frequencies
  30. SNP Based Kinship
  31. Identity-by-state (IBS) sharing: exclude individuals from pairs of samples identified as cryptic first-degree relatives (parent-offspring, twins, or siblings concordant for phenotype), or from more distant relationships if clusters were linked by a first-degree relative (Science, 2007). Example for a pair from the same population:
       Individual 1   A/C  G/T  A/G  A/A  G/G
       Individual 2   C/C  T/T  A/G  C/C  G/G
       IBS            1    1    2    0    2
  32. Identical twins or redundant samples, and cryptic first-degree relatives: autosomal 60,959 SNPs (n = 608, unrelated individuals + 5 families)
  33. IBS values in a large Korean family: uncle-nephew, grandparent-grandchild, siblings
  34. Identify Monozygotic Twins using CNV
  35. Twins CNV (copy number variation): 24 families (24 monozygotic twins and their parents or siblings); Agilent Human CNV Microarray 244K ×2 array; legend: twin, parent, gain, loss. Region: chr1
  36. Region: chr2
  37. Region: chrX
  38. SNP Based Physical Traits
  39. SNPedia & Promethease: SNPedia (http://www.snpedia.com), April 16, 2009, in Seoul
  40. Promethease Report
  41. Pictures of Lilly: 23andMe Contest
  42. Thank you
  43. Questions? Hong ChangBum, Center for Genome Science, NIH, KCDC, http://cgs.cdc.go.kr, http://ksnp.cdc.go.kr
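The filter names on slide 11 (mind, geno, maf, hwe) match PLINK's --mind, --geno, --maf and --hwe options. A minimal plain-Python sketch of the same four filters follows, assuming genotypes coded as 0/1/2 copies of the alternate allele with None for a missing call. It uses a 1-df chi-square approximation for Hardy-Weinberg equilibrium rather than an exact test, and the function names are hypothetical; it is an illustration, not the pipeline used in the study.

```python
import math
from typing import List, Optional, Tuple

# Rows = individuals, columns = SNPs; values are 0, 1 or 2 alternate alleles, None = missing.
GenotypeMatrix = List[List[Optional[int]]]


def hwe_chi2_p(c0: int, c1: int, c2: int) -> float:
    """Hardy-Weinberg p-value from a 1-df chi-square test (approximation, not exact)."""
    n = c0 + c1 + c2
    if n == 0:
        return 1.0
    p = (c1 + 2 * c2) / (2 * n)  # alternate-allele frequency
    expected = (n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((c0, c1, c2), expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def quality_control(
    geno: GenotypeMatrix,
    mind: float = 0.03,             # max missing rate per individual
    max_missing_snp: float = 0.04,  # max missing rate per SNP
    maf_min: float = 0.01,          # minimum minor allele frequency
    hwe_min_p: float = 1e-6,        # drop SNPs with HWE p below this
) -> Tuple[List[int], List[int]]:
    """Return (kept individual indices, kept SNP indices)."""
    n_snp = len(geno[0])
    keep_ind = [i for i, row in enumerate(geno)
                if sum(g is None for g in row) / n_snp <= mind]
    keep_snp = []
    for j in range(n_snp):
        calls = [geno[i][j] for i in keep_ind]
        called = [g for g in calls if g is not None]
        if not called or 1 - len(called) / len(calls) > max_missing_snp:
            continue
        alt = sum(called) / (2 * len(called))
        if min(alt, 1 - alt) < maf_min:
            continue
        if hwe_chi2_p(called.count(0), called.count(1), called.count(2)) < hwe_min_p:
            continue
        keep_snp.append(j)
    return keep_ind, keep_snp
```

Individuals are filtered first here and SNP statistics are then computed on the remaining individuals; the slide does not state the order the study used.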
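Slide 15 describes sliding-window pruning: compute pairwise r² within a 50-SNP window, keep SNPs with r² ≤ 0.2, then move the window by 5 SNPs. The same (50, 5, 0.2) triple is what one would pass to PLINK's --indep-pairwise option, although the slide does not name the tool. The sketch below is a simplified greedy version, assuming complete 0/1/2 dosage vectors listed in chromosomal order; when a pair exceeds the threshold it drops the later SNP, a hypothetical tie-break rule that differs from PLINK's.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype-dosage vectors (no missing values)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP is treated as uncorrelated
    return sxy * sxy / (sxx * syy)


def ld_prune(snps: List[Sequence[float]], window: int = 50, step: int = 5,
             r2_max: float = 0.2) -> List[int]:
    """Return indices of SNPs kept after greedy sliding-window LD pruning."""
    n = len(snps)
    removed = [False] * n
    start = 0
    while start < n:
        end = min(start + window, n)
        for i in range(start, end):
            if removed[i]:
                continue
            for j in range(i + 1, end):
                # Drop the later SNP of any pair in strong LD (simplified rule).
                if not removed[j] and r_squared(snps[i], snps[j]) > r2_max:
                    removed[j] = True
        if end == n:
            break
        start += step
    return [i for i in range(n) if not removed[i]]
```

Because pairs are only compared inside a window, two correlated SNPs that never share a window both survive; this is why the window size matters.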
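Slides 14, 16, 20 and 22 show PCA of the genotype matrix. The slides do not say which software or normalization was used, so the sketch below assumes one common choice: center each SNP at 2p, scale by sqrt(2p(1-p)), form the individual-by-individual relationship matrix, and extract the leading eigenvector by power iteration. Function names are hypothetical; a real analysis would use a linear-algebra library and compute several components.

```python
import math
import random
from typing import List


def standardize(geno: List[List[int]]) -> List[List[float]]:
    """Center each SNP by 2p and scale by sqrt(2p(1-p)); drop monomorphic SNPs."""
    n_ind = len(geno)
    columns = []
    for j in range(len(geno[0])):
        col = [row[j] for row in geno]
        p = sum(col) / (2 * n_ind)
        sd = math.sqrt(2 * p * (1 - p))
        if sd > 0:
            columns.append([(g - 2 * p) / sd for g in col])
    return [[c[i] for c in columns] for i in range(n_ind)]


def top_pc(geno: List[List[int]], iters: int = 200, seed: int = 0) -> List[float]:
    """First principal component (one score per individual) via power iteration."""
    m = standardize(geno)
    n, k = len(m), len(m[0])
    # Individual-by-individual genetic relationship matrix, M M^T / k.
    grm = [[sum(a * b for a, b in zip(m[i], m[j])) / k for j in range(n)]
           for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        w = [sum(grm[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v  # unit-length eigenvector; its sign is arbitrary
```

For two groups with opposite allele frequencies, the first component gives one sign to each group, which is the kind of separation the PCA plots display.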
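Slides 17 and 18 define Fst and quote interpretation bins, but do not say which estimator was used. The sketch below uses a simple heterozygosity-based (Nei-style) multi-locus estimate, Fst = Σ(H_T - H_S) / Σ H_T, for two equally weighted populations with no sample-size correction; estimators that correct for sample size, such as Weir and Cockerham's, can give noticeably different values for groups of about 16 people. Function names are hypothetical.

```python
from typing import Sequence


def fst_nei(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> float:
    """Multi-locus Fst = sum(H_T - H_S) / sum(H_T) for two equally weighted populations.

    freqs_a[i] and freqs_b[i] are frequencies of the same allele at SNP i.
    """
    num = den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_t = 2 * p_bar * (1 - p_bar)                      # pooled expected heterozygosity
        h_s = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2  # mean within-population heterozygosity
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


def interpret_fst(fst: float) -> str:
    """Bins quoted on slides 17 and 18; values in between are not classified there."""
    if fst >= 1.0:
        return "complete isolation"
    if fst >= 0.25:
        return "great genetic differentiation"
    if fst <= 0.05:
        return "negligible"
    return "intermediate (not classified on the slide)"
```

Summing numerators and denominators across SNPs before dividing (a ratio of averages) is less sensitive to nearly monomorphic SNPs than averaging per-SNP ratios.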
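Slide 19 lists SNPs whose genotype frequencies differ between Korea (9 regions) and Jeju at P < 10^-3 under the Cochran-Armitage trend test. A plain-Python sketch of the two-sided test with additive weights (0, 1, 2) follows, using the statistic T = Σ t_i (N_1i R_2 - N_2i R_1), its variance under the null hypothesis, and a normal approximation for the p value. The function name and argument order are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple


def cochran_armitage(group_a: Sequence[int], group_b: Sequence[int],
                     weights: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """Two-sided Cochran-Armitage trend test on genotype counts (AA, Aa, aa).

    group_a / group_b are genotype counts in the two groups; returns (z, p).
    """
    r1, r2 = sum(group_a), sum(group_b)
    n = r1 + r2
    col = [a + b for a, b in zip(group_a, group_b)]  # genotype column totals
    t = sum(w * (a * r2 - b * r1) for w, a, b in zip(weights, group_a, group_b))
    var = sum(w * w * c * (n - c) for w, c in zip(weights, col))
    var -= 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                   for i, j in combinations(range(len(col)), 2))
    var *= r1 * r2 / n
    if var <= 0:
        return 0.0, 1.0  # no genotype variation: the test is undefined
    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))
```

Applied SNP by SNP, the resulting p values can be filtered at 10^-3 as on the slide.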
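Slide 24 reports a linear fit y = 36.65 + 166.33x with R² = 0.8621. Assuming, as the slide title suggests, that y is latitude and x is an eigenvector value, the sketch below shows ordinary least squares with R², plus a hypothetical helper that plugs a value into the reported coefficients. The underlying data points are not recoverable from the slide, so no attempt is made to reproduce the reported fit.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = a + b*x; returns (a, b, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


def latitude_from_pc(x: float, a: float = 36.65, b: float = 166.33) -> float:
    """Predicted latitude for an eigenvector value, using the coefficients on slide 24."""
    return a + b * x
```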
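Slide 25 ranks SNPs by In, the informativeness for assignment defined by Rosenberg and colleagues and computed by their infocalc program. For alleles j with frequency p_ij in population i (K populations) and mean frequency p̄_j, In = Σ_j [-p̄_j ln p̄_j + (1/K) Σ_i p_ij ln p_ij], with a maximum of ln K. The sketch below implements that formula for a biallelic SNP and picks the top-N SNPs; it is not infocalc itself, and the names and SNP ids are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _xlogx(p: float) -> float:
    return p * math.log(p) if p > 0 else 0.0  # convention: 0 * ln 0 = 0


def informativeness_in(freqs: Sequence[float]) -> float:
    """I_n for one biallelic SNP; freqs[i] is one allele's frequency in population i."""
    k = len(freqs)
    total = 0.0
    for allele in (list(freqs), [1 - p for p in freqs]):
        p_bar = sum(allele) / k
        total += -_xlogx(p_bar) + sum(_xlogx(p) for p in allele) / k
    return total


def top_aims(snp_freqs: Dict[str, Sequence[float]], n: int) -> List[str]:
    """Return the n SNP ids with the largest I_n."""
    return sorted(snp_freqs, key=lambda s: informativeness_in(snp_freqs[s]),
                  reverse=True)[:n]


# Hypothetical allele frequencies in two populations.
freqs = {"snp_a": [0.5, 0.5], "snp_b": [0.9, 0.1], "snp_c": [1.0, 0.0]}
print(top_aims(freqs, 2))  # ['snp_c', 'snp_b']
```

Running the same ranking on different population subsets (all five populations, or Korean against one neighbor) yields the separate top-300 and top-900 lists described on the slide.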
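Slide 31 scores each SNP by the number of alleles two individuals share identical by state. The sketch below reproduces the slide's worked example, matching each allele of the second genotype at most once (so A/A vs A/C scores 1), and averages the counts into a proportion of shared alleles, (IBS2 + 0.5 × IBS1) / N. Genotypes are assumed to be strings like "A/C"; the function names are hypothetical.

```python
from typing import Sequence


def ibs(g1: str, g2: str) -> int:
    """Alleles shared identical by state (0, 1 or 2) for genotypes written like "A/C"."""
    remaining = g2.split("/")
    shared = 0
    for allele in g1.split("/"):
        if allele in remaining:
            remaining.remove(allele)  # each allele of g2 can be matched only once
            shared += 1
    return shared


def ibs_similarity(ind1: Sequence[str], ind2: Sequence[str]) -> float:
    """Proportion of alleles shared IBS across SNPs: (IBS2 + 0.5 * IBS1) / N."""
    scores = [ibs(a, b) for a, b in zip(ind1, ind2)]
    return sum(scores) / (2 * len(scores))


ind1 = ["A/C", "G/T", "A/G", "A/A", "G/G"]  # Individual 1 on slide 31
ind2 = ["C/C", "T/T", "A/G", "C/C", "G/G"]  # Individual 2 on slide 31
print([ibs(a, b) for a, b in zip(ind1, ind2)])  # [1, 1, 2, 0, 2]
print(ibs_similarity(ind1, ind2))               # 0.6
```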
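Slides 35 to 37 compare gain and loss calls between monozygotic twins and their relatives, but do not describe the comparison procedure. As an illustration only, the sketch below flags calls present in one twin with no overlapping call of the same state in the other. The CnvCall record, the half-open coordinate convention and the "any overlap" rule are all assumptions; CNV comparisons often require a minimum reciprocal overlap instead.

```python
from typing import List, NamedTuple, Tuple


class CnvCall(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    state: str  # "gain" or "loss"


def same_event(a: CnvCall, b: CnvCall) -> bool:
    """Calls match if they share chromosome and state and their intervals overlap."""
    return (a.chrom == b.chrom and a.state == b.state
            and a.start < b.end and b.start < a.end)


def discordant_calls(twin1: List[CnvCall],
                     twin2: List[CnvCall]) -> Tuple[List[CnvCall], List[CnvCall]]:
    """Calls seen in only one twin: candidate CNV differences between MZ twins."""
    only1 = [c for c in twin1 if not any(same_event(c, d) for d in twin2)]
    only2 = [d for d in twin2 if not any(same_event(d, c) for c in twin1)]
    return only1, only2
```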
