Yingrui Li: Complete Solutions for Now-Generation Bioinformatics

Yingrui Li (BGI-Shenzhen) Beyond the Genome talk on "Complete Solutions for Now-Generation Bioinformatics". September 19th, 2011

  1. Heading for a full solution to Now Generation Informatics. BGI-Shenzhen, Sep 19, 2011.
  2. "Nothing in biology makes sense except in the light of evolution" (Theodosius Dobzhansky). "Tree" type of thinking in genomics: they are different, they are also related.
  3. What is the scope of bioinformatics? Bioinformatics is to understand the tree of life. Bioinformatics will: draw trees (basic information); map information on trees (association/cause-effect); show the trees (visualizations, databases, clouds).
  4. Mission 1: Tree of Species. A set of different genes (sequence) made different forms of life.
  5. Mission 1: Tree of Species. Draw: de novo genome assembly; multiple sequence mapping and alignment; phylogenetic tree construction. Map: in-depth annotation; comparative genomics. Show: genome browsers.
  6. Dinner: "Tastes good, sequence it!" Peking duck, cucumber, cabbage, kung pao chicken, mapo tofu, oyster.
  7. Factory: "Useful, sequence it!" Silk and silkworm; oil and castor bean; cloth and cotton.
  8. Zoo: "Looks cute, sequence it!" Panda; polar bear and penguin; antelope.
  9. Mission 2: Tree of Individuals. A set of different variations (sequence) made different human individuals/cells.
  10. An Evolutionary perspective: the oldest human alleles originated in Africa well before the diasporas of modern humans 50,000 to 60,000 years ago.
  11. These oldest alleles are common in all populations worldwide.
  12. Approximately 90% of the variability in allele frequencies is of this sort. From Mary-Claire King.
  13. International project to construct a next-generation baseline data set for human genetics. Sequence-level HapMap, an order of magnitude deeper. Consortium with multiple centres, platforms, funders. Aims: find >95% of accessible SNPs at allele frequencies above 1%, down towards 0.1% in coding regions; genotype them and place them on haplotype backgrounds; also discover and characterize indels and structural variants.
  14. An Evolutionary perspective (from the 1000G Project): germline de novo substitution rate ≈ 1 × 10⁻⁸ per generation.
  15. Somatic/LCL substitution rate = 7-12x higher than germline rate.
  16. Male mutation rate ~7x higher than female mutation rate. From Mary-Claire King: the development of agriculture in the past 10,000 years and of urbanization and industrialization in the past 700 years has led to rapid population growth and therefore to the appearance of vast numbers of new alleles, each individually rare and specific to one population or even to one family.
  17. What's the whole picture of genetic variants? [Figure: allele-frequency axis marked at 50%, 5%, 0.5%, 0.05%. Labels: common alleles, weaker effects; rarer alleles, stronger effects; very rare alleles, strongest effects; common/rare disease; Mendelian disease. Examples on the slide: CFTR delta 508, PCSK9 C679X; MC4R, ABCA1, 1q21.1 in SCZ. Also labeled: Billion Genomes Project; personal genomics with phenotype information.]
  18. If selection goes another direction: lessons from domesticated animals/plants. The history of silkworm domestication (D: domesticated; W: wild). Silkworm domestication history; silkworm phylogenetic tree. The relationships do not simply follow the geographic distribution, reflecting gene flow and other population-level processes related to human activities such as ancient commercial trade.
  19. The domestication event led to a 90% reduction in effective population size during the initial bottleneck. Published in Science, 16 Oct.
  20. From Andersson and Georges, Nature Reviews Genetics 5: 202-212 (2004). Selective sweep: inheritance of regions around adaptive alleles. Extent of selective sweeps for domestication in maize: tb1 locus (60 to 90 kb) (Clark et al. 2004), Y1 locus (about 600 kb) (Palaisa et al. 2004).
  21. Domestication: genome variation during silkworm domestication. 354 candidate domestication genes; 159 with tissue-specific expression (silk gland, midgut, testis). Published in Science, 16 Oct.
  22. Exomes of 50 Tibetans and 40 Han have been sequenced. Function further validated in: association with blood hemoglobin level.
  23. Expression level difference in placenta. EPAS1: endothelial Per-Arnt-Sim (PAS) domain protein 1. The signal of selection: EPAS1 is the gene showing the strongest selection signal (up to 80% frequency change in allele distribution); Han: 9%, Tibetan: 87%.
  24. Your micro-environment, your other genome?
  25. PCA for 85 Danish samples (based on gene profiling). [Figure labels: BMI data; gene level.]
  26. Mission 2: Tree of Individuals. Draw: (complete spectrum of) variation identification; population frequencies and spectra. Map: selection and evolution; phenotypic traits; intermediate phenotypes.
  27. Mission 3: Tree of Cells. Cell lineages are characterized by single biological levels and their inter-correlations.
  28. On DNA: differentiating cancer and normal cells by PCA. [Figure: PCA panels for ET, AML, and BTCC. Legend: + cancer; * normal; * cells possibly mixed (from tumor, but clustered with normal cells).] These cancers are really heterogeneous.
  29. Phylogenetic trees clearly show subpopulations in ET (essential thrombocythemia) and AML (acute myeloid leukemia) cancers.
  30. Inferring key genes in AML (a typical heterogeneous cancer). [Figure: consensus tree, annotated "Key gene?" and "Key gene for sub-pop?"]
  31. Key genes for AML: MLL, ALK. G1~G6: different subpopulations from AML cancer. MLL: myeloid/lymphoid or mixed-lineage leukemia; recurrent translocations in acute leukemias that may be characterized as acute myeloid leukemia (AML; MIM 601626), acute lymphoblastic leukemia (ALL), or mixed lineage (biphenotypic) leukemia (MLL).
  32. Inferring key genes in AML (a typical heterogeneous cancer): LILRA1 (leukocyte immunoglobulin-like receptor). G1~G6: different subpopulations from AML cancer.
  33. Inferring key genes in AML (a typical heterogeneous cancer): CTNNA1 (leukocyte transendothelial migration; pathways in cancer). G1~G6: different subpopulations from AML cancer.
  34. Inferring key genes in AML (a typical heterogeneous cancer): CTSS (cathepsin). G1~G6: different subpopulations from AML cancer.
  35. Inferring key genes in AML (a typical heterogeneous cancer): PPP2R1A (TGF-beta signaling pathway). G1~G6: different subpopulations from AML cancer.
  36. Inferring key genes in AML (a typical heterogeneous cancer): DIAPH1 (focal adhesion; regulation of actin cytoskeleton). G1~G6: different subpopulations from AML cancer.
  37. Inferring key genes in AML (a typical heterogeneous cancer): LILRA1 (leukocyte immunoglobulin-like receptor). G1~G6: different subpopulations from AML cancer.
  38. 38.
  39. Mission 3: Tree of Cells. Draw: single-cell information acquisition technologies. Map: single-cell metrics measurement technologies.
  40. Integrating DNA variation, molecular traits, and phenotypes to construct causal gene networks. Genes work in a network!
  41. 41.
  42. Finally: where are the papers? On what paper do you draw, map, and show? It is harder and harder to find a platform that is efficient enough: sample house; high-throughput biology; capable computing system with high I/O performance; interlinked databases and standardized formats; bioinformatics workflows to perform in silico analysis on data.
  43. Making data PUBLIC! Does not mean making data downloadable in theory; does mean the public can make use of the data. New types of databases with operations on the data are required. A new academic credit system is needed to motivate high-quality, easy-to-access datasets. http://www.gigasciencejournal.com http://climb.genomics.cn
  44. Acknowledgements. Great international efforts: the Genome 10K Consortium; the 1000 Genomes Project Consortium; the 1000 Plant Genomes Project Consortium; the 5000 Insects Project Consortium (pending). BGI initiatives and collaboration framework: the 1000 Plant and Animal Genomes Project; the 10K Microbial Genomes Project. http://ldl.genomics.org.cn
  45. Acknowledgements. Prof. Rasmus Nielsen's lab at UC Berkeley and the University of Copenhagen; Prof. Richard Durbin's lab at the Wellcome Trust Sanger Institute; Prof. Tak-Wah Lam and Siu-Ming Yiu's lab in the Department of Computer Sciences, Hong Kong University; Dr. Heng Li at the Broad Institute; …
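
The sequencing aim quoted in the transcript (find >95% of accessible SNPs at allele frequencies above 1%) can be related to sample size with a simple binomial argument. The sketch below is a back-of-the-envelope illustration only, not the consortium's own power calculation. It assumes independently sampled chromosomes and that any allele present in the sample is detected, which ignores sequencing depth and calling error. The function names are hypothetical.

```python
import math


def prob_variant_sampled(freq, n_individuals):
    """Chance that an allele at population frequency `freq` appears at least
    once among the 2 * n_individuals chromosomes of a diploid sample."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** n_chromosomes


def min_individuals(freq, target=0.95):
    """Smallest number of diploid individuals for which the allele is
    present in the sample with probability >= target."""
    n_chromosomes = math.log(1.0 - target) / math.log(1.0 - freq)
    return math.ceil(n_chromosomes / 2)


# Assumed illustration values: the 1% and 0.1% thresholds named in the aims.
for freq in (0.01, 0.001):
    n = min_individuals(freq)
    print(f"frequency {freq}: {n} individuals, "
          f"P(sampled) = {prob_variant_sampled(freq, n):.4f}")
```

Under these assumptions the required sample grows roughly tenfold for each tenfold drop in allele frequency, which is one way to read the step from 1% genome-wide to 0.1% in coding regions.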
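
The germline substitution rate quoted in the transcript (about 1 × 10⁻⁸ per generation) becomes more tangible as a count per genome. The sketch below is a rough worked example under stated assumptions: the rate is treated as per base pair, the haploid genome size of 3.1 × 10⁹ bp is an approximate figure not given in the slides, and "~7x higher" is read as a 7:1 ratio of paternal to maternal mutations. All names are hypothetical.

```python
# Rate quoted on the slide; treating it as "per base pair" is an assumption.
GERMLINE_RATE_PER_BP = 1e-8
# Approximate haploid human genome size in bp (assumption, not from the slides).
HAPLOID_GENOME_BP = 3.1e9
# Slide: male mutation rate ~7x the female rate, read here as a 7:1 ratio.
MALE_TO_FEMALE_RATIO = 7.0


def expected_de_novo(rate_per_bp, haploid_bp, copies=2):
    """Expected new substitutions per diploid genome per generation."""
    return rate_per_bp * haploid_bp * copies


def split_by_parent(total, male_to_female):
    """Split an expected count into paternal and maternal shares."""
    paternal = total * male_to_female / (male_to_female + 1.0)
    return paternal, total - paternal


total = expected_de_novo(GERMLINE_RATE_PER_BP, HAPLOID_GENOME_BP)
paternal, maternal = split_by_parent(total, MALE_TO_FEMALE_RATIO)
print(f"expected de novo substitutions per child: {total:.0f}")
print(f"paternal origin: {paternal:.1f}, maternal origin: {maternal:.1f}")
```

With these inputs the estimate is on the order of tens of new substitutions per child, most of them paternal in origin.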

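
The EPAS1 allele frequencies reported in the transcript (Han: 9%; Tibetan: 87%) can be turned into a single differentiation score. The sketch below computes a basic two-population F_ST from heterozygosities, assuming a biallelic site and equal weighting of the two populations. It is an illustration of how large the frequency gap is, and is not necessarily the statistic used in the study behind the slide. The function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """F_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_mean = (p1 + p2) / 2.0
    h_total = heterozygosity(p_mean)
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    if h_total == 0.0:
        # Both populations fixed for the same allele: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


# Frequencies as quoted on the slide.
han, tibetan = 0.09, 0.87
print(round(fst_two_populations(han, tibetan), 3))
```

For these two frequencies the score comes out at roughly 0.61, far above what is typical between closely related human populations, which is consistent with the slide describing EPAS1 as the strongest selection signal.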