20110919 beyond genome
Li Yingrui Talk at the Beyond the Genome Meeting 2011. "Heading for a Full Solution to Now-Generation Bioinformatics" Covers BGI's missions using "tree" view of genome analysis for discovery.

Published in: Education, Technology

Transcript

  • 1. Heading for a full solution to Now-Generation Informatics
    BGI-Shenzhen
    Sep 19, 2011
  • 2. Nothing in biology makes sense except in the light of evolution. (Theodosius Dobzhansky)
    “Tree” type of thinking of Genomics
    They are different, they are also related
  • 3. What is the scope of bioinformatics?
    Bioinformatics is to understand the tree of life.
    Bioinformatics will:
    Draw trees (basic information)
    Map information on trees (association/cause-effect)
    Show the trees (visualizations, databases, clouds)
  • 4. Mission 1: Tree of Species
    A set of different genes (sequence) made different forms of life
  • 5. Mission 1: Tree of Species
    Draw
    De novo genome assembly
    Multiple sequence mapping and alignment
    Phylogenetic tree construction
    Map
    In-depth Annotation
    Comparative genomics
    Show
    Genome browsers
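
The "Draw" step of phylogenetic tree construction can be illustrated with a minimal sketch. The slides do not say which method is used; UPGMA is swapped in here as one classic distance-based approach. The function name `upgma`, the labels A to D, and the toy distances are all hypothetical, and only the tree topology is returned (no branch lengths).

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster labels into a tree by UPGMA (illustrative sketch).

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Returns the topology as nested 2-tuples.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)                          # working copy of the distances
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy, made-up distances purely for illustration (not real data).
toy = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
dist = {frozenset(pair): value for pair, value in toy.items()}
tree = upgma(["A", "B", "C", "D"], dist)
```

With these toy distances, A and B join first, then C, then D. Real pipelines would more likely use neighbor joining or likelihood-based methods on aligned sequences.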
  • 6. Dinner
    “Tastes good, sequence it!”
    Peking duck
    Cucumber
    Cabbage
    Kung pao chicken
    Mapo tofu
    Oyster
  • 7. Factory
    Silk and silkworm
    Oil and castor bean
    “Useful, sequence it!”
    Cloth and cotton
  • 8. Zoo
    “Looks cute, sequence it!”
    Panda
    Polar bear and Penguin
    Antelope
  • 9. Mission 2: Tree of Individuals
    A set of different variations (sequence) made different human individuals/cells
  • 10. An Evolutionary perspective
    • The oldest human alleles originated in Africa well before the diasporas of modern humans 50,000 – 60,000 years ago.
    • These oldest alleles are common in all populations worldwide.
    • Approximately 90% of the variability in allele frequencies is of this sort.
    From Mary-Claire King
  • 13. International project to construct a next generation baseline data set for human genetics
    Sequence level HapMap, an order of magnitude deeper
    Consortium with multiple centres, platforms, funders
    Aims
    Find >95% accessible SNPs at allele frequencies above 1%, down towards 0.1% in coding regions
    Genotype them and place on haplotype backgrounds
    Also discover and characterize indels, structural variants
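
A rough sense of why sample size sets the frequency floor in the aims above: under the simplifying assumptions (not from the slides) that sampled chromosomes are independent draws and that sequencing detects every allele present, an allele at frequency f appears at least once among n chromosomes with probability 1 - (1 - f)^n. The function names below are hypothetical, and real discovery power is lower because of finite coverage and calling error.

```python
import math


def prob_allele_sampled(freq: float, n_chromosomes: int) -> float:
    """Chance an allele at frequency `freq` is seen at least once in
    `n_chromosomes` independent draws (idealized: perfect detection)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be in [0, 1]")
    if n_chromosomes < 0:
        raise ValueError("n_chromosomes must be non-negative")
    return 1.0 - (1.0 - freq) ** n_chromosomes


def chromosomes_needed(freq: float, target: float = 0.95) -> int:
    """Smallest number of chromosomes giving at least `target` probability
    of sampling the allele once, under the same idealized model."""
    if not 0.0 < freq < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("freq and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - freq))


n_for_1_percent = chromosomes_needed(0.01)  # 299 chromosomes, about 150 diploid people
```

Under this model the required sample grows roughly in inverse proportion to the allele frequency, which is why pushing from 1% down towards 0.1% calls for about ten times as many chromosomes.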
  • 14. An Evolutionary perspective
    • Germline de novo substitution rate ≈ 1 × 10⁻⁸ per generation
    • Somatic/LCL substitution rate = 7-12× higher than germline rate
    • Male mutation rate ~7× higher than female mutation rate
    From 1000G Project
    From Mary-Claire King
    The development of agriculture in the past 10,000 years and of urbanization and industrialization in the past 700 years has led to rapid population growth and therefore to the appearance of vast numbers of new alleles, each individually rare and specific to one population or even to one family.
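
The rates on this slide can be turned into a back-of-envelope count. This sketch assumes, beyond what the slide states, that the germline rate is per base pair and that a haploid human genome is about 3 × 10⁹ bp. The constants and function names are illustrative only.

```python
GERMLINE_RATE = 1e-8       # slide value; "per base pair" is an assumption here
HAPLOID_GENOME_BP = 3e9    # assumed approximate haploid human genome size


def expected_de_novo(rate: float = GERMLINE_RATE,
                     genome_bp: float = HAPLOID_GENOME_BP,
                     ploidy: int = 2) -> float:
    """Expected new substitutions per genome per generation."""
    return rate * genome_bp * ploidy


def paternal_share(male_to_female_ratio: float) -> float:
    """Fraction of new mutations expected to come from the father if the
    male rate is `male_to_female_ratio` times the female rate."""
    return male_to_female_ratio / (male_to_female_ratio + 1.0)


germline_count = expected_de_novo()   # about 60 per diploid genome
somatic_range = (7 * germline_count, 12 * germline_count)  # 7-12x the germline value
father_fraction = paternal_share(7)   # 7 / 8
```

Under these assumptions each child carries on the order of 60 new substitutions, most of paternal origin, which is consistent with the slide's point that population growth generates vast numbers of individually rare alleles.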
  • 17. What’s the whole picture of genetic variants?
    Billion Genomes Project: personal genomics with phenotype information
    [Figure: allele frequency axis (50%, 5%, 0.5%, 0.05%) against effect size. Common alleles, lesser effects; rarer alleles, stronger effects; very rare alleles, strongest effects. Disease axis runs from common/rare disease to Mendelian disease.]
    E.g.: CFTR delta 508, PCSK9 C679X
    E.g.: MC4R, ABCA1, 1q21.1 in SCZ
  • 18. If selection goes in another direction: lessons from domesticated animals and plants
    The history of silkworm domestication
    Legend: D, domesticated; W, wild
    Silkworm domestication history
    Silkworm phylogenetic tree
    • Relationships do not simply follow geographic distribution, which reflects gene flow and other population-level processes related to human activities such as ancient commercial trade
    • The domestication event led to a 90% reduction in effective population size during the initial bottleneck
    Published in Science 16 Oct.
  • 20. From Andersson and Georges, Nature Reviews Genetics 5: 202-212 (2004)
    selective sweep: inheritance of regions around adaptive alleles
    extent of selective sweep for domestication in MAIZE: tb1 locus (60 to 90-kb) (Clark et al. 2004), Y1 locus (about 600-kb) (Palaisa et al. 2004)
  • 21. Domestication
    Genome variation during silkworm domestication
    354 candidate domestication genes
    159 with tissue-specific expression (silk gland, midgut, testis)
    Published in Science 16 Oct.
  • 22. Exomes of 50 Tibetans and 40 Han Chinese have been sequenced
    Function further validated by:
    • Association with blood hemoglobin level
    • Expression level difference in placenta
    EPAS1: endothelial Per-Arnt-Sim (PAS) domain protein 1
    The signal of selection
    EPAS1 is the gene showing the strongest selection signal (up to 80% difference in allele frequency): Han, 9%; Tibetan, 87%
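
One simple way to put a number on a frequency gap like the one on this slide is Wright's F_ST for two populations. This is a textbook statistic swapped in for illustration, not the test reported in the talk. The sketch assumes a biallelic site and equal population weights, and treats the quoted 9% and 87% as exact allele frequencies. The function name is hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's F_ST at one biallelic site for two equally weighted
    populations with allele frequencies p1 and p2."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must be in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)       # heterozygosity if pooled
    if h_total == 0.0:
        return 0.0                              # site is fixed in both populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


epas1_fst = fst_two_populations(0.09, 0.87)  # roughly 0.61
```

A value near 0.6 at a single site is very high for two closely related populations, which is the intuition behind reading the frequency difference as a signal of selection.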
  • 24. Your Micro-Environment, Your other genome?
  • 25. PCA analysis for 85 Danish samples (based on gene profiling)
    BMI data
    Gene level
  • 26. Mission 2: Tree of Individuals
    Draw
    (Complete spectrum of) variation identification
    Population frequencies and spectra
    Map
    Selection and evolution
    Phenotypic traits
    Intermediate phenotypes
  • 27. Mission 3: Tree of Cells
    Cell lineages are characterized by single biological levels and their inter-correlations.
  • 28. On DNA
    Differentiate cancer from normal cells by PCA
    Panels: ET, AML, BTCC
    Legend: + cancer; * normal; * cells possibly mixed (from tumor, but clustered with normal cells)
    These cancers are really heterogeneous.
  • 29. Phylogenetic trees clearly show subpopulations in ET and AML cancers
    ET: essential thrombocythemia
    AML: acute myeloid leukemia
  • 30. Inferring key genes in AML (a typical heterogeneous cancer)
    Key Gene?
    Key Gene for sub-pop?
    Consensus Tree
  • 31. Key genes for AML
    MLL
    ALK
    G1~G6: different subpopulations from AML cancer
    MLL: myeloid/lymphoid or mixed-lineage leukemia, recurrent translocations in acute leukemias that may be characterized as acute myeloid leukemia (AML; MIM 601626), acute lymphoblastic leukemia (ALL), or mixed lineage (biphenotypic) leukemia (MLL).
  • 32. LILRA1
    G1~G6: different subpopulations from AML cancer
    LILRA1: leukocyte immunoglobulin-like receptor
    Inferring key genes in AML (a typical heterogeneous cancer)
  • 33. CTNNA1
    G1~G6: different subpopulations from AML cancer
    CTNNA1: Leukocyte transendothelial migration; Pathways in cancer
    Inferring key genes in AML (a typical heterogeneous cancer)
  • 34. CTSS
    G1~G6: different subpopulations from AML cancer
    CTSS: cathepsin S
    Inferring key genes in AML (a typical heterogeneous cancer)
  • 35. PPP2R1A
    G1~G6: different subpopulations from AML cancer
    PPP2R1A: TGF-beta signaling pathway
    Inferring key genes in AML (a typical heterogeneous cancer)
  • 36. DIAPH1
    G1~G6: different subpopulations from AML cancer
    DIAPH1: Focal adhesion; Regulation of actin cytoskeleton
    Inferring key genes in AML (a typical heterogeneous cancer)
  • 37. LILRA1
    G1~G6: different subpopulations from AML cancer
    LILRA1: leukocyte immunoglobulin-like receptor
    Inferring key genes in AML (a typical heterogeneous cancer)
  • 38.
  • 39. Mission 3: Tree of Cells
    Draw
    Single-cell information acquisition technologies
    Map
    Single-cell metrics measurement technologies
  • 40. Integrating DNA variation, molecular traits, and phenotypes to construct causal gene networks
    Genes work in a network!
  • 41.
  • 42. Finally: Where are the papers?
    On what paper do you draw, map, and show?
    It is harder and harder to find a platform efficient enough
    Sample house
    High-throughput biology
    Capable computing system with high I/O performance
    Interlinked database and standardized formats
    Bioinformatics workflows to perform in silico analysis on data
  • 43. Making data PUBLIC!
    Does not mean making data downloadable in theory
    Does mean the public could make use of data
    New types of databases with operations to the data are required
    New academic credit system to motivate high-quality easy-to-access datasets.
    http://www.gigasciencejournal.com
    http://climb.genomics.cn
  • 44. Acknowledgements
    Great International Efforts
    The Genome 10K Consortium
    The 1000 Genomes Project Consortium
    The 1000 Plant Genomes Project Consortium
    The 5000 insects Project Consortium (pending)
    BGI Initiatives and collaboration framework
    The 1000 Plant and Animal Genomes Project
    The 10K Microbial Genomes Project
    http://ldl.genomics.org.cn
  • 45. Acknowledgements
    Prof. Rasmus Nielsen’s lab at UC Berkeley and the University of Copenhagen
    Prof. Richard Durbin’s lab at the Wellcome Trust Sanger Institute
    Prof. Tak-Wah Lam and Siu-Ming Yiu’s lab in the Department of Computer Science, The University of Hong Kong
    Dr. Heng Li at the Broad Institute