A consortium of 440 scientists, 32 laboratories
   Sucheta Tripathy, IICB, 17th Sept. 2012
   http://www.nature.com/encode/
   http://www.encodeproject.org/ENCODE/
   http://www.factorbook.org/
   http://encodeproject.org/ENCODE/dataStandards.html
   http://1000genomes.org
   http://genome.ucsc.edu/ENCODE/
http://www.gencodegenes.org/data.html
Characterization of intergenic region and gene definition




http://homes.gersteinlab.org/people/rar62/subwaymap/SubwayMap8_16_12.pdf
http://homes.gersteinlab.org/people/rar62/subwaymap/SubwayMap
   October 1990: Human Genome Project started
   2000: First publication
   2003: Finished paper
   NHGRI solicited pilot proposals for ENCODE
   2005: GWAS, 90% lies outside coding
   RFAs were sought for full ENCODE
   2007: First report on ENCODE published
   2012: ENCODE published
http://www.nature.com/nature/journal/v489/n7414/full/489049a.html
Treasure Hunt?




"It is like Google Maps," says Eric Lander: a map of the Earth
from outer space.
   95% of the genome is “junk”.
    ◦ 2.94% of the genome is coding
   cis-regulatory elements occur within a
    limited genomic distance.
   Most of the genome consists of transposable
    elements of obscure origin that are
    dying out.
   Transcribed elements are more often
    translated than not.
   80% of the human genome is active!!
    ◦ 70,000 promoters and 400,000 enhancers
   75% of the genome is transcribed in some
    tissue or other during a lifetime.
   Environment plays a great role in switching
    many genes on or off. [Epigenetics]
   Most diseases lie not with the genes
    but with the switches!!
   The dark matter controlling the genes is
    physically close to the genes it controls.
   Genes and switches do not have a one-to-one
    relationship!
   4 million switches controlling 21,000 genes!!
   Identical twins are NOT identical; they are greatly
    influenced by environment.

   Astronomy and genetic biology look
    similar (95% of the Universe is called dark
    matter, which we don't understand).
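As a rough check on the switch and gene counts quoted above, the average number of switches per gene can be worked out directly. This is only an illustrative sketch: the two figures come from the slide, the function name is made up here, and a plain average says nothing about how switches are actually distributed across genes.

```python
# Figures quoted on the slide; the names below are illustrative only.
SWITCHES = 4_000_000
GENES = 21_000


def switches_per_gene(switches: int, genes: int) -> float:
    """Average number of regulatory switches per gene (a plain ratio)."""
    return switches / genes


average = switches_per_gene(SWITCHES, GENES)
print(f"roughly {average:.0f} switches per gene on average")
```

A ratio this far from 1 is what the "no one-to-one relationship" bullet is pointing at.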
   “This explains why 6.5 billion people on earth
    don't look alike.”
   Intelligent Design (Creationism) believers are
    excited that it is the handiwork of God.
   Natural selectionists (Darwinists) are excited that
    it is natural selection at its best.
    ◦ This has sparked a war between Democrats and
      Republicans, as usual.
   Junk DNA is an “oxymoron”.
   Some are still wondering about the remaining
    20%.
   “I hope this information stirs the mind of
    those researchers that have ignored ‘trace
    minerals’ in food as part of the nutritional
    package.”
   The more we think we are close to finding an
    answer, the farther away we find ourselves. Reminds
    me of Aristotle, who once said, “The more you
    know, the more you know you don't know.”
   Most of the DNA was considered “garbage”
    but later upgraded to “junk”.
   Most people are actually happy because it is
    happening during their “lifetime”.
   Switches are software and genes are
    hardware.
   Ancient Egyptians considered the “torso” to have a
    divine role and discarded the grey matter in the head
    as “junk”.
   Sean Eddy: “At least 40% of the human genome is
    composed of the decaying DNA remains of transposable
    elements (TEs), different species of which have
    replicated in great waves during the evolution of our
    genome.”
   “I sure wish I'd gotten the memo, because this week a
    collaboration of labs led by myself, Arian Smit, and
    Jerzy Jurka just released a new data resource that
    annotates nearly 50% of the human genome as
    transposable element-derived, and transposon-derived
    repetitive sequence is the poster child for what we
    colloquially call ‘junk DNA’.”


   http://cryptogenomicon.org/
PLoS Biol. 2011 April; 9(4): e1001046.
The Cell Types
Cell Type          Tier   Description                                                      Source
GM12878            1      B-Lymphoblastoid cell line                                       Coriell GM12878
K562               1      Chronic Myelogenous/Erythroleukemia cell line                    ATCC CCL-243
H1-hESC            1      Human Embryonic Stem Cells, line H1                              Cellular Dynamics International
HepG2              2      Hepatoblastoma cell line                                         ATCC HB-8065
HeLa-S3            2      Cervical carcinoma cell line                                     ATCC CCL-2.2
HUVEC              2      Human Umbilical Vein Endothelial Cells                           Lonza CC-2517
Various (Tier 3)   3      Various cell lines, cultured primary cells, and primary tissues  Various

PLoS Biol. 2011 April; 9(4): e1001046.
   DNase I -> transcription factor binding sites
    (2.9 million sites, one third found in a single cell type
    and the rest in others)
   ChIP-seq -> sequencing of transcription factor
    and histone binding sites (HeLa and
    GM12878: qualified to be called new
    species)
   5C technology -> finding proximity between
    regulatory and regulated regions
   High-density 5 bp tiling DNA microarrays
   Cap Analysis of Gene Expression (CAGE)
   Paired-End diTag (PET)
   Reduced Representation Bisulphite
    Sequencing (RRBS)
   33.45% exon and 66.55% intron.
   62% of the genome is transcribed
    reproducibly.
   231 Mb of the genome has protein binding sites.
    ◦ 80% of which are low-affinity sites
      (http://www.factorbook.org/)
    ◦ Many are highly conserved and cell-type selective
   96% of the CpGs exhibited differential
    methylation patterns.
   GWAS SNPs overlapped with ENCODE
    elements.
   Chromosome conformation capture carbon
    copy (5C)
    ◦ 1% of the genome is distally regulated (>1000 bp)
    ◦ On average, 3.9 distal elements interacted with
      a TSS.
    ◦ Distance could be several kb to Mb
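To make the "distal elements per TSS" figure concrete, here is a minimal sketch of how such an average could be computed from a list of interaction pairs. The input format, the coordinates, and the function name are all hypothetical (they are not the ENCODE 5C pipeline); only the 1000 bp cutoff for "distal" is taken from the slide.

```python
from collections import defaultdict

MIN_DISTAL_BP = 1_000  # "distal" cutoff quoted on the slide (>1000 bp)


def mean_distal_partners(interactions):
    """Mean number of distinct distal elements per TSS.

    `interactions` is a hypothetical list of (tss_position, element_position)
    pairs on a single chromosome. Only TSSs with at least one distal partner
    are counted in the average.
    """
    partners = defaultdict(set)
    for tss, element in interactions:
        if abs(element - tss) > MIN_DISTAL_BP:
            partners[tss].add(element)
    if not partners:
        return 0.0
    return sum(len(p) for p in partners.values()) / len(partners)


# Made-up coordinates, purely for illustration.
example = [
    (10_000, 15_000),     # 5 kb away: distal
    (10_000, 250_000),    # 240 kb away: distal
    (10_000, 10_400),     # 400 bp away: not distal
    (80_000, 1_300_000),  # about 1.2 Mb away: distal
]
print(mean_distal_partners(example))
```

With real data the pairs would come from significant 5C contacts, and positions on different chromosomes would need separate handling.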
   cis-regulatory elements: enhancers,
    promoters, insulators, silencers.
   2.9 million DHSs encompassing 125 diverse
    cell and tissue types.
   20-50 bp length DHSs mapped uniquely to
    86.9% of the genome
    ◦   580,000 distal DHSs with target promoters
    ◦   3% lie in TSSs
    ◦   5% lie within 2.5 kb of a TSS
    ◦   95% lie distally (introns and intergenic regions)
    ◦   Strongly enriched in LTRs
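The TSS, 2.5 kb, and distal categories above amount to binning each DHS by its distance to the nearest TSS. The following is a rough sketch of that binning, assuming a single chromosome, a sorted list of TSS positions, and DHSs given as (start, end) intervals. The function names and coordinates are hypothetical; only the 2.5 kb window comes from the slide.

```python
from bisect import bisect_left

PROXIMAL_WINDOW_BP = 2_500  # 2.5 kb window quoted on the slide


def distance_to_nearest_tss(start, end, tss_sorted):
    """Distance in bp from the interval [start, end] to the closest TSS.

    Returns 0 if a TSS falls inside the interval, None if there are no TSSs.
    `tss_sorted` must be sorted in ascending order.
    """
    i = bisect_left(tss_sorted, start)  # first TSS at or after `start`
    best = None
    for j in (i - 1, i):  # nearest TSS on the left, and on the right/inside
        if 0 <= j < len(tss_sorted):
            tss = tss_sorted[j]
            if start <= tss <= end:
                return 0
            gap = start - tss if tss < start else tss - end
            if best is None or gap < best:
                best = gap
    return best


def classify_dhs(start, end, tss_sorted):
    """Label a DHS as 'at TSS', 'within 2.5 kb', or 'distal'."""
    distance = distance_to_nearest_tss(start, end, tss_sorted)
    if distance is None:
        raise ValueError("no TSS positions supplied")
    if distance == 0:
        return "at TSS"
    if distance <= PROXIMAL_WINDOW_BP:
        return "within 2.5 kb"
    return "distal"


# Made-up coordinates, purely for illustration.
tss_positions = [10_000, 50_000]
for site in [(9_900, 10_100), (11_000, 11_150), (30_000, 30_150)]:
    print(site, classify_dhs(*site, tss_positions))
```

Counting the labels over all DHSs and dividing by the total would give percentages of the kind listed on the slide.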
   Three quarters of the genome is capable of transcription:
    redefine the concept of a gene?
    ◦ 62.1% and 74.7% of the genome are covered by processed
      and primary transcripts, respectively.
    ◦ 10-12 expressed isoforms per gene per cell.
    ◦ Coding and non-coding transcripts are localized in the
      cytoplasm and nucleus, respectively.
    ◦ 6% of the coding and non-coding transcripts
      overlap with small RNAs: precursors?
    ◦ Most of the novel transcripts lacked protein-coding
      ability.
   The mapping job is only half done.
   Characterizing everything a genome does is
    10% done.
   Finding the network of switches for genes.
   A number of correlations...
   Where does gene therapy go from here?
   Is our fundamental understanding of genes as
    the functional units flawed?
   Epigenetics becomes the key player...
   Gives impetus to a holistic approach in treating
    a disease.

   Do we still believe that the human genome is
    the most efficient?
