New insights into the human genome by ENCODE project

It’s been ten years since scientists sequenced the human genome. But what do all these letters mean?
Researchers could identify in its 3 billion letters many of the regions that code for proteins, but those make up little more than 1% of the genome, contained in around 21,000 genes: a few familiar objects in an otherwise stark and unrecognizable landscape. Many biologists suspected that the information responsible for the wondrous complexity of humans lay somewhere in the ‘deserts’ between the genes (The ENCODE Project Consortium, 2012).
Interpreting the human genome sequence is one of the leading challenges of 21st century biology (Collins et al., 2003). In 2003, the National Human Genome Research Institute (NHGRI) embarked on an ambitious project, the Encyclopedia of DNA Elements (ENCODE), aiming to delineate all of the functional elements encoded in the human genome sequence (The ENCODE Project Consortium, 2004). To further this goal, NHGRI organized the ENCODE Consortium, an international group of investigators with diverse backgrounds and expertise in production and analysis of high-throughput functional genomic data. In a pilot project phase spanning 2003–2007, the Consortium applied and compared a variety of experimental and computational methods to annotate functional elements in a defined 1% of the human genome (The ENCODE Project Consortium, 2007).

Published in: Education

  1. 1. New insights into the human genome by ENCODE
  2. 2. What is a gene? • Union of genomic sequences encoding a coherent set of potentially overlapping functional products. ENCODE (Gerstein et al., 2007)
  3. 3. It's been ten years since scientists sequenced the human genome. But what do all these letters mean?
  4. 4. 21,000 genes
  5. 5. ENCODE, the Encyclopedia of DNA Elements, has answers: aiming to delineate all of the functional elements encoded in the human genome sequence
  6. 6. ENCODE Consortium (The ENCODE Project Consortium, 2011)
  7. 7. Pilot phase • 2003-2007. Technology development phase. Production phase • 2007-2012 • 30 papers
  8. 8. ENCODE: Major methods • Data production and initial analysis • Accessing ENCODE data • Working with ENCODE data • Data analysis • Limitations • Threads – Nature explorer
  9. 9. Major Methods (The ENCODE Project Consortium, 2004)
  10. 10. Overall data flow (The ENCODE Project Consortium, 2011)
  11. 11. (The ENCODE Project Consortium, 2011)
  12. 12. RNA-seq – Isolation of RNA sequences followed by high-throughput sequencing • CAGE – Capture of the methylated cap at the 5’ end of RNA, followed by high-throughput sequencing • RNA-PET – Simultaneous capture of RNAs with both a 5’ methyl cap and a poly(A) tail • ChIP-seq – Chromatin immunoprecipitation followed by sequencing • FAIRE-seq – Formaldehyde-assisted isolation of regulatory elements: crosslinking, phenol extraction, and sequencing of the DNA fragments in the aqueous phase
  13. 13. (The ENCODE Project Consortium, 2011)
  14. 14. ENCODE cell types (The ENCODE Project Consortium, 2011)
  15. 15. ENCODE data production and initial analyses • Since 2007, ENCODE has developed methods and performed a large number of sequence-based studies to map functional elements across the human genome. • The elements mapped (and approaches used) include: - RNA transcribed regions (RNA-seq, CAGE, RNA-PET and manual annotation), - Protein-coding regions (mass spectrometry), - Transcription-factor-binding sites (ChIP-seq and DNase-seq), - Chromatin structure (DNase-seq, FAIRE-seq, histone ChIP-seq), - DNA methylation sites (RRBS assay) (The ENCODE Project Consortium, 2012)
  16. 16. Transcribed and protein-coding regions • In total, GENCODE-annotated exons of protein-coding genes cover 2.94% of the genome, or 1.22% for protein-coding exons. • Protein-coding genes span 33.45% of the genome from the outermost start to stop codons, or 39.54% from promoter to poly(A) site. • Additional protein-coding genes remain to be found. • In addition, they annotated 8,801 automatically derived small RNAs and 9,640 manually curated long non-coding RNA (lncRNA) loci. • GENCODE also annotated 11,224 pseudogenes (The ENCODE Project Consortium, 2012)
  17. 17. Process flow of experimental evaluation of pseudogene transcription • Experimental validation results showing the transcription of pseudogenes in different tissues (Pei et al., 2012)
  18. 18. ENCODE gene and transcript annotations. (The ENCODE Project Consortium, 2011)
  19. 19. RNA • They sequenced RNA from different cell lines and multiple subcellular fractions to develop an extensive RNA expression catalogue. • They used CAGE-seq (5’ cap-targeted RNA isolation and sequencing) to identify 62,403 transcription start sites (TSSs) in tier 1 and 2 cell types (The ENCODE Project Consortium, 2012)
  20. 20. A large majority of GENCODE elements are detected by RNA-seq data (Djebali et al., 2012)
  21. 21. Protein-bound regions • They mapped the binding locations of 119 different DNA-binding proteins and a number of RNA polymerase components in 72 cell types using ChIP-seq. • Overall, 636,336 binding regions covering 231 megabases (8.1%) of the genome are enriched for regions bound by DNA-binding proteins across all cell types. (The ENCODE Project Consortium, 2012)
  22. 22. Occupancy of transcription factors and RNA polymerase 2 on human chromosome 6p as determined by ChIP-seq
  23. 23. (The ENCODE Project Consortium, 2011)
  24. 24. DNase I hypersensitive sites and footprinting • Chromatin accessibility characterized by DNase I hypersensitivity is the hallmark of regulatory DNA regions. • 2.89 million unique, non-overlapping DNase I hypersensitive sites (DHSs) were mapped by DNase-seq in 125 cell types; most lie distal to TSSs. • In tier 1 and tier 2 cell types: 205,109 DHSs per cell type, encompassing an average of 1.0% of the genomic sequence in each cell type, and 3.9% in aggregate. (The ENCODE Project Consortium, 2012)
  25. 25. Density of DNase I cleavage sites for selected cell types (Thurman et al., 2012)
  26. 26. • On average, 98.5% of the occupancy sites of transcription factors mapped by ENCODE ChIP-seq lie within accessible chromatin defined by DNase I hotspots. • Using genomic DNase I footprinting on 41 cell types, they identified 8.4 million distinct DNase I footprints (The ENCODE Project Consortium, 2012)
  27. 27. Regions of histone modification • They assayed chromosomal locations for up to 12 histone modifications and variants in 46 cell types, across tier 1 and 2 (http://www.factorbook.org). (The ENCODE Project Consortium, 2012)
  28. 28. DNA methylation • They used reduced representation bisulphite sequencing (RRBS) to profile DNA methylation quantitatively for an average of 1.2 million CpGs in each of 82 cell lines and tissues (8.6% of non-repetitive genomic CpGs), including CpGs in intergenic regions, proximal promoters and intragenic regions. (The ENCODE Project Consortium, 2012)
  29. 29. Proteomics • To assess putative protein products generated from novel RNA transcripts and isoforms, proteins are sequenced and quantified by mass spectrometry and mapped back to their encoding transcripts. • K562 and GM12878: protein study begun (The ENCODE Project Consortium, 2011)
  30. 30. ENCODE chromatin annotations in the HLA locus (The ENCODE Project Consortium, 2011)
  31. 31. Accessing ENCODE Data. ENCODE Data Release and Use Policy: • The ENCODE Data Release and Use Policy is described at http://www.encodeproject.org/ENCODE/terms.html. • ENCODE data are released for viewing in a publicly accessible browser (initially at http://genome-preview.ucsc.edu/ENCODE and, after additional quality checks, at http://encodeproject.org). Public Repositories: • UCSC Genome Browser database (http://genome.ucsc.edu). (The ENCODE Project Consortium, 2011)
  32. 32. UCSC Portal
  33. 33. Working with ENCODE Data. Using ENCODE Data in the UCSC Browser: • Many users will want to view and interpret the ENCODE data for particular genes of interest. At the online ENCODE portal (http://encodeproject.org), users should follow a "Genome Browser" link to visualize the data in the context of other genome annotations. (The ENCODE Project Consortium, 2011)
  34. 34. ENCODE Data Analysis • Development and implementation of algorithms and pipelines for processing and analyzing data is a major activity of the ENCODE Project. • 1st phase: short sequences are aligned to the reference genome • 2nd phase: identifying the enriched regions • 3rd phase: integrating the identified regions of enriched signal with each other and with other data types (The ENCODE Project Consortium, 2011)
  35. 35. Analysis tools applied by the ENCODE consortium (The ENCODE Project Consortium, 2011)
  36. 36. Integrating ENCODE with other projects and the Scientific Community: 1. defining promoter and enhancer regions by combining transcript mapping and biochemical marks, 2. delineating distinct classes of regions within the genomic landscape by their specific combinations of biochemical and functional characteristics, and 3. defining transcription factor co-associations and regulatory networks. (The ENCODE Project Consortium, 2011)
  37. 37. • ENCODE Project: interpretation of human genome variation that is associated with disease or quantitative phenotypes • Integrate with 1000 Genomes Project: how SNPs and structural variation may affect transcript, regulatory and DNA methylation data • ENCODE: GWAS and other sequence-variation-driven studies of human phenotypes • Major contributor not only of data but also of novel technologies for deciphering the human genome (The ENCODE Project Consortium, 2011)
  38. 38. Limitations of ENCODE Annotations • Cell types are physiologically and genetically inhomogeneous. • Local micro-environments in culture may also vary. • Use of DNA sequencing to annotate functional genomic features is also constrained. • Considerable quantitative variation in the signal strength along the genome (The ENCODE Project Consortium, 2011)
  39. 39. Challenges • The adult human body contains several hundred distinct cell types, • each of which expresses a unique subset of the 1,800 TFs encoded in the human genome. • The brain alone contains thousands of types of neurons that are likely to express not only different sets of TFs but also a larger variety of non-coding RNAs. • A truly comprehensive atlas of human functional elements is not practical with current technologies (The ENCODE Project Consortium, 2011)
  40. 40. Outcome • Understanding of the human genome • The broad coverage of ENCODE annotations enhances our understanding of common diseases with a genetic component and of rare genetic diseases. • Only 119 of 1,800 known transcription factors and 13 of more than 60 currently known histone or DNA modifications have been sampled, across 147 cell types. • Overall these data reflect a minor fraction of the potential functional information encoded in the human genome (The ENCODE Project Consortium, 2012)
  41. 41. http://www.nature.com/encode/#/threads
  42. 42. 13 Threads: 1. Transcription factor motifs; 2. Chromatin patterns at transcription factor binding sites; 3. Characterization of intergenic regions and gene definition; 4. RNA and chromatin modification patterns around promoters; 5. Epigenetic regulation of RNA processing; 6. Non-coding RNA characterization; 7. DNA methylation; 8. Enhancer discovery and characterization; 9. Three-dimensional connections across the genome; 10. Characterization of network topology; 11. Machine learning approaches to genomics; 12. Impact of functional information on understanding variation; 13. Impact of evolutionary selection on functional regions
  43. 43. Schematic overview of the functional SNP approach (Schaub et al., 2012)
  44. 44. Comparison of GWAS identified loci with ENCODE data
  45. 45. (Boyle et al., 2012)
  46. 46. Future goals • Understand the mechanistic processes that generate these elements and how and where they function • Enlarge the data set to additional factors, modifications and cell types, complementing the other related projects • Constitute foundational resources for human genomics, allowing a deeper interpretation of the organization of gene and regulatory information and the mechanisms of regulation, and thereby provide important insights into human health and disease (The ENCODE Project Consortium, 2012)
  47. 47. Conclusion: the project is still far from complete. For updates: https://www.facebook.com/ENCODEProject
  48. 48. Encode – assign word to letter
  49. 49. Thank you:)
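
The three analysis phases on slide 34 (align short sequences, identify enriched regions, integrate the regions with other data) can be illustrated with a toy sketch. This is not the ENCODE pipeline: the fixed-width binning, the read-count threshold, and every name below (`bin_counts`, `call_enriched`, `intersect`, the sample coordinates) are assumptions made up for illustration, and alignment (phase 1) is assumed to have been done already, so each read is just a start position.

```python
from collections import Counter


def bin_counts(read_starts, bin_size=100):
    """Count already-aligned reads per fixed-width bin (stand-in for phase 1 output)."""
    return Counter(pos // bin_size for pos in read_starts)


def call_enriched(counts, min_reads, bin_size=100):
    """Phase 2 (toy): keep bins with >= min_reads and merge adjacent ones into regions."""
    regions = []
    for b in sorted(b for b, n in counts.items() if n >= min_reads):
        start, end = b * bin_size, (b + 1) * bin_size
        if regions and regions[-1][1] == start:
            # Extend the previous region when this bin touches it.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


def intersect(a, b):
    """Phase 3 (toy): intersect two sorted, non-overlapping lists of (start, end)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever region ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical data: ChIP-seq read starts and one open-chromatin region.
chip_reads = [105, 110, 150, 190, 210, 250, 260, 900]
open_chromatin = [(250, 400)]

peaks = call_enriched(bin_counts(chip_reads), min_reads=3)
shared = intersect(peaks, open_chromatin)
```

Real peak callers model background signal statistically rather than applying a fixed count cutoff; the sketch only shows how the three phases hand data to one another.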

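
The coverage figures quoted on slides 21 and 40 can be checked with simple arithmetic. The sketch below is illustrative only: `merged_length` and `percent` are hypothetical helper names, the sample intervals are invented, and the only numbers taken from the slides are 231 megabases at 8.1%, 119 of 1,800 transcription factors, and 13 of more than 60 modifications.

```python
def merged_length(regions):
    """Bases covered by half-open (start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(regions):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval before starting a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


# Invented intervals: two overlap, so 30 bases are covered, not 35.
toy_covered = merged_length([(0, 10), (5, 20), (30, 40)])

# Slide 21: 231 Mb is stated to be 8.1% of the genome, which implies
# a denominator of roughly 231 / 0.081 megabases.
implied_genome_mb = 231 / 0.081

# Slide 40: fraction of known factors and modifications sampled so far.
tf_sampled = percent(119, 1800)
# "More than 60" modifications are known, so this value is an upper bound.
mods_sampled_upper = percent(13, 60)
```

Under these figures roughly 6.6% of known transcription factors had been assayed, which is consistent with the slide's remark that the data reflect a minor fraction of the genome's potential functional information.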