Analysis of protein-DNA interactions with tiling microarrays
Srinivasan (Vasan) Yegnasubramanian
Sidney Kimmel Comprehensive Cancer Center, Oncology Dept., Genitourinary Division
March 7, 2007
Identical genetic sequence, but very different gene expression and phenotypes… These differences are due to epigenetic changes.
Epigenetics is the study of heritable processes that alter gene expression without an accompanying change in gene sequence. These processes are usually mediated by factors, such as proteins/ribonucleoproteins, that bind genomic DNA.
(3.4 x 10^-10 meters/bp) x (6 x 10^9 bp/genome) = ~2 meters/genome
Radius of the nucleus is ~10 µm! (Klug and Cummings, 1997)
(6 x 10^9 bp/genome) / (195 bp/nucleosome) = ~30.8 x 10^6 nucleosomes/genome
~5% of nuclear volume
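The packaging arithmetic above can be checked with a few lines of Python. This is only a worked restatement of the slide's own numbers; the variable names are illustrative, and the last ratio (DNA length over nuclear radius) is an added convenience figure rather than something stated on the slide.

```python
# Worked arithmetic for the DNA packaging numbers quoted above.
# All constants are taken from the slide; names are illustrative.
RISE_PER_BP_M = 3.4e-10      # meters of double helix per base pair
GENOME_BP = 6e9              # base pairs per genome
BP_PER_NUCLEOSOME = 195      # base pairs per nucleosome
NUCLEUS_RADIUS_M = 10e-6     # ~10 micrometers

dna_length_m = RISE_PER_BP_M * GENOME_BP
nucleosomes = GENOME_BP / BP_PER_NUCLEOSOME
length_to_radius = dna_length_m / NUCLEUS_RADIUS_M

print(f"DNA length per genome: {dna_length_m:.2f} m")
print(f"Nucleosomes per genome: {nucleosomes / 1e6:.1f} million")
print(f"DNA length / nuclear radius: {length_to_radius:,.0f}")
```

The stretched-out DNA is roughly 200,000 times longer than the nuclear radius, which is the point of the comparison.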
Diameter of DNA double helix: 20 Angstroms
Diameter of transcriptional machinery: >1,000 Angstroms
Developing an understanding of epigenetic processes…
DNA Modifications (e.g. Methylation)
Gene Transcriptional Changes
DNA-Protein Interactions
Characteristics of Tiling Microarrays
A microarray contains n probes of length L distributed across x base pairs of a genomic region of interest. That is, n probes are tiled across the region.
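The average resolution, or sampling/window size, is then R = x / n, or equivalently the average of the probe spacings d_i.
[Figure: probes tiled along a genomic region, with spacings labeled d_1 through d_7]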
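As a minimal sketch of the resolution definition, the snippet below computes R = x / n and compares it with the mean start-to-start spacing between adjacent probes. Reading the d values as spacings between neighboring probes is an assumption, and the probe coordinates and function names are hypothetical.

```python
from statistics import mean

def average_resolution(region_length_bp, n_probes):
    """R = x / n: region length divided by the number of probes."""
    return region_length_bp / n_probes

def mean_spacing(probe_starts):
    """Mean start-to-start distance between adjacent probes."""
    starts = sorted(probe_starts)
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return mean(gaps)

# Hypothetical probe start coordinates within a 280 bp region.
probe_starts = [0, 30, 70, 100, 140, 175, 210, 245]
region_length_bp = 280

print(average_resolution(region_length_bp, len(probe_starts)))  # x / n
print(mean_spacing(probe_starts))                               # mean of the d_i
```

The two figures agree here because the example was built that way. In general n probes define only n - 1 spacings, so the two estimates differ slightly for small n and converge as n grows.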
Affymetrix Tiling microarrays
Human Chromosome 21/22 microarrays
> 35 million bp of non-repetitive sequence on chromosomes 21 and 22, represented with >1 million probe sets on three microarrays (currently on a single array). R ~ 35 bp.
Representation of 1% of the genome, corresponding to the ENCODE regions, at 35 bp resolution on a single microarray.
Tiled arrays of 10 human chromosomes
74,180,611 probe pairs interrogating 30% of the human genome (i.e. 10 complete chromosomes) on >90 microarrays. R ~ 5 bp.
Tiled arrays of whole genome
Interrogation of the whole genome (1.7 Gb) on 7 microarrays (~50,000,000 perfect-match (PM) probes only) or 14 microarrays (~50,000,000 PM + mismatch (MM) probe sets). R ~ 35 bp.
Promoter Tiling arrays
Interrogation of all 5’ upstream regions of known genes on a single microarray.
All probes are 25-mers
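Because every probe is a 25-mer, the quoted resolutions imply either a gap or an overlap between neighboring probes. The sketch below works this out, assuming R is the average start-to-start spacing; the function names are illustrative, and the probe counts are the approximate figures quoted above.

```python
PROBE_LENGTH_BP = 25  # all probes are 25-mers

def implied_resolution(covered_bp, n_probes):
    """R = x / n for a design covering covered_bp with n_probes."""
    return covered_bp / n_probes

def spacing_relation(resolution_bp, probe_length_bp=PROBE_LENGTH_BP):
    """Describe how adjacent probes sit relative to each other."""
    diff = resolution_bp - probe_length_bp
    if diff > 0:
        return f"gap of {diff} bp between adjacent probes"
    if diff < 0:
        return f"overlap of {-diff} bp between adjacent probes"
    return "probes abut end to end"

# Resolutions as quoted for each design.
quoted_resolution_bp = {
    "Chromosome 21/22": 35,
    "10 chromosomes": 5,
    "Whole genome": 35,
}

for design, r in quoted_resolution_bp.items():
    print(f"{design}: R ~ {r} bp, {spacing_relation(r)}")

# Cross-check of the quoted R values from the approximate counts.
print(implied_resolution(35e6, 1e6))    # Chromosome 21/22
print(implied_resolution(1.7e9, 50e6))  # Whole genome
```

At R ~ 35 bp, roughly 10 bp between consecutive probes goes unsampled, while at R ~ 5 bp consecutive probes share about 20 bp of sequence.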
Strategy
[Figure: samples for Chromatin Structure (in vivo DNA/Protein Interactions), DNA Methylation (in vitro DNA/Protein Interactions), and Transcriptome Analysis -> Label and Hybridize Samples to Tiling Microarrays -> Biostatistical Analysis to Identify Genomic Regions of Interest]
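The slide does not say which biostatistical method is used to identify regions of interest. As one generic illustration, not the author's method, the sketch below smooths per-probe scores with a sliding-window median and merges consecutive probes above a threshold into regions. The window size, threshold, scores, and function names are all hypothetical.

```python
from statistics import median

def windowed_median(positions, scores, half_window_bp):
    """Median score of all probes within half_window_bp of each probe."""
    smoothed = []
    for center in positions:
        in_window = [s for p, s in zip(positions, scores)
                     if abs(p - center) <= half_window_bp]
        smoothed.append(median(in_window))
    return smoothed

def call_regions(positions, smoothed, threshold):
    """Merge consecutive probes above threshold into (start, end) pairs."""
    regions = []
    start = prev = None
    for pos, value in zip(positions, smoothed):
        if value > threshold:
            if start is None:
                start = pos
            prev = pos
        elif start is not None:
            regions.append((start, prev))
            start = None
    if start is not None:
        regions.append((start, prev))
    return regions

# Hypothetical probes at 35 bp spacing with made-up scores.
positions = list(range(0, 350, 35))
scores = [0.1, 0.0, 2.5, 3.0, 2.8, 0.2, 0.1, 4.0, 0.0, 0.1]

smoothed = windowed_median(positions, scores, half_window_bp=35)
print(call_regions(positions, smoothed, threshold=1.0))
```

The median is used here because a single outlying probe (the isolated 4.0 in the example) does not produce a region on its own, whereas a run of neighboring high-scoring probes does.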
ChIP-Chip for “in vivo” DNA-protein interactions
IP sample: Crosslink -> Lyse & Sonicate -> IP -> Reverse crosslinks -> Amplify -> Label/hybridize
Total sample: Crosslink -> Lyse & Sonicate -> Reverse crosslinks -> Amplify -> Label/hybridize
Other controls for IP (e.g., no antibody, non-specific antibody)
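Once both the IP and Total samples have been hybridized, a common way to compare them is a per-probe log2 ratio of IP signal to Total signal. The slide does not specify the comparison used, so the sketch below is an assumption; the intensities, the pseudocount, and the function name are hypothetical.

```python
import math

def log2_enrichment(ip_signal, total_signal, pseudocount=1.0):
    """Per-probe log2(IP / Total), with a pseudocount to avoid log(0)."""
    if len(ip_signal) != len(total_signal):
        raise ValueError("IP and Total must cover the same probes")
    return [math.log2((ip + pseudocount) / (tot + pseudocount))
            for ip, tot in zip(ip_signal, total_signal)]

# Hypothetical probe intensities for four probes.
ip = [127.0, 1023.0, 255.0, 63.0]
total = [127.0, 127.0, 255.0, 127.0]

for i, ratio in enumerate(log2_enrichment(ip, total)):
    print(f"probe {i}: log2(IP/Total) = {ratio:+.1f}")
```

Positive values mark probes where the IP sample is enriched relative to Total. The same ratio can be computed against the no-antibody or non-specific-antibody controls to check that enrichment depends on the antibody.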
Current limitations for ChIP-Chip
Process is very inefficient and requires large amounts of input material
Sonication step can be quite variable and cannot be easily quality controlled with small amounts of starting material
Currently difficult to perform on clinical specimens
Genome-wide, high-resolution DNA methylation detection by taking advantage of tiling arrays and DNA-protein interactions in vitro
Middle ground: pool different restriction enzyme digests
Dynamics of amplification and fold enrichment…
Fold enrichment dependent on:
Amount of each species after enrichment
Total amount of all enriched species
[Figure: two enriched samples (Enrich) and one Total sample, each amplified to 20]
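The dependence of fold enrichment on the total amount of enriched material can be illustrated with a toy model. It assumes that amplification preserves relative proportions and brings every sample to the same fixed total (the "20" on the slide, whose units are not stated). The species names, amounts, and function names are hypothetical.

```python
def amplify_to(amounts, target_total):
    """Scale all species proportionally so they sum to target_total."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("sample contains no material")
    return {name: target_total * a / total for name, a in amounts.items()}

def observed_fold_enrichment(enriched, total_sample, target_total=20.0):
    """Ratio of amplified enriched amount to amplified total amount."""
    enr = amplify_to(enriched, target_total)
    tot = amplify_to(total_sample, target_total)
    return {name: enr[name] / tot[name]
            for name in enr if tot.get(name, 0) > 0}

# Total sample: four species in equal amounts.
total_sample = {"A": 10.0, "B": 10.0, "C": 10.0, "D": 10.0}

# Species A is recovered equally well in both enrichments.
only_a = {"A": 8.0, "B": 0.0, "C": 0.0, "D": 0.0}
a_and_b = {"A": 8.0, "B": 8.0, "C": 0.0, "D": 0.0}

print(observed_fold_enrichment(only_a, total_sample)["A"])
print(observed_fold_enrichment(a_and_b, total_sample)["A"])
```

In this model the same recovered amount of A yields a fold enrichment of 4 when A is the only enriched species but only 2 when B is enriched alongside it, because amplification to a fixed total divides the signal among all enriched species.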
Ongoing and future work
[Figure: Cancer and Normal samples profiled for DNA Modifications (e.g. Methylation), Gene Transcriptional Changes, and DNA-Protein Interactions; each data type passes through Preprocessing and Analysis, and the results feed a Meta-Analysis]