I am going to spend a few minutes illustrating how existing and emerging high-throughput genomic technologies are being used to understand cancer, a mind-numbingly complex and dysregulated biological process.
Since Knudson’s famous two-hit hypothesis, our understanding of cancer as a genetic disease has progressed to the realization that cancer is rarely the result of a single gene gone awry. Instead, it probably reflects a complex interaction of multiple processes in the genome, including altered copy number, gene expression, transcriptional regulation, chromatin modification, sequence variation, and DNA methylation. To improve patient outcomes, we need to understand not only which genes are involved in a given type of cancer, but also how these other processes affect gene regulation. In short, an integrated view of the cancer genome is necessary, and it is now becoming possible.
The first human karyotypes were produced in 1956. Shown here is a comparison of a normal female karyotype with one from a tumor. By 1960, a karyotype of a cancer genome had revealed the Philadelphia chromosome. It is now known to create the BCR-ABL fusion gene, yet it was decades before a drug targeting the fusion product, Gleevec (imatinib), became available; it was approved by the FDA in 2001. By applying high-throughput microarray technologies, the Cancer Genetics Branch is striving to make observations of the cancer genome that will deepen our understanding of cancer biology, yield prognostic and diagnostic markers that improve patient-specific treatment, and reveal promising targets for directed drug therapy.
The technology for measuring genomic copy number is quite simple. DNA extracted from a tumor is labeled with a red fluorochrome, and normal DNA is labeled with a green one. The two samples are hybridized together to a microarray slide containing probes, each of which binds DNA from a specific region of the genome. A scanner measures the intensities in the red and green channels, giving a precise measure of the relative amounts of tumor and normal DNA at each spot. Spots that are more red indicate a relative excess of tumor DNA, that is, regions gained or amplified in the tumor, while spots that are more green indicate a relative loss of DNA in the tumor. By ordering the probes along the chromosomes, we can begin to define regions of DNA copy number change that could contain oncogenes or tumor suppressor genes.
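A minimal Python sketch of the ratio step just described, assuming the scanner output is already available as per-probe red (tumor) and green (normal) intensities. The pseudocount, the median centering, and the ±0.3 log2 thresholds are illustrative assumptions, not the normalization or calling rules used for the arrays shown here.

```python
import numpy as np

def log2_ratios(red, green, pseudocount=1.0):
    """Per-probe log2(tumor/normal) from red (tumor) and green (normal) intensities."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    ratio = np.log2((red + pseudocount) / (green + pseudocount))
    # Assumption: most probes are at normal copy number, so center on the median.
    return ratio - np.median(ratio)

def call_probes(log2r, gain=0.3, loss=-0.3):
    """Label each probe 'gain', 'loss' or 'neutral' using fixed (hypothetical) thresholds."""
    log2r = np.asarray(log2r, dtype=float)
    return np.where(log2r >= gain, "gain", np.where(log2r <= loss, "loss", "neutral"))

red = [1000, 1010, 4000, 990, 250, 1005]      # toy tumor intensities
green = [1000, 1000, 1000, 1000, 1000, 1000]  # toy normal intensities
print(call_probes(log2_ratios(red, green)).tolist())
# ['neutral', 'neutral', 'gain', 'neutral', 'loss', 'neutral']
```

Median centering assumes that most of the genome is at normal copy number; in a heavily rearranged tumor genome that assumption can shift every call.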
Doing so results in a very high-resolution map of, in this case, chromosome 8. On the left is what is typically seen in normal germline DNA; on the right is a cancer chromosome. In red are regions of copy number increase, some with as many as 20 to 30 copies. In green are regions of relative loss in the tumor, which may represent loss of one copy (and hence loss of heterozygosity, LOH) or complete deletion. In this view, probes are spaced on average every 75 kb across the genome, or between 1.5 and 2 probes per gene. At this resolution, we can quickly determine the breakpoints of these copy number changes and identify the genes involved in a given amplification or deletion. Of course, the figures here represent only one chromosome.
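Finding breakpoints amounts to segmenting each chromosome's ordered log2 ratios into runs of constant copy number. The sketch below uses plain recursive binary segmentation, a simplified relative of published approaches such as circular binary segmentation (CBS); the `min_gain` and `min_size` stopping rules, the uniform 75 kb probe spacing, and the toy data are assumptions chosen for illustration.

```python
import numpy as np

def best_split(x, min_size):
    """Index and SSE reduction of the best single split of x into two constant segments."""
    total_sse = np.sum((x - x.mean()) ** 2)
    best_i, best_gain = None, 0.0
    for i in range(min_size, len(x) - min_size + 1):
        left, right = x[:i], x[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if total_sse - sse > best_gain:
            best_i, best_gain = i, total_sse - sse
    return best_i, best_gain

def binary_segmentation(x, min_gain=1.0, min_size=3, offset=0):
    """Recursively split x; return sorted probe indices where a new segment starts."""
    x = np.asarray(x, dtype=float)
    if len(x) < 2 * min_size:
        return []
    i, gain = best_split(x, min_size)
    if i is None or gain < min_gain:
        return []  # no split reduces the error enough: treat x as one segment
    return (binary_segmentation(x[:i], min_gain, min_size, offset)
            + [offset + i]
            + binary_segmentation(x[i:], min_gain, min_size, offset + i))

# Toy chromosome: normal, gained, normal, lost (noise-free for clarity)
log2r = [0.0] * 10 + [1.5] * 6 + [0.0] * 8 + [-1.0] * 6
probe_pos = np.arange(len(log2r)) * 75_000  # hypothetical uniform 75 kb spacing
breaks = binary_segmentation(log2r)
print(breaks, probe_pos[breaks].tolist())
# [10, 16, 24] [750000, 1200000, 1800000]
```

On real, noisy data the stopping threshold controls the trade-off between missing small changes and splitting on noise.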
Zooming out to look at the whole genome at once, a comparison of two normal genomes, with normal female DNA in red and normal male DNA in green, shows the expected differences on the X and Y chromosomes. Comparing that to a single breast cancer genome reveals the richness of the data that we are producing. Nearly every chromosome shows some copy number alteration that can be mapped to the genome to produce lists of candidate genes. But with so many alterations, it helps to consider multiple genomes at once: copy number changes that recur in multiple samples are more likely to be biologically important, rather than simply a byproduct of an unstable cancer genome.
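Producing candidate gene lists is, at its core, an interval-overlap problem: each altered segment is intersected with gene coordinates. The sketch below assumes half-open, 0-based coordinates; the segment labels, gene names, and coordinates are hypothetical.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    label: str

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def candidate_genes(segments, genes):
    """Map each altered segment to the genes it overlaps (naive O(n*m) scan)."""
    return {s.label: [g.label for g in genes if overlaps(s, g)] for s in segments}

segments = [Interval("chr8", 1_000_000, 3_000_000, "gain_1"),
            Interval("chr8", 5_000_000, 5_500_000, "loss_1")]
genes = [Interval("chr8", 900_000, 1_100_000, "GENE_A"),
         Interval("chr8", 2_000_000, 2_050_000, "GENE_B"),
         Interval("chr8", 4_000_000, 4_100_000, "GENE_C"),
         Interval("chr8", 5_400_000, 5_600_000, "GENE_D"),
         Interval("chr17", 1_500_000, 1_600_000, "GENE_E")]
print(candidate_genes(segments, genes))
# {'gain_1': ['GENE_A', 'GENE_B'], 'loss_1': ['GENE_D']}
```

A naive scan is fine for a handful of segments; genome-wide, an interval tree or a sorted sweep would be the usual choice.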
This figure summarizes copy number measurements in 46 breast cancer cell lines, showing the frequency of copy number changes along chromosome 17. The right-hand y-axis gives the percentage of samples showing copy number gain (in red) and copy number loss (in green). I have marked the location of a gene known to be important in breast cancer, ERBB2 (also known as HER2). ERBB2 is amplified in 10% to 40% of breast cancers, consistent with our own estimate from the cell lines, and its amplification is associated with poor prognosis. It is now the target of a directed monoclonal antibody therapy (trastuzumab). It is enticing to think that the other peaks of copy number change hide multiple other potential drug targets. With such a high-resolution overview of the breast cancer genome, we can begin to dissect each region to determine what those genes might be.
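The frequency plot amounts to asking, probe by probe, what fraction of samples exceed a gain or loss threshold. A sketch under assumed ±0.3 log2 thresholds, with a small toy matrix standing in for the 46 cell lines:

```python
import numpy as np

def alteration_frequency(log2r, gain=0.3, loss=-0.3):
    """log2r is a samples x probes matrix; returns (% gained, % lost) per probe."""
    log2r = np.asarray(log2r, dtype=float)
    pct_gain = 100.0 * np.mean(log2r >= gain, axis=0)  # fraction of rows above threshold
    pct_loss = 100.0 * np.mean(log2r <= loss, axis=0)
    return pct_gain, pct_loss

# Toy data: 4 samples x 3 probes
log2r = [[1.2, 0.0, -0.5],
         [0.8, 0.1, 0.0],
         [0.0, 0.0, -0.6],
         [0.1, -0.1, 0.0]]
gain_pct, loss_pct = alteration_frequency(log2r)
print(gain_pct.tolist(), loss_pct.tolist())
# [50.0, 0.0, 0.0] [0.0, 0.0, 50.0]
```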
To dissect some of these regions in more detail, we can use a new, extremely flexible and powerful technology: tiling arrays. They work on the same principle as the microarrays I have already described, except that we can design them to cover selected portions of the genome at extraordinary resolution. Continuing with the ERBB2 example from the last slide, we can zoom in on the ERBB2 gene. Zooming in again, we now see exons (in blue) connected by introns. We choose probes spaced throughout the region, covering both exons and introns. The technology has progressed to the point that we can measure copy number at 400,000 probes on a single array, at whatever spacing we choose.
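Tiling-array design can be thought of as choosing evenly spaced probe positions across a region and noting which fall in exons versus introns. This sketch assumes 50-bp probes every 100 bp and made-up exon coordinates; the only figure taken from the text is the 400,000-probe budget, which at 100-bp spacing would tile about 40 Mb.

```python
def tile_region(start, end, step, probe_len):
    """Start coordinates of probes placed every `step` bp, each lying within [start, end)."""
    return list(range(start, end - probe_len + 1, step))

def label_probes(probes, probe_len, exons):
    """'exon' if a probe overlaps any (start, end) exon interval, else 'intron'."""
    labels = []
    for p in probes:
        hit = any(p < e_end and e_start < p + probe_len for e_start, e_end in exons)
        labels.append("exon" if hit else "intron")
    return labels

def bases_covered(n_probes, step):
    """Length of sequence a fixed probe budget can tile at a given spacing."""
    return n_probes * step

exons = [(120, 300), (700, 760)]        # hypothetical exon coordinates
probes = tile_region(0, 1000, 100, 50)  # 50-mer probes every 100 bp (illustrative)
print(label_probes(probes, 50, exons))
# ['intron', 'exon', 'exon', 'intron', 'intron', 'intron', 'intron', 'exon', 'intron', 'intron']
print(bases_covered(400_000, 100))      # 400,000 probes every 100 bp
# 40000000
```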
ZNF217 is a candidate oncogene located at chromosome 20q13; when overexpressed in human mammary epithelial cells, it is sufficient to immortalize them. Here I show how integrating gene copy number, gene expression, and other genomic information, namely evolutionary conservation, helps to frame our observations. The gene is amplified and shows expression at the exons, as expected, but it also shows expression outside the exons that roughly correlates with areas of evolutionarily conserved sequence. Interestingly, there is also expression on the opposite strand that is not accounted for by any known transcribed element (gene or otherwise). Observations of aberrant transcription outside of exons, and of transcription not associated with any known gene, show the need for more observations at this level of detail. They also raise questions that demand further experiments at this resolution, to understand the processes that produce these phenomena and their biological importance.
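One way to quantify the observation that non-exonic expression tracks conservation is to restrict attention to probes outside annotated exons and correlate their expression with a per-probe conservation score. The helper name, the expression threshold, and the toy values below are hypothetical; the analysis behind the slide may well have differed.

```python
import numpy as np

def nonexonic_signal(expr, in_exon, conservation, expr_threshold=1.0):
    """Return (indices of expressed non-exonic probes, Pearson r of expression
    vs. conservation over all non-exonic probes)."""
    expr = np.asarray(expr, dtype=float)
    cons = np.asarray(conservation, dtype=float)
    outside = ~np.asarray(in_exon, dtype=bool)          # probes not in any annotated exon
    expressed = outside & (expr >= expr_threshold)
    r = np.corrcoef(expr[outside], cons[outside])[0, 1]
    return np.flatnonzero(expressed).tolist(), float(r)

# Toy probe-level data (values are made up)
expr = [5.0, 0.2, 3.1, 0.1, 2.8, 6.0, 0.3]
in_exon = [True, False, False, False, False, True, False]
conservation = [0.9, 0.1, 0.8, 0.05, 0.7, 0.95, 0.2]
idx, r = nonexonic_signal(expr, in_exon, conservation)
print(idx, round(r, 2))  # indices [2, 4]; r is strongly positive for this toy data
```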
To this end, the Cancer Genetics Branch is actively developing and using these technologies to examine other aspects of the cancer genome. Again, the goal is to produce an integrated view of the cancer genome in unprecedented detail, and to distill from that view genes that are therapeutically important and tractable, observations of prognostic or diagnostic value, and, in the process, answers to questions of intrinsic biological interest.
High-Resolution Views of the Cancer Genome
Sean Davis, M.D., Ph.D.
Genetics Branch, Center for Cancer Research
National Cancer Institute, National Institutes of Health
Simultaneous Gene Expression and Copy Number on Tiling Arrays
Figure tracks: Annotated Genes; Expression; Copy Number, Sample 1; Copy Number, Sample 2
Simultaneous measurement of copy number in two samples and gene expression in one sample, overlaid on a map of genes in the region
Within the small amplicon, not all genes show increased expression, giving clues as to which genes in the region are biologically important
Spikes of expression at the exons of ERBB2
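Integrating the two tracks gene by gene makes the point about the small amplicon concrete: averaging copy number and expression over each gene's probes separates amplified genes that are overexpressed from those that are merely amplified. Gene names, coordinates, and cutoffs here are hypothetical.

```python
import numpy as np

def gene_summary(probe_pos, cn, expr, genes):
    """Mean copy number and expression over the probes inside each gene.
    Genes with no probes are skipped."""
    probe_pos = np.asarray(probe_pos)
    cn = np.asarray(cn, dtype=float)
    expr = np.asarray(expr, dtype=float)
    summary = {}
    for name, (start, end) in genes.items():
        inside = (probe_pos >= start) & (probe_pos < end)
        if inside.any():
            summary[name] = (float(cn[inside].mean()), float(expr[inside].mean()))
    return summary

def amplified_and_expressed(summary, cn_min=0.5, expr_min=1.0):
    """Genes whose mean log2 copy number and mean expression both pass the cutoffs."""
    return [g for g, (c, e) in summary.items() if c >= cn_min and e >= expr_min]

pos = np.arange(0, 1000, 50)
cn = np.where((pos >= 200) & (pos < 600), 1.0, 0.0)    # toy amplicon
expr = np.where((pos >= 200) & (pos < 400), 2.0, 0.0)  # toy expression
genes = {"GENE_W": (0, 150), "GENE_X": (200, 400),
         "GENE_Y": (450, 600), "GENE_Z": (700, 900)}
print(amplified_and_expressed(gene_summary(pos, cn, expr, genes)))
# ['GENE_X']
```

Here GENE_Y sits in the amplicon but is not overexpressed, the kind of gene the integrated view lets us deprioritize.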
Simultaneous Expression and Copy Number
Figure tracks: Copy Number; Expression; Evolutionary Conservation; Opposite Strand Expression
Growth in Density Over Time <ul><li>1997: 2,000 spots </li><li>2000: 8,000 spots </li><li>2003: 36,000 spots </li><li>2004: 85,000 to 390,000 spots </li><li>2005: 10,000,000 beads </li></ul><ul><li>And these numbers are from only a single array! </li></ul><ul><li>Excel doesn't work! </li></ul>
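As a back-of-the-envelope check, the figures listed above imply a doubling time of under a year for spotted arrays. This sketch uses only the slide's numbers (taking the upper end of the 2004 range and leaving out the 2005 bead count); the Excel comparison assumes that "Excel doesn't work!" refers to the 65,536-row worksheet limit of Excel 97-2003, which is our reading rather than something the slide states.

```python
import math

# (year, spots per array) from the slide; the 2004 range is represented by its upper
# bound, and the 2005 figure is omitted because it counts beads, not spots
density = [(1997, 2_000), (2000, 8_000), (2003, 36_000), (2004, 390_000)]

def doubling_time(y0, n0, y1, n1):
    """Years per doubling, assuming steady exponential growth between two points."""
    return (y1 - y0) / math.log2(n1 / n0)

(y0, n0), (y1, n1) = density[0], density[-1]
print(round(doubling_time(y0, n0, y1, n1), 2))  # about 0.92 years

# Assumed reading of "Excel doesn't work!": Excel 97-2003 worksheets hold 65,536 rows
EXCEL_2003_MAX_ROWS = 65_536
print([year for year, n in density if n > EXCEL_2003_MAX_ROWS])  # [2004]
```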
Why did the chicken cross the road? Darwin1: It was the logical next step after coming down from the trees. Darwin2: The fittest chickens cross the road.
Sequencing <ul><li>Why use hybridization, which is only an indirect measure of sequence, rather than sequencing directly? </li><ul><li>Traditional sequencing is costly, time- and labor-intensive, and inefficient </li></ul><li>Next-generation sequencing technology changes the equation: sequencing can now be more efficient, cheaper, and less time- and labor-intensive than hybridization-based methods such as microarrays </li></ul>
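To make the contrast with hybridization concrete: with sequencing, copy number can be estimated by counting reads in fixed genomic bins and comparing tumor with normal. The sketch below is a simplified read-depth approach with toy read positions; the bin size, pseudocount, and median centering are assumptions, and real data would also need corrections (for example, for GC content and mappability) that are omitted here.

```python
import numpy as np

def binned_counts(read_starts, chrom_len, bin_size):
    """Number of read start positions falling in each fixed-size bin."""
    edges = np.arange(0, chrom_len + bin_size, bin_size)
    counts, _ = np.histogram(read_starts, bins=edges)
    return counts

def depth_log2_ratio(tumor_counts, normal_counts, pseudocount=0.5):
    """Library-size-normalized, median-centered log2(tumor/normal) per bin."""
    t = np.asarray(tumor_counts, dtype=float)
    n = np.asarray(normal_counts, dtype=float)
    ratio = np.log2(((t + pseudocount) / t.sum()) / ((n + pseudocount) / n.sum()))
    return ratio - np.median(ratio)

normal_reads = np.repeat(np.arange(5, 1000, 100), 10)  # 10 reads in each 100-bp bin
tumor_reads = np.concatenate([normal_reads, np.repeat([305, 405], 10)])  # extra reads in bins 3-4
r = depth_log2_ratio(binned_counts(tumor_reads, 1000, 100),
                     binned_counts(normal_reads, 1000, 100))
print(np.round(r, 2).tolist())  # bins 3 and 4 stand out; the rest sit at 0
```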
Chromatin <ul><li>Chromatin is the complex of protein and DNA that makes up the chromosomes. It is not a static structure. </li></ul><ul><li>Nucleosomes are the basic building blocks of chromatin structure. Their positions on the genome, and how their placement is regulated, are not well understood. </li></ul>
<ul><li>DNase is an enzyme that cuts DNA where the DNA is accessible </li><li>These “accessible” regions have been associated with open chromatin </li><li>Open chromatin gives the transcriptional and regulatory machinery access to genes and their neighborhoods, which is necessary for transcription </li></ul>
DNase Hypersensitivity <ul><li>Method for finding regions of “open” chromatin </li><li>In data published by the ENCODE consortium, DNase hypersensitive (HS) sites were shown to be correlated with: </li><ul><li>Histone modifications </li><li>Transcription factor binding sites (experimentally determined by ChIP-chip, etc.) </li></ul></ul>Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Consortium. Nature, 2007.
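A simple way to find hypersensitive sites from sequenced DNase cut positions is to count cuts in fixed windows and flag windows that are improbably dense under a background model. This sketch assumes a single genome-wide Poisson background, a hypothetical significance cutoff, and toy cut positions; published DNase-seq analyses use more careful, locally estimated backgrounds.

```python
import math
import numpy as np

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - CDF (adequate for a sketch)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def hypersensitive_windows(cut_sites, region_len, window, alpha=1e-3):
    """Start coordinates of windows with more DNase cuts than a Poisson background predicts."""
    edges = np.arange(0, region_len + window, window)
    counts, _ = np.histogram(cut_sites, bins=edges)
    lam = counts.mean()  # assumption: most windows are closed chromatin
    return [int(edges[i]) for i, c in enumerate(counts) if poisson_sf(int(c), lam) < alpha]

background = np.repeat(np.arange(250, 10_000, 500), 2)  # 2 cuts per 500-bp window
hotspot = np.full(30, 3_100)                             # 30 extra cuts in one window
print(hypersensitive_windows(np.concatenate([background, hotspot]), 10_000, 500))
# [3000]
```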
Nucleosome Positioning <ul><li>Distances between sequence reads in non-DNase HS regions show an oscillating pattern whose period corresponds to a single turn of the double helix </li><li>DNase is known to cut preferentially in the minor groove, which is exposed every ~10.4 bases when DNA is wrapped around a nucleosome </li><li>About 147 base pairs of DNA wrap around each nucleosome </li><li>Implication: nucleosomes are positioned in a highly organized, precise manner </li></ul>
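The periodicity argument can be illustrated with a toy model: place DNase cuts only where the minor groove is exposed (every ~10.4 bp along 147 bp of nucleosome-wrapped DNA), collect pairwise distances between nearby cuts, and ask which period the distances are most consistent with. The 53-bp linker, the 60-bp distance cutoff (kept below the shortest between-nucleosome distance in this model), and the phase-coherence score are assumptions of the sketch, not the analysis behind the slide.

```python
import numpy as np

HELICAL_REPEAT = 10.4  # bp between exposed minor grooves (from the slide)
NUCLEOSOME_BP = 147    # bp wrapped per nucleosome (from the slide)
LINKER_BP = 53         # hypothetical linker length, for illustration only

def simulate_cut_sites(n_nucleosomes):
    """Toy model: cuts occur only where the minor groove is exposed, every ~10.4 bp
    along each nucleosome-wrapped stretch of DNA."""
    n_sites = int(NUCLEOSOME_BP // HELICAL_REPEAT)
    offsets = np.round(HELICAL_REPEAT * np.arange(n_sites)).astype(int)
    starts = np.arange(n_nucleosomes) * (NUCLEOSOME_BP + LINKER_BP)
    return (starts[:, None] + offsets[None, :]).ravel()

def pairwise_distances(positions, max_dist):
    """All distances (bp) between pairs of cut sites no more than max_dist apart."""
    pos = np.sort(np.asarray(positions))
    out = []
    for i, p in enumerate(pos):
        j_end = np.searchsorted(pos, p + max_dist, side="right")
        out.append(pos[i + 1:j_end] - p)
    return np.concatenate(out)

def dominant_period(dists, candidates):
    """Candidate period at which the distances are most phase-coherent
    (|mean(exp(2*pi*i*d/p))| approaches 1 when distances are multiples of p)."""
    scores = [np.abs(np.mean(np.exp(2j * np.pi * dists / p))) for p in candidates]
    return float(candidates[int(np.argmax(scores))])

dists = pairwise_distances(simulate_cut_sites(50), max_dist=60)
print(round(dominant_period(dists, np.arange(8.0, 13.0, 0.1)), 1))  # close to 10.4
```

The candidate range is deliberately limited to 8 to 13 bp so that harmonics of the helical repeat (about 5.2 or 20.8 bp) cannot win.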
Diagram: Phenotype; Gene Copy Number; Sequence Variation; Chromatin Modification; Gene Expression; Transcriptional Regulation; DNA Methylation
Public Data <ul><li>NCBI Gene Expression Omnibus (GEO) </li><ul><li>250,000 microarray experiments already done! </li></ul><li>NCBI Short Read Archive (SRA) </li><ul><li>Compendium of sequencing experiments utilizing next-generation sequencing technologies </li></ul><li>GWAS databases </li><li>Databases of gene and protein function and interactions </li></ul>
Challenges <ul><li>Most of these technologies are still quite expensive and are not easily adapted to clinical laboratory settings </li><li>Designing studies that evaluate the operating characteristics (such as sensitivity and specificity) of new testing methods is costly and requires appropriate patient populations </li><li>There are many ethical concerns about the enormous amount of personal information that might be gleaned from genomic technologies applied in the clinical setting </li></ul>
The Biggest Challenge? How do we integrate all the disparate pieces of information, collected longitudinally and from many sources, to improve the health of the individual?
One day the zookeeper noticed that the orangutan was reading two books: the Bible and Darwin's The Origin of Species. Surprised, he asked the ape, "Why are you reading both of those books?" "Well," said the orangutan, "I just wanted to know if I was my brother's keeper or my keeper's brother."