Study of chromosomal aberrations often found in cancer and developmental abnormalities.
Study of variations in the baseline sequence in a microbial population (microbial comparative genomics).
PhD defense, October 27th 2006
A Variety of Genetic Alterations Underlie Developmental Abnormalities and Disease
Inappropriate gene activation or inactivation can be caused by:
Epigenetic gene silencing (e.g. addition of methyl groups)
Reciprocal translocation (exchange of fragments between two non-homologous chromosomes)
Gain or loss of genetic material
Any of the above may lead to oncogene activation or to inactivation of a tumor suppressor.
Existing techniques for detecting structural abnormalities (Albertson and Pinkel, Human Molecular Genetics, 2003)
Some microarray platforms for copy number analysis
Affymetrix SNP chip (500 K)
Representational oligonucleotide microarray analysis (ROMA) in
Whole genome tiling arrays
Own design (NimbleGen/NimbleExpress)
Array CGH: BAC arrays
HumArray3.1: 2464 human BAC clones spotted in triplicate
[Figure labels: 12 mm; 164-196 kbp]
Array CGH Maps DNA Copy Number Alterations to Positions in the Genome
[Figure: test genomic DNA, reference genomic DNA and Cot-1 DNA; ratio plotted against position on sequence, showing loss and gain of DNA copies in tumor]
Example: Detection of DiGeorge region
(A) Detection of a deletion in the DiGeorge region by FISH. A chromosome 22 subtelomere probe (green) and the TUPLE1 probe for the DiGeorge region (red) were hybridized to metaphase chromosomes from a normal individual and an individual with the deletion. The arrow indicates the missing red FISH signal on the deleted chromosome.
(B) Array CGH copy number profile of chromosome 22 showing the deletion in the DiGeorge region (arrow).
(Albertson and Pinkel, Human Molecular Genetics, 2003)
Structural abnormalities (Albertson and Pinkel, Human Molecular Genetics, 2003)
HSR: homogeneously staining region
Tumor Genomes are Stable
Copy Number Profiles of a Tumor & Recurrence
Analysis of array CGH
Goal: To partition the clones into sets with the same copy number and to characterize the genomic segments in terms of copy number.
Biological model: genomic rearrangements lead to gains or losses of sizable contiguous parts of the genome, possibly spanning entire chromosomes, or, alternatively, to focal high-level amplifications.
Exercise Part I: Plot and view array CGH data DNA Microarray Analysis Course, 2007
Observed clone value and spatial coherence
DNA Microarray Analysis Course, 2006
[Figure: two clone value distributions, N(-0.3, 0.08^2) and N(0.6, 0.1^2), with ambiguous clones between them marked "?"]
It is useful to exploit the physical dependence of nearby clones, which translates into copy number dependence.
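The benefit of spatial coherence can be illustrated with a small sketch. It takes the two distributions from the slide, N(-0.3, 0.08^2) and N(0.6, 0.1^2), and assumes (this is not stated in the slides) equal prior weights, independent noise, and that all clones passed to the function share one copy number state. The function name posterior_gain is hypothetical.

```python
import math

# (mean, standard deviation) of the log2 ratio in each state, taken from the slide
LOSS = (-0.3, 0.08)
GAIN = (0.6, 0.1)

def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)

def posterior_gain(values, prior_gain=0.5):
    """Posterior probability of the gain state for clones assumed to share one state."""
    log_gain = math.log(prior_gain)
    log_loss = math.log(1.0 - prior_gain)
    for v in values:
        log_gain += normal_logpdf(v, *GAIN)
        log_loss += normal_logpdf(v, *LOSS)
    # Normalize in log space to avoid underflow
    top = max(log_gain, log_loss)
    gain = math.exp(log_gain - top)
    loss = math.exp(log_loss - top)
    return gain / (gain + loss)

# A clone at 0.1 alone is ambiguous; together with two neighbours it is not
alone = posterior_gain([0.1])
with_neighbours = posterior_gain([0.1, 0.55, 0.62])
```

Under these assumptions, a value that falls between the two distributions cannot be assigned confidently on its own, while adding its neighbours resolves the state.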
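Expected log2 ratio as a function of copy number change, normal cell contamination and ploidy
[Table: expected log2 ratios for reference ploidy = 2 and reference ploidy = 3 at tumor cell fractions of 100%, 50% and 10%. Extracted values (cell positions not recoverable): 0.58, 0.07, 0.58, 2.58, 2.0, 0.38, 0.0, 0.42]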
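The dependence of the expected log2 ratio on copy number, contamination and ploidy can be sketched with a simple mixture model. The formula below is an assumption, not taken from the slides: the sample is a mix of tumor cells and normal diploid cells, and the ratio is taken against a fixed baseline copy number (the reference ploidy). The function and parameter names are hypothetical.

```python
import math

def expected_log2_ratio(tumor_copies, tumor_fraction=1.0,
                        baseline_copies=2.0, normal_copies=2.0):
    """Expected log2 ratio under an assumed linear mixture of tumor and normal cells."""
    # Average copy number in the mixed sample
    mixed = tumor_fraction * tumor_copies + (1.0 - tumor_fraction) * normal_copies
    # Ratio against the baseline (reference ploidy)
    return math.log2(mixed / baseline_copies)

pure_gain = expected_log2_ratio(3)                          # single-copy gain, pure tumor
diluted_gain = expected_log2_ratio(3, tumor_fraction=0.1)   # same gain, 10% tumor cells
triploid_gain = expected_log2_ratio(4, baseline_copies=3)   # 4 copies against ploidy 3
```

Under these assumptions a single-copy gain gives about 0.58 in a pure sample but only about 0.07 when 10% of the cells are tumor cells, and 4 copies against a baseline of 3 gives about 0.42.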
Circular binary segmentation (CBS) makes tertiary splits of the chromosomes into contiguous regions of equal copy number and assesses the significance of the proposed splits by using a permutation reference distribution (Olshen et al, Biostatistics, 2004).
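The idea of proposing a split and judging it against a permutation reference distribution can be sketched as follows. This is a deliberately simplified single change-point version, not CBS itself and not the DNAcopy implementation: it looks for one binary split rather than circular splits, and the function names best_split and split_p_value are hypothetical.

```python
import math
import random

def best_split(values):
    """Return (index, statistic) for the split values[:index] / values[index:]
    that maximizes the scaled absolute difference in means."""
    n = len(values)
    total = sum(values)
    best_index, best_stat = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        mean_left = left_sum / i
        mean_right = (total - left_sum) / (n - i)
        stat = abs(mean_left - mean_right) / math.sqrt(1.0 / i + 1.0 / (n - i))
        if stat > best_stat:
            best_index, best_stat = i, stat
    return best_index, best_stat

def split_p_value(values, n_permutations=200, seed=0):
    """Permutation p-value for the best split: clone order is shuffled to
    build the reference distribution of the maximal statistic."""
    rng = random.Random(seed)
    index, observed = best_split(values)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if best_split(shuffled)[1] >= observed:
            exceed += 1
    return index, (exceed + 1) / (n_permutations + 1)

# Toy profile: 20 clones near 0 followed by 20 clones near 0.6
profile = [0.05 * ((i % 3) - 1) for i in range(20)]
profile += [0.6 + 0.05 * ((i % 3) - 1) for i in range(20)]
breakpoint_index, p_value = split_p_value(profile)
```

Because the overall variance of the values does not change under permutation, the statistic is left unscaled by the standard deviation in this sketch.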
GLAD: Gain and Loss Analysis of DNA (GLAD package)
Detects chromosomal breakpoints by estimating a piecewise constant function based on adaptive weights smoothing (Hupe et al, Bioinformatics, 2004).
Segment lengths and copy numbers are taken from the empirical distribution observed in breast cancer data (DNAcopy segmentation).
Mixture of cells (sample is not pure)
Each sample was assigned a value P_t, the proportion of tumor cells, drawn from a uniform distribution between 0.3 and 0.7.
Experimental noise is Gaussian
Standard deviations drawn from a uniform distribution between 0.1 and 0.2 to imitate real data where the noise may vary between experiments.
Cancer subtypes are heterogeneous
Certain aberrations characteristic of a cancer subtype may exist in only a percentage of the patients with that subtype. Thus, in each sample, segments with copy number alterations (copy number not 2) were removed at random with probability 30%.
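The simulation scheme described above can be sketched as follows. The segment template is a made-up example (the slides take segment lengths and copy numbers from breast cancer data), and the conversion from copy number to mean log2 ratio uses an assumed linear mixture with normal diploid cells. The function name simulate_sample is hypothetical.

```python
import math
import random

def simulate_sample(segments, rng):
    """Simulate log2 ratios for one sample.

    segments: list of (n_clones, copy_number) pairs describing the template.
    Returns (log2_ratios, tumor_fraction, noise_sd).
    """
    tumor_fraction = rng.uniform(0.3, 0.7)  # P_t, proportion of tumor cells
    noise_sd = rng.uniform(0.1, 0.2)        # per-experiment Gaussian noise level
    ratios = []
    for n_clones, copy_number in segments:
        # Heterogeneity: drop an aberration in this sample with probability 30%
        if copy_number != 2 and rng.random() < 0.3:
            copy_number = 2
        # Assumed mixture of tumor cells and normal diploid cells
        mixed = tumor_fraction * copy_number + (1.0 - tumor_fraction) * 2
        mean = math.log2(mixed / 2.0)
        ratios.extend(rng.gauss(mean, noise_sd) for _ in range(n_clones))
    return ratios, tumor_fraction, noise_sd

# Hypothetical template: normal segment, single-copy gain, single-copy loss
template = [(10, 2), (5, 3), (8, 1)]
ratios, p_t, sd = simulate_sample(template, random.Random(1))
```

Passing a seeded random.Random instance keeps each simulated sample reproducible.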
By repeating the random class assignment and testing, e.g. 100 times, the following "permutation reference distribution" of the maximum absolute test statistic is obtained (maxT distribution):
We wish to control the family-wise error rate (FWER) at alpha = 0.05 (5% chance of at least one false positive). Therefore, the cut-off should be such that in only 5% of the random cases we get a false positive (the 95th percentile of the maxT distribution): cutoff = 5.
[Figure: maxT distribution with the standard significance threshold and the maxT multiple-testing-corrected threshold marked]
[Figure: testing samples (original values), with the standard p-value cutoff for alpha = 0.05 and the maxT p-value cutoff for alpha = 0.05 marked]
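The maxT procedure described above can be sketched in a few lines. This is an illustration under assumptions, not the code used in the course: it uses a Welch t statistic per clone, 100 label permutations, and takes the 95th percentile of the permutation maxima as the cutoff. The function names and the toy data are hypothetical.

```python
import math
import random
import statistics

def t_statistic(row, labels):
    """Welch t statistic comparing samples labelled 0 with samples labelled 1."""
    a = [v for v, lab in zip(row, labels) if lab == 0]
    b = [v for v, lab in zip(row, labels) if lab == 1]
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def max_t_cutoff(data, labels, n_permutations=100, alpha=0.05, seed=0):
    """Cutoff on |t| controlling the FWER at alpha.

    data: one row per clone, one value per sample.
    labels: class label (0 or 1) per sample.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    max_stats = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # random class assignment
        max_stats.append(max(abs(t_statistic(row, shuffled)) for row in data))
    max_stats.sort()
    # (1 - alpha) quantile of the permutation maxima, e.g. the 95th of 100 values
    rank = math.ceil((1.0 - alpha) * n_permutations) - 1
    return max_stats[rank]

# Toy data: 50 clones x 20 samples of pure noise, 10 samples per class
data_rng = random.Random(42)
toy_data = [[data_rng.gauss(0.0, 0.15) for _ in range(20)] for _ in range(50)]
toy_labels = [0] * 10 + [1] * 10
cutoff = max_t_cutoff(toy_data, toy_labels)
```

A clone is then called significant only if its absolute statistic under the original labels exceeds this cutoff, which is stricter than a per-clone threshold because it accounts for the maximum over all clones.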
Numerous methods have been introduced for segmentation of DNA copy number data and breakpoint identification. New methods should be benchmarked against existing ones (only feasible if the software is publicly available).
Currently, CBS (DNAcopy package) has the best overall performance
Use of spatial dependency in the analysis improves testing power on a clone-by-clone basis
Merging of segmentation results improves copy number phenotype characterization
Study of copy number in cancer samples
Study of samples from patients with mental diseases
Comparison of bacterial strains
Questions?
Exercise Part IV + Bonus exercise: Real data analysis DNA Microarray Analysis Course, 2007