1. Introduction Promotor Analysis Identifying genes: Methods Identifying genes: Results
The Identification of Circadian Clock Genes
By Data Mining Microarray Data
Atreyi Banerjee and Martin Hunt
The University of Leicester
June 27, 2008
What is circadian rhythm?
Circadian: circa (about) + dies (a day)
• A circadian rhythm is a self-sustained cycle with a period of about 24 hours that controls rest/activity, time awareness, photosynthesis, etc.
• Common among eukaryotes (Neurospora, Drosophila, mammals)
• The term is reserved for living organisms (daily traffic congestion is not a circadian rhythm)
• Circannual: 1 year period (e.g. migration)
Circadian rhythm properties
• Circadian rhythm properties are conserved across the plant and animal kingdoms
• Basic properties of a circadian rhythm:
  • Endogenous free-running period of about 24 hours
  • Synchronization to external stimuli (entrainment)
  • Period is unchanged with temperature
• Advantage: we can learn from studying simple organisms (Drosophila, Neurospora, mouse)
• Mechanisms are similar but the genes are different
• The main cycling genes: PER, TIM, CLK, CYC, BMAL
Circadian clock control in Drosophila
ADD REFERENCE
Experimental design
• Drosophila were entrained in a 12:12 hour light:dark (LD) cycle
• They were then left in complete darkness and analysed every 4 hours
• The final dataset included replicates of 4 chips at CT0, CT4, CT8, CT12, CT16 and CT20
Promoter analysis
Aim: to detect genes sharing the same regulatory mechanism
• Extract the 5' untranslated region of the genes
• Find the over-represented motifs in the sequences
• Find the cis-regulatory modules (combinations of binding sites) in sets of co-expressed or co-regulated genes
• Obtain the putative transcription factor binding sites (TFBS)
• Functional analysis
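One simple way to ask whether a motif is over-represented in a cluster of co-expressed genes is a hypergeometric test on the number of sequences that contain it. The Python sketch below is only an illustration of that idea, not the statistics used by TOUCAN; the function names and the toy sequences are hypothetical, and an exact string match stands in for a proper motif model.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


def motif_enrichment(cluster_seqs, background_seqs, motif):
    """Count cluster sequences containing `motif` and test for enrichment.

    The cluster is treated as a random draw from the pooled sequences
    (cluster + background); an exact substring match is an assumption
    made to keep the sketch short.
    """
    all_seqs = list(cluster_seqs) + list(background_seqs)
    with_motif = sum(motif in s for s in all_seqs)
    hits = sum(motif in s for s in cluster_seqs)
    p_value = hypergeom_tail(len(all_seqs), with_motif, len(cluster_seqs), hits)
    return hits, p_value


# Hypothetical toy data: three cluster sequences, seven background sequences.
cluster = ["AACACGTGTT", "CACGTGAA", "GGCACGTG"]
background = ["ATATATAT", "GGGGCCCC", "TTTTAAAA", "ACACACAC",
              "GTGTGTGT", "CCCCAAAA", "TGCATGCA"]
hits, p_value = motif_enrichment(cluster, background, "CACGTG")
```

A small p-value suggests the motif appears in more cluster sequences than chance alone would explain. With thousands of candidate motifs the p-values would again need a multiple testing correction.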
Effects of clock mutations on enhancers regulating
circadian gene expression
Stempfl, T. et al. Genetics 2002;160:571-593
TOUCAN software
• An interactive Java display
• Maps genes onto the sequence set space
• Flexibility of using any identifier (Affy ID, EMBL, RefSeq, etc.)
• Performs statistical tests for finding regulatory sequences, selecting parts of sequences, and finding CpG islands in metazoan genomes
Predict instances of known motifs with MotifScanner
The Significant motifs found in each cluster
Predict cis-regulatory modules with MotifSampler
The co-expression of Dorsal 2 and Myf showing
The cis-regulatory modules in each cluster
The cis-regulatory module in genes listed with p-values
List of unknown TFBS found in each cluster
de novo discovery of unknown TFBS
• The MotifSampler tool in TOUCAN was used to find unknown motifs, which could be binding sites for novel transcription factors
• The 5' UTR sequences were also extracted from Ensembl BioMart
• The over-represented TFBS were extracted with MATCH and OTFBS
• Dorsal 2 and Myf were over-represented modules
• ARNT, also found in cycle, an important clock gene, was located
• Genscan-predicted genes in each cluster
Identifying circadian genes: an outline
Microarray experiment
↓
Data (spreadsheet)
↓
Process data in R
↓
Data analysis in R
↓
List of circadian genes
Identifying circadian genes: an outline
Four methods considered, all of which were implemented in R:
GeneCycle based
• The Fisher Method (Wichert et al. 2004)
• The Robust Method (Ahdesmaki et al. 2005)
“Sine wave” based
• The M&R Method (McDonald & Rosbash 2001)
• The Sine Method
The Fisher Method
Implemented by the R package GeneCycle, based on Fourier
methods and Fisher’s g test
Time series: CT0 = 1.2, CT4 = 4.9, CT8 = 9.5, CT12 = 0.4, CT16 = 1.5, CT20 = −42
→ Fisher's g test → p-value = 0.3213
Repeat this process for each time series
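The idea behind Fisher's g test can be sketched in a few lines: compute the periodogram of the time series, take the largest peak as a fraction of the total, and convert that ratio g into an exact p-value. The Python sketch below is a textbook version written for illustration; it is not the GeneCycle code, the function names are hypothetical, and details such as detrending or the treatment of the Nyquist frequency are assumptions, so it is not expected to reproduce the p-value quoted above.

```python
import cmath
import math


def periodogram(series):
    """Spectral power at the Fourier frequencies k = 1 .. floor((N - 1) / 2).

    The mean is removed first; the zero and Nyquist frequencies are left
    out (an assumption of this sketch).
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    m = (n - 1) // 2
    power = []
    for k in range(1, m + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power.append(abs(coeff) ** 2 / n)
    return power


def fisher_g_test(series):
    """Return (g, p_value) for the largest periodogram peak."""
    power = periodogram(series)
    m = len(power)
    total = sum(power)
    if m == 0 or total == 0:
        raise ValueError("series is too short or constant")
    g = max(power) / total
    # Exact null distribution of g: alternating sum up to floor(1 / g).
    upper = min(m, int(1 / g))
    p_value = sum(
        (-1) ** (j - 1) * math.comb(m, j) * (1 - j * g) ** (m - 1)
        for j in range(1, upper + 1)
    )
    return g, min(1.0, max(0.0, p_value))


# Hypothetical noise-free rhythm: 12 samples, one cycle every 6 samples
# (a 24 hour period when sampling every 4 hours).
rhythmic = [math.cos(2 * math.pi * t / 6) for t in range(12)]
g_stat, p_value = fisher_g_test(rhythmic)
```

With only six time points there are just two usable Fourier frequencies, so g cannot fall below 0.5 and the test has little power. That is one reason to compare several methods.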
The Fisher Method: FDR
Oops! We've carried out over 6000 tests, so some small p-values will occur by chance alone.
The solution: false discovery rate (FDR) control, implemented by
the R package fdrtool
Definition
The FDR value is the proportion of false positives we expect to
find among the results we call significant
0.011, 0.021, 0.042, 0.045, 0.056, 0.065, 0.066, . . .
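To make the idea of FDR control concrete, here is a minimal Python sketch of the Benjamini-Hochberg adjustment, which turns a list of p-values into FDR-adjusted values. This is a swapped-in, simpler technique chosen for illustration: fdrtool estimates FDR by a different approach, and the function name and input values below are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from three tests.
adjusted = bh_adjust([0.01, 0.04, 0.03])
```

Calling every gene with an adjusted value below, say, 0.05 significant aims to keep the expected share of false positives in that list at about 5%.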
The Robust Method
Also implemented by the R package GeneCycle
The M&R Method
The Sine Method
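The slides do not spell out how the sine-wave methods score a gene. As a generic illustration only, and not the authors' Sine Method or the M&R Method, the Python sketch below fits a 24 hour sine wave to one time series by least squares and reports the amplitude, the peak time and the fraction of variance explained. The function name is hypothetical, and the closed-form fit assumes evenly spaced samples that cover a whole number of periods with more than two samples per period.

```python
import math


def fit_sine(times_h, values, period_h=24.0):
    """Least-squares fit of mean + a*cos(wt) + b*sin(wt) with a fixed period.

    Returns (amplitude, peak_time_h, r_squared). The closed-form
    coefficients are only valid for evenly spaced samples spanning whole
    periods (an assumption of this sketch).
    """
    n = len(values)
    omega = 2 * math.pi / period_h
    mean = sum(values) / n
    a = 2 / n * sum((y - mean) * math.cos(omega * t)
                    for t, y in zip(times_h, values))
    b = 2 / n * sum((y - mean) * math.sin(omega * t)
                    for t, y in zip(times_h, values))
    fitted = [mean + a * math.cos(omega * t) + b * math.sin(omega * t)
              for t in times_h]
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    ss_tot = sum((y - mean) ** 2 for y in values)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    amplitude = math.hypot(a, b)
    peak_time_h = (math.atan2(b, a) / omega) % period_h
    return amplitude, peak_time_h, r_squared


# Hypothetical expression profile peaking at CT8, sampled every 4 hours.
times = [0, 4, 8, 12, 16, 20]
profile = [5 + 2 * math.cos(2 * math.pi * (t - 8) / 24) for t in times]
amplitude, peak_time, r_squared = fit_sine(times, profile)
```

A gene could then be called rhythmic when the fit explains most of the variance and the amplitude is large enough, with both thresholds being choices left to the analyst.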
Heatmap: The Fisher Method
heatmap of Fisher method
The Numbers
How many genes are in common between methods, etc.
Fisher Vs Sine Methods
what’s so different about them?
Conclusions
• Why only use sine waves as a model?
• Is FDR really better than multiple testing?
• Why use GeneCycle?
Conclusions
• All methods find some circadian clock genes
• . . . and some false positives
• Best approach: use many methods
• There is always a new, better method around the corner . . .