Mining large-scale data sets on the eukaryotic cell cycle
Lars Juhl Jensen
EMBL Heidelberg
the cell cycle
grow and divide
one cell
two cells
four phases
G1 phase
growth
S phase
DNA replication
G2 phase
growth
M phase
cell division
 
regulation
gene expression
phosphorylation
targeted degradation
protein interactions
molecular biology
one gene
one postdoc
many types of data
a single gene
high-throughput biology
one lab
one technology
all the relevant genes
a single type of data
systems biology
many types of data
all the relevant genes
data integration
data mining
expression data
cell cultures
 
synchronization
microarrays
 
time courses
 
expression profiles
 
list of genes
periodically expressed
peak times
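One simple way to turn an expression profile into a peak time is to project it onto a single sine and cosine at the cell-cycle period. The sketch below assumes evenly spaced samples and a known period; the function name `peak_time` and its parameters are illustrative, and this is not necessarily the method used in any of the studies cited in this talk.

```python
import math

def peak_time(times, values, period):
    """Estimate when a periodic profile peaks, plus its amplitude.

    Projects the mean-centred profile onto cos/sin at the given period
    (the first Fourier component) and converts the phase to a time.
    """
    n = len(values)
    mean = sum(values) / n
    omega = 2 * math.pi / period
    a = sum((v - mean) * math.cos(omega * t) for t, v in zip(times, values))
    b = sum((v - mean) * math.sin(omega * t) for t, v in zip(times, values))
    phase = math.atan2(b, a)               # angle at which the fitted cosine peaks
    peak = (phase / omega) % period        # map the angle back to a time in [0, period)
    amplitude = 2 * math.hypot(a, b) / n   # amplitude in the units of `values`
    return peak, amplitude
```

A large amplitude relative to the noise suggests periodic expression; a ranking of genes by such a score is what the benchmarking steps later in the talk would evaluate.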
S. cerevisiae
expression data
Cho et al.
Spellman et al.
computational methods
Zhao et al.
Langmead et al.
Johansson et al.
Wichert et al.
Luan and Li
Lu et al.
Ahdesmäki et al.
Willbrand et al.
Chen et al.
Qiu et al.
Ahnert et al.
Andersson et al.
no benchmarking
reanalysis
benchmarking
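Benchmarking a method that ranks genes by periodicity can be as simple as asking how many genes from an independent benchmark set appear among its top-ranked genes. The sketch below is a minimal illustration under that assumption; `benchmark_recall` and the placeholder gene identifiers in the usage note are hypothetical, not taken from the talk.

```python
def benchmark_recall(ranked_genes, benchmark, top_n):
    """Fraction of benchmark genes found among the top_n ranked genes."""
    if not benchmark:
        raise ValueError("benchmark set must not be empty")
    top = set(ranked_genes[:top_n])
    return len(top & set(benchmark)) / len(benchmark)
```

Calling `benchmark_recall(["a", "b", "c", "d"], {"a", "c", "z"}, 3)` evaluates the first three ranked genes against a three-gene benchmark. Computing this for every method at the same `top_n` puts the methods on a common footing.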
 
no progress
no benchmarking
 
S. pombe
Rustici et al.
Peng et al.
Oliva et al.
no benchmarking
no integration
reanalysis
integration
benchmarking
 
no progress
no benchmarking
no integration
 
H. sapiens
Whitfield et al.
reanalysis
benchmarking
 
A. thaliana
Menges et al.
reanalysis
benchmarking
 
four organisms
list of genes
periodically expressed
peak times
protein interactions
S. cerevisiae
yeast two-hybrid
Uetz et al.
Ito et al.
complex pull-down
Gavin et al.
Ho et al.
 
30–50% false positives
topology-based scoring
yeast two-hybrid
$-\log\big((N_1 + 1) \cdot (N_2 + 1)\big)$
complex pull-down
$\log\big[(N_{12} \cdot N) / ((N_1 + 1) \cdot (N_2 + 1))\big]$
calibrate against KEGG
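The two topology-based scores above can be computed directly from counts. The slides do not define the symbols, so the sketch below assumes an interpretation: for yeast two-hybrid, N1 and N2 are the numbers of other partners of each protein; for complex pull-down, N12 is the number of purifications containing both proteins, N1 and N2 the numbers containing each protein, and N the total number of purifications. The function names are illustrative.

```python
import math

def y2h_score(n1, n2):
    """Yeast two-hybrid score: -log((N1 + 1) * (N2 + 1)).

    Proteins with many partners (assumed meaning of N1, N2) score lower.
    """
    return -math.log((n1 + 1) * (n2 + 1))

def pulldown_score(n12, n1, n2, n_total):
    """Complex pull-down score: log[(N12 * N) / ((N1 + 1) * (N2 + 1))].

    Requires n12 > 0, i.e. the pair was co-purified at least once.
    """
    if n12 <= 0:
        raise ValueError("n12 must be positive")
    return math.log((n12 * n_total) / ((n1 + 1) * (n2 + 1)))
```

Raw scores like these are not probabilities. Calibrating them against a reference such as KEGG is what would let a single quality threshold be applied across both data types.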
 
quality threshold
subcellular localization
 
expression data
temporal network
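The steps above (quality threshold, subcellular localization, expression data) can be sketched as a filter over scored interactions that then annotates each protein with its peak time. Everything here is a hypothetical illustration: the function name, the data layout, and the rule that two proteins must share at least one annotated compartment are assumptions rather than the procedure used in the talk.

```python
def temporal_network(interactions, peak_times, localization, threshold):
    """Build a list of (a, b, peak_a, peak_b) edges.

    interactions: iterable of (protein_a, protein_b, score)
    peak_times:   dict protein -> peak time; missing means not periodic ("static")
    localization: dict protein -> set of compartments; missing means unknown
    """
    network = []
    for a, b, score in interactions:
        if score < threshold:
            continue  # below the quality threshold
        loc_a, loc_b = localization.get(a), localization.get(b)
        if loc_a and loc_b and not (loc_a & loc_b):
            continue  # both annotated, but to disjoint compartments
        network.append((a, b, peak_times.get(a), peak_times.get(b)))
    return network
```

In the result, a peak time of `None` marks a static protein, so each edge shows directly whether it joins dynamic proteins, static proteins, or one of each.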
 
benchmarking
 
 
 
30–50% false positives
 
3–5% false positives
detailed function prediction
uncharacterized proteins
who
whom
when
global statements
dynamic and static
 
 
CDK–cyclin complexes
 
consistent timing
 
 
pre-replication complex
 
just-in-time assembly
dynamic and static
partial protein complexes
last missing subunits
phosphorylation
Ubersax et al.
 
27% of dynamic proteins
8% of static proteins
targeted degradation
PEST regions
 
44% of dynamic proteins
29% of static proteins
data mining
undescribed link
transcriptional regulation
post-translational regulation
 
how can we test this?
cross-species comparison
evolutionary conservation
orthology detection
sequence similarity
 
not conserved
individual genes
just-in-time assembly
protein complexes
peak times
not comparable
time warping
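Peak times measured in different organisms live on different time scales, so they need to be mapped onto a common one before comparison. One simple form of time warping is a piecewise-linear map between matching landmarks, for example phase boundaries. The sketch below assumes such landmarks are available; the landmark values in the usage note are made up for illustration, and the talk may have used a different warping scheme.

```python
from bisect import bisect_right

def warp(t, landmarks_src, landmarks_common):
    """Map time t from an organism's scale to a common scale.

    Both landmark lists must be sorted, have the same length (>= 2), and
    list corresponding events; values between landmarks are interpolated
    linearly.
    """
    i = bisect_right(landmarks_src, t) - 1
    i = max(0, min(i, len(landmarks_src) - 2))  # clamp to a valid segment
    s0, s1 = landmarks_src[i], landmarks_src[i + 1]
    c0, c1 = landmarks_common[i], landmarks_common[i + 1]
    return c0 + (t - s0) * (c1 - c0) / (s1 - s0)
```

With source landmarks `[0, 30, 50, 100]` and common landmarks `[0, 25, 50, 100]`, a peak at 15 on the source scale maps to 12.5 on the common scale.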
 
same color = same phase
DNA replication
DNA polymerases
 
deoxynucleotide synthesis
 
phosphorylation
Ubersax et al.
Loog et al.
Phospho.ELM
NetPhosK
correlation
 
 
cell cycle vs. non-cell cycle
co-evolution
 
 
transcriptional regulation
post-translational regulation
co-evolution
summary
reanalysis
integration
high-throughput data
biological discoveries
challenge
data mining
do this automatically
beware of the noise
benchmark!
Acknowledgments
Thomas Skøt Jensen
Ulrik de Lichtenberg
Søren Brunak
Peer Bork
Data and Text Mining for Integrative Biology, Humboldt University, Berlin, Germany, September 18-22, 2006
