Integration of biomedical data and electronic publications
Lars Juhl Jensen, EMBL Heidelberg
printed publications
dead wood
electronic publications
virtual dead wood
de Lichtenberg et al., Science, 2005
small font sizes
 
“no”
Jensen et al., Nature Reviews Genetics, 2006
small font sizes
hyperlinks
 
“no”
“hell no”
why?
archival
reanalysis
data mining
reader interaction
what?
raw data
processed data
final data
“facts”
where?
part of the document
too much data
too coarse grained
escalates the problem
institutional repositories
too many types of data
lack of standardization
difficult to download all data
public databases
specialization
standardization
mandatory deposition
easy to download all data
cross references
examples from biomedicine
GenBank
 
17.9 million sequences
80 billion nucleotides
UniProt
 
 
4.7 million sequences
Ensembl
 
35 complete genomes
PDB
 
44000 protein structures
GEO
 
5800 data sets
152000 samples
ArrayExpress
 
1800 data sets
BioGRID
 
186000 interactions
129000 proteins
MINT
 
103000 interactions
28000 proteins
PubChem
 
7.5 million compounds
PubMed Central
 
330 open access journals
12000 open access papers
downloadable
standardized formats
cross-referenced
archival
reanalysis
data mining
reader interaction
thank you!

Presented at the 10th International Symposium on Electronic Theses and Dissertations, Uppsala University, Uppsala, Sweden, June 13-16, 2007

Published in: Technology
