Integration of biomedical literature and databases Lars Juhl Jensen EMBL Heidelberg

2nd European Conference on Scientific Publishing in Biomedicine and Medicine, Rikshospitalet, Oslo, Norway, September 5-6, 2008

  1. 1. Integration of biomedical literature and databases Lars Juhl Jensen EMBL Heidelberg
  2. 2. biomedical databases
  3. 3. DNA sequences
  4. 4. GenBank
  5. 6. protein sequences
  6. 7. UniProt
  7. 9. protein structures
  8. 10. PDB
  9. 12. expression
  10. 13. ArrayExpress
  11. 14. GEO Gene Expression Omnibus
  12. 16. modifications
  13. 17. Phospho.ELM
  14. 18. PhosphoSite
  15. 19. interactions
  16. 20. BioGRID
  17. 21. DIP Database of Interacting Proteins
  18. 22. IntAct
  19. 23. MINT Molecular Interactions Database
  20. 25. chemical compounds
  21. 26. PubChem
  22. 28. database of databases
  23. 29. Duncan Hull, nodalpoint.org
  24. 30. freely available
  25. 31. literature mining
  26. 32. PubMed
  27. 33. exponential increase
  28. 36. some things never change
  29. 38. “graph calculus”
  30. 39. =
  31. 40. ~50 seconds per paper
  32. 41. information retrieval
  33. 42. find the relevant papers
  34. 43. ad hoc retrieval
  35. 44. user-specified query
  36. 45. “yeast AND cell cycle”
  37. 46. stemming
  38. 47. yeast / yeasts
  39. 48. dynamic query expansion
  40. 49. yeast / S. cerevisiae
  41. 54. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  42. 55. no tool will find it
  43. 56. entity recognition
  44. 57. identify the substance(s)
  45. 58. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  46. 59. good synonyms list
  47. 60. orthographic variation
  48. 61. CDC28
  49. 62. Cdc28p
  50. 63. disambiguation
  51. 64. Cdc2
  52. 65. SDS
  53. 66. information extraction
  54. 67. formalize the facts
  55. 68. co-mentioning
  56. 69. NLP Natural Language Processing
  57. 70. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  58. 71. integration tools
  59. 72. “document-centric” tools
  60. 73. Reflect
  61. 75. browser add-on
  62. 76. real-time tagging service
  63. 77. any HTML document
  64. 78. augmented document
  65. 79. information from databases
  66. 81. iHOP
  67. 83. web interface
  68. 84. precomputed index
  69. 85. abstracts
  70. 86. find text about a protein
  71. 87. link proteins and text
  72. 89. experimental interactions
  73. 91. “entity-centric” tools
  74. 92. STRING & STITCH
  75. 95. functional associations
  76. 96. heterogeneous evidence
  77. 97. information extraction
  78. 99. curated knowledge
  79. 101. interaction data
  80. 103. expression data
  81. 105. genomic context
  82. 107. quality scores
  83. 108. probabilistic framework
  84. 109. cross-species transfer
  85. 110. association networks
  86. 113. Acknowledgments
      - STRING & STITCH: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Philippe Julien, Jean Muller, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
      - Natural Language Processing: Jasmin Saric, Rossitza Ouzounova, Isabel Rojas, Peer Bork
      - Reflect: Evangelos Pafilis, Heiko Horn, Michael Kuhn, Sean O’Donoghue, Reinhardt Schneider
  87. 114. hands-on exercises
  88. 115. Exercises
      - Find literature on human CDC2
      - Find data and literature on targets and cytochrome P450 enzymes for Aspirin, Viagra, as well as for similar compounds
      - Find information on the genes in doi:10.1371/journal.pgen.1000120
      - Construct an interaction network of genes that cause G2/M delays in the budding yeast cell cycle

      Tools
      - http://www.ihop-net.org
      - http://string.embl.de
      - http://stitch.embl.de
      - http://reflect.ws
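
The entity recognition and co-mentioning slides (56 to 68) can be illustrated with a minimal sketch. It is not the method used by any of the tools named in the talk; it only assumes a small, hand-written synonyms list (`SYNONYMS` is hypothetical, including the choice to treat Cdk1 as an alias of Cdc28), tolerates two kinds of orthographic variation (letter case and a trailing "p", as in CDC28 and Cdc28p), and reports every pair of entities that appear in the same sentence.

```python
import re
from itertools import combinations

# Hypothetical synonyms list: canonical name -> aliases (illustrative only).
SYNONYMS = {
    "Clb2": ["Clb2"],
    "Cdc28": ["Cdc28", "Cdk1"],
    "Swe1": ["Swe1"],
    "Cdc5": ["Cdc5"],
}

SENTENCE = (
    "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated "
    "Swe1 and this modification served as a priming step to promote subsequent "
    "Cdc5-dependent Swe1 hyperphosphorylation and degradation"
)


def alias_pattern(alias):
    """Match an alias in any letter case, with an optional 'p' suffix.

    The lookarounds stop a shorter name from matching inside a longer
    token, so an alias 'Cdc2' would not fire on 'Cdc28'.
    """
    return re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(alias) + r"p?(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def tag_entities(text):
    """Return (offset, matched text, canonical name) for every alias hit."""
    hits = []
    for canonical, aliases in SYNONYMS.items():
        for alias in aliases:
            for match in alias_pattern(alias).finditer(text):
                hits.append((match.start(), match.group(0), canonical))
    return sorted(hits)


def co_mentions(sentence):
    """Return all unordered pairs of distinct entities found in one sentence."""
    names = sorted({canonical for _, _, canonical in tag_entities(sentence)})
    return list(combinations(names, 2))


if __name__ == "__main__":
    for pair in co_mentions(SENTENCE):
        print(pair)
```

Co-mentioning only records that two names occur together. It says nothing about who phosphorylates whom, which is the gap the NLP slide addresses.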

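
The slides on quality scores and a probabilistic framework (107 and 108) do not spell out how heterogeneous evidence is merged. One common way to combine per-channel confidence scores, shown here purely as an assumption and not as the documented STRING or STITCH scheme, is a noisy-OR: treat each score as an independent probability that the association is real and compute the probability that at least one channel is right. The channel names in the example are hypothetical.

```python
import math


def combine_scores(scores):
    """Noisy-OR combination of independent evidence scores in [0, 1]."""
    scores = list(scores)
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
    # Probability that every channel is wrong, then take the complement.
    return 1.0 - math.prod(1.0 - score for score in scores)


if __name__ == "__main__":
    # Hypothetical per-channel scores for a single protein pair.
    evidence = {"text mining": 0.5, "experiments": 0.5, "genomic context": 0.2}
    print(round(combine_scores(evidence.values()), 3))
```

The independence assumption is the weak point of this sketch: channels that draw on the same underlying papers or experiments would be double-counted.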