Biomedical literature mining (and why we really need Open Access) Lars Juhl Jensen EMBL Heidelberg
The 28th IATUL annual conference: Global Access to Science - Scientific Publishing for the Future, Royal Institute of Technology (KTH), Stockholm, Sweden, June 11-14, 2007

  1. Biomedical literature mining (and why we really need Open Access). Lars Juhl Jensen, EMBL Heidelberg
  2. why biomedicine?
  3. why literature mining?
  4. why open access?
  5. MEDLINE
  6. 17 million citations
  7. Jensen et al., Nature Reviews Genetics, 2006
  8. too much to read
  9. literature mining
  10. open access
  11. information retrieval
  12. finding the papers
  13. ad hoc retrieval
  15. user-specified query
  16. “yeast AND cell cycle”
  17. stemming
  18. yeast / yeasts
  19. dynamic query expansion
  20. yeast / S. cerevisiae
  22. MEDLINE
  23. abstracts
  24. complete papers
  25. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  26. yeast?
  27. cell cycle?
  28. entity recognition
  29. identifying the substance(s)
  30. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  31. Cdc28 → yeast
  32. Cdc28 → cell cycle
  33. good synonyms list
  34. manual curation
  35. orthographic variation
  36. CDC28
  37. Cdc28p
  38. disambiguation
  39. hairy
  40. SDS
  41. Cdc2
  44. abstracts
  45. complete papers
  46. information extraction
  47. formalizing the facts
  49. co-mentioning
  50. statistical methods
  51. NLP (Natural Language Processing)
  52. NLP components:
      • Gene and protein names
      • Cue words for entity recognition
      • Verbs for relation extraction
      • [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  53. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  54. Jensen et al., Nature Reviews Genetics, 2006
  55. new discoveries
  56. text mining
  58. Jensen et al., Nature Reviews Genetics, 2006
  59. abstracts
  60. complete papers
  61. temporal trends
  62. Jensen et al., Nature Reviews Genetics, 2006
  63. buzzwords
  64. Jensen et al., Nature Reviews Genetics, 2006
  65. grant applications
  66. integration of text and data
  67. Genomic neighborhood; species co-occurrence; gene fusions; database imports; experimental interaction data; microarray expression data; literature mining
  68. genotype to phenotype
  69. Korbel et al., PLoS Biology, 2005
  70. Korbel et al., PLoS Biology, 2005
  71. Korbel et al., PLoS Biology, 2005
  72. where are we now?
  73. Jensen et al., Nature Reviews Genetics, 2006
  74. abstracts
  75. complete papers
  76. restricted access
  77. open access
  78. the tools are there
  79. now we need the text!
  80. Acknowledgments: Jasmin Saric, Rossitza Ouzounova, Michael Kuhn, Jan Korbel, Tobias Doerks, Isabel Rojas, Miguel Andrade, Peer Bork
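
The slides name the pipeline steps only as keywords: a user-specified query with stemming and dynamic query expansion, entity recognition from a synonyms list that tolerates orthographic variation, and co-mentioning as the simplest form of information extraction. A minimal Python sketch of how those steps could fit together follows. It is an illustration under stated assumptions, not the method used in the talk: the four abstracts, the expansion table, the gene name list, and every function name are hypothetical, and the plural-stripping stemmer is a crude stand-in for a real one.

```python
import re
from collections import Counter
from itertools import combinations

# Toy corpus invented for this sketch; these are not real MEDLINE records.
ABSTRACTS = {
    1: "Cdc28p phosphorylates Swe1 in budding yeasts.",
    2: "CDC28 and Clb2 control the cell cycle in S. cerevisiae.",
    3: "Swe1 is degraded after Cdc28 activity rises.",
    4: "SDS gel electrophoresis of mouse liver extracts.",
}

# Hypothetical expansion table: query term -> phrases treated as equivalent.
EXPANSIONS = {"yeast": ["yeast", "S. cerevisiae"]}

# Hypothetical curated name list; a real synonyms list would be far larger.
GENE_NAMES = ["Cdc28", "Swe1", "Clb2", "Cdc5"]


def stem(token):
    """Crude plural stripping (yeasts -> yeast); a stand-in for a real stemmer."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def normalize(text):
    """Lowercase, split on non-alphanumerics, stem, and pad with spaces."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " " + " ".join(stem(t) for t in tokens) + " "


def search(terms):
    """Ad hoc retrieval: every term (AND) must match after stemming and expansion."""
    hits = []
    for doc_id, text in ABSTRACTS.items():
        doc = normalize(text)
        if all(any(normalize(p) in doc for p in EXPANSIONS.get(t, [t]))
               for t in terms):
            hits.append(doc_id)
    return sorted(hits)


def build_lexicon(names):
    """Map orthographic variants (case, trailing 'p' for the protein) to one name."""
    lexicon = {}
    for name in names:
        lexicon[name.lower()] = name
        lexicon[name.lower() + "p"] = name
    return lexicon


LEXICON = build_lexicon(GENE_NAMES)


def recognize(text):
    """Entity recognition by dictionary lookup on whole tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({LEXICON[t] for t in tokens if t in LEXICON})


def comention_counts():
    """Count how many abstracts mention each pair of recognized entities."""
    counts = Counter()
    for text in ABSTRACTS.values():
        counts.update(combinations(recognize(text), 2))
    return counts
```

With this toy corpus, `search(["yeast", "cell cycle"])` finds abstract 2 only through the expansion of yeast to S. cerevisiae, and `comention_counts()` pairs Cdc28 with Swe1 in two abstracts. The sketch makes no attempt at disambiguation, so the ambiguous names listed in the slides (hairy, SDS, Cdc2) would need additional handling.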
