Biomedical text mining

Published in: Technology, Education

  1. 1. Biomedical text mining (Lars Juhl Jensen)
  2. 2. exponential growth
  3. 3. ~45 seconds per paper
  4. 4. information retrieval
  5. 5. named entity recognition
  6. 6. augmented browsing
  7. 7. text corpora
  8. 8. information extraction
  9. 9. information retrieval
  10. 10. find the relevant papers
  11. 11. ad hoc retrieval
  12. 12. user-specified query
  13. 13. “yeast AND cell cycle”
  14. 14. PubMed
  15. 15. indexing
  16. 16. fast lookup
  17. 17. stemming
  18. 18. word endings
  19. 19. dynamic query expansion
  20. 20. MeSH terms
  21. 21. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  22. 22. no tool will find that
  23. 23. named entity recognition
  24. 24. computer
  25. 25. as smart as a dog
  26. 26. teach it specific tricks
  27. 27. identify the concepts
  28. 28. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  29. 29. comprehensive lexicon
  30. 30. proteins
  31. 31. chemicals
  32. 32. compartments
  33. 33. tissues
  34. 34. diseases
  35. 35. organisms
  36. 36. CDC2
  37. 37. cyclin dependent kinase 1
  38. 38. orthographic variation
  39. 39. upper- and lower-case
  40. 40. CDC2
  41. 41. Cdc2
  42. 42. spaces and hyphens
  43. 43. cyclin dependent kinase 1
  44. 44. cyclin-dependent kinase 1
  45. 45. flexible matching
  46. 46. similar to alignment
  47. 47. prefixes and postfixes
  48. 48. CDC2
  49. 49. hCDC2
  50. 50. plurals
  51. 51. adjectives
  52. 52. “black list”
  53. 53. SDS
  54. 54. efficient tagger
  55. 55. Pafilis et al., PLOS ONE, 2013
  56. 56. text corpora
  57. 57. >10 km <10 hours
  58. 58. most use Medline
  59. 59. ~22 million abstracts
  60. 60. few use full-text articles
  61. 61. no access
  62. 62. PDF files
  63. 63. layout-aware extraction
  64. 64. millions of full-text articles
  65. 65. information extraction
  66. 66. formalize the facts
  67. 67. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  68. 68. two approaches
  69. 69. co-mentioning
  70. 70. counting
  71. 71. within documents
  72. 72. within paragraphs
  73. 73. within sentences
  74. 74. co-mentioning score
  75. 75. NLP (Natural Language Processing)
  76. 76. grammatical analysis
  77. 77. part-of-speech tagging
  78. 78. multiword detection
  79. 79. semantic tagging
  80. 80. sentence parsing
  81. 81. Gene and protein names; cue words for entity recognition; verbs for relation extraction. Example: [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  82. 82. extract stated facts
  83. 83. high precision
  84. 84. poor recall
  85. 85. questions?
  86. 86. Exercise 3: Go to http://diseases.jensenlab.org. Find TYMS disease associations. Inspect the text-mining evidence. Look for examples of synonym usage. Find genes linked to colorectal cancer.
  87. 87. thank you!
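
The named entity recognition slides (lexicon, orthographic variation, black list) and the co-mentioning slides (counting within sentences) can be sketched in a few lines of Python. This is a minimal illustration only, not the tagger of Pafilis et al.: the lexicon entries and canonical names, the `MAX_TOKENS` limit, and the function names `normalize`, `tag`, and `comention_counts` are all assumptions made for the example, and the result is a plain pair count rather than the weighted co-mentioning score the slides refer to.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-lexicon (synonym -> canonical name), for illustration only.
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "Cdc28": "CDC28",
    "Clb2": "CLB2",
    "Swe1": "SWE1",
    "Cdc5": "CDC5",
    "SDS": "SDS",  # present in the lexicon, but suppressed by the black list
}

# Names assumed to be too ambiguous to tag, stored in normalized form.
BLACK_LIST = {"sds"}

# Assumed upper bound on the number of tokens in a single name.
MAX_TOKENS = 4


def normalize(name):
    """Lower-case a name and drop spaces, hyphens and other punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Index from normalized synonym to canonical name, minus black-listed names.
INDEX = {
    normalize(synonym): canonical
    for synonym, canonical in LEXICON.items()
    if normalize(synonym) not in BLACK_LIST
}


def tag(text):
    """Return canonical names found in text, preferring the longest match."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            key = "".join(tokens[i:i + n]).lower()
            if key in INDEX:
                found.append(INDEX[key])
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return found


def comention_counts(sentences):
    """Count how many sentences mention each unordered pair of entities."""
    counts = Counter()
    for sentence in sentences:
        entities = sorted(set(tag(sentence)))
        counts.update(combinations(entities, 2))
    return counts
```

Because names are compared in normalized form, `tag("cyclin-dependent kinase 1")` and `tag("Cyclin dependent kinase 1")` resolve to the same entry, while a black-listed string such as "SDS" is never tagged. Counting within paragraphs or whole documents would only require passing those units to `comention_counts` instead of sentences.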
