Text mining

Published in: Technology, Education

  1. Text mining
  2. Explosion
  3. exponential increase
  4. some things are constant
  5. “graph calculus”
  6. =
  7. ~45 seconds per paper
  8. Information retrieval
  9. find the relevant papers
  10. user-specified query
  11. “yeast AND cell cycle”
  12. stemming
  13. dynamic query expansion
  14. ranking
  15. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  16. no tool will find that
  17. Entity recognition
  18. identify the substance(s)
  19. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  20. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  21. Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
  22. comprehensive lexicon
  23. orthographic variation
  24. “black list”
  25. manual correction
  26. still too much to read
  27. Information extraction
  28. formalize the facts
  29. co-occurrence
  30. global statistical analysis
  31. NLP (Natural Language Processing)
  32. parsing individual sentences
  33. Gene and protein names; cue words for entity recognition; verbs for relation extraction. Example: [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  34. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  35. store in a database
  36. then the fun begins :-)
  37. Acknowledgments. NLP pipeline: Jasmin Saric, Rossitza Ouzounova, Isabel Rojas, Peer Bork. Reflect: Heiko Horn, Sune Frankild, Evangelos Pafilis, Sven Haag, Michael Kuhn, Peer Bork, Reinhardt Schneider, Sean O’Donoghue.
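
The entity recognition and co-occurrence slides can be illustrated with a minimal Python sketch. It is not the Reflect or NLP pipeline implementation; it only shows, under stated assumptions, how a lexicon, tolerance for orthographic variation, a black list, and sentence-level co-occurrence counting might fit together. The lexicon contents, the `TOYGENE` entry, the black list, and all function names are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: canonical name -> synonyms. A real lexicon
# would be far more comprehensive.
LEXICON = {
    "CLB2": ["Clb2"],
    "CDC28": ["Cdc28", "Cdk1"],
    "SWE1": ["Swe1"],
    "CDC5": ["Cdc5"],
    # Invented entry: shows how a synonym that is also a common English
    # word produces false positives unless it is black-listed.
    "TOYGENE": ["step"],
}
BLACK_LIST = {"step"}  # hypothetical list of names to suppress

SENTENCE = (
    "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly "
    "phosphorylated Swe1 and this modification served as a priming step "
    "to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation "
    "and degradation"
)

TOKEN = re.compile(r"[A-Za-z0-9]+")


def normalize(name):
    """Collapse orthographic variation: case, spaces, hyphens, underscores."""
    return re.sub(r"[\s\-_]+", "", name).lower()


def build_index(lexicon, black_list):
    """Map every normalized, non-black-listed name to its canonical form."""
    blocked = {normalize(name) for name in black_list}
    index = {}
    for canonical, synonyms in lexicon.items():
        for name in [canonical, *synonyms]:
            key = normalize(name)
            if key not in blocked:
                index[key] = canonical
    return index


def tag_entities(text, index):
    """Return (matched text, canonical name) pairs in order of appearance."""
    tokens = list(TOKEN.finditer(text))
    hits = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Try a two-token match first, so "Cdc 28" or "Cdc-28"
        # resolves to the same entry as "Cdc28".
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if text[tok.end():nxt.start()] in (" ", "-"):
                key = normalize(tok.group() + nxt.group())
                if key in index:
                    hits.append((text[tok.start():nxt.end()], index[key]))
                    i += 2
                    continue
        key = normalize(tok.group())
        if key in index:
            hits.append((tok.group(), index[key]))
        i += 1
    return hits


def cooccurrence(sentences, index):
    """Count how often two distinct entities share a sentence."""
    counts = Counter()
    for sentence in sentences:
        entities = sorted({name for _, name in tag_entities(sentence, index)})
        counts.update(combinations(entities, 2))
    return counts


if __name__ == "__main__":
    index = build_index(LEXICON, BLACK_LIST)
    print(tag_entities(SENTENCE, index))
    print(cooccurrence([SENTENCE], index).most_common())
```

Raw co-occurrence counts like these are only a starting point: the global statistical analysis named in the slides would have to weigh them across the whole corpus, and the counts say nothing about the type or direction of a relation, which is what parsing individual sentences is meant to recover.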
