Text mining

Published in: Technology, Education


Transcript

  • 1. Text mining
  • 2. Explosion
  • 3. exponential increase
  • 4. some things are constant
  • 5. “graph calculus”
  • 6. =
  • 7. ~45 seconds per paper
  • 8. Information retrieval
  • 9. find the relevant papers
  • 10. user-specified query
  • 11. “yeast AND cell cycle”
  • 12. stemming
  • 13. dynamic query expansion
  • 14. ranking
  • 15. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  • 16. no tool will find that
  • 17. Entity recognition
  • 18. identify the substance(s)
  • 19. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  • 20. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  • 21. Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
  • 22. comprehensive lexicon
  • 23. orthographic variation
  • 24. “black list”
  • 25. manual correction
  • 26. still too much to read
  • 27. Information extraction
  • 28. formalize the facts
  • 29. co-occurrence
  • 30. global statistical analysis
  • 31. NLP (Natural Language Processing)
  • 32. parsing individual sentences
  • 33. Gene and protein names; cue words for entity recognition; verbs for relation extraction. Example: [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  • 34. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  • 35. store in a database
  • 36. then the fun begins :-)
  • 37. Acknowledgments. NLP pipeline: Jasmin Saric, Rossitza Ouzounova, Isabel Rojas, Peer Bork. Reflect: Heiko Horn, Sune Frankild, Evangelos Pafilis, Sven Haag, Michael Kuhn, Peer Bork, Reinhardt Schneider, Sean O’Donoghue
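
The entity recognition slides (comprehensive lexicon, orthographic variation, "black list") and the co-occurrence slide describe steps that can be sketched in a few lines. The slides do not show the actual implementation, so what follows is only a minimal illustration under stated assumptions: the toy lexicon, the artificial `EXAMPLE_GENE` entry, the black-listed word, and all function names are hypothetical and chosen for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: canonical name -> synonyms. A real lexicon would be
# far larger; EXAMPLE_GENE is an artificial entry used to show the black list.
LEXICON = {
    "CDC28": ["Cdc28", "Cdk1"],
    "CLB2": ["Clb2"],
    "SWE1": ["Swe1"],
    "CDC5": ["Cdc5"],
    "CYC1": [],
    "CYC7": [],
    "HAP1": [],
    "EXAMPLE_GENE": ["this"],
}

# Hypothetical black list: normalized strings that match the lexicon but are
# almost never meant as gene names in running text.
BLACK_LIST = {"this"}


def normalize(name):
    """Collapse orthographic variation: case, hyphens, underscores, spaces."""
    return re.sub(r"[\s\-_]", "", name).lower()


def build_index(lexicon, black_list):
    """Map every normalized, non-black-listed name to its canonical entity."""
    index = {}
    for canonical, synonyms in lexicon.items():
        for name in [canonical, *synonyms]:
            key = normalize(name)
            if key not in black_list:
                index[key] = canonical
    return index


# Letters optionally joined to digits by a hyphen or space ("CDC-28",
# "cdc 28"), otherwise any plain alphanumeric run.
TOKEN = re.compile(r"[A-Za-z]+[- ]?[0-9]+|[A-Za-z0-9]+")


def tag_entities(sentence, index):
    """Return the set of canonical entities mentioned in one sentence."""
    found = set()
    for token in TOKEN.findall(sentence):
        canonical = index.get(normalize(token))
        if canonical is not None:
            found.add(canonical)
    return found


def cooccurrence_counts(sentences, index):
    """Count how often each pair of entities shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        entities = sorted(tag_entities(sentence, index))
        counts.update(combinations(entities, 2))
    return counts
```

Applied to the example sentence from slide 15, `tag_entities` would pick out Clb2, Cdc28 (also via its synonym Cdk1), Swe1 and Cdc5, while the word "this" is ignored because of the black list. Counting pairs over many sentences is the raw input for the kind of global statistical analysis the co-occurrence slides mention; it says nothing about the type or direction of a relation, which is what sentence-level parsing adds.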