Text mining exercise
Transcript

  • 1. Text mining exercise ~5 m Lars Juhl Jensen
  • 2. the task
  • 3. named entity recognition
  • 4. human proteins
  • 5. link proteins to diseases
  • 6. what I have done
  • 7. information retrieval
  • 8. two diseases
  • 9. prostate cancer
  • 10. schizophrenia
  • 11. two sets of documents
  • 12. 62,755 abstracts
  • 13. 65,588 abstracts
  • 14. one directory with each set
  • 15. one file with each abstract
  • 16. dictionary
  • 17. tab-delimited file
  • 18. human proteins
  • 19. 22,523 entities
  • 20. synonyms
  • 21. from many databases
  • 22. orthographic variation
  • 23. prefixes and postfixes
  • 24. automatically generated
  • 25. 2,726,495 names
  • 26. tagdir program
  • 27. flexible matching
  • 28. upper- and lower-case
  • 29. spaces and hyphens
  • 30. tab-delimited output
  • 31. what you will do
  • 32. named entity recognition
  • 33. find unfortunate names
  • 34. create “black list”
  • 35. information extraction
  • 36. co-mentioning
  • 37. within documents
  • 38. link proteins to diseases
  • 39. link between the diseases
  • 40. a helping hand
  • 41. “black list”
  • 42. 100+ matches
  • 43. 10+ matches
  • 44. wrap up
  • 45. prostate cancer
  • 46. FOLH1
  • 47. schizophrenia
  • 48. Glutamate carboxypeptidase II
  • 49. same protein
  • 50. synonyms matter
  • 51. “black list” is crucial
  • 52. text mining is quite simple
  • 53. diseases.jensenlab.org
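
The steps named in the slides (flexible dictionary matching, a "black list" of unfortunate names, and co-mentioning within documents) can be sketched roughly as below. This is a minimal illustration and not the tagdir program: the function names, the entity IDs "P1" and "P2", the toy name "cell", the example sentences, and the threshold of 2 matches are all assumptions made for the example. In the exercise the input would be the tab-delimited dictionary and one file per abstract.

```python
import re
from collections import Counter


def normalize(name):
    """Lower-case a name and drop spaces and hyphens."""
    return re.sub(r"[\s\-]+", "", name.lower())


def build_index(rows):
    """Map each normalized name to the set of entity IDs that use it.

    rows: (entity_id, name) pairs, e.g. parsed from a tab-delimited file.
    """
    index = {}
    for entity_id, name in rows:
        index.setdefault(normalize(name), set()).add(entity_id)
    return index


def tag(text, index, blacklist=frozenset(), max_tokens=5):
    """Return (normalized_name, entity_ids) for each dictionary match.

    Tokens are alphanumeric runs, so spaces and hyphens in the text are
    ignored; the longest match starting at each position wins.
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            key = "".join(tokens[i:i + n])
            if key in index and key not in blacklist:
                hits.append((key, index[key]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return hits


def name_counts(docs, index):
    """Count matches per name, to spot frequent names worth reviewing."""
    counts = Counter()
    for text in docs:
        counts.update(key for key, _ in tag(text, index))
    return counts


def proteins_per_disease(docs, index, blacklist):
    """Count, per entity, the documents of one disease set that mention it."""
    doc_counts = Counter()
    for text in docs:
        entities = set()
        for _, ids in tag(text, index, blacklist):
            entities |= ids
        doc_counts.update(entities)
    return doc_counts


# Toy dictionary: P1 carries the two synonyms from the slides; P2 and its
# name "cell" are hypothetical and stand in for an unfortunate name.
index = build_index([
    ("P1", "FOLH1"),
    ("P1", "Glutamate carboxypeptidase II"),
    ("P2", "cell"),
])

# Toy documents (invented sentences, one string per abstract).
prostate_docs = [
    "FOLH1 is highly expressed in prostate cancer cell lines.",
    "Each cell was stained.",
]
schizophrenia_docs = [
    "Glutamate carboxypeptidase-II levels were altered in schizophrenia.",
]

# Names with many matches are candidates for manual review.
counts = name_counts(prostate_docs + schizophrenia_docs, index)
candidates = [name for name, n in counts.items() if n >= 2]
blacklist = set(candidates)  # here we simply accept every candidate

prostate = proteins_per_disease(prostate_docs, index, blacklist)
schizophrenia = proteins_per_disease(schizophrenia_docs, index, blacklist)

# Proteins co-mentioned with both diseases link the two diseases.
shared = set(prostate) & set(schizophrenia)
print(candidates, shared)
```

In this toy run the two synonyms resolve to the same entity, so the protein links both diseases only because the dictionary lists both names, and the frequent hypothetical name is filtered out by the black list. In practice the candidates would be reviewed by hand rather than accepted wholesale.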
