Text-mining practical

  1. Lars Juhl Jensen: Text-mining practical
  2. the task
  3. named entity recognition
  4. human proteins
  5. link proteins to diseases
  6. what I have done
  7. information retrieval
  8. two diseases
  9. prostate cancer
  10. schizophrenia
  11. two sets of documents
  12. 62,755 abstracts
  13. 65,588 abstracts
  14. one directory with each set
  15. one file with each abstract
  16. dictionary
  17. tab-delimited file
  18. human proteins
  19. 22,523 entities
  20. synonyms
  21. from many databases
  22. orthographic variation
  23. prefixes and postfixes
  24. automatically generated
  25. 2,726,495 names
  26. tagdir program
  27. flexible matching
  28. upper- and lower-case
  29. spaces and hyphens
  30. tab-delimited output
  31. what you will do
  32. named entity recognition
  33. find unfortunate names
  34. create “black list”
  35. information extraction
  36. co-mentioning
  37. within documents
  38. link proteins to diseases
  39. link between the diseases
  40. a helping hand
  41. “black list”
  42. 100+ matches
  43. 10+ matches
  44. wrap up
  45. prostate cancer
  46. FOLH1
  47. schizophrenia
  48. Glutamate carboxypeptidase II
  49. same protein
  50. synonyms matter
  51. “black list” is crucial
  52. text mining is useful
  53. not black magic
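
The named-entity-recognition part of the practical (dictionary matching with flexible case, spaces and hyphens, then hunting for unfortunate names) can be sketched in Python as below. This is not the tagdir program, only a minimal illustration. It assumes each dictionary line holds an entity identifier in the first column and a name in the second (the slides only say the file is tab-delimited), and the helper names `tokenize`, `load_dictionary`, `tag`, `match_counts` and `blacklist_candidates` are hypothetical.

```python
import re
from collections import Counter

# Spaces and hyphens act only as separators; other punctuation becomes its
# own token, so "Glutamate carboxypeptidase-II" and "glutamate carboxypeptidase II"
# yield the same token sequence.
_TOKEN = re.compile(r"\w+|[^\w\s\-]")


def tokenize(text):
    """Lower-case tokens, used as the matching key for names and text alike."""
    return _TOKEN.findall(text.lower())


def load_dictionary(lines):
    """Map normalized name (tuple of tokens) -> set of entity ids.

    Assumed layout: 'entity_id<TAB>name'; any extra columns are ignored.
    """
    names = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key = tuple(tokenize(fields[1]))
        if key:
            names.setdefault(key, set()).add(fields[0])
    return names


def tag(text, names, blacklist=frozenset()):
    """Greedy longest-match tagging; yields (normalized name, entity ids).

    A black-listed name is consumed without being reported, so its words
    are not re-matched as shorter names.
    """
    tokens = tokenize(text)
    max_len = max((len(key) for key in names), default=0)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in names:
                if key not in blacklist:
                    yield " ".join(key), names[key]
                i += n
                break
        else:
            i += 1  # no dictionary name starts at this token


def match_counts(documents, names, blacklist=frozenset()):
    """Count how often each normalized name is matched across all documents."""
    counts = Counter()
    for text in documents:
        for name, _ in tag(text, names, blacklist):
            counts[name] += 1
    return counts


def blacklist_candidates(counts, min_matches):
    """Names matched at least min_matches times, most frequent first."""
    return [name for name, c in counts.most_common() if c >= min_matches]
```

One way to use the "100+ matches" and "10+ matches" hints is to review `blacklist_candidates(counts, 100)` by hand first, then `blacklist_candidates(counts, 10)`, and pass the rejected names back in as `blacklist={tuple(tokenize(name)) for name in rejected}`. The candidates are only a shortlist: deciding which frequent names are unfortunate (common words rather than real protein mentions) is still a manual judgment.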

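The information-extraction part (co-mentioning within documents, then linking the two diseases) can be sketched as follows. It assumes the tagger output for each directory has already been parsed into a mapping from document name to the set of entity identifiers found in that abstract; the slides say tagdir writes tab-delimited output but do not give its columns, so parsing is left out. `co_mention_counts` and `link_diseases` are hypothetical names, and ranking shared proteins by the smaller of their two counts is an illustrative choice, not the practical's prescribed scoring.

```python
from collections import Counter


def co_mention_counts(doc_entities):
    """Count, for each entity, the abstracts in which it is mentioned.

    doc_entities maps a document name to the entity ids tagged in it.
    Every abstract in one set was retrieved for one disease, so an entity
    mentioned in it is treated as co-mentioned with that disease.
    """
    counts = Counter()
    for entities in doc_entities.values():
        counts.update(set(entities))  # count each entity once per document
    return counts


def link_diseases(counts_a, counts_b, min_docs=1):
    """Entities co-mentioned with both diseases in at least min_docs abstracts each.

    Ranked by the smaller of the two counts, so an entity must be well
    supported in both sets to rank high.
    """
    shared = [e for e in counts_a
              if counts_a[e] >= min_docs and counts_b[e] >= min_docs]
    return sorted(shared, key=lambda e: min(counts_a[e], counts_b[e]), reverse=True)
```

Keying the counts on entity identifiers rather than on matched names is why synonyms matter here: FOLH1 and Glutamate carboxypeptidase II are the same protein, so they only line up across the prostate cancer and schizophrenia sets when the dictionary maps both names to one entity.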