Text-mining practical

Published in: Science

  1. unix primer
  2. the command line
  3. some useful commands
  4. cat
  5. less
  6. head -10
  7. tail -10
  8. grep 'needle'
  9. cut -f 2
  10. sort
  11. sort -nr
  12. uniq -c
  13. redirecting output
  14. write to file
  15. command > filename
  16. using pipes
  17. command1 | command2
  18. putting it all together
  19. cut -f 4 infile | sort | uniq -c | sort -nr | head -100 > outfile
  20. the task
  21. disease gene finding
  22. named entity recognition
  23. human genes
  24. gene prioritization
  25. what I have done
  26. information retrieval
  27. two diseases
  28. prostate cancer
  29. schizophrenia
  30. two sets of documents
  31. 62,755 abstracts
  32. 65,588 abstracts
  33. one directory with each set
  34. one file with each abstract
  35. dictionary
  36. tab-delimited file
  37. human genes
  38. 22,523 entities
  39. synonyms
  40. from many databases
  41. orthographic variation
  42. prefixes and postfixes
  43. automatically generated
  44. 2,726,495 names
  45. tagdir program
  46. flexible matching
  47. upper- and lower-case
  48. spaces and hyphens
  49. tab-delimited output
  50. what you will do
  51. named entity recognition
  52. find unfortunate names
  53. create “black list”
  54. information extraction
  55. co-mentioning
  56. within abstracts
  57. rank genes for each disease
  58. find shared gene
  59. a helping hand
  60. “black list”
  61. 100+ matches
  62. 10+ matches
  63. wrap up
  64. prostate cancer
  65. FOLH1
  66. schizophrenia
  67. Glutamate carboxypeptidase II
  68. same protein
  69. synonyms matter
  70. “black list” is crucial
  71. text mining is useful
  72. not black magic
  73. EMBO Practical Course, Computational Biology: Genomes to Systems, Puerto Varas, 3-9 April 2014. Thank you!
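
The pipeline on slide 19 can be tried on a toy file. The sketch below is only an illustration: the file contents are made up, and treating column 4 as a gene name is an assumption, since the slides do not say what `infile` contains.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical tab-delimited input (made-up rows): column 4 holds a gene name.
printf 'doc1\t10\t14\tFOLH1\n'  > infile
printf 'doc1\t40\t42\tAR\n'    >> infile
printf 'doc2\t5\t9\tFOLH1\n'   >> infile
printf 'doc3\t7\t11\tFOLH1\n'  >> infile
printf 'doc3\t30\t32\tAR\n'    >> infile
printf 'doc4\t12\t16\tPTEN\n'  >> infile

# cut    -> keep only column 4
# sort   -> bring identical names next to each other (uniq needs this)
# uniq -c -> collapse each run of identical lines and prefix it with its count
# sort -nr -> order by count, largest first
# head   -> keep at most the top 100 lines
cut -f 4 infile | sort | uniq -c | sort -nr | head -100 > outfile

cat outfile
```

On this input, `outfile` lists FOLH1 first with a count of 3, followed by AR (2) and PTEN (1). The amount of padding in front of each count depends on the `uniq` implementation.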

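
The steps under "what you will do" (black list, co-mentioning within abstracts, ranking genes for each disease, finding a shared gene) could be sketched with the same commands. Everything specific here is an assumption: the slides do not show the output format of tagdir, so the two-column layout (document id, gene name), the file names, the made-up rows, and the `rank_genes` helper are all hypothetical. Because every abstract in a set is about that set's disease, counting the abstracts that mention a gene is used here as a simple stand-in for co-mentioning.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical tagger output, one match per line: document id <TAB> gene name.
printf 'p1\tFOLH1\np1\tFOLH1\np2\tFOLH1\np2\tAR\np3\tT\np4\tT\n' > prostate.tsv
printf 's1\tFOLH1\ns2\tDISC1\ns3\tDISC1\ns3\tT\ns4\tT\n' > schizophrenia.tsv

# Hypothetical black list: one "unfortunate" name per line.
printf 'T\n' > blacklist.txt

rank_genes() {
    # sort -u counts each gene at most once per abstract;
    # grep -v -x -F -f drops lines that exactly equal a black-listed name;
    # the rest is the counting pipeline from the unix primer.
    sort -u "$1" | cut -f 2 | grep -v -x -F -f blacklist.txt \
        | sort | uniq -c | sort -nr
}

rank_genes prostate.tsv      > prostate.ranked
rank_genes schizophrenia.tsv > schizophrenia.ranked

# comm -12 prints only the lines common to two sorted files.
awk '{print $2}' prostate.ranked      | sort > prostate.genes
awk '{print $2}' schizophrenia.ranked | sort > schizophrenia.genes
comm -12 prostate.genes schizophrenia.genes > shared.genes

cat shared.genes
```

With these toy rows the only gene left in both rankings is FOLH1. Without the black list, the name `T` would also appear as shared, which is the kind of false hit the black list is meant to remove.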