Turning big data and text collections into web resources

Published in: Technology, Education

  1. 1. Lars Juhl Jensen: Turning big data and text collections into web resources
  2. 2. three parts
  3. 3. data integration
  4. 4. text mining
  5. 5. interface design
  6. 6. data integration
  7. 7. association networks
  8. 8. guilt by association
  9. 9. STRING
  10. 10. Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
  11. 11. computational predictions
  12. 12. gene fusion
  13. 13. Korbel et al., Nature Biotechnology, 2004
  14. 14. experimental data
  15. 15. physical interactions
  16. 16. Jensen & Bork, Science, 2008
  17. 17. curated knowledge
  18. 18. metabolic pathways
  19. 19. Letunic & Bork, Trends in Biochemical Sciences, 2008
  20. 20. many databases
  21. 21. different formats
  22. 22. different identifiers
  23. 23. variable quality
  24. 24. not comparable
  25. 25. hard work
  26. 26. quality scores
  27. 27. von Mering et al., Nucleic Acids Research, 2005
  28. 28. calibrate vs. gold standard
  29. 29. missing most of the data
  30. 30. text mining
  31. 31. >10 km
  32. 32. too much to read
  33. 33. computer
  34. 34. as smart as a dog
  35. 35. teach it specific tricks
  36. 36. named entity recognition
  37. 37. comprehensive lexicon
  38. 38. cyclin dependent kinase 1
  39. 39. CDC2
  40. 40. expansion rules
  41. 41. flexible matching
  42. 42. cyclin dependent kinase 1
  43. 43. cyclin-dependent kinase 1
  44. 44. CDC2
  45. 45. hCdc2
  46. 46. “black list”
  47. 47. SDS
  48. 48. proteins
  49. 49. small molecules
  50. 50. compartments
  51. 51. tissues
  52. 52. diseases
  53. 53. information extraction
  54. 54. count co-mentioning
  55. 55. within documents
  56. 56. within paragraphs
  57. 57. within sentences
  58. 58. corpora
  59. 59. ~22 million abstracts
  60. 60. no access
  61. 61. ~4 million full-text articles
  62. 62. interface design
  63. 63. ease of use
  64. 64. web resources
  65. 65. simple search interface
  66. 66. complex relational database
  67. 67. attractiveness
  68. 68. data visualization
  69. 69. STRING
  70. 70. Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
  71. 71. payload
  72. 72. compartments.jensenlab.org
  73. 73. COMPARTMENTS
  74. 74. compartments.jensenlab.org
  75. 75. TISSUES
  76. 76. tissues.jensenlab.org
  77. 77. provenance
  78. 78. evidence viewers
  79. 79. DISEASES
  80. 80. reusability
  81. 81. web services
  82. 82. download files
  83. 83. open licenses
  84. 84. Acknowledgments. Protein networks: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen. Literature mining: Sune Frankild, Evangelos Pafilis, Janos Binder, Kalliopi Tsafou, Alberto Santos, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
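
The text-mining slides (named entity recognition with a lexicon, flexible matching, a "black list", then counting co-mentions within documents and sentences) can be illustrated with a toy sketch. This is not the speaker's actual pipeline. The lexicon entries, the `GENE_X` placeholder, the optional `h` prefix rule, the naive sentence splitter, and all function names below are assumptions made for illustration. Paragraph-level counting would follow the same pattern as the sentence-level loop.

```python
import re
from collections import Counter
from itertools import combinations

# Toy lexicon (assumption): entity identifier -> names taken from the slides,
# plus one extra protein and one placeholder entity for illustration.
LEXICON = {
    "CDK1": ["cyclin dependent kinase 1", "CDC2"],
    "CCNB1": ["cyclin B1"],
    "GENE_X": ["SDS"],  # hypothetical entity whose name is ambiguous
}

# "Black list": names that are usually not the entity in running text.
BLACKLIST = {"sds"}


def name_pattern(name):
    """Flexible matching: ignore case, allow a space or hyphen (or nothing)
    between tokens, and allow an optional 'h' prefix (e.g. hCdc2)."""
    tokens = re.split(r"[\s-]+", name.strip())
    body = r"[\s-]?".join(re.escape(t) for t in tokens)
    return re.compile(r"\bh?" + body + r"\b", re.IGNORECASE)


PATTERNS = {
    entity: [name_pattern(n) for n in names]
    for entity, names in LEXICON.items()
}


def tag_entities(text):
    """Return the set of entity identifiers mentioned in the text."""
    found = set()
    for entity, patterns in PATTERNS.items():
        for pattern in patterns:
            for match in pattern.finditer(text):
                if match.group(0).lower() not in BLACKLIST:
                    found.add(entity)
    return found


def count_comentions(documents):
    """Count entity pairs co-mentioned within documents and within sentences."""
    doc_counts, sent_counts = Counter(), Counter()
    for doc in documents:
        for pair in combinations(sorted(tag_entities(doc)), 2):
            doc_counts[pair] += 1
        # Naive sentence split on end punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            for pair in combinations(sorted(tag_entities(sentence)), 2):
                sent_counts[pair] += 1
    return doc_counts, sent_counts


docs = [
    "hCdc2 binds cyclin B1. Samples were run on SDS gels.",
    "Cyclin-dependent kinase 1 is a key regulator. Cyclin B1 levels peak in mitosis.",
]
doc_counts, sent_counts = count_comentions(docs)
print(doc_counts)
print(sent_counts)
```

In this sketch the pair is counted once per document or sentence in which both entities occur, so a pair mentioned in the same sentence contributes at both levels. A real system would weight these levels and calibrate the resulting scores against a gold standard, as the quality-score slides suggest.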