Semantic annotation, clustering and visualization

"Practice" presentation of my MSc thesis, given for the Leiden University Bio-imaging Group. More information at http://graus.nu/category/thesis/

Published in: Technology, Education

  • So these are the 10 most similar concepts returned
  • Example of a connected graph. I want to explore the possibilities of visualizing the results, with varying node (circle) sizes for more and less important concepts, and colored and transparent circles for literal and non-literal concepts, conveying the information from the text in a graph. This might also help with analyzing the differences between my method and that of humans.
  • Semantic annotation, clustering and visualization

    1. Semantic annotation, clustering and visualization. Media Technology MSc Programme. David Graus. Graduation Project. Supervisor: Joris Slob
    2. David Graus, Media Technology MSc Programme, 07/02/2012. Introduction
    3. Cyttron DB entry: "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dimensional rendering of a brain is shown in regions where insufficient data were obtained. The most superior regions of the frontal and parietal lobes and the most inferior regions of the temporal lobes were not evaluated. Imaging artifacts may also compromise the significance of results in the most inferior portions of the frontal lobe."
    4. Tasks
        1. Semantic annotation: identify and tag most important concepts from text [NLP]
        2. Topic extraction: relate concepts and find clusters [Linked Data]
        3. Visualization: draw resulting graphs and clusters [Data visualization]
    5. 1. Semantic Annotation. Method I: Find words. Method II: Compare texts
    6. Semantic Annotation: Method I. "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dimensional rendering of a brain is shown in regions where insufficient data were obtained. The most superior regions of the frontal and parietal lobes and the most inferior regions of the temporal lobes were not evaluated. Imaging artifacts may also compromise the significance of results in the most inferior portions of the frontal lobe."
    7. Formal knowledge: Biomedical Ontology
    8. NCI Thesaurus: 89,129 unique concepts, 50,804 definitions, 258,051 synonyms. Relations!
        - Concept: Agrobacterium tumefaciens
        - Definition: A species of Gram negative, rod shaped bacteria assigned to the phylum Proteobacteria. This bacteria is motile by flagella and mediates the horizontal gene transfer of its Ti plasmid to infect plants. A. tumefaciens is commonly found in soil and around the root surfaces of plants and is the causative agent of crown gall disease.
        - Synonyms: RHIZOBIUM RADIOBACTER, CDC GROUP VD-3
    9. Semantic Annotation: Method I (same text as slide 6)
    10. Semantic Annotation: Method I (same text as slide 6)
    11. Semantic Annotation: Method I (same text as slide 6)
    12. Example: "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dimensional rendering of a brain is shown in regions where insufficient data were obtained. The most superior regions of the frontal and parietal lobes and the most inferior regions of the temporal lobes were not evaluated. Imaging artifacts may also compromise the significance of results in the most inferior portions of the frontal lobe."
        Matched concepts: Most, Brain, A, Inferior, Data, And, With, Volume, Volume, Three, Temporal, Superior, Study, Scale, Parietal, Number, Lobe, Line, Into, Frontal Lobe, Deep, Color, At
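
The word-finding step of Method I could be sketched as a lookup of ontology labels in the text. This is a minimal illustration, not the thesis implementation: `LABELS` is a hand-picked stand-in for the NCI Thesaurus labels, and `find_concepts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical stand-in for NCI Thesaurus concept labels (a handful, not the real set).
LABELS = ["Brain", "Frontal Lobe", "Lobe", "Volume", "Color", "Data"]


def find_concepts(text, labels):
    """Count case-insensitive whole-word occurrences of each label in the text."""
    lowered = text.lower()
    counts = Counter()
    for label in labels:
        pattern = r"\b" + re.escape(label.lower()) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[label] = hits
    return counts


text = ("The volume of the brain evaluated in this study. "
        "Imaging artifacts may also compromise the significance of results "
        "in the most inferior portions of the frontal lobe.")
matches = find_concepts(text, LABELS)
print(matches.most_common())
```

A literal lookup like this misses inflected forms ("lobes" does not match "Lobe") and, as the slide's result list shows, also returns very general labels such as "A" or "And" when the ontology contains them.
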
    13. Example (same text as slide 12)
    14. Semantic Annotation: Method I. 2 ‘Modifiers’ of representations:
        1. (Porter) stemming of both text and ontology concepts: lobes → lobe, brains → brain, etc.
        2. Generate synonyms (using WordNet)
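
The stemming modifier can be illustrated by normalizing text tokens and concept labels with the same function before matching. The sketch below deliberately uses a crude plural-stripping rule as a stand-in; it is not the Porter algorithm named on the slide, and `crude_stem` and `stem_phrase` are hypothetical names.

```python
def crude_stem(word):
    """Very rough plural stripping; a stand-in for a real Porter stemmer."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def stem_phrase(phrase):
    """Stem every word of a (possibly multi-word) concept label."""
    return " ".join(crude_stem(w) for w in phrase.split())


text_tokens = ["frontal", "and", "parietal", "lobes"]
label = "Parietal Lobe"

stemmed_text = [crude_stem(t) for t in text_tokens]
print(stemmed_text)
print(stem_phrase(label))
```

After stemming both sides, "parietal lobes" in the text and the label "Parietal Lobe" reduce to the same form, so a literal match succeeds where it previously failed.
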
    15. Different text representations
        - Most frequent words: brain, regions, data, evaluated, frontal, inferior, lobes, along, also, artifacts, color, compromise, deep, dimensional, imaging, insufficient, least, line, lobe
        - Most frequent nouns: brain, color, deep, imaging, insufficient, line, lobe, number, rendering, scale, significance, study, volume
        - Bigrams: also compromise, artifacts may, cm deep, color scale, compromise significance, deep line, dimensional rendering, imaging artifacts, may also, mm voxels, represents number, scale represents, significance results, subjects along, data least, data obtained, evaluated study, frontal lobe, frontal parietal, inferior portions
        - Trigrams: also compromise significance, artifacts may also, cm deep line, color scale represents, compromise significance results, imaging artifacts may, may also compromise, scale represents number, insufficient data obtained, mm voxels data, portions frontal lobe, […]
        - Combo: brain, regions, data, evaluated, frontal, inferior, lobes, along, also, artifacts, color, compromise, deep, dimensional, imaging, insufficient, least, line, lobe. brain, color, deep, imaging, insufficient, […]
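
The frequent-word, bigram and trigram representations can be approximated as follows. This is a sketch under assumptions: the stopword list is a small illustrative one (the slides do not name the list used), digits are simply dropped, and noun extraction is left out because it needs a part-of-speech tagger.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the list used in the thesis).
STOPWORDS = {"the", "of", "in", "at", "a", "with", "into", "and",
             "this", "is", "were", "not"}


def tokens(text):
    """Lowercase alphabetic tokens with stopwords removed (digits are dropped)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def ngrams(words, n):
    """All runs of n consecutive tokens, joined by spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


text = ("The color scale represents the number of 4-mm voxels with data "
        "in at least 7 subjects along a 3-cm deep line into the brain.")
words = tokens(text)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)

print(Counter(words).most_common(5))
print(bigrams)
print(trigrams)
```

On this one sentence the sketch reproduces several entries from the slide, such as "color scale", "mm voxels", "data least" and "cm deep line".
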
    16. Semantic Annotation: Method I. 6 representations (literal + 5 keyword variations) × 4 treatments (literal + stem + synonyms + both) = 24 results
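
The 24 results are simply every pairing of a representation with a treatment. A short sketch of that enumeration, where the string labels are my own shorthand for the options named on the slides:

```python
from itertools import product

# Shorthand labels for the options on slides 15 and 16 (naming is an assumption).
REPRESENTATIONS = ["literal text", "most frequent words", "most frequent nouns",
                   "bigrams", "trigrams", "combo"]
TREATMENTS = ["literal", "stemmed", "synonyms", "stemmed + synonyms"]

runs = list(product(REPRESENTATIONS, TREATMENTS))
print(len(runs))
for representation, treatment in runs[:4]:
    print(representation, "/", treatment)
```
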
    17. Method II: Text Comparison. Find concepts that might not occur in text. "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dimensional rendering of a brain is shown in regions where insufficient data were obtained. The most superior regions of the frontal and parietal lobes and the most inferior regions of the temporal lobes were not evaluated. Imaging artifacts may also compromise the significance of results in the most inferior portions of the frontal lobe."
    18. Compare text to definitions. Find relevant concepts based on their (textual) definitions.
        - Cyttron entry: "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dim […]
        - NCI Thesaurus definitions: Parietal Lobe: One of the lobes of the cerebral hemisphere located superiorly to the occipital lobe and posteriorly to the frontal lobe. Cognition and visuospatial processing are its main functions.
    19. Method II: Text Comparison. Find concepts that might not occur in text (same text as slide 17)
    20. Compare how? Bag of Words + TF-IDF
        - Dictionary: BioMedCentral Corpus (> 100,000 articles, > 8 GB raw data)
        - Process corpus: clean (strip tags, store only article body), tokenize (create list of words), remove common words (stopwords), stem remaining words
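
The corpus processing steps and the TF-IDF weighting can be sketched with the standard library alone. Everything concrete here is an assumption for illustration: the three-document toy corpus, the tiny stopword set, the crude plural stripping in place of a real stemmer, and the plain `tf * log(N / df)` weighting variant.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "of", "a", "in", "is", "and", "to"}  # illustrative only


def preprocess(raw):
    """Clean, tokenize, remove stopwords and crudely stem one document."""
    body = re.sub(r"<[^>]+>", " ", raw)                # clean: strip tags
    words = re.findall(r"[a-z]+", body.lower())        # tokenize
    words = [w for w in words if w not in STOPWORDS]   # remove stopwords
    # crude plural stripping stands in for a real stemmer
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]


def document_frequencies(corpus_tokens):
    """Number of documents each term occurs in (the 'dictionary')."""
    df = Counter()
    for toks in corpus_tokens:
        df.update(set(toks))
    return df


def tfidf(toks, df, n_docs):
    """Sparse TF-IDF vector; terms unknown to the dictionary are skipped."""
    tf = Counter(toks)
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf if df[w] > 0}


# Toy corpus standing in for the BioMedCentral articles.
corpus = ["<p>The brain has frontal lobes.</p>",
          "<p>Bacteria infect the roots of plants.</p>",
          "<p>Imaging of the brain.</p>"]
corpus_tokens = [preprocess(doc) for doc in corpus]
df = document_frequencies(corpus_tokens)
weights = tfidf(corpus_tokens[0], df, len(corpus))
print(corpus_tokens[0])
print(weights)
```

Terms that occur in many documents ("brain" here) get a lower weight than terms that occur in few ("lobe"), which is the point of building the dictionary from a large domain corpus.
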
    21. Method II: Text Comparison. Convert both texts to vector space using dictionary, compute similarity. Return most similar concepts.
        - Text: "The volume of the brain evaluated in this study. The color scale represents the number of 4-mm voxels with data in at least 7 subjects along a 3-cm deep line into the brain. A three-dimensional rendering of a brain is shown in regions where insufficient data were obtained. The most superior regions of the frontal and parietal lobes and the most inferior regions of the temporal lobes were not evaluated. Imaging artifacts may also compromise the significance of results in the most inferior portions of the frontal lobe."
        - Most similar concepts: 1. Frontotemporal Dementia, 2. Parietal Lobe, 3. Area of Broca, 4. Anterior Cranial Fossa, 5. Brain Lobectomy, 6. Anterior Parietal Artery, 7. Mammary Gland, 8. Frontal Lobe, 9. Interlobar, 10. Lobar
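
Comparing the entry to concept definitions in vector space could look like the sketch below. It assumes cosine similarity as the similarity measure (the slides only say "compute similarity") and uses plain term counts rather than TF-IDF weights to stay short. The two definitions are the ones quoted on slides 18 and 8; the stopword set is illustrative.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "to", "a", "its", "are", "one", "were", "not"}


def bow(text):
    """Bag-of-words term counts with stopwords removed (no stemming or TF-IDF)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


definitions = {
    "Parietal Lobe": ("One of the lobes of the cerebral hemisphere located "
                      "superiorly to the occipital lobe and posteriorly to the "
                      "frontal lobe. Cognition and visuospatial processing are "
                      "its main functions."),
    "Agrobacterium tumefaciens": ("A species of Gram negative, rod shaped "
                                  "bacteria assigned to the phylum Proteobacteria."),
}
entry = ("The most superior regions of the frontal and parietal lobes and the "
         "most inferior regions of the temporal lobes were not evaluated.")

query = bow(entry)
ranked = sorted(((name, cosine(query, bow(text))) for name, text in definitions.items()),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{score:.3f}  {name}")
```
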
    22. Method II: Text Comparison. Different cut-off rules:
        1. Anything over x% similar
        2. 5 most similar
        3. 10 most similar
        4. 20% most similar
        5. 10% most similar
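
The cut-off rules can be written as three small filters over a ranked list. The sketch reads "20% most similar" as "the top 20% of the ranked list", which is my interpretation. The concept names `C1` to `C10` and their scores are invented placeholders, not results from the thesis.

```python
import math


def over_threshold(ranked, x):
    """Rule 1: keep everything with similarity above x (a fraction, e.g. 0.25)."""
    return [(concept, score) for concept, score in ranked if score > x]


def top_n(ranked, n):
    """Rules 2 and 3: keep the n most similar concepts."""
    return ranked[:n]


def top_percent(ranked, pct):
    """Rules 4 and 5: keep the top pct% of the ranked list (at least one item)."""
    k = max(1, math.ceil(len(ranked) * pct / 100))
    return ranked[:k]


# Invented placeholder scores, sorted from most to least similar.
scores = [0.41, 0.38, 0.30, 0.22, 0.20, 0.18, 0.15, 0.14, 0.10, 0.09]
ranked = [(f"C{i + 1}", s) for i, s in enumerate(scores)]

print(over_threshold(ranked, 0.25))
print(top_n(ranked, 5))
print(top_percent(ranked, 20))
print(top_percent(ranked, 10))
```

Fixed-size rules always return something, even for a text with no good match, while a threshold can return an empty list; comparing the rules against human annotations is one way to choose between them.
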
    23. Result: long list of (linked) concepts. Relevancy?
    24. Find clusters. Measure semantic similarity between concepts:
        - Shortest paths
        - Shared parents
        - Node’s ‘depth’
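
The three ingredients for semantic similarity (shortest paths, shared parents, node depth) can be computed on a parent-link graph. The hierarchy below is a made-up toy, not the actual NCI Thesaurus structure, and the function names are hypothetical.

```python
from collections import deque

# Toy is-a hierarchy (child -> parents); invented for illustration only.
PARENTS = {
    "Frontal Lobe": ["Cerebral Lobe"],
    "Parietal Lobe": ["Cerebral Lobe"],
    "Cerebral Lobe": ["Brain Part"],
    "Brain Part": ["Anatomic Structure"],
    "Mammary Gland": ["Gland"],
    "Gland": ["Anatomic Structure"],
    "Anatomic Structure": [],
}


def ancestors(node):
    """All concepts reachable by following parent links upward."""
    seen, stack = set(), list(PARENTS[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def depth(node):
    """Length of the shortest parent chain from node up to a root."""
    return 0 if not PARENTS[node] else 1 + min(depth(p) for p in PARENTS[node])


def shortest_path_length(a, b):
    """Breadth-first search over parent and child links, ignoring direction."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        children = [c for c, ps in PARENTS.items() if node in ps]
        for nxt in PARENTS[node] + children:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no connection between a and b


print(shortest_path_length("Frontal Lobe", "Parietal Lobe"))
print(shortest_path_length("Frontal Lobe", "Mammary Gland"))
print(ancestors("Frontal Lobe") & ancestors("Parietal Lobe"))
print(depth("Frontal Lobe"), depth("Mammary Gland"))
```

In this toy graph the two lobes are close together and share a specific parent, while "Mammary Gland" only meets them at the root, so a clustering step would group the lobes and leave the gland out.
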
    25. (no slide text)
    26. To do: Get data! Analyse algorithms
