EuroVis DocuBurst Presentation 2009
Transcript

  • 1. DocuBurst: Visualizing Document Content Using Language Structure EuroVis 2009 Christopher Collins, Sheelagh Carpendale, and Gerald Penn
  • 4. Document Content Visualization
      • Navigation in collections of digital text
      • Content analysis (digital humanities)
      • Plagiarism detection
      • Authorship attribution
  • 5. ...Using Language Structure
      • Traditional glyph techniques use unstructured word counts (e.g. tag clouds)
      • DocuBurst structure is based on a carefully designed ontology called WordNet
  • 6. WordNet Background
      • Basic data unit is a set of synonyms called a synset: {lawyer, attorney}, {jump, hop, skip}
      • Words can occur in multiple synsets: {bank, financial institution}, {bank, slope, riverside}
      • Free resource from Princeton University
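A minimal sketch of the synset idea, using only the example synsets from this slide. The names `SYNSETS` and `build_index` are illustrative assumptions, not part of WordNet or DocuBurst. It shows how one word ("bank") can map to more than one synset.

```python
from collections import defaultdict

# Toy synsets copied from the slide; real WordNet is far larger.
SYNSETS = [
    frozenset({"lawyer", "attorney"}),
    frozenset({"jump", "hop", "skip"}),
    frozenset({"bank", "financial institution"}),
    frozenset({"bank", "slope", "riverside"}),
]


def build_index(synsets):
    """Map each word to the list of synsets that contain it."""
    index = defaultdict(list)
    for synset in synsets:
        for word in synset:
            index[word].append(synset)
    return index


INDEX = build_index(SYNSETS)
# "bank" is ambiguous here: it belongs to two synsets.
bank_senses = INDEX["bank"]
```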
  • 7. Hyponymy Relation
      • X is a Y or X is a kind of Y
      • transitive, asymmetric relationship
      • example
          • {robin, redbreast} IS-A {bird}
          • robin and redbreast are hyponyms of bird
      • forms the basic structure of the noun network
          • {robin, redbreast} IS-A {bird} IS-A {animal, animate_being} IS-A {organism, life_form, living_thing} IS-A {entity}
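The IS-A chain on this slide can be sketched as a parent map that is walked upward. This is a simplified illustration, assuming each synset has exactly one direct hypernym and naming each synset by its first word; the names `HYPERNYM`, `trace_to_root`, and `is_a` are hypothetical. Real WordNet allows a synset to have several hypernyms.

```python
# Direct hypernym (parent) of each synset, taken from the slide's chain.
HYPERNYM = {
    "robin": "bird",
    "bird": "animal",
    "animal": "organism",
    "organism": "entity",
}


def trace_to_root(synset, hypernym=HYPERNYM):
    """Follow IS-A links upward until a synset with no hypernym is reached."""
    path = [synset]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    return path


def is_a(x, y, hypernym=HYPERNYM):
    """True if x is a direct or indirect hyponym of y (transitive, asymmetric)."""
    return x != y and y in trace_to_root(x, hypernym)
```

Because the relation is transitive, `is_a("robin", "animal")` holds even though only robin-to-bird and bird-to-animal are stored; because it is asymmetric, `is_a("animal", "robin")` does not.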
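  • 8. Creating DocuBurst
      • Words are reduced to their base form: games → game, taken → take
      • Occurrences are counted per (word, part of speech, count): absolute,noun,10; chair,noun,2; moment,noun,11; game,noun,30; reality,noun,3; take,verb,13; represent,verb,17; ...
      • Counted words are placed in the WordNet hierarchy: game IS activity, chair IS furniture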
  • 9. Hyponymy Structure
  • 10. Word Sense Ambiguity
      • Man = {mankind, world}, {male human}, ...
      • Water = {H2O}, {water supply}, {body of water}, ...
      • Word senses are roughly ordered by frequency in WordNet
  • 11. Alternative Scoring Models
      • Count for all senses
          • undue prominence to ambiguous words
      • Count first sense only
          • loses too much information
      • Divide by sense count (same for all senses)
          • high penalty on polysemous words
      • Divide by sense index
          • decreased prominence for uncommon senses
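The four scoring models above can be compared on a single word. This is a sketch under stated assumptions: senses are numbered from 1 in WordNet's frequency order, and the function name `sense_scores` and the model labels are invented for illustration; the slides do not give the exact formulas DocuBurst uses.

```python
def sense_scores(count, n_senses, model):
    """Distribute a word's occurrence count across its senses.

    count    -- occurrences of the word in the document
    n_senses -- number of WordNet senses the word has (must be >= 1)
    model    -- "all", "first", "by_count", or "by_index" (labels are assumptions)
    Returns one score per sense, most frequent sense first.
    """
    if n_senses < 1:
        raise ValueError("a word needs at least one sense")
    ranks = range(1, n_senses + 1)  # 1-based sense index
    if model == "all":
        # Every sense receives the full count.
        return [float(count) for _ in ranks]
    if model == "first":
        # Only the most frequent sense is credited.
        return [float(count)] + [0.0] * (n_senses - 1)
    if model == "by_count":
        # Count is split evenly over all senses.
        return [count / n_senses for _ in ranks]
    if model == "by_index":
        # Later (less common) senses receive progressively less.
        return [count / rank for rank in ranks]
    raise ValueError(f"unknown model: {model}")
```

For a word seen 12 times with 3 senses, `"by_count"` gives every sense 4.0, while `"by_index"` gives 12.0, 6.0, and 4.0, so the first sense stays prominent without discarding the others.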
  • 12. Visual Encoding
      • Node Size: # of leaves in subtree
          • Stability across documents
      • Node Position: IS-A relation
          • Multi-level linguistic abstraction
          • Additive (2 ducks + 3 geese = 5 birds)
      • Node Hue: sense index
          • Differentiates subtrees
      • Node Saturation: word count
          • Ordering & approximate scale is perceived
      • Node Label: First word in synset
          • Words are ordered by commonality in the language, reveals well-known words
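Two of the encodings above are simple tree aggregates: node size is the number of leaves in a subtree, and counts are additive up the IS-A hierarchy. The sketch below reproduces the slide's "2 ducks + 3 geese = 5 birds" example; the tree, the names `CHILDREN` and `COUNTS`, and both helper functions are illustrative assumptions, not DocuBurst's code.

```python
# Toy hyponym tree and document word counts from the slide's example.
CHILDREN = {"bird": ["duck", "goose"], "duck": [], "goose": []}
COUNTS = {"duck": 2, "goose": 3}


def cumulative_count(node, children, counts):
    """A node's own count plus the counts of everything below it."""
    own = counts.get(node, 0)
    return own + sum(
        cumulative_count(child, children, counts)
        for child in children.get(node, [])
    )


def leaf_count(node, children):
    """Number of leaves in the subtree; a leaf counts as 1."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(leaf_count(kid, children) for kid in kids)


birds = cumulative_count("bird", CHILDREN, COUNTS)
bird_size = leaf_count("bird", CHILDREN)
```

Note that `leaf_count` depends only on the hierarchy, not on the document, which is why sizing by leaves stays stable across documents.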
  • 13. Node Colouring Alternatives
      • Cumulative Counts: Supports Visual Summaries
      • Single Node Counts: Supports Precision and Selection
  • 14. Interaction
  • 15. Trace-to-Root
      • Cattle IS-A bovine IS-A bovid IS-A ... mammal IS-A vertebrate IS-A chordate IS-A animal
  • 16. Roll Up
  • 17. Drill Down
  • 20. Concordance
  • 21. Level of Detail Filter
      • Nodes more than N levels away from the root are hidden
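The level-of-detail filter amounts to a depth-limited traversal. A minimal sketch, assuming the hierarchy is stored as a child map and that the root sits at depth 0; `visible_nodes` and the toy tree are hypothetical names used only for illustration.

```python
def visible_nodes(root, children, max_depth):
    """Return the nodes at most max_depth levels below the root."""
    visible = []
    stack = [(root, 0)]  # (node, distance from root)
    while stack:
        node, depth = stack.pop()
        if depth > max_depth:
            continue  # hidden, and so is everything beneath it
        visible.append(node)
        for child in children.get(node, []):
            stack.append((child, depth + 1))
    return visible


# Toy hierarchy: animal -> bird -> robin, animal -> fish
TREE = {"animal": ["bird", "fish"], "bird": ["robin"], "fish": [], "robin": []}
shown = sorted(visible_nodes("animal", TREE, 1))
```

With N = 1, "robin" is two levels from the root and is therefore hidden, while "bird" and "fish" remain visible.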
  • 22. Search
  • 23. Design Trade-Offs
  • 24. Node Size Mapping
      • Size by # leaves
          + consistent
          – visual artifacts (highly relevant words with few leaves are too small)
      • Size by score
          + redundant encoding
          + important words more prominent
          – disrupts inter-document comparison
  • 25. Font Size Mapping
      • Size to fit cell
          + maximize legibility
          – short words have huge font
      • Font size proportional to cell size
          + short words not more prominent
          – small maximum size to accommodate long words
  • 26. Inclusion of Zero-count Words
          + provides context (what is not in document)
          – more cluttered
  • 27. Case Studies
  • 32. 2008 U.S. Presidential Debate
  • 33. Unexpected Uses
      • WordNet Visualization
  • 34. Unexpected Uses
      • WordNet Visualization
  • 35. Unexpected Uses
      • Language Education
          • “invaluable potential for writing and vocabulary development at the secondary level”
          • “I'm very interested in using the program, I'm an English teacher”
  • 36. Related Work
  • 37. Types of Document Visualization
  • 38. Features of Document Visualization
      • Semantic: indicate meaning
      • Cluster: generalize into concepts
      • Overview: provide quick gist
      • Zoom: support varying level of detail
      • Compare: multi-document comparisons
      • Search: find specific words/phrases
      • Read: drill-down to original text
      • Pattern: reveal patterns of repetition
      • Features: reveal extracted features such as emotion
      • Suggest: automatically select interesting focus words
      • Phrases: can show multi-word phrases
      • All words: can show all parts of speech
  • 39. Features of Document Visualization
  • 40. Semantics & Clustering
      • Provides word definitions and relations
      • Clusters of related terms allow variable level of abstraction
  • 41. Phrases & All Words
      • Cannot visualize multi-word phrases that are not ‘words’ in WordNet
      • Only English nouns, verbs
  • 42. Future Work
  • 43. Uneven Tree Cut Models
  • 45. DocuBurst Comparative Views
      • Embed small multiples in e-libraries
      • Colour scale based on text difference
          • From each other
          • From corpus average
  • 46. Simplification
      • Root suggestion
          • How to know where to start exploring?
      • Word sense disambiguation
          • Attempt to select a sense
      • Use a less detailed ontology
  • 47. Thanks for your Attention!
      • Acknowledgements: Ravin Balakrishnan and helpful reviewers.
      • Contact: ccollins@cs.utoronto.ca