
Lecture: Semantic Word Clouds



folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, wordle, context-preserving word cloud visualisation, CPWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language technology

Published in: Education


  1. Semantic Analysis in Language Technology: Semantic Word Clouds. Marina Santini, Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden. Spring 2016
  2. Previous lecture: Ontologies
  3. Semantic Web & Ontologies
     • The goal of the Semantic Web is to allow web information and services to be more effectively exploited by humans and automated tools.
     • Essentially, the focus of the Semantic Web is to share data instead of documents.
     • This data must be "meaningful" both for humans and for machines (i.e. automated tools and web applications).
     • Q: How are we going to represent meaning and knowledge on the web?
     • A: … via annotation.
     • Knowledge is represented in the form of rich conceptual schemas/formalisms called ontologies.
     • Therefore, ontologies are the backbone of the Semantic Web.
     • Ontologies give formally defined meanings to the terms used in annotations, transforming them into semantic annotations.
  4. Ontologies are…
     • … concepts that are hierarchically organized.
     Tree of Porphyry, 3rd century AD; WordNet, 21st century AD (see Lect 5, e.g. similarity measures)
  5. Reasoning: RDF/OWL vs Databases (and other data structures)
     OWL axioms behave like inference rules rather than database constraints.

     Class: Phoenix
         SubClassOf: isPetOf only Wizard

     Individual: Fawkes
         Types: Phoenix
         Facts: isPetOf Dumbledore

     • Fawkes is said to be a Phoenix and to be the pet of Dumbledore, and it is also stated that only a Wizard can have a pet Phoenix.
     • In OWL, this leads to the implication that Dumbledore is a Wizard. That is, if we were to query the ontology for instances of Wizard, then Dumbledore would be part of the answer.
     • In a database setting the schema could include a similar statement about the Phoenix class, but in this case it would be interpreted as a constraint on the data: adding the fact that Fawkes isPetOf Dumbledore without Dumbledore being already known to be a Wizard would lead to an invalid database state, and such an update would therefore be rejected by a database management system as a constraint violation.
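The inference-vs-constraint contrast can be mimicked in a few lines of plain Python. This is an illustrative sketch only, not a real OWL reasoner or database; all names and data structures are hypothetical:

```python
# Toy model of the slide's axiom "isPetOf only Wizard", used two ways.
facts = {"Phoenix": {"Fawkes"}, "Wizard": set()}
pets = [("Fawkes", "Dumbledore")]  # (pet, owner)

def infer_wizards(facts, pets):
    """OWL-style inference: derive new Wizard facts from the axiom."""
    derived = set(facts["Wizard"])
    for pet, owner in pets:
        if pet in facts["Phoenix"]:   # a Phoenix's owner must be a Wizard
            derived.add(owner)
    return derived

def insert_pet_fact(facts, pets, pet, owner):
    """Database-style constraint: reject updates that violate the schema."""
    if pet in facts["Phoenix"] and owner not in facts["Wizard"]:
        raise ValueError("constraint violation: owner not known to be a Wizard")
    pets.append((pet, owner))

print(infer_wizards(facts, pets))  # {'Dumbledore'}
```

The same statement yields a derived answer in one reading, and a rejected update in the other.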
  6. So, what is an ontology for us?
     "An ontology is a FORMAL, EXPLICIT specification of a SHARED conceptualization." (Studer, Benjamins, Fensel. Knowledge Engineering: Principles and Methods. Data and Knowledge Engineering. 25 (1998) 161-197)
     "An ontology is an explicit specification of a conceptualization." (Gruber, T. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition. Vol. 5. 1993. 199-220)
     • Abstract model and simplified view of some phenomenon in the world that we want to represent
     • Machine-readable
     • Concepts, properties, relations, functions, constraints, axioms are explicitly defined
     • Consensual knowledge
  7. How to build an ontology
     Generally speaking (and roughly said), when designing an ontology, four main components are used:
     1. Classes
     2. Relations
     3. Axioms
     4. Instances
  8. Practical Activity: emotions
     Your remarks:
     • Emotions are ambiguous: e.g. happiness can also be ill-directed
     • The polarity of some emotions cannot be assessed…
     • etc.
     Classes, Relations, Axioms, Instances, etc.
  9. Occupational psychology (Wikipedia)
     • Industrial and organizational psychology (also known as I-O psychology, occupational psychology, work psychology, WO psychology, IWO psychology and business psychology) is the scientific study of human behavior in the workplace and applies psychological theories and principles to organizations and individuals in their workplace.
     • I-O psychologists are trained in the scientist–practitioner model. I-O psychologists contribute to an organization's success by improving the performance, motivation, job satisfaction, occupational safety and health as well as the overall health and well-being of its employees. An I-O psychologist conducts research on employee behaviors and attitudes, and how these can be improved through hiring practices, training programs, feedback, and management systems.
  10. In summary…
     Why build an ontology?
     • To share common understanding of the structure of information among people or machines
     • To make domain assumptions explicit
     • Often based on a controlled vocabulary
     • To analyze domain knowledge
     • To enable reuse of domain knowledge
  11. Ontologies and Tags
     • Ontologies and tagging systems are two different ways to organize the knowledge present on the Web.
     • The first has a formal foundation deriving from description logic and artificial intelligence; domain experts decide the terms.
     • The second is simpler and integrates heterogeneous contents; it is based on the collaboration of users in the Web 2.0: user-generated annotation.
  12. Folksonomies
     • Tagging facilities within Web 2.0 applications have shown how it might be possible for user communities to collaboratively annotate web content, and create simple forms of ontology via the development of loosely-hierarchically organised sets of tags, often called folksonomies…
  13. Folksonomy = Social Tagging
     • Folksonomies (also known as social tagging) are user-defined metadata collections.
     • Users do not deliberately create folksonomies and there is rarely a prescribed purpose, but a folksonomy evolves when many users create or store content at particular sites and identify what they think the content is about.
     • "Tag clouds" pinpoint the frequency of certain tags.
  14. • A common way to organize tags is in tag clouds…
  15. Automatic folksonomy construction
     • The collective knowledge expressed through user-generated tags has great potential.
     • However, we need tools to efficiently aggregate data from large numbers of users with highly idiosyncratic vocabularies and invented words or expressions.
     • Many approaches to automatic folksonomy construction combine tags using statistical methods...
     • Ample space for improvement…
  16. Ontology, taxonomy, folksonomy, etc.
     • Many different definitions…
     • A good summary and interpretation is here: http://…-ontologies-0602
  17. Today…
     • We will talk more generally about word clouds…
  18. Further Reading
     Semantic Similarity from Natural Language and Ontology Analysis, by Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain. Synthesis Lectures on Human Language Technologies, May 2015, Vol. 8, No. 1.
     • The two state-of-the-art approaches for estimating and quantifying semantic similarities/relatedness of semantic entities are presented in detail: the first relies on corpus analysis and is based on Natural Language Processing techniques and semantic models, while the second is based on more or less formal, computer-readable and workable forms of knowledge such as semantic networks, thesauri or ontologies.
  19. Previous lecture: the end
  20. Acknowledgements
     This presentation is based on the following paper:
     • Barth et al. (2014). Experimental Comparison of Semantic Word Clouds. In Experimental Algorithms, Volume 8504 of the series Lecture Notes in Computer Science, pp. 247-258.
       – Link: https://…
     Some slides have been borrowed from Sergey Pupyrev.
  21. Today
     • Experiments on semantics-preserving word clouds, in which semantically related words are close to each other.
  22. Outline
     • What is a word cloud?
     • 3 early algorithms
     • 3 new algorithms
     • Metrics & quantitative evaluation
  23. Word Clouds
     • Word clouds have become a standard tool for abstracting, visualizing and comparing texts…
     • We could apply the same or similar techniques to the huge amounts of tags produced by users interacting in social networks.
  24. Comparison & conceptualization tool
     • Word clouds as a tool for "conceptualizing" documents. Cf. ontologies.
     • Ex: 2008, comparison of speeches: Obama vs McCain.
     Cf. Lect 10: extractive summarization & abstractive summarization.
  25. Word Clouds and Tag Clouds…
     • … are often used to represent importance among terms (e.g., band popularity) or serve as a navigation tool (e.g., Google search results).
  26. The Problem…
     • How to compute semantics-preserving word clouds in which semantically-related words are close to each other?
  27. Wordle (http://…)
     • Practical tools, like Wordle, make word cloud visualization easy. They offer an appealing way to SUMMARIZE text…
     • Shortcoming: they do not capture the relationships between words in any way, since word placement is independent of context.
  28. Many word clouds are arranged randomly (look also at the scattered colours)
  29. Patterns and Vicinity/Adjacency
     Humans are spontaneously pattern-seekers: if they see two words close to each other in a word cloud, they spontaneously think they are related…
  30. In Linguistics and NLP…
     • This natural tendency to link spatial vicinity to semantic relatedness is exploited as evidence that words are semantically related or semantically similar…
     Remember? "You shall know a word by the company it keeps" (Firth, J. R. 1957: 11)
  31. So, it makes sense to place such related words close to each other (look also at the colour distribution)
  32. Semantic word clouds have higher user satisfaction compared to other layouts…
  33. All recent word cloud visualization tools aim to incorporate semantics in the layout…
  34. … but none of them provides any guarantee about the quality of the layout in terms of semantics.
  35. Early algorithms: Force-Directed Graph
     • Most of the existing algorithms are based on force-directed graph layout.
     • Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way:
       – Attractive forces between pairs reduce empty space
       – Repulsive forces ensure that words do not overlap
       – A final force preserves semantic relations between words.
     Some of the most flexible algorithms for calculating layouts of simple undirected graphs belong to a class known as force-directed algorithms. Such algorithms calculate the layout of a graph using only information contained within the structure of the graph itself, rather than relying on domain-specific knowledge. Graphs drawn with these algorithms tend to be aesthetically pleasing, exhibit symmetries, and tend to produce crossing-free layouts for planar graphs.
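The attractive/repulsive interplay described above can be sketched as a toy simulation. All constants and force laws here are assumptions for illustration, not the exact forces of any published algorithm:

```python
import math
import random

def force_layout(words, sim, iterations=200, step=0.05):
    """words: list of labels; sim[i][j]: similarity in [0, 1].
    Attraction grows with similarity and distance; a short-range
    repulsion keeps words from collapsing onto each other."""
    random.seed(1)
    pos = {w: [random.random(), random.random()] for w in words}
    for _ in range(iterations):
        for i, wi in enumerate(words):
            fx = fy = 0.0
            for j, wj in enumerate(words):
                if i == j:
                    continue
                dx = pos[wj][0] - pos[wi][0]
                dy = pos[wj][1] - pos[wi][1]
                d = math.hypot(dx, dy) or 1e-9
                # attractive force (similarity) minus repulsive force (overlap)
                f = sim[i][j] * d - 0.01 / (d * d)
                f = max(-1.0, min(1.0, f))   # clamp to keep the update stable
                fx += f * dx / d
                fy += f * dy / d
            pos[wi][0] += step * fx
            pos[wi][1] += step * fy
    return pos

words = ["cat", "dog", "kernel"]
sim = [[0, .9, .1], [.9, 0, .1], [.1, .1, 0]]
layout = force_layout(words, sim)
```

After a few hundred iterations the strongly similar pair ("cat", "dog") should sit closer together than either does to the unrelated word.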
  36. Newer algorithms: rectangle representation of graphs
     • Vertex-weighted and edge-weighted graph:
       – The vertices of the graph are the words
         • Their weight corresponds to some measure of importance (e.g. word frequencies)
       – The edges capture the semantic relatedness of pairs of words (e.g. co-occurrence)
         • Their weight corresponds to the strength of the relation
       – Each vertex can be drawn as a box (rectangle) with a dimension determined by its weight
       – The realized adjacency is the sum of the edge weights over all pairs of touching boxes.
       – The goal is to maximize the realized adjacencies.
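Under this model, the realized adjacency of a finished layout is easy to compute. A minimal sketch, with assumed box and edge-weight representations:

```python
def touching(a, b, eps=1e-9):
    """Boxes as (x, y, w, h). Touching = boundaries share more than a point."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # side-by-side with vertical overlap, or stacked with horizontal overlap
    h_contact = (abs(ax + aw - bx) < eps or abs(bx + bw - ax) < eps) and \
                min(ay + ah, by + bh) - max(ay, by) > eps
    v_contact = (abs(ay + ah - by) < eps or abs(by + bh - ay) < eps) and \
                min(ax + aw, bx + bw) - max(ax, bx) > eps
    return h_contact or v_contact

def realized_adjacency(boxes, edge_weight):
    """boxes: {word: (x, y, w, h)}; edge_weight: {(u, v): similarity}.
    Sum of edge weights over pairs of boxes that touch."""
    return sum(w for (u, v), w in edge_weight.items()
               if touching(boxes[u], boxes[v]))

boxes = {"data": (0, 0, 2, 1), "mining": (2, 0, 2, 1), "cloud": (0, 5, 1, 1)}
weights = {("data", "mining"): 0.8, ("data", "cloud"): 0.3}
print(realized_adjacency(boxes, weights))  # 0.8
```

Only the "data"/"mining" edge is realized here: those boxes share a boundary, while "cloud" floats alone, so its edge weight is lost.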
  37. Purpose of the experiments that are shown here:
     • Semantics preservation in terms of closeness/vicinity/adjacency
  38. Example
     • A contact of two boxes is a common boundary.
     • The contact of two boxes is interpreted as semantic relatedness.
     • The contact of two boxes can be calculated, so the adjacency can be computed and evaluated.
  39. Preprocessing:
     1) Term extraction
     2) Ranking
     3) Similarity/dissimilarity computation
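The first two steps might be sketched like this, under simple assumptions (regex tokenization, frequency ranking); real pipelines would add stop-word removal, lemmatization, and so on:

```python
import re
from collections import Counter

def preprocess(text, top_n=5):
    # 1) Term extraction: lowercase alphabetic tokens, very short words dropped
    terms = [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2]
    # 2) Ranking: most frequent terms first
    return [w for w, _ in Counter(terms).most_common(top_n)]

text = "word clouds visualize texts; word clouds summarize texts quickly"
ranked = preprocess(text)
```

Step 3 then computes pairwise similarities over these ranked terms (see the cosine example below on slide 41).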
  40. • Similarity/dissimilarity matrix
  41. Lect 6: Repetition

      Co-occurrence counts:

                    large   data   computer
      apricot         1      0       0
      digital         0      1       2
      information     1      6       1

      Which pair of words is more similar?

      cos(v, w) = (v · w) / (|v| |w|) = Σᵢ vᵢwᵢ / (√(Σᵢ vᵢ²) · √(Σᵢ wᵢ²))

      cosine(apricot, information) = (1+0+0) / (√(1+0+0) · √(1+36+1)) = 1/√38 ≈ .16
      cosine(digital, information) = (0+6+2) / (√(0+1+4) · √(1+36+1)) = 8/(√5 · √38) ≈ .58
      cosine(apricot, digital)     = (0+0+0) / (√(1+0+0) · √(0+1+4)) = 0
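The slide's numbers can be checked directly; this snippet recomputes the three cosines from the count table above:

```python
import math

# Rows: counts in the contexts "large", "data", "computer".
v = {"apricot": [1, 0, 0], "digital": [0, 1, 2], "information": [1, 6, 1]}

def cosine(a, b):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(round(cosine(v["apricot"], v["information"]), 2))  # 0.16
print(round(cosine(v["digital"], v["information"]), 2))  # 0.58
print(round(cosine(v["apricot"], v["digital"]), 2))      # 0.0
```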
  42. Lect 6: Other possible similarity measures
  43. Input - Output
     • The input for all algorithms is:
       – a collection of n rectangles, each with a fixed width and height proportional to the rank of the word
       – a similarity/dissimilarity matrix
     • The output is a set of non-overlapping positions for the rectangles.
  44. Early Algorithms
     1. Wordle (Random)
     2. Context-Preserving Word Cloud Visualization (CPWCV)
     3. Seam Carving
  45. Wordle → Random
     • The Wordle algorithm places one word at a time in a greedy fashion, i.e. aiming to use space as efficiently as possible.
     • First the words are sorted by weight/rank in decreasing order.
     • Then, for each word in that order, a position is picked at random.
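A toy version of this greedy loop follows. The canvas, box sizes and rejection sampling are assumptions for illustration; the real Wordle searches along a spiral and tests overlap at the glyph level:

```python
import random

def overlaps(a, b):
    """Axis-aligned rectangles (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def greedy_place(words, canvas=(100, 100), tries=1000):
    """words: list of (label, weight); heavier words are placed first."""
    random.seed(0)
    placed = {}
    for label, weight in sorted(words, key=lambda t: -t[1]):
        w, h = 2 * weight, weight              # box size proportional to weight
        for _ in range(tries):                 # retry until a free spot is found
            box = (random.uniform(0, canvas[0] - w),
                   random.uniform(0, canvas[1] - h), w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                placed[label] = box
                break
    return placed

layout = greedy_place([("clouds", 10), ("word", 8), ("tag", 5)])
```

Because placement is random, semantically related words end up near each other only by chance, which is exactly the shortcoming the later algorithms address.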
  46. 1: Random
  47. 2: Random
  48. 3: Random
  49. 4: Random
  50. 5: Random
  51. 6: Random
  52. Context-Preserving Word Cloud Visualization (CPWCV)
     • First, a dissimilarity matrix is computed and Multidimensional Scaling (MDS) is performed.
     • Second, an effort is made to create a compact layout.
     Multidimensional Scaling (MDS) aims at detecting meaningful underlying dimensions in the data.
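The MDS step can be sketched with classical (Torgerson) MDS, which embeds items in 2D so that Euclidean distances approximate the given dissimilarities. This is one standard variant, not necessarily the one CPWCV uses:

```python
import numpy as np

def classical_mds(D, dims=2):
    """D: symmetric (n, n) dissimilarity matrix -> (n, dims) coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:dims]   # keep the top eigenpairs
    L = np.sqrt(np.maximum(eigvals[idx], 0))
    return eigvecs[:, idx] * L

# Two close words and one distant word.
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 4.0],
              [4.0, 4.0, 0.0]])
X = classical_mds(D)
```

For a dissimilarity matrix that is exactly Euclidean, as here, the embedded distances reproduce D up to rotation and reflection.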
  53. 1: Context-Preserving
  54. 2: Context-Preserving: repulsive force
  55. 3: Context-Preserving: attractive force
  56. Seam Carving
     • Basically, an algorithm for image resizing.
     • It was invented at Mitsubishi Electric Research Laboratories (MERL).
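The image-resizing core of seam carving is a short dynamic program: find the connected top-to-bottom path of minimum total energy. A minimal sketch on a toy energy grid (the word-cloud variant of the following slides instead removes low-importance paths of empty space):

```python
def min_vertical_seam(energy):
    """energy: list of rows; returns one column index per row, top to bottom."""
    rows, cols = len(energy), len(energy[0])
    cost = [energy[0][:]]
    for r in range(1, rows):
        prev = cost[-1]
        # each cell extends the cheapest of the three cells above it
        cost.append([
            energy[r][c] + min(prev[max(c - 1, 0):min(c + 2, cols)])
            for c in range(cols)
        ])
    # backtrack from the cheapest bottom cell
    seam = [min(range(cols), key=lambda c: cost[-1][c])]
    for r in range(rows - 2, -1, -1):
        c = seam[-1]
        window = range(max(c - 1, 0), min(c + 2, cols))
        seam.append(min(window, key=lambda cc: cost[r][cc]))
    return seam[::-1]

grid = [[9, 1, 9],
        [9, 1, 9],
        [9, 9, 1]]
print(min_vertical_seam(grid))  # [1, 1, 2]
```

Removing the returned seam shrinks the grid by one column while discarding the least "important" cells; iterating this is the trimming shown in the next slides.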
  57. 1: Seam Carving
  58. 2: Seam Carving: space is divided into regions
  59. 3: Seam Carving: empty paths trimmed out iteratively
  60. 4: Seam Carving
  61. 5: Seam Carving
  62. 6: Seam Carving: space divided into regions
  63. 7: Seam Carving
  64. 3 New Algorithms
     1. Inflate and Push
     2. Star Forest
     3. Cycle Cover
  65. Inflate-and-Push
     • A simple heuristic method for word layout, which aims to preserve semantic relations between pairs of words.
     • Based on:
       1. Scaling down all word rectangles by some constant;
       2. Computing MDS (multidimensional scaling) on the dissimilarity matrix;
       3. Iteratively increasing the size of the rectangles by 5% (i.e. "inflating" words);
       4. When words overlap, applying a force-directed algorithm to "push" words away.
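The four steps can be compressed into a toy simulation. The square-box shape, the shrink factor, and the push rule are all assumptions for illustration; the MDS positions are taken as given input:

```python
import math

def overlap(a, b):
    """Squares as [x, y, s] with center (x, y) and half-size s."""
    ax, ay, s = a
    bx, by, t = b
    return abs(ax - bx) < s + t and abs(ay - by) < s + t

def inflate_and_push(pos, final_size, rounds=60, shrink=0.1):
    """pos: MDS coordinates (step 2); boxes start scaled down (step 1)."""
    boxes = [[x, y, final_size * shrink] for x, y in pos]
    for _ in range(rounds):
        for b in boxes:                          # step 3: inflate by 5%
            b[2] = min(b[2] * 1.05, final_size)
        for i in range(len(boxes)):              # step 4: push overlaps apart
            for j in range(i + 1, len(boxes)):
                a, c = boxes[i], boxes[j]
                if overlap(a, c):
                    dx, dy = c[0] - a[0], c[1] - a[1]
                    d = math.hypot(dx, dy) or 1e-9
                    push = 0.5 * (a[2] + c[2])
                    a[0] -= push * dx / d; a[1] -= push * dy / d
                    c[0] += push * dx / d; c[1] += push * dy / d
    return boxes

boxes = inflate_and_push([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], final_size=1.0)
```

The two nearby words are separated only as far as their growing boxes require, so the relative arrangement from MDS, and thus the semantics, is largely preserved.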
  66. Inflate: starting point
  67. Inflate: scaling down
  68. Inflate: semantically-related words are placed close to each other. Apply "inflate words" (5%) iteratively.
  69. Inflate: "push words": repulsive force to resolve overlaps
  70. Inflate: final stage
  71. Star Forest
     • A star is a tree consisting of one center vertex connected directly to the remaining (leaf) vertices.
     • A star forest is a forest whose connected components are all stars.
  72. Repetition: trees and graphs
     • A tree is a special form of graph: a minimally connected graph, with only one path between any two vertices.
     • In a general graph there can be more than one path between two nodes, i.e. a graph can have uni-directional or bi-directional paths (edges) between nodes.
  73. Three steps
     1. Extracting the star forest: partition the graph into disjoint stars
     2. Realising a star: build a word cloud for every star
     3. Packing: pack all the stars together
  74. Star Forest: star = tree
     1. Extract stars greedily from the dissimilarity matrix → disjoint stars = star forest
     2. Compute the optimal stars, i.e. the best set of words to be adjacent
     3. Attractive force to get a compact layout
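Step 1 might be sketched as a greedy extraction loop. This is an assumed heuristic for illustration (pick the word most similar in total to the rest as a center, attach its best neighbours as leaves); the paper's procedure is more careful:

```python
def extract_stars(words, sim, max_leaves=3):
    """sim[(u, v)]: symmetric similarity (keys sorted alphabetically).
    Returns a list of (center, leaves) stars partitioning the words."""
    def s(u, v):
        return sim.get(tuple(sorted((u, v))), 0)

    remaining = set(words)
    stars = []
    while remaining:
        # center = word most similar, in total, to the other remaining words
        center = max(remaining,
                     key=lambda u: sum(s(u, v) for v in remaining if v != u))
        remaining.discard(center)
        # leaves = the center's most similar remaining neighbours
        leaves = sorted(remaining, key=lambda v: -s(center, v))[:max_leaves]
        leaves = [v for v in leaves if s(center, v) > 0]
        for v in leaves:
            remaining.discard(v)
        stars.append((center, leaves))
    return stars

words = ["data", "mining", "text", "cloud"]
sim = {("data", "mining"): 0.9, ("data", "text"): 0.4, ("cloud", "text"): 0.7}
stars = extract_stars(words, sim)
```

Each star then becomes a small word cloud with the center box touching its leaves (step 2), and the stars are packed together (step 3).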
  75. Cycle Cover
     • This algorithm is based on a similarity matrix.
     • First, a similarity path is created.
     • Then, the optimal level of compactness is computed.
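The "similarity path" idea can be illustrated with a greedy nearest-neighbour ordering: each word is followed by a highly similar one. This is a simplification for intuition only; the actual algorithm computes an optimal cycle cover rather than a greedy path:

```python
def similarity_path(words, sim):
    """Order words so consecutive words are as similar as possible (greedy).
    sim[(u, v)]: symmetric similarity, keys sorted alphabetically."""
    def s(u, v):
        return sim.get(tuple(sorted((u, v))), 0)

    path = [words[0]]
    remaining = set(words[1:])
    while remaining:
        nxt = max(remaining, key=lambda v: s(path[-1], v))  # best successor
        remaining.discard(nxt)
        path.append(nxt)
    return path

words = ["apricot", "digital", "information", "data"]
sim = {("apricot", "information"): 0.16,
       ("digital", "information"): 0.58,
       ("data", "digital"): 0.7}
print(similarity_path(words, sim))
```

Consecutive words on the path are then drawn as touching boxes, so strong similarities along the path become realized adjacencies.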
  76. Quantitative Metrics
     1. Realized Adjacencies: how close are similar words to each other?
     2. Distortion: how distant are dissimilar words?
     3. Uniform Area Utilization: uniformity of the distribution (overpopulated vs sparse areas in the word cloud)
     4. Compactness: how well utilized is the drawing area?
     5. Aspect Ratio: width and height of the bounding box
     6. Running Time: execution time
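Metrics 4 and 5 are simple functions of the final set of boxes. A sketch under the assumption of non-overlapping axis-aligned rectangles (the exact formulas in the paper may differ):

```python
def bounding_box(boxes):
    """boxes: list of (x, y, w, h); returns (width, height) of the hull."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return x1 - x0, y1 - y0

def compactness(boxes):
    """Fraction of the bounding box covered by word boxes."""
    W, H = bounding_box(boxes)
    used = sum(w * h for x, y, w, h in boxes)
    return used / (W * H)

def aspect_ratio(boxes):
    W, H = bounding_box(boxes)
    return W / H

boxes = [(0, 0, 2, 1), (2, 0, 2, 1)]   # two 2x1 boxes side by side
print(compactness(boxes), aspect_ratio(boxes))  # 1.0 4.0
```

A perfectly packed layout has compactness 1.0; an aspect ratio far from that of the target display counts against a layout even when it is compact.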
  77. 2 datasets
     (1) WIKI, a set of 112 plain-text articles extracted from the English Wikipedia, each consisting of at least 200 distinct words
     (2) PAPERS, a set of 56 research papers published in conferences on experimental algorithms (SEA and ALENEX) in 2011-2012.
  78. Cycle Cover wins
  79. Seam Carving wins
  80. Random wins
  81. Inflate wins
  82. Random and Seam Carving win
  83. All OK except Seam Carving
  84. Demo
  85. The end