# List rank whitepaper


### The ListRank Algorithm

Ian Gregorio-de Souza, John Murphy

White Paper, 1 October 2010

Thayer School of Engineering, Dartmouth College, Hanover, NH 03766

Large-scale comparisons of user profiles require a way to directly compare two ranked lists. We developed an algorithm called ListRank to estimate how closely two such lists match, based on the presence of items in the two lists and their relative rank in their respective lists. This asymmetric method treats one ranked list as a base and produces a score for a candidate list against it. A higher score indicates a closer match: 1 point is awarded for each member of the base list that appears in the candidate list. Further points are awarded for similarity in rank. If the base list has length $N_b$ and the candidate list has length $N_c$, let $N_m$ be the larger of $\{N_b, N_c\}$. The award $s$ for rank similarity between an item at rank $i$ in the base list and rank $j$ in the candidate list is then

$$s = N_m - |i - j|$$

The maximum score for a pair of ranked lists is then $N_b + N_b N_c$. Normalizing the score against this maximum gives a ListRank score between 0 and 1. A symmetrized version can be made by simply adding ListRank(a,b) + ListRank(b,a). Figure 1 shows a pseudocode block, for clarity.

Figure 1: ListRank pseudocode block
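
The scoring rule described above can be sketched in Python as follows. This is not the authors' pseudocode from Figure 1, only a minimal reading of the prose. It assumes 1-based ranks, an absolute rank difference in the award, items that are unique within each list, and normalization by $N_b + N_b N_c$; the function names `list_rank` and `list_rank_symmetric` are illustrative.

```python
def list_rank(base, candidate):
    """Asymmetric ListRank-style score of `candidate` against `base`, in [0, 1].

    Sketch under assumptions: ranks are 1-based positions, items are unique
    within each list, and the rank award is N_m - |i - j|.
    """
    n_b, n_c = len(base), len(candidate)
    if n_b == 0 or n_c == 0:
        return 0.0  # assumed convention: nothing to compare
    n_m = max(n_b, n_c)

    # 1-based rank of each candidate item (first occurrence wins on duplicates)
    cand_rank = {}
    for j, item in enumerate(candidate, start=1):
        cand_rank.setdefault(item, j)

    score = 0
    for i, item in enumerate(base, start=1):
        j = cand_rank.get(item)
        if j is None:
            continue                   # base item absent from candidate: no points
        score += 1                     # presence point
        score += n_m - abs(i - j)      # rank-similarity award

    return score / (n_b + n_b * n_c)   # normalize by the stated maximum


def list_rank_symmetric(a, b):
    """Symmetrized score: sum of both directions, so it ranges from 0 to 2."""
    return list_rank(a, b) + list_rank(b, a)
```

Under these assumptions the asymmetry is easy to see: `list_rank(["a", "b"], ["a", "b", "c", "d"])` is 1.0, because every base item appears at the same rank, while swapping the roles gives 10/12, because two of the four base items are missing from the candidate.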

Once ListRank has been applied, symmetrically or asymmetrically, to a set of profiles, there are several ways to plot the comparisons. For example, Figure 2 shows the result of intercomparing a series of weekly profiles for one user's list of hosts and ports contacted, ranked according to traffic volume, using the symmetrized version of the algorithm. This user's behavior changed over time: the first three weeks were very self-similar, as were the later weeks, but the two blocks of weeks were not very similar to each other.

Figure 2: Weekly profiles for one user compared against each other by ListRank show gradual change over time

Change over time is frequently useful to know, and there are several ways to graph time histories. One useful view compares one user's baseline against other users' monthly profiles, which shows at a glance when other users' behavior most closely matched that baseline.

Figure 3 shows one such representation for a set of users who were profiled in terms of their use of wireless access points on the Dartmouth campus. Here, user #20's long-term behavior (that is, the ranked list of his all-time most-used access points) was compared against one-month profiles for a set of users over the course of a year. The results are plotted as vertical lines showing the range in score over that year. User #20 stands out, with self-comparison scores consistently high relative to other users. Other users scored higher or lower at different times: User #10 occasionally matched with a somewhat high score, but also scored as low as a zero match.
A graph of this type lets the reader quickly see how much these similarity scores varied over the measured time period: not only which users ever had similar behavior, but how consistently similar that behavior was.
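
The data behind the two views described above (the pairwise intercomparison of one user's weekly profiles, and the per-user score ranges against a single baseline) could be computed along these lines. This is a hedged sketch, not the authors' code: `list_rank` is a compact restatement of the score under the same assumptions as the prose (1-based ranks, absolute rank difference, unique items), and `pairwise_scores`, `score_ranges`, and the input layout are hypothetical names chosen for illustration.

```python
def list_rank(base, candidate):
    """Compact asymmetric score in [0, 1]; assumes unique items per list."""
    n_b, n_c = len(base), len(candidate)
    if n_b == 0 or n_c == 0:
        return 0.0
    n_m = max(n_b, n_c)
    rank = {item: j for j, item in enumerate(candidate, start=1)}
    total = sum(1 + n_m - abs(i - rank[item])
                for i, item in enumerate(base, start=1) if item in rank)
    return total / (n_b + n_b * n_c)


def pairwise_scores(profiles):
    """Symmetrized score (0 to 2) for every pair of profiles.

    `profiles` is a list of ranked lists, e.g. one per week for one user.
    """
    return [[list_rank(p, q) + list_rank(q, p) for q in profiles]
            for p in profiles]


def score_ranges(baseline, monthly_profiles):
    """Map each user to the (min, max) of baseline-vs-month scores.

    `monthly_profiles` is {user: [ranked list per month, ...]}; each user
    is assumed to have at least one monthly profile.
    """
    ranges = {}
    for user, months in monthly_profiles.items():
        history = [list_rank(baseline, m) for m in months]  # time history
        ranges[user] = (min(history), max(history))
    return ranges
```

The `history` list inside `score_ranges` is the per-month time series; keeping it instead of reducing it to a minimum and maximum would give the data for a time-history plot of the same comparisons.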

Figure 3: ListRank ranges comparing the baseline profile for user #20 to all other users in terms of their monthly short-term profiles. For this metric, a higher score indicates a better match.

Another view of the same graph, rotated by 45 degrees, is shown in Figure 4. Here the range is not as visible, but the changes over time become apparent. User #20's behavior with respect to his baseline changed, with peaks at different times of year. Also, User #3's similarity, which was unclear in Figure 3, is shown here to be due entirely to a single spike in the month of March.

Figure 4: A side view of the ranges in Figure 3 showing the time history for that series of monthly comparisons to the baseline