Death and Edits

An analysis of death-related edit spikes on celebrity Wikipedia pages, created for GSLIS 590MT (Informetrics).

Published in: Technology, Education

Death and Edits

  1. Death and edits. Miles Lincoln, LIS590MT
  2. Wikipedia! You probably already know what it is!
  3. Bursts in social networks. Bursts of edits on Wikipedia in particular. When do those occur?
  4. What can we learn by looking at spikes in edit frequency? How have edit spikes changed over Wikipedia’s ten years of existence? Does the size of an edit spike correlate with anything?
  5. Bursts in other social networks
  6. Google Trends
  7. Celebrity deaths!
  8. Revision history
  9. Revision history
  10. But first… We need to process the data so that we can answer that question.
  11. Perl
  12. Regular Expressions (Regex). The Perl script uses regular expressions to find and output matching pieces of text. In this case, I am pulling out dates in Wikipedia’s day month year format and rewriting them in a more machine-readable MM/DD/YYYY format. 11/08/2011
  13. Data manipulation. Copy/paste the revision history of wiki pages into a text document, which I feed to my Perl script. This results in lists with one date entry for each edit made on that date. Copying/pasting isn’t super elegant, but I haven’t gotten the LWP::UserAgent stuff to work yet.
  14. Excel! Throw my lists of dates into a pivot table, which shows me how often each date occurs. Some VLOOKUP magic allows me to combine the edit frequencies of individual actors into one big list covering every day from 6/1/2001 to the present.
  15. Et voilà!
  16. Problems. 9 actors over 10 years means close to 100k cells. Excel is not built for speed. MATLAB might work better.
  17. What does the data look like over time? Yearly windows running 6/1 to 5/31, from 2001 (when Wikipedia’s current edit numbers begin) to 2010 (when all of the bursts have settled down).
  18. 6/1/2001-5/31/2002 (chart: Series1-Series9 by date; y-axis 0 to 1.2)
  19. 6/1/2002-5/31/2003 (chart: Series1-Series9 by date; y-axis 0 to 14)
  20. 6/1/2003-5/31/2004 (chart: Series1-Series9 by date; y-axis 0 to 30)
  21. 6/1/2004-5/31/2005 (chart: Series1-Series9 by date; y-axis 0 to 60)
  22. 6/1/2005-5/31/2006 (chart: Series1-Series9 by date; y-axis 0 to 30)
  23. 6/1/2006-5/31/2007 (chart: Series1-Series9 by date; y-axis 0 to 50)
  24. 6/1/2007-5/31/2008 (chart: Series1-Series9 by date; y-axis 0 to 400)
  25. 6/1/2008-5/31/2009 (chart: Series1-Series9 by date; y-axis 0 to 80)
  26. 6/1/2009-5/31/2010 (chart: Series1-Series10 by date; y-axis 0 to 200)
  27. Spike sizes over the years (chart: Series2 by year, 2002 to 2009; y-axis 0 to 400)
  28. Let’s take a closer look at the more interesting actors.
  29. Actors #4-9, 6/1/2008-5/31/2009 (chart: Series1-Series6 by date; y-axis 0 to 80)
  30. Actors #4-9, 6/1/2008-5/31/2009, log (chart: Series1-Series6 by date; y-axis 0 to 2)
  31. One actor at a time, ~10 years
  32. Actor #1, DoD: 6/27/2001, edits/day (chart: 6/28/01 to 6/28/11; y-axis 0 to 14)
  33. Actor #1, log(edits)/day (chart: 6/28/01 to 6/28/11; y-axis 0 to 1.2)
  34. Actor #7, edits/day (chart: 9/24/03 to 9/24/11; y-axis 0 to 100)
  35. Actor #7, log(edits)/day (chart: 9/24/03 to 9/24/11; y-axis 0 to 2.5)
  36. Actor #8, edits/day (chart: 12/10/03 to 12/10/10; y-axis 0 to 400)
  37. Actor #8, log(edits)/day (chart: 12/10/03 to 12/10/10; y-axis 0 to 3)
  38. Actor #9, edits/day (chart: 2/28/04 to 2/28/11; y-axis 0 to 200)
  39. Actor #9, log(edits)/day (chart: 2/28/04 to 2/28/11; y-axis 0 to 2.5)
  40. If we tweak the data to take importance into consideration… Average gross, adjusted for inflation*. Only available for a small number of the actors chosen in the sample set. Taken from boxofficemojo.com. Extremely reliable source.
  41. Actor #8 vs. Actor #9 (chart: series "ledger" and "swayze"; x-axis 1 to 361; y-axis 0 to 400)
  42. Actor #8 vs. Actor #9 (adjusted) (chart: series "ledger" and "swayze adjusted"; x-axis 1 to 361; y-axis 0 to 400)
  43. Actor #8 vs. Actor #9 (adjusted) (chart: series "ledger log" and "swayze adjusted log"; x-axis 1 to 361; y-axis 0 to 3)
  44. The same data on Google Trends
  45. -10 days to +40 days (log) (chart: series "coburn log", "peck log", "brando log", "davis log", "palance log", "goulet log", "ledger log", "swayze log"; x-axis 1 to 50; y-axis 0 to 3)
  46. Other things I should consider: age at death; cause of death; were they still acting?
  47. Future directions. New sample of Wikipedia pages (need to compare more contemporary pages; need new metrics for comparison). Better workflows.
  48. Thanks! Questions? http://www.slideshare.net/mlincol2/informetrics
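
The processing steps described on slides 12-14 (pull dates out of pasted revision history, count edits per day, fill in every calendar day) and the log-scaled -10/+40 day window on slide 45 could be sketched roughly as below. This is a Python illustration, not the Perl script or Excel workbook used for the talk. Several things are assumptions: the timestamp layout in the sample text, every function and variable name, the choice of base-10 logs (which seems consistent with the chart axes), and the decision to plot zero-edit days as 0.

```python
import math
import re
from collections import Counter
from datetime import date, timedelta

# Assumed timestamp layout in pasted history: "23:58, 22 January 2008".
MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def extract_dates(history_text):
    """Return one date object per 'day month year' timestamp in the text."""
    return [date(int(y), MONTHS[m], int(d))
            for d, m, y in DATE_RE.findall(history_text)]


def to_mmddyyyy(d):
    """Format a date as MM/DD/YYYY, the format named on slide 12."""
    return d.strftime("%m/%d/%Y")


def daily_counts(dates, start, end):
    """Edits per day for every day from start to end, zero days included."""
    counts = Counter(dates)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return [(day, counts.get(day, 0)) for day in days]


def log_edits(count):
    """Base-10 log of a daily count; zero-edit days map to 0 (assumption)."""
    return math.log10(count) if count > 0 else 0.0


def window_around(series, death_day, before=10, after=40):
    """Daily counts from death_day - before to death_day + after."""
    lookup = dict(series)
    return [lookup.get(death_day + timedelta(days=offset), 0)
            for offset in range(-before, after + 1)]


# Made-up sample lines for illustration only; user names are hypothetical.
SAMPLE_HISTORY = """
(cur | prev) 23:58, 22 January 2008 ExampleUserA (talk | contribs)
(cur | prev) 23:57, 22 January 2008 ExampleUserB (talk | contribs)
(cur | prev) 10:03, 20 January 2008 ExampleUserA (talk | contribs)
"""

if __name__ == "__main__":
    dates = extract_dates(SAMPLE_HISTORY)
    for day, n in daily_counts(dates, date(2008, 1, 20), date(2008, 1, 22)):
        print(to_mmddyyyy(day), n, log_edits(n))
```

Aligning each actor's series on the day of death, as `window_around` does, is what lets pages from different years be overlaid on a single axis.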
