Death and Edits

An analysis of death spikes on celebrity Wikipedia pages, created for GSLIS 590MT Informetrics.

Death and Edits: Presentation Transcript

  • Death and edits. Miles Lincoln, LIS590MT
  • Wikipedia! You probably already know what it is!
  • Bursts in social networks. Bursts of edits on Wikipedia in particular. When do those occur?
  • What can we learn by looking at spikes in edit frequency? How have edit spikes changed over Wikipedia’s ten years of existence? Does the size of an edit spike correlate to anything?
  • Bursts in other social networks
  • Google Trends
  • Celebrity deaths!
  • Revision history
  • Revision history
  • But first… We need to process the data so that we can answer that question
  • Perl
  • Regular Expressions (Regex): A Perl script uses regular expressions to find and output matching pieces of text. In this case, I am pulling out dates in Wikipedia’s day-month-year format and rewriting them in a more machine-readable MM/DD/YYYY format, e.g. 11/08/2011.
  • Data manipulation: Copy/paste the revision history of wiki pages into a text document, which I feed to my Perl script. This results in lists with one date entry per edit that occurred on that date. Copying/pasting isn’t super elegant, but I haven’t gotten the LWP/UserAgent stuff to work yet.
  • Excel! Throw my lists of dates into a pivot table, which shows me how often each date occurs. Some VLOOKUP magic allows me to combine the edit frequencies of the individual actors into one big list covering every day from 6/1/2001 to the present.
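The date-rewriting step on this slide can be sketched as follows. The slides describe a Perl script that is not shown, so this is a Python stand-in, not the author's code. The function name `extract_dates` and the sample revision-history line are hypothetical, and the sketch assumes dates appear as "8 November 2011" with English month names.

```python
import re

# Month names mapped to their numbers (assumes English-language revision histories).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches "8 November 2011": a 1-2 digit day, a month name, a 4-digit year.
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def extract_dates(text):
    """Return every day-month-year date in text, rewritten as MM/DD/YYYY."""
    return [
        f"{MONTHS[month]:02d}/{int(day):02d}/{year}"
        for day, month, year in DATE_RE.findall(text)
    ]


# Hypothetical line pasted from a revision history page.
sample = "(cur | prev) 14:32, 8 November 2011 ExampleUser (talk | contribs)"
print(extract_dates(sample))
```

Each matching edit line contributes one date, so running this over a pasted revision history yields one date per edit.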
  • Et Voila!
  • Problems: 9 actors over 10 years means close to 100k cells. Excel is not built for speed. MATLAB might work better.
  • What does the data look like over time? 6/1-5/31 from 2001 (when Wikipedia’s current edit numbers begin) to 2010 (when all of the bursts have settled down)
  • 6/1/2001-5/31/2002 [Chart: edits per day, Series1 to Series9; y-axis 0 to 1.2]
  • 6/1/2002-5/31/2003 [Chart: edits per day, Series1 to Series9; y-axis 0 to 14]
  • 6/1/2003-5/31/2004 [Chart: edits per day, Series1 to Series9; y-axis 0 to 30]
  • 6/1/2004-5/31/2005 [Chart: edits per day, Series1 to Series9; y-axis 0 to 60]
  • 6/1/2005-5/31/2006 [Chart: edits per day, Series1 to Series9; y-axis 0 to 30]
  • 6/1/2006-5/31/2007 [Chart: edits per day, Series1 to Series9; y-axis 0 to 50]
  • 6/1/2007-5/31/2008 [Chart: edits per day, Series1 to Series9; y-axis 0 to 400]
  • 6/1/2008-5/31/2009 [Chart: edits per day, Series1 to Series9; y-axis 0 to 80]
  • 6/1/2009-5/31/2010 [Chart: edits per day, Series1 to Series10; y-axis 0 to 200]
  • Spike sizes over the years [Chart: Series2, 2002 to 2009; y-axis 0 to 400]
  • Let’s take a closer look at the more interesting actors
  • Actors #4-9, 6/1/2008-5/31/2009 [Chart: edits per day, Series1 to Series6; y-axis 0 to 80]
  • Actors #4-9, 6/1/2008-5/31/2009, log [Chart: log of edits per day, Series1 to Series6; y-axis 0 to 2]
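The log view used on this slide can be sketched as a simple transform of the daily edit counts. The slides do not say which base or zero-handling was used, so both choices here are assumptions: base 10 (the axis ranges on the log charts are consistent with it, but that is an inference) and mapping days with no edits to 0. The function name `log_scale` is hypothetical, and Python stands in for whatever spreadsheet formula was actually used.

```python
import math


def log_scale(counts):
    """Return log10 of each daily edit count.

    Assumption: days with zero edits are plotted at 0.0, since log10(0) is undefined.
    Note that a day with exactly one edit also maps to 0.0.
    """
    return [math.log10(n) if n > 0 else 0.0 for n in counts]


# Made-up counts: a quiet stretch followed by a burst.
print(log_scale([0, 1, 10, 100]))
```

The point of the transform is that a burst of hundreds of edits no longer flattens the quieter days of the same series into the baseline.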
  • One actor at a time ~10 years
  • Actor #1, DoD: 6/27/2001, edits/day [Chart: 6/28/01 to 6/28/11; y-axis 0 to 14]
  • Actor #1, log(edits)/day [Chart: 6/28/01 to 6/28/11; y-axis 0 to 1.2]
  • Actor #7, edits/day [Chart: 9/24/03 to 9/24/11; y-axis 0 to 100]
  • Actor #7, log(edits)/day [Chart: 9/24/03 to 9/24/11; y-axis 0 to 2.5]
  • Actor #8, edits/day [Chart: 12/10/03 to 12/10/10; y-axis 0 to 400]
  • Actor #8, log(edits)/day [Chart: 12/10/03 to 12/10/10; y-axis 0 to 3]
  • Actor #9, edits/day [Chart: 2/28/04 to 2/28/11; y-axis 0 to 200]
  • Actor #9, log(edits)/day [Chart: 2/28/04 to 2/28/11; y-axis 0 to 2.5]
  • If we tweak the data to take importance into consideration… Average gross, adjusted for inflation*. *Only available for a small number of the actors in the sample set. Taken from boxofficemojo.com. Extremely reliable source.
  • Actor #8 vs. Actor #9 [Chart: series ledger and swayze; x-axis 1 to 361, y-axis 0 to 400]
  • Actor #8 vs. Actor #9 (adjusted) [Chart: series ledger and swayze adjusted; x-axis 1 to 361, y-axis 0 to 400]
  • Actor #8 vs. Actor #9 (adjusted) [Chart: series ledger log and swayze adjusted log; x-axis 1 to 361, y-axis 0 to 3]
  • The same data on Google Trends
  • -10 days to +40 days (log) [Chart: log series for coburn, peck, brando, davis, palance, goulet, ledger, swayze; x-axis 1 to 50, y-axis 0 to 3]
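Comparing actors on a shared "-10 days to +40 days" axis means re-indexing each actor's daily counts relative to that actor's own date of death. A minimal Python sketch of that alignment follows. The function name `window_around`, the dict-of-dates input, and the sample data are all hypothetical, and the slides do not show how the original alignment was done.

```python
from datetime import date, timedelta


def window_around(counts_by_day, event_day, before=10, after=40):
    """Return daily counts from `before` days ahead of event_day to `after` days past it.

    counts_by_day maps datetime.date to an edit count; missing days count as 0.
    The result has before + after + 1 entries, with the event day at index `before`.
    """
    return [
        counts_by_day.get(event_day + timedelta(days=offset), 0)
        for offset in range(-before, after + 1)
    ]


# Made-up data: a burst starting on a hypothetical event day.
event = date(2010, 1, 15)
counts = {event: 120, event + timedelta(days=1): 80, event - timedelta(days=3): 2}
window = window_around(counts, event)
print(len(window), window[10], window[11])
```

Because every actor's window has the same length and the same index for the day of death, the resulting lists can be plotted together regardless of the calendar year in which each death occurred.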
  • Other things I should consider: Age at death. Cause of death. Were they still acting?
  • Future directions: New sample of Wikipedia pages (need to compare more contemporary pages; need new metrics for comparison). Better workflows.
  • Thanks! Questions? http://www.slideshare.net/mlincol2/informetrics