Summaries of Wikipedia usage data

This set of slides illustrates the growing interest people have in Wikipedia, changes in relative interest between languages, and how much Wikipedia interest there is in different language zones.

Presentation Transcript

  • Summaries of Wikipedia Usage Data Paul Houle, Ontology2
  • The x-axis is months since Jan 2008; the y-axis is the total number of hits to all Wikipedia pages. There are some violent variations that are probably caused by data quality problems: around index 30 (2010-06 and 2010-07) we see a drop in hits, then a very high number of hits in 2010-11. I think a few weeks of data may be missing somewhere in that time range.
  • The y-axis here is the fraction of hits to the English Wikipedia. At the beginning, more than 50% of the traffic went to the “en” Wikipedia, but that has fallen off and now “en” represents a bit more than 1/3 of the traffic. “en” is still dominant, but others are catching up.
  • The y-axis here is the fraction of traffic to the German Wikipedia. Like “en”, the fraction falls over time. Note that there is a high spike at Dec 2008.
  • The y-axis here is hits to the Japanese Wikipedia, and the story is similar to “de”, except that the crazy spike happens around March 2013.
  • The fraction of traffic in the francophone region, “fr”, actually looks stable over time
  • The fraction of hits to the Korean-language Wikipedia has actually been increasing (something has to if “en”, “de” and “ja” are declining).
  • The fraction of hits to the Chinese Wikipedia has grown over time, but there is a drop during the time frame that looks unstable on the summary graph at the beginning, and there is another crazy spike.
  • The fraction of traffic in the “es” cultural zone seems to have a strong seasonal variation
  • Top 15 Wikimedia sites, ordered by fraction of all-time hits. Note that “ja” is Japanese, “zh” is Chinese, and “tr” is Turkish. en.mw and ja.mw both come up with a single URI, so these probably represent a redirect somewhere.
  • Notes on data sources
    • Original source: http://dumps.wikimedia.org/other/pagecounts-raw/
    • Hourly files were aggregated at the month level; a few invalid (empty or full of HTML) files were removed, as were a few lines that did not parse. Content sizes were removed.
    • URIs that got fewer than 10 hits a month were removed from the monthlies (this reduces the number of URIs roughly tenfold!)
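
The processing described in the notes can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the slides. It assumes each pagecounts-raw line holds four space-separated fields (project code, page title, hit count, content size). The function names (parse_line, aggregate_month, project_fraction, month_label) are hypothetical, and it is an assumption that the per-language fractions are computed on the filtered monthly totals.

```python
from collections import Counter

MIN_MONTHLY_HITS = 10  # threshold stated in the notes on data sources


def parse_line(line):
    """Parse one line assumed to look like 'project title count bytes'.

    Returns (project, title, count), or None if the line does not parse.
    """
    parts = line.split(" ")
    if len(parts) != 4:
        return None
    project, title, count, _size = parts  # the content size is dropped
    try:
        return project, title, int(count)
    except ValueError:
        return None


def aggregate_month(hourly_lines):
    """Sum hourly counts per (project, title), then drop rarely hit URIs."""
    totals = Counter()
    for line in hourly_lines:
        parsed = parse_line(line.strip())
        if parsed is None:
            continue  # skip lines that do not parse (e.g. stray HTML)
        project, title, count = parsed
        totals[(project, title)] += count
    # Keep only URIs with at least MIN_MONTHLY_HITS hits in the month.
    return {key: n for key, n in totals.items() if n >= MIN_MONTHLY_HITS}


def project_fraction(monthly_totals, project):
    """Fraction of a month's hits that went to one project code, e.g. 'en'."""
    all_hits = sum(monthly_totals.values())
    if all_hits == 0:
        return 0.0
    hits = sum(n for (p, _title), n in monthly_totals.items() if p == project)
    return hits / all_hits


def month_label(index):
    """Convert months since Jan 2008 (index 0) to a 'YYYY-MM' label."""
    year, month = divmod(index, 12)
    return f"{2008 + year}-{month + 1:02d}"
```

With this convention, month_label(30) gives "2010-07", which matches the index mentioned for the unstable period in the first graph.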