Traces of the Trackers
Tracking the Trackers: A historical analysis using the Internet Archive

Digital Methods Summer School 2012
Anne Helmond, Hugo Huurdeman, Thaer Samar, Nili Steinfeld, Lonneke van der Velden
natively digital archived objects
“one archives the website over the references
contained therein (hyperlinks), the systems that
delivered them (engines), the ecology in which
they may or may not thrive (the sphere) and
the pages or accounts contained therein that
keep the user actively grooming his or her
online profile and status (the platform).”

Rogers, Richard. Digital Methods. MIT Press (forthcoming)
tracing ecologies
This research project looks at a specific
archived object: tracker fingerprints.
We would like to look at the ecologies that
websites may be embedded in beyond traditional
hyperlinks, engines, spheres and platforms.
What other natively digital archived objects
can we study, and can we map the ecologies
around websites through ‘invisible’ back-end
linking?
Methodology

1. Wayback Machine: select one year, then feed the
   Internet Archive URL for that year into LinkRipper
2. Clean the URL list, keeping one URL per day
3. Feed the IA URLs for each year into the Tracker Tracker tool
4. Analyze and visualize the results
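
Step 2 can be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning procedure: it assumes the ripped links are Wayback Machine snapshot URLs, which carry a 14-digit timestamp (YYYYMMDDhhmmss) after `/web/`, and it assumes that keeping the earliest snapshot of each day is acceptable. The function name `one_snapshot_per_day` is hypothetical.

```python
import re

# Wayback Machine snapshot URLs contain a 14-digit timestamp (YYYYMMDDhhmmss),
# optionally followed by a modifier such as "id_".
SNAPSHOT_RE = re.compile(r"/web/(\d{14})[a-z_]*/")


def one_snapshot_per_day(urls):
    """Keep the earliest snapshot URL for each calendar day (an assumption)."""
    earliest = {}
    for url in urls:
        match = SNAPSHOT_RE.search(url)
        if match is None:
            continue  # not a snapshot link (e.g. site navigation), so drop it
        timestamp = match.group(1)
        day = timestamp[:8]  # YYYYMMDD
        if day not in earliest or timestamp < earliest[day][0]:
            earliest[day] = (timestamp, url)
    # Return the kept URLs in chronological order.
    return [url for _, (_, url) in sorted(earliest.items())]
```

Links that do not match the snapshot pattern are dropped, which also removes navigation links that a link ripper may pick up alongside the snapshots.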
total number of unique trackers per year
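
One way to compute such a yearly count is sketched below. It assumes the tracker detection results are available as (snapshot timestamp, tracker name) pairs; that data shape and the function name `unique_trackers_per_year` are hypothetical and not taken from the Tracker Tracker tool.

```python
from collections import defaultdict


def unique_trackers_per_year(detections):
    """Count distinct tracker names per year.

    detections: iterable of (timestamp, tracker_name) pairs, where the
    timestamp is a string starting with the four-digit year (assumed format).
    """
    trackers_by_year = defaultdict(set)
    for timestamp, tracker in detections:
        trackers_by_year[timestamp[:4]].add(tracker)
    # A tracker found in many snapshots of the same year is counted once.
    return {year: len(names) for year, names in sorted(trackers_by_year.items())}
```

Using a set per year means the count reflects how many different trackers appeared, not how often they appeared.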
Gephi movie over time: Trackers on the New York Times front page 2001-2011
Further research
• “Back-fingerprint” old trackers: add newly
  identified fingerprints of old trackers to track
• Repeat for a set of websites
• Changing characteristics of tracking
  technologies
• Look into media concentration: who owns
  which trackers?
• Which other types of media ecologies can
  we map using the Internet Archive?


Editor's Notes

  • #6 Not all types of trackers can be detected using the Internet Archive. For example, widgets cannot be detected because they are “embedded” into the page (third-party content, a snippet of code). What we can detect are trackers that leave fingerprints that are “hardwired” into the source code of a website.
  • #7 a fingerprint
  • #9 Small decrease. Hypotheses: tracking technologies are changing and moving to third-party content such as widgets, so the character of the trackers may be changing over the years. Or: media concentration.
  • #10 Proliferation of tracker types: new trackers (and tracker types) over the years.
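
The idea in note #6, detecting trackers through fingerprints hardwired into a page's source code, can be sketched as a pattern match over the archived HTML. This is only an illustration under stated assumptions: the fingerprint patterns below use made-up `example.com` and `example.net` domains, and the names `FINGERPRINTS` and `detect_trackers` are hypothetical and do not describe how the Tracker Tracker tool works internally.

```python
import re

# Hypothetical fingerprints for illustration only; a real study would use a
# curated list of tracker patterns.
FINGERPRINTS = {
    "ExampleAnalytics": re.compile(r"analytics\.example\.com/track\.js"),
    "ExampleAds": re.compile(r"ads\.example\.net/"),
}


def detect_trackers(html, fingerprints=FINGERPRINTS):
    """Return the sorted names of trackers whose fingerprint occurs in the HTML."""
    return sorted(
        name for name, pattern in fingerprints.items() if pattern.search(html)
    )
```

Because the match runs on the archived source code only, content that is not written into that source stays invisible to it, which is the limitation the note describes.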