Talk given Feb 17, 2016 at Columbus Web Analytics Wednesdays on how web analytics metrics are generated and the data quality and reporting issues that come with them.
5. DATA COLLECTION 2.0:
CLIENT-SIDE JAVASCRIPT, COOKIES
• Easier to implement (“just a few lines of JavaScript…”)
• Cookies match users more closely than IPs
• Much more info available client-side
6. HOW DOES CLIENT-SIDE JS WORK?
…SPECIFICALLY GOOGLE ANALYTICS
2 requests - 1st for code, 2nd with measurement
7. TRACKING CODE SNIPPETS
• Sets up command queue
• Loads analytics.js, which does the real work
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-34128028-1', 'auto');
ga('send', 'pageview');
</script>
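The command queue idea can be sketched in a few lines. This is a simplified stand-in, not the actual analytics.js internals: `makeStub` and `drainQueue` are hypothetical names, and the replay step only assumes that the loaded library works through the queued calls in order.

```javascript
// Simplified stand-in for the stub the snippet installs: calls made before
// the library has loaded are stored as argument lists in a queue (`q`).
function makeStub() {
  const stub = function (...args) {
    (stub.q = stub.q || []).push(args);
  };
  stub.l = Date.now(); // timestamp, like `1*new Date()` in the snippet
  return stub;
}

// Hypothetical replay step: once the real library arrives, it processes
// the queued commands in the order they were issued.
function drainQueue(stub, handler) {
  const queued = stub.q || [];
  queued.forEach((args) => handler(...args));
  stub.q = [];
  return queued.length;
}

const ga = makeStub();
ga('create', 'UA-XXXX-X', 'auto'); // placeholder tracking ID
ga('send', 'pageview');

const seen = [];
const handled = drainQueue(ga, (command, ...rest) => seen.push([command, ...rest]));
console.log(handled, seen);
```

The point of the queue is that the page can call `ga(...)` immediately, even though the script that does the real work is still loading asynchronously.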
10. SEEMS GREAT, WHAT COULD POSSIBLY GO WRONG?
Some data still only on the server side…
• Bot traffic (mostly)
• HTTP errors
• Pages we forgot to tag
• Content-blocking users
11. SERVER LOGS, AGAIN
• Distributed systems, distributed logs
• As before, but somewhat different consumers
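As a rough illustration of what the server side still sees, here is a minimal sketch that parses access log lines and pulls out HTTP errors. It assumes the Apache/nginx "combined" log format; the sample lines, IPs, and the `parseLogLine` helper are made up for the example.

```javascript
// Combined Log Format (assumed):
// host ident user [time] "request" status bytes "referer" "user-agent"
const LOG_LINE =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;

// Returns the parsed fields, or null if the line does not match the format.
function parseLogLine(line) {
  const m = LOG_LINE.exec(line);
  if (!m) return null;
  return {
    ip: m[1],
    time: m[2],
    method: m[3],
    path: m[4],
    status: Number(m[5]),
    referrer: m[7],
    userAgent: m[8],
  };
}

// Made-up sample lines for illustration only.
const sampleLines = [
  '203.0.113.7 - - [17/Feb/2016:18:30:01 -0500] "GET /home HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/44.0"',
  '203.0.113.9 - - [17/Feb/2016:18:30:05 -0500] "GET /missing HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
];

const parsed = sampleLines.map(parseLogLine).filter(Boolean);
// HTTP errors never reach a JavaScript tag, but they are in the log.
const errors = parsed.filter((hit) => hit.status >= 400);
console.log(parsed.length, errors.map((hit) => hit.path));
```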
12. AS ANALYSTS, WHAT’S GIVING US GRIEF?
• Cookie-deleting users
• Bots
• Analytics “referrer” spam
• Ad-blocker users
13. COOKIE DELETING USERS
IS IT STILL ~30%?
• Artificially increases user counts
• Visit after deletion is direct, no attribution
• Stats based on user accounts?
Photo: flickr: diskant, CC BY-NC 2.0
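A tiny worked example of the inflation, using made-up visit records: each cleared cookie looks like a brand-new user, while a count keyed on a logged-in account does not change. The field names `cookieId` and `accountId` are hypothetical.

```javascript
// Made-up visits: "alice" cleared her cookies between visits, so she
// shows up under two different cookie IDs.
const visits = [
  { cookieId: 'c1', accountId: 'alice' },
  { cookieId: 'c2', accountId: 'alice' },
  { cookieId: 'c3', accountId: 'bob' },
  { cookieId: 'c3', accountId: 'bob' },
];

// Number of distinct values of `key` across the rows.
function countDistinct(rows, key) {
  return new Set(rows.map((row) => row[key])).size;
}

const usersByCookie = countDistinct(visits, 'cookieId');
const usersByAccount = countDistinct(visits, 'accountId');
console.log(usersByCookie, usersByAccount);
```

Here the cookie-based count reports three users where only two people visited.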
14. BROWSER FINGERPRINTS
• Survives Cookie deletion
• 2010 EFF Panopticlick: 84% of browsers unique
• Invasive?
• Browser fingerprint + IP in Piwik as cookie fallback
• Can be thought of as next gen User-Agent + IP
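The cookie-fallback idea can be sketched as follows. This is not Piwik's actual algorithm: the choice of attributes, the `visitorId` helper, and the use of a small FNV-1a hash are all assumptions made for illustration.

```javascript
// 32-bit FNV-1a hash over the string's UTF-16 code units, as 8 hex digits.
// Chosen only because it is short; a real tool would likely use a
// stronger hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Use the cookie ID when present; otherwise derive an ID from browser
// attributes plus IP (the "next gen User-Agent + IP" idea).
function visitorId(hit) {
  if (hit.cookieId) return hit.cookieId;
  const parts = [hit.userAgent, hit.language, hit.screen, hit.timezoneOffset, hit.ip];
  return 'fp-' + fnv1a(parts.join('|'));
}

// Made-up hit without a cookie.
const noCookieHit = {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0) Firefox/44.0',
  language: 'en-US',
  screen: '1920x1080x24',
  timezoneOffset: 300,
  ip: '203.0.113.7',
};
console.log(visitorId(noCookieHit));
console.log(visitorId({ ...noCookieHit, cookieId: 'c1' }));
```

The same browser on the same IP maps to the same ID after a cookie purge, which is exactly what makes the technique useful and also what makes it feel invasive.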
15. BOTS
• About 50% of all traffic may be bots (48.5%, Incapsula 2015)
• Most of these don’t show in GA (yet?)
• The smaller the site, the higher the bot % (85% for <1k visits/day)
Photo: flickr: skynoir, CC BY-NC 2.0
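Since most bots only appear server-side, a first-pass estimate of bot share can come from the User-Agent strings in the logs. The pattern below is a deliberately crude assumption: it only catches bots that identify themselves, and the sample strings are made up.

```javascript
// Crude pattern for self-identifying bots and common scripting clients.
// Bots that fake a browser User-Agent are not caught by this.
const BOT_PATTERN = /bot|crawl|spider|slurp|curl|wget|python-requests/i;

function isLikelyBot(userAgent) {
  return !userAgent || userAgent === '-' || BOT_PATTERN.test(userAgent);
}

// Fraction of hits whose User-Agent looks like a bot (0 for no hits).
function botShare(userAgents) {
  if (userAgents.length === 0) return 0;
  return userAgents.filter(isLikelyBot).length / userAgents.length;
}

// Made-up sample of User-Agent strings taken from a log.
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0 Safari/537.36',
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
  'curl/7.47.0',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/601.4.4 (KHTML, like Gecko) Version/9.0.3 Safari/601.4.4',
];
console.log(botShare(userAgents));
```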
16. ANALYTICS SPAM
• free-social-buttons.biz, top-seo-blah-blah-blah.com, number-one-analytics.fail
• Way to get traffic, SEO, and lulz since before 2009
• Not GA specific, just the #1 target
• Two kinds: Crawler & Ghost
17. WHO’S SPAMMING US TODAY?
List of 2016 GA Spammers from Analytics Edge
Google is blocking offenders, but often not quickly.
18. WHY IS IT SO PREVALENT?
“Ghost” version via Measurement Protocol abuse
$ curl "https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-XXXX-X&cid=fa0c8140-eef8-47c5-a244-b4c60cf46f74&dr=http%3A%2F%2Fmyspamsite.pizza&dp=%2Fhome"
Just iterate through UA-XXXX-1 numbers.
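To see why no visit to the site is needed, it helps to decode the hit. The sketch below just parses the query string of the example URL; `describeHit` is a hypothetical helper, and the field labels follow the Measurement Protocol parameter names used in the URL.

```javascript
// The same hit as in the curl example, split across two lines for readability.
const hitUrl =
  'https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-XXXX-X' +
  '&cid=fa0c8140-eef8-47c5-a244-b4c60cf46f74&dr=http%3A%2F%2Fmyspamsite.pizza&dp=%2Fhome';

// Decode the query string into named fields.
function describeHit(url) {
  const params = new URL(url).searchParams;
  return {
    version: params.get('v'),     // protocol version
    hitType: params.get('t'),     // e.g. pageview
    trackingId: params.get('tid'), // the property being written to
    clientId: params.get('cid'),  // arbitrary client identifier
    referrer: params.get('dr'),   // document referrer: where the spam domain goes
    page: params.get('dp'),       // document path
  };
}

console.log(describeHit(hitUrl));
```

Nothing in the hit proves the sender ever loaded the page: the tracking ID and referrer are just parameters the sender chooses.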
19. HOW DO I FIX IT?
• Filters for new traffic, segments for historical data
• Tool available on my site: quantable.com/spamfilter
• For a new site, use a property number higher than -1 in the tracking ID (e.g. UA-XXXX-2 rather than UA-XXXX-1)
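A filter or segment of this kind usually boils down to a pattern matched against the referrer. The sketch below builds one from the example domains on the earlier slide; it is an illustration of the idea, not the output of the quantable.com tool, and a real list would be much longer and need regular updates.

```javascript
// Example spam domains from the slides (two of them are jokes).
const SPAM_DOMAINS = [
  'free-social-buttons.biz',
  'top-seo-blah-blah-blah.com',
  'number-one-analytics.fail',
];

// Escape regex metacharacters so dots in domains match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Matches a listed domain or any subdomain of it.
const SPAM_REFERRER = new RegExp(
  '(^|\\.)(' + SPAM_DOMAINS.map(escapeRegExp).join('|') + ')$',
  'i'
);

// True when the referrer URL's hostname is on the spam list.
function isSpamReferrer(referrerUrl) {
  let host;
  try {
    host = new URL(referrerUrl).hostname;
  } catch (err) {
    return false; // empty or malformed referrer
  }
  return SPAM_REFERRER.test(host);
}

console.log(isSpamReferrer('http://www.free-social-buttons.biz/'));
console.log(isSpamReferrer('https://www.google.com/search?q=analytics'));
```

The same pattern can be applied going forward as a filter and retroactively as a segment, which is the split the first bullet describes.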
20. AD BLOCKING IS MAKING SOME OF OUR USERS DISAPPEAR
• Blockers such as AdBlock Plus, Ghostery, uBlock Origin, and Purify can block analytics tools, not just ads
• ABP has largest install base (300M downloads)
• These users are still in your server logs, but may never show up in your web analytics
21. HOW DOES THE BLOCKING WORK?
• Long lists of URLs to block loading for, e.g.:
google-analytics.com/analytics.js
/piwik.php
?[AQB]&ndh=1&t=
com/0.gif?
• EasyPrivacy list (used by ABP and others) is over 10,000 lines long and very actively maintained
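In spirit, the blocker checks each outgoing request URL against the list before letting it load. The sketch below uses plain substring matching on the four example entries above, which is a simplification: real filter lists have their own syntax with anchors, wildcards, and options that this does not implement.

```javascript
// The four example entries from the slide, treated as plain substrings.
const BLOCK_PATTERNS = [
  'google-analytics.com/analytics.js',
  '/piwik.php',
  '?[AQB]&ndh=1&t=',
  'com/0.gif?',
];

// True when the request URL contains any listed pattern.
function isBlocked(requestUrl) {
  return BLOCK_PATTERNS.some((pattern) => requestUrl.includes(pattern));
}

// Made-up request URLs for illustration.
const requests = [
  'https://www.google-analytics.com/analytics.js',
  'https://stats.example.com/piwik.php?idsite=1&rec=1',
  'https://www.example.com/js/app.js',
];
console.log(requests.map(isBlocked));
```

If the first request (the tracking library itself) is blocked, the second request carrying the measurement is never made at all.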
22. HOW MANY USERS BLOCK GA?
My study showing 8.7% blocking GA
(for one particular site)
23. HOW DO I COUNT BLOCKERS?
• Can’t really be “fixed” client-side
• Still show up server-side
• May be against GA terms (can’t circumvent Opt-Out Add-on)
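One way to estimate blocking without touching the client is to compare counts from the two sources. This is a back-of-the-envelope sketch with made-up numbers, not necessarily how the study above measured it, and it assumes bots have already been removed from the server-side count.

```javascript
// Estimated share of human pageviews that never reached the analytics tool.
// serverPageviews: pageviews counted from server logs (bots excluded)
// analyticsPageviews: pageviews reported by the analytics tool
function estimateBlockRate(serverPageviews, analyticsPageviews) {
  if (serverPageviews <= 0) return 0;
  const missing = Math.max(serverPageviews - analyticsPageviews, 0);
  return missing / serverPageviews;
}

// Hypothetical counts for the same pages over the same period.
const rate = estimateBlockRate(2000, 1800);
console.log((rate * 100).toFixed(1) + '%');
```

The estimate is only as good as the bot filtering and page matching behind the two counts, so it is better read as an upper bound than as a precise figure.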
25. THANKS!
slides & recap to be posted at cbuswaw.com
References & Further Reading
Quantable GA Blocking Analysis:
https://www.quantable.com/analytics/how-many-users-block-google-analytics/
GA Tracking Code walkthrough:
http://code.stephenmorley.org/javascript/understanding-the-google-analytics-tracking-code/
GA Measurement Protocol Hit Builder:
https://ga-dev-tools.appspot.com/hit-builder/
Fingerprintjs2:
http://valve.github.io/fingerprintjs2/
Incapsula 2015 Bot Report:
https://www.incapsula.com/blog/bot-traffic-report-2015.html
Analytics Edge’s Guide to GA Spam:
http://help.analyticsedge.com/spam-filter/definitive-guide-to-removing-google-analytics-spam/