DIGGING INTO DATA COLLECTION
Jason Packer

jason@quantable.com 

@jhpacker
Feb 17, 2016

#cbuswaw
WHAT DRIVES OUR METRICS?
*Note: all metrics may be inaccurate by some amount.**
**But we’re not sure which ones, or by how much.
DATA COLLECTION 1.0:
SERVER LOGS, HITS, IP ADDRESSES
• Server logs, valid in 1996 and 2016
• Basic, but still contain highly useful data!
• Unanalyzed raw logs get big, fast.
128.135.189.9 - - [15/Feb/1996:15:16:27] "GET / HTTP/1.1" 200 5397 "Mozilla/1.0 (Win3.1)"
65.60.216.104 - - [15/Feb/2016:15:16:27] "GET / HTTP/1.1" 200 5397 "Mozilla/5.0 (Mac OS)"
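A minimal sketch, in JavaScript (Node), of how lines like the two above can be turned into structured records. The regex is an assumption fitted to these example lines; real Apache/Nginx "combined" logs also carry a timezone offset and a referrer field, so a production parser would need a broader pattern.

```javascript
// Assumed line shape (matches the two example lines above):
// IP, two "-" fields, [timestamp], "METHOD PATH PROTOCOL", status, bytes, "user-agent"
const LOG_LINE = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)"$/;

function parseLogLine(line) {
  const m = LOG_LINE.exec(line.trim());
  if (!m) return null; // not in the expected shape
  const [, ip, timestamp, method, path, protocol, status, bytes, userAgent] = m;
  return {
    ip,
    timestamp,
    method,
    path,
    protocol,
    status: Number(status),
    bytes: bytes === "-" ? 0 : Number(bytes), // "-" means no body was sent
    userAgent,
  };
}

const lines = [
  '128.135.189.9 - - [15/Feb/1996:15:16:27] "GET / HTTP/1.1" 200 5397 "Mozilla/1.0 (Win3.1)"',
  '65.60.216.104 - - [15/Feb/2016:15:16:27] "GET / HTTP/1.1" 200 5397 "Mozilla/5.0 (Mac OS)"',
];
for (const line of lines) console.log(parseLogLine(line));
```

Twenty years apart, the same parser handles both lines; only the user agent really changed.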
WEB ANALYST, CIRCA 2000
flickr: boston_public_library

CC BY-NC-ND 2.0
DATA COLLECTION 2.0:
CLIENT-SIDE JAVASCRIPT, COOKIES
• Easier to implement (“just a few lines of JavaScript…”)
• Cookies identify users more reliably than IP addresses
• Much more info available on the client side
HOW DOES CLIENT-SIDE JS WORK?
…SPECIFICALLY GOOGLE ANALYTICS
Two requests: the first loads the code, the second sends the measurement
TRACKING CODE SNIPPETS
• Sets up a command queue
• Loads analytics.js, which does the real work
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-34128028-1', 'auto');
ga('send', 'pageview');
</script>
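The minified snippet is easier to follow once de-minified. A readable sketch of the same steps follows; the function names installCommandQueue and loadScript are hypothetical, while the GoogleAnalyticsObject global, the .q queue and the .l timestamp come from the snippet itself. Calls made before analytics.js arrives are only queued; the library then works through the queue once it loads.

```javascript
// Step 1: create the global command function; until analytics.js loads,
// every call is simply pushed onto a queue (.q).
function installCommandQueue(win, name) {
  win.GoogleAnalyticsObject = name; // tells analytics.js which global to use
  win[name] = win[name] || function () {
    (win[name].q = win[name].q || []).push(arguments);
  };
  win[name].l = 1 * new Date(); // snippet load time in milliseconds
  return win[name];
}

// Step 2: load analytics.js asynchronously, before the first <script> tag.
function loadScript(doc, src) {
  const a = doc.createElement("script");
  a.async = true; // don't block page rendering
  a.src = src;
  const m = doc.getElementsByTagName("script")[0];
  m.parentNode.insertBefore(a, m);
}

// Usage without a browser: a plain object stands in for window.
// In a page you would call installCommandQueue(window, "ga") and
// loadScript(document, "//www.google-analytics.com/analytics.js").
const fakeWindow = {};
const ga = installCommandQueue(fakeWindow, "ga");
ga("create", "UA-XXXX-1", "auto");
ga("send", "pageview");
console.log(ga.q.length); // 2
```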
MEASUREMENT PROTOCOL
https://www.google-analytics.com/collect?v=1&_v=j41&a=702618035&t=pageview&_s=1&dl=https%3A%2F%2Fwww.quantable.com%2F&ul=en-us&de=UTF-8&dt=Quantable%20-%20Analytics%20%26%20Optimization&sd=24-bit&sr=1680x1050&vp=1442x464&je=0&_u=SCCAAUAjK~&jid=&cid=157092037.1441829013&tid=UA-34128028-1&z=823826407
This hit, once made readable, is this data
(from the ObservePoint tag debugger)
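A rough sketch of the "made readable" step: split the hit's query string and label the parameters. The labels cover a handful of documented Measurement Protocol fields; anything else (such as _v, _s, _u, a, jid) is passed through under its raw name rather than guessed at.

```javascript
// Labels for a subset of documented Measurement Protocol parameters.
const LABELS = {
  v: "Protocol version",
  t: "Hit type",
  tid: "Tracking ID",
  cid: "Client ID",
  dl: "Document location",
  dt: "Document title",
  ul: "User language",
  de: "Document encoding",
  sd: "Screen colors",
  sr: "Screen resolution",
  vp: "Viewport size",
  je: "Java enabled",
  z: "Cache buster",
};

function decodeHit(url) {
  const params = new URL(url).searchParams; // also decodes %XX escapes
  const out = {};
  for (const [key, value] of params) {
    out[LABELS[key] || key] = value; // unknown keys keep their raw name
  }
  return out;
}

const hit =
  "https://www.google-analytics.com/collect?v=1&_v=j41&a=702618035&t=pageview&_s=1" +
  "&dl=https%3A%2F%2Fwww.quantable.com%2F&ul=en-us&de=UTF-8" +
  "&dt=Quantable%20-%20Analytics%20%26%20Optimization&sd=24-bit&sr=1680x1050" +
  "&vp=1442x464&je=0&_u=SCCAAUAjK~&jid=&cid=157092037.1441829013" +
  "&tid=UA-34128028-1&z=823826407";
const decoded = decodeHit(hit);
console.log(decoded);
```

Note that the page title must be URL-encoded in the hit; an unencoded "&" in the title would split it into a bogus extra parameter.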
SEEMS GREAT, WHAT COULD
POSSIBLY GO WRONG?
Some data is still only on the server side…
• Bot traffic (mostly)
• HTTP errors
• Pages we forgot to tag
• Users with content blockers
SERVER LOGS, AGAIN
• Distributed systems mean distributed logs
• Same data as before, but with somewhat different consumers
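With several web servers, each keeps its own log, so the first job is usually stitching them back into one timeline. A minimal sketch, assuming the [dd/Mon/yyyy:HH:MM:SS] timestamps shown earlier and that every server logs in the same timezone; it simply flattens and sorts rather than doing a streaming k-way merge, which is fine for modest file sizes.

```javascript
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// Extract "[15/Feb/2016:15:16:27]" as epoch milliseconds (NaN if absent).
function logTime(line) {
  const m = /\[(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2})/.exec(line);
  if (!m) return NaN;
  const [, d, mon, y, hh, mm, ss] = m;
  return Date.UTC(+y, MONTHS[mon], +d, +hh, +mm, +ss);
}

// Combine per-server line arrays into one time-ordered array,
// dropping lines without a recognizable timestamp.
function mergeLogs(...perServerLines) {
  return perServerLines
    .flat()
    .filter((line) => !Number.isNaN(logTime(line)))
    .sort((a, b) => logTime(a) - logTime(b)); // stable sort keeps ties in input order
}

// Hypothetical lines from two servers.
const web1 = ['10.0.0.1 - - [15/Feb/2016:15:16:30] "GET /a HTTP/1.1" 200 512 "Mozilla/5.0"'];
const web2 = ['10.0.0.2 - - [15/Feb/2016:15:16:27] "GET /b HTTP/1.1" 200 512 "Mozilla/5.0"', "not a log line"];
const merged = mergeLogs(web1, web2);
console.log(merged);
```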
AS ANALYSTS, WHAT’S GIVING
US GRIEF?
• Cookie-deleting users
• Bots
• Analytics “referrer” spam
• Ad-blocker users
COOKIE-DELETING USERS:
IS IT STILL ~30%?
• Artificially inflates user counts
• Visits after deletion look direct, with no attribution
• Stats based on user accounts?
flickr: diskant

CC BY-NC 2.0
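Why deletion inflates user counts: analytics.js keys users on the client ID stored in the _ga cookie. A sketch assuming the common GA1.<domain depth>.<random>.<first-seen timestamp> cookie shape, where the cid in the earlier hit (157092037.1441829013) is the last two fields. Clear the cookie and a new random value and timestamp are generated, so the same person becomes a second "user".

```javascript
// Pull the client ID out of a Cookie header string, assuming the
// "GA1.<domain depth>.<random>.<timestamp>" _ga format.
function clientIdFromGaCookie(cookieHeader) {
  const match = /(?:^|;\s*)_ga=([^;]+)/.exec(cookieHeader);
  if (!match) return null; // no _ga cookie: GA will mint a new client ID
  const parts = match[1].split(".");
  if (parts.length < 4) return null; // unexpected shape
  return parts.slice(-2).join(".");
}

// The second half of the client ID is a Unix timestamp in seconds.
function firstSeen(clientId) {
  return new Date(Number(clientId.split(".")[1]) * 1000);
}

const before = clientIdFromGaCookie("_ga=GA1.2.157092037.1441829013; theme=dark");
// After the user clears cookies, a fresh _ga value is set
// (hypothetical value below), so GA counts a second "user".
const after = clientIdFromGaCookie("theme=dark; _ga=GA1.2.998877665.1455700000");
console.log(before, after, before !== after);
```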
BROWSER FINGERPRINTS
• Survive cookie deletion
• 2010 EFF Panopticlick study: 84% of browsers were unique
• Invasive?
• Piwik uses browser fingerprint + IP as a cookie fallback
• Can be thought of as a next-gen User-Agent + IP
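The basic idea behind a fingerprint is to hash a set of browser attributes into one identifier that does not depend on cookies. This sketch is not Fingerprintjs2's or Piwik's actual algorithm: the attribute names are hypothetical, and 32-bit FNV-1a is swapped in as a simple, dependency-free hash.

```javascript
// 32-bit FNV-1a over UTF-16 code units (identical to byte-wise FNV-1a for ASCII).
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash.toString(16).padStart(8, "0");
}

// Canonicalize attributes (sorted keys) so the same browser always
// produces the same string, then hash it.
function fingerprint(attrs) {
  const canonical = Object.keys(attrs)
    .sort()
    .map((k) => `${k}=${attrs[k]}`)
    .join("|");
  return fnv1a32(canonical);
}

// Hypothetical attributes; a real script would read them from navigator, screen, etc.
const id = fingerprint({
  userAgent: "Mozilla/5.0 (Mac OS)",
  language: "en-us",
  screen: "1680x1050",
  timezoneOffset: 300,
});
console.log(id);
```

The more attributes go in, the more unique the result, which is exactly what makes fingerprinting both useful and invasive.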
BOTS
• About 50% of all traffic may be bots (48.5%, Incapsula 2015)
• Most of these don’t show up in GA (yet?)
• The smaller the site, the higher the bot share (85% for sites with <1k visits/day)
flickr: skynoir

CC BY-NC 2.0
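Server logs are where most bots show up, and the easy ones announce themselves in the user agent. A rough sketch with a small, hypothetical pattern list; many bad bots spoof ordinary browser user agents, so a check like this undercounts bots rather than measuring them. Treating an empty user agent as a bot is also an assumption.

```javascript
// Hypothetical sample of self-declared bot / tool signatures.
const BOT_PATTERN = /bot|crawl|spider|slurp|curl|wget|python-requests|headless/i;

function isDeclaredBot(userAgent) {
  return !userAgent || BOT_PATTERN.test(userAgent); // empty UA treated as a bot
}

// Share of requests whose user agent looks like a bot.
function botShare(userAgents) {
  if (userAgents.length === 0) return 0;
  return userAgents.filter(isDeclaredBot).length / userAgents.length;
}

const sample = [
  "Mozilla/5.0 (Mac OS)",
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "curl/7.43.0",
  "Mozilla/5.0 (Windows NT 10.0)",
];
console.log(botShare(sample)); // 0.5
```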
ANALYTICS SPAM
• free-social-buttons.biz, top-seo-blah-blah-blah.com, number-one-analytics.fail
• A way to get traffic, SEO, and lulz since before 2009
• Not GA-specific; GA is just the #1 target
• Two kinds: crawler and ghost
WHO’S SPAMMING US TODAY?
List of 2016 GA spammers from Analytics Edge
Google is blocking offenders, but often not quickly.
WHY IS IT SO PREVALENT?
“Ghost” spam abuses the Measurement Protocol; no visit to your site is needed
$ curl "https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-XXXX-X&cid=fa0c8140-eef8-47c5-a244-b4c60cf46f74&dr=http%3A%2F%2Fmyspamsite.pizza&dp=%2Fhome"
Spammers just iterate through UA-XXXX-1 tracking IDs.
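Because ghost hits are sent straight to the collection endpoint, they never load a page on your site, so they usually lack a hostname (dh) or a document location (dl) on one of your domains. A sketch of that check applied to a raw hit URL; the validHosts list is hypothetical, and inside GA the same idea is applied with a valid-hostname filter rather than code.

```javascript
// Hostname a hit claims to come from: dh if present, else the host of dl.
function hitHostname(params) {
  if (params.get("dh")) return params.get("dh").toLowerCase();
  const dl = params.get("dl");
  if (!dl) return null;
  try {
    return new URL(dl).hostname.toLowerCase();
  } catch {
    return null; // malformed document location
  }
}

// A hit with no hostname, or one outside our own domains, is a likely ghost.
function looksLikeGhost(hitUrl, validHosts) {
  const host = hitHostname(new URL(hitUrl).searchParams);
  return host === null || !validHosts.includes(host);
}

const validHosts = ["www.quantable.com", "quantable.com"]; // hypothetical list
const ghost =
  "https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-XXXX-X" +
  "&cid=fa0c8140-eef8-47c5-a244-b4c60cf46f74&dr=http%3A%2F%2Fmyspamsite.pizza&dp=%2Fhome";
const real =
  "https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-XXXX-X" +
  "&cid=157092037.1441829013&dl=https%3A%2F%2Fwww.quantable.com%2F";
console.log(looksLikeGhost(ghost, validHosts), looksLikeGhost(real, validHosts)); // true false
```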
HOW DO I FIX IT?
• Filters for new traffic, segments for historical data
• Tool available on my site: quantable.com/spamfilter
• For a new site, use a property tracking ID number higher than UA-XXXX-1
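A common filter approach is an include filter on hostname that admits only your own domains. A minimal sketch that builds such a regex pattern from a hypothetical host list, escaping the dots so that quantable.com does not also match quantablexcom; GA filter patterns are regular expressions, so the same escaping concern applies there.

```javascript
// Escape regex metacharacters so hostnames match literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Anchored alternation of valid hostnames, e.g. ^(www\.example\.com|example\.com)$
function hostnameFilterPattern(hosts) {
  return "^(" + hosts.map(escapeRegex).join("|") + ")$";
}

const pattern = hostnameFilterPattern(["www.quantable.com", "quantable.com"]); // hypothetical hosts
console.log(pattern); // ^(www\.quantable\.com|quantable\.com)$
```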
AD BLOCKING IS MAKING SOME
OF OUR USERS DISAPPEAR
• Blockers such as AdBlock Plus, Ghostery, uBlock Origin, and Purify can block analytics tools, not just ads
• ABP has the largest install base (300M downloads)
• These users are still in your server logs, but may never show up in your web analytics
HOW DOES THE BLOCKING
WORK?
• Long lists of URL patterns to block from loading, e.g.:
  google-analytics.com/analytics.js
  /piwik.php
  ?[AQB]&ndh=1&t=
  com/0.gif?
• The EasyPrivacy list (used by ABP and others) is over 10,000 lines long and very actively maintained
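A simplified sketch of how a blocker checks each outgoing request against rules like the ones above. Real Adblock Plus filter syntax also has || domain anchors, ^ separators, $ options and @@ exception rules; here every rule is treated as a case-insensitive substring with * as the only wildcard.

```javascript
// The example rules from the slide, treated as plain substrings.
const RULES = [
  "google-analytics.com/analytics.js",
  "/piwik.php",
  "?[AQB]&ndh=1&t=",
  "com/0.gif?",
];

// Escape everything literally, then turn "*" into ".*".
function ruleToRegex(rule) {
  const escaped = rule
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(escaped, "i"); // ABP matching is case-insensitive by default
}

const COMPILED = RULES.map(ruleToRegex);

function isBlocked(url) {
  return COMPILED.some((re) => re.test(url));
}

console.log(isBlocked("https://www.google-analytics.com/analytics.js")); // true
console.log(isBlocked("https://example.com/app.js")); // false
```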
HOW MANY USERS BLOCK GA?
My study found 8.7% of users blocking GA
(for one particular site)
HOW DO I COUNT BLOCKERS?
• Can’t really be “fixed” client-side
• Blockers still show up server-side
• Working around blocking may be against GA terms (you can’t circumvent the Opt-Out Add-on)
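Since blockers still appear server-side, a rough blocking rate comes from comparing the two sources for the same pages and period: 1 - (GA count / server-side count). A sketch of that arithmetic with hypothetical inputs; it assumes bots and untagged pages were already removed from the server-side count, otherwise they inflate the estimate.

```javascript
// Estimated share of (human) visitors that GA never sees.
function gaBlockingRate(serverSideCount, gaCount) {
  if (serverSideCount <= 0) throw new RangeError("serverSideCount must be positive");
  const rate = 1 - gaCount / serverSideCount;
  return Math.min(1, Math.max(0, rate)); // clamp: noise can push it out of [0, 1]
}

// Hypothetical counts for the same pages and date range.
console.log(gaBlockingRate(10000, 9000).toFixed(3)); // 0.100
```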
SQUARING THAT CIRCLE
…because sometimes 22/7 is good enough.
THANKS!
slides & recap to be posted at cbuswaw.com
References & Further Reading
Quantable GA Blocking Analysis: https://www.quantable.com/analytics/how-many-users-block-google-analytics/
GA Tracking Code walkthrough: http://code.stephenmorley.org/javascript/understanding-the-google-analytics-tracking-code/
GA Measurement Protocol Hit Builder: https://ga-dev-tools.appspot.com/hit-builder/
Fingerprintjs2: http://valve.github.io/fingerprintjs2/
Incapsula 2015 Bot Report: https://www.incapsula.com/blog/bot-traffic-report-2015.html
Analytics Edge’s Guide to GA Spam: http://help.analyticsedge.com/spam-filter/definitive-guide-to-removing-google-analytics-spam/
