Like a Pack of Wolves: Community Structure of Web Trackers
1. Like a Pack of Wolves:
Community Structure of Web Trackers
V. Kalavri, kalavri@kth.se (KTH Royal Institute of Technology)
J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research)
Passive and Active Measurements Conference
31 March - 1 April 2016, Heraklion, Crete, Greece
4.
The study's authors defined "creepiness" by the feeling
consumers get when they sense an ad is too personal
because it uses data the consumer did not agree to
provide, such as online-search and browsing history.
Consumers are even more creeped out by this because
they don't know how and where that information will
be used.
6. Can’t we block them?
[Diagram labels: proxy, Legitimate site, Tracker, Tracker, Ad Server]
7. Available Solutions
AdBlock, DoNotTrack, EasyPrivacy:
crowd-sourced “black lists” of tracker URLs
● not frequently updated
● unclear who blacklists URLs, or based on what criteria
● miss “hidden” trackers or dual-role nodes
● blocking requires manual matching against the list
● can you buy your way into the whitelist?
9. Towards Automatic Tracker Detection
Exploit fundamental properties of web tracker
operation to automate tracker detection
● Structural attributes: network positions, connections
● Operational aspects: data exchanged, communication
patterns
10. Dataset
6 months (Nov 2014 - April 2015) of augmented Apache logs from a web proxy
● 80m requests
● 2m distinct URLs
● 3k users
Logged fields:
● User identification
● URL requested
● Headers
● Performance information, i.e. latency, bytes
● Tagged as Trackers or non-Trackers with EasyPrivacy
11. Web Tracking as a Graph Problem
Referer-Hosts Graph
● U: Referers (URLs visited by the user)
● V: hosts (embedded URLs)
[Figure: example vertices facebook.com, youtube.com, google-analytics.com, b.scorecardresearch.com]
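A minimal sketch of how such a referer-hosts graph might be assembled from (referer URL, requested URL) pairs taken from the proxy logs. The function names, the sample URLs, and the choice to collapse URLs to hostnames are assumptions made for illustration, not details given in the slides.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, or None if it has none."""
    return urlsplit(url).hostname


def build_referer_hosts_graph(requests):
    """Map each referer (U side) to the set of hosts (V side) it led to.

    `requests` is an iterable of (referer_url, requested_url) pairs.
    Assumption: both sides are reduced to hostnames.
    """
    graph = defaultdict(set)
    for referer_url, requested_url in requests:
        referer, host = host_of(referer_url), host_of(requested_url)
        if referer and host:
            graph[referer].add(host)
    return dict(graph)


# Hypothetical log excerpts, for illustration only.
requests = [
    ("http://facebook.com/home", "http://google-analytics.com/collect"),
    ("http://youtube.com/watch", "http://google-analytics.com/collect"),
    ("http://youtube.com/watch", "http://b.scorecardresearch.com/beacon.js"),
]
graph = build_referer_hosts_graph(requests)
```

Storing the graph as a dictionary from U vertices to sets of V vertices keeps the two sides of the bipartite graph separate, which makes a later projection onto hosts straightforward.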
13. Communities in Graphs
Communities are groups of vertices that are:
● Densely connected internally
● Sparsely connected with each other
Vertices in the same community are likely to be similar
with respect to network position and connectivity
Do trackers form communities?
18. Data Pipeline
raw logs
1: logs pre-processing (produces cleaned logs)
2: bipartite graph creation
3: largest connected component extraction
4: hosts-projection graph creation
5: community detection
6: results, e.g. (T: tracker, NT: non-tracker)
google-analytics.com: T
b.scorecardresearch.com: T
facebook.com: NT
github.com: NT
cdn.cxense.com: NT
...
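Steps 3 and 4 of the pipeline can be sketched roughly as follows. This assumes the bipartite graph is a dictionary from referers to sets of hosts, and that the hosts projection links two hosts whenever they share a referer (the usual one-mode projection; the slides do not spell out the definition). All names and sample domains are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def largest_connected_component(bipartite):
    """Keep only the referers (and their hosts) in the largest component."""
    # Undirected adjacency; vertices are tagged with their side so that a
    # name appearing as both referer and host stays two distinct vertices.
    adj = defaultdict(set)
    for referer, hosts in bipartite.items():
        for host in hosts:
            adj[("U", referer)].add(("V", host))
            adj[("V", host)].add(("U", referer))
    seen, best = set(), set()
    for start in list(adj):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from `start`
            for nxt in adj[queue.popleft()]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return {r: set(h) for r, h in bipartite.items() if ("U", r) in best}


def hosts_projection(bipartite):
    """Connect two hosts whenever at least one referer points to both."""
    projection = {}
    for hosts in bipartite.values():
        for host in hosts:
            projection.setdefault(host, set())  # keep isolated hosts too
        for a, b in combinations(sorted(hosts), 2):
            projection[a].add(b)
            projection[b].add(a)
    return projection


# Hypothetical input, for illustration only.
bipartite = {
    "a.example": {"t1.example", "t2.example"},
    "b.example": {"t2.example", "t3.example"},
    "c.example": {"x.example"},
}
lcc = largest_connected_component(bipartite)
projection = hosts_projection(lcc)
```

In this toy input, `c.example` and `x.example` form a small separate component and are dropped before the projection is built.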
21. Classification via Label Propagation
Iterative Algorithm for Community Detection
[Figure legend: non-tracker, tracker, unlabeled]
● Vertices propagate their labels
to their neighbors and adopt
the most popular label in their
neighborhood.
● Upon convergence, vertices
with the same label belong to
the same community.
● If an unlabeled node ends up
in a trackers community, it is
classified as a tracker
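The bullets above can be illustrated with a simplified sketch. It is a seeded variant rather than the community detection procedure itself: known tracker and non-tracker labels stay fixed, and each unlabeled vertex repeatedly adopts the most popular label among its neighbors. The function name, the `max_rounds` cap, the tie-breaking rule, and `unknown.example` are all assumptions made for illustration.

```python
from collections import Counter


def propagate_labels(graph, seed_labels, max_rounds=20):
    """Label unlabeled vertices by majority vote of their neighbors.

    `graph` maps each vertex to the set of its neighbors; `seed_labels`
    maps already-tagged vertices to "tracker" or "non-tracker".
    """
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for vertex in sorted(graph):
            if vertex in seed_labels:
                continue  # assumption: known labels never change
            votes = Counter(labels[n] for n in graph[vertex] if n in labels)
            if votes:
                # Ties go to the alphabetically first label, which keeps
                # the sketch deterministic.
                top = max(sorted(votes), key=votes.get)
                if labels.get(vertex) != top:
                    updates[vertex] = top
        if not updates:
            break  # converged: no vertex changed its label
        labels.update(updates)
    return labels


# Hypothetical hosts-projection graph, for illustration only.
graph = {
    "google-analytics.com": {"b.scorecardresearch.com", "unknown.example"},
    "b.scorecardresearch.com": {"google-analytics.com", "unknown.example"},
    "unknown.example": {"google-analytics.com", "b.scorecardresearch.com", "github.com"},
    "github.com": {"unknown.example"},
}
seeds = {
    "google-analytics.com": "tracker",
    "b.scorecardresearch.com": "tracker",
    "github.com": "non-tracker",
}
result = propagate_labels(graph, seeds)
```

Here `unknown.example` has two tracker neighbors and one non-tracker neighbor, so it ends up labeled as a tracker.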
29. Conclusions
● Web trackers are well-connected with each other
○ 94% of web trackers are in the same connected component
● Web trackers are mainly connected to other trackers
○ High clustering, tight communities
● 97% classification accuracy and < 2% FPR with simple
methods
○ Can be used to build robust and fully automated privacy preservation
systems
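For reference, accuracy and false positive rate (FPR) are computed from the counts of true/false positives and negatives, with "tracker" as the positive class. The sketch below uses made-up counts purely to show the arithmetic; they are not the study's data.

```python
def accuracy_and_fpr(tp, fp, tn, fn):
    """Return (accuracy, false positive rate) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total  # share of hosts classified correctly
    fpr = fp / (fp + tn)          # share of non-trackers flagged as trackers
    return accuracy, fpr


# Illustrative counts only.
acc, fpr = accuracy_and_fpr(tp=90, fp=2, tn=98, fn=10)
```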
30. Like a Pack of Wolves:
Community Structure of Web Trackers
V. Kalavri, kalavri@kth.se (KTH Royal Institute of Technology)
J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research)
Passive and Active Measurements Conference
31 March - 1 April 2016, Heraklion, Crete, Greece