Like a Pack of Wolves:
Community Structure of Web Trackers
V. Kalavri, kalavri@kth.se (KTH Royal Institute of Technology)
J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research)
Passive and Active Measurements Conference
31 March - 1 April 2016, Heraklion, Crete, Greece
Browsing the Web
2
[Figure labels: Ads, Recommendations]
Tracking
3
[Figure labels: Tracker, Tracker, Ad Server, display relevant ads, cookie exchange, profiling]
4
The study's authors defined "creepiness" by the feeling
consumers get when they sense an ad is too personal
because it uses data the consumer did not agree to
provide, such as online-search and browsing history.
Consumers are even more creeped out by this because
they don't know how and where that information will
be used.
5
Can’t we block them?
6
[Figure labels: proxy, Tracker, Tracker, Ad Server, Legitimate site]
Available Solutions
AdBlock, DoNotTrack, EasyPrivacy:
crowd-sourced “black lists” of tracker URLs
● not frequently updated
● unclear who blacklists URLs, or based on what criteria
● miss “hidden” trackers or dual-role nodes
● blocking requires manual matching against the list
● can you buy your way into the whitelist?
7
8
Towards Automatic Tracker Detection
Exploit fundamental properties of web tracker
operation to automate tracker detection
● Structural attributes: network positions, connections
● Operational aspects: data exchanged, communication
patterns
9
DataSet
6 months (Nov 2014 - April 2015) of augmented Apache logs from a web proxy
● 80m requests
● 2m distinct URLs
● 3k users
10
● User identification
● URL requested
● Headers
● Performance information, i.e. latency, bytes
● Tagged as Trackers or non-Trackers with EasyPrivacy
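A minimal sketch of how such log records might be reduced to labeled (referer, host) pairs. It assumes each record has already been parsed into a dict with hypothetical `url` and `referer` keys, reduces URLs to hostnames, and stands in for EasyPrivacy with a plain set of tracker hostnames. The real list uses filter rules, which this sketch does not implement, and skipping same-host requests is also an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for EasyPrivacy: a plain set of tracker hostnames.
TRACKER_HOSTS = {"google-analytics.com", "b.scorecardresearch.com"}

def host_of(url):
    """Return the lowercase hostname of a URL, or None if it has none."""
    return urlsplit(url).hostname

def to_edges(records):
    """Yield (referer_host, embedded_host, is_tracker) for usable records."""
    for rec in records:
        ref = host_of(rec.get("referer") or "")
        host = host_of(rec.get("url") or "")
        # Assumption: drop records without a referer and same-host requests.
        if ref is None or host is None or ref == host:
            continue
        yield ref, host, host in TRACKER_HOSTS

# Made-up records for illustration.
records = [
    {"url": "http://google-analytics.com/collect?v=1", "referer": "http://facebook.com/"},
    {"url": "http://facebook.com/home", "referer": "http://facebook.com/"},
    {"url": "http://cdn.cxense.com/cx.js", "referer": None},
]
edges = list(to_edges(records))
```

Only the first record survives here: the second is a same-host request and the third has no referer.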
Web Tracking as a Graph Problem
11
Referer-Hosts Graph
U: Referers (URLs visited by the user)
V: hosts (embedded URLs)
Example vertices: facebook.com, youtube.com, google-analytics.com, b.scorecardresearch.com
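The referer-hosts graph can be sketched as two adjacency maps, one per side of the bipartite graph. The function name and the example pairs below are made up for illustration. Keeping the two sides in separate maps lets the same name (e.g. facebook.com) appear both as a referer and as an embedded host without the two roles colliding.

```python
from collections import defaultdict

def build_referer_hosts_graph(pairs):
    """Build a bipartite graph as two adjacency maps from (referer, host) pairs."""
    hosts_of = defaultdict(set)     # U side: referer -> hosts embedded in it
    referers_of = defaultdict(set)  # V side: host -> referers that embed it
    for referer, host in pairs:
        hosts_of[referer].add(host)
        referers_of[host].add(referer)
    return hosts_of, referers_of

# Made-up (referer, host) pairs; the actual edges come from the proxy logs.
pairs = [
    ("facebook.com", "google-analytics.com"),
    ("youtube.com", "google-analytics.com"),
    ("youtube.com", "b.scorecardresearch.com"),
]
hosts_of, referers_of = build_referer_hosts_graph(pairs)
```

In this representation, the degree of a host is simply `len(referers_of[host])`, the number of unique referers that embed it.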
Referer-Hosts Graph: Connected Components
12
94% of all trackers belong to the
same connected component!
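A statistic of this kind could be computed roughly as follows: find the connected components with a breadth-first search, then measure the share of tracker vertices in the largest one. This is a sketch on a made-up toy graph; the function names and the `r:` / `h:` prefixes are illustrative assumptions.

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph via BFS."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            v = queue.popleft()
            component.add(v)
            for n in adj[v]:
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
        components.append(component)
    return components

def tracker_share_in_largest(adj, trackers):
    """Fraction of tracker vertices that fall in the largest component."""
    largest = max(connected_components(adj), key=len)
    return len(trackers & largest) / len(trackers)

# Toy undirected graph; referers are prefixed "r:" and hosts "h:".
adj = {
    "r:a": {"h:t1", "h:t2"},
    "r:b": {"h:t2", "h:x"},
    "r:c": {"h:t3"},
    "h:t1": {"r:a"},
    "h:t2": {"r:a", "r:b"},
    "h:x": {"r:b"},
    "h:t3": {"r:c"},
}
components = connected_components(adj)
share = tracker_share_in_largest(adj, {"h:t1", "h:t2", "h:t3"})
```

In the toy graph two of the three trackers share the largest component, so `share` is 2/3.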
Communities in Graphs
13
Vertices in the same community are likely to be similar
with respect to network position and connectivity
Do trackers form communities?
Densely connected
internally
Sparsely connected
with each other
[Figure: a referer-hosts graph (referers r1 to r7, hosts h1 to h8, each host marked NT, T, or ?) next to the corresponding hosts-projection graph, whose edges are annotated with referers. Legend: referer; non-tracker host (NT); tracker host (T); unlabeled host (?)]
The Hosts-Projection Graph
14
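A sketch of building the hosts-projection graph, assuming the usual one-mode projection of a bipartite graph: two hosts are connected whenever at least one referer embeds both, and each edge remembers the referers the two hosts share. The input map and the function name are made up for illustration and do not reproduce the figure's edges.

```python
from collections import defaultdict
from itertools import combinations

def project_onto_hosts(hosts_of):
    """Connect two hosts whenever at least one referer embeds both."""
    shared = defaultdict(set)  # (host_a, host_b) -> referers they share
    for referer, hosts in hosts_of.items():
        # Sorting gives each unordered host pair a single canonical key.
        for a, b in combinations(sorted(hosts), 2):
            shared[(a, b)].add(referer)
    return shared

# Made-up referer -> hosts map.
hosts_of = {
    "r1": {"h1", "h2"},
    "r2": {"h2", "h3"},
    "r3": {"h2", "h3", "h4"},
}
projection = project_onto_hosts(hosts_of)
```

A referer that embeds k hosts contributes k(k-1)/2 host pairs, so very popular pages dominate the cost of this step.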
Hosts-Projection Graph: Degrees
15
Number of unique referers that tracker / other hosts are embedded within
Hosts-Projection Graph: Tracker Neighbors
16
Trackers are mainly connected to other Trackers
Web Tracker Communities
17
Popular trackers, e.g.
google-analytics
Smaller trackers
Ad servers
Normal webpages
Data Pipeline
18
1: logs pre-processing (raw logs to cleaned logs)
2: bipartite graph creation
3: largest connected component extraction
4: hosts-projection graph creation
5: community detection
6: results, e.g.
google-analytics.com: T
bscored-research.com: T
facebook.com: NT
github.com: NT
cdn.cxense.com: NT
...
Classification via Neighborhood Analysis
19
[Figure: host h5 and its five neighbors (h3, h4, h6, h7, h8) in the hosts-projection graph. Legend: non-tracker host; tracker host; unlabeled host]
⅖ non-tracker neighbors
⅗ tracker neighbors
if % of tracker neighbors > threshold
=> classify as tracker
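The threshold rule can be sketched as below. The default threshold of 0.5, the decision to skip hosts with no labeled neighbors, and the assignment of T/NT to the individual neighbors are assumptions for illustration; only the 3-out-of-5 split comes from the slide.

```python
def classify_by_neighbors(adj, labels, threshold=0.5):
    """Label each unlabeled host from the tracker share of its labeled neighbors."""
    predictions = {}
    for host, nbrs in adj.items():
        if host in labels:
            continue  # already labeled, nothing to predict
        labeled = [n for n in nbrs if n in labels]
        if not labeled:
            continue  # no evidence: leave the host unclassified
        share = sum(labels[n] == "T" for n in labeled) / len(labeled)
        predictions[host] = "T" if share > threshold else "NT"
    return predictions

# Unlabeled host h5 with three tracker and two non-tracker neighbors.
adj = {"h5": {"h3", "h4", "h6", "h7", "h8"}}
labels = {"h3": "T", "h4": "T", "h6": "T", "h7": "NT", "h8": "NT"}
predictions = classify_by_neighbors(adj, labels)
```

With a threshold of 0.5, h5 is classified as a tracker; raising the threshold to 0.7 would flip it to non-tracker.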
Results
20
Classification via Label Propagation
Legend: non-tracker, tracker, unlabeled
Iterative Algorithm for Community Detection
● Vertices propagate their labels to their neighbors and adopt the most popular label in their neighborhood.
● Upon convergence, vertices with the same label belong to the same community.
● If an unlabeled node ends up in a tracker community, it is classified as a tracker.
Classification via Label Propagation (worked example)
Example graph, with edges recovered from the neighbor label sets shown on the slides: 1-2, 2-3, 3-4, 3-5, 4-5, 4-6, 5-6, 5-7, 7-8
Each vertex starts with its own ID as its label. Labels of vertices 1 to 8 after each iteration:
i=0: 1 2 3 4 5 6 7 8
i=1: 2 3 5 6 7 6 8 8
i=2: 3 5 7 7 6 7 8 8
i=3: 5 7 7 7 7 7 8 8
i=4: 7 7 7 7 7 7 8 8
Converged: vertices 1 to 6 form one community (label 7), vertices 7 and 8 another (label 8).
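The procedure can be sketched in a few lines. Three things here are assumptions rather than the authors' stated method: the tie-breaking rule (inferred from the deck's worked example: when no neighbor label repeats, the vertex's own label competes too and the highest label wins), the majority rule used to decide what counts as a tracker community, and the T/NT labels in `known`, which are made up. The edge list is read off the neighbor label sets in the worked example.

```python
from collections import Counter

def propagate_labels(adj, max_iters=100):
    """Synchronous label propagation; returns {vertex: community label}."""
    labels = {v: v for v in adj}  # every vertex starts in its own community
    for _ in range(max_iters):
        new_labels = {}
        for v, nbrs in adj.items():
            counts = Counter(labels[n] for n in nbrs)
            if not counts:
                new_labels[v] = labels[v]  # isolated vertex keeps its label
                continue
            top = max(counts.values())
            candidates = {lab for lab, c in counts.items() if c == top}
            if top == 1:
                # Assumed tie-break: no neighbor label repeats, so the
                # vertex's own label competes as well.
                candidates.add(labels[v])
            new_labels[v] = max(candidates)
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels

def classify_unlabeled(communities, known):
    """Mark an unlabeled vertex as tracker if its community is mostly trackers."""
    result = {}
    for v, comm in communities.items():
        if v in known:
            continue
        members = [known[u] for u, c in communities.items() if c == comm and u in known]
        result[v] = "T" if members and members.count("T") > len(members) / 2 else "NT"
    return result

edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (4, 6), (5, 6), (5, 7), (7, 8)]
adj = {v: set() for v in range(1, 9)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

communities = propagate_labels(adj)
# Made-up ground truth; vertex 5 is the unlabeled one.
known = {1: "NT", 2: "NT", 3: "T", 4: "T", 6: "T", 7: "NT", 8: "NT"}
predicted = classify_unlabeled(communities, known)
```

Synchronous updates are not guaranteed to converge on every graph (labels can oscillate), which is why the loop is capped by `max_iters`.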
Results
28
Conclusions
● Web trackers are well-connected with each other
○ 94% of web trackers are in the same connected component
● Web trackers are mainly connected to other trackers
○ High clustering, tight communities
● 97% classification accuracy and < 2% FPR with simple
methods
○ Can be used to build robust and fully automated privacy preservation
systems
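For reference, accuracy and false positive rate (FPR) are derived from confusion-matrix counts as sketched below, with "tracker" as the positive class. The counts are made up to show the arithmetic and are not the study's results.

```python
def accuracy_and_fpr(tp, fp, tn, fn):
    """Return (accuracy, false positive rate) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of correct decisions
    fpr = fp / (fp + tn)  # share of non-trackers wrongly flagged as trackers
    return accuracy, fpr

# Made-up counts for illustration only; not the study's results.
accuracy, fpr = accuracy_and_fpr(tp=90, fp=2, tn=98, fn=10)
```

A low FPR matters here because every false positive is a legitimate host that an automated blocker would wrongly cut off.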
29
Extra Slides
Referer-Hosts Graph: Degrees
32
#unique referers that tracker / other hosts are embedded within

Like a Pack of Wolves: Community Structure of Web Trackers

  • 1. Like a Pack of Wolves: Community Structure of Web Trackers V. Kalavri, kalavri@kth.se (KTH Royal Institute of Technology) J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research) Passive and Active Measurements Conference 31 March - 1 April 2016, Heraklion, Crete, Greece
  • 4. The study's authors defined "creepiness" by the feeling consumers get when they sense an ad is too personal because it uses data the consumer did not agree to provide, such as online-search and browsing history. Consumers are even more creeped out by this because they don't know how and where that information will be used.
  • 6. Can’t we block them? [Diagram labels: proxy, Tracker, Tracker, Ad Server, Legitimate site]
  • 7. Available Solutions. AdBlock, DoNotTrack, EasyPrivacy: crowd-sourced “black lists” of tracker URLs. Limitations: ● not frequently updated ● unclear who blacklists URLs, or based on what criteria ● miss “hidden” trackers or dual-role nodes ● blocking requires manual matching against the list ● can you buy your way into the whitelist?
  • 9. Towards Automatic Tracker Detection. Exploit fundamental properties of web tracker operation to automate tracker detection ● Structural attributes: network positions, connections ● Operational aspects: data exchanged, communication patterns
  • 10. Dataset: 6 months (Nov 2014 - April 2015) of augmented Apache logs from a web proxy ● 80M requests ● 2M distinct URLs ● 3k users. Log contents: ● User identification ● URL requested ● Headers ● Performance information, i.e. latency, bytes ● Tagged as Trackers or non-Trackers with EasyPrivacy
  • 11. Web Tracking as a Graph Problem. Referer-Hosts Graph: U: Referers (URLs visited by the user); V: hosts (embedded URLs). Example nodes: facebook.com, youtube.com, google-analytics.com, b.scorecardresearch.com
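A minimal sketch of how the referer-hosts graph on slide 11 might be assembled from proxy log entries. The input format (pairs of referer URL and requested URL), the reduction of URLs to hostnames, the function name `build_referer_hosts_graph`, and the example paths are all assumptions made for illustration; the slides do not specify them.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_referer_hosts_graph(requests):
    """Map each referer (set U) to the hosts (set V) embedded in it.

    `requests` is an iterable of (referer_url, requested_url) pairs,
    a hypothetical simplification of one proxy log entry.
    """
    referer_to_hosts = defaultdict(set)
    for referer_url, requested_url in requests:
        referer = urlsplit(referer_url).hostname
        host = urlsplit(requested_url).hostname
        if referer and host:  # skip entries with a missing or unparsable URL
            referer_to_hosts[referer].add(host)
    return dict(referer_to_hosts)


# Hypothetical log entries using the hostnames shown on the slide.
example_requests = [
    ("https://facebook.com/feed", "https://google-analytics.com/collect"),
    ("https://youtube.com/watch", "https://google-analytics.com/collect"),
    ("https://youtube.com/watch", "https://b.scorecardresearch.com/beacon.js"),
]
graph = build_referer_hosts_graph(example_requests)
```

Storing the graph as a referer-to-hosts mapping keeps the two vertex sets separate, so a name that appears both as a referer and as an embedded host is not merged into one vertex.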
  • 12. Referer-Hosts Graph: Connected Components. 94% of all trackers belong to the same connected component!
  • 13. Communities in Graphs: densely connected internally, sparsely connected with each other. Vertices in the same community are likely to be similar with respect to network position and connectivity. Do trackers form communities?
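The connected-component measurement on slide 12 could be reproduced along these lines. This is a sketch using breadth-first search over an edge list; the vertex encoding (`("r", name)` for referers, `("h", name)` for hosts), the function names, and the toy data are assumptions, not the authors' implementation.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected graph as sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


def tracker_share_in_largest(edges, trackers):
    """Fraction of known trackers that fall in the largest component."""
    largest = max(connected_components(edges), key=len)
    return len(trackers & largest) / len(trackers)


# Toy data (hypothetical): three referers, three tracker hosts.
toy_edges = [
    (("r", "a.example"), ("h", "t1.example")),
    (("r", "b.example"), ("h", "t1.example")),
    (("r", "b.example"), ("h", "t2.example")),
    (("r", "c.example"), ("h", "t3.example")),
]
toy_trackers = {("h", "t1.example"), ("h", "t2.example"), ("h", "t3.example")}
share = tracker_share_in_largest(toy_edges, toy_trackers)
```

In the toy data two of the three trackers share the largest component, so `share` is 2/3; on the slides' dataset the reported figure is 94%.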
  • 14. The Hosts-Projection Graph. [Figure: a referer-hosts graph with referers r1-r7 and hosts h1-h8 (each marked T, NT, or ?), shown next to its hosts-projection graph, whose edges between hosts are labeled with referers. Legend: referer, non-tracker host, tracker host, unlabeled host]
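A short sketch of the projection step on slide 14, under the assumption that two hosts are linked in the hosts-projection graph whenever at least one referer embeds both of them. The function name and the toy input are hypothetical.

```python
from itertools import combinations


def project_onto_hosts(referer_to_hosts):
    """Build the hosts-projection graph as host -> set of neighboring hosts.

    Assumes two hosts are neighbors if some referer embeds both.
    """
    projection = {}
    for hosts in referer_to_hosts.values():
        for host in hosts:
            projection.setdefault(host, set())  # keep hosts with no neighbors
        for a, b in combinations(sorted(hosts), 2):
            projection[a].add(b)
            projection[b].add(a)
    return projection


# Toy referer-hosts graph (hypothetical names).
toy_graph = {"r1": {"h1", "h2"}, "r2": {"h2", "h3"}, "r3": {"h4"}}
toy_projection = project_onto_hosts(toy_graph)
```

Here `h2` ends up adjacent to both `h1` and `h3`, while `h4`, embedded alone by `r3`, stays isolated. Note that a referer embedding k hosts contributes k(k-1)/2 edges, so very popular pages dominate the cost of this step.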
  • 15. Hosts-Projection Graph: Degrees. Number of unique referers that tracker hosts and other hosts are embedded within
  • 16. Hosts-Projection Graph: Tracker Neighbors. Trackers are mainly connected to other trackers
  • 17. Web Tracker Communities. [Figure labels: popular trackers, e.g. google-analytics; smaller trackers; ad servers; normal webpages]
  • 18. Data Pipeline: raw logs → 1: logs pre-processing → cleaned logs → 2: bipartite graph creation → 3: largest connected component extraction → 4: hosts-projection graph creation → 5: community detection → 6: results (google-analytics.com: T, bscored-research.com: T, facebook.com: NT, github.com: NT, cdn.cxense.com: NT, ...)
  • 19. Classification via Neighborhood Analysis. Example: an unlabeled host with ⅖ non-tracker neighbors and ⅗ tracker neighbors. If % of tracker neighbors > threshold => classify as tracker. [Figure: hosts h1-h8 in the hosts-projection graph. Legend: non-tracker host, tracker host, unlabeled host]
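The neighborhood rule on slide 19 can be written down in a few lines. This is a sketch only: the default threshold of 0.5, the choice to ignore unlabeled neighbors when computing the fraction, and the names `classify_by_neighborhood`, `"T"`, and `"NT"` are assumptions; the slide gives just the rule "% of tracker neighbors > threshold".

```python
def classify_by_neighborhood(host, neighbors, labels, threshold=0.5):
    """Classify `host` as "T" (tracker) or "NT" from its labeled neighbors.

    neighbors: host -> set of adjacent hosts in the hosts-projection graph
    labels:    host -> "T" or "NT" for hosts with a known label
    Returns None when the host has no labeled neighbor to vote.
    """
    labeled = [labels[n] for n in neighbors[host] if n in labels]
    if not labeled:
        return None
    tracker_fraction = sum(1 for lab in labeled if lab == "T") / len(labeled)
    return "T" if tracker_fraction > threshold else "NT"


# The slide's example: 3 of 5 neighbors are trackers (host names hypothetical).
toy_neighbors = {"h4": {"h2", "h3", "h5", "h6", "h7"}}
toy_labels = {"h2": "NT", "h3": "T", "h5": "T", "h6": "T", "h7": "NT"}
verdict = classify_by_neighborhood("h4", toy_neighbors, toy_labels)
```

With the assumed threshold of 0.5 the example host is classified as a tracker, since 3/5 = 0.6; raising the threshold to 0.7 would flip the verdict to non-tracker.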
  • 21. Classification via Label Propagation: an iterative algorithm for community detection. ● Vertices propagate their labels to their neighbors and adopt the most popular label in their neighborhood. ● Upon convergence, vertices with the same label belong to the same community. ● If an unlabeled node ends up in a trackers community, it is classified as a tracker. [Legend: non-tracker, tracker, unlabeled]
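  • 22. Classification via Label Propagation, i=0. Initial labels (in figure order): 2 3 4 5 6 8 7 1
  • 23. Classification via Label Propagation, i=1. Labels before: 2 3 4 5 6 8 7 1. Labels received from neighbors: {2} {1, 3} {2, 4, 5} {3, 5, 6} {4, 5} {3, 4, 6, 7} {5, 8} {7}. Labels after: 3 5 6 7 6 8 8 2
  • 24. Classification via Label Propagation, i=2. Labels before: 3 5 6 7 6 8 8 2. Labels received from neighbors: {3} {2, 5} {3, 6, 7} {5, 6, 7} {6, 7} {5, 6, 6, 8} {7, 8} {8}. Labels after: 5 7 7 6 7 8 8 3
  • 25. Classification via Label Propagation, i=3. Labels before: 5 7 7 6 7 8 8 3. Labels received from neighbors: {5} {3, 7} {5, 6, 7} {6, 7, 7} {6, 7} {7, 7, 7, 8} {6, 8} {8}. Labels after: 7 7 7 7 7 8 8 5
  • 26. Classification via Label Propagation, i=4. Labels before: 7 7 7 7 7 8 8 5. Labels received from neighbors: {7} {5, 7} {7, 7, 7} {7, 7, 7} {7, 7} {7, 7, 7, 8} {7, 8} {8}. Labels after: 7 7 7 7 7 8 8 7
  • 27. Classification via Label Propagation. Labels before: 7 7 7 7 7 8 8 7. Labels after: 7 7 7 7 7 8 8 7 (unchanged)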
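The label propagation walkthrough on slides 21-27 can be sketched as follows. Several things here are assumptions: the 8-vertex edge list is inferred from the neighbor label sets shown on the slides, the tie-breaking rule (a vertex's own label counts as a candidate with at least one vote, and ties go to the largest label) is one rule that is consistent with the slide iterations but is not stated on them, and the majority vote used to decide whether a community is a "trackers community", along with the known labels, is hypothetical.

```python
from collections import Counter

# Edge list inferred from the slides' neighbor sets (an assumption).
SLIDE_EDGES = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5),
               (4, 6), (5, 6), (5, 7), (7, 8)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def label_propagation(adj, max_iters=100):
    """Synchronous label propagation; returns vertex -> community label."""
    labels = {v: v for v in adj}  # every vertex starts with its own id
    for _ in range(max_iters):
        new_labels = {}
        for v, nbrs in adj.items():
            votes = Counter(labels[n] for n in nbrs)
            # Assumed rule: own label is a candidate with at least one vote.
            votes[labels[v]] = max(votes[labels[v]], 1)
            # Most frequent label wins; ties go to the largest label.
            new_labels[v] = max(votes, key=lambda lab: (votes[lab], lab))
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels


def classify_unlabeled(communities, known):
    """Give each unlabeled vertex the majority label of its community."""
    tallies = {}
    for v, kind in known.items():
        tallies.setdefault(communities[v], Counter())[kind] += 1
    result = {}
    for v, community in communities.items():
        tally = tallies.get(community)
        if v not in known and tally:
            result[v] = "T" if tally["T"] > tally["NT"] else "NT"
    return result


communities = label_propagation(build_adjacency(SLIDE_EDGES))
known_labels = {2: "T", 3: "T", 5: "T", 7: "NT"}  # hypothetical ground truth
predicted = classify_unlabeled(communities, known_labels)
```

On this graph the procedure settles into two communities, vertices 1-6 under label 7 and vertices 7-8 under label 8, so with the hypothetical known labels the unlabeled vertices 1, 4, and 6 are classified as trackers and vertex 8 as a non-tracker. Synchronous updates with other tie-breaking rules can oscillate instead of converging, which is why `max_iters` caps the loop.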
  • 29. Conclusions ● Web trackers are well-connected with each other ○ 94% of web trackers are in the same connected component ● Web trackers are mainly connected to other trackers ○ High clustering, tight communities ● 97% classification accuracy and < 2% FPR with simple methods ○ Can be used to build robust and fully automated privacy preservation systems
  • 30. Like a Pack of Wolves: Community Structure of Web Trackers V. Kalavri, kalavri@kth.se (KTH Royal Institute of Technology) J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research) Passive and Active Measurements Conference 31 March - 1 April 2016, Heraklion, Crete, Greece
  • 32. Referer-Hosts Graph: Degrees. Number of unique referers that tracker hosts and other hosts are embedded within