Tracking the Trackers
Zhonghao Yu zhonghao@cliqz.com
Sam Macbeth sam@cliqz.com
Konark Modi konarkm@cliqz.com
Josep M. Pujol josep@cliqz.com
Page load triggers
requests to multiple 3rd
parties
Even on pages on sites
that you probably want
to keep private, like
this dating site.
Of course, general
news domains also load
many 3rd parties
as well as e-commerce
sites like eBay
Twitter pages only
accessible to the
authenticated user also
load 3rd parties like GA
This browsing session
on 5 different sites
involved more than 60
different 3rd parties.
GET /css?family=Open+Sans+Condensed:300,700
Host: fonts.googleapis.com
User-Agent: Mozilla/5.0 ... Firefox/45.0
Referer: http://www.meetic.com/home/index.php
IP: 79.227.235.241
fonts.googleapis.com is a potential tracker
<meetic.com/home/index.php, UID>
<www20016.ca/, UID>
<wired.com/, UID>
However, in THIS request, there is no data element that can be used as a
UID.
Since there is no unsafe data element, the request is safe.
GET /impression.php/f3ae074XXX/api_key=597038480XXX&lid=115…
Host: www.facebook.com
User-Agent: Mozilla/5.0 … Firefox/45.0
Referer: http://www.meetic.com/home/index.php
Cookie: datr=0IPhVj5YHEJ20XXX; c_user=10973XXXX; … csm=2;
IP: 79.227.235.241
facebook.com is a potential tracker too,
<meetic.com/home/index.php, 10973XXXX>
<www20016.ca/, 10973XXXX>
<wired.com/, 10973XXXX>
<ebay-kleinanzeigen.de/s-muenchen/cyclocross/k0l6411r200, 10973XXXX>
Unlike the request to fonts.googleapis.com, the request above is not safe with
regard to privacy because it contains two values that we consider unsafe and
that could thus be used as UIDs:
c_user=10973XXXX and datr=0IPhVj5YHEJ20XXX
Because it contains at least one unsafe value, the request is considered unsafe.
GET /collect?
v=1&_v=j41&a=321948996&t=event&ni=0&_s=1&...&vp=1291x524&
..._u=QCCAAAABI~&jid=&cid=6531474...
Host: www.google-analytics.com
Referer: http://www.meetic.com/home/index.php
IP: 79.227.235.241
google-analytics.com is a potential tracker too,
<meetic.com/home/index.php, 1291x522:79.227.235.241>
<www20016.ca/, 1291x522:79.227.235.241>
<wired.com/, 1291x522:79.227.235.241>
<ebay-kleinanzeigen.de/s-muenchen/cyclocross/k0l6411r200,
1291x522:79.227.235.241>
<analytics.twitter.com/user/solso/home, 1291x522:79.227.235.241>
The UID is not as evident as for Facebook. But the combination vp+IP is an
unsafe data element: it can be used as a UID. Therefore this request is also
unsafe.
vp+IP = 1291x522:79.227.235.241
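To make the point concrete, here is a minimal sketch of how a 3rd party could link page visits using nothing but the viewport and the IP address. The helper name `link_visits`, the record layout, and the second viewport/IP pair are hypothetical; only the first fingerprint comes from the example above.

```python
from collections import defaultdict

def link_visits(requests):
    """Group visited pages by a (viewport, IP) key, as a tracker could."""
    profiles = defaultdict(list)
    for req in requests:
        uid = f"{req['vp']}:{req['ip']}"  # no cookie is needed to build this key
        profiles[uid].append(req["referer"])
    return dict(profiles)

requests = [
    {"referer": "meetic.com/home/index.php", "vp": "1291x522", "ip": "79.227.235.241"},
    {"referer": "wired.com/", "vp": "1291x522", "ip": "79.227.235.241"},
    # Made-up second user, for contrast only.
    {"referer": "wired.com/", "vp": "1440x900", "ip": "203.0.113.7"},
]
profiles = link_visits(requests)
```

Both pages of the first user end up under the same key, which is exactly what a UID provides.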
Not a conveniently chosen example…
...tracking is a pervasive problem.
Tracking in the Wild
Largest field study with real traffic to date,
200,000 users in Germany for a week(*)
21M page loads,
5M unique pages (URLs)
from 350K domains
(*) Between 09/09/2015 and 16/09/2015
Tracking in the Wild: Prevalence
Potential trackers
are 3rd parties that are present in many different domains.
Unsafe data elements
are data elements for which we cannot rule out the possibility that they are
UIDs.
[Chart: 21M page loads. Without potential trackers: 5%; with potential trackers: 95%. Split 1 to 9 / >= 10: 24% / 76%.]
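The definition of a potential tracker can be sketched as a count of distinct 1st-party domains per 3rd party. The slides do not say how many domains count as "many", so the `min_domains` threshold, the function name, and the sample observations below are assumptions made for illustration.

```python
from collections import defaultdict

def potential_trackers(observations, min_domains):
    """Return 3rd parties seen on at least `min_domains` distinct 1st-party domains.

    `observations` is an iterable of (first_party_domain, third_party_domain).
    """
    seen_on = defaultdict(set)
    for first_party, third_party in observations:
        if first_party != third_party:  # same-site requests are not 3rd party
            seen_on[third_party].add(first_party)
    return {tp for tp, sites in seen_on.items() if len(sites) >= min_domains}

observations = [
    ("meetic.com", "facebook.com"),
    ("wired.com", "facebook.com"),
    ("ebay-kleinanzeigen.de", "facebook.com"),
    ("wired.com", "cdn.example.net"),  # hypothetical site-specific CDN
]
trackers = potential_trackers(observations, min_domains=3)
```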
Tracking in the Wild: Prevalence
[Chart: 21M page loads. Without unsafe values (unsafe data elements): 22%; with unsafe values: 78%. Split 1 to 9 / >= 10: 21% / 79%.]
78% of all page loads can be tracked.
Tracking in the Wild: Reach
Organization  % of page loads seen  % of page loads seen with unsafe data elements (tracking)  Rank
Google        62.4%                 42.4%                                                      1st
Facebook      21.1%                 18.5%                                                      2nd
AppNexus      10.15%                9.9%                                                       3rd
ADITION       8.7%                  8.4%                                                       4th
Criteo        8.7%                  8.2%                                                       5th
…
Comscore      6.1%                  5.9%                                                       --
DoublePimp    0.5%                  0.5%                                                       --
NewRelic      2%                    0.03%                                                      --
…
58
organizations
with a reach
larger than 1%
CLIQZ Tracking Protection
Maximize
coverage,
minimize false
positives
Aggressiveness is counter-productive…
•  increases site breakage, which forces users to add exceptions, thus
reducing protection coverage.
•  affects legitimate services and data collection
Block only the Ability to Track
GET /collect?
v=1&_v=j41&a=321948996&t=event&ni=0&_s=1&...&vp=1291x524&
..._u=QCCAAAABI~&jid=&cid=6531474...
Host: www.google-analytics.com
Referer: http://www.meetic.com/home/index.php
IP: 79.227.235.241
Intervening only on unsafe data elements (those elements that can be used as
UIDs) should protect the user while minimizing side effects:
a)  site breakage for users
b)  loss of legitimate data collection for 3rd parties
Blocklists are coarse-grained
CDF of the number of requests with observed unsafe data elements by 3rd
party domains contained both in Disconnect Blocklist and CLIQZ list of
potential trackers (~2000 domains each). Intersection is 477 domains.
Only 2% of tracker domains in Disconnect always send unsafe data elements.
98% of tracker domains have MIXED behavior.
Lack of resolution…
Blocklists are coarse-grained
Blocklists by domain (reverse suffix) are too coarse-grained.
BLOCKLIST by Domain
Blocklists are too coarse-grained
EasyPrivacy (from Adblock Plus) has hundreds of regular
expressions to cover for mixed behavior of trackers.
BLOCKLIST by Domain + RegExp Exceptions
BLOCKLIST by Domain + More RegExp Exceptions
We propose a more fine-grained approach
to algorithmically determine the safeness
level of individual data elements within a
request to a 3rd party
Determining Safeness
Each 3rd party request to a potential tracker is parsed to obtain a list of tuples
T = [<s, d, k, v>] whose safeness level is evaluated in real-time,
T = [
<s=wired.com/, d=3rdparty.com, k=z, v=1501498154>,
<s=wired.com/, d=3rdparty.com, k=fl,v=21.0>,
<s=wired.com/, d=3rdparty.com, k=u, v=CCAAAABI>,
<s=wired.com/, d=3rdparty.com, k=vr,v=1440x900>,
<s=wired.com/, d=3rdparty.com, k=ua,v=3FeFF2301E>,
<s=wired.com/, d=3rdparty.com, k=vp,v=1322x781>,
<s=wired.com/, d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>,
]
The aim is to identify which data elements (including combinations) are unsafe,
and are therefore candidates to be used as UIDs.
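The parsing step can be sketched as follows. This is a minimal illustration that only looks at the query string; the names `DataElement` and `parse_request`, the example URL, and the use of the request hostname as d are assumptions, not the actual CLIQZ implementation.

```python
from typing import NamedTuple
from urllib.parse import urlsplit, parse_qsl

class DataElement(NamedTuple):
    s: str  # source page (1st party)
    d: str  # 3rd-party domain
    k: str  # key
    v: str  # value

def parse_request(source_page, request_url):
    """Split the query string of a 3rd-party request into <s, d, k, v> tuples."""
    parts = urlsplit(request_url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return [DataElement(source_page, parts.hostname, k, v) for k, v in pairs]

# Hypothetical request carrying three of the data elements listed above.
T = parse_request(
    "wired.com/",
    "http://3rdparty.com/collect?z=1501498154&fl=21.0&vr=1440x900",
)
```

Other carriers of data elements, such as cookies or path segments, would need their own extraction step.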
We cannot do this effectively. But we can do the opposite, identify data
elements that cannot be used effectively as UIDs, and consider them safe.
All tuples are UNSAFE by default unless we can determine that the given data-element is not a good UID, hence safe.
k=z: the value 1501498154 has never been seen before for <d, k>. Thus, it cannot be used as a UID => SAFE
k=fl: the value 21.0 is too short to encode any UID => SAFE
More than 3 different values in less than 2 days by the same tuple <d,k>. Not persistent, bad UID => SAFE
Always the same value for <d,k>. We cannot rule out that the data-elements are UIDs => keep as UNSAFE
Only using local information is not enough; vr=1440x1024 is not a UID…
We need something extra.
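The local rules described so far (never seen before, too short, too many different values in two days) can be sketched as a small stateful check. This is an illustration under assumptions: the slides give no minimum length, so `MIN_UID_LENGTH = 8` is a guess, and the class name and the in-memory dictionary are hypothetical stand-ins for the real local store.

```python
TWO_DAYS = 2 * 24 * 3600  # seconds
MIN_UID_LENGTH = 8        # assumption: shorter values are "too short" to be a UID
MAX_VALUES = 3            # "more than 3 different values" => not persistent

class LocalSafeness:
    def __init__(self):
        self.seen = {}  # (d, k) -> {value: time last seen}

    def is_safe(self, d, k, v, now):
        """Return True if <d, k, v> is locally SAFE; UNSAFE is the default."""
        values = self.seen.setdefault((d, k), {})
        first_time = v not in values
        values[v] = now
        # Distinct values seen for this <d, k> within the last two days.
        recent = sum(1 for t in values.values() if now - t < TWO_DAYS)
        return len(v) < MIN_UID_LENGTH or first_time or recent > MAX_VALUES

local = LocalSafeness()
short_value = local.is_safe("3rdparty.com", "fl", "21.0", now=0)
first_sighting = local.is_safe("3rdparty.com", "c7", "e9d4a7e4d2185cec", now=0)
repeated = local.is_safe("3rdparty.com", "c7", "e9d4a7e4d2185cec", now=3600)
```

A value that keeps coming back unchanged for the same `<d, k>` is the only case left UNSAFE, which is why a common screen resolution also ends up locally UNSAFE.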
k=vr: Locally UNSAFE, i.e. always the same value for <d, k>.
Globally SAFE since more than 20 other users have observed the same value 1440x900 for tuple <d,k> = <3rdparty.com,vr> in the last 2 days.
Locally UNSAFE, i.e. always the same value for tuple <d,k>.
Globally SAFE since it has reached the safeness quorum based on k-Anonymity.
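The quorum check itself is a threshold on how many users reported the same tuple. In this sketch the threshold of 20 comes from the slides, while the function names, the `user_counts` dictionary, and the counts in it are made up for illustration.

```python
QUORUM = 20  # from the slides: "more than 20 other users"

def is_globally_safe(user_counts, d, k, v, quorum=QUORUM):
    """True if enough distinct users reported the same <d, k, v> recently."""
    return user_counts.get((d, k, v), 0) > quorum

def is_unsafe(locally_safe, user_counts, d, k, v):
    """A data element stays UNSAFE only if it fails both checks."""
    return not locally_safe and not is_globally_safe(user_counts, d, k, v)

# Hypothetical counts of users who reported each tuple in the last 2 days.
user_counts = {
    ("3rdparty.com", "vr", "1440x900"): 5321,
    ("3rdparty.com", "c7", "e9d4a7e4d2185cec"): 1,
}
vr_unsafe = is_unsafe(False, user_counts, "3rdparty.com", "vr", "1440x900")
c7_unsafe = is_unsafe(False, user_counts, "3rdparty.com", "c7", "e9d4a7e4d2185cec")
```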
Locally UNSAFE, i.e. always the same value for <d, k>.
Globally UNSAFE: not enough people have seen the value for <d, k>; it is always the same <d, k, u>. Not safe to send.
Two options:
a)  it is a UID, or an element that could be used as such.
b)  a false positive due to the Transient State (0.07%)
Locally UNSAFE and Globally UNSAFE
At this point the request analysis is complete:
1)  ALLOW Request removing unsafe data-elements
2)  ALLOW Request obfuscating unsafe data-elements
3)  BLOCK Request or ALLOW Request without alteration
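Option 1 can be sketched as a rewrite of the query string that drops the keys judged unsafe. The function name `strip_unsafe` and the example URL are hypothetical, and the sketch only handles query-string parameters.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_unsafe(request_url, unsafe_keys):
    """Return the request URL without the query parameters judged unsafe."""
    parts = urlsplit(request_url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in unsafe_keys
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

cleaned = strip_unsafe(
    "http://3rdparty.com/collect?fl=21.0&c7=e9d4a7e4d2185cec&vr=1440x900",
    unsafe_keys={"c7"},
)
```

The request still reaches the 3rd party, so the page keeps working, but the candidate UID is no longer part of it.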
Safeness Quorum without Tracking
To determine that a data-element is globally safe we need to count the number of
unique users that have observed a tuple
<d,k,v> e.g. <d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>
Users could share tuples with a field that identifies them (u),
<u=usrXXX, d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>
with CLIQZ. But that would make CLIQZ a tracker! Instead, each user sends the
tuple – if observed – once and only once per hour:
<d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>
Actual values are not needed; hashed values suffice for counting and for membership tests on the GWL:
<d=ed5c0cf7b05572eb, k=4d3a21d8c684c09c19b93be911827fd5,
v=e60f936dc719ca649a80a97490a09940>
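A client-side sketch of this reporting scheme: each tuple is hashed field by field and sent at most once per hour, with no user identifier attached. The slides do not name the hash function, so SHA-256 is an assumption here, and `HourlyReporter` with its in-memory `outbox` is a hypothetical stand-in for the real sending logic.

```python
import hashlib

def hashed_tuple(d, k, v):
    """Hash each field so that actual values never leave the browser."""
    def h(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
    return (h(d), h(k), h(v))

class HourlyReporter:
    def __init__(self):
        self.last_sent = {}  # hashed tuple -> hour index when last sent
        self.outbox = []     # messages that would be sent to the server

    def observe(self, d, k, v, now_seconds):
        """Queue the hashed tuple unless it was already sent this hour."""
        item = hashed_tuple(d, k, v)
        hour = int(now_seconds // 3600)
        if self.last_sent.get(item) == hour:
            return False
        self.last_sent[item] = hour
        self.outbox.append(item)  # note: no user identifier is attached
        return True

reporter = HourlyReporter()
sent_first = reporter.observe("3rdparty.com", "c7", "e9d4a7e4d2185cec", 0)
sent_again = reporter.observe("3rdparty.com", "c7", "e9d4a7e4d2185cec", 1800)
sent_next_hour = reporter.observe("3rdparty.com", "c7", "e9d4a7e4d2185cec", 3600)
```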
Evaluation: Protection Coverage
System                                                Requests blocked  False positives ratio (a)  Protection misses (b)
CLIQZ                                                 51.7%             --                         --
Disconnect                                            66.1%             38.8%                      12.3%
Kontaxis & Chew (Firefox Tracking Protection) [est.]  36.6%             29.4%                      25.4%
(a) requests blocked without unsafe data-elements
(b) requests allowed with unsafe data-elements
Evaluation: Site Breakage
Configuration                                      Reload rate  % increase over baseline  % increase over CLIQZ
BASELINE (without tracking protection)             0.00101      --                        --
CLIQZ                                              0.00104      4%                        --
Adblock Plus (counting exceptions added by users)  0.00110      10%                       150%
CLIQZ as Blocklist                                 0.00125      25%                       525%
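The "% increase over CLIQZ" figures appear to be the relative growth of the "% increase over baseline" figures, taking CLIQZ's 4% as the reference. This is a reading of the table, not something the slides state; the short check below reproduces the two numbers under that assumption.

```python
def relative_increase(new, old):
    """Relative growth of `new` over `old`, in percent."""
    return (new - old) / old * 100

# "% increase over baseline" values from the table above.
cliqz, adblock_plus, cliqz_as_blocklist = 4, 10, 25

adblock_vs_cliqz = relative_increase(adblock_plus, cliqz)          # 150.0
blocklist_vs_cliqz = relative_increase(cliqz_as_blocklist, cliqz)  # 525.0
```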
Conclusions
Tracking is a BIG problem
–  Privacy is seriously at risk
Tracking Protection is not an easy task
–  Trade-off between site breakage and protection
coverage
Blocklist-based approaches have limitations
–  Maintainability
–  Coarse-grained resolution
–  Too many false positives
CLIQZ tracking protection addresses them to a large extent
Future Work
CLIQZ tracking protection might be better than the state-of-the-art. But it is
far from perfect:
•  it still produces site breakage
•  protection coverage is not 100%
•  it can be attacked in multiple ways
[Picture from http://mtthwhgn.com/tag/flooding/]
we provide a bigger hammer for the whack-a-tracker
Thanks a lot!
Q&A
Zhonghao Yu Sam Macbeth Konark Modi
Appendix
Implementation Details
Realtime Component
1) Parsing request
2) Local safeness:
membership test on LWL
3) Global safeness:
membership test on GWL
LWL and GWL are Bloom filters, less than 512KB combined, with an FP ratio
of 0.1%.
Takes about 1-12 ms.
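For readers unfamiliar with Bloom filters, here is a minimal pure-Python sketch of the membership test. It uses the standard sizing formulas and double hashing over a SHA-256 digest; the class, the hash choice, the capacity, and the `d|k|v` string encoding are all assumptions for illustration, not details from the slides.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, capacity, fp_rate):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(1, math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of a digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

gwl = BloomFilter(capacity=1000, fp_rate=0.001)  # 0.1% target FP ratio
gwl.add("3rdparty.com|vr|1440x900")
is_member = "3rdparty.com|vr|1440x900" in gwl
```

A Bloom filter never reports a stored item as missing; it can only err by reporting an absent item as present, at roughly the configured FP ratio. Here such an error would mark an unseen value as safe.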
Offline Component
Data from users needs to be sent to CLIQZ to build the GWL for the safeness
quorum.
The GWL needs to be sent back to the users’ browsers.
We use an eventual consistency model with incremental updates over daily
snapshots.
Bandwidth costs per user per day: 90KB upload, 566KB download, for a worst-case
propagation lag of 10 minutes.
The rate of false-positive unsafe data elements due to transient state is 0.07%.
Determining Safeness
Cookies from potential trackers are always blocked.
POST requests are also analyzed, and blocked only if they:
•  match Cookie values
•  match QS values declared unsafe
•  match values from browser fingerprinting
User-initiated actions are always ALLOWED (even if tracking)
Protection Coverage
Unsafe Data Origins

 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Tracking The Trackers WWW 2016

  • 1. Tracking the Trackers Zhonghao Yu zhonghao@cliqz.com Sam Macbeth sam@cliqz.com Konark Modi konarkm@cliqz.com Josep M. Pujol josep@cliqz.com
  • 2.
  • 3. Page load triggers requests to multiple 3rd parties
  • 4. Even on pages on sites that you probably want to keep private, like this dating site.
  • 5. Of course, general news domains also load many 3rd parties
  • 6. as well as electronic commerce sites like Ebay
  • 7. Twitter pages only accessible to the authenticated user also load 3rd parties like GA
  • 8. This browsing session on 5 different sites involved more than 60 different 3rd parties.
  • 9. GET /css?family=Open+Sans+Condensed:300,700 Host: fonts.googleapis.com User-Agent: Mozilla/5.0 ... Firefox/45.0 Referer: http://www.meetic.com/home/index.php IP: 79.227.235.241 fonts.googleapis.com is a potential tracker <meetic.com/home/index.php, UID> <www20016.ca/, UID> <wired.com/, UID> However, in THIS request, there is no data element that can be used as a UID. Since there is no unsafe data element, the request is safe.
  • 10. GET /impression.php/f3ae074XXX/api_key=597038480XXX&lid=115… Host: www.facebook.com User-Agent: Mozilla/5.0 … Firefox/45.0 Referer: http://www.meetic.com/home/index.php Cookie: datr=0IPhVj5YHEJ20XXX; c_user=10973XXXX; … csm=2; IP: 79.227.235.241 facebook.com is a potential tracker too, <meetic.com/home/index.php, 10973XXXX> <www20016.ca/, 10973XXXX> <wired.com/, 10973XXXX> <ebay-kleinanzeigen.de/s-muenchen/cyclocross/k0l6411r200, 10973XXXX> Unlike the fonts.googleapis.com request, the request above is not safe with regard to privacy because it contains two values that we consider unsafe and that could therefore be used as UIDs: c_user=10973XXXX and datr=0IPhVj5YHEJ20XXX. Because it contains at least one unsafe value, the request is considered unsafe.
  • 11. GET /collect? v=1&_v=j41&a=321948996&t=event&ni=0&_s=1&...&vp=1291x524& ..._u=QCCAAAABI~&jid=&cid=6531474... Host: www.google-analytics.com Referer: http://www.meetic.com/home/index.php IP: 79.227.235.241 google-analytics.com is a potential tracker too, <meetic.com/home/index.php, 1291x522:79.227.235.241> <www20016.ca/, 1291x522:79.227.235.241> <wired.com/, 1291x522:79.227.235.241> <ebay-kleinanzeigen.de/s-muenchen/cyclocross/k0l6411r200, 1291x522:79.227.235.241> <analytics.twitter.com/user/solso/home, 1291x522:79.227.235.241> The UID is not as evident as for Facebook. But the combination vp+IP is an unsafe data element, it can be used as a UID. Therefore this request is also unsafe. vp+IP = 1291x522:79.227.235.241
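As a rough illustration of why the vp+IP combination is treated as unsafe, the sketch below groups visited pages by the composite value a 3rd party would observe. The function name and the sample records are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def link_visits_by_uid(requests):
    """Group first-party pages by the (viewport, IP) pair seen by a 3rd party."""
    profiles = defaultdict(list)
    for req in requests:
        # The composite value plays the role of a UID even though no cookie is sent.
        uid = f"{req['vp']}:{req['ip']}"
        profiles[uid].append(req["referer"])
    return dict(profiles)


# Hypothetical observations; the second IP is a documentation address.
observed = [
    {"referer": "meetic.com/home/index.php", "vp": "1291x522", "ip": "79.227.235.241"},
    {"referer": "wired.com/", "vp": "1291x522", "ip": "79.227.235.241"},
    {"referer": "wired.com/", "vp": "1440x900", "ip": "203.0.113.7"},
]
profiles = link_visits_by_uid(observed)
```

Every page that shares the same composite value ends up in one profile, which is exactly the linkage a UID enables.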
  • 13. Not a conveniently chosen example… ...tracking is a pervasive problem.
  • 14. Tracking in the Wild Largest field study with real traffic to date, 200,000 users in Germany for a week(*) 21M page loads, 5M unique pages (URLs) from 350K domains (*) Between 09/09/2015 and 16/09/2015
  • 15. Tracking in the Wild: Prevalence Potential trackers are 3rd parties that are present in many different domains. Unsafe data elements are data elements for which we cannot rule out the possibility that they are UIDs. [Chart, 21M page loads: without potential trackers 5%, with potential trackers 95%; 1 to 9: 24%, >= 10: 76%]
  • 16. Tracking in the Wild: Prevalence Potential trackers are 3rd parties that are present in many different domains. Unsafe data elements are data elements for which we cannot rule out the possibility that they are UIDs. [Chart, 21M page loads: without unsafe values 22%, with unsafe values 78%; 1 to 9: 21%, >= 10: 79%]
  • 17. Tracking in the Wild: Prevalence Potential trackers are 3rd parties that are loaded in many different domains. Unsafe values are data elements for which we cannot rule out the possibility that they are UIDs. [Chart, 21M page loads: without unsafe values 22%, with unsafe values 78%; 1 to 9: 21%, >= 10: 79%] 78% of all page loads can be tracked
  • 18. Tracking in the Wild: Reach
        Organization | % of page loads seen | % of page loads seen with unsafe data elements (tracking) | Rank
        Google | 62.4% | 42.4% | 1st
        Facebook | 21.1% | 18.5% | 2nd
        AppNexus | 10.15% | 9.9% | 3rd
        ADITION | 8.7% | 8.4% | 4th
        Criteo | 8.7% | 8.2% | 5th
        …
        Comscore | 6.1% | 5.9% | --
        DoublePimp | 0.5% | 0.5% | --
        NewRelic | 2% | 0.03% | --
        …
  • 19. Tracking in the Wild: Reach 58 organizations with a reach larger than 1%
  • 21. CLIQZ Tracking Protection Maximize coverage, minimize false positives Aggressiveness is counter-productive… •  increases site breakage, which forces users to add exceptions, thus reducing protection coverage. •  affects legitimate services and data collection
  • 22. Block only the Ability to Track GET /collect? v=1&_v=j41&a=321948996&t=event&ni=0&_s=1&...&vp=1291x524& ..._u=QCCAAAABI~&jid=&cid=6531474... Host: www.google-analytics.com Referer: http://www.meetic.com/home/index.php IP: 79.227.235.241 Intervening only on unsafe data elements (those elements that can be used as UIDs) should protect the user while minimizing side effects: a) site breakage for users b) loss of legitimate data collection for 3rd parties
  • 23. Blocklists are coarse-grained CDF of the number of requests with observed unsafe data elements by 3rd party domains contained both in Disconnect Blocklist and CLIQZ list of potential trackers (~2000 domains each). Intersection is 477 domains.
  • 24. Blocklists are coarse-grained CDF of the number of requests with observed unsafe data elements by 3rd party domains contained both in Disconnect Blocklist and CLIQZ list of potential trackers (~2000 domains each). Intersection is 477 domains. Only 2% of tracker domains in Disconnect always send unsafe data elements.
  • 25. Blocklists are coarse-grained CDF of the number of requests with observed unsafe data elements by 3rd party domains contained both in Disconnect Blocklist and CLIQZ list of potential trackers (~2000 domains each). Intersection is 477 domains. Only 2% of tracker domains in Disconnect always send unsafe data elements. 98% of tracker domains have MIXED behavior. Lack of resolution…
  • 26. Blocklists are coarse-grained Blocklists by domain (reverse suffix) are too coarse-grained. BLOCKLIST by Domain
  • 27. Blocklists are too coarse-grained EasyPrivacy (from Adblock Plus) has hundreds of regular expressions to cover for mixed behavior of trackers. BLOCKLIST by Domain + RegExp Exceptions
  • 28. Blocklists are too coarse-grained BLOCKLIST by Domain + More RegExp Exceptions EasyPrivacy (from Adblock Plus) has hundreds of regular expressions to cover for mixed behavior of trackers.
  • 29. We propose a more fine-grained approach to algorithmically determine the safeness level of individual data elements within a request to a 3rd party
  • 30. Determining Safeness Each 3rd party request to a potential tracker is parsed to obtain a list of tuples T = [<s, d, k, v>] (source page s, 3rd party domain d, key k, value v) whose safeness level is evaluated in real time: T = [ <s=wired.com/, d=3rdparty.com, k=z, v=1501498154>, <s=wired.com/, d=3rdparty.com, k=fl,v=21.0>, <s=wired.com/, d=3rdparty.com, k=u, v=CCAAAABI>, <s=wired.com/, d=3rdparty.com, k=vr,v=1440x900>, <s=wired.com/, d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=wired.com/, d=3rdparty.com, k=vp,v=1322x781>, <s=wired.com/, d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] The aim is to identify which data elements (including combinations) are unsafe and are therefore candidates to be used as UIDs.
  • 31. Determining Safeness Identifying unsafe data elements directly cannot be done effectively. But we can do the opposite: identify data elements that cannot be used effectively as UIDs, and consider them safe.
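The parsing step can be sketched roughly as follows, assuming only query-string parameters are extracted (the talk also considers cookies and combinations, which this sketch ignores). The function name `request_to_tuples` and the sample URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def request_to_tuples(source_page, request_url):
    """Turn a 3rd party request URL into a list of (s, d, k, v) tuples."""
    parts = urlsplit(request_url)
    domain = parts.hostname  # d: the 3rd party host receiving the request
    # keep_blank_values keeps keys such as "jid=" that carry an empty value
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return [(source_page, domain, key, value) for key, value in pairs]


tuples = request_to_tuples(
    "wired.com/",
    "http://3rdparty.com/collect?z=1501498154&fl=21.0&vr=1440x900",
)
```

Each tuple can then be evaluated on its own, which is what makes the approach finer-grained than a per-domain decision.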
  • 32. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x900>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x781>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] All tuples are UNSAFE by default unless we can determine that the given data-element is not a good UID, hence safe.
  • 33. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x900>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x781>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] The value 1501498154 has never been seen before for <d, k>. Thus it cannot be used as a UID => SAFE
  • 34. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x900>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x781>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] The value 21.0 is too short to encode any UID => SAFE
  • 35. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x900>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x781>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] More than 3 different values in less than 2 days by the same tuple <d,k>. Not persistent, bad UID => SAFE
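The three local rules from the last slides (value never seen before, value too short, more than 3 different values within 2 days) could be combined along these lines. This is a minimal sketch, not the CLIQZ implementation: the class name, the in-memory dictionary, and the length threshold of 8 characters are assumptions, since the slides only say "too short".

```python
import time

TWO_DAYS = 2 * 24 * 3600  # observation window from the slide, in seconds
MAX_VALUES = 3            # "more than 3 different values" => not persistent
MIN_UID_LENGTH = 8        # assumption: the slides give no concrete threshold


class LocalHistory:
    """Per-browser record of the values seen for each <d, k> pair."""

    def __init__(self):
        self._seen = {}  # (d, k) -> {value: last_seen_timestamp}

    def is_locally_safe(self, d, k, v, now=None):
        now = time.time() if now is None else now
        values = self._seen.setdefault((d, k), {})
        # Forget observations that fell out of the two-day window.
        for old in [x for x, ts in values.items() if now - ts > TWO_DAYS]:
            del values[old]
        first_time = v not in values
        values[v] = now
        if len(v) < MIN_UID_LENGTH:
            return True   # too short to encode a UID
        if first_time:
            return True   # never seen before for <d, k>
        if len(values) > MAX_VALUES:
            return True   # changes too often to be a persistent UID
        return False      # same value keeps coming back: stays UNSAFE
```

Anything that falls through all three rules keeps the default UNSAFE label and is left for the global check.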
  • 36. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x900>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x781>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] Always the same value for <d,k>. We cannot rule out that the data elements are UIDs => keep as UNSAFE. Using only local information is not enough; vr=1440x900 is not a UID… We need something extra.
  • 37. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x900>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x781>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] Locally UNSAFE, i.e. always the same value for <d, k>. Globally SAFE since more than 20 other users have observed the same value 1440x900 for tuple <d,k> = <3rdparty.com,vr> in the last 2 days.
  • 38. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x900>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x781>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] Locally UNSAFE, i.e. always the same value for tuple <d,k>. Globally SAFE since it has reached the safeness quorum based on k-Anonymity.
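The global check amounts to a threshold on how many users reported the same <d, k, v>. A minimal sketch, assuming the counts are already available as a mapping; the counts shown are invented for illustration, and the quorum of 20 comes from the "more than 20 other users" wording on the slide.

```python
from collections import Counter

SAFENESS_QUORUM = 20  # "more than 20 other users" on the slide


def is_globally_safe(observer_counts, d, k, v, quorum=SAFENESS_QUORUM):
    """True when enough users have reported the same <d, k, v> tuple."""
    # A Counter returns 0 for tuples nobody has reported.
    return observer_counts[(d, k, v)] > quorum


# Hypothetical counts: a common screen size versus a per-user identifier.
counts = Counter({
    ("3rdparty.com", "vr", "1440x900"): 3127,
    ("3rdparty.com", "c7", "e9d4a7e4d2185cec"): 1,
})
```

A value shared by many users cannot single out any one of them, which is the k-anonymity argument behind the quorum.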
  • 39. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x900>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x781>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] Locally UNSAFE, i.e. always the same value for <d, k>. Globally UNSAFE: not enough people have seen the value for <d, k>, always same <d, k, u>. Not safe to send. Two options: a) it is a UID, or an element that could be used as such. b) a false positive due to the Transient State (0.07%)
  • 40. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x900>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x781>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] Locally UNSAFE and Globally UNSAFE At this point the request analysis is complete: 1)  ALLOW Request removing unsafe data-elements 2)  ALLOW Request obfuscating unsafe data-elements 3)  BLOCK Request or ALLOW Request without alteration
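Options 1 and 2 above could look roughly like this for query-string parameters. The slides do not say how values are obfuscated, so the same-length placeholder used here is an assumption, as are the function name and the `mode` parameter.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def sanitize_url(url, unsafe_keys, mode="remove"):
    """Return the URL with unsafe query parameters removed or obfuscated."""
    parts = urlsplit(url)
    cleaned = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key not in unsafe_keys:
            cleaned.append((key, value))
        elif mode == "obfuscate":
            # Assumption: replace the value with a placeholder of equal length.
            cleaned.append((key, "x" * len(value)))
        # mode == "remove": the unsafe parameter is dropped entirely
    return urlunsplit(parts._replace(query=urlencode(cleaned)))


url = "http://3rdparty.com/collect?fl=21.0&c7=e9d4a7e4d2185cec"
removed = sanitize_url(url, {"c7"})
obfuscated = sanitize_url(url, {"c7"}, mode="obfuscate")
```

Either way the request still reaches the 3rd party, so resources such as fonts or scripts keep loading while the candidate UID is withheld.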
  • 41. Safeness Quorum without Tracking To determine that a data element is globally safe we need to count the number of unique users that have observed a tuple <d,k,v>, e.g. <d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>. Users could share tuples with CLIQZ together with a field that identifies them (u): <u=usrXXX, d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>. But that would make CLIQZ a tracker! Instead, each user sends the tuple, if observed, once and only once per hour: <d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec> Actual values are not needed; counting and membership tests on GWL work on hashes: <d=ed5c0cf7b05572eb, k=4d3a21d8c684c09c19b93be911827fd5, v=e60f936dc719ca649a80a97490a09940>
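A sketch of the reporting scheme, under stated assumptions: MD5 is used only because it yields 32 hex characters like the example above, truncating the domain hash to 16 characters mirrors that example, and the class names are hypothetical. How the server converts hourly reports into a unique-user count is not spelled out on the slide and is not modelled here.

```python
import hashlib


def hashed_tuple(d, k, v):
    """Hash each field so the server never sees the actual values."""
    def h(text):
        return hashlib.md5(text.encode("utf-8")).hexdigest()
    # Assumption: the domain hash is truncated to 16 hex characters.
    return (h(d)[:16], h(k), h(v))


class QuorumServer:
    """Counts reports per hashed tuple; stores no user identifier."""

    def __init__(self):
        self.counts = {}

    def report(self, hashed):
        self.counts[hashed] = self.counts.get(hashed, 0) + 1


class Client:
    """Sends each observed tuple at most once per hour."""

    def __init__(self, server):
        self.server = server
        self._sent = set()

    def observe(self, d, k, v, now_seconds):
        hashed = hashed_tuple(d, k, v)
        slot = (hashed, int(now_seconds // 3600))
        if slot in self._sent:
            return False  # already reported during this hour
        self._sent.add(slot)
        self.server.report(hashed)
        return True
```

Because a report carries no user field, two reports of the same tuple are indistinguishable to the server, which is the point of the design.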
  • 42. Evaluation: Protection Coverage
        Approach | Requests Blocked | False positives ratio (requests blocked without unsafe data elements) | Protection Misses (requests allowed with unsafe data elements)
        CLIQZ | 51.7% | -- | --
        Disconnect | 66.1% | 38.8% | 12.3%
        Kontaxis & Chew (Firefox Tracking Protection) [est.] | 36.6% | 29.4% | 25.4%
  • 43. Evaluation: Site Breakage
        Approach | Reload Rate | % Increase over baseline | % Increase over CLIQZ
        BASELINE (without tracking protection) | 0.00101 | -- | --
        CLIQZ | 0.00104 | 4% | --
        Adblock Plus (counting exceptions added by users) | 0.00110 | 10% | 150%
        CLIQZ as Blocklist | 0.00125 | 25% | 525%
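The last column appears to compare each approach's increase over baseline with the 4% increase measured for CLIQZ. That reading is an inference from the numbers rather than something the slide states. The arithmetic can be checked as follows:

```python
def relative_increase(value, reference):
    """Percentage by which `value` exceeds `reference`."""
    return (value - reference) / reference * 100


# Inputs are the "% increase over baseline" figures from the table.
abp_vs_cliqz = relative_increase(10, 4)        # Adblock Plus vs. CLIQZ
blocklist_vs_cliqz = relative_increase(25, 4)  # CLIQZ as Blocklist vs. CLIQZ
```

Under this reading, both values match the 150% and 525% reported in the table.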
  • 44. Conclusions Tracking is a BIG problem –  Privacy is seriously at risk Tracking Protection is not an easy task –  Trade-off between site breakage and protection coverage Blocklist-based approaches have limitations –  Maintainability –  Coarse-grained resolution –  Too many false positives CLIQZ tracking protection addresses them to a large extent
  • 45. Future Work CLIQZ tracking protection might be better than the state of the art. But it is far from perfect: • it still produces site breakage • protection coverage is not 100% • it can be attacked in multiple ways [Picture from http://mtthwhgn.com/tag/flooding/] we provide a bigger hammer for the whack-a-tracker
  • 46. Thanks a lot! Q&A Zhonghao Yu Sam Macbeth Konark Modi
  • 48. Implementation Details Realtime Component 1) Parsing request 2) Local safeness: membership test on LWL 3) Global safeness: membership test on GWL LWL and GWL are Bloom Filters, combined less than 512KB, FP ratio of 0.1%. Takes about 1-12 ms. Offline Component Data from users needs to be sent to CLIQZ to build GWL for the safeness quorum. GWL needs to be sent back to the users’ browsers. We use an eventual consistency model with incremental updates over daily snapshots. Bandwidth costs per user per day: 90KB upload, 566KB download, for a worst-case propagation lag of 10 minutes. False positive unsafe data elements due to transient state: 0.07%
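For readers unfamiliar with Bloom filters, the membership test can be sketched as below. This is a generic textbook construction, not the CLIQZ code: the sizing formulas are the standard ones for a target false positive rate, and the use of SHA-256 with double hashing is an assumption made for the example.

```python
import hashlib
import math


class BloomFilter:
    """Set-membership structure with no false negatives and a tunable FP rate."""

    def __init__(self, capacity, fp_rate=0.001):  # 0.1% as quoted on the slide
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes
        self.size = max(8, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        return [(h1 + i * h2) % self.size for i in range(self.hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


gwl = BloomFilter(capacity=1000)
gwl.add("3rdparty.com|vr|1440x900")
```

A lookup such as `"3rdparty.com|vr|1440x900" in gwl` touches only a handful of bits, which is consistent with the small memory footprint and millisecond latency quoted on the slide.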
  • 49. Determining Safeness T = [ <s=w..., d=3rdparty.com, k=z, v=1501498154>, <s=w..., d=3rdparty.com, k=fl,v=21.0>, <s=w..., d=3rdparty.com, k=u, v=CCAAAABI>, <s=w..., d=3rdparty.com, k=vr,v=1440x1024>, <s=w..., d=3rdparty.com, k=ua,v=3FeFF2301E>, <s=w..., d=3rdparty.com, k=vp,v=1322x981>, <s=w..., d=3rdparty.com, k=c7,v=e9d4a7e4d2185cec>, ] Cookies from potential trackers are always blocked. POST requests are also analyzed, and blocked only if they: • match Cookie values • match QS values declared unsafe • match values from browser fingerprinting User-initiated actions are always ALLOWED (even if tracking)