SlideShare a Scribd company logo
1 of 132
Download to read offline
Tracing Information Flows
Between Ad Exchanges
Using Retargeted Ads
Muhammad Ahmad Bashir, Sajjad Arshad, William Robertson, Christo Wilson
Northeastern University
Your Privacy Footprint
2
Your Privacy Footprint
2
Your Privacy Footprint
2
Your Privacy Footprint
2
Your Privacy Footprint
2
Your Privacy Footprint
2
Real Time Bidding
3
Real Time Bidding
• RTB brings more flexibility in the ad ecosystem.
• Ad request managed by an Ad Exchange which holds an auction.
• Advertisers bid on each ad impression.
3
Real Time Bidding
• RTB brings more flexibility in the ad ecosystem.
• Ad request managed by an Ad Exchange which holds an auction.
• Advertisers bid on each ad impression.
3
Exchange Advertiser
Cookie matching is a
prerequisite.
Real Time Bidding
• RTB brings more flexibility in the ad ecosystem.
• Ad request managed by an Ad Exchange which holds an auction.
• Advertisers bid on each ad impression.
• RTB spending to cross $20B by 2017[1].
• 49% annual growth.
• Will account for 80% of US Display Ad spending by 2022.
3
[1] http://www.prnewswire.com/news-releases/new-idc-study-shows-real-time-bidding-rtb-display-ad-
spend-to-grow-worldwide-to-208-billion-by-2017-228061051.html
Exchange Advertiser
Cookie matching is a
prerequisite.
4
User Publisher Ad Exchange Advertisers
4
GET, CNN’s Cookie
User Publisher Ad Exchange Advertisers
4
GET, CNN’s Cookie
GET, DoubleClick’s Cookie
User Publisher Ad Exchange Advertisers
4
GET, CNN’s Cookie
GET, DoubleClick’s Cookie
User Publisher Ad Exchange Advertisers
Solicit bids, DoubleClick’s Cookie
Bid
Real Time Bidding (RTB)
4
GET, CNN’s Cookie
GET, DoubleClick’s Cookie
User Publisher Ad Exchange Advertisers
Solicit bids, DoubleClick’s Cookie
Bid
Real Time Bidding (RTB)
4
GET, CNN’s Cookie
GET, DoubleClick’s Cookie
User Publisher Ad Exchange Advertisers
Solicit bids, DoubleClick’s Cookie
GET, RightMedia’s Cookie
Advertisement
Bid
Real Time Bidding (RTB)
4
GET, CNN’s Cookie
GET, DoubleClick’s Cookie
User Publisher Ad Exchange Advertisers
Solicit bids, DoubleClick’s Cookie
GET, RightMedia’s Cookie
Advertisement
Bid
Advertisers cannot read
their cookie!
Cookie Matching
Key problem: Advertisers cannot read their cookies in the RTB auction
• How can they submit reasonable bids if they cannot identify the user?
Solution: cookie matching
• Also known as cookie synching
• Process of linking the identifiers used by two ad exchanges
5
Cookie Matching
Key problem: Advertisers cannot read their cookies in the RTB auction
• How can they submit reasonable bids if they cannot identify the user?
Solution: cookie matching
• Also known as cookie synching
• Process of linking the identifiers used by two ad exchanges
5
GET, Cookie=12345
301 Redirect, Location=http://criteo.com/?dblclk_id=12345
Cookie Matching
Key problem: Advertisers cannot read their cookies in the RTB auction
• How can they submit reasonable bids if they cannot identify the user?
Solution: cookie matching
• Also known as cookie synching
• Process of linking the identifiers used by two ad exchanges
5
GET, Cookie=12345
301 Redirect, Location=http://criteo.com/?dblclk_id=12345
Cookie Matching
Key problem: Advertisers cannot read their cookies in the RTB auction
• How can they submit reasonable bids if they cannot identify the user?
Solution: cookie matching
• Also known as cookie synching
• Process of linking the identifiers used by two ad exchanges
5
GET, Cookie=12345
GET ?dblclk_id=12345, Cookie=ABCDE
301 Redirect, Location=http://criteo.com/?dblclk_id=12345
Cookie Matching
Key problem: Advertisers cannot read their cookies in the RTB auction
• How can they submit reasonable bids if they cannot identify the user?
Solution: cookie matching
• Also known as cookie synching
• Process of linking the identifiers used by two ad exchanges
5
GET, Cookie=12345
GET ?dblclk_id=12345, Cookie=ABCDE
301 Redirect, Location=http://criteo.com/?dblclk_id=12345
Cookie Matching
Key problem: Advertisers cannot read their cookies in the RTB auction
• How can they submit reasonable bids if they cannot identify the user?
Solution: cookie matching
• Also known as cookie synching
• Process of linking the identifiers used by two ad exchanges
5
GET, Cookie=12345
GET ?dblclk_id=12345, Cookie=ABCDE
301 Redirect, Location=http://criteo.com/?dblclk_id=12345
Prior Work
• Several studies have examined cookie matching
• Acar et al. found hundreds of domains passing identifiers to each other
• Olejnik et al. found 125 exchanges matching cookies
• Falahrastegar et al. analyzed clusters of exchanges that share the exact same
cookies
6
Prior Work
• Several studies have examined cookie matching
• Acar et al. found hundreds of domains passing identifiers to each other
• Olejnik et al. found 125 exchanges matching cookies
• Falahrastegar et al. analyzed clusters of exchanges that share the exact same
cookies
• These studies rely on studying HTTP requests/responses.
6
Challenge 1: Server Side Matching
7
Challenge 1: Server Side Matching
7
1)
Criteo observes the user.
(IP: 207.91.160.7)
Challenge 1: Server Side Matching
7
1)
2)
Criteo observes the user.
(IP: 207.91.160.7)
RightMedia observes the user.
(IP: 207.91.160.7)
Challenge 1: Server Side Matching
7
1)
2)
Criteo observes the user.
(IP: 207.91.160.7)
RightMedia observes the user.
(IP: 207.91.160.7)
Behind the scene, RightMedia and Criteo sync up.
(IP: 207.91.160.7)
Challenge 2: Obfuscation
8
Challenge 2: Obfuscation
8
Challenge 2: Obfuscation
8
amazon.com
Challenge 2: Obfuscation
8
amazon.com
dbclk.js
Challenge 2: Obfuscation
8
GET %^$ck#&93#&, Cookie=XYZYX
amazon.com
dbclk.js
Challenge 2: Obfuscation
8
GET %^$ck#&93#&, Cookie=XYZYX
amazon.com
dbclk.js
Challenge 2: Obfuscation
8
GET %^$ck#&93#&, Cookie=XYZYX
amazon.com
dbclk.js
Challenge 2: Obfuscation
8
GET %^$ck#&93#&, Cookie=XYZYX
amazon.com
dbclk.js
Challenge 2: Obfuscation
8
GET %^$ck#&93#&, Cookie=XYZYX
amazon.com
dbclk.js
Goal
Develop a method to identify information flows (cookie matching)
between ad exchanges
• Mechanism agnostic: resilient to obfuscation
• Platform agnostic: detect sharing on the client- and server-side
9
Goal
Develop a method to identify information flows (cookie matching)
between ad exchanges
• Mechanism agnostic: resilient to obfuscation
• Platform agnostic: detect sharing on the client- and server-side
9
?
Key Insight: Use Retargeted Ads
Retargeted ads are the most highly targeted form of online ads
10
$15.99
Key Insight: Use Retargeted Ads
Retargeted ads are the most highly targeted form of online ads
10
$15.99
Key Insight: Use Retargeted Ads
Retargeted ads are the most highly targeted form of online ads
10
Key insight: because retargets are so specific, they can be used to
conduct controlled experiments
• Information must be shared between ad exchanges to serve retargeted ads
$15.99
Contributions
1. Novel methodology for identifying information flows between ad
exchanges
2. Demonstrate the impact of ad network obfuscation in practice
• 31% of cookie matching partners cannot be identified using heuristics
3. Develop a method to categorize information sharing relationships
4. Use graph analysis to infer the roles of actors in the ad ecosystem
11
Contributions
1. Novel methodology for identifying information flows between ad
exchanges
2. Demonstrate the impact of ad network obfuscation in practice
• 31% of cookie matching partners cannot be identified using heuristics
3. Develop a method to categorize information sharing relationships
4. Use graph analysis to infer the roles of actors in the ad ecosystem
11
Data Collection
Classifying Ad Network Flows
Results
12
Using Retargets as an Experimental Tool
13
Key observation: retargets are only served under very specific circumstances
1)
Using Retargets as an Experimental Tool
13
Key observation: retargets are only served under very specific circumstances
1)
Advertiser observes the user at a shop
Using Retargets as an Experimental Tool
13
Key observation: retargets are only served under very specific circumstances
1)
2)
Advertiser observes the user at a shop
Using Retargets as an Experimental Tool
13
Key observation: retargets are only served under very specific circumstances
1)
2)
Advertiser observes the user at a shop
Advertiser and the exchange must have
matched cookies
Using Retargets as an Experimental Tool
This implies a causal flow of information from Exchange  Advertiser
13
Key observation: retargets are only served under very specific circumstances
1)
2)
Advertiser observes the user at a shop
Advertiser and the exchange must have
matched cookies
Data Collection Overview
14
Data Collection Overview
14
Single Persona
10 websites/persona
10 products/website
Visit Persona
Data Collection Overview
14
150 Publishers
15 pages/publisher
Single Persona
10 websites/persona
10 products/website
Visit Persona Visit Publishers
Data Collection Overview
14
150 Publishers
15 pages/publisher
Single Persona
10 websites/persona
10 products/website
Visit Persona Visit Publishers
Store Images,
Inclusion Chains,
HTTP requests/
responses
571,636
Images
Data Collection Overview
14
150 Publishers
15 pages/publisher
Single Persona
10 websites/persona
10 products/website
Visit Persona Visit Publishers
Store Images,
Inclusion Chains,
HTTP requests/
responses
571,636
Images
Data Collection Overview
14
150 Publishers
15 pages/publisher
Single Persona
10 websites/persona
10 products/website
Visit Persona Visit Publishers
Store Images,
Inclusion Chains,
HTTP requests/
responses
90Personas
571,636
Images
{
Data Collection Overview
14
150 Publishers
15 pages/publisher
Single Persona
10 websites/persona
10 products/website
Visit Persona Visit Publishers
Store Images,
Inclusion Chains,
HTTP requests/
responses
Potential Targeted
Ads
31,850
Ad Detection
Filter Images
which appeared
in > 1 persona
90Personas
571,636
Images
{
Data Collection Overview
14
150 Publishers
15 pages/publisher
Single Persona
10 websites/persona
10 products/website
Visit Persona Visit Publishers
Store Images,
Inclusion Chains,
HTTP requests/
responses
Potential Targeted
Ads
31,850
Ad DetectionIsolated
Retargeted Ads
Filter Images
which appeared
in > 1 persona
90Personas
571,636
Images
Crowd Sourcing
{
Crowd Sourcing
15
We used Amazon Mechanical Turk (AMT) to label 31,850 ads.
Crowd Sourcing
15
We used Amazon Mechanical Turk (AMT) to label 31,850 ads.
• Total 1,142 Tasks.
• 30 ads / Task.
• 27 unlabeled.
• 3 labeled by us.
• 2 workers per ad.
• $415 spent.
Crowd Sourcing
15
We used Amazon Mechanical Turk (AMT) to label 31,850 ads.
• Total 1,142 Tasks.
• 30 ads / Task.
• 27 unlabeled.
• 3 labeled by us.
• 2 workers per ad.
• $415 spent.
Crowd Sourcing
15
We used Amazon Mechanical Turk (AMT) to label 31,850 ads.
• Total 1,142 Tasks.
• 30 ads / Task.
• 27 unlabeled.
• 3 labeled by us.
• 2 workers per ad.
• $415 spent.
Crowd Sourcing
15
We used Amazon Mechanical Turk (AMT) to label 31,850 ads.
• Total 1,142 Tasks.
• 30 ads / Task.
• 27 unlabeled.
• 3 labeled by us.
• 2 workers per ad.
• $415 spent.
Final Dataset
5,102 unique retargeted ads
• From 281 distinct online retailers
35,448 publisher-side chains that served the retargets
• We observed some retargets multiple times
16
Data Collection
Classifying Ad Network Flows
Results
17
A look at Publisher Chains
18
A look at Publisher Chains
18
Example
Publisher-side chain
A look at Publisher Chains
18
Example
Shopper-side chain Publisher-side chain
A look at Publisher Chains
18
Example
Shopper-side chain Publisher-side chain
• How does Criteo know to serve ad on BBC?
A look at Publisher Chains
18
Example
Shopper-side chain Publisher-side chain
• How does Criteo know to serve ad on BBC?
• In this case it is pretty trivial.
• Criteo observed us on the shopper.
A look at Publisher Chains
18
Example
Shopper-side chain Publisher-side chain
• How does Criteo know to serve ad on BBC?
• In this case it is pretty trivial.
• Criteo observed us on the shopper.
• Can we classify all such publisher-side chains?
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
What is a Chain?
19
a
a
e
e
What is a Chain?
19
^pub .* e a$
a
a
e
e
Four Classifications
Four possible ways for a retargeted ad to be served
1. Direct (Trivial) Matching
2. Cookie Matching
3. Indirect Matching
4. Latent (Server-side) Matching
20
Four Classifications
Four possible ways for a retargeted ad to be served
1. Direct (Trivial) Matching
2. Cookie Matching
3. Indirect Matching
4. Latent (Server-side) Matching
20
1) Direct (Trivial) Matching
21
Shopper-side Publisher-side
ExampleRule
1) Direct (Trivial) Matching
21
Shopper-side Publisher-side
ExampleRule
^shop .* a .*$ ^pub a$
1) Direct (Trivial) Matching
21
Shopper-side Publisher-side
ExampleRule
^shop .* a .*$ ^pub a$
a is the
advertiser that
serves the
retarget
1) Direct (Trivial) Matching
21
Shopper-side Publisher-side
ExampleRule
^shop .* a .*$ ^pub a$
a is the
advertiser that
serves the
retarget
a must appear
on the shopper-
side…
… but other
trackers may
also appear
2) Cookie Matching
22
Shopper-side Publisher-side
ExampleRule
2) Cookie Matching
22
Shopper-side Publisher-side
ExampleRule
^shop .* a .*$ ^pub .* e a$
2) Cookie Matching
22
Shopper-side Publisher-side
ExampleRule
^shop .* a .*$ ^pub .* e a$
e precedes a,
which implies an
RTB auction
2) Cookie Matching
22
Shopper-side Publisher-side
ExampleRule
^shop .* a .*$
a must appear
on the
shopper-side
^pub .* e a$
e precedes a,
which implies an
RTB auction
2) Cookie Matching
22
Shopper-side Publisher-side
ExampleRule
^shop .* a .*$
a must appear
on the
shopper-side
^pub .* e a$^* .* e a .*$
Anywhere
e precedes a,
which implies an
RTB auction
Transition ea is where
cookie match occurs
3) Latent (Server-side) Matching
23
Shopper-side Publisher-side
ExampleRule
3) Latent (Server-side) Matching
23
Shopper-side Publisher-side
ExampleRule
^shop [^ea]$ ^pub .* e a$
3) Latent (Server-side) Matching
23
Shopper-side Publisher-side
ExampleRule
^shop [^ea]$
Neither e nor a
appears on the
shopper-side
^pub .* e a$
3) Latent (Server-side) Matching
23
Shopper-side Publisher-side
ExampleRule
^shop [^ea]$
Neither e nor a
appears on the
shopper-side
^pub .* e a$
a must receive information from
some shopper-side tracker
3) Latent (Server-side) Matching
23
Shopper-side Publisher-side
ExampleRule
^shop [^ea]$
Neither e nor a
appears on the
shopper-side
^pub .* e a$
a must receive information from
some shopper-side tracker
We find latent matches in practice!
Data Collection
Classifying Ad Network Flows
Results
24
Categorizing Chains
Type Chains % Chains %
Direct (Trivial) Match 1770 5 8449 24
Cookie Match 25049 71 25873 73
Latent (Server-side) Match 5362 15 343 1
No Match 775 2 183 1
25
ClusteredRaw Chains
Categorizing Chains
Type Chains % Chains %
Direct (Trivial) Match 1770 5 8449 24
Cookie Match 25049 71 25873 73
Latent (Server-side) Match 5362 15 343 1
No Match 775 2 183 1
25
Clustered
Take away:
Raw Chains
Categorizing Chains
Type Chains % Chains %
Direct (Trivial) Match 1770 5 8449 24
Cookie Match 25049 71 25873 73
Latent (Server-side) Match 5362 15 343 1
No Match 775 2 183 1
25
Clustered
Take away:
1- As expected, most retargets are due to cookie matching
Raw Chains
Categorizing Chains
Type Chains % Chains %
Direct (Trivial) Match 1770 5 8449 24
Cookie Match 25049 71 25873 73
Latent (Server-side) Match 5362 15 343 1
No Match 775 2 183 1
25
Clustered
Take away:
1- As expected, most retargets are due to cookie matching
2- Very small number of chains that cannot be categorized
• Suggests low false positive rate of AMT image labeling task
Raw Chains
Categorizing Chains
Type Chains % Chains %
Direct (Trivial) Match 1770 5 8449 24
Cookie Match 25049 71 25873 73
Latent (Server-side) Match 5362 15 343 1
No Match 775 2 183 1
25
Clustered
Take away:
1- As expected, most retargets are due to cookie matching
2- Very small number of chains that cannot be categorized
• Suggests low false positive rate of AMT image labeling task
3- Surprisingly large amount latent matches…
Raw Chains
Categorizing Chains
Type Chains % Chains %
Direct (Trivial) Match 1770 5 8449 24
Cookie Match 25049 71 25873 73
Latent (Server-side) Match 5362 15 343 1
No Match 775 2 183 1
26
Raw Chains
Clustered
Chains
Cluster together domains by “owner”
• E.g. google.com, doubleclick.com, googlesyndication.com
Categorizing Chains
Type Chains % Chains %
Direct (Trivial) Match 1770 5 8449 24
Cookie Match 25049 71 25873 73
Latent (Server-side) Match 5362 15 343 1
No Match 775 2 183 1
26
Raw Chains
Clustered
Chains
Cluster together domains by “owner”
• E.g. google.com, doubleclick.com, googlesyndication.com
Latent matches essentially disappear
• The vast majority of these chains involve Google
• Suggests that Google shares tracking data across their services
Who is Cookie Matching?
Participant 1 Participant 2 Chains Ads Heuristics
criteo  googlesyndication 9090 1887  P
criteo  doubleclick 3610 1144  E, P  DC, P
criteo  adnxs 3263 1066  E, P
criteo  rubiconproject 1586 749  E, P
criteo  servedbyopenx 707 460  P
doubleclick  steelhousemedia 362 27  P  E, P
mathtag  mediaforge 360 124  E, P
netmng  scene7 267 119  E  ?
googlesyndication  adsrvr 107 29  P
rubiconproject  steelhousemedia 86 30  E
googlesyndication  steelhousemedia 47 22 ?
adtechus  adacado 36 18 ?
atwola  adacado 32 6 ?
adroll  adnxs 31 8 ? 27
Heuristics Key
(used by prior work)
E – share exact
cookies
P – special URL
parameters
DC – DoubleClick
URL parameters
? – Unknown
sharing method
Who is Cookie Matching?
Participant 1 Participant 2 Chains Ads Heuristics
criteo  googlesyndication 9090 1887  P
criteo  doubleclick 3610 1144  E, P  DC, P
criteo  adnxs 3263 1066  E, P
criteo  rubiconproject 1586 749  E, P
criteo  servedbyopenx 707 460  P
doubleclick  steelhousemedia 362 27  P  E, P
mathtag  mediaforge 360 124  E, P
netmng  scene7 267 119  E  ?
googlesyndication  adsrvr 107 29  P
rubiconproject  steelhousemedia 86 30  E
googlesyndication  steelhousemedia 47 22 ?
adtechus  adacado 36 18 ?
atwola  adacado 32 6 ?
adroll  adnxs 31 8 ? 27
Heuristics Key
(used by prior work)
E – share exact
cookies
P – special URL
parameters
DC – DoubleClick
URL parameters
? – Unknown
sharing method
Who is Cookie Matching?
Participant 1 Participant 2 Chains Ads Heuristics
criteo  googlesyndication 9090 1887  P
criteo  doubleclick 3610 1144  E, P  DC, P
criteo  adnxs 3263 1066  E, P
criteo  rubiconproject 1586 749  E, P
criteo  servedbyopenx 707 460  P
doubleclick  steelhousemedia 362 27  P  E, P
mathtag  mediaforge 360 124  E, P
netmng  scene7 267 119  E  ?
googlesyndication  adsrvr 107 29  P
rubiconproject  steelhousemedia 86 30  E
googlesyndication  steelhousemedia 47 22 ?
adtechus  adacado 36 18 ?
atwola  adacado 32 6 ?
adroll  adnxs 31 8 ? 27
Heuristics Key
(used by prior work)
E – share exact
cookies
P – special URL
parameters
DC – DoubleClick
URL parameters
? – Unknown
sharing method
Who is Cookie Matching?
Participant 1 Participant 2 Chains Ads Heuristics
criteo  googlesyndication 9090 1887  P
criteo  doubleclick 3610 1144  E, P  DC, P
criteo  adnxs 3263 1066  E, P
criteo  rubiconproject 1586 749  E, P
criteo  servedbyopenx 707 460  P
doubleclick  steelhousemedia 362 27  P  E, P
mathtag  mediaforge 360 124  E, P
netmng  scene7 267 119  E  ?
googlesyndication  adsrvr 107 29  P
rubiconproject  steelhousemedia 86 30  E
googlesyndication  steelhousemedia 47 22 ?
adtechus  adacado 36 18 ?
atwola  adacado 32 6 ?
adroll  adnxs 31 8 ? 27
Heuristics Key
(used by prior work)
E – share exact
cookies
P – special URL
parameters
DC – DoubleClick
URL parameters
? – Unknown
sharing method
Who is Cookie Matching?
Participant 1 Participant 2 Chains Ads Heuristics
criteo  googlesyndication 9090 1887  P
criteo  doubleclick 3610 1144  E, P  DC, P
criteo  adnxs 3263 1066  E, P
criteo  rubiconproject 1586 749  E, P
criteo  servedbyopenx 707 460  P
doubleclick  steelhousemedia 362 27  P  E, P
mathtag  mediaforge 360 124  E, P
netmng  scene7 267 119  E  ?
googlesyndication  adsrvr 107 29  P
rubiconproject  steelhousemedia 86 30  E
googlesyndication  steelhousemedia 47 22 ?
adtechus  adacado 36 18 ?
atwola  adacado 32 6 ?
adroll  adnxs 31 8 ? 27
Heuristics Key
(used by prior work)
E – share exact
cookies
P – special URL
parameters
DC – DoubleClick
URL parameters
? – Unknown
sharing method
Who is Cookie Matching?
Participant 1 Participant 2 Chains Ads Heuristics
criteo  googlesyndication 9090 1887  P
criteo  doubleclick 3610 1144  E, P  DC, P
criteo  adnxs 3263 1066  E, P
criteo  rubiconproject 1586 749  E, P
criteo  servedbyopenx 707 460  P
doubleclick  steelhousemedia 362 27  P  E, P
mathtag  mediaforge 360 124  E, P
netmng  scene7 267 119  E  ?
googlesyndication  adsrvr 107 29  P
rubiconproject  steelhousemedia 86 30  E
googlesyndication  steelhousemedia 47 22 ?
adtechus  adacado 36 18 ?
atwola  adacado 32 6 ?
adroll  adnxs 31 8 ? 27
Heuristics Key
(used by prior work)
E – share exact
cookies
P – special URL
parameters
DC – DoubleClick
URL parameters
? – Unknown
sharing method
Who is Cookie Matching?
Participant 1 Participant 2 Chains Ads Heuristics
criteo  googlesyndication 9090 1887  P
criteo  doubleclick 3610 1144  E, P  DC, P
criteo  adnxs 3263 1066  E, P
criteo  rubiconproject 1586 749  E, P
criteo  servedbyopenx 707 460  P
doubleclick  steelhousemedia 362 27  P  E, P
mathtag  mediaforge 360 124  E, P
netmng  scene7 267 119  E  ?
googlesyndication  adsrvr 107 29  P
rubiconproject  steelhousemedia 86 30  E
googlesyndication  steelhousemedia 47 22 ?
adtechus  adacado 36 18 ?
atwola  adacado 32 6 ?
adroll  adnxs 31 8 ? 27
Heuristics Key
(used by prior work)
E – share exact
cookies
P – special URL
parameters
DC – DoubleClick
URL parameters
? – Unknown
sharing method
31% of cookie
matching partners
would be missed.
Summary
We develop a novel methodology to detect information flows between
ad exchanges
• Controlled methodology enables causal inference
• Defeats obfuscation attempts
• Detects client- and server-side flows
Dataset gives a better picture of ad ecosystem
• Reveals which ad exchanges are linking information about users
• Allows us to reason about how information is being transferred
28
Questions?
Muhammad Ahmad Bashir
ahmad@ccs.neu.edu
29
Inclusion Chains
• Instrumented Chromium binary that records the provenance of page
elements
• Uses Information Flow Analysis techniques (IFA)
• Handles Flash, exec(), setTimeout(), cross-frame, inline scripts, etc.
30
Inclusion Chains
• Instrumented Chromium binary that records the provenance of page
elements
• Uses Information Flow Analysis techniques (IFA)
• Handles Flash, exec(), setTimeout(), cross-frame, inline scripts, etc.
30
<html>
<body>
<script src=“b.com/adlib.js”></script>
<iframe src=“c.net/adbox.html”>
<html>
<script src=“code.js”></script>
<object data=“d.org/flash.swf”>
</object>
</html>
</iframe>
</body>
</html>
DOM: a.com/index.html
Inclusion Chains
• Instrumented Chromium binary that records the provenance of page
elements
• Uses Information Flow Analysis techniques (IFA)
• Handles Flash, exec(), setTimeout(), cross-frame, inline scripts, etc.
30
<html>
<body>
<script src=“b.com/adlib.js”></script>
<iframe src=“c.net/adbox.html”>
<html>
<script src=“code.js”></script>
<object data=“d.org/flash.swf”>
</object>
</html>
</iframe>
</body>
</html>
DOM: a.com/index.html Inclusion Chain
a.com/index.html
b.com/adlib.js
c.net/adbox.html
c.net/code.js
d.org/flash.swf
3) Indirect Matching
31
Shopper-side Publisher-side
ExampleRule
[^a]
3) Indirect Matching
31
Shopper-side Publisher-side
ExampleRule
^shop e [^a]$ ^pub .* e a$
[^a]
3) Indirect Matching
31
Shopper-side Publisher-side
ExampleRule
^shop e [^a]$
Only the exchange
e appears on the
shopper-side…
^pub .* e a$
[^a]
3) Indirect Matching
31
Shopper-side Publisher-side
ExampleRule
^shop e [^a]$
Only the exchange
e appears on the
shopper-side…
^pub .* e a$
e must pass browsing history data to
participants in the auction, thus no
cookie matching is necessary
[^a]
3) Indirect Matching
31
Shopper-side Publisher-side
ExampleRule
^shop e [^a]$
Only the exchange
e appears on the
shopper-side…
^pub .* e a$
e must pass browsing history data to
participants in the auction, thus no
cookie matching is necessary
We do not expect to find indirect matches in the data.
References
Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind
Narayanan, Claudia Diaz. “The web never forgets: Persistent tracking
mechanisms in the wild.” CCS, 2014.
Muhammad Ahmad Bashir, Sajjad Arshad, William Robertson, Christo
Wilson. “Tracing Information Flows Between Ad Exchanges Using
Retargeted Ads.” Usenix Security, 2016.
Marjan Falahrastegar, Hamed Haddadi, Steve Uhlig, Richard Mortier.
“Tracking personal identifiers across the web.” PAM, 2016.
Lukasz Olejnik, Tran Minh-Dung, Claude Castelluccia. “Selling off privacy
at auction.” NDSS, 2014.
32
Filtering Images
Filter Total Unique Images
All images from the crawlers 571,636
Use EasyList to identify advertisements 93,726
Remove ads that are shown to >1 persona 31,850
Use crowdsourcing to locate retargets 5,102
33
Filtering Images
Filter Total Unique Images
All images from the crawlers 571,636
Use EasyList to identify advertisements 93,726
Remove ads that are shown to >1 persona 31,850
Use crowdsourcing to locate retargets 5,102
33
Filtering Images
Filter Total Unique Images
All images from the crawlers 571,636
Use EasyList to identify advertisements 93,726
Remove ads that are shown to >1 persona 31,850
Use crowdsourcing to locate retargets 5,102
33
• Personas visited non-overlapping retailers
• By definition, retargets should only be shown to a single persona
Filtering Images
Filter Total Unique Images
All images from the crawlers 571,636
Use EasyList to identify advertisements 93,726
Remove ads that are shown to >1 persona 31,850
Use crowdsourcing to locate retargets 5,102
33
• Personas visited non-overlapping retailers
• By definition, retargets should only be shown to a single persona
• Spent $415 uploading 1,142 HITs to Amazon Mechanical Turk
• Each HIT asked the worker to label 30 ad images
• 27 were unlabeled, 3 were known retargets (control images)
• All ads were labeled by 2 workers
• Any ad identified as targeted was also manually inspected by us

More Related Content

Similar to Tracing Information Flows Between Ad Exchanges Using Retargeted Ads

Header Bidding Overview (Thomvest Research)
Header Bidding Overview (Thomvest Research)Header Bidding Overview (Thomvest Research)
Header Bidding Overview (Thomvest Research)Thomvest Ventures
 
Fundamentals of Pay-Per-Click Marketing
Fundamentals of Pay-Per-Click MarketingFundamentals of Pay-Per-Click Marketing
Fundamentals of Pay-Per-Click MarketingIntelligent_ly
 
Lecture_three_Digital_Customer_and_Internet_advertising-min.pdf
Lecture_three_Digital_Customer_and_Internet_advertising-min.pdfLecture_three_Digital_Customer_and_Internet_advertising-min.pdf
Lecture_three_Digital_Customer_and_Internet_advertising-min.pdfAmanyaLaban
 
Field Guide for Validating Premium Ad Inventory
Field Guide for Validating Premium Ad InventoryField Guide for Validating Premium Ad Inventory
Field Guide for Validating Premium Ad InventoryDistil Networks
 
seo tutorial mahesh gangurde techniques - google adwords
seo tutorial mahesh gangurde techniques - google adwordsseo tutorial mahesh gangurde techniques - google adwords
seo tutorial mahesh gangurde techniques - google adwordsMahesh Gangurde
 
Wicked Tuna Has Nothing on This Hook: Leveraging the Almighty Power of Remark...
Wicked Tuna Has Nothing on This Hook: Leveraging the Almighty Power of Remark...Wicked Tuna Has Nothing on This Hook: Leveraging the Almighty Power of Remark...
Wicked Tuna Has Nothing on This Hook: Leveraging the Almighty Power of Remark...Lauren Sickel
 
Syndacast AdBoost RTB Display Advertising
Syndacast AdBoost RTB Display AdvertisingSyndacast AdBoost RTB Display Advertising
Syndacast AdBoost RTB Display AdvertisingPriceza
 
Leveraging Competitive Intelligence in Native Advertising
Leveraging Competitive Intelligence in Native AdvertisingLeveraging Competitive Intelligence in Native Advertising
Leveraging Competitive Intelligence in Native AdvertisingAffiliate Summit
 
Creating Quality Remarketing Lists - #SFGettingSmarter
Creating Quality Remarketing Lists - #SFGettingSmarterCreating Quality Remarketing Lists - #SFGettingSmarter
Creating Quality Remarketing Lists - #SFGettingSmarterSearch Factory
 
2. Affiliate konference / Postview a Retargeting v Affiliate marketingu
2. Affiliate konference / Postview a Retargeting v Affiliate marketingu2. Affiliate konference / Postview a Retargeting v Affiliate marketingu
2. Affiliate konference / Postview a Retargeting v Affiliate marketinguColpirio.com s.r.o.
 
Digital marketing tools online
Digital marketing tools onlineDigital marketing tools online
Digital marketing tools onlineInês Gomes Pinto
 
Quantcast - Dublin Tour May 2014
Quantcast - Dublin Tour May 2014Quantcast - Dublin Tour May 2014
Quantcast - Dublin Tour May 2014DDM Alliance
 
MLconf NYC Claudia Perlich
MLconf NYC Claudia PerlichMLconf NYC Claudia Perlich
MLconf NYC Claudia PerlichMLconf
 
Tutorial 10 (computational advertising)
Tutorial 10 (computational advertising)Tutorial 10 (computational advertising)
Tutorial 10 (computational advertising)Kira
 
The search(revised)
The search(revised)The search(revised)
The search(revised)보한 윤
 
Adword search-advertising-exam-answers-july-edition-2017
Adword search-advertising-exam-answers-july-edition-2017Adword search-advertising-exam-answers-july-edition-2017
Adword search-advertising-exam-answers-july-edition-20178908879746
 
How Tracking Companies Circumvented Ad Blockers Using WebSockets
How Tracking Companies Circumvented Ad Blockers Using WebSocketsHow Tracking Companies Circumvented Ad Blockers Using WebSockets
How Tracking Companies Circumvented Ad Blockers Using WebSocketsSajjad "JJ" Arshad
 
Leveraging Competitive Intelligence in Native Advertising
Leveraging Competitive Intelligence in Native AdvertisingLeveraging Competitive Intelligence in Native Advertising
Leveraging Competitive Intelligence in Native AdvertisingWhatRunsWhere
 

Similar to Tracing Information Flows Between Ad Exchanges Using Retargeted Ads (20)

Header Bidding Overview (Thomvest Research)
Header Bidding Overview (Thomvest Research)Header Bidding Overview (Thomvest Research)
Header Bidding Overview (Thomvest Research)
 
Fundamentals of Pay-Per-Click Marketing
Fundamentals of Pay-Per-Click MarketingFundamentals of Pay-Per-Click Marketing
Fundamentals of Pay-Per-Click Marketing
 
Lecture_three_Digital_Customer_and_Internet_advertising-min.pdf
Lecture_three_Digital_Customer_and_Internet_advertising-min.pdfLecture_three_Digital_Customer_and_Internet_advertising-min.pdf
Lecture_three_Digital_Customer_and_Internet_advertising-min.pdf
 
Field Guide for Validating Premium Ad Inventory
Field Guide for Validating Premium Ad InventoryField Guide for Validating Premium Ad Inventory
Field Guide for Validating Premium Ad Inventory
 
seo tutorial mahesh gangurde techniques - google adwords
seo tutorial mahesh gangurde techniques - google adwordsseo tutorial mahesh gangurde techniques - google adwords
seo tutorial mahesh gangurde techniques - google adwords
 
Wicked Tuna Has Nothing on This Hook: Leveraging the Almighty Power of Remark...
Wicked Tuna Has Nothing on This Hook: Leveraging the Almighty Power of Remark...Wicked Tuna Has Nothing on This Hook: Leveraging the Almighty Power of Remark...
Wicked Tuna Has Nothing on This Hook: Leveraging the Almighty Power of Remark...
 
Syndacast AdBoost RTB Display Advertising
Syndacast AdBoost RTB Display AdvertisingSyndacast AdBoost RTB Display Advertising
Syndacast AdBoost RTB Display Advertising
 
Leveraging Competitive Intelligence in Native Advertising
Leveraging Competitive Intelligence in Native AdvertisingLeveraging Competitive Intelligence in Native Advertising
Leveraging Competitive Intelligence in Native Advertising
 
Creating Quality Remarketing Lists - #SFGettingSmarter
Creating Quality Remarketing Lists - #SFGettingSmarterCreating Quality Remarketing Lists - #SFGettingSmarter
Creating Quality Remarketing Lists - #SFGettingSmarter
 
2. Affiliate konference / Postview a Retargeting v Affiliate marketingu
2. Affiliate konference / Postview a Retargeting v Affiliate marketingu2. Affiliate konference / Postview a Retargeting v Affiliate marketingu
2. Affiliate konference / Postview a Retargeting v Affiliate marketingu
 
Digital marketing tools online
Digital marketing tools onlineDigital marketing tools online
Digital marketing tools online
 
Quantcast - Dublin Tour May 2014
Quantcast - Dublin Tour May 2014Quantcast - Dublin Tour May 2014
Quantcast - Dublin Tour May 2014
 
MLconf NYC Claudia Perlich
MLconf NYC Claudia PerlichMLconf NYC Claudia Perlich
MLconf NYC Claudia Perlich
 
Tutorial 10 (computational advertising)
Tutorial 10 (computational advertising)Tutorial 10 (computational advertising)
Tutorial 10 (computational advertising)
 
The search(revised)
The search(revised)The search(revised)
The search(revised)
 
google adwords search advertising exam question and answers 2018 pdf
google adwords search advertising exam question and answers 2018 pdfgoogle adwords search advertising exam question and answers 2018 pdf
google adwords search advertising exam question and answers 2018 pdf
 
Adword search-advertising-exam-answers-july-edition-2017
Adword search-advertising-exam-answers-july-edition-2017Adword search-advertising-exam-answers-july-edition-2017
Adword search-advertising-exam-answers-july-edition-2017
 
Google Display Network - Searchstrategies 2010 - Pascal van Laere
Google Display Network - Searchstrategies 2010 - Pascal van LaereGoogle Display Network - Searchstrategies 2010 - Pascal van Laere
Google Display Network - Searchstrategies 2010 - Pascal van Laere
 
How Tracking Companies Circumvented Ad Blockers Using WebSockets
How Tracking Companies Circumvented Ad Blockers Using WebSocketsHow Tracking Companies Circumvented Ad Blockers Using WebSockets
How Tracking Companies Circumvented Ad Blockers Using WebSockets
 
Leveraging Competitive Intelligence in Native Advertising
Leveraging Competitive Intelligence in Native AdvertisingLeveraging Competitive Intelligence in Native Advertising
Leveraging Competitive Intelligence in Native Advertising
 

More from Sajjad "JJ" Arshad

Cached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the WildCached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the WildSajjad "JJ" Arshad
 
Cached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the WildCached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the WildSajjad "JJ" Arshad
 
Cached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the WildCached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the WildSajjad "JJ" Arshad
 
HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Gu...
HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Gu...HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Gu...
HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Gu...Sajjad "JJ" Arshad
 
Large-Scale Analysis of Style Injection by Relative Path Overwrite
Large-Scale Analysis of Style Injection by Relative Path OverwriteLarge-Scale Analysis of Style Injection by Relative Path Overwrite
Large-Scale Analysis of Style Injection by Relative Path OverwriteSajjad "JJ" Arshad
 
UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware
UNVEIL: A Large-Scale, Automated Approach to Detecting RansomwareUNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware
UNVEIL: A Large-Scale, Automated Approach to Detecting RansomwareSajjad "JJ" Arshad
 
How Tracking Companies Circumvent Ad Blockers Using WebSockets
How Tracking Companies Circumvent Ad Blockers Using WebSocketsHow Tracking Companies Circumvent Ad Blockers Using WebSockets
How Tracking Companies Circumvent Ad Blockers Using WebSocketsSajjad "JJ" Arshad
 
Practical Challenges of Type Checking in Control Flow Integrity
Practical Challenges of Type Checking in Control Flow IntegrityPractical Challenges of Type Checking in Control Flow Integrity
Practical Challenges of Type Checking in Control Flow IntegritySajjad "JJ" Arshad
 
Understanding and Mitigating the Security Risks of Content Inclusion in Web B...
Understanding and Mitigating the Security Risks of Content Inclusion in Web B...Understanding and Mitigating the Security Risks of Content Inclusion in Web B...
Understanding and Mitigating the Security Risks of Content Inclusion in Web B...Sajjad "JJ" Arshad
 
Identifying Extension-based Ad Injection via Fine-grained Web Content Provenance
Identifying Extension-based Ad Injection via Fine-grained Web Content ProvenanceIdentifying Extension-based Ad Injection via Fine-grained Web Content Provenance
Identifying Extension-based Ad Injection via Fine-grained Web Content ProvenanceSajjad "JJ" Arshad
 
Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Librari...
Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Librari...Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Librari...
Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Librari...Sajjad "JJ" Arshad
 
A Longitudinal Analysis of the ads.txt Standard
A Longitudinal Analysis of the ads.txt StandardA Longitudinal Analysis of the ads.txt Standard
A Longitudinal Analysis of the ads.txt StandardSajjad "JJ" Arshad
 
"Recommended For You": A First Look at Content Recommendation Networks
"Recommended For You": A First Look at Content Recommendation Networks"Recommended For You": A First Look at Content Recommendation Networks
"Recommended For You": A First Look at Content Recommendation NetworksSajjad "JJ" Arshad
 
Include Me Out: In-Browser Detection of Malicious Third-Party Content Inclusions
Include Me Out: In-Browser Detection of Malicious Third-Party Content InclusionsInclude Me Out: In-Browser Detection of Malicious Third-Party Content Inclusions
Include Me Out: In-Browser Detection of Malicious Third-Party Content InclusionsSajjad "JJ" Arshad
 
On the Effectiveness of Type-based Control Flow Integrity
On the Effectiveness of Type-based Control Flow IntegrityOn the Effectiveness of Type-based Control Flow Integrity
On the Effectiveness of Type-based Control Flow IntegritySajjad "JJ" Arshad
 

More from Sajjad "JJ" Arshad (15)

Cached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the WildCached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the Wild
 
Cached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the WildCached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the Wild
 
Cached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the WildCached and Confused: Web Cache Deception in the Wild
Cached and Confused: Web Cache Deception in the Wild
 
HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Gu...
HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Gu...HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Gu...
HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Gu...
 
Large-Scale Analysis of Style Injection by Relative Path Overwrite
Large-Scale Analysis of Style Injection by Relative Path OverwriteLarge-Scale Analysis of Style Injection by Relative Path Overwrite
Large-Scale Analysis of Style Injection by Relative Path Overwrite
 
UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware
UNVEIL: A Large-Scale, Automated Approach to Detecting RansomwareUNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware
UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware
 
How Tracking Companies Circumvent Ad Blockers Using WebSockets
How Tracking Companies Circumvent Ad Blockers Using WebSocketsHow Tracking Companies Circumvent Ad Blockers Using WebSockets
How Tracking Companies Circumvent Ad Blockers Using WebSockets
 
Practical Challenges of Type Checking in Control Flow Integrity
Practical Challenges of Type Checking in Control Flow IntegrityPractical Challenges of Type Checking in Control Flow Integrity
Practical Challenges of Type Checking in Control Flow Integrity
 
Understanding and Mitigating the Security Risks of Content Inclusion in Web B...
Understanding and Mitigating the Security Risks of Content Inclusion in Web B...Understanding and Mitigating the Security Risks of Content Inclusion in Web B...
Understanding and Mitigating the Security Risks of Content Inclusion in Web B...
 
Identifying Extension-based Ad Injection via Fine-grained Web Content Provenance
Identifying Extension-based Ad Injection via Fine-grained Web Content ProvenanceIdentifying Extension-based Ad Injection via Fine-grained Web Content Provenance
Identifying Extension-based Ad Injection via Fine-grained Web Content Provenance
 
Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Librari...
Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Librari...Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Librari...
Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Librari...
 
A Longitudinal Analysis of the ads.txt Standard
A Longitudinal Analysis of the ads.txt StandardA Longitudinal Analysis of the ads.txt Standard
A Longitudinal Analysis of the ads.txt Standard
 
"Recommended For You": A First Look at Content Recommendation Networks
"Recommended For You": A First Look at Content Recommendation Networks"Recommended For You": A First Look at Content Recommendation Networks
"Recommended For You": A First Look at Content Recommendation Networks
 
Include Me Out: In-Browser Detection of Malicious Third-Party Content Inclusions
Include Me Out: In-Browser Detection of Malicious Third-Party Content InclusionsInclude Me Out: In-Browser Detection of Malicious Third-Party Content Inclusions
Include Me Out: In-Browser Detection of Malicious Third-Party Content Inclusions
 
On the Effectiveness of Type-based Control Flow Integrity
On the Effectiveness of Type-based Control Flow IntegrityOn the Effectiveness of Type-based Control Flow Integrity
On the Effectiveness of Type-based Control Flow Integrity
 

Recently uploaded

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravitySubhadipsau21168
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Ahmedabad Call Girls Service 9537192988 can satisfy every one of your dreams
Ahmedabad Call Girls Service 9537192988 can satisfy every one of your dreamsAhmedabad Call Girls Service 9537192988 can satisfy every one of your dreams
Ahmedabad Call Girls Service 9537192988 can satisfy every one of your dreamsoolala9823
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.k64182334
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 

Recently uploaded (20)

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified Gravity
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Ahmedabad Call Girls Service 9537192988 can satisfy every one of your dreams
Ahmedabad Call Girls Service 9537192988 can satisfy every one of your dreamsAhmedabad Call Girls Service 9537192988 can satisfy every one of your dreams
Ahmedabad Call Girls Service 9537192988 can satisfy every one of your dreams
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 

Tracing Information Flows Between Ad Exchanges Using Retargeted Ads

  • 1. Tracing Information Flows Between Ad Exchanges Using Retargeted Ads Muhammad Ahmad Bashir, Sajjad Arshad, William Robertson, Christo Wilson Northeastern University
  • 9. Real Time Bidding • RTB brings more flexibility in the ad ecosystem. • Ad request managed by an Ad Exchange which holds an auction. • Advertisers bid on each ad impression. 3
  • 10. Real Time Bidding • RTB brings more flexibility in the ad ecosystem. • Ad request managed by an Ad Exchange which holds an auction. • Advertisers bid on each ad impression. 3 Exchange Advertiser Cookie matching is a prerequisite.
  • 11. Real Time Bidding • RTB brings more flexibility in the ad ecosystem. • Ad request managed by an Ad Exchange which holds an auction. • Advertisers bid on each ad impression. • RTB spending to cross $20B by 2017[1]. • 49% annual growth. • Will account for 80% of US Display Ad spending by 2022. 3 [1] http://www.prnewswire.com/news-releases/new-idc-study-shows-real-time-bidding-rtb-display-ad- spend-to-grow-worldwide-to-208-billion-by-2017-228061051.html Exchange Advertiser Cookie matching is a prerequisite.
  • 12. 4 User Publisher Ad Exchange Advertisers
  • 13. 4 GET, CNN’s Cookie User Publisher Ad Exchange Advertisers
  • 14. 4 GET, CNN’s Cookie GET, DoubleClick’s Cookie User Publisher Ad Exchange Advertisers
  • 15. 4 GET, CNN’s Cookie GET, DoubleClick’s Cookie User Publisher Ad Exchange Advertisers Solicit bids, DoubleClick’s Cookie Bid
  • 16. Real Time Bidding (RTB) 4 GET, CNN’s Cookie GET, DoubleClick’s Cookie User Publisher Ad Exchange Advertisers Solicit bids, DoubleClick’s Cookie Bid
  • 17. Real Time Bidding (RTB) 4 GET, CNN’s Cookie GET, DoubleClick’s Cookie User Publisher Ad Exchange Advertisers Solicit bids, DoubleClick’s Cookie GET, RightMedia’s Cookie Advertisement Bid
  • 18. Real Time Bidding (RTB) 4 GET, CNN’s Cookie GET, DoubleClick’s Cookie User Publisher Ad Exchange Advertisers Solicit bids, DoubleClick’s Cookie GET, RightMedia’s Cookie Advertisement Bid Advertisers cannot read their cookie!
  • 19. Cookie Matching Key problem: Advertisers cannot read their cookies in the RTB auction • How can they submit reasonable bids if they cannot identify the user? Solution: cookie matching • Also known as cookie synching • Process of linking the identifiers used by two ad exchanges 5
  • 20. Cookie Matching Key problem: Advertisers cannot read their cookies in the RTB auction • How can they submit reasonable bids if they cannot identify the user? Solution: cookie matching • Also known as cookie synching • Process of linking the identifiers used by two ad exchanges 5 GET, Cookie=12345 301 Redirect, Location=http://criteo.com/?dblclk_id=12345
  • 21. Cookie Matching Key problem: Advertisers cannot read their cookies in the RTB auction • How can they submit reasonable bids if they cannot identify the user? Solution: cookie matching • Also known as cookie synching • Process of linking the identifiers used by two ad exchanges 5 GET, Cookie=12345 301 Redirect, Location=http://criteo.com/?dblclk_id=12345
  • 22. Cookie Matching Key problem: Advertisers cannot read their cookies in the RTB auction • How can they submit reasonable bids if they cannot identify the user? Solution: cookie matching • Also known as cookie synching • Process of linking the identifiers used by two ad exchanges 5 GET, Cookie=12345 GET ?dblclk_id=12345, Cookie=ABCDE 301 Redirect, Location=http://criteo.com/?dblclk_id=12345
  • 23. Cookie Matching Key problem: Advertisers cannot read their cookies in the RTB auction • How can they submit reasonable bids if they cannot identify the user? Solution: cookie matching • Also known as cookie synching • Process of linking the identifiers used by two ad exchanges 5 GET, Cookie=12345 GET ?dblclk_id=12345, Cookie=ABCDE 301 Redirect, Location=http://criteo.com/?dblclk_id=12345
  • 24. Cookie Matching Key problem: Advertisers cannot read their cookies in the RTB auction • How can they submit reasonable bids if they cannot identify the user? Solution: cookie matching • Also known as cookie synching • Process of linking the identifiers used by two ad exchanges 5 GET, Cookie=12345 GET ?dblclk_id=12345, Cookie=ABCDE 301 Redirect, Location=http://criteo.com/?dblclk_id=12345
  • 25. Prior Work • Several studies have examined cookie matching • Acar et al. found hundreds of domains passing identifiers to each other • Olejnik et al. found 125 exchanges matching cookies • Falahrastegar et al. analyzed clusters of exchanges that share the exact same cookies 6
  • 26. Prior Work • Several studies have examined cookie matching • Acar et al. found hundreds of domains passing identifiers to each other • Olejnik et al. found 125 exchanges matching cookies • Falahrastegar et al. analyzed clusters of exchanges that share the exact same cookies • These studies rely on studying HTTP requests/responses. 6
  • 27. Challenge 1: Server Side Matching 7
  • 28. Challenge 1: Server Side Matching 7 1) Criteo observes the user. (IP: 207.91.160.7)
  • 29. Challenge 1: Server Side Matching 7 1) 2) Criteo observes the user. (IP: 207.91.160.7) RightMedia observes the user. (IP: 207.91.160.7)
  • 30. Challenge 1: Server Side Matching 7 1) 2) Criteo observes the user. (IP: 207.91.160.7) RightMedia observes the user. (IP: 207.91.160.7) Behind the scene, RightMedia and Criteo sync up. (IP: 207.91.160.7)
  • 35. Challenge 2: Obfuscation 8 GET %^$ck#&93#&, Cookie=XYZYX amazon.com dbclk.js
  • 36. Challenge 2: Obfuscation 8 GET %^$ck#&93#&, Cookie=XYZYX amazon.com dbclk.js
  • 37. Challenge 2: Obfuscation 8 GET %^$ck#&93#&, Cookie=XYZYX amazon.com dbclk.js
  • 38. Challenge 2: Obfuscation 8 GET %^$ck#&93#&, Cookie=XYZYX amazon.com dbclk.js
  • 39. Challenge 2: Obfuscation 8 GET %^$ck#&93#&, Cookie=XYZYX amazon.com dbclk.js
  • 40. Goal Develop a method to identify information flows (cookie matching) between ad exchanges • Mechanism agnostic: resilient to obfuscation • Platform agnostic: detect sharing on the client- and server-side 9
  • 41. Goal Develop a method to identify information flows (cookie matching) between ad exchanges • Mechanism agnostic: resilient to obfuscation • Platform agnostic: detect sharing on the client- and server-side 9 ?
  • 42. Key Insight: Use Retargeted Ads Retargeted ads are the most highly targeted form of online ads 10 $15.99
  • 43. Key Insight: Use Retargeted Ads Retargeted ads are the most highly targeted form of online ads 10 $15.99
  • 44. Key Insight: Use Retargeted Ads Retargeted ads are the most highly targeted form of online ads 10 Key insight: because retargets are so specific, they can be used to conduct controlled experiments • Information must be shared between ad exchanges to serve retargeted ads $15.99
  • 45. Contributions 1. Novel methodology for identifying information flows between ad exchanges 2. Demonstrate the impact of ad network obfuscation in practice • 31% of cookie matching partners cannot be identified using heuristics 3. Develop a method to categorize information sharing relationships 4. Use graph analysis to infer the roles of actors in the ad ecosystem 11
  • 46. Contributions 1. Novel methodology for identifying information flows between ad exchanges 2. Demonstrate the impact of ad network obfuscation in practice • 31% of cookie matching partners cannot be identified using heuristics 3. Develop a method to categorize information sharing relationships 4. Use graph analysis to infer the roles of actors in the ad ecosystem 11
  • 47. Data Collection Classifying Ad Network Flows Results 12
  • 48. Using Retargets as an Experimental Tool 13 Key observation: retargets are only served under very specific circumstances 1)
  • 49. Using Retargets as an Experimental Tool 13 Key observation: retargets are only served under very specific circumstances 1) Advertiser observes the user at a shop
  • 50. Using Retargets as an Experimental Tool 13 Key observation: retargets are only served under very specific circumstances 1) 2) Advertiser observes the user at a shop
  • 51. Using Retargets as an Experimental Tool 13 Key observation: retargets are only served under very specific circumstances 1) 2) Advertiser observes the user at a shop Advertiser and the exchange must have matched cookies
  • 52. Using Retargets as an Experimental Tool This implies a causal flow of information from Exchange  Advertiser 13 Key observation: retargets are only served under very specific circumstances 1) 2) Advertiser observes the user at a shop Advertiser and the exchange must have matched cookies
  • 54. Data Collection Overview 14 Single Persona 10 websites/persona 10 products/website Visit Persona
  • 55. Data Collection Overview 14 150 Publishers 15 pages/publisher Single Persona 10 websites/persona 10 products/website Visit Persona Visit Publishers
  • 56. Data Collection Overview 14 150 Publishers 15 pages/publisher Single Persona 10 websites/persona 10 products/website Visit Persona Visit Publishers Store Images, Inclusion Chains, HTTP requests/ responses 571,636 Images
  • 57. Data Collection Overview 14 150 Publishers 15 pages/publisher Single Persona 10 websites/persona 10 products/website Visit Persona Visit Publishers Store Images, Inclusion Chains, HTTP requests/ responses 571,636 Images
  • 58. Data Collection Overview 14 150 Publishers 15 pages/publisher Single Persona 10 websites/persona 10 products/website Visit Persona Visit Publishers Store Images, Inclusion Chains, HTTP requests/ responses 90Personas 571,636 Images {
  • 59. Data Collection Overview 14 150 Publishers 15 pages/publisher Single Persona 10 websites/persona 10 products/website Visit Persona Visit Publishers Store Images, Inclusion Chains, HTTP requests/ responses Potential Targeted Ads 31,850 Ad Detection Filter Images which appeared in > 1 persona 90Personas 571,636 Images {
  • 60. Data Collection Overview 14 150 Publishers 15 pages/publisher Single Persona 10 websites/persona 10 products/website Visit Persona Visit Publishers Store Images, Inclusion Chains, HTTP requests/ responses Potential Targeted Ads 31,850 Ad DetectionIsolated Retargeted Ads Filter Images which appeared in > 1 persona 90Personas 571,636 Images Crowd Sourcing {
  • 61. Crowd Sourcing 15 We used Amazon Mechanical Turk (AMT) to label 31,850 ads.
  • 62. Crowd Sourcing 15 We used Amazon Mechanical Turk (AMT) to label 31,850 ads. • Total 1,142 Tasks. • 30 ads / Task. • 27 unlabeled. • 3 labeled by us. • 2 workers per ad. • $415 spent.
  • 63. Crowd Sourcing 15 We used Amazon Mechanical Turk (AMT) to label 31,850 ads. • Total 1,142 Tasks. • 30 ads / Task. • 27 unlabeled. • 3 labeled by us. • 2 workers per ad. • $415 spent.
  • 64. Crowd Sourcing 15 We used Amazon Mechanical Turk (AMT) to label 31,850 ads. • Total 1,142 Tasks. • 30 ads / Task. • 27 unlabeled. • 3 labeled by us. • 2 workers per ad. • $415 spent.
  • 65. Crowd Sourcing 15 We used Amazon Mechanical Turk (AMT) to label 31,850 ads. • Total 1,142 Tasks. • 30 ads / Task. • 27 unlabeled. • 3 labeled by us. • 2 workers per ad. • $415 spent.
  • 66. Final Dataset 5,102 unique retargeted ads • From 281 distinct online retailers 35,448 publisher-side chains that served the retargets • We observed some retargets multiple times 16
  • 67. Data Collection Classifying Ad Network Flows Results 17
  • 68. A look at Publisher Chains 18
  • 69. A look at Publisher Chains 18 Example Publisher-side chain
  • 70. A look at Publisher Chains 18 Example Shopper-side chain Publisher-side chain
  • 71. A look at Publisher Chains 18 Example Shopper-side chain Publisher-side chain • How does Criteo know to serve ad on BBC?
  • 72. A look at Publisher Chains 18 Example Shopper-side chain Publisher-side chain • How does Criteo know to serve ad on BBC? • In this case it is pretty trivial. • Criteo observed us on the shopper.
  • 73. A look at Publisher Chains 18 Example Shopper-side chain Publisher-side chain • How does Criteo know to serve ad on BBC? • In this case it is pretty trivial. • Criteo observed us on the shopper. • Can we classify all such publisher-side chains?
  • 74. What is a Chain? 19
  • 75. What is a Chain? 19
  • 76. What is a Chain? 19
  • 77. What is a Chain? 19
  • 78. What is a Chain? 19
  • 79. What is a Chain? 19
  • 80. What is a Chain? 19
  • 81. What is a Chain? 19
  • 82. What is a Chain? 19
  • 83. What is a Chain? 19
  • 84. What is a Chain? 19
  • 85. What is a Chain? 19 a a e e
  • 86. What is a Chain? 19 ^pub .* e a$ a a e e
  • 87. Four Classifications Four possible ways for a retargeted ad to be served 1. Direct (Trivial) Matching 2. Cookie Matching 3. Indirect Matching 4. Latent (Server-side) Matching 20
  • 88. Four Classifications Four possible ways for a retargeted ad to be served 1. Direct (Trivial) Matching 2. Cookie Matching 3. Indirect Matching 4. Latent (Server-side) Matching 20
  • 89. 1) Direct (Trivial) Matching 21 Shopper-side Publisher-side ExampleRule
  • 90. 1) Direct (Trivial) Matching 21 Shopper-side Publisher-side ExampleRule ^shop .* a .*$ ^pub a$
  • 91. 1) Direct (Trivial) Matching 21 Shopper-side Publisher-side ExampleRule ^shop .* a .*$ ^pub a$ a is the advertiser that serves the retarget
  • 92. 1) Direct (Trivial) Matching 21 Shopper-side Publisher-side ExampleRule ^shop .* a .*$ ^pub a$ a is the advertiser that serves the retarget a must appear on the shopper- side… … but other trackers may also appear
  • 93. 2) Cookie Matching 22 Shopper-side Publisher-side ExampleRule
  • 94. 2) Cookie Matching 22 Shopper-side Publisher-side ExampleRule ^shop .* a .*$ ^pub .* e a$
  • 95. 2) Cookie Matching 22 Shopper-side Publisher-side ExampleRule ^shop .* a .*$ ^pub .* e a$ e precedes a, which implies an RTB auction
  • 96. 2) Cookie Matching 22 Shopper-side Publisher-side ExampleRule ^shop .* a .*$ a must appear on the shopper-side ^pub .* e a$ e precedes a, which implies an RTB auction
  • 97. 2) Cookie Matching 22 Shopper-side Publisher-side ExampleRule ^shop .* a .*$ a must appear on the shopper-side ^pub .* e a$^* .* e a .*$ Anywhere e precedes a, which implies an RTB auction Transition ea is where cookie match occurs
  • 98. 3) Latent (Server-side) Matching 23 Shopper-side Publisher-side ExampleRule
  • 99. 3) Latent (Server-side) Matching 23 Shopper-side Publisher-side ExampleRule ^shop [^ea]$ ^pub .* e a$
  • 100. 3) Latent (Server-side) Matching 23 Shopper-side Publisher-side ExampleRule ^shop [^ea]$ Neither e nor a appears on the shopper-side ^pub .* e a$
  • 101. 3) Latent (Server-side) Matching 23 Shopper-side Publisher-side ExampleRule ^shop [^ea]$ Neither e nor a appears on the shopper-side ^pub .* e a$ a must receive information from some shopper-side tracker
  • 102. 3) Latent (Server-side) Matching 23 Shopper-side Publisher-side ExampleRule ^shop [^ea]$ Neither e nor a appears on the shopper-side ^pub .* e a$ a must receive information from some shopper-side tracker We find latent matches in practice!
  • 103. Data Collection Classifying Ad Network Flows Results 24
  • 104. Categorizing Chains Type Chains % Chains % Direct (Trivial) Match 1770 5 8449 24 Cookie Match 25049 71 25873 73 Latent (Server-side) Match 5362 15 343 1 No Match 775 2 183 1 25 ClusteredRaw Chains
  • 105. Categorizing Chains Type Chains % Chains % Direct (Trivial) Match 1770 5 8449 24 Cookie Match 25049 71 25873 73 Latent (Server-side) Match 5362 15 343 1 No Match 775 2 183 1 25 Clustered Take away: Raw Chains
  • 106. Categorizing Chains Type Chains % Chains % Direct (Trivial) Match 1770 5 8449 24 Cookie Match 25049 71 25873 73 Latent (Server-side) Match 5362 15 343 1 No Match 775 2 183 1 25 Clustered Take away: 1- As expected, most retargets are due to cookie matching Raw Chains
  • 107. Categorizing Chains Type Chains % Chains % Direct (Trivial) Match 1770 5 8449 24 Cookie Match 25049 71 25873 73 Latent (Server-side) Match 5362 15 343 1 No Match 775 2 183 1 25 Clustered Take away: 1- As expected, most retargets are due to cookie matching 2- Very small number of chains that cannot be categorized • Suggests low false positive rate of AMT image labeling task Raw Chains
  • 108. Categorizing Chains Type Chains % Chains % Direct (Trivial) Match 1770 5 8449 24 Cookie Match 25049 71 25873 73 Latent (Server-side) Match 5362 15 343 1 No Match 775 2 183 1 25 Clustered Take away: 1- As expected, most retargets are due to cookie matching 2- Very small number of chains that cannot be categorized • Suggests low false positive rate of AMT image labeling task 3- Surprisingly large amount latent matches… Raw Chains
  • 109. Categorizing Chains Type Chains % Chains % Direct (Trivial) Match 1770 5 8449 24 Cookie Match 25049 71 25873 73 Latent (Server-side) Match 5362 15 343 1 No Match 775 2 183 1 26 Raw Chains Clustered Chains Cluster together domains by “owner” • E.g. google.com, doubleclick.com, googlesyndication.com
  • 110. Categorizing Chains Type Chains % Chains % Direct (Trivial) Match 1770 5 8449 24 Cookie Match 25049 71 25873 73 Latent (Server-side) Match 5362 15 343 1 No Match 775 2 183 1 26 Raw Chains Clustered Chains Cluster together domains by “owner” • E.g. google.com, doubleclick.com, googlesyndication.com Latent matches essentially disappear • The vast majority of these chains involve Google • Suggests that Google shares tracking data across their services
  • 111. Who is Cookie Matching? Participant 1 Participant 2 Chains Ads Heuristics criteo  googlesyndication 9090 1887  P criteo  doubleclick 3610 1144  E, P  DC, P criteo  adnxs 3263 1066  E, P criteo  rubiconproject 1586 749  E, P criteo  servedbyopenx 707 460  P doubleclick  steelhousemedia 362 27  P  E, P mathtag  mediaforge 360 124  E, P netmng  scene7 267 119  E  ? googlesyndication  adsrvr 107 29  P rubiconproject  steelhousemedia 86 30  E googlesyndication  steelhousemedia 47 22 ? adtechus  adacado 36 18 ? atwola  adacado 32 6 ? adroll  adnxs 31 8 ? 27 Heuristics Key (used by prior work) E – share exact cookies P – special URL parameters DC – DoubleClick URL parameters ? – Unknown sharing method
  • 112. Who is Cookie Matching? Participant 1 Participant 2 Chains Ads Heuristics criteo  googlesyndication 9090 1887  P criteo  doubleclick 3610 1144  E, P  DC, P criteo  adnxs 3263 1066  E, P criteo  rubiconproject 1586 749  E, P criteo  servedbyopenx 707 460  P doubleclick  steelhousemedia 362 27  P  E, P mathtag  mediaforge 360 124  E, P netmng  scene7 267 119  E  ? googlesyndication  adsrvr 107 29  P rubiconproject  steelhousemedia 86 30  E googlesyndication  steelhousemedia 47 22 ? adtechus  adacado 36 18 ? atwola  adacado 32 6 ? adroll  adnxs 31 8 ? 27 Heuristics Key (used by prior work) E – share exact cookies P – special URL parameters DC – DoubleClick URL parameters ? – Unknown sharing method
  • 113. Who is Cookie Matching? Participant 1 Participant 2 Chains Ads Heuristics criteo  googlesyndication 9090 1887  P criteo  doubleclick 3610 1144  E, P  DC, P criteo  adnxs 3263 1066  E, P criteo  rubiconproject 1586 749  E, P criteo  servedbyopenx 707 460  P doubleclick  steelhousemedia 362 27  P  E, P mathtag  mediaforge 360 124  E, P netmng  scene7 267 119  E  ? googlesyndication  adsrvr 107 29  P rubiconproject  steelhousemedia 86 30  E googlesyndication  steelhousemedia 47 22 ? adtechus  adacado 36 18 ? atwola  adacado 32 6 ? adroll  adnxs 31 8 ? 27 Heuristics Key (used by prior work) E – share exact cookies P – special URL parameters DC – DoubleClick URL parameters ? – Unknown sharing method
  • 114. Who is Cookie Matching? Participant 1 Participant 2 Chains Ads Heuristics criteo  googlesyndication 9090 1887  P criteo  doubleclick 3610 1144  E, P  DC, P criteo  adnxs 3263 1066  E, P criteo  rubiconproject 1586 749  E, P criteo  servedbyopenx 707 460  P doubleclick  steelhousemedia 362 27  P  E, P mathtag  mediaforge 360 124  E, P netmng  scene7 267 119  E  ? googlesyndication  adsrvr 107 29  P rubiconproject  steelhousemedia 86 30  E googlesyndication  steelhousemedia 47 22 ? adtechus  adacado 36 18 ? atwola  adacado 32 6 ? adroll  adnxs 31 8 ? 27 Heuristics Key (used by prior work) E – share exact cookies P – special URL parameters DC – DoubleClick URL parameters ? – Unknown sharing method
  • 115. Who is Cookie Matching? Participant 1 Participant 2 Chains Ads Heuristics criteo  googlesyndication 9090 1887  P criteo  doubleclick 3610 1144  E, P  DC, P criteo  adnxs 3263 1066  E, P criteo  rubiconproject 1586 749  E, P criteo  servedbyopenx 707 460  P doubleclick  steelhousemedia 362 27  P  E, P mathtag  mediaforge 360 124  E, P netmng  scene7 267 119  E  ? googlesyndication  adsrvr 107 29  P rubiconproject  steelhousemedia 86 30  E googlesyndication  steelhousemedia 47 22 ? adtechus  adacado 36 18 ? atwola  adacado 32 6 ? adroll  adnxs 31 8 ? 27 Heuristics Key (used by prior work) E – share exact cookies P – special URL parameters DC – DoubleClick URL parameters ? – Unknown sharing method
  • 116. Who is Cookie Matching? Participant 1 Participant 2 Chains Ads Heuristics criteo  googlesyndication 9090 1887  P criteo  doubleclick 3610 1144  E, P  DC, P criteo  adnxs 3263 1066  E, P criteo  rubiconproject 1586 749  E, P criteo  servedbyopenx 707 460  P doubleclick  steelhousemedia 362 27  P  E, P mathtag  mediaforge 360 124  E, P netmng  scene7 267 119  E  ? googlesyndication  adsrvr 107 29  P rubiconproject  steelhousemedia 86 30  E googlesyndication  steelhousemedia 47 22 ? adtechus  adacado 36 18 ? atwola  adacado 32 6 ? adroll  adnxs 31 8 ? 27 Heuristics Key (used by prior work) E – share exact cookies P – special URL parameters DC – DoubleClick URL parameters ? – Unknown sharing method
  • 117. Who is Cookie Matching? Participant 1 Participant 2 Chains Ads Heuristics criteo  googlesyndication 9090 1887  P criteo  doubleclick 3610 1144  E, P  DC, P criteo  adnxs 3263 1066  E, P criteo  rubiconproject 1586 749  E, P criteo  servedbyopenx 707 460  P doubleclick  steelhousemedia 362 27  P  E, P mathtag  mediaforge 360 124  E, P netmng  scene7 267 119  E  ? googlesyndication  adsrvr 107 29  P rubiconproject  steelhousemedia 86 30  E googlesyndication  steelhousemedia 47 22 ? adtechus  adacado 36 18 ? atwola  adacado 32 6 ? adroll  adnxs 31 8 ? 27 Heuristics Key (used by prior work) E – share exact cookies P – special URL parameters DC – DoubleClick URL parameters ? – Unknown sharing method 31% of cookie matching partners would be missed.
  • 118. Summary We develop a novel methodology to detect information flows between ad exchanges • Controlled methodology enables causal inference • Defeats obfuscation attempts • Detects client- and server-side flows Dataset gives a better picture of ad ecosystem • Reveals which ad exchanges are linking information about users • Allows us to reason about how information is being transferred 28
  • 120. Inclusion Chains • Instrumented Chromium binary that records the provenance of page elements • Uses Information Flow Analysis techniques (IFA) • Handles Flash, exec(), setTimeout(), cross-frame, inline scripts, etc. 30
  • 121. Inclusion Chains • Instrumented Chromium binary that records the provenance of page elements • Uses Information Flow Analysis techniques (IFA) • Handles Flash, exec(), setTimeout(), cross-frame, inline scripts, etc. 30 <html> <body> <script src=“b.com/adlib.js”></script> <iframe src=“c.net/adbox.html”> <html> <script src=“code.js”></script> <object data=“d.org/flash.swf”> </object> </html> </iframe> </body> </html> DOM: a.com/index.html
  • 122. Inclusion Chains • Instrumented Chromium binary that records the provenance of page elements • Uses Information Flow Analysis techniques (IFA) • Handles Flash, exec(), setTimeout(), cross-frame, inline scripts, etc. 30 <html> <body> <script src=“b.com/adlib.js”></script> <iframe src=“c.net/adbox.html”> <html> <script src=“code.js”></script> <object data=“d.org/flash.swf”> </object> </html> </iframe> </body> </html> DOM: a.com/index.html Inclusion Chain a.com/index.html b.com/adlib.js c.net/adbox.html c.net/code.js d.org/flash.swf
  • 123. 3) Indirect Matching 31 Shopper-side Publisher-side ExampleRule
  • 124. [^a] 3) Indirect Matching 31 Shopper-side Publisher-side ExampleRule ^shop e [^a]$ ^pub .* e a$
  • 125. [^a] 3) Indirect Matching 31 Shopper-side Publisher-side ExampleRule ^shop e [^a]$ Only the exchange e appears on the shopper-side… ^pub .* e a$
  • 126. [^a] 3) Indirect Matching 31 Shopper-side Publisher-side ExampleRule ^shop e [^a]$ Only the exchange e appears on the shopper-side… ^pub .* e a$ e must pass browsing history data to participants in the auction, thus no cookie matching is necessary
  • 127. [^a] 3) Indirect Matching 31 Shopper-side Publisher-side ExampleRule ^shop e [^a]$ Only the exchange e appears on the shopper-side… ^pub .* e a$ e must pass browsing history data to participants in the auction, thus no cookie matching is necessary We do not expect to find indirect matches in the data.
  • 128. References Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, Claudia Diaz. “The web never forgets: Persistent tracking mechanisms in the wild.” CCS, 2014. Muhammad Ahmad Bashir, Sajjad Arshad, William Robertson, Christo Wilson. “Tracing Information Flows Between Ad Exchanges Using Retargeted Ads.” Usenix Security, 2016. Marjan Falahrastegar, Hamed Haddadi, Steve Uhlig, Richard Mortier. “Tracking personal identifiers across the web.” PAM, 2016. Lukasz Olejnik, Tran Minh-Dung, Claude Castelluccia. “Selling off privacy at auction.” NDSS, 2014. 32
  • 129. Filtering Images Filter Total Unique Images All images from the crawlers 571,636 Use EasyList to identify advertisements 93,726 Remove ads that are shown to >1 persona 31,850 Use crowdsourcing to locate retargets 5,102 33
  • 130. Filtering Images Filter Total Unique Images All images from the crawlers 571,636 Use EasyList to identify advertisements 93,726 Remove ads that are shown to >1 persona 31,850 Use crowdsourcing to locate retargets 5,102 33
  • 131. Filtering Images Filter Total Unique Images All images from the crawlers 571,636 Use EasyList to identify advertisements 93,726 Remove ads that are shown to >1 persona 31,850 Use crowdsourcing to locate retargets 5,102 33 • Personas visited non-overlapping retailers • By definition, retargets should only be shown to a single persona
  • 132. Filtering Images Filter Total Unique Images All images from the crawlers 571,636 Use EasyList to identify advertisements 93,726 Remove ads that are shown to >1 persona 31,850 Use crowdsourcing to locate retargets 5,102 33 • Personas visited non-overlapping retailers • By definition, retargets should only be shown to a single persona • Spent $415 uploading 1,142 HITs to Amazon Mechanical Turk • Each HIT asked the worker to label 30 ad images • 27 were unlabeled, 3 were known retargets (control images) • All ads were labeled by 2 workers • Any ad identified as targeted was also manually inspected by us